
Big Data Hands-on Training

Course Description

This 8-week executive training course in Big Data Analytics provides a comprehensive understanding of modern data architectures, including hands-on experience with leading platforms and tools like Databricks, Spark, and Kafka, culminating in a capstone project on building a Data Lakehouse.

What You’ll Learn From This Course

  • Grasp the evolution and fundamentals of Big Data and modern architectures.
  • Understand and compare leading data platforms like AWS, GCP, Azure, and Cloudera.
  • Dive deep into compute frameworks, including Spark, Kafka, and Hive.
  • Master batch and stream processing techniques using Databricks.
  • Gain insights into data governance and cost optimization strategies.
  • Apply knowledge in a practical capstone project, building a real-time and batch processing Data Lakehouse.

Course Outline

Week 1: Foundations of Big Data and Modern Architectures
  • Introduction to Big Data: Historical evolution and key concepts.
  • Architectural Evolution: Transition from MPP architectures to cloud services.
  • Modern Architectures: Converged and Hyperconverged infrastructures.
  • Core components: Compute, Storage, Orchestration/Governance.
  • Leading platforms: AWS, GCP, Azure, Cloudera, Databricks.
  • Compute Frameworks: Overview of Spark, Kafka, and Hive.
  • Storage Essentials: Block vs. Object Storage, File and Table Formats.
  • Orchestration/Governance: Metastores, Data Federation, and Orchestration methods.

By the end of the week, the candidate has a solid understanding of Big Data foundations and modern architectures.

Week 2: In-Depth Analysis of Service Providers
  • Data Lake Solutions: Detailed exploration of Cloudera CDP and SDX.
  • Data Lakehouse Platforms: Examination of Databricks, Dremio, Starburst, and others.
  • Augmented Data Warehouses: Review of Snowflake, Teradata Vantage, Microsoft OneLake, and Amazon Redshift Spectrum.

By the end of the week, the candidate understands key service providers for data lake and lakehouse platforms.

Week 3: Comprehensive Spark Exploration

  • Spark Fundamentals: Evolution from MapReduce and understanding RDDs, DataFrames, and DStreams.
  • Lab/Assignment 1: Hands-on Spark tasks.

By the end of the week, the candidate has a comprehensive understanding of Spark and its components.
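Spark's RDD API grew directly out of the MapReduce model mentioned above. As an illustrative sketch (plain Python rather than PySpark, so it runs without a cluster), here is a word count written in the classic map → shuffle → reduce style, with the rough PySpark equivalent noted in the docstring:

```python
from collections import defaultdict
from functools import reduce

def word_count(lines):
    """Word count in the classic map -> shuffle -> reduce style.

    In PySpark this is roughly:
      sc.textFile(path).flatMap(str.split)
        .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
    """
    # Map phase: emit a (word, 1) pair for every word.
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Shuffle phase: group the counts by word.
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    # Reduce phase: sum the counts for each word.
    return {word: reduce(lambda a, b: a + b, counts)
            for word, counts in grouped.items()}

print(word_count(["big data big", "data lakehouse"]))
# → {'big': 2, 'data': 2, 'lakehouse': 1}
```

Spark distributes each of these phases across a cluster and keeps intermediate data in memory, which is the key performance difference from disk-based MapReduce.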

Week 4: Batch Processing with Databricks
  • Databricks Recap: Revisiting core features.
  • Delta Lake vs. Iceberg: Comparative analysis.
  • Batch Processing: Execution in Databricks.
  • Analytics: Utilizing SQL and Python in Databricks.
  • Lab/Assignment 2: Practical batch processing.

By the end of the week, the candidate has practical experience with batch processing using Databricks.
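A typical batch job reads a bounded dataset, then groups and aggregates it. The sketch below uses hypothetical order records and plain Python (so it runs anywhere), with the equivalent Databricks SQL shown in the docstring:

```python
from collections import defaultdict

# Hypothetical order records: (order_date, amount) — illustrative data only.
orders = [
    ("2024-01-01", 120.0),
    ("2024-01-01", 80.0),
    ("2024-01-02", 200.0),
]

def daily_totals(rows):
    """Aggregate amounts per date, as a batch job would.

    Roughly the same as this Databricks SQL:
      SELECT order_date, SUM(amount) FROM orders GROUP BY order_date
    """
    totals = defaultdict(float)
    for order_date, amount in rows:
        totals[order_date] += amount
    return dict(totals)

print(daily_totals(orders))  # → {'2024-01-01': 200.0, '2024-01-02': 200.0}
```

In Databricks the same aggregation would typically write its result to a Delta table, giving the output ACID guarantees and time travel.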

Week 5: Stream Processing and Kafka Integration
  • Kafka Overview: Key concepts revisited.
  • Stream Processing: Implementation within Databricks.
  • Delta Live Tables: Introduction and practical application.
  • Lab/Assignment 3: Stream processing tasks.

By the end of the week, the candidate has hands-on experience in stream processing with Kafka and Databricks.
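A core stream-processing idea covered this week is windowed aggregation. The sketch below (plain Python, with a finite list standing in for a Kafka topic) counts events per fixed-size tumbling window, which is the same concept Structured Streaming and Delta Live Tables apply continuously to arriving records:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Count events per fixed-size (tumbling) time window.

    Each event is a (timestamp, key) pair; in a real pipeline the
    events would arrive continuously from a source such as Kafka.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Assign the event to the window that contains its timestamp.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(0, "click"), (3, "click"), (7, "view"), (12, "click")]
print(tumbling_window_counts(events, window_seconds=5))
# → {(0, 'click'): 2, (5, 'view'): 1, (10, 'click'): 1}
```

The hard parts a real engine adds on top of this sketch are late-arriving data, watermarking, and exactly-once delivery, which is where Delta Live Tables comes in.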

Week 6: Data Governance
  • Governance Principles: Overview of Unity Catalog.
  • Lab/Assignment 4: Governance-focused exercises.

By the end of the week, the candidate has a good understanding of data governance principles and tools.

Week 7: Optimizing Databricks for Cost Efficiency
  • Serverless Compute: Execution strategies in Databricks.
  • Job Compute: Best practices for performance.
  • Cost Management: Key considerations for optimizing Databricks usage.

By the end of the week, the candidate has knowledge of cost optimization strategies for Databricks.
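Databricks bills compute in DBUs (Databricks Units), so cost estimation reduces to simple arithmetic over consumption and rate. The sketch below uses placeholder numbers only; actual DBU consumption and pricing depend on the cluster type, instance sizes, and your contract:

```python
def monthly_job_cost(dbu_per_hour, hours_per_day, days, usd_per_dbu):
    """Estimate monthly Databricks job cost.

    All rates here are placeholders — real DBU consumption and the
    per-DBU price vary by workload type, region, and pricing tier.
    """
    return dbu_per_hour * hours_per_day * days * usd_per_dbu

# Example: a job cluster consuming 4 DBU/hour, running 2 hours a day
# for 30 days, at an assumed $0.15 per DBU.
cost = monthly_job_cost(dbu_per_hour=4, hours_per_day=2, days=30, usd_per_dbu=0.15)
print(f"${cost:.2f}")  # → $36.00
```

Running the same workload on job compute rather than an always-on all-purpose cluster shrinks `hours_per_day` to the actual runtime, which is one of the main levers covered this week.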

Week 8: Capstone Project - Building a Data Lakehouse
  • Real-time and Batch Pipelines: End-to-end development using Databricks.
  • Project Evaluation and Presentation: Final assessment and project presentations.

By the end of the week, the candidate has completed a capstone project on building a Data Lakehouse.

Muhammad Umer Ali

SENIOR BIG DATA ENGINEER AT IBM

He has 10+ years of experience in data warehousing, Big Data, and cloud. He is an expert in enterprise data warehouses, implementation of Big Data applications, and hybrid ETL architecture. His other specialties include data modelling, Python, C/C++ and Java, Unix/Linux environments, shell scripting, and cluster-based hardware/software design for MPP and SMP systems (MPI, OpenMPI, CUDA).

  • Duration: 2 months
  • Timings: 9pm - 10pm
  • Hours: 50
  • Fees: PKR 40,000
  • Instructor: Faraz Shahid