Data Engineering Zoomcamp 🚀

[GitHub stars](https://github.com/DataTalksClub/data-engineering-zoomcamp/stargazers) [GitHub forks](https://github.com/DataTalksClub/data-engineering-zoomcamp/network/members) [License](https://opensource.org/licenses/MIT) [PRs Welcome](http://makeapullrequest.com)

Master the fundamentals of data engineering by building an end-to-end data pipeline from scratch.

A free 9-week course on building production-ready data pipelines.

📋 Table of Contents

🎯 Course Overview

📅 2026 Cohort

📚 Syllabus

🏗️ Prerequisites

🚀 Getting Started

👥 Community & Support

🎯 Course Overview

Data Engineering Zoomcamp is a comprehensive, hands-on course designed to help you master modern data engineering tools and practices. You'll learn by doing: you build a complete data pipeline, from ingestion to visualization, using industry-standard technologies.

Key Features:

✅ 100% Free - No hidden costs

✅ Hands-on Projects - Learn by building real pipelines

✅ Industry Tools - Docker, Terraform, BigQuery, Spark, Kafka, dbt

✅ Active Community - 7,000+ learners on Slack

✅ Production Focus - Learn best practices for real-world scenarios

📅 2026 Cohort

| Item | Details |
|------|---------|
| Start Date | January 12, 2026 |
| Duration | 9 weeks |
| Format | Cohort-based with self-paced option |
| Registration | Sign up here |

Self-Paced Option: All materials are available year-round for independent learners!

📚 Syllabus

Module 1: Containerization & Infrastructure as Code

Introduction to GCP

Docker and Docker Compose

Running PostgreSQL with Docker

Infrastructure setup with Terraform

📝 Homework: Deploy your first containerized service

Module 2: Workflow Orchestration

Data Lakes and Workflow Orchestration concepts

Workflow orchestration with Kestra

📝 Homework: Build a scheduled data pipeline

Workshop 1: Data Ingestion

API reading and pipeline scalability

Data normalization and incremental loading

📝 Homework: Create an incremental data loader
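To make "incremental loading" concrete, here is a minimal Python sketch. It is not taken from the course materials. It assumes each record carries an `updated_at` timestamp and that the loader remembers the newest timestamp it has already ingested (a high-water mark). The names `load_incrementally`, `updated_at`, and the in-memory `table` list are hypothetical stand-ins for a real API and destination table.

```python
from datetime import datetime


def load_incrementally(source_rows, target, last_loaded_at):
    """Append only rows newer than the high-water mark; return the new mark.

    source_rows: iterable of dicts with an ISO-8601 'updated_at' field (assumed).
    target: list standing in for a destination table (hypothetical).
    last_loaded_at: datetime of the newest row already loaded, or None.
    """
    new_mark = last_loaded_at
    for row in source_rows:
        updated_at = datetime.fromisoformat(row["updated_at"])
        # Skip rows that an earlier run has already loaded.
        if last_loaded_at is not None and updated_at <= last_loaded_at:
            continue
        target.append(row)
        if new_mark is None or updated_at > new_mark:
            new_mark = updated_at
    return new_mark


# Example: two runs against a growing source.
table = []
batch_1 = [
    {"id": 1, "updated_at": "2026-01-12T09:00:00"},
    {"id": 2, "updated_at": "2026-01-12T10:00:00"},
]
mark = load_incrementally(batch_1, table, None)

# The second run sees the old rows again plus one new row.
batch_2 = batch_1 + [{"id": 3, "updated_at": "2026-01-13T08:00:00"}]
mark = load_incrementally(batch_2, table, mark)

print([row["id"] for row in table])  # [1, 2, 3]
```

The second run re-reads the full source but appends only the row with `id` 3, which is the point of the technique: repeated runs do not duplicate data already in the target. A real loader would also persist the mark between runs and decide how to handle rows that share the mark's exact timestamp.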

Module 3: Data Warehousing

Introduction to BigQuery

Partitioning, clustering, and best practices

Machine learning in BigQuery

📝 Homework: Optimize queries with partitioning

Module 4: Analytics Engineering

dbt (data build tool) with DuckDB & BigQuery

Testing, documentation, and deployment

Data visualization with Streamlit & Looker Studio

📝 Homework: Build a dbt project with tests

Module 5: Batch Processing

Introduction to Apache Spark

DataFrames and SQL

Internals of GroupBy and Joins

📝 Homework: Process large datasets with Spark

Module 6: Streaming

Introduction to Kafka

Kafka Streams and KSQL

Schema management with Avro

📝 Homework: Build a real-time streaming pipeline

Final Project

Apply all concepts in a real-world scenario

Peer review and feedback process

🏆 Certificate of completion for successful projects

🏗️ Prerequisites

To get the most out of this course, you should have:

Basic coding experience (any language)

Familiarity with SQL (SELECT, JOIN, GROUP BY)

Python experience (helpful but not required)

No prior data engineering experience needed! 🎉

🚀 Getting Started

For Cohort Participants:

1. Register for the 2026 cohort
2. Join the Slack community
3. Check the #course-data-engineering channel
4. Set up your development environment (instructions in Week 1)

For Self-Paced Learners:

1. Clone this repository:

   ```bash
   git clone https://github.com/DataTalksClub/data-engineering-zoomcamp.git
   ```

About

Data Engineering Zoomcamp is a free, community-driven, 9-week course designed to teach the fundamentals of building production-ready data pipelines. It's structured around hands-on projects, allowing participants to master key technologies like Docker, Terraform, BigQuery, Apache Spark, and Kafka. The course is led by experienced instructors and supported by an active global community on Slack for collaboration and troubleshooting.

Updated Jan 19, 2026
