
Data Engineering Zoomcamp 🚀

![GitHub stars](https://github.com/DataTalksClub/data-engineering-zoomcamp/stargazers) ![GitHub forks](https://github.com/DataTalksClub/data-engineering-zoomcamp/network/members) ![License](https://opensource.org/licenses/MIT) ![PRs Welcome](http://makeapullrequest.com)

Master the fundamentals of data engineering by building an end-to-end data pipeline from scratch.
A free 9-week course on building production-ready data pipelines.

📋 **Table of Contents**

  • 🎯 Course Overview
  • 📅 2026 Cohort
  • 📚 Syllabus
  • 🏗️ Prerequisites
  • 🚀 Getting Started
  • 👥 Community & Support
  • 👨‍🏫 Instructors
  • 🙏 Testimonials
  • 🤝 Sponsors
  • 📊 Repository Stats
  • 📄 License

🎯 **Course Overview**

Data Engineering Zoomcamp is a comprehensive, hands-on course designed to help you master modern data engineering tools and practices. You'll learn by doing - building a complete data pipeline from ingestion to visualization using industry-standard technologies.

Key Features:

  • ✅ 100% Free - No hidden costs
  • ✅ Hands-on Projects - Learn by building real pipelines
  • ✅ Industry Tools - Docker, Terraform, BigQuery, Spark, Kafka, dbt
  • ✅ Active Community - 7,000+ learners on Slack
  • ✅ Production Focus - Learn best practices for real-world scenarios

📅 **2026 Cohort**

| Item | Details |
|------|---------|
| Start Date | January 12, 2026 |
| Duration | 9 weeks |
| Format | Cohort-based with self-paced option |
| Registration | Sign up here |

Self-Paced Option: All materials are available year-round for independent learners!


📚 **Syllabus**

**Module 1: Containerization & Infrastructure as Code**

  • Introduction to GCP
  • Docker and Docker Compose
  • Running PostgreSQL with Docker
  • Infrastructure setup with Terraform
  • 📝 Homework: Deploy your first containerized service

**Module 2: Workflow Orchestration**

  • Data Lakes and Workflow Orchestration concepts
  • Workflow orchestration with Kestra
  • 📝 Homework: Build a scheduled data pipeline

**Workshop 1: Data Ingestion**

  • API reading and pipeline scalability
  • Data normalization and incremental loading
  • 📝 Homework: Create an incremental data loader
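To give a feel for what "incremental loading" means in Workshop 1, here is a minimal sketch in plain Python. It is not the workshop's actual code or tooling: the function name `incremental_load`, the `updated_at` cursor field, and the sample rows are all hypothetical, and a list stands in for the destination table.

```python
def incremental_load(source_rows, target, cursor_field="updated_at"):
    """Append only the rows that are newer than anything already in target.

    Hypothetical helper for illustration; `target` is a plain list standing
    in for a destination table.
    """
    # High-water mark: the largest cursor value loaded so far (None on first run).
    last_seen = max((row[cursor_field] for row in target), default=None)
    new_rows = [
        row for row in source_rows
        if last_seen is None or row[cursor_field] > last_seen
    ]
    target.extend(new_rows)
    return len(new_rows)


# Made-up sample data; ISO date strings sort correctly as plain strings.
source = [
    {"id": 1, "updated_at": "2026-01-01"},
    {"id": 2, "updated_at": "2026-01-02"},
]
warehouse = []

first_run = incremental_load(source, warehouse)    # first run loads everything
source.append({"id": 3, "updated_at": "2026-01-03"})
second_run = incremental_load(source, warehouse)   # second run loads only the new row

print(first_run, second_run, len(warehouse))
```

The key idea is the high-water mark: each run reads the newest value already loaded and skips everything at or below it, so re-running the loader does not duplicate rows.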
**Module 3: Data Warehousing**

  • Introduction to BigQuery
  • Partitioning, clustering, and best practices
  • Machine learning in BigQuery
  • 📝 Homework: Optimize queries with partitioning

**Module 4: Analytics Engineering**

  • dbt (data build tool) with DuckDB & BigQuery
  • Testing, documentation, and deployment
  • Data visualization with Streamlit & Looker Studio
  • 📝 Homework: Build a dbt project with tests

**Module 5: Batch Processing**

  • Introduction to Apache Spark
  • DataFrames and SQL
  • Internals of GroupBy and Joins
  • 📝 Homework: Process large datasets with Spark

**Module 6: Streaming**

  • Introduction to Kafka
  • Kafka Streams and KSQL
  • Schema management with Avro
  • 📝 Homework: Build a real-time streaming pipeline

**Final Project**

  • Apply all concepts in a real-world scenario
  • Peer review and feedback process
  • 🏆 Certificate of completion for successful projects

🏗️ **Prerequisites**

To get the most out of this course, you should have:

  • Basic coding experience (any language)
  • Familiarity with SQL (SELECT, JOIN, GROUP BY)
  • Python experience (helpful but not required)
  • No prior data engineering experience needed! 🎉
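As a rough self-check for the SQL prerequisite, the sketch below runs one query that combines SELECT, JOIN, and GROUP BY using Python's built-in `sqlite3` module. The `zones` and `trips` tables and their values are invented for illustration and are not course data. If the query reads naturally to you, your SQL is likely at the expected level.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows, made up for this example.
conn.executescript("""
CREATE TABLE zones (zone_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, zone_id INTEGER, fare REAL);
INSERT INTO zones VALUES (1, 'Downtown'), (2, 'Airport');
INSERT INTO trips VALUES (1, 1, 10.0), (2, 1, 14.0), (3, 2, 40.0);
""")

# SELECT + JOIN + GROUP BY: trip count and average fare per zone.
rows = conn.execute("""
SELECT z.name, COUNT(*) AS trip_count, AVG(t.fare) AS avg_fare
FROM trips AS t
JOIN zones AS z ON z.zone_id = t.zone_id
GROUP BY z.name
ORDER BY z.name
""").fetchall()
conn.close()

for name, trip_count, avg_fare in rows:
    print(name, trip_count, avg_fare)
```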

🚀 **Getting Started**

**For Cohort Participants:**

1. Register for the 2026 cohort
2. Join the Slack community
3. Check the #course-data-engineering channel
4. Set up your development environment (instructions in Week 1)

**For Self-Paced Learners:**

1. Clone this repository:

   ```bash
   git clone https://github.com/DataTalksClub/data-engineering-zoomcamp.git
   ```
**About**

Data Engineering Zoomcamp is a free, community-driven, 9-week course designed to teach the fundamentals of building production-ready data pipelines. It's structured around hands-on projects, allowing participants to master key technologies like Docker, Terraform, BigQuery, Apache Spark, and Kafka. The course is led by experienced instructors and supported by an active global community on Slack for collaboration and troubleshooting.


    Updated Jan 19, 2026