databricks_bootcamp_2026

Open source, MIT license, Jupyter Notebook
Stars: 338, Forks: 167, Issues: 3, Watchers: 13, Last commit: 4 months ago

About databricks_bootcamp_2026

End-to-end Data Lakehouse project built on Databricks, following the Medallion Architecture (Bronze, Silver, Gold). Covers real-world data engineering and analytics workflows using Spark, PySpark, SQL, Delta Lake, and Unity Catalog. Designed for learning, portfolio building, and job interviews.

Platforms

Web Self-hosted

Languages

Jupyter Notebook

Databricks Bootcamp 2026

Welcome to the Databricks Data Lakehouse Project by Data With Baraa.

This repository contains a complete, real-world Data Lakehouse implementation built on Databricks, including datasets, notebooks, SQL examples, and exercises. Everything here is designed to help you understand how modern data teams use Databricks in practice, from data ingestion and transformation to analytics-ready data products.


⚠️ Important Note

Build this project on your own first using the Notion roadmap.
Use this repository only as a reference if you get stuck.

Before starting, watch the Databricks Bootcamp, where I explain the architecture and decisions behind this project.


πŸ—οΈ Architecture

This project follows the Medallion Architecture:

🥉 Bronze Layer

  • Raw data ingestion
  • Schema inference and storage as Delta tables

🥈 Silver Layer

  • Data cleaning and standardization
  • Type casting and validation
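
As a rough illustration of the Bronze and Silver steps above, here is a minimal sketch in plain Python rather than PySpark, so it runs without a cluster. The sample data, column names, and the functions `ingest_bronze` and `to_silver` are hypothetical and not taken from the project notebooks; on Databricks the same steps would typically operate on Spark DataFrames stored as Delta tables.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export; column names and values are illustrative only.
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,19.90,2026-01-05
2,BOB,not_a_number,2026-01-06
3,Carol,5.00,2026-01-07
"""


def ingest_bronze(raw_text):
    """Bronze: keep every row as received, with all values still strings."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def to_silver(bronze_rows):
    """Silver: standardize text, cast types, and set aside rows that fail validation."""
    clean, rejected = [], []
    for row in bronze_rows:
        try:
            record = {
                "order_id": int(row["order_id"]),
                "customer": row["customer"].strip().title(),
                "amount": float(row["amount"]),
                "order_date": datetime.strptime(row["order_date"], "%Y-%m-%d").date(),
            }
        except ValueError:
            # A failed cast means the row is invalid; keep it for inspection.
            rejected.append(row)
        else:
            clean.append(record)
    return clean, rejected


bronze = ingest_bronze(RAW_CSV)
silver, rejected = to_silver(bronze)
print(len(bronze), len(silver), len(rejected))
```

Keeping rejected rows instead of silently dropping them is one possible design choice; it makes data quality problems visible while the Bronze copy stays untouched.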

🥇 Gold Layer

  • Dimensional Data Model (Business Transformation)
  • Ready for BI and analysis
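
To make the idea of a dimensional model more concrete, the sketch below splits a few cleaned rows into a customer dimension and an order fact table, then answers a simple BI-style question. It is plain Python under stated assumptions: the rows, the column names, and the functions `build_gold` and `revenue_by_customer` are hypothetical, and the project itself may model its Gold layer differently.

```python
from collections import defaultdict

# Hypothetical cleaned Silver rows; names and columns are illustrative only.
silver_orders = [
    {"order_id": 1, "customer": "Alice", "amount": 20.0},
    {"order_id": 2, "customer": "Bob", "amount": 5.0},
    {"order_id": 3, "customer": "Alice", "amount": 7.5},
]


def build_gold(rows):
    """Split cleaned rows into a customer dimension and an order fact table."""
    customer_keys = {}  # customer name -> surrogate key
    dim_customer, fact_orders = [], []
    for row in rows:
        name = row["customer"]
        if name not in customer_keys:
            # Assign the next surrogate key the first time a customer appears.
            customer_keys[name] = len(customer_keys) + 1
            dim_customer.append(
                {"customer_key": customer_keys[name], "customer_name": name}
            )
        fact_orders.append(
            {
                "order_id": row["order_id"],
                "customer_key": customer_keys[name],
                "amount": row["amount"],
            }
        )
    return dim_customer, fact_orders


def revenue_by_customer(dim_customer, fact_orders):
    """Join facts to the dimension and total the amount per customer."""
    names = {d["customer_key"]: d["customer_name"] for d in dim_customer}
    totals = defaultdict(float)
    for fact in fact_orders:
        totals[names[fact["customer_key"]]] += fact["amount"]
    return dict(totals)


dim_customer, fact_orders = build_gold(silver_orders)
print(revenue_by_customer(dim_customer, fact_orders))
```

The fact table stores only keys and measures, while descriptive attributes live in the dimension, which is what keeps the model easy to query from BI tools.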

πŸ› οΈ Technologies Used

  • Databricks
  • Apache Spark
  • PySpark
  • Spark SQL
  • Delta Lake
  • Unity Catalog

Prerequisites

  • Basic SQL and Python, plus some PySpark knowledge
  • No prior Databricks experience required

☕ Stay Connected

🌍 Connect With Me

YouTube LinkedIn Website Newsletter


πŸ›‘οΈ License

This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.

🌟 About Me

Hi, I’m Baraa Khatib Salkini, also known as Data With Baraa. I’m a senior data professional and educator with over 17 years of industry experience, working across data engineering, analytics, and modern data platforms. I’ve led large-scale data projects in real companies and now focus on teaching practical, real-world data skills through my courses, YouTube content, and bootcamps. My goal is simple: help you understand how data actually works in real systems, not just how to write code.