Home
Softono
Snowflake_DBT_Project

Snowflake_DBT_Project

Open source Python
21
Stars
2
Forks
0
Issues
0
Watchers
2 weeks
Last Commit

About Snowflake_DBT_Project

Public Snowflake and dbt analytics engineering project for Inside Airbnb data, marts, tests, docs, and Streamlit dashboard.

Platforms

Web Self-hosted

Languages

Python

Links

Airbnb Snowflake dbt Analytics Project

Public analytics engineering portfolio project that loads the Inside Airbnb open dataset into Snowflake, transforms it with dbt, validates data quality, and powers a Streamlit reporting dashboard.

What This Project Demonstrates

  • Snowflake raw-zone setup for open data ingestion
  • Local CSV/GZIP loading into Snowflake internal stages
  • dbt staging, intermediate, and mart model layers
  • Incremental fact modeling with Snowflake merge strategy
  • dbt generic tests and singular tests for data quality
  • Model documentation with persisted docs support
  • Streamlit dashboard connected to analytics marts
  • Public-safe configuration with ignored local credential files

Dataset

Source: Inside Airbnb

Recommended classroom dataset: New York City, New York, United States.

Expected raw files:

data/raw/listings.csv.gz
data/raw/calendar.csv.gz
data/raw/reviews.csv.gz
data/raw/neighbourhoods.csv

Inside Airbnb does not provide booking transactions. This project uses calendar availability and price data as a classroom-friendly proxy for unavailable-night revenue.

Architecture

flowchart LR
    A["Inside Airbnb files<br/>listings, calendar, reviews, neighbourhoods"] --> B["Python loader<br/>load_inside_airbnb_to_snowflake.py"]
    B --> C["Snowflake internal stage<br/>INSIDE_AIRBNB_STAGE"]
    C --> D["RAW schema<br/>text-preserving raw tables"]
    D --> E["dbt staging views<br/>clean + cast + standardize"]
    E --> F["dbt intermediate layer<br/>joins + enrichment + revenue proxy"]
    F --> G["dbt marts<br/>dimensions, facts, monthly aggregates"]
    G --> H["Streamlit dashboard<br/>neighbourhood, host, listing analytics"]

    I["profiles.yml<br/>ignored local Snowflake config"] -.-> J["dbt CLI"]
    J --> E
    J --> F
    J --> G

    style A fill:#E0F2FE,stroke:#0284C7,color:#0F172A
    style B fill:#F8FAFC,stroke:#64748B,color:#0F172A
    style C fill:#FEF3C7,stroke:#D97706,color:#0F172A
    style D fill:#FEF3C7,stroke:#D97706,color:#0F172A
    style E fill:#DCFCE7,stroke:#16A34A,color:#0F172A
    style F fill:#DCFCE7,stroke:#16A34A,color:#0F172A
    style G fill:#DCFCE7,stroke:#16A34A,color:#0F172A
    style H fill:#0F172A,stroke:#22C55E,color:#FFFFFF

dbt Lineage

flowchart TB
    L["RAW.LISTINGS"] --> SL["stg_airbnb__listings"]
    C["RAW.CALENDAR"] --> SC["stg_airbnb__calendar"]
    R["RAW.REVIEWS"] --> SR["stg_airbnb__reviews"]
    N["RAW.NEIGHBOURHOODS"] --> SN["stg_airbnb__neighbourhoods"]

    SL --> IL["int_airbnb__listing_enriched"]
    SN --> IL
    SC --> IC["int_airbnb__calendar_enriched"]
    IL --> IC
    SR --> IR["int_airbnb__reviews_enriched"]
    IL --> IR

    IL --> DL["dim_listings"]
    IL --> DH["dim_hosts"]
    IC --> FC["fct_listing_calendar<br/>incremental merge"]
    IR --> FR["fct_reviews"]
    FC --> AL["agg_listing_monthly_performance"]
    AL --> AN["agg_neighbourhood_monthly_performance"]

    DL --> DASH["Streamlit dashboard"]
    DH --> DASH
    AL --> DASH
    AN --> DASH

    style L fill:#E0F2FE,stroke:#0284C7,color:#0F172A
    style C fill:#E0F2FE,stroke:#0284C7,color:#0F172A
    style R fill:#E0F2FE,stroke:#0284C7,color:#0F172A
    style N fill:#E0F2FE,stroke:#0284C7,color:#0F172A
    style SL fill:#DCFCE7,stroke:#16A34A,color:#0F172A
    style SC fill:#DCFCE7,stroke:#16A34A,color:#0F172A
    style SR fill:#DCFCE7,stroke:#16A34A,color:#0F172A
    style SN fill:#DCFCE7,stroke:#16A34A,color:#0F172A
    style IL fill:#FEF3C7,stroke:#D97706,color:#0F172A
    style IC fill:#FEF3C7,stroke:#D97706,color:#0F172A
    style IR fill:#FEF3C7,stroke:#D97706,color:#0F172A
    style DL fill:#F8FAFC,stroke:#64748B,color:#0F172A
    style DH fill:#F8FAFC,stroke:#64748B,color:#0F172A
    style FC fill:#F8FAFC,stroke:#64748B,color:#0F172A
    style FR fill:#F8FAFC,stroke:#64748B,color:#0F172A
    style AL fill:#F8FAFC,stroke:#64748B,color:#0F172A
    style AN fill:#F8FAFC,stroke:#64748B,color:#0F172A
    style DASH fill:#0F172A,stroke:#22C55E,color:#FFFFFF

dbt Model Layers

Layer Models Purpose
Sources RAW.LISTINGS, RAW.CALENDAR, RAW.REVIEWS, RAW.NEIGHBOURHOODS Raw files loaded into Snowflake
Staging stg_airbnb__* Cleaning, type casting, standard column names
Intermediate int_airbnb__* Reusable joins, enrichment, revenue proxy logic
Marts dim_listings, dim_hosts, fct_listing_calendar, fct_reviews Analytics-ready dimensions and facts
Aggregates agg_listing_monthly_performance, agg_neighbourhood_monthly_performance Reporting-ready monthly metrics

Repository Structure

.
├── analyses/
├── dashboard/
│   └── streamlit_app.py
├── data/raw/
├── dbt_project.yml
├── profiles.yml.example
├── config/
│   └── local_credentials.example.json
├── docs/
├── macros/
├── models/
│   ├── staging/
│   ├── intermediate/
│   └── marts/
├── scripts/
│   └── load_inside_airbnb_to_snowflake.py
├── setup/
│   └── snowflake_setup.sql
├── tests/
└── requirements.txt

Public-Safe Credential Setup

This public repository does not commit real credentials and does not require environment variables.

Create local-only files from the examples:

cp profiles.yml.example profiles.yml
cp config/local_credentials.example.json config/local_credentials.json

Then edit both local files with your Snowflake values. These files are ignored by git:

  • profiles.yml
  • config/local_credentials.json
  • .user.yml

Install

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Load Raw Data

python scripts/load_inside_airbnb_to_snowflake.py

The loader creates Snowflake objects from setup/snowflake_setup.sql, uploads the local raw files to a Snowflake internal stage, recreates raw tables from CSV headers, and copies data into the RAW schema.

Run dbt

dbt debug --profiles-dir .
dbt run --profiles-dir .
dbt test --profiles-dir .
dbt docs generate --profiles-dir .

For a new Inside Airbnb snapshot:

python scripts/load_inside_airbnb_to_snowflake.py
dbt run --full-refresh --select fct_listing_calendar+ --profiles-dir .
dbt test --profiles-dir .

Run the Dashboard

streamlit run dashboard/streamlit_app.py

The dashboard reads local Snowflake values from config/local_credentials.json.

Example Business Questions

  • Which neighbourhoods have the highest estimated unavailable-night revenue?
  • Which room types have the highest average daily price?
  • Which hosts manage the most listings?
  • Which neighbourhoods have the lowest availability rate?
  • How does listing availability vary by month?

Data Quality

The project includes tests for:

  • Source primary identifiers
  • Listing and review uniqueness
  • Required listing/calendar/review fields
  • Room type accepted values
  • Calendar-to-listing relationships
  • Non-negative prices
  • Duplicate listing-date checks

Security Notes

Do not commit:

  • Snowflake usernames or passwords
  • profiles.yml
  • config/local_credentials.json
  • dbt target/
  • dbt logs/
  • Python virtual environments

The committed files contain placeholders only.

Owner

Built by Durgesh Yadav as a Snowflake, dbt, and analytics engineering portfolio project.