Home
Softono
build-etl-using-ssis

build-etl-using-ssis

Open source MIT TSQL
22
Stars
5
Forks
0
Issues
3
Watchers
11 months
Last Commit

About build-etl-using-ssis

Starter project for building an ETL pipeline using SSIS in Visual Studio 2019

Platforms

Web Self-hosted

Languages

TSQL

Links

Build an ETL Pipeline using SSIS

This project provides a comprehensive guide for building an ETL (Extract, Transform, Load) pipeline using SQL Server Integration Services (SSIS) in Visual Studio 2019.


🎯 Goal

The primary goal of this project is to provide a basic yet practical solution for anyone looking to build a new ETL pipeline using SSIS, with AdventureWorks 2019 as the sample data source.


βœ… Prerequisites

Make sure the following components are installed and properly configured:

  • Visual Studio 2019
    Note: Visual Studio 2017 is also supported, but SSIS integration may behave differently.

  • SQL Server Integration Services (SSIS) Extension for VS 2019
    πŸ‘‰ Install SSIS for Visual Studio 2019

  • SQL Server (any supported edition)

  • SQL Server Management Studio (SSMS)

  • AdventureWorks 2019 Sample Database
    πŸ“¦ Download AdventureWorks2019.bak


🧭 Project Overview

The dataset simulates a sample retail system for a bicycle company. The objective is to build an end-to-end ETL pipeline to generate a variety of reports:

  • Sales reports by employee
  • Sales reports by region
  • Number of orders per employee
  • Number of orders per region
  • Sales by product subcategory
  • Sales by product and region

ETL Architecture Overview

ETL Architecture

πŸ› οΈ First-time ETL

Build the staging area, design the data warehouse, and create SSIS packages to perform initial extraction, transformation, and loading of data.

ETL Process - First Run

First ETL Overview

Steps:

  1. Import source data into SSMS
  2. Design a data warehouse schema to support reporting
  3. Create the data warehouse
    AVW_datawarehouse_createDatabase.sql
  4. Create the staging database
    AVW_staging_createDatabase.sql
  5. Extract data into the staging area
  6. Transform source data into dimensions and facts
  7. Load dimension tables (Level 1)
  8. Load dimension tables (Level 2)
  9. Load dimension tables (Level 3)
  10. Load SalesOrder fact table
  11. Load Product fact table
  12. Truncate all dimension and fact tables in the staging area to prepare for future ETL runs

πŸ” Next-time ETL

Incrementally extract only the new, updated, or deleted records from the source and process them.

πŸ’‘ Solution Strategy

  • New & Updated Records:
    Use Slowly Changing Dimension (SCD) and Lookup Transformation to compare incoming data with existing staging data.

  • Deleted Records:
    Mark records as deleted by using an IsDeleted column. Set it to 1 before loading, and back to 0 if matched with source data. Remaining 1 values indicate deletions.

Next-time ETL Architecture


πŸ”¨ Next-time ETL Tasks

  1. Set IsDeleted = 1 in staging for all records before extraction
  2. Extract new/updated/deleted records using:
    • SCD for small datasets
    • Lookup Transformation for large datasets
  3. Use Execute SQL Task to:
    • Set IsDeleted = 0 for matched IDs
    • Truncate Update_IsDelete_nameTable after update
  4. Delete records in staging with IsDeleted = 1
  5. Transform staging data into dimension and fact formats
  6. Update IsDeleted = 1 in DW for records not matched
  7. Load changes to DW using SCD and Lookup, but apply only soft deletes (do not delete rows)
    • Load Dimension Level 1
    • Load Dimension Level 2
    • Load Dimension Level 3
    • Load SalesOrder Facts
    • Load Product Facts
    • Update IsDeleted = 0 and truncate helper table
  8. Truncate all staging dimension/fact tables again to prepare for next cycle

πŸ“ Package Location

All SSIS packages are available in the /Package-SSIS directory.


πŸ“¬ Need Help?

If you have any questions or encounter any issues while implementing the ETL flow, feel free to reach out via email.

Thank you for your interest and happy building!