Data Science Projects Portfolio
This portfolio contains the projects listed below.
Each project has a folder, either in this repository or in a separate repository (for easier execution), containing its data, code, and key takeaways. Some projects cover multiple technical aspects; for example, a dashboarding project may also include the data engineering process.
Table of Contents
Part A Data Engineering
Job Postings on LinkedIn
Tools:
- Python with Jupyter Notebook
- Data Transformation: dbt
- Data Loading: Airflow (Astro CLI)
- Data Visualisation: Power BI
- Data Quality Testing: Soda
- Data Lake: Google Cloud Storage
- Data Warehouse: BigQuery
- Data Orchestration: Airflow
Objectives:
- extract raw data from Kaggle and process it into a ready-to-use dataset
- reduce file size and identify the schema by using Parquet files
- achieve automation and monitoring with Airflow and dbt
- visualize data for insights with Power BI
PM2.5-Monitoring
Tools:
- Data Extraction, Transformation, Validation: API, Python
- Data Orchestration: Airflow
- Database: DuckDB
- Data Reporting: Streamlit
- Containerization: Docker and Docker Compose
Objectives:
- Ingest PM2.5 data into DuckDB daily
- Trigger the transformation from data ingestion in Airflow
- Keep a Streamlit container running to monitor the PM2.5 data in real time
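The validation step that runs before ingestion could look like the following minimal sketch. The record layout (the `station`, `timestamp`, and `pm25` keys) and the helpers `validate_reading` and `validate_batch` are hypothetical, since the API's response format is not described here. The sketch uses only the standard library and leaves out the DuckDB load itself.

```python
import math
from datetime import datetime
from typing import Optional


def validate_reading(record: dict) -> Optional[dict]:
    """Return a cleaned reading, or None if the record is unusable.

    The keys "station", "timestamp" and "pm25" are assumed for illustration.
    """
    try:
        station = str(record["station"]).strip()
        timestamp = datetime.fromisoformat(str(record["timestamp"]))
        pm25 = float(record["pm25"])
    except (KeyError, TypeError, ValueError):
        # Missing key, wrong type, or unparseable value.
        return None
    # PM2.5 is a concentration, so it must be a finite, non-negative number.
    if not station or not math.isfinite(pm25) or pm25 < 0:
        return None
    return {"station": station, "timestamp": timestamp, "pm25": pm25}


def validate_batch(records: list) -> tuple:
    """Split a batch into (valid readings, number of rejected records)."""
    valid = [r for r in (validate_reading(rec) for rec in records) if r is not None]
    return valid, len(records) - len(valid)
```

Returning the rejected count alongside the valid rows gives an orchestrated task something to log or alert on, rather than silently dropping bad records.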
Data Platform Design for Healthcare Research (Database)
Tool: MySQL
Objectives:
- identify how diseases begin and progress
- integrate genetics and healthcare data
- provide research-ready, well-curated, and well-documented data
Normalisation (SQL Server)
Tool: SQL Server
Objectives:
- Split a table into a fact table and dimension tables
- Set data types, primary keys, foreign keys, and referential integrity
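A minimal sketch of this kind of split, using Python's built-in `sqlite3` module as a stand-in for SQL Server. The `sales_flat` table, its columns, the sample rows, and the `insert_fact` helper are all invented for illustration and are not the project's schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Denormalised source table (hypothetical schema and rows).
conn.execute(
    "CREATE TABLE sales_flat (order_id INTEGER, customer_name TEXT, city TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_flat VALUES (?, ?, ?, ?)",
    [(1, "Ann", "Leeds", 20.0), (2, "Bob", "York", 35.5), (3, "Ann", "Leeds", 12.0)],
)

# Dimension table: one row per distinct customer, with a surrogate primary key.
conn.execute(
    """CREATE TABLE dim_customer (
           customer_id INTEGER PRIMARY KEY,
           customer_name TEXT NOT NULL,
           city TEXT NOT NULL,
           UNIQUE (customer_name, city))"""
)
conn.execute(
    "INSERT INTO dim_customer (customer_name, city) "
    "SELECT DISTINCT customer_name, city FROM sales_flat"
)

# Fact table: the measure plus a foreign key into the dimension.
conn.execute(
    """CREATE TABLE fact_sales (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES dim_customer (customer_id),
           amount REAL NOT NULL)"""
)
conn.execute(
    """INSERT INTO fact_sales (order_id, customer_id, amount)
       SELECT s.order_id, d.customer_id, s.amount
       FROM sales_flat AS s
       JOIN dim_customer AS d
         ON d.customer_name = s.customer_name AND d.city = s.city"""
)
conn.commit()


def insert_fact(order_id: int, customer_id: int, amount: float) -> bool:
    """Try to insert a fact row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO fact_sales VALUES (?, ?, ?)", (order_id, customer_id, amount)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With referential integrity in place, a fact row that points at a non-existent `customer_id` is rejected instead of being stored as an orphan.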
Part B Exploratory Data Analysis and Data Modelling
ESG analysis for Pfizer (Exploratory Data Analysis and Linear Regression)
Tool: Python
Objectives:
- identify Pfizer's position in the pharmaceutical industry
- visualise the development of Pfizer from 2016 to 2018
- fit a linear regression between ESG score and total assets
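For reference, a one-predictor linear regression can be sketched with ordinary least squares in plain Python. The function name `fit_simple_ols` is hypothetical, and the objectives above do not say which of the two variables is the predictor, so the sketch treats `x` and `y` generically and includes no company data.

```python
from statistics import mean


def fit_simple_ols(x: list, y: list) -> tuple:
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 compares residual variation with total variation in y.
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared
```

The slope gives the expected change in `y` per unit change in `x`, and R² indicates how much of the variation in `y` the line accounts for.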
BT Customer Churn Prediction (Python and Power BI)
Tool: Python and Power BI
Objectives:
- build a logistic regression model
- identify which features influence customer churn
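As an illustration of the technique, a logistic regression can be fitted by batch gradient descent in plain Python, as sketched below. This is not the project's implementation; the function names, learning rate, and epoch count are assumptions chosen for a small example. The sign of each fitted weight shows whether that feature pushes the predicted churn probability up or down.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: list, y: list, lr: float = 0.1, epochs: int = 2000) -> tuple:
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            # Gradient of the log loss w.r.t. the linear score is (p - target).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: list, b: float, row: list) -> float:
    """Predicted probability of the positive class (here, churn)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)
```

Comparing weights across features is only meaningful when the features are on comparable scales, so standardising the inputs first is a common choice.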
Revenue increase strategy analysis for Google merchandise store (Business Intelligence)
Tool: Google Analytics and Looker
Objectives:
- map the persona of customers
- identify the performance of products
- identify the pattern of activity
- show the buyer's journey with funnel diagrams
Part C Data Visualization and Dashboarding
Lloyds Bank Customer Profiling (Business Intelligence)
Tool: Power BI
Objectives:
- map the persona of customers
- analyse the features of customers based on the loan status variable
Education-Focused Analysis (Power BI and Python)
Tool: Python and Power BI
Objectives:
- Prepare a cleansed dataset for analysis
- Build a logical story to explain why the mix and weighting of assessment types changed the final result
A self-service platform for GDP, Life Satisfaction and Education Level
Tool: Tableau
Objectives:
- Provide users with a platform to retrieve information about GDP, Life Satisfaction, and Education Level for countries in different years
- Give a regional overview of this information
- Check the relationship between education level and GDP per capita