π Zepto E-commerce SQL Data Analyst Portfolio Project
This is a complete, real-world data analyst portfolio project based on an e-commerce inventory dataset scraped from Zepto β one of Indiaβs fastest-growing quick-commerce startups. This project simulates real analyst workflows, from raw data exploration to business-focused data analysis.
This project is perfect for:
- π Data Analyst aspirants who want to build a strong Portfolio Project for interviews and LinkedIn
- π Anyone learning SQL hands-on
- πΌ Preparing for interviews in retail, e-commerce, or product analytics
π₯ Watch this YouTube video to implement the full project from scratch:
π Link to Video: Watch on Youtube
π Project Overview
The goal is to simulate how actual data analysts in the e-commerce or retail industries work behind the scenes to use SQL to:
β Set up a messy, real-world e-commerce inventory database
β Perform Exploratory Data Analysis (EDA) to explore product categories, availability, and pricing inconsistencies
β Implement Data Cleaning to handle null values, remove invalid entries, and convert pricing from paise to rupees
β Write business-driven SQL queries to derive insights around pricing, inventory, stock availability, revenue and more
π Dataset Overview
The dataset was sourced from Kaggle and was originally scraped from Zeptoβs official product listings. It mimics what youβd typically encounter in a real-world e-commerce inventory system.
Each row represents a unique SKU (Stock Keeping Unit) for a product. Duplicate product names exist because the same product may appear multiple times in different package sizes, weights, discounts, or categories to improve visibility β exactly how real catalog data looks.
π§Ύ Columns:
-
sku_id: Unique identifier for each product entry (Synthetic Primary Key)
-
name: Product name as it appears on the app
-
category: Product category like Fruits, Snacks, Beverages, etc.
-
mrp: Maximum Retail Price (originally in paise, converted to βΉ)
-
discountPercent: Discount applied on MRP
-
discountedSellingPrice: Final price after discount (also converted to βΉ)
-
availableQuantity: Units available in inventory
-
weightInGms: Product weight in grams
-
outOfStock: Boolean flag indicating stock availability
-
quantity: Number of units per package (mixed with grams for loose produce)
π§ Project Workflow
Hereβs a step-by-step breakdown of what we do in this project:
1. Database & Table Creation
We start by creating a SQL table with appropriate data types:
CREATE TABLE zepto (
sku_id SERIAL PRIMARY KEY,
category VARCHAR(120),
name VARCHAR(150) NOT NULL,
mrp NUMERIC(8,2),
discountPercent NUMERIC(5,2),
availableQuantity INTEGER,
discountedSellingPrice NUMERIC(8,2),
weightInGms INTEGER,
outOfStock BOOLEAN,
quantity INTEGER
);
2. Data Import
-
Loaded CSV using pgAdmin's import feature.
-
If you're not able to use the import feature, write this code instead:
\copy zepto(category,name,mrp,discountPercent,availableQuantity, discountedSellingPrice,weightInGms,outOfStock,quantity) FROM 'data/zepto_v2.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"', ENCODING 'UTF8'); -
Faced encoding issues (UTF-8 error), which were fixed by saving the CSV file using CSV UTF-8 format.
3. π Data Exploration
-
Counted the total number of records in the dataset
-
Viewed a sample of the dataset to understand structure and content
-
Checked for null values across all columns
-
Identified distinct product categories available in the dataset
-
Compared in-stock vs out-of-stock product counts
-
Detected products present multiple times, representing different SKUs
4. π§Ή Data Cleaning
-
Identified and removed rows where MRP or discounted selling price was zero
-
Converted mrp and discountedSellingPrice from paise to rupees for consistency and readability
5. π Business Insights
-
Found top 10 best-value products based on discount percentage
-
Identified high-MRP products that are currently out of stock
-
Estimated potential revenue for each product category
-
Filtered expensive products (MRP > βΉ500) with minimal discount
-
Ranked top 5 categories offering highest average discounts
-
Calculated price per gram to identify value-for-money products
-
Grouped products based on weight into Low, Medium, and Bulk categories
-
Measured total inventory weight per product category
π οΈ How to Use This Project
-
Clone the repository
git clone https://github.com/amlanmohanty/zepto-SQL-data-analysis-project.git cd zepto-SQL-data-analysis-project -
Open zepto_SQL_data_analysis.sql
This file contains:
-
Table creation
-
Data exploration
-
Data cleaning
-
SQL Business analysis
-
-
Load the dataset into pgAdmin or any other PostgreSQL client
-
Create a database and run the SQL file
-
Import the dataset (convert to UTF-8 if necessary)
-
-
Follow along with the YouTube video for full walkthrough. π¨βπΌ
π License
MIT β feel free to fork, star, and use in your portfolio.
π¨βπ» About the Author
Hey, Iβm Amlan Mohanty β a Data Analyst & Content Creator. I break down complex data topics into simple, practical content that actually helps you land a job.
π Stay Connected & Join the Data Drool Community
If you enjoyed this project and want to keep learning and growing as a data analyst, letβs stay in touch! I regularly share content around SQL, data analytics, portfolio projects, job tips, and more.
π₯ YouTube: Amlan Mohanty
- Beginner-friendly tutorials, real-world projects, job and career advice
πΊ Instagram: data.drool
- Quick SQL tips, data memes, and behind-the-scenes content
πΌ LinkedIn: Amlan Mohanty
- Letβs connect professionally and grow your data career