Home
Softono
zepto-SQL-data-analysis-project

zepto-SQL-data-analysis-project

Open source MIT
45
Stars
17
Forks
1
Issues
0
Watchers
9 months
Last Commit

About zepto-SQL-data-analysis-project

Complete Data Analyst Portfolio Project with end-to-end SQL Data Analysis of Zepto E-commerce Inventory data using PostgreSQL.

Platforms

Web Self-hosted

Links

πŸ›’ Zepto E-commerce SQL Data Analyst Portfolio Project

This is a complete, real-world data analyst portfolio project based on an e-commerce inventory dataset scraped from Zepto β€” one of India’s fastest-growing quick-commerce startups. This project simulates real analyst workflows, from raw data exploration to business-focused data analysis.

This project is perfect for:

  • πŸ“Š Data Analyst aspirants who want to build a strong Portfolio Project for interviews and LinkedIn
  • πŸ“š Anyone learning SQL hands-on
  • πŸ’Ό Preparing for interviews in retail, e-commerce, or product analytics

πŸŽ₯ Watch this YouTube video to implement the full project from scratch:

SQL Data Analyst Portfolio Project using Zepto Inventory Dataset πŸ”— Link to Video: Watch on Youtube

πŸ“Œ Project Overview

The goal is to simulate how actual data analysts in the e-commerce or retail industries work behind the scenes to use SQL to:

βœ… Set up a messy, real-world e-commerce inventory database

βœ… Perform Exploratory Data Analysis (EDA) to explore product categories, availability, and pricing inconsistencies

βœ… Implement Data Cleaning to handle null values, remove invalid entries, and convert pricing from paise to rupees

βœ… Write business-driven SQL queries to derive insights around pricing, inventory, stock availability, revenue and more

πŸ“ Dataset Overview

The dataset was sourced from Kaggle and was originally scraped from Zepto’s official product listings. It mimics what you’d typically encounter in a real-world e-commerce inventory system.

Each row represents a unique SKU (Stock Keeping Unit) for a product. Duplicate product names exist because the same product may appear multiple times in different package sizes, weights, discounts, or categories to improve visibility – exactly how real catalog data looks.

🧾 Columns:

  • sku_id: Unique identifier for each product entry (Synthetic Primary Key)

  • name: Product name as it appears on the app

  • category: Product category like Fruits, Snacks, Beverages, etc.

  • mrp: Maximum Retail Price (originally in paise, converted to β‚Ή)

  • discountPercent: Discount applied on MRP

  • discountedSellingPrice: Final price after discount (also converted to β‚Ή)

  • availableQuantity: Units available in inventory

  • weightInGms: Product weight in grams

  • outOfStock: Boolean flag indicating stock availability

  • quantity: Number of units per package (mixed with grams for loose produce)

πŸ”§ Project Workflow

Here’s a step-by-step breakdown of what we do in this project:

1. Database & Table Creation

We start by creating a SQL table with appropriate data types:

CREATE TABLE zepto (
  sku_id SERIAL PRIMARY KEY,
  category VARCHAR(120),
  name VARCHAR(150) NOT NULL,
  mrp NUMERIC(8,2),
  discountPercent NUMERIC(5,2),
  availableQuantity INTEGER,
  discountedSellingPrice NUMERIC(8,2),
  weightInGms INTEGER,
  outOfStock BOOLEAN,
  quantity INTEGER
);

2. Data Import

  • Loaded CSV using pgAdmin's import feature.

  • If you're not able to use the import feature, write this code instead:

    \copy zepto(category,name,mrp,discountPercent,availableQuantity,
             discountedSellingPrice,weightInGms,outOfStock,quantity)
    FROM 'data/zepto_v2.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"', ENCODING 'UTF8');
  • Faced encoding issues (UTF-8 error), which were fixed by saving the CSV file using CSV UTF-8 format.

3. πŸ” Data Exploration

  • Counted the total number of records in the dataset

  • Viewed a sample of the dataset to understand structure and content

  • Checked for null values across all columns

  • Identified distinct product categories available in the dataset

  • Compared in-stock vs out-of-stock product counts

  • Detected products present multiple times, representing different SKUs

4. 🧹 Data Cleaning

  • Identified and removed rows where MRP or discounted selling price was zero

  • Converted mrp and discountedSellingPrice from paise to rupees for consistency and readability

5. πŸ“Š Business Insights

  • Found top 10 best-value products based on discount percentage

  • Identified high-MRP products that are currently out of stock

  • Estimated potential revenue for each product category

  • Filtered expensive products (MRP > β‚Ή500) with minimal discount

  • Ranked top 5 categories offering highest average discounts

  • Calculated price per gram to identify value-for-money products

  • Grouped products based on weight into Low, Medium, and Bulk categories

  • Measured total inventory weight per product category

πŸ› οΈ How to Use This Project

  1. Clone the repository

    git clone https://github.com/amlanmohanty/zepto-SQL-data-analysis-project.git
    cd zepto-SQL-data-analysis-project
  2. Open zepto_SQL_data_analysis.sql

    This file contains:

    • Table creation

    • Data exploration

    • Data cleaning

    • SQL Business analysis

  3. Load the dataset into pgAdmin or any other PostgreSQL client

    • Create a database and run the SQL file

    • Import the dataset (convert to UTF-8 if necessary)

  4. Follow along with the YouTube video for full walkthrough. πŸ‘¨β€πŸ’Ό

πŸ“œ License

MIT β€” feel free to fork, star, and use in your portfolio.

πŸ‘¨β€πŸ’» About the Author

Hey, I’m Amlan Mohanty β€” a Data Analyst & Content Creator. I break down complex data topics into simple, practical content that actually helps you land a job.

πŸš€ Stay Connected & Join the Data Drool Community

If you enjoyed this project and want to keep learning and growing as a data analyst, let’s stay in touch! I regularly share content around SQL, data analytics, portfolio projects, job tips, and more.

πŸŽ₯ YouTube: Amlan Mohanty

  • Beginner-friendly tutorials, real-world projects, job and career advice

πŸ“Ί Instagram: data.drool

  • Quick SQL tips, data memes, and behind-the-scenes content

πŸ’Ό LinkedIn: Amlan Mohanty

  • Let’s connect professionally and grow your data career

πŸ’‘ Thanks for checking out the project! Your support means a lot β€” feel free to star ⭐ this repo or share it with someone learning SQL.πŸš€