Home
Softono
neru-scrapper

neru-scrapper

Open source Python
31
Stars
1
Forks
1
Issues
0
Watchers
11 months
Last Commit

About neru-scrapper

A comprehensive, production-ready scraper for Indonesian e-commerce platforms including Shopee, Tokopedia, and Lazada. Built with clean architecture, advanced analytics, and professional logging capabilities.

Platforms

Web Self-hosted

Languages

Python

Links

πŸ›’ Multi-Platform E-commerce Scraper

Professional Indonesian Market Analysis Tool

A comprehensive, production-ready scraper for Indonesian e-commerce platforms including Shopee, Tokopedia, and Lazada. Built with clean architecture, advanced analytics, and professional logging capabilities.

✨ Features

πŸš€ Core Functionality

  • Multi-Platform Support: Shopee, Tokopedia, Lazada
  • Advanced Product Analysis: Price trends, rating analysis, merchant insights
  • Real-time Data Collection: Concurrent scraping with rate limiting
  • Professional Logging: Comprehensive error tracking and performance monitoring
  • Export Capabilities: JSON, CSV, Excel formats
  • Interactive CLI: User-friendly command-line interface

πŸ“Š Analytics & Intelligence

  • Price Analysis: Min/max/average pricing with market insights
  • Rating Analysis: Product quality assessment across platforms
  • Merchant Analysis: Top sellers and market concentration
  • Category Analysis: Product distribution and trends
  • Platform Comparison: Cross-platform price and availability analysis

πŸ›‘οΈ Robust Architecture

  • Error Handling: Comprehensive retry mechanisms and graceful failures
  • Rate Limiting: Respectful scraping with configurable delays
  • User Agent Rotation: Anti-detection measures
  • Modular Design: Easy to extend and maintain
  • Configuration Management: Centralized settings and platform management

πŸš€ Quick Start

Prerequisites

  • Python 3.8 or higher
  • pip package manager
  • Virtual environment (recommended)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd neru-scrapper
  2. Create and activate virtual environment (recommended)

    # Create virtual environment
    python -m venv venv
    
    # Activate virtual environment
    # On macOS/Linux:
    source venv/bin/activate
    # On Windows:
    # venv\Scripts\activate
  3. Upgrade pip and install build tools

    pip install --upgrade pip setuptools wheel
  4. Install dependencies

    pip install -r requirements.txt

    If you encounter metadata generation errors, try:

    # Install packages individually
    pip install requests beautifulsoup4 selenium pandas lxml
    pip install fake-useragent python-dotenv openpyxl
    pip install plotly wordcloud matplotlib seaborn numpy
    pip install fastapi uvicorn pydantic pydantic-settings aiofiles
  5. Run the scraper

    python main.py

πŸ“– Usage

Interactive Mode

Launch the interactive CLI for guided product searches:

python main.py

The interactive mode provides:

  • Platform selection
  • Search term input
  • Real-time progress tracking
  • Automatic analysis and export

Command Line Arguments

python main.py --keyword "laptop gaming" --limit 50 --platforms shopee tokopedia

Available Arguments:

  • --keyword: Search term (required)
  • --limit: Maximum results per platform (default: 50)
  • --platforms: Specific platforms to search (default: all enabled)
  • --output: Output format (json, csv, excel)
  • --export-dir: Custom export directory

Example Usage

Search for baby care products:

python main.py --keyword "baby care" --limit 30

Compare laptop prices across platforms:

python main.py --keyword "laptop" --platforms shopee tokopedia lazada --limit 100

πŸ—οΈ Project Structure

neru-scrapper/
β”œβ”€β”€ main.py                    # Entry point and CLI interface
β”œβ”€β”€ multi_platform_scraper.py  # Core scraper orchestration
β”œβ”€β”€ advanced_analyzer.py       # Analysis and intelligence engine
β”œβ”€β”€ config.py                  # Configuration and constants
β”œβ”€β”€ logger.py                  # Professional logging system
β”œβ”€β”€ base_scraper.py           # Base scraper class
β”œβ”€β”€ shopee_scraper.py         # Shopee-specific implementation
β”œβ”€β”€ tokopedia_scraper.py      # Tokopedia-specific implementation
β”œβ”€β”€ lazada_scraper.py         # Lazada-specific implementation
β”œβ”€β”€ requirements.txt          # Python dependencies
β”œβ”€β”€ data/                     # Raw scraped data
β”œβ”€β”€ exports/                  # Processed exports
β”œβ”€β”€ logs/                     # Application logs
β”œβ”€β”€ reports/                  # Analysis reports
└── visualizations/           # Generated charts and graphs

βš™οΈ Configuration

Platform Configuration

Edit config.py to customize platform settings:

SUPPORTED_PLATFORMS = {
    'shopee': {
        'name': 'Shopee',
        'enabled': True,
        'region': 'id'
    },
    # ... other platforms
}

Scraping Parameters

Adjust default scraping behavior:

DEFAULT_CONFIG = {
    'max_results_per_platform': 50,
    'request_timeout': 30,
    'retry_attempts': 3,
    'delay_between_requests': 1.0,
    'concurrent_requests': 5
}

πŸ“Š Output Formats

JSON Export

{
  "search_metadata": {
    "keyword": "laptop gaming",
    "timestamp": "2025-07-04T09:03:23",
    "total_products": 150
  },
  "products": [
    {
      "name": "ASUS ROG Gaming Laptop",
      "price": 15000000,
      "rating": 4.8,
      "platform": "shopee",
      "merchant": "ASUS Official Store"
    }
  ],
  "analysis": {
    "price_analysis": {
      "min_price": 8000000,
      "max_price": 25000000,
      "avg_price": 15500000
    }
  }
}

Analysis Reports

  • Price Distribution: Statistical analysis of pricing trends
  • Rating Insights: Quality assessment across platforms
  • Market Share: Platform and merchant analysis
  • Competitive Intelligence: Cross-platform price comparison

πŸ”§ Advanced Features

Custom Scrapers

Extend the base scraper for new platforms:

from base_scraper import BaseScraper

class NewPlatformScraper(BaseScraper):
    def __init__(self, country='id'):
        super().__init__(country)
        self.base_url = 'https://newplatform.co.id'

    def search_products(self, keyword, limit=50):
        # Implementation
        pass

Analytics Integration

Access advanced analytics programmatically:

from advanced_analyzer import AdvancedAnalyzer

analyzer = AdvancedAnalyzer()
results = analyzer.analyze_products(product_data)
print(results['price_analysis'])

πŸ› Troubleshooting

Installation Issues

Metadata Generation Failed Error:

# Solution 1: Upgrade pip and build tools
pip install --upgrade pip setuptools wheel

# Solution 2: Install packages individually
pip install requests beautifulsoup4 selenium pandas lxml
pip install fake-useragent python-dotenv openpyxl
pip install plotly wordcloud matplotlib seaborn numpy

# Solution 3: Use conda instead of pip
conda install requests beautifulsoup4 selenium pandas lxml
conda install -c conda-forge fake-useragent python-dotenv openpyxl

SSL Certificate Errors:

pip install --upgrade certifi requests
# Or on macOS:
/Applications/Python\ 3.x/Install\ Certificates.command

Virtual Environment Issues:

# Recreate virtual environment
rm -rf venv
python -m venv venv
source venv/bin/activate  # macOS/Linux
pip install --upgrade pip setuptools wheel

Runtime Issues

Rate Limiting:

  • Increase delay between requests in config.py
  • Reduce concurrent request count
  • Use VPN if IP is temporarily blocked

Missing Dependencies:

pip install -r requirements.txt --upgrade

Permission Errors:

# Use --user flag if global installation fails
pip install --user -r requirements.txt

Logging

Check logs/scraper.log for detailed error information:

tail -f logs/scraper.log

Note: Log files are automatically excluded from git tracking to prevent large file issues. The logs/ directory and *.log files are in .gitignore.

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add comprehensive docstrings
  • Include unit tests for new features
  • Update documentation as needed

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸš€ Roadmap

Upcoming Features

  • [ ] Web Dashboard: Real-time monitoring and control panel
  • [ ] API Endpoints: RESTful API for programmatic access
  • [ ] Database Integration: PostgreSQL/MongoDB support
  • [ ] Machine Learning: Price prediction and trend analysis
  • [ ] Mobile App: React Native companion app
  • [ ] Cloud Deployment: Docker containers and Kubernetes support

Platform Expansion

  • [ ] Blibli support
  • [ ] Bukalapak integration
  • [ ] Amazon Indonesia
  • [ ] JD.ID support

πŸ’‘ Use Cases

Market Research

  • Competitive Analysis: Compare pricing strategies across platforms
  • Product Discovery: Identify trending products and categories
  • Market Entry: Assess market saturation and opportunities

Business Intelligence

  • Price Monitoring: Track competitor pricing in real-time
  • Inventory Planning: Analyze product availability and stock levels
  • Vendor Analysis: Evaluate merchant performance and ratings

Academic Research

  • E-commerce Studies: Analyze Indonesian digital market trends
  • Consumer Behavior: Study purchasing patterns and preferences
  • Economic Analysis: Monitor inflation and price changes

πŸ“ž Support

For questions, issues, or feature requests:


Made with ❀️ for the Indonesian e-commerce ecosystem

Last updated: July 4, 2025