π Multi-Platform E-commerce Scraper
Professional Indonesian Market Analysis Tool
A comprehensive, production-ready scraper for Indonesian e-commerce platforms including Shopee, Tokopedia, and Lazada. Built with clean architecture, advanced analytics, and professional logging capabilities.
β¨ Features
π Core Functionality
- Multi-Platform Support: Shopee, Tokopedia, Lazada
- Advanced Product Analysis: Price trends, rating analysis, merchant insights
- Real-time Data Collection: Concurrent scraping with rate limiting
- Professional Logging: Comprehensive error tracking and performance monitoring
- Export Capabilities: JSON, CSV, Excel formats
- Interactive CLI: User-friendly command-line interface
π Analytics & Intelligence
- Price Analysis: Min/max/average pricing with market insights
- Rating Analysis: Product quality assessment across platforms
- Merchant Analysis: Top sellers and market concentration
- Category Analysis: Product distribution and trends
- Platform Comparison: Cross-platform price and availability analysis
π‘οΈ Robust Architecture
- Error Handling: Comprehensive retry mechanisms and graceful failures
- Rate Limiting: Respectful scraping with configurable delays
- User Agent Rotation: Anti-detection measures
- Modular Design: Easy to extend and maintain
- Configuration Management: Centralized settings and platform management
π Quick Start
Prerequisites
- Python 3.8 or higher
- pip package manager
- Virtual environment (recommended)
Installation
-
Clone the repository
git clone <repository-url> cd neru-scrapper -
Create and activate virtual environment (recommended)
# Create virtual environment python -m venv venv # Activate virtual environment # On macOS/Linux: source venv/bin/activate # On Windows: # venv\Scripts\activate -
Upgrade pip and install build tools
pip install --upgrade pip setuptools wheel -
Install dependencies
pip install -r requirements.txtIf you encounter metadata generation errors, try:
# Install packages individually pip install requests beautifulsoup4 selenium pandas lxml pip install fake-useragent python-dotenv openpyxl pip install plotly wordcloud matplotlib seaborn numpy pip install fastapi uvicorn pydantic pydantic-settings aiofiles -
Run the scraper
python main.py
π Usage
Interactive Mode
Launch the interactive CLI for guided product searches:
python main.py
The interactive mode provides:
- Platform selection
- Search term input
- Real-time progress tracking
- Automatic analysis and export
Command Line Arguments
python main.py --keyword "laptop gaming" --limit 50 --platforms shopee tokopedia
Available Arguments:
--keyword: Search term (required)--limit: Maximum results per platform (default: 50)--platforms: Specific platforms to search (default: all enabled)--output: Output format (json, csv, excel)--export-dir: Custom export directory
Example Usage
Search for baby care products:
python main.py --keyword "baby care" --limit 30
Compare laptop prices across platforms:
python main.py --keyword "laptop" --platforms shopee tokopedia lazada --limit 100
ποΈ Project Structure
neru-scrapper/
βββ main.py # Entry point and CLI interface
βββ multi_platform_scraper.py # Core scraper orchestration
βββ advanced_analyzer.py # Analysis and intelligence engine
βββ config.py # Configuration and constants
βββ logger.py # Professional logging system
βββ base_scraper.py # Base scraper class
βββ shopee_scraper.py # Shopee-specific implementation
βββ tokopedia_scraper.py # Tokopedia-specific implementation
βββ lazada_scraper.py # Lazada-specific implementation
βββ requirements.txt # Python dependencies
βββ data/ # Raw scraped data
βββ exports/ # Processed exports
βββ logs/ # Application logs
βββ reports/ # Analysis reports
βββ visualizations/ # Generated charts and graphs
βοΈ Configuration
Platform Configuration
Edit config.py to customize platform settings:
SUPPORTED_PLATFORMS = {
'shopee': {
'name': 'Shopee',
'enabled': True,
'region': 'id'
},
# ... other platforms
}
Scraping Parameters
Adjust default scraping behavior:
DEFAULT_CONFIG = {
'max_results_per_platform': 50,
'request_timeout': 30,
'retry_attempts': 3,
'delay_between_requests': 1.0,
'concurrent_requests': 5
}
π Output Formats
JSON Export
{
"search_metadata": {
"keyword": "laptop gaming",
"timestamp": "2025-07-04T09:03:23",
"total_products": 150
},
"products": [
{
"name": "ASUS ROG Gaming Laptop",
"price": 15000000,
"rating": 4.8,
"platform": "shopee",
"merchant": "ASUS Official Store"
}
],
"analysis": {
"price_analysis": {
"min_price": 8000000,
"max_price": 25000000,
"avg_price": 15500000
}
}
}
Analysis Reports
- Price Distribution: Statistical analysis of pricing trends
- Rating Insights: Quality assessment across platforms
- Market Share: Platform and merchant analysis
- Competitive Intelligence: Cross-platform price comparison
π§ Advanced Features
Custom Scrapers
Extend the base scraper for new platforms:
from base_scraper import BaseScraper
class NewPlatformScraper(BaseScraper):
def __init__(self, country='id'):
super().__init__(country)
self.base_url = 'https://newplatform.co.id'
def search_products(self, keyword, limit=50):
# Implementation
pass
Analytics Integration
Access advanced analytics programmatically:
from advanced_analyzer import AdvancedAnalyzer
analyzer = AdvancedAnalyzer()
results = analyzer.analyze_products(product_data)
print(results['price_analysis'])
π Troubleshooting
Installation Issues
Metadata Generation Failed Error:
# Solution 1: Upgrade pip and build tools
pip install --upgrade pip setuptools wheel
# Solution 2: Install packages individually
pip install requests beautifulsoup4 selenium pandas lxml
pip install fake-useragent python-dotenv openpyxl
pip install plotly wordcloud matplotlib seaborn numpy
# Solution 3: Use conda instead of pip
conda install requests beautifulsoup4 selenium pandas lxml
conda install -c conda-forge fake-useragent python-dotenv openpyxl
SSL Certificate Errors:
pip install --upgrade certifi requests
# Or on macOS:
/Applications/Python\ 3.x/Install\ Certificates.command
Virtual Environment Issues:
# Recreate virtual environment
rm -rf venv
python -m venv venv
source venv/bin/activate # macOS/Linux
pip install --upgrade pip setuptools wheel
Runtime Issues
Rate Limiting:
- Increase delay between requests in
config.py - Reduce concurrent request count
- Use VPN if IP is temporarily blocked
Missing Dependencies:
pip install -r requirements.txt --upgrade
Permission Errors:
# Use --user flag if global installation fails
pip install --user -r requirements.txt
Logging
Check logs/scraper.log for detailed error information:
tail -f logs/scraper.log
Note: Log files are automatically excluded from git tracking to prevent large file issues. The logs/ directory and *.log files are in .gitignore.
π€ Contributing
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
Development Guidelines
- Follow PEP 8 style guidelines
- Add comprehensive docstrings
- Include unit tests for new features
- Update documentation as needed
π License
This project is licensed under the MIT License - see the LICENSE file for details.
π Roadmap
Upcoming Features
- [ ] Web Dashboard: Real-time monitoring and control panel
- [ ] API Endpoints: RESTful API for programmatic access
- [ ] Database Integration: PostgreSQL/MongoDB support
- [ ] Machine Learning: Price prediction and trend analysis
- [ ] Mobile App: React Native companion app
- [ ] Cloud Deployment: Docker containers and Kubernetes support
Platform Expansion
- [ ] Blibli support
- [ ] Bukalapak integration
- [ ] Amazon Indonesia
- [ ] JD.ID support
π‘ Use Cases
Market Research
- Competitive Analysis: Compare pricing strategies across platforms
- Product Discovery: Identify trending products and categories
- Market Entry: Assess market saturation and opportunities
Business Intelligence
- Price Monitoring: Track competitor pricing in real-time
- Inventory Planning: Analyze product availability and stock levels
- Vendor Analysis: Evaluate merchant performance and ratings
Academic Research
- E-commerce Studies: Analyze Indonesian digital market trends
- Consumer Behavior: Study purchasing patterns and preferences
- Economic Analysis: Monitor inflation and price changes
π Support
For questions, issues, or feature requests:
- GitHub Issues: Create an issue
- Email: [email protected]
- Documentation: Wiki
Made with β€οΈ for the Indonesian e-commerce ecosystem
Last updated: July 4, 2025