Universal-News-Scraper

Open source · MIT · Python · 13 stars · 1 fork · 0 issues · 0 watchers · Last commit: 4 months ago

About Universal-News-Scraper

A robust CLI news scraper and aggregator. Features topic auto-discovery (via Bing RSS), anti-blocking logic, keyword/date filtering, and JSON/CSV export. Built with Python & Rich.

Platforms

Web Self-hosted

Languages

Python

๐ŸŒ Universal News Scraper v4.1

A powerful, terminal-based news aggregator that supports RSS feeds, Web Scraping, and Topic Auto-Discovery via Bing News RSS.
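Topic auto-discovery can be approximated with a short sketch. This is not the project's code. It assumes that Bing News returns RSS when the search URL carries `format=rss`. The helper names (`build_topic_feed_url`, `parse_rss_items`, `discover_topic`) are hypothetical, and the sketch parses XML with the standard library rather than the `feedparser` dependency the project lists.

```python
from __future__ import annotations

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint: Bing News search returning RSS (not confirmed by the project).
BING_NEWS_SEARCH = "https://www.bing.com/news/search"


def build_topic_feed_url(topic: str) -> str:
    """Build a Bing News RSS URL for an arbitrary topic, e.g. 'Bitcoin'."""
    query = urllib.parse.urlencode({"q": topic, "format": "rss"})
    return f"{BING_NEWS_SEARCH}?{query}"


def parse_rss_items(xml_data: str | bytes) -> list[dict]:
    """Extract title, link, pubDate and description from each RSS 2.0 <item>."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
            "date": (item.findtext("pubDate") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


def discover_topic(topic: str, timeout: float = 10.0) -> list[dict]:
    """Fetch the topic feed over the network and return its parsed items."""
    request = urllib.request.Request(
        build_topic_feed_url(topic),
        headers={"User-Agent": "Mozilla/5.0"},  # placeholder User-Agent
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_rss_items(response.read())
```

Calling `discover_topic("Bitcoin")` would return a list of dicts. Their `url` fields may still be Bing redirect links and the `date` fields raw RSS dates, so both need post-processing.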

✨ Key Features

| Feature | Description |
|---|---|
| 🕵️ Auto-Discovery | Find news on ANY topic (Crypto, Sports, Politics, AI) without knowing a source URL |
| 📂 Preset Categories | 6 built-in categories with 30+ international news sources |
| 🛡️ Anti-Blocking | Random User-Agent rotation to reduce the chance of requests being blocked |
| 💾 Multi-Format Export | Save results as CSV, JSON, HTML, or all three formats at once |
| 🎨 HTML Reports | Beautiful dark-themed HTML reports with article cards |
| 🔇 Noise Filter | Automatically filters out generic Bing category entries |
| 🔗 Real URL Extraction | Extracts actual article URLs from Bing redirects |
| 📅 Date Filtering | Keeps only articles published on or after a chosen date |
| 🔑 Keyword Filtering | Filters articles by one or more keywords |
| 🔄 Settings Memory | Remembers your last configuration for quick re-runs |
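The Anti-Blocking feature comes down to sending a different browser-like `User-Agent` header on each request. The project lists `fake-useragent` for this. To stay dependency-free, this sketch picks from a small hard-coded pool instead. The strings and the helper names `random_headers` and `polite_get` are illustrative assumptions. The sketch also adds a randomized pause before each request, which the README does not describe.

```python
import random
import time
import urllib.request

# Illustrative User-Agent pool (assumed); the project itself lists fake-useragent.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.2 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def random_headers(rng=random) -> dict:
    """Build request headers with a User-Agent picked at random.

    `rng` may be the random module or a seeded random.Random instance.
    """
    return {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }


def polite_get(url: str, min_delay: float = 1.0, max_delay: float = 3.0,
               timeout: float = 10.0) -> bytes:
    """Wait a random interval, then GET `url` with a freshly rotated User-Agent."""
    time.sleep(random.uniform(min_delay, max_delay))
    request = urllib.request.Request(url, headers=random_headers())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Rotating headers only changes how requests look. The delay is what actually keeps the request rate low.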

🆕 What's New in v4.1

  • ๐ŸŒ HTML Export - Beautiful dark-themed HTML reports
  • ๐Ÿ”‡ Enhanced Noise Filter - Filters generic Bing entries (Top stories, Entertainment, etc.)
  • ๐Ÿ”— Real URL Extraction - Extracts actual article URLs from Bing redirects
  • ๐Ÿ“ฐ Real Source Detection - Shows the actual news source instead of "Bing"
  • ๐Ÿ“ค 4 Export Options - CSV, JSON, HTML, or All formats
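The noise filter, real URL extraction, and real source detection could fit together roughly as below. This is a sketch under assumptions, not the project's implementation:

- It assumes Bing wraps article links as `apiclick.aspx` redirects that carry the real address in a `url` query parameter.
- The `NOISE_TITLES` set is a hypothetical sample.
- The source name is derived from the article's domain. This yields the `Techcrunch` style seen in the sample output, though the real tool may read the source from the feed instead.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlparse

# Hypothetical sample of generic Bing category titles to discard.
NOISE_TITLES = {"top stories", "entertainment", "sports", "technology", "business", "world"}


def extract_real_url(link: str) -> str:
    """Return the article URL hidden in a Bing redirect, or `link` unchanged.

    Assumes links shaped like .../news/apiclick.aspx?...&url=<percent-encoded target>.
    """
    parsed = urlparse(link)
    host = parsed.netloc.lower()
    if host == "bing.com" or host.endswith(".bing.com"):
        target = parse_qs(parsed.query).get("url")
        if target:
            return target[0]  # parse_qs has already percent-decoded the value
    return link


def source_from_url(url: str) -> str:
    """Derive a display name from the domain, e.g. www.techcrunch.com -> 'Techcrunch'."""
    host = urlparse(url).netloc.lower().split(":")[0]
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0].capitalize() if host else "Unknown"


def is_noise(title: str) -> bool:
    """True when the title is a generic category entry rather than an article."""
    return title.strip().lower() in NOISE_TITLES


def clean_items(items: list[dict]) -> list[dict]:
    """Drop noise entries, resolve redirects, and attach the real source name."""
    cleaned = []
    for item in items:
        if is_noise(item.get("title", "")):
            continue
        url = extract_real_url(item.get("url", ""))
        cleaned.append({**item, "url": url, "source": source_from_url(url)})
    return cleaned
```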

📂 Preset Categories

| Category | Sources |
|---|---|
| 📰 International News | BBC, CNN, Reuters, Al Jazeera, The Guardian, NPR |
| ⚽ Sports | ESPN, BBC Sport, Sky Sports, Bleacher Report |
| 💻 Tech & Science | TechCrunch, The Verge, Wired, Ars Technica, Space.com |
| 🔒 Cybersecurity | The Hacker News, BleepingComputer, Krebs, Dark Reading |
| 💰 Business & Finance | Bloomberg, CNBC, Financial Times, CoinDesk, CoinTelegraph |
| 🎬 Entertainment | Variety, Hollywood Reporter, IGN, Kotaku |

🚀 Quick Start

1. Clone the Repository

git clone https://github.com/Ilias1988/Universal-News-Scraper.git
cd Universal-News-Scraper

2. Create Virtual Environment (Recommended)

python -m venv venv

# Windows
venv\Scripts\activate

# Linux/Mac
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Run the Scraper

python scraper.py

📖 Usage Guide

Main Menu

╭─────────────────────────────────────────╮
│  🌐 UNIVERSAL NEWS SCRAPER v4.1         │
│  Powered by Python & Bing RSS           │
╰─────────────────────────────────────────╯

┌──────────────── Main Menu ─────────────────┐
│ [1] 🔄 Use previous settings               │
│ [2] 📝 Enter new settings manually         │
│ [3] 🕵️ Auto-Discover & Scrape by Topic     │  ← Recommended!
│ [4] 📋 Choose from preset sources          │
│ [5] ❌ Exit                                │
└────────────────────────────────────────────┘
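Menu option [1] depends on the Settings Memory feature. A minimal sketch of saving and reloading the last configuration follows. It uses the `.scraper_config.json` file named in the project structure. The helper names and the keys stored in the file are assumptions, not the project's actual schema.

```python
from __future__ import annotations

import json
from pathlib import Path

# File name taken from the project structure; the keys stored in it are assumptions.
CONFIG_PATH = Path(".scraper_config.json")


def save_settings(settings: dict, path: Path = CONFIG_PATH) -> None:
    """Persist the last-used settings as pretty-printed UTF-8 JSON."""
    path.write_text(json.dumps(settings, indent=2, ensure_ascii=False), encoding="utf-8")


def load_settings(path: Path = CONFIG_PATH) -> dict | None:
    """Return previously saved settings, or None if the file is missing or corrupt."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

Option [1] would call `load_settings()` and fall back to manual entry when it returns `None`. After each run the tool would call something like `save_settings({"topic": "Bitcoin", "keywords": [], "export_format": "all"})`, where these keys are hypothetical.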

Export Format Options

📤 Export Format:
  [1] CSV only
  [2] JSON only
  [3] HTML only      ← Beautiful dark-themed report!
  [4] All formats    ← CSV + JSON + HTML

📤 Output Formats

CSV Output (results.csv)

title,url,date,description,source,matched_keywords
"AI Revolution in 2026...",https://techcrunch.com/...,2026-01-20,"Description...",Techcrunch,"AI, technology"

JSON Output (results.json)

[
  {
    "title": "AI Revolution in 2026...",
    "url": "https://techcrunch.com/...",
    "date": "2026-01-20",
    "description": "Description...",
    "source": "Techcrunch",
    "matched_keywords": "AI, technology"
  }
]
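Both sample files above can be produced with the standard library alone. A minimal sketch follows, assuming each article is a dict keyed by the column names shown. `write_csv`, `write_json`, and `FIELDS` are hypothetical names. Because `csv.DictWriter` quotes only fields that need it, the CSV quoting may not match the sample byte for byte.

```python
from __future__ import annotations

import csv
import json

# Column order taken from the sample results.csv header.
FIELDS = ["title", "url", "date", "description", "source", "matched_keywords"]


def write_csv(articles: list[dict], path: str) -> None:
    """Write a header row plus one row per article; missing keys become empty cells."""
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(articles)


def write_json(articles: list[dict], path: str) -> None:
    """Write the articles as an indented JSON array, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(articles, fh, indent=2, ensure_ascii=False)
```

As in the samples, `matched_keywords` stays a single comma-separated string in both formats rather than becoming a JSON list.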

HTML Output (results.html)

Beautiful dark-themed report with:

  • 📊 Stats header showing article count
  • 📰 Article cards with hover effects
  • 🏷️ Keyword badges
  • 🔗 Clickable links to original articles
  • 📱 Responsive design
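A report with these elements can be generated as one self-contained HTML string. This is a sketch only: the function names, colors, and layout are assumptions, not the project's template. It escapes all feed text with `html.escape` and only links to `http(s)` URLs, so a malicious feed entry cannot inject markup or `javascript:` links.

```python
from __future__ import annotations

from html import escape


def _safe_href(url: str) -> str:
    """Allow only http(s) links so a feed cannot inject javascript: URLs."""
    return url if url.startswith(("http://", "https://")) else "#"


def render_html(articles: list[dict], title: str = "News Report") -> str:
    """Build a self-contained dark-themed page with one card per article."""
    cards = []
    for a in articles:
        # One badge per comma-separated matched keyword.
        badges = "".join(
            f'<span class="badge">{escape(k.strip())}</span>'
            for k in a.get("matched_keywords", "").split(",")
            if k.strip()
        )
        cards.append(
            '<article class="card">'
            f'<h2><a href="{escape(_safe_href(a.get("url", "")))}" target="_blank" rel="noopener">'
            f'{escape(a.get("title", ""))}</a></h2>'
            f'<p class="meta">{escape(a.get("source", ""))} &middot; {escape(a.get("date", ""))}</p>'
            f'<p>{escape(a.get("description", ""))}</p>{badges}'
            "</article>"
        )
    body = "".join(cards)
    # Doubled braces are literal CSS braces inside the f-string.
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>{escape(title)}</title>
<style>
body {{ background: #0f1115; color: #e6e6e6; font-family: system-ui, sans-serif; margin: 0; padding: 1rem; }}
.grid {{ display: grid; gap: 1rem; grid-template-columns: repeat(auto-fill, minmax(280px, 1fr)); }}
.card {{ background: #1a1d24; border-radius: 8px; padding: 1rem; transition: transform .15s; }}
.card:hover {{ transform: translateY(-3px); }}
.card a {{ color: #7ab7ff; }}
.meta {{ color: #999; font-size: .85rem; }}
.badge {{ background: #2d6cdf; border-radius: 4px; padding: 0 .4rem; margin-right: .3rem; font-size: .8rem; }}
</style>
</head>
<body>
<h1>{escape(title)}</h1>
<p>Articles: {len(articles)}</p>
<div class="grid">{body}</div>
</body>
</html>"""
```

The CSS grid with `auto-fill` handles the responsive layout without media queries.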

🛠️ Requirements

requests>=2.31.0
beautifulsoup4>=4.12.0
feedparser>=6.0.0
fake-useragent>=1.4.0
htmldate>=1.6.0
rich>=13.7.0
lxml>=4.9.0

📁 Project Structure

Universal-News-Scraper/
├── scraper.py           # Main application
├── sources.json         # Preset RSS sources (editable)
├── requirements.txt     # Python dependencies
├── .scraper_config.json # Auto-saved settings (ignored by git)
├── .gitignore           # Git ignore file
├── LICENSE              # MIT License
└── README.md            # This file

📌 Examples

Example 1: Find Bitcoin News

Select option: 3
Enter topic: Bitcoin
Keywords: (empty for all)
Export format: 4 (All)
→ Saves bitcoin_news.csv, bitcoin_news.json, bitcoin_news.html
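Example 1's output names suggest that the topic becomes a file-name prefix. One way to reproduce that naming is sketched below. The slug rule, the `results` fallback for runs without a topic, and the names `output_paths` and `EXPORT_CHOICES` are assumptions inferred from the examples, not taken from `scraper.py`.

```python
from __future__ import annotations

import re

# Menu choice -> file extensions, mirroring the Export Format menu.
EXPORT_CHOICES = {
    "1": ["csv"],
    "2": ["json"],
    "3": ["html"],
    "4": ["csv", "json", "html"],
}


def output_paths(topic: str, choice: str) -> list[str]:
    """Turn a topic and menu choice into file names like 'bitcoin_news.csv'."""
    # Lowercase, then collapse every run of non-alphanumerics into '_'.
    slug = re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")
    base = f"{slug}_news" if slug else "results"  # assumed fallback name
    return [f"{base}.{ext}" for ext in EXPORT_CHOICES[choice]]
```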

Example 2: Scrape Cybersecurity Sources

Select option: 4
Select category: 4 (Cybersecurity)
Select sources: A (ALL)
Keywords: ransomware
Export format: 3 (HTML)
→ Generates beautiful HTML report
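Example 2's keyword step, combined with the date filter from the feature list, could look like the sketch below. It makes several assumptions, and the real rules in `scraper.py` may differ:

- Keywords match as case-insensitive substrings of the title and description.
- Dates are ISO `YYYY-MM-DD`, as in the output samples.
- `matched_keywords` is a comma-separated string, as in the samples above.
- `filter_articles` is a hypothetical name.

RSS `pubDate` values such as `Tue, 20 Jan 2026 10:00:00 GMT` would need converting first, for example with `email.utils.parsedate_to_datetime`.

```python
from __future__ import annotations

from datetime import date


def filter_articles(articles: list[dict], keywords: list[str] | None = None,
                    since: date | None = None) -> list[dict]:
    """Keep articles that match any keyword and were published on or after `since`.

    An empty keyword list keeps everything (like leaving the prompt blank).
    """
    keywords = [k.strip() for k in (keywords or []) if k.strip()]
    kept = []
    for article in articles:
        # Date filter: skip articles older than `since`; undated articles are kept.
        raw_date = article.get("date", "")
        if since and raw_date:
            try:
                if date.fromisoformat(raw_date[:10]) < since:
                    continue
            except ValueError:
                pass  # unparseable date: keep the article rather than guess
        # Keyword filter: case-insensitive substring match on title + description.
        text = f'{article.get("title", "")} {article.get("description", "")}'.lower()
        matched = [k for k in keywords if k.lower() in text]
        if keywords and not matched:
            continue
        kept.append({**article, "matched_keywords": ", ".join(matched)})
    return kept
```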

⚠️ Disclaimer

This tool is intended for educational and research purposes only.

  • Always respect websites' Terms of Service
  • Don't overwhelm servers with excessive requests
  • Use responsibly for legitimate research and news aggregation

📄 License

MIT License - Feel free to use and modify!


🔄 Changelog

v4.1 (Current)

  • ๐ŸŒ Added HTML Export with dark theme
  • ๐Ÿ”‡ Enhanced Noise Filter for Bing RSS
  • ๐Ÿ”— Real URL Extraction from Bing redirects
  • ๐Ÿ“ฐ Real Source Detection (shows actual source, not "Bing")
  • ๐Ÿ“ค 4 export options (CSV, JSON, HTML, All)

v4.0

  • 🎨 Complete UI rebrand - "Universal News Scraper"
  • 🌐 Switched from Google Search to Bing News RSS
  • 📂 6 international preset categories with 30+ sources

v3.0

  • Added Topic Discovery via Google Search
  • Cybersecurity-focused preset sources

Happy Scraping! 🌐📰