murrough-foley

Professional software vendor delivering innovative solutions on the Softono platform, specializing in both open-source and proprietary software development.

Total Products
1

Software by murrough-foley

rs-trafilatura
Open Source

rs-trafilatura is a high-performance Rust library for fast and accurate web content extraction. It is a port of the Python trafilatura and Go go-trafilatura projects, designed to extract clean, readable text from web pages while removing boilerplate, navigation, menus, and advertisements. The library features machine learning-based page type classification that identifies seven categories (articles, forums, products, collections, listings, documentation, and service pages), with a reported F1 score of 96.6 on the ScrapingHub benchmark. Based on the detected page type, it applies specialized extraction profiles optimized for specific platforms and frameworks. A built-in extraction quality predictor uses an XGBoost model to provide confidence scores, helping users decide when to fall back to LLM processing for low-confidence results. Output supports GitHub Flavored Markdown with preserved formatting for headings, lists, tables, and code blocks, alongside rich metadata extraction fro
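
The confidence-based fallback described above can be sketched roughly as follows. This is a minimal illustration only, not the rs-trafilatura API: the `Extraction` struct, the `Route` enum, the `route` function, and the assumption that the confidence score lies in the range 0.0 to 1.0 are all hypothetical names and choices introduced here.

```rust
/// Hypothetical stand-in for an extraction result.
/// This type is an assumption for illustration, not part of rs-trafilatura.
#[derive(Debug, Clone, PartialEq)]
pub struct Extraction {
    /// Extracted main text of the page.
    pub text: String,
    /// Predicted extraction quality, assumed here to lie in [0.0, 1.0].
    pub confidence: f64,
}

/// What to do with a page once its extraction has been scored.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    /// Confidence is high enough: keep the extracted text.
    UseExtraction,
    /// Confidence is too low: hand the page to an LLM instead.
    FallBackToLlm,
}

/// Decide whether to keep an extraction or fall back to LLM processing.
/// `threshold` is a caller-chosen cutoff (hypothetical; tune it on your data).
pub fn route(extraction: &Extraction, threshold: f64) -> Route {
    // A NaN confidence compares false here, so it is routed to the
    // fallback, which is the safer default for an unusable score.
    if extraction.confidence >= threshold {
        Route::UseExtraction
    } else {
        Route::FallBackToLlm
    }
}
```

With a threshold of 0.8, an extraction scored 0.92 would be kept and one scored 0.31 would be sent to the LLM path. Keeping the cutoff as a parameter lets callers trade LLM cost against extraction quality.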

ML Frameworks, Browser Automation
33 GitHub Stars