screaming-frog-shingling

Open source (MIT license), Python. 46 stars, 12 forks, 0 issues, 4 watchers; last commit 6 years ago.

About screaming-frog-shingling

Uses the Screaming Frog Internal HTML export, with text extraction, together with a shingling algorithm to measure content duplication across the pages of a crawled site.
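
For readers unfamiliar with shingling, the following is a minimal sketch of the general technique, not necessarily what this project's script does. Each text is broken into overlapping runs of k words (shingles), and two texts are compared by the Jaccard similarity of their shingle sets. The function names `shingles` and `jaccard`, the shingle size `k=3`, and the sample sentences are assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word shingles in text (hypothetical helper)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words yield one shingle, or none if empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up sample texts that differ by a single word.
page_a = "The quick brown fox jumps over the lazy dog"
page_b = "The quick brown fox leaps over the lazy dog"

similarity = jaccard(shingles(page_a), shingles(page_b))
print(f"{similarity:.2f}")
```

A score near 1.0 suggests near-duplicate content, while a score near 0.0 suggests the pages share little wording. Larger values of `k` make the comparison stricter, since longer word runs must match exactly.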

Platforms

Web Self-hosted

Languages

Python

Example Usage

  1. pip install -r requirements.txt

  2. Run Screaming Frog and use Extraction to pull the content out of a specific DOM element.

  3. Export the internal HTML to a CSV file.

  4. Run the script with the following arguments:

    -i : Input filename (the CSV exported from Screaming Frog)
    -o : Output filename
    -c : Name of the Screaming Frog column that contains your extracted content (quote it if the name contains a space)

    Example invocation:
    python sf_shingling.py -i internal_html_ap.csv -o output_html_ap.csv -c "BodyContent 1"
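
To make the role of the -c argument concrete, the sketch below reads an export-style CSV, pulls the extracted text from the named column, and scores every pair of pages. It is an illustration under assumptions, not the code in sf_shingling.py: the helper names `word_shingles` and `compare_pages`, the shingle size `k=3`, and the use of an `Address` column for the page URL are all assumed, and the sample rows are made up so the block runs without a real export.

```python
import csv
import io
from itertools import combinations


def word_shingles(text, k=3):
    """Set of overlapping k-word shingles (hypothetical helper, k is assumed)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}


def compare_pages(csv_file, content_column, url_column="Address"):
    """Yield (url_a, url_b, jaccard_similarity) for every pair of rows."""
    rows = list(csv.DictReader(csv_file))
    # Skip rows where the extraction column is empty.
    pages = [
        (row[url_column], word_shingles(row[content_column]))
        for row in rows
        if row.get(content_column)
    ]
    for (url_a, sh_a), (url_b, sh_b) in combinations(pages, 2):
        union = sh_a | sh_b
        score = len(sh_a & sh_b) / len(union) if union else 0.0
        yield url_a, url_b, score


# Made-up stand-in for an exported CSV; a real run would open the file instead.
sample = io.StringIO(
    'Address,BodyContent 1\n'
    'https://example.com/a,"red apples are sweet and crisp"\n'
    'https://example.com/b,"red apples are sweet and juicy"\n'
    'https://example.com/c,"bicycles need regular chain maintenance"\n'
)

results = list(compare_pages(sample, "BodyContent 1"))
for url_a, url_b, score in results:
    print(url_a, url_b, round(score, 2))
```

Comparing every pair grows quadratically with the number of pages, so for a large crawl a sketch like this would likely need a faster candidate-selection step before scoring.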