ahmedshahriar

Professional software vendor delivering innovative solutions on the Softono platform. Specialized in both open-source and proprietary software development.

Open Source

bd-medicine-scraper

# bd-medicine-scraper

[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)
![Django CI](https://github.com/ahmedshahriar/bd-medicine-scraper/actions/workflows/django-ci.yml/badge.svg)
[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/ahmedshahriarsakib/bangladesh-medicine-analytics)
[![Open in Visual Studio Code](https://img.shields.io/static/v1?logo=visualstudiocode&label=&message=Open%20in%20Visual%20Studio%20Code&labelColor=2c2c32&color=007acc&logoColor=007acc)](https://github.dev/ahmedshahriar/bd-medicine-scraper)

## Overview

Welcome to the bd-medicine-scraper repository!

In this project, I scraped medicine data from [medex.com.bd](https://medex.com.bd) using **Scrapy** and integrated it with **Django REST Framework**. The data is stored in a **PostgreSQL** database. I designed the scraper to preserve the relations between models.

I also customized the Django admin panels with additional features:
- autocomplete lookup for relational fields
- custom filtering (alphabetical, model property)
- bulk actions (export to CSV)

Other customizations:
- a custom command to run Scrapy spiders from the Django command line (e.g. `python manage.py <spider_name>`)
- custom Django commands
  - to export models to CSV (`python manage.py <export_model_name> <export_data_path>`)
    ```
    python manage.py export_medicine_data /home/ahmed/Desktop/medicine_data.csv
    ```
  - to export generic monograph PDFs
    ```
    python manage.py export_generics_monograph
    ```

I also added proxy configuration to Scrapy.
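The README does not show how the CSV export commands work internally. As a rough illustration only, the core of such a command could look like the sketch below, which uses just the standard library. The function name `export_rows_to_csv` and the column names in the usage note are hypothetical and not taken from the repository; in a real Django command the rows would likely come from a queryset (for example via `.values()`), and this function would be called from the command's `handle` method.

```python
import csv
from pathlib import Path
from typing import Iterable, Mapping, Sequence


def export_rows_to_csv(
    rows: Iterable[Mapping[str, object]],
    fieldnames: Sequence[str],
    export_path: str,
) -> int:
    """Write dict-like rows to a CSV file and return the number of rows written.

    Hypothetical helper: a stand-in for whatever the project's export
    commands actually do.
    """
    path = Path(export_path)
    # Create the target directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)

    count = 0
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops keys that are not listed in fieldnames;
        # keys that are missing from a row are written as empty strings.
        writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

A call such as `export_rows_to_csv(rows, ["brand_name", "strength"], "/tmp/medicine_data.csv")` would write a header line followed by one line per row. Streaming rows one at a time keeps memory use flat, which may matter for a table with tens of thousands of entries.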
## Run

Create a Python virtual environment and run these commands from the root directory:

```
pip install -r requirements.txt
```

This will run the Django app:

```
python manage.py runserver
```

NB: Migrate before running the app:

```
python manage.py makemigrations && python manage.py migrate
```

To run all spiders:

```
python run_crawler.py
```

To run a specific spider:

```
python manage.py <spider_name>
```

e.g. `python manage.py med`

## Data Analytics

### Dataset

The scraped dataset is available on Kaggle:
- [Assorted Medicine Dataset of Bangladesh](https://www.kaggle.com/ahmedshahriarsakib/assorted-medicine-dataset-of-bangladesh)

The dataset has 6 CSV files. Here is a list of the files with their featured columns:

1. medicine.csv (21k+ entries)
   - brand name
   - medicine type (allopathic or herbal)
   - generic
   - strength
   - manufacturer
   - package container (unit price and pack info)
   - package size (unit price)
2. manufacturer.csv (245 entries)
   - name
3. indication.csv (2k+ entries)
   - name
4. generic.csv (~1700-1800 entries)
   - name
   - monograph link (PDF URL)
   - drug class
   - indication
   - generic details such as "Indication description", "Pharmacology description", "Dosage & Administration description", etc.
5. drug class.csv (~400 entries)
   - name
6. dosage form.csv (~120 entries)
   - name

### Analytics

[Bangladesh Medicine Analytics - Notebook on Kaggle](https://www.kaggle.com/ahmedshahriarsakib/bangladesh-medicine-analytics)

## Tests

Workflow script: [django-ci.yml](https://github.com/ahmedshahriar/bd-medicine-scraper/blob/dev/.github/workflows/django-ci.yml)

Run the tests using:

```
coverage run --omit='*/venv/*' manage.py test
```

or

```
python manage.py test
```

Check the coverage:

```
coverage html
```

## Built With

```
Django==3.2.12
djangorestframework==3.12.2
django-admin-autocomplete-filter==0.7.1
django-filter==21.1
coverage==6.2
Scrapy==2.4.1
scrapy-djangoitem==1.1.1
psycopg2==2.9.3
```

## Preview

![django_admin_generics](https://user-images.githubusercontent.com/40615350/157111319-f84830b8-f9e3-4a3f-9f72-b0afc586ccb9.png)

![django_admin_medicine](https://user-images.githubusercontent.com/40615350/157111248-31ca4ee0-97e1-412e-92b1-31a451bb846c.png)

![django_admin_dosage_form](https://user-images.githubusercontent.com/40615350/157111180-98bb2b6a-bb15-4159-ba4b-48f92dd97538.png)

![django_admin_manufacturer](https://user-images.githubusercontent.com/40615350/157111404-3e3ff9e3-f9f4-4bd6-b176-c08fa32ecee1.png)

Browser Automation

100 Github Stars

Open Source

youtube-comment-scraper

# Parse Comments from Youtube Videos

[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ahmedshahriar/youtube-comment-scraper/main)

This script dumps YouTube video comments to a CSV file, given YouTube video links. The video links can be placed in a variable, a list, or a CSV file.

The script is based on [youtube-comment-downloader](https://github.com/egbertbouman/youtube-comment-downloader).

It requires the **pandas** and **requests** modules.

To run:

`pip install -r requirements.txt`

`python ytb_comment_scraper.py`

By default, the script downloads the 100 most recent comments. The comments are dumped to a CSV file.

You can set the parameter values as you wish:

```
COMMENT_LIMIT : How many comments you want to download
SORT_BY_POPULAR : filter comments by popularity (0 for True , 1 for false)
SORT_BY_RECENT : filter comments by recently posted (0 for True , 1 for false)
```

Kaggle notebook: [Scrape Youtube Comments For Free (No Google API)](https://www.kaggle.com/ahmedshahriarsakib/scrape-youtube-comments-for-free-no-google-api)
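To make the role of `COMMENT_LIMIT` concrete, here is a minimal sketch of capping a stream of comments and dumping them to CSV. It is an illustration under assumptions, not the script's actual code: the function `dump_comments` is hypothetical, comments are assumed to arrive as an iterable of dicts, and the standard-library `csv` module is used here in place of pandas so that the example stays self-contained.

```python
import csv
from itertools import islice
from typing import Iterable, Mapping

# Same default as described in the README: 100 comments.
COMMENT_LIMIT = 100


def dump_comments(
    comments: Iterable[Mapping[str, object]],
    csv_path: str,
    limit: int = COMMENT_LIMIT,
) -> int:
    """Write at most `limit` comments to `csv_path`; return how many were written.

    Hypothetical helper for illustration only.
    """
    # islice stops pulling from the source once `limit` items have been taken,
    # so a lazy comment generator is not consumed further than needed.
    rows = list(islice(comments, limit))
    if not rows:
        # Nothing to write; no file is created in this sketch.
        return 0

    # Use the keys of the first comment as the CSV header.
    fieldnames = list(rows[0].keys())
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, passing a generator that yields 250 comment dicts with the default limit would write a header plus 100 rows. Taking the cap lazily avoids fetching pages of comments that would be discarded anyway.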

Analytics & BI Browser Automation

48 Github Stars
