alex000kim

Professional software vendor delivering innovative solutions on the Softono platform. Specialized in both open-source and proprietary software development.

Total Products
2

Software by alex000kim

nsfw_data_scraper
Open Source

# NSFW Data Scraper

## Note: use with caution - the dataset is noisy

## Description

This is a set of scripts that allows for an automatic collection of _tens of thousands_ of images for the following (loosely defined) categories to be later used for training an image classifier:

- `porn` - pornography images
- `hentai` - hentai images, but also includes pornographic drawings
- `sexy` - sexually explicit images, but not pornography. Think nude photos, playboy, bikini, etc.
- `neutral` - safe for work neutral images of everyday things and people
- `drawings` - safe for work drawings (including anime)

Here is what each script (located under the `scripts` directory) does:

- `1_get_urls_.sh` - iterates through text files under `scripts/source_urls`, downloading URLs of images for each of the 5 categories above. The `ripme` application performs all the heavy lifting. The source URLs are mostly links to various subreddits, but could be any website that Ripme supports. *Note*: I already ran this script for you, and its outputs are located in the `raw_data` directory. No need to rerun unless you edit files under `scripts/source_urls`.
- `2_download_from_urls_.sh` - downloads the actual images for the URLs found in text files in the `raw_data` directory.
- `3_optional_download_drawings_.sh` - (optional) script that downloads SFW anime images from the [Danbooru2018](https://www.gwern.net/Danbooru2018) database.
- `4_optional_download_neutral_.sh` - (optional) script that downloads SFW neutral images from the [Caltech256](http://www.vision.caltech.edu/Image_Datasets/Caltech256/) dataset.
- `5_create_train_.sh` - creates the `data/train` directory and copies all `*.jpg` and `*.jpeg` files into it from `raw_data`. Also removes corrupted images.
- `6_create_test_.sh` - creates the `data/test` directory and moves `N=2000` random files for each class from `data/train` to `data/test` (change this number inside the script if you need a different train/test split). Alternatively, you can run it multiple times; each run moves another `N` images for each class from `data/train` to `data/test`.

## Prerequisites

- Docker

## How to collect data

```bash
$ docker build . -t docker_nsfw_data_scraper
Sending build context to Docker daemon 426.3MB
Step 1/3 : FROM ubuntu:18.04
 ---> 775349758637
Step 2/3 : RUN apt update && apt upgrade -y && apt install wget rsync imagemagick default-jre -y
 ---> Using cache
 ---> b2129908e7e2
Step 3/3 : ENTRYPOINT ["/bin/bash"]
 ---> Using cache
 ---> d32c5ae5235b
Successfully built d32c5ae5235b
Successfully tagged docker_nsfw_data_scraper:latest
$ # Next command might run for several hours. It is recommended to leave it overnight
$ docker run -v $(pwd):/root/nsfw_data_scraper docker_nsfw_data_scraper scripts/runall.sh
Getting images for class: neutral
...
...
$ ls data
test train
$ ls data/train/
drawings hentai neutral porn sexy
$ ls data/test/
drawings hentai neutral porn sexy
```

## How to train a CNN model

- Install [fastai](https://github.com/fastai/fastai): `conda install -c pytorch -c fastai fastai`
- Run `train_model.ipynb` top to bottom

## Results

I was able to train a CNN classifier to 91% accuracy with the following confusion matrix:

![alt text](confusion_matrix.png)

As expected, `drawings` and `hentai` are confused with each other more frequently than with other classes. The same is true of the `porn` and `sexy` categories.
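
The train/test split described for `6_create_test_.sh` can be illustrated with a short Python sketch. This is not the repository's script (which is a shell script); the function name `move_random_to_test`, its parameters, and the `seed` argument are hypothetical and only mirror the behavior described above: for each class, move up to `N` random files from `data/train` to `data/test`.

```python
import random
import shutil
from pathlib import Path

# The five class names listed in the README.
CLASSES = ["drawings", "hentai", "neutral", "porn", "sexy"]


def move_random_to_test(data_dir, n_per_class, seed=None):
    """Move up to n_per_class random files per class from train/ to test/.

    Hypothetical helper: returns a dict mapping class name to the number
    of files that were actually moved.
    """
    rng = random.Random(seed)
    moved = {}
    for cls in CLASSES:
        train_dir = Path(data_dir) / "train" / cls
        test_dir = Path(data_dir) / "test" / cls
        test_dir.mkdir(parents=True, exist_ok=True)
        # Sort first so that a fixed seed gives a reproducible selection.
        files = []
        if train_dir.is_dir():
            files = sorted(p for p in train_dir.iterdir() if p.is_file())
        chosen = rng.sample(files, min(n_per_class, len(files)))
        for path in chosen:
            shutil.move(str(path), str(test_dir / path.name))
        moved[cls] = len(chosen)
    return moved
```

Calling `move_random_to_test("data", 2000)` a second time would move another batch from what remains in `data/train`, which matches the "run it multiple times" behavior described for the script.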

ML Frameworks Data Labeling
12.6K GitHub Stars
slack-gpt-bot
Open Source

# Slack GPT Bot

This repository contains a Python-based Slack GPT Bot that uses OpenAI's GPT model to answer users' questions. Additionally, the bot can extract content from URLs provided in the user's message and take that content into account in its response.

## Features

- Extract URLs from user messages
- Scrape webpage content from URLs
- Integrate with OpenAI's GPT-4 to answer questions
- Maintain conversation context in a threaded format
- Socket mode integration with Slack

## Dependencies

- Python 3.6 or later
- beautifulsoup4
- slack-bolt
- slack-sdk
- openai
- requests

See `requirements.txt`.

## Installation

1. Clone this repository:

```bash
git clone https://github.com/alex000kim/slack-gpt-bot.git
cd slack-gpt-bot
```

2. Install the required packages:

```bash
pip install -r requirements.txt
```

3. Create a `.env` file in the root directory of the project and add your Slack and OpenAI API keys:

```bash
SLACK_BOT_TOKEN=your_slack_bot_token
SLACK_APP_TOKEN=your_slack_app_token
OPENAI_API_KEY=your_openai_api_key
```

See below for how to get them.

## Configuring Permissions in Slack

Before you can run the Slack GPT Bot, you need to configure the appropriate permissions for your Slack bot. Follow these steps to set up the necessary permissions:

1. Create a [Slack App](https://api.slack.com/authentication/basics).
2. Go to your [Slack API Dashboard](https://api.slack.com/apps) and click on the app you created for this bot.
3. In the left sidebar, click on "OAuth & Permissions".
4. In the "Scopes" section, you will find two types of scopes: "Bot Token Scopes" and "User Token Scopes". Add the following scopes under "Bot Token Scopes":
   - `app_mentions:read`: Allows the bot to read mention events.
   - `chat:write`: Allows the bot to send messages.
5. Scroll up to "OAuth Tokens for Your Workspace" and click the "Install App To Workspace" button. This will generate the `SLACK_BOT_TOKEN`.
6. In the left sidebar, click on "Socket Mode" and enable it. You'll be prompted to "Generate an app-level token to enable Socket Mode". Generate a token named `SLACK_APP_TOKEN` and add the `connections:write` scope.
7. In the "Features affected" section of the "Socket Mode" page, click "Event Subscriptions" and toggle "Enable Events" to "On". Add the `app_mention` event with the `app_mentions:read` scope in the "Subscribe to bot events" section below the toggle.

## Usage

1. Start the bot:

```
python slack_gpt_bot.py
```

2. Invite the bot to your desired Slack channel.
3. Mention the bot in a message and ask a question (including any URLs). The bot will respond with an answer, taking into account any content extracted from the URLs.

## Example

Note: The cutoff date of GPT-4 knowledge is Sep 2021, but scikit-learn v1.2 was released in Dec 2022.

![example](examples/gpt-bot-example-1.png)
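
The "extract URLs, scrape content, answer with context" flow listed under Features can be sketched roughly as follows. This is an illustration under stated assumptions, not the bot's actual code: the function names (`extract_urls`, `html_to_text`, `build_prompt`) and the `max_chars` limit are hypothetical, and the sketch uses only the standard library (`html.parser`) in place of the `beautifulsoup4` and `requests` dependencies the repository lists. Fetching the page and calling the OpenAI API are deliberately left out.

```python
import re
from html.parser import HTMLParser

# Slack wraps links as <https://example.com> or <https://example.com|label>,
# so the pattern stops at whitespace, angle brackets, and the pipe character.
URL_PATTERN = re.compile(r"https?://[^\s<>|]+")


def extract_urls(message_text):
    """Return the URLs found in a message, with trailing punctuation removed."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(message_text)]


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Reduce an HTML document to its visible text, joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_prompt(question, page_texts, max_chars=2000):
    """Append (truncated) page text for each URL to the user's question.

    page_texts maps URL -> already-extracted text; max_chars is an assumed
    cap to keep the prompt small.
    """
    parts = [question]
    for url, text in page_texts.items():
        parts.append(f"Content of {url}:\n{text[:max_chars]}")
    return "\n\n".join(parts)
```

In a real bot, the string returned by `build_prompt` would be sent as the user message to the model, together with the earlier messages of the Slack thread to preserve conversation context.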

AI Agents Browser Automation
74 GitHub Stars