About Meme Search

AI-powered meme search engine. Automatically extracts descriptions from images using vision-language models, then indexes with vector embeddings for semantic and keyword search.

n

Published by

neonwatty

Visit View Profile

README.md

View on GitHub

A Meme Search Engine built to self-host in Python, Ruby, and Docker

Use AI to index your memes by their content and text, making them easily retrievable for your meme warfare pleasures.

By default, processing from image-to-text extraction, to vector embedding, to search is performed locally. You can also use an OpenAI-compatible vision API for description generation while keeping embeddings and search local.

meme-search-2.0-demo

This repository contains code, a walkthrough notebook, and apps for indexing, searching, and easily retrieving your memes based on semantic search of their content and text.

A table of contents for the remainder of this README:

Meme search
Discord server
Changelog
Feature requests and contributing

Meme search

Features

Features of Meme Search include:

Multiple Image-to-Text Models

Choose the right size image to text model for your needs / resources - from small (~200 Million parameters) to large (~2 Billion parameters).

Current available image-to-text models for Meme Search include the following, starting with the default model:
- Florence-2-base - a popular series of small vision language models built by Microsoft, including a 250 Million (base) and a 700 Million (large) parameter variant. *This is the default model used in Meme Search*.
- Florence-2-large - the 700 Million parameter vision language model variant of the Florence-2 series
- SmolVLM-256 - a 256 Million parameter vision language model built by Hugging Face
- SmolVLM-500 - a 500 Million parameter vision language model built by Hugging Face
- Moondream2 - a 2 Billion parameter vision language model used for image captioning / extracting image text
- Moondream2-INT8 - INT8 quantized version of Moondream2 for memory-constrained hardware. Reduces memory from ~5GB to ~1.5-2GB with minimal quality loss. Ideal for CPU-only machines.
Auto-Generate Meme Descriptions

Target specific memes for auto-description generation (instead of applying to your entire directory).
Manual Meme Description Editing

Edit or add descriptions manually for better search results, no need to wait for auto-generation if you don't want to.
Tags

Create, edit, and assign tags to memes for better organization and search filtering.
Fast Vector Search

Powered by Postgres and pgvector, enjoy faster keyword and vector searches with streamlined database transactions.
Directory Paths

Organize your memes across multiple subdirectories—no need to store everything in one folder.
New Organizational Tools

Filter by tags, directory paths, and description embeddings, plus toggle between keyword and vector search for more control.
Bulk Description Generation

Generate descriptions for multiple memes at once for faster indexing.
Dark Mode

Toggle between light and dark themes for comfortable viewing in any environment.
Directory Rescan

Automatically detect and index new memes added to your directories.
Drag-and-Drop Upload

Upload memes directly through the web interface with drag-and-drop and clipboard paste support. Files are stored in the direct-uploads directory (configurable via Docker volume mount) and automatically scanned for indexing. Supports JPG, PNG, and WEBP formats with bulk upload (up to 50 files), real-time progress tracking, and automatic duplicate filename handling.

Requirements

For Docker deployment (recommended):

Docker and Docker Compose

For local development:

Ruby 3.4.2
Rails 8.0.4
Python 3.12
Node.js 20 LTS
PostgreSQL 17 with pgvector extension

We recommend using mise for managing Ruby, Python, and Node.js versions. See CLAUDE.md for detailed setup instructions.

Installation instructions

To start up the app pull this repository and start the server cluster with docker-compose

docker compose up

This pulls and starts containers for the app, database, Solid Queue job worker, and local auto description generator. The app itself will run on port 3000 and is available at

http://localhost:3000

The Compose files store app data in local bind-mounted directories so upgrades keep using the same files:

./meme_search/db_data/meme-search-db for Postgres data
./meme_search/direct-uploads for drag-and-drop uploads
./meme_search/db_data/image_to_text_generator for generator queue data
./meme_search/models for model downloads

Most Docker installations create missing bind-mount directories automatically. Some Docker frontends, including Synology Container Manager, require the directories to exist before startup. Compose also runs a short setup container at startup to make the upload directory writable by the non-root Rails containers, so the configured upload path may be owned by UID/GID 1000 after the first run.

If you want these persistent files visible on a NAS path, set the storage path variables in .env or directly in your Compose UI:

MEME_SEARCH_DB_PATH=/volume1/docker/meme-search/db
MEME_SEARCH_DIRECT_UPLOADS_PATH=/volume1/docker/meme-search/direct-uploads
MEME_SEARCH_GENERATOR_DB_PATH=/volume1/docker/meme-search/image-to-text-db
MEME_SEARCH_MODELS_PATH=/volume1/docker/meme-search/models

For Docker frontends that require bind-mount directories to exist first, create them before starting:

mkdir -p ./meme_search/db_data/meme-search-db ./meme_search/direct-uploads ./meme_search/db_data/image_to_text_generator ./meme_search/models
mkdir -p /volume1/docker/meme-search/db /volume1/docker/meme-search/direct-uploads /volume1/docker/meme-search/image-to-text-db /volume1/docker/meme-search/models

To start the app alone pull the repo and cd into the meme_search/meme_search/meme_search_app. Once there execute the following to start the app in development mode

./bin/dev

When doing this ensure you have an available Postgres instance running locally on port 5432.

Note Linux users: you may need to add the following extra_hosts to your meme_search service for inter-container communication

extra_hosts:
    - "host.docker.internal:host-gateway"

Time to first generation / downloading models

The first auto generation of description of a meme takes longer than average, as image-to-text model weights are downloaded and cached. Subsequent generations are faster.

You can download additional models in the settings tab of the app.

Description generation providers

Meme Search supports two providers for automatic meme descriptions:

IMAGE_DESCRIPTION_PROVIDER=local uses the bundled Python image_to_text_generator service. This is the default and keeps description generation local.
IMAGE_DESCRIPTION_PROVIDER=openai calls an OpenAI-compatible /chat/completions vision API directly from Rails. In this mode the Python generator service is not required.

OpenAI-compatible descriptions are normalized to the app's description length limit before saving. Bulk generation queues durable Solid Queue background jobs for external providers so the web request does not wait on one API request per image.

For OpenAI-compatible mode, set these environment variables in your .env file:

IMAGE_DESCRIPTION_PROVIDER=openai
OPENAI_API_BASE_URL=https://api.openai.com/v1
OPENAI_API_KEY=your_api_key
OPENAI_VISION_MODEL=gpt-4o-mini

Then start Rails, the Solid Queue worker, and the database without the Python generator:

docker compose -f docker-compose.yml -f docker-compose.openai.yml up meme_search meme_search_jobs meme_search_db

To smoke-test a real OpenAI-compatible call before starting a bulk run, run this from the Rails app directory:

cd meme_search/meme_search_app
OPENAI_API_KEY=your_api_key mise exec -- bin/smoke_openai_description

The smoke test uses the first indexed sample image that exists under public/memes, runs the same job/provider path as background generation, and rolls back database changes after the API call succeeds.

For local inference mode, keep the default docker compose up command so the image_to_text_generator service starts and can access the same meme volumes as Rails.

Index your memes

You can index your memes by creating your own descriptions, or by generating descriptions automatically, as illustrated below.

To start indexing your own memes, first adjust the compose file by adding volume mount to the meme_search and image_to_text_generator services to properly connect your local meme subdirectory to the app.

For example, if suppose (one of your) meme directories was called new_memes and was located at the following path on your machine: /local/path/to/my/memes/new_memes.

To properly mount this subdirectory to the meme_search service adjust the volumes portion of its configuration to the following:

volumes:
  - ./meme_search/memes/:/app/public/memes # <-- example meme directory from the repository
  - /local/path/to/my/memes/new_memes/:/rails/public/memes/new_memes # <-- personal meme collection - must be placed inside /rails/public/memes in the container

Note: your new_memes directory must be mounted internally in the /rails/public/memes directory, as shown above.

To properly mount this same subdirectory to the image_to_text_generator service adjust the volumes portion of its configuration to the following:

volumes:
  - ./meme_search/memes/:/app/public/memes # <-- example meme directory from the repository
  - /local/path/to/my/memes/new_memes/:/app/public/memes/new_memes # <-- personal meme collection - must be placed inside /app/public/memes in the container
...

Note: your new_memes directory must be mounted internally in the /app/public/memes directory, as shown above.

If you are concerned about the application altering your existing meme library, as a precaution you can make the mount read only by adding "ro" to the volume line as follows:

volumes:
  - ./meme_search/memes/:/app/public/memes # <-- example meme directory from the repository
  - /local/path/to/my/memes/new_memes/:/app/public/memes/new_memes:ro
...

Now restart the app, and register the new_memes via the UX by traversing to the settings -> paths -> create new as illustrated below. Type in new_memes in the field provided and press enter.

Once registered in the app, your memes are ready for indexing / tagging / etc.,!

Model downloads

The image-to-text models used to auto generate descriptions for your memes are all open source, and vary in size.

Custom app port

Easily customize the app's port to more easily use the it with tools like Unraid or Portainer, or because you already have services running on the default meme_search app port 3000.

To customize the main app port create a .env file locally in the root of the directory. In this file you can define the following custom environment variables which define how the app, image to text generator, and database are accessed. These values are:

APP_PORT= # the port for the app - defaults to 3000

This value is automatically detected and loaded into each service via the Compose files. The Postgres service is only exposed on Docker's internal network, so app containers always talk to it at meme-search-db:5432.

Building the app locally with Docker

Docker images are built manually only - there are no automated CI builds on releases or tags.

To build the app - including all services defined in the docker-compose.yml file - locally run the local compose file at your terminal as

docker compose -f docker-compose-local-build.yml up --build

For multi-platform builds (AMD64 + ARM64) and pushing to GitHub Container Registry, use the local build script:

bash scripts/build_and_push.sh

This will build the docker images for the app, database, and auto description generator, and start the app at http://localhost:3000.

Running tests

To run tests locally pull the repo and cd into the meme_search/meme_search/meme_search_app directory. Install the required gems as

bundle install

Tests can then be run as

bash run_tests.sh

When doing this ensure you have an available Postgres instance running locally on port 5432.

Run linting tests on the /app subdirectory as

rubocop app

to ensure the code is clean and well formatted.

Running CI Locally (Optional)

You can run the complete GitHub Actions CI workflow locally using act:

# Install act (macOS)
brew install act

# Run all CI jobs
act --container-architecture linux/amd64 -P ubuntu-latest=catthehacker/ubuntu:act-latest

# Run specific job
act -j pro_app_unit_tests --container-architecture linux/amd64 -P ubuntu-latest=catthehacker/ubuntu:act-latest

This validates your changes match CI before pushing to GitHub.

Docker E2E Tests (Local Validation Only)

Docker E2E tests validate the complete microservices stack (Rails + Python + PostgreSQL) in isolated Docker containers. These tests run against fresh Docker builds and test cross-service communication, webhooks, and production-like deployment.

Current Status: 6/7 smoke tests passing (85% coverage) - see playwright-docker/README.md for details

# Run all Docker E2E tests
npm run test:e2e:docker

# Run with UI mode (recommended for debugging)
npm run test:e2e:docker:ui

What these tests cover:

Complete image processing pipeline (Rails → Python → Rails webhooks)
Vector search with embedding generation
Keyword search functionality
Concurrent processing and job queueing
Embedding refresh operations

Important: These tests DO NOT run in CI due to Docker build time (~10-15 minutes) and resource requirements. Contributors MUST run these tests locally before submitting PRs that affect:

Docker configurations
Cross-service communication
Image-to-text generation workflow
Embedding generation

See playwright-docker/README.md for comprehensive documentation.

Discord server

Join our Discord server to discuss new features, bug fixes, and other open source projects (like ytgify - a browser extension for clipping GIFs from YouTube right from the YT Player!).

Changelog

Meme Search is under active development! See the CHANGELOG.md in this repo for a record of the most recent changes.

Feature requests and contributing

Feature requests and contributions are welcome!

See the discussion section of this repository for suggested enhancements to contribute to / weight in on!

Please see CONTRIBUTING.md for some boilerplate ground rules for contributing.

Below is a nice diagram of the repo generated using gitdiagram, laying out its main components and interactions.

flowchart TD
    %% Global Entities
    User["User"]:::user

    %% Docker & Compose Orchestration
    Docker["Docker & Compose Orchestration"]:::docker

    %% Main Services
    Rails["Rails Meme Search Application"]:::rails
    Python["Image-to-Text Generator (Python)"]:::python
    DB["PostgreSQL Database (with pgvector)"]:::database

    %% Shared File Volumes Subgraph
    subgraph "Shared Meme Files"
        PublicMemes["Public Memes"]:::volume
        MemeDir["Meme Directory"]:::volume
    end

    %% Interactions
    User -->|"interaction"| Rails
    Rails -->|"DBQueryUpdate"| DB
    Rails -->|"APIRequest"| Python
    Python -->|"APIResponse"| Rails

    %% Volume Access
    Rails ---|"VolumeMountAccess"| PublicMemes
    Python ---|"VolumeMountAccess"| MemeDir

    %% Docker Orchestration Links
    Docker ---|"orchestrates"| Rails
    Docker ---|"orchestrates"| Python
    Docker ---|"orchestrates"| DB

    %% Click Events
    click Rails "https://github.com/neonwatty/meme-search/tree/main/meme_search/meme_search_app"
    click Python "https://github.com/neonwatty/meme-search/tree/main/meme_search/image_to_text_generator"
    click DB "https://github.com/neonwatty/meme-search/blob/main/meme_search/meme_search_app/config/database.yml"
    click Docker "https://github.com/neonwatty/meme-search/blob/main/docker-compose.yml"
    click PublicMemes "https://github.com/neonwatty/meme-search/tree/main/meme_search/meme_search_app/public/memes"
    click MemeDir "https://github.com/neonwatty/meme-search/tree/main/meme_search/memes"

    %% Styles
    classDef user fill:#fceabb,stroke:#d79b00,stroke-width:2px;
    classDef rails fill:#c8e6c9,stroke:#388e3c,stroke-width:2px;
    classDef python fill:#bbdefb,stroke:#1976d2,stroke-width:2px;
    classDef database fill:#ffe082,stroke:#f9a825,stroke-width:2px,stroke-dasharray: 5 5;
    classDef docker fill:#d1c4e9,stroke:#673ab7,stroke-width:2px,stroke-dasharray: 3 3;
    classDef volume fill:#ffcdd2,stroke:#e53935,stroke-width:2px,stroke-dasharray: 2 2;

Meme Search