semantic-draw
<div align="center"> <h1>SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models</h1> <h4><b>CVPR 2025</b></h4> <p>Previously <em>StreamMultiDiffusion: Real-Time Interactive Generation</br>with Region-Based Semantic Control</em></p> |  |  | | :----------------------------: | :----------------------------: | | Draw multiple prompt-masks in a large canvas | Real-time creation | [**Jaerin Lee**](http://jaerinlee.com/) Β· [**Daniel Sungho Jung**](https://dqj5182.github.io/) Β· [**Kanggeon Lee**](https://github.com/dlrkdrjs97/) Β· [**Kyoung Mu Lee**](https://cv.snu.ac.kr/index.php/~kmlee/) <p align="center"> <img src="assets/logo_cvlab.png" height=60> </p> [](https://jaerinlee.com/research/semantic-draw) [](https://arxiv.org/abs/2403.09055) [](https://github.com/ironjr/semantic-draw) [](https://twitter.com/_ironjr_) [](https://github.com/ironjr/semantic-draw/blob/main/LICENSE) [](https://huggingface.co/papers/2403.09055) [](https://huggingface.co/spaces/ironjr/semantic-draw) [](https://huggingface.co/spaces/ironjr/semantic-draw-canvas-sd15) [](https://huggingface.co/spaces/ironjr/semantic-draw-canvas-sdxl) [](https://huggingface.co/spaces/ironjr/semantic-draw-canvas-sd3) [](https://colab.research.google.com/github/camenduru/SemanticPalette-jupyter/blob/main/SemanticPalette_jupyter.ipynb) </div> **SemanticDraw** is a real-time interactive text-to-image generation framework that allows you to **draw with meanings** π§ using semantic brushes ποΈ. <p align="center"> <img src="./assets/figure_one.png" width=100%> </p> --- ## π Quick Start ```bash # Install conda create -n semdraw python=3.12 && conda activate semdraw git clone https://github.com/ironjr/semantic-draw cd semantic-draw pip install -r requirements.txt # Run streaming demo cd demo/stream python app.py --model "runwayml/stable-diffusion-v1-5" --port 8000 # Open http://localhost:8000 in your browser ``` For SD3 support, additionally run: ```bash pip install git+https://github.com/initml/diffusers.git@clement/feature/flash_sd3 ``` Note: this is default in requirements.txt --- ## π Table of Contents - [Features](#-features) - [Installation](#-installation) - [Demo Applications](#-demo-applications) - [Usage Examples](#-usage-examples) - [Documentation](#-documentation) - [FAQ](#-faq) - [Citation](#-citation) --- ## β Features | Interactive Drawing | Prompt Separation | Real-time Editing | | :---: | :---: | :---: | |  |  |  | | Paint with semantic brushes | No unwanted content mixing | Edit photos in real-time | --- ## π§ Installation ### Basic Installation ```bash conda create -n smd python=3.12 && conda activate smd git clone https://github.com/ironjr/StreamMultiDiffusion cd StreamMultiDiffusion pip install -r requirements.txt ``` ### Stable Diffusion 3 Support ```bash pip install git+https://github.com/initml/diffusers.git@clement/feature/flash_sd3 ``` --- ## π¨ Demo Applications We provide several demo applications with different features and model support: ### 1. StreamMultiDiffusion (Main Demo) Real-time streaming interface with semantic drawing capabilities. ```bash cd demo/stream python app.py --model "your-model" --height 512 --width 512 --port 8000 ``` <details> <summary><b>Options</b></summary> | Option | Description | Default | |--------|-------------|---------| | `--model` | Path to SD1.5 checkpoint (HF or local .safetensors) | None | | `--height` | Canvas height | 768 | | `--width` | Canvas width | 1920 | | `--bootstrap_steps` | Semantic region separation (1-3 recommended) | 1 | | `--seed` | Random seed | 2024 | | `--device` | GPU device number | 0 | | `--port` | Web server port | 8000 | </details> ### 2. Semantic Palette Simplified interface for different SD versions: #### SD 1.5 Version ```bash cd demo/semantic_palette python app.py --model "runwayml/stable-diffusion-v1-5" --port 8000 ``` #### SDXL Version ```bash cd demo/semantic_palette_sdxl python app.py --model "your-sdxl-model" --port 8000 ``` #### SD3 Version ```bash cd demo/semantic_palette_sd3 python app.py --port 8000 ``` <details> <summary><b>Using Custom Models (.safetensors)</b></summary> 1. Place your `.safetensors` file in the demo's `checkpoints` folder 2. Run with: `python app.py --model "your-model.safetensors"` </details> --- ## π» Usage Examples ### Python API <details> <summary><b>Basic Generation</b></summary> ```python import torch from model import StableMultiDiffusionPipeline # Initialize device = torch.device('cuda:0') smd = StableMultiDiffusionPipeline(device, hf_key='runwayml/stable-diffusion-v1-5') # Generate image = smd.sample('A photo of the dolomites') image.save('output.png') ``` </details> <details> <summary><b>Region-Based Generation</b></summary> ```python import torch from model import StableMultiDiffusionPipeline from util import seed_everything # Setup seed_everything(2024) device = torch.device('cuda:0') smd = StableMultiDiffusionPipeline(device) # Define prompts and masks prompts = ['background: city', 'foreground: a cat', 'foreground: a dog'] masks = load_masks() # Your mask loading logic # Generate image = smd(prompts, masks=masks, height=768, width=768) image.save('output.png') ``` </details> <details> <summary><b>Streaming Generation</b></summary> ```python from model import StreamMultiDiffusion # Initialize streaming pipeline smd = StreamMultiDiffusion(device, height=512, width=512) # Register layers smd.update_single_layer(idx=0, prompt='background', mask=bg_mask) smd.update_single_layer(idx=1, prompt='object', mask=obj_mask) # Stream generation while True: image = smd() display(image) ``` </details> ### Jupyter Notebooks Explore our [notebooks](./notebooks) directory for interactive examples: - Basic usage tutorial - Advanced region control - SD3 examples - Custom model integration --- ## π Documentation ### Detailed Guides - [Old README](./README_old.md) - [Notebooks](./notebooks) ### Paper For technical details, see our [paper](https://arxiv.org/abs/2403.09055) and [project page](https://jaerinlee.com/research/semantic-draw). --- ## π FAQ <details> <summary><b>What is Semantic Palette?</b></summary> Semantic Palette lets you paint with text prompts instead of colors. Each brush carries a meaning (prompt) that generates appropriate content in real-time. </details> <details> <summary><b>Which models are supported?</b></summary> - β Stable Diffusion 1.5 and variants - β SDXL and variants (with Lightning LoRA) - β Stable Diffusion 3 - β Custom .safetensors checkpoints </details> <details> <summary><b>Hardware requirements?</b></summary> - Minimum: GPU with 8GB VRAM (for 512x512) - Recommended: GPU with 11GB VRAM (for larger resolutions) (Tested with 1080 ti). </details> --- ## π© Recent Updates - π₯ **June 2025**: Presented at CVPR 2025 - β **June 2024**: SD3 support with Flash Diffusion - β **April 2024**: StreamMultiDiffusion v2 with responsive UI - β **March 2024**: SDXL support with Lightning LoRA - β **March 2024**: First version released See [README_old.md](./README_old.md) for full history. --- ## π Citation ```bibtex @inproceedings{lee2025semanticdraw, title="{SemanticDraw:} Towards Real-Time Interactive Content Creation from Image Diffusion Models", author={Lee, Jaerin and Jung, Daniel Sungho and Lee, Kanggeon and Lee, Kyoung Mu}, booktitle={CVPR}, year={2025} } ``` --- ## π€ Acknowledgements Built upon [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion), [MultiDiffusion](https://multidiffusion.github.io/), and [LCM](https://latent-consistency-models.github.io/). Special thanks to the Hugging Face team and the model contributors. --- ## π§ Contact Please email `[email protected]` or [open an issue](https://github.com/ironjr/StreamMultiDiffusion/issues).