Edit Banana
Universal Content Re-Editor: Make the Uneditable, Editable
Break free from static formats. Our platform transforms fixed content into fully editable assets. Powered by SAM 3 and multimodal large models, it performs high-fidelity reconstruction that preserves the original diagram's details and logical relationships.
Try It Now!
Visit https://www.editbanana.net/ to try Edit Banana online: upload an image and get an editable DrawIO (XML) file in seconds.
[!WARNING] Please note: Our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using our web platform.
Join WeChat Group
You are welcome to join our WeChat group to discuss and exchange ideas. Scan the QR code below to join:
Scan to join the Edit Banana community
[!TIP] If the QR code has expired, please submit an Issue to request an updated one.
Contact Us
For academic cooperation, technical integration, commercial licensing, project customization, and other business inquiries, please contact us by email:
E-mail: [email protected]
Table of Contents
- Effect Demonstration
- Key Features
- Architecture Pipeline
- Project Structure
- Installation & Setup
- Usage
- Configuration
- Development Roadmap
- Join WeChat Group
- Contribution Guidelines
- Contributors
- License
- Star History
Effect Demonstration
High-Definition Input-Output Comparison (4 Typical Scenarios)
To demonstrate the high-fidelity conversion, we provide one-to-one comparisons between the "original static format" and the "editable reconstruction result" for 4 typical scenarios. Every element can be individually dragged, restyled, and modified.
Scenario 1: Figures to DrawIO
| Original Static Diagram (Input · Non-editable) | DrawIO Reconstruction Result (Output · Fully Editable) |
|---|---|
| Example 1: Basic Flowchart | Editable Flowchart |
| Example 2: Multi-level Architecture | Editable Architecture |
| Example 3: Technical Schematic | Editable Schematic |
| Example 4: Scientific Formula | Editable Formula |
Scenario 2: Human-in-the-Loop Modification
Manual repair
Save locally
[!NOTE] Conversion Highlights:
- Preserves the layout logic, color matching, and element hierarchy of the original diagram.
- 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness).
- Accurate text recognition, so the recognized text can be edited and reformatted directly afterward.
- All elements are independently selectable, supporting native DrawIO template replacement and layout optimization.
Key Features
- Advanced Segmentation: Uses our fine-tuned SAM 3 (Segment Anything Model 3) to segment diagram elements.
- Fixed Multi-Round VLM Scanning: An extraction process guided by multimodal LLMs.
- Text Recognition:
  - Local OCR for text localization; easy to install and runs offline.
  - Pix2Text for mathematical formula recognition and LaTeX conversion.
  - Crop-Guided Strategy: Extracts text/formula regions and sends high-resolution crops to the formula engine.
- User System:
  - Registration: New users receive 10 free credits.
  - Credit System: A pay-per-use model prevents resource abuse.
  - Multi-User Concurrency: Concurrent user sessions are supported by a global lock that makes GPU access thread-safe and an LRU (Least Recently Used) cache that keeps image embeddings across requests, for stable, high performance.
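The README does not show how the global lock and LRU cache are implemented. Below is a minimal sketch of the idea, assuming a single GPU-bound encoder shared by all requests; EmbeddingCache, GPU_LOCK, and embed_image are hypothetical names, and using an image identifier (for example, a hash of the upload) as the cache key is an assumption.

```python
import threading
from collections import OrderedDict
from typing import Any, Callable, Hashable


class EmbeddingCache:
    """Thread-safe LRU cache for per-image embeddings (hypothetical helper)."""

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()
        self._cache_lock = threading.Lock()

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        with self._cache_lock:
            if key in self._items:
                self._items.move_to_end(key)  # mark as most recently used
                return self._items[key]
        value = compute()  # run outside the cache lock so lookups stay fast
        with self._cache_lock:
            self._items[key] = value
            self._items.move_to_end(key)
            while len(self._items) > self.capacity:
                self._items.popitem(last=False)  # evict least recently used
        return value


# One global lock serializes access to the single GPU model.
GPU_LOCK = threading.Lock()
CACHE = EmbeddingCache(capacity=2)


def embed_image(image_id: str, encoder: Callable[[str], Any]) -> Any:
    """Return a cached embedding, computing it under the GPU lock on a miss."""
    def compute() -> Any:
        with GPU_LOCK:  # only one request touches the GPU at a time
            return encoder(image_id)
    return CACHE.get_or_compute(image_id, compute)
```

In this sketch, two concurrent requests for the same uncached image may both compute the embedding; that is accepted for simplicity, since the GPU lock still keeps model calls serialized.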
Architecture Pipeline
- Input: Image (PNG/JPG/BMP/TIFF/WebP).
- Segmentation (SAM3): Uses our fine-tuned SAM3 mask decoder.
- Text Extraction (parallel):
  - Local OCR (Tesseract) detects text bounding boxes.
  - High-resolution crops of text/formula regions are sent to Pix2Text for LaTeX conversion.
- DrawIO XML Generation: Merges the spatial data from SAM3 with the OCR results.
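A minimal sketch of the final merge step, assuming that segmentation and OCR both yield axis-aligned boxes in pixel coordinates. Each OCR box is attached to the smallest shape that contains its center, and the result is written as uncompressed DrawIO (mxGraphModel) XML. Element, attach_ocr_text, and to_drawio_xml are hypothetical names, and the center-containment rule is an illustrative choice, not necessarily what Edit Banana does.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """An axis-aligned box in image pixel coordinates (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float
    text: str = ""
    style: str = "rounded=0;whiteSpace=wrap;html=1;"


def attach_ocr_text(shapes: List[Element], ocr: List[Element]) -> List[Element]:
    """Attach each OCR box's text to the smallest shape containing its center.

    Text with no enclosing shape becomes a standalone text cell.
    Note: matched shapes are updated in place.
    """
    cells = list(shapes)
    for t in ocr:
        cx, cy = t.x + t.width / 2, t.y + t.height / 2
        hits = [s for s in shapes
                if s.x <= cx <= s.x + s.width and s.y <= cy <= s.y + s.height]
        if hits:
            target = min(hits, key=lambda s: s.width * s.height)
            target.text = (target.text + " " + t.text).strip()
        else:
            cells.append(Element(t.x, t.y, t.width, t.height, t.text,
                                 style="text;html=1;"))
    return cells


def to_drawio_xml(cells: List[Element]) -> str:
    """Serialize cells as an uncompressed DrawIO (mxGraphModel) document."""
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    # Cell 0 is the root and cell 1 the default layer, as in DrawIO's own files.
    ET.SubElement(root, "mxCell", id="0")
    ET.SubElement(root, "mxCell", id="1", parent="0")
    for i, el in enumerate(cells, start=2):
        cell = ET.SubElement(root, "mxCell", id=str(i), value=el.text,
                             style=el.style, vertex="1", parent="1")
        # "as" is a Python keyword, so it is passed through a dict.
        ET.SubElement(cell, "mxGeometry", x=str(el.x), y=str(el.y),
                      width=str(el.width), height=str(el.height),
                      **{"as": "geometry"})
    return ET.tostring(mxfile, encoding="unicode")
```

For example, to_drawio_xml(attach_ocr_text([Element(0, 0, 100, 50)], [Element(10, 10, 30, 20, "Start")])) yields one rectangle labeled "Start", which can be saved as a .drawio file.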
Project Structure
Edit-Banana/
├── config/ # Configuration files (copy config.yaml.example → config.yaml)
├── flowchart_text/ # OCR & Text Extraction Module (standalone entry)
│   ├── src/
│   └── main.py # OCR-only entry point
├── input/ # [Manual] Input images directory
├── models/ # [Manual] Model weights (SAM3) and optional BPE vocab
├── output/ # [Manual] Results directory
├── sam3/ # SAM3 library (see Installation: install from facebookresearch/sam3)
├── sam3_service/ # SAM3 HTTP service (optional, for multi-process deployment)
├── scripts/ # Setup and utility scripts
│   ├── setup_sam3.sh # Install SAM3 lib and copy BPE to models/
│   ├── setup_rmbg.py # Download RMBG model from ModelScope
│   └── merge_xml.py # XML merge utilities
├── main.py # CLI entry (modular pipeline)
├── server_pa.py # FastAPI backend server
└── requirements.txt # Python dependencies
Installation & Setup
Follow these core phases to set up the project locally.
Phase 1: Environment & Base Setup
Configure your base environment and directory structure.
1. Prerequisites & Environment
- Python 3.10+ and a CUDA-capable GPU (highly recommended)
- Install PyTorch with CUDA support (e.g., for CUDA 11.8):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
2. Clone Repository & Init Directories
git clone https://github.com/BIT-DataLab/Edit-Banana.git
cd Edit-Banana
mkdir -p input output sam3_output
Phase 2: Models & Core Dependencies
Next, install the required packages and download necessary model weights (which should be placed in models/ and not committed).
1. Base Dependencies
pip install -r requirements.txt
2. SAM3 & Model Assets
- SAM3 Library & BPE: Run bash scripts/setup_sam3.sh to install the library and copy the BPE vocab to models/. Verify with:
python -c "from sam3.model_builder import build_sam3_image_model; print('OK')"
- SAM3 Weights: Download sam3.pt from ModelScope or Hugging Face and place it under models/sam3_ms.
- Local Text OCR (Tesseract):
sudo apt install tesseract-ocr tesseract-ocr-chi-sim
Optional Capabilities (OCR Engine, Formula, RMBG)
- PaddleOCR (alternative; better for mixed text): Use paddlepaddle==3.2.2 to avoid a bug in 3.3.0.
pip install paddlepaddle==3.2.2 paddleocr
- Formula (Pix2Text):
pip install pix2text onnxruntime-gpu
- Background Removal (RMBG): Install the dependencies, then download the model:
pip install onnxruntime modelscope
python scripts/setup_rmbg.py
Phase 3: Configuration & Troubleshooting
1. Final Configuration
Copy the example config and adjust the asset paths:
cp config/config.yaml.example config/config.yaml
Edit config.yaml to ensure sam3.checkpoint_path and sam3.bpe_path match your models/ locations.
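To catch path mistakes before the first run, the sketch below checks that sam3.checkpoint_path and sam3.bpe_path point at existing files. It assumes the config has already been parsed into a dict (for example with PyYAML's yaml.safe_load) and that relative paths are resolved against the repository root; missing_sam3_assets is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Any, Dict, List

REQUIRED_KEYS = ("checkpoint_path", "bpe_path")


def missing_sam3_assets(config: Dict[str, Any], base_dir: str = ".") -> List[str]:
    """Return human-readable problems; an empty list means both files exist."""
    problems: List[str] = []
    sam3 = config.get("sam3") or {}
    for key in REQUIRED_KEYS:
        value = sam3.get(key)
        if not value:
            problems.append(f"sam3.{key} is not set")
            continue
        path = Path(value).expanduser()
        if not path.is_absolute():
            path = Path(base_dir) / path  # assumption: relative to repo root
        if not path.is_file():
            problems.append(f"sam3.{key}: file not found at {path}")
    return problems
```

Run from the repository root, missing_sam3_assets(yaml.safe_load(open("config/config.yaml"))) should return an empty list once both files are in place.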
Before First Run: Checklist & Troubleshooting
Checklist:
- [ ] Config file copied and model paths set in config.yaml
- [ ] SAM3 weights (sam3.pt) and BPE vocab placed under models/
- [ ] SAM3 library installed via scripts/setup_sam3.sh
- [ ] Tesseract or PaddleOCR installed
Common Issues:
- "no kernel image is available...": GPU architecture mismatch. Upgrade to a PyTorch build that supports your GPU, or set sam3.device: "cpu".
- "Model file not found at ...rmbg/...": RMBG is optional. To enable it, download the model with python scripts/setup_rmbg.py.
- "PaddleOCR inference failed...": Use paddlepaddle==3.2.2 or fall back to Tesseract.
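Parts of the OCR checklist can be automated with the standard library. This sketch reports whether the tesseract binary is on PATH, whether the optional packages can be imported, and whether paddlepaddle matches the 3.2.2 pin recommended above. The helper names are hypothetical, and paddleocr, pix2text, and onnxruntime are assumed to be the import names of the packages installed earlier; a GPU build of Paddle may be installed under a different distribution name (paddlepaddle-gpu), which is why the name is a parameter.

```python
import importlib.util
import shutil
from importlib import metadata
from typing import Dict, Optional

# Import names for the optional OCR / formula / RMBG packages (assumed).
OPTIONAL_MODULES = ("paddleocr", "pix2text", "onnxruntime")
RECOMMENDED_PADDLE = "3.2.2"


def ocr_backends() -> Dict[str, bool]:
    """Report which OCR-related tools are available in this environment."""
    status = {"tesseract": shutil.which("tesseract") is not None}
    for module in OPTIONAL_MODULES:
        # find_spec returns None when a top-level module is not installed.
        status[module] = importlib.util.find_spec(module) is not None
    return status


def paddle_pinned(dist: str = "paddlepaddle") -> Optional[bool]:
    """True if the pinned version is installed, False if another version is,
    None if the distribution is not installed at all."""
    try:
        return metadata.version(dist) == RECOMMENDED_PADDLE
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name, ok in ocr_backends().items():
        print(f"{name:12} {'found' if ok else 'missing'}")
    print("paddlepaddle pinned to 3.2.2:", paddle_pinned())
```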
Usage
Command Line Interface (CLI)
Supports image files (PNG, JPG, BMP, TIFF, WebP). To process a single image:
python main.py -i input/test_diagram.png
The output XML will be saved in the output/ directory. For batch processing, put images in input/ and run python main.py without -i.
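Batch mode is described only at the CLI level. The sketch below shows one way to drive it from Python: it lists supported images in input/ and invokes main.py -i once per file, pairing each with the output/<image_stem>/ folder described further down. The extension set (including the .jpeg and .tif variants), the helper names plan_batch and run_batch, and the one-process-per-image approach are assumptions; running python main.py without -i remains the documented route.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple

# Extensions for the formats listed above; .jpeg/.tif variants are assumed.
SUPPORTED = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".webp"}


def plan_batch(input_dir: str = "input",
               output_dir: str = "output") -> List[Tuple[Path, Path]]:
    """Pair each supported image with its expected output/<image_stem>/ folder."""
    pairs = []
    for img in sorted(Path(input_dir).iterdir()):
        if img.is_file() and img.suffix.lower() in SUPPORTED:
            pairs.append((img, Path(output_dir) / img.stem))
    return pairs


def run_batch(input_dir: str = "input") -> None:
    """Run the CLI on each image in turn; stops on the first failure."""
    for img, out_dir in plan_batch(input_dir):
        subprocess.run([sys.executable, "main.py", "-i", str(img)], check=True)
        print(f"{img.name} -> {out_dir}/")
```

Spawning one process per image is slower than the built-in batch mode (models are reloaded each time) but isolates failures to a single file.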
Run and test locally
- One-time setup:
git clone https://github.com/BIT-DataLab/Edit-Banana.git && cd Edit-Banana
python3 -m venv .venv && source .venv/bin/activate # Linux/macOS; Windows: .venv\Scripts\activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118 # or CPU build
pip install -r requirements.txt
sudo apt install tesseract-ocr tesseract-ocr-chi-sim # OCR (or equivalent on your OS)
Install the SAM3 library and download the model weights and BPE vocab. Then:
mkdir -p input output
cp config/config.yaml.example config/config.yaml
Edit config/config.yaml: set sam3.checkpoint_path and sam3.bpe_path to your models/ paths.
- Test with the CLI: put a diagram image in input/ (e.g., input/test.png), then run:
python main.py -i input/test.png
Output appears under output/<image_stem>/ (DrawIO XML and intermediate files).
- Optional: test the web API:
python server_pa.py
In another terminal:
curl -X POST http://localhost:8000/convert -F "file=@input/test.png"
Or open http://localhost:8000/docs and use the /convert endpoint with a file upload.
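If curl is not available, the same upload can be made from Python's standard library. This is a sketch under assumptions: it mirrors the curl call (a multipart form field named file posted to /convert) and returns the raw response body, because the README does not document the response format; build_multipart and convert are hypothetical helpers.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert(path: str, url: str = "http://localhost:8000/convert") -> bytes:
    """POST an image to the local server, like the curl example above."""
    file_path = Path(path)
    body, content_type = build_multipart("file", file_path.name, file_path.read_bytes())
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # raw body; its format depends on server_pa.py
```

With python server_pa.py running, convert("input/test.png") returns whatever the /convert endpoint responds with.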
Configuration
Customize the pipeline behavior in config/config.yaml:
- sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, and the maximum number of iteration loops.
- paths: Set the input/output directories.
- dominant_color: Fine-tune the sensitivity of color extraction.
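For readers tuning the sam3 thresholds, the sketch below shows what a score threshold and an NMS IoU threshold typically do: drop low-confidence detections, then greedily keep the highest-scoring box and suppress any remaining box that overlaps an already kept box by more than the IoU threshold. It is a generic greedy NMS in plain Python, not Edit Banana's code, and score_threshold / iou_threshold are illustrative names rather than the actual config.yaml keys.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        score_threshold: float = 0.5, iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # 1) Score threshold: discard weak detections.
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    # 2) Greedy suppression against boxes that were already kept.
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Raising the IoU threshold keeps more overlapping boxes (fewer duplicates merged away); raising the score threshold discards more weak detections.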
Development Roadmap
| Feature Module | Status | Description |
|---|---|---|
| Core Conversion Pipeline | Completed | Full pipeline of segmentation, reconstruction and OCR |
| Intelligent Arrow Connection | In Development | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | Planned | Support custom template import |
| Batch Export Optimization | Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | Planned | Support local VLM deployment, independent of APIs |
Contribution Guidelines
Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):
- Fork this repository
- Create a feature branch (git checkout -b feature/xxx)
- Commit your changes (git commit -m 'feat: add xxx')
- Push to the branch (git push origin feature/xxx)
- Open a Pull Request
Bug Reports: Issues
Feature Suggestions: Discussions
License
This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).
Star History
If this project helps you, please star it to show your support!