About Edit-Banana

<p align="center"> <img src="/static/banana.jpg" width="180" alt="Edit Banana Logo"/> </p> <h1 align="center">🍌 Edit Banana</h1> <p align="center"> <a href="README_CN.md">中文</a> | English </p> <h3 align="center">Universal Content Re-Editor: Make the Uneditable, Editable</h3> <p align="center"> Break free from static formats. Our platform empowers you to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships. </p> <p align="center"> <a href="https://www.python.org/"><img src="https://img.shields.io/badge/Python-3.10+-3776AB?style=flat-square&logo=python&logoColor=white" alt="Python"/></a> <a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache_2.0-2F80ED?style=flat-square&logo=apache&logoColor=white" alt="License"/></a> <a href="https://developer.nvidia.com/cuda-downloads"><img src="https://img.shields.io/badge/GPU-CUDA%20R ...

b

Published by

bit-datalab

Visit View Profile

README.md

View on GitHub

Edit Banana Logo

🍌 Edit Banana

中文 | English

Universal Content Re-Editor: Make the Uneditable, Editable

Break free from static formats. Our platform empowers you to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships.

Try It Now!

👆 Click above or https://www.editbanana.net/ to try Edit Banana online! Upload an image to get editable DrawIO (XML) in seconds.

[!WARNING] Please note: Our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using our web platform.

💬 Join WeChat Group

Welcome to join our WeChat group to discuss and exchange ideas! Scan the QR code below to join:

WeChat Group QR Code
Scan to join the Edit Banana community

[!TIP] If the QR code has expired, please submit an Issue to request an updated one.

📮 Contact Us

For academic cooperation, technical docking, commercial licensing, project customization and other business inquiries, please contact us via email:

E-mail: [email protected]

📸 Effect Demonstration

High-Definition Input-Output Comparison (4 Typical Scenarios)

To demonstrate the high-fidelity conversion effect, we provides one-to-one comparisons between 4 scenarios of "original static formats" and "editable reconstruction results". All elements can be individually dragged, styled, and modified.

Scenario 1: Figures to DrawIO

🔒 Original Static Diagram (Input · Non-editable)	🔓 DrawIO Reconstruction Result (Output · Fully Editable)
Example 1: Basic Flowchart	✨ Editable Flowchart
Example 2: Multi-level Architecture	✨ Editable Architecture
Example 3: Technical Schematic	✨ Editable Schematic
Example 4: Scientific Formula	✨ Editable Formula

Scenario 2: Human in the Loop Modification

_{✨ Manual repair}

_{✨ Save locally}

[!NOTE] ✨ Conversion Highlights:

Preserves the layout logic, color matching, and element hierarchy of the original diagram.

1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness).

Accurate text recognition, supporting direct subsequent editing and format adjustment.

All elements are independently selectable, supporting native DrawIO template replacement and layout optimization.

🚀 Key Features

Advanced Segmentation: Using our fine-tuned SAM 3 (Segment Anything Model 3) for segmentation of diagram elements.
Fixed Multi-Round VLM Scanning: An extraction process guided by Multimodal LLMs.
Text Recognition:
- Local OCR for text localization; easy to install, runs offline.
- Pix2Text for mathematical formula recognition and LaTeX conversion .
- Crop-Guided Strategy: Extracts text/formula regions and sends high-res crops to the formula engine.
User System:
- Registration: New users receive 10 free credits.
- Credit System: Pay-per-use model prevents resource abuse.
- Multi-User Concurrency: Built-in support for concurrent user sessions using a Global Lock mechanism for thread-safe GPU access and an LRU Cache (Least Recently Used) to persist image embeddings across requests, ensuring high performance and stability.

🛠️ Architecture Pipeline

Input: Image (PNG/JPG/BMP/TIFF/WebP).
Segmentation (SAM3): Using our fine-tuned SAM3 mask decoder.
Text Extraction (Parallel):
- Local OCR (Tesseract) detects text bounding boxes.
- High-res crops of text/formula regions are sent to Pix2Text for LaTeX conversion.
DrawIO XML Generation: Merging spatial data from SAM3 and text OCR results.

📂 Project Structure

Click to expand project structure

  Edit-Banana/
  ├── config/               # Configuration files (copy config.yaml.example → config.yaml)
  ├── flowchart_text/       # OCR & Text Extraction Module (standalone entry)
  │   ├── src/
  │   └── main.py             # OCR-only entry point
  ├── input/                # [Manual] Input images directory
  ├── models/               # [Manual] Model weights (SAM3) and optional BPE vocab
  ├── output/               # [Manual] Results directory
  ├── sam3/                 # SAM3 library (see Installation: install from facebookresearch/sam3)
  ├── sam3_service/         # SAM3 HTTP service (optional, for multi-process deployment)
  ├── scripts/              # Setup and utility scripts
  │   ├── setup_sam3.sh       # Install SAM3 lib and copy BPE to models/
  │   ├── setup_rmbg.py       # Download RMBG model from ModelScope
  │   └── merge_xml.py        # XML merge utilities
  ├── main.py               # CLI entry (modular pipeline)
  ├── server_pa.py          # FastAPI backend server
  └── requirements.txt      # Python dependencies

📦 Installation & Setup

Follow these core phases to set up the project locally.

Phase 1: Environment & Base Setup

Configure your base environment and directory structure.

1. Prerequisites & Environment

Python 3.10+** & CUDA-capable GPU (Highly recommended)

Install PyTorch with CUDA support (e.g., for CUDA 11.8):

  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

2. Clone Repository & Init Directories

  git clone https://github.com/BIT-DataLab/Edit-Banana.git
  cd Edit-Banana
  mkdir -p input output sam3_output

Phase 2: Models & Core Dependencies

Next, install the required packages and download necessary model weights (which should be placed in models/ and not committed).

1. Base Dependencies

pip install -r requirements.txt

2. SAM3 & Model Assets

SAM3 Library & BPE: Run bash scripts/setup_sam3.shto install the lib and copy the BPE vocab to models/. Verify with:
```
python -c "from sam3.model_builder import build_sam3_image_model; print('OK')"
```
SAM3 Weights: Download sam3.pt from ModelScope or Hugging Face and place it under models/sam3_ms.

Text Local OCR (Tesseract):

sudo apt install tesseract-ocr tesseract-ocr-chi-sim

🧩 Optional Capabilities (OCR Engine, Formula, RMBG) - Click to expand

PaddleOCR (Alternative/Better for mixed text): Use paddlepaddle==3.2.2 (avoiding 3.3.0 bug).
```
pip install paddlepaddle==3.2.2 paddleocr.
```
Formula (Pix2Text):
```
pip install pix2text onnxruntime-gpu.
```
Background Removal (RMBG): pip install onnxruntime modelscope then run python scripts/setup_rmbg.py.

Phase 3: Configuration & Troubleshooting

1. Final Configuration

Copy the example config and adjust the asset paths:

  cp config/config.yaml.example config/config.yaml

Edit config.yaml to ensure sam3.checkpoint_path and sam3.bpe_path match your models/ locations.

🛠️ Before First Run Checklist & Troubleshooting - Click to expand

Checklist:

[ ] Config files copied and model paths set in config.yaml
[ ] SAM3 weights (sam3.pt) and BPE vocab placed under models/
[ ] Extracted SAM3 library via scripts/setup_sam3.sh Tesseract or PaddleOCR installed

Common Issues:

"no kernel image is available...": GPU arch mismatch. Upgrade PyTorch or set sam3.device: "cpu".
"Model file not found at ...rmbg/...": RMBG is optional. Enable by downloading via script.
"PaddleOCR inference failed...": Use paddlepaddle==3.2.2 or fallback to Tesseract.

🔤 Usage

Command Line Interface (CLI)

Supports image files (PNG, JPG, BMP, TIFF, WebP). To process a single image:

python main.py -i input/test_diagram.png

The output XML will be saved in the output/ directory. For batch processing, put images in input/ and run python main.py without -i.

Run and test locally

One-time setup

git clone https://github.com/BIT-DataLab/Edit-Banana.git && cd Edit-Banana
python3 -m venv .venv && source .venv/bin/activate   # Linux/macOS; Windows: .venv\Scripts\activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118   # or CPU build
pip install -r requirements.txt
sudo apt install tesseract-ocr tesseract-ocr-chi-sim   # OCR (or equivalent on your OS)

Install the SAM3 library and download model weights + BPE. Then:

mkdir -p input output
cp config/config.yaml.example config/config.yaml
# Edit config/config.yaml: set sam3.checkpoint_path and sam3.bpe_path to your models/ paths

Test with CLI

# Put a diagram image in input/, e.g. input/test.png
python main.py -i input/test.png
# Output appears under output/<image_stem>/ (DrawIO XML and intermediates)

Optional: test the web API

python server_pa.py
# In another terminal:
curl -X POST http://localhost:8000/convert -F "file=@input/test.png"
# Or open http://localhost:8000/docs and use the /convert endpoint with a file upload

⚙️ Configuration

Customize the pipeline behavior in config/config.yaml:

sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, max iteration loops.
paths: Set input/output directories.
dominant_color: Fine-tune color extraction sensitivity.

📌 Development Roadmap

Feature Module	Status	Description
Core Conversion Pipeline	✅ Completed	Full pipeline of segmentation, reconstruction and OCR
Intelligent Arrow Connection	⚠️ In Development	Automatically associate arrows with target shapes
DrawIO Template Adaptation	📍 Planned	Support custom template import
Batch Export Optimization	📍 Planned	Batch export to DrawIO files (.drawio)
Local LLM Adaptation	📍 Planned	Support local VLM deployment, independent of APIs

🤝 Contribution Guidelines

Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):

Fork this repository
Create a feature branch (git checkout -b feature/xxx)
Commit your changes (git commit -m 'feat: add xxx')
Push to the branch (git push origin feature/xxx)
Open a Pull Request

Bug Reports: Issues Feature Suggestions: Discussions

📄 License

This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).

🌟 Star History

🌟 If this project helps you, please star it to show your support!

Edit-Banana