Edit-Banana

🍌 Edit Banana

中文 | English

Universal Content Re-Editor: Make the Uneditable, Editable

Break free from static formats. Our platform empowers you to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships.


Try It Now!

Try Online Demo

👆 Click above or visit https://www.editbanana.net/ to try Edit Banana online! Upload an image to get editable DrawIO (XML) in seconds.

[!WARNING] Please note: Our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using our web platform.

💬 Join WeChat Group

Join our WeChat group to discuss and exchange ideas. Scan the QR code below to join:

WeChat Group QR Code
Scan to join the Edit Banana community

[!TIP] If the QR code has expired, please submit an Issue to request an updated one.

📮 Contact Us

For academic collaboration, technical integration, commercial licensing, project customization, and other business inquiries, please contact us by email:

E-mail: [email protected]


📸 Effect Demonstration

High-Definition Input-Output Comparison (4 Typical Scenarios)

To demonstrate the high-fidelity conversion effect, we provide one-to-one comparisons between the original static formats and the editable reconstruction results across 4 scenarios. All elements can be individually dragged, styled, and modified.

Scenario 1: Figures to DrawIO

| Example | 🔒 Original Static Diagram (Input · Non-editable) | 🔓 DrawIO Reconstruction Result (Output · Fully Editable) |
| --- | --- | --- |
| Example 1: Basic Flowchart | Original Diagram 1 | ✨ Editable Flowchart (Reconstruction Result 1) |
| Example 2: Multi-level Architecture | Original Diagram 2 | ✨ Editable Architecture (Reconstruction Result 2) |
| Example 3: Technical Schematic | Original Diagram 3 | ✨ Editable Schematic (Reconstruction Result 3) |
| Example 4: Scientific Formula | Original Diagram 4 | ✨ Editable Formula (Reconstruction Result 4) |

Scenario 2: Human-in-the-Loop Modification

✨ Manual repair

✨ Save locally

[!NOTE] ✨ Conversion Highlights:

  1. Preserves the layout logic, color matching, and element hierarchy of the original diagram.
  2. 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness).
  3. Accurate text recognition, supporting direct subsequent editing and format adjustment.
  4. All elements are independently selectable, supporting native DrawIO template replacement and layout optimization.

🚀 Key Features

  • Advanced Segmentation: Using our fine-tuned SAM 3 (Segment Anything Model 3) for segmentation of diagram elements.

  • Fixed Multi-Round VLM Scanning: An extraction process guided by Multimodal LLMs.

  • Text Recognition:

    • Local OCR for text localization; easy to install, runs offline.
    • Pix2Text for mathematical formula recognition and LaTeX conversion.
    • Crop-Guided Strategy: Extracts text/formula regions and sends high-res crops to the formula engine.
  • User System:

    • Registration: New users receive 10 free credits.
    • Credit System: Pay-per-use model prevents resource abuse.
    • Multi-User Concurrency: Concurrent user sessions are supported through a global lock that serializes GPU access across threads and an LRU (Least Recently Used) cache that persists image embeddings across requests, which avoids recomputing embeddings for recently seen images.
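
The concurrency bullet above combines two standard techniques. The following is a minimal sketch of how a global lock and an LRU cache can work together, not the project's actual implementation. The class name `EmbeddingCache`, the `compute` callback (a stand-in for the GPU image encoder), and the `max_items` default are all assumptions made for illustration.

```python
import threading
from collections import OrderedDict
from typing import Callable, Hashable


class EmbeddingCache:
    """Thread-safe LRU cache guarded by one global lock (illustrative only)."""

    def __init__(self, compute: Callable[[Hashable], object], max_items: int = 8):
        self._compute = compute            # hypothetical stand-in for the GPU encoder
        self._max_items = max_items
        self._items: OrderedDict = OrderedDict()
        self._gpu_lock = threading.Lock()  # only one thread may hold this at a time

    def get(self, key: Hashable) -> object:
        with self._gpu_lock:
            if key in self._items:
                self._items.move_to_end(key)     # mark as most recently used
                return self._items[key]
            value = self._compute(key)           # cache miss: run the encoder once
            self._items[key] = value
            if len(self._items) > self._max_items:
                self._items.popitem(last=False)  # evict the least recently used entry
            return value
```

Holding a single lock around both the lookup and the computation is the simplest design: it trades throughput for safety, since requests for different images also wait on each other.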

🛠️ Architecture Pipeline

  1. Input: Image (PNG/JPG/BMP/TIFF/WebP).
  2. Segmentation (SAM3): Using our fine-tuned SAM3 mask decoder.
  3. Text Extraction (Parallel):
    • Local OCR (Tesseract) detects text bounding boxes.
    • High-res crops of text/formula regions are sent to Pix2Text for LaTeX conversion.
  4. DrawIO XML Generation: Merging spatial data from SAM3 and text OCR results.
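
As a rough illustration of step 4, the sketch below writes detected elements into the mxGraphModel structure that DrawIO files use. It is a simplified example under stated assumptions: the `Element` dataclass, its fields, and the default style string are hypothetical and do not reflect the project's internal data model or its merging logic.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Element:
    """One detected item; the fields are hypothetical, not the project's schema."""
    x: float
    y: float
    width: float
    height: float
    text: str = ""
    style: str = "rounded=0;whiteSpace=wrap;html=1;"


def to_drawio_xml(elements: list[Element]) -> str:
    """Serialize elements as vertices on a single DrawIO page."""
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    ET.SubElement(root, "mxCell", id="0")              # root cell
    ET.SubElement(root, "mxCell", id="1", parent="0")  # default layer
    for index, el in enumerate(elements, start=2):
        cell = ET.SubElement(root, "mxCell", id=str(index), value=el.text,
                             style=el.style, vertex="1", parent="1")
        geometry = ET.SubElement(cell, "mxGeometry", x=str(el.x), y=str(el.y),
                                 width=str(el.width), height=str(el.height))
        geometry.set("as", "geometry")  # "as" is a Python keyword, so set it explicitly
    return ET.tostring(mxfile, encoding="unicode")
```

Each shape becomes an independent `mxCell`, which is what makes every element individually selectable once the file is opened in DrawIO.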

📂 Project Structure

  Edit-Banana/
  ├── config/               # Configuration files (copy config.yaml.example → config.yaml)
  ├── flowchart_text/       # OCR & Text Extraction Module (standalone entry)
  │   ├── src/
  │   └── main.py             # OCR-only entry point
  ├── input/                # [Manual] Input images directory
  ├── models/               # [Manual] Model weights (SAM3) and optional BPE vocab
  ├── output/               # [Manual] Results directory
  ├── sam3/                 # SAM3 library (see Installation: install from facebookresearch/sam3)
  ├── sam3_service/         # SAM3 HTTP service (optional, for multi-process deployment)
  ├── scripts/              # Setup and utility scripts
  │   ├── setup_sam3.sh       # Install SAM3 lib and copy BPE to models/
  │   ├── setup_rmbg.py       # Download RMBG model from ModelScope
  │   └── merge_xml.py        # XML merge utilities
  ├── main.py               # CLI entry (modular pipeline)
  ├── server_pa.py          # FastAPI backend server
  └── requirements.txt      # Python dependencies

📦 Installation & Setup

Follow these core phases to set up the project locally.

Phase 1: Environment & Base Setup

Configure your base environment and directory structure.

1. Prerequisites & Environment

  • Python 3.10+ and a CUDA-capable GPU (highly recommended)

  • Install PyTorch with CUDA support (e.g., for CUDA 11.8):

      pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

2. Clone Repository & Init Directories

  git clone https://github.com/BIT-DataLab/Edit-Banana.git
  cd Edit-Banana
  mkdir -p input output sam3_output

Phase 2: Models & Core Dependencies

Next, install the required packages and download necessary model weights (which should be placed in models/ and not committed).

1. Base Dependencies

  pip install -r requirements.txt

2. SAM3 & Model Assets

  • SAM3 Library & BPE: Run bash scripts/setup_sam3.sh to install the library and copy the BPE vocab to models/. Verify with:

    python -c "from sam3.model_builder import build_sam3_image_model; print('OK')"
  • SAM3 Weights: Download sam3.pt from ModelScope or Hugging Face and place it under models/sam3_ms.

  • Local Text OCR (Tesseract):

    sudo apt install tesseract-ocr tesseract-ocr-chi-sim
🧩 Optional Capabilities (OCR Engine, Formula, RMBG)
  • PaddleOCR (alternative; better for mixed text): Use paddlepaddle==3.2.2, which avoids a bug in 3.3.0.

    pip install paddlepaddle==3.2.2 paddleocr
  • Formula (Pix2Text):

    pip install pix2text onnxruntime-gpu
  • Background Removal (RMBG): Run pip install onnxruntime modelscope, then run python scripts/setup_rmbg.py.

Phase 3: Configuration & Troubleshooting

1. Final Configuration

Copy the example config and adjust the asset paths:

  cp config/config.yaml.example config/config.yaml

Edit config.yaml to ensure sam3.checkpoint_path and sam3.bpe_path match your models/ locations.
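
A quick pre-flight check can catch wrong asset paths before the first run. The sketch below assumes the configuration has already been loaded into a nested dict with `checkpoint_path` and `bpe_path` under a `sam3` key, as the sentence above suggests. The function name `missing_assets` and the `base_dir` parameter are hypothetical and not part of the project.

```python
from pathlib import Path


def missing_assets(config: dict, base_dir: str = ".") -> list[str]:
    """Return the config keys whose files are unset or do not exist on disk."""
    sam3 = config.get("sam3") or {}
    missing = []
    for key in ("checkpoint_path", "bpe_path"):
        value = sam3.get(key)
        # Relative paths are resolved against base_dir; absolute paths are used as-is.
        if not value or not (Path(base_dir) / value).is_file():
            missing.append(f"sam3.{key}")
    return missing
```

One way to obtain the dict is `yaml.safe_load` from PyYAML applied to config/config.yaml; an empty return value means both files were found.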

🛠️ Before First Run Checklist & Troubleshooting

Checklist:

  • [ ] Config files copied and model paths set in config.yaml
  • [ ] SAM3 weights (sam3.pt) and BPE vocab placed under models/
  • [ ] SAM3 library installed via scripts/setup_sam3.sh
  • [ ] Tesseract or PaddleOCR installed

Common Issues:

  • "no kernel image is available...": GPU arch mismatch. Upgrade PyTorch or set sam3.device: "cpu".
  • "Model file not found at ...rmbg/...": RMBG is optional. To enable it, download the model with python scripts/setup_rmbg.py.
  • "PaddleOCR inference failed...": Use paddlepaddle==3.2.2 or fall back to Tesseract.

🔤 Usage

Command Line Interface (CLI)

Supports image files (PNG, JPG, BMP, TIFF, WebP). To process a single image:

  python main.py -i input/test_diagram.png

The output XML is saved under the output/ directory, in a subdirectory named after the image (output/<image_stem>/). For batch processing, put images in input/ and run python main.py without -i.
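
Before a batch run it can help to see which files in input/ have a supported extension. This is a small sketch and does not reproduce how main.py discovers files. The extension set is an assumption derived from the formats listed above, and `list_input_images` is a hypothetical helper.

```python
from pathlib import Path

# Assumed extension set for the formats named above; main.py may accept others.
SUPPORTED = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".webp"}


def list_input_images(folder: str = "input") -> list[Path]:
    """Return supported image files directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir()
                  if p.is_file() and p.suffix.lower() in SUPPORTED)
```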

Run and test locally

  1. One-time setup

    git clone https://github.com/BIT-DataLab/Edit-Banana.git && cd Edit-Banana
    python3 -m venv .venv && source .venv/bin/activate   # Linux/macOS; Windows: .venv\Scripts\activate
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118   # or CPU build
    pip install -r requirements.txt
    sudo apt install tesseract-ocr tesseract-ocr-chi-sim   # OCR (or equivalent on your OS)

    Install the SAM3 library and download model weights + BPE. Then:

    mkdir -p input output
    cp config/config.yaml.example config/config.yaml
    # Edit config/config.yaml: set sam3.checkpoint_path and sam3.bpe_path to your models/ paths
  2. Test with CLI

    # Put a diagram image in input/, e.g. input/test.png
    python main.py -i input/test.png
    # Output appears under output/<image_stem>/ (DrawIO XML and intermediates)
  3. Optional: test the web API

    python server_pa.py
    # In another terminal:
    curl -X POST http://localhost:8000/convert -F "file=@input/test.png"
    # Or open http://localhost:8000/docs and use the /convert endpoint with a file upload
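
The curl call above can also be issued from Python. The sketch below uses only the standard library and mirrors the curl command: a multipart/form-data POST with one form field named "file". The helper name `build_convert_request` is hypothetical, and the response format is not documented here, so the sketch stops at building the request.

```python
import uuid
import urllib.request
from pathlib import Path


def build_convert_request(image_path: str,
                          url: str = "http://localhost:8000/convert") -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the image in a "file" field."""
    path = Path(image_path)
    boundary = uuid.uuid4().hex  # random separator between multipart sections
    head = (f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n").encode()
    body = head + path.read_bytes() + f"\r\n--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Usage, with the server from step 3 running:
# with urllib.request.urlopen(build_convert_request("input/test.png")) as response:
#     payload = response.read()  # inspect this; its structure is not specified above
```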

⚙️ Configuration

Customize the pipeline behavior in config/config.yaml:

  • sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, max iteration loops.

  • paths: Set input/output directories.

  • dominant_color: Fine-tune color extraction sensitivity.
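
To make the two sam3 thresholds concrete, here is a generic sketch of score filtering followed by greedy Non-Maximum Suppression on bounding boxes. It illustrates the general technique only. The project may apply these thresholds to masks rather than boxes, and the function and parameter names below are assumptions, not actual config keys.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes: list[Box], scores: list[float],
                      score_threshold: float, nms_threshold: float) -> list[int]:
    """Return indices kept after score filtering and greedy NMS, best score first."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    kept: list[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept
```

Raising the score threshold drops low-confidence detections, while lowering the NMS threshold removes more overlapping duplicates.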


📌 Development Roadmap

| Feature Module | Status | Description |
| --- | --- | --- |
| Core Conversion Pipeline | ✅ Completed | Full pipeline of segmentation, reconstruction and OCR |
| Intelligent Arrow Connection | ⚠️ In Development | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | 📝 Planned | Support custom template import |
| Batch Export Optimization | 📝 Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | 📝 Planned | Support local VLM deployment, independent of APIs |

🤝 Contribution Guidelines

Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):

  1. Fork this repository
  2. Create a feature branch (git checkout -b feature/xxx)
  3. Commit your changes (git commit -m 'feat: add xxx')
  4. Push to the branch (git push origin feature/xxx)
  5. Open a Pull Request

Bug Reports: Issues

Feature Suggestions: Discussions


📄 License

This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).


🌟 Star History

🌟 If this project helps you, please star it to show your support!