
bit-datalab

Professional software vendor delivering innovative solutions on the Softono platform. Specialized in both open-source and proprietary software development.


<p align="center">
  <img src="/static/banana.jpg" width="180" alt="Edit Banana Logo"/>
</p>

<h1 align="center">🍌 Edit Banana</h1>

<p align="center">
  <a href="README_CN.md">中文</a> | English
</p>

<h3 align="center">Universal Content Re-Editor: Make the Uneditable, Editable</h3>

<p align="center">
Break free from static formats. Our platform empowers you to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships.
</p>

<p align="center">
  <a href="https://www.python.org/"><img src="https://img.shields.io/badge/Python-3.10+-3776AB?style=flat-square&logo=python&logoColor=white" alt="Python"/></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache_2.0-2F80ED?style=flat-square&logo=apache&logoColor=white" alt="License"/></a>
  <a href="https://developer.nvidia.com/cuda-downloads"><img src="https://img.shields.io/badge/GPU-CUDA%20Recommended-76B900?style=flat-square&logo=nvidia" alt="CUDA"/></a>
  <a href="#-join-wechat-group"><img src="https://img.shields.io/badge/WeChat-Join%20Group-07C160?style=flat-square&logo=wechat&logoColor=white" alt="WeChat"/></a>
  <a href="https://github.com/BIT-DataLab/Edit-Banana/stargazers"><img src="https://img.shields.io/github/stars/BIT-DataLab/Edit-Banana?style=flat-square&logo=github" alt="GitHub stars"/></a>
  <a href="https://atomgit.com/BIT-DataLab/Edit-Banana/stargazers"><img src="https://atomgit.com/BIT-DataLab/Edit-Banana/star/badge.svg" alt="AtomGit Star"/></a>
</p>

---

<h3 align="center">Try It Now!</h3>

<p align="center">
  <a href="https://www.editbanana.net/">
    <img src="https://img.shields.io/badge/🚀%20Try%20Online%20Demo-www.editbanana.net-FF6B6B?style=for-the-badge&logoColor=white" alt="Try Online Demo"/>
  </a>
</p>

<p align="center">
👆 <b>Click above or visit https://www.editbanana.net/ to try Edit Banana online!</b> Upload an image to get <b>editable DrawIO (XML)</b> in seconds.
</p>

> [!WARNING]
> **Please note**: Our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using our web platform.

---

## 💬 Join WeChat Group

Join our WeChat group to discuss and exchange ideas. Scan the QR code below to join:

<p align="center">
  <img src="/static/wechatGroup.jpg" width="70%" alt="WeChat Group QR Code"/>
  <br/>
  <em>Scan to join the Edit Banana community</em>
</p>

> [!TIP]
> If the QR code has expired, please submit an [Issue](https://github.com/BIT-DataLab/Edit-Banana/issues) to request an updated one.

## 📮 Contact Us

For academic cooperation, technical integration, commercial licensing, project customization, and other business inquiries, please contact us via email:

> **E-mail: [email protected]**

---

## 📑 Table of Contents

- [📸 Effect Demonstration](#-effect-demonstration)
- [🚀 Key Features](#-key-features)
- [🛠️ Architecture Pipeline](#️-architecture-pipeline)
- [📂 Project Structure](#-project-structure)
- [📦 Installation & Setup](#-installation--setup)
- [🔤 Usage](#-usage)
- [⚙️ Configuration](#️-configuration)
- [📌 Development Roadmap](#-development-roadmap)
- [💬 Join WeChat Group](#-join-wechat-group)
- [🤝 Contribution Guidelines](#-contribution-guidelines)
- [🤩 Contributors](#-contributors)
- [📄 License](#-license)
- [🌟 Star History](#-star-history)

---

## 📸 Effect Demonstration

### High-Definition Input-Output Comparison (4 Typical Scenarios)

To demonstrate the high-fidelity conversion effect, we provide one-to-one comparisons between the original static inputs and their editable reconstructions for 4 typical scenarios. All elements can be individually dragged, styled, and modified.

#### Scenario 1: Figures to DrawIO

| 🔒 Original Static Diagram (Input · Non-editable) | 🔓 DrawIO Reconstruction Result (Output · Fully Editable) |
|:---:|:---:|
| <br><b>Example 1: Basic Flowchart</b><br><br><img src="/static/demo/original_1.jpg" width="450" alt="Original Diagram 1" style="border: 1px solid #eee; border-radius: 8px;"/> | <br><b>✨ Editable Flowchart</b><br><br><img src="/static/demo/recon_1.png" width="450" alt="Reconstruction Result 1" style="border: 1px solid #eee; border-radius: 8px;"/> |
| <br><b>Example 2: Multi-level Architecture</b><br><br><img src="/static/demo/original_2.png" width="450" alt="Original Diagram 2" style="border: 1px solid #eee; border-radius: 8px;"/> | <br><b>✨ Editable Architecture</b><br><br><img src="/static/demo/recon_2.png" width="450" alt="Reconstruction Result 2" style="border: 1px solid #eee; border-radius: 8px;"/> |
| <br><b>Example 3: Technical Schematic</b><br><br><img src="/static/demo/original_3.jpg" width="450" alt="Original Diagram 3" style="border: 1px solid #eee; border-radius: 8px;"/> | <br><b>✨ Editable Schematic</b><br><br><img src="/static/demo/recon_3.png" width="450" alt="Reconstruction Result 3" style="border: 1px solid #eee; border-radius: 8px;"/> |
| <br><b>Example 4: Scientific Formula</b><br><br><img src="/static/demo/original_4.jpg" width="450" alt="Original Diagram 4" style="border: 1px solid #eee; border-radius: 8px;"/> | <br><b>✨ Editable Formula</b><br><br><img src="/static/demo/recon_4.png" width="450" alt="Reconstruction Result 4" style="border: 1px solid #eee; border-radius: 8px;"/> |

#### Scenario 2: Human in the Loop Modification

<div align="center">
  <br>
  <img src="static/demo/cut.gif" width="90%"/>
  <br><sub>✨ Manual repair</sub>
  <br><br>
  <img src="static/demo/save.gif" width="90%"/>
  <br><sub>✨ Save locally</sub>
</div>

> [!NOTE]
> **✨ Conversion Highlights:**
> 1. Preserves the layout logic, color matching, and element hierarchy of the original diagram.
> 2. 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness).
> 3. Accurate text recognition, supporting direct subsequent editing and format adjustment.
> 4. All elements are independently selectable, supporting native DrawIO template replacement and layout optimization.

## 🚀 Key Features

- **Advanced Segmentation**: Uses our fine-tuned **SAM 3 (Segment Anything Model 3)** to segment diagram elements.
- **Fixed Multi-Round VLM Scanning**: An extraction process guided by **Multimodal LLMs**.
- **Text Recognition**:
  - **Local OCR** for text localization; easy to install, runs offline.
  - **Pix2Text** for mathematical formula recognition and **LaTeX** conversion.
  - **Crop-Guided Strategy**: Extracts text/formula regions and sends high-res crops to the formula engine.
- **User System**:
  - **Registration**: New users receive **10 free credits**.
  - **Credit System**: Pay-per-use model prevents resource abuse.
- **Multi-User Concurrency**: Built-in support for concurrent user sessions, using a **Global Lock** mechanism for thread-safe GPU access and an **LRU Cache** (Least Recently Used) to persist image embeddings across requests.

---

## 🛠️ Architecture Pipeline

1. **Input**: Image (PNG/JPG/BMP/TIFF/WebP).
2. **Segmentation (SAM3)**: Using our fine-tuned SAM3 mask decoder.
3. **Text Extraction (Parallel)**:
   * Local OCR (Tesseract) detects text bounding boxes.
   * High-res crops of text/formula regions are sent to Pix2Text for LaTeX conversion.
4. **DrawIO XML Generation**: Merging spatial data from SAM3 and text OCR results.

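
To make step 4 of the pipeline concrete, here is a minimal sketch of how shape boxes and OCR text boxes could be merged into a DrawIO `mxGraphModel`. It is not the project's actual implementation: the `Box` class, the `build_drawio` function, the "text center inside shape" matching rule, and the style strings are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in image pixels (hypothetical helper)."""
    x: float
    y: float
    w: float
    h: float

    def center(self) -> tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def build_drawio(shapes: list[Box], texts: list[tuple[Box, str]]) -> str:
    """Merge shape boxes and OCR text boxes into an mxGraphModel XML string.

    Assumed rule: a text belongs to the first shape that contains the
    center of its box; texts matching no shape become standalone labels.
    """
    labels: dict[int, list[str]] = {}
    loose: list[tuple[Box, str]] = []
    for tbox, text in texts:
        cx, cy = tbox.center()
        owner = next((i for i, s in enumerate(shapes) if s.contains(cx, cy)), None)
        if owner is None:
            loose.append((tbox, text))
        else:
            labels.setdefault(owner, []).append(text)

    model = ET.Element("mxGraphModel")
    root = ET.SubElement(model, "root")
    # DrawIO expects two structural cells: the root (0) and the default layer (1).
    ET.SubElement(root, "mxCell", id="0")
    ET.SubElement(root, "mxCell", id="1", parent="0")

    next_id = 2

    def add_vertex(box: Box, value: str, style: str) -> None:
        nonlocal next_id
        cell = ET.SubElement(root, "mxCell", id=str(next_id), value=value,
                             style=style, vertex="1", parent="1")
        # "as" is a Python keyword, so the attributes are passed as a dict.
        ET.SubElement(cell, "mxGeometry", {
            "x": str(box.x), "y": str(box.y),
            "width": str(box.w), "height": str(box.h),
            "as": "geometry",
        })
        next_id += 1

    for i, shape in enumerate(shapes):
        add_vertex(shape, " ".join(labels.get(i, [])), "rounded=0;whiteSpace=wrap;html=1;")
    for tbox, text in loose:
        add_vertex(tbox, text, "text;html=1;")

    return ET.tostring(model, encoding="unicode")


if __name__ == "__main__":
    print(build_drawio(
        [Box(0, 0, 100, 50)],
        [(Box(10, 10, 30, 10), "Start"), (Box(200, 200, 40, 10), "note")],
    ))
```

Because every shape and every unmatched text becomes its own `mxCell`, each element stays independently selectable when the result is opened in DrawIO.
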
---

## 📂 Project Structure

<details>
<summary><b>Click to expand project structure</b></summary>

```text
Edit-Banana/
├── config/                  # Configuration files (copy config.yaml.example → config.yaml)
├── flowchart_text/          # OCR & Text Extraction Module (standalone entry)
│   ├── src/
│   └── main.py              # OCR-only entry point
├── input/                   # [Manual] Input images directory
├── models/                  # [Manual] Model weights (SAM3) and optional BPE vocab
├── output/                  # [Manual] Results directory
├── sam3/                    # SAM3 library (see Installation: install from facebookresearch/sam3)
├── sam3_service/            # SAM3 HTTP service (optional, for multi-process deployment)
├── scripts/                 # Setup and utility scripts
│   ├── setup_sam3.sh        # Install SAM3 lib and copy BPE to models/
│   ├── setup_rmbg.py        # Download RMBG model from ModelScope
│   └── merge_xml.py         # XML merge utilities
├── main.py                  # CLI entry (modular pipeline)
├── server_pa.py             # FastAPI backend server
└── requirements.txt         # Python dependencies
```

</details>

---

## 📦 Installation & Setup

Follow these core phases to set up the project locally.

### Phase 1: Environment & Base Setup

Configure your base environment and directory structure.

#### 1. Prerequisites & Environment

- **Python 3.10+** & CUDA-capable GPU (highly recommended)
- Install PyTorch with CUDA support (e.g., for CUDA 11.8):

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```

#### 2. Clone Repository & Init Directories

```bash
git clone https://github.com/BIT-DataLab/Edit-Banana.git
cd Edit-Banana
mkdir -p input output sam3_output
```

### Phase 2: Models & Core Dependencies

Next, install the required packages and download the necessary model weights (place them in `models/` and do not commit them).

#### 1. Base Dependencies

```bash
pip install -r requirements.txt
```

#### 2. SAM3 & Model Assets

- SAM3 Library & BPE: Run `bash scripts/setup_sam3.sh` to install the lib and copy the BPE vocab to `models/`. Verify with:

```bash
python -c "from sam3.model_builder import build_sam3_image_model; print('OK')"
```

- SAM3 Weights: Download `sam3.pt` from [ModelScope](https://modelscope.cn/models/facebook/sam3) or [Hugging Face](https://huggingface.co/facebook/sam3) and place it under `models/sam3_ms`.
- Text Local OCR (Tesseract):

```bash
sudo apt install tesseract-ocr tesseract-ocr-chi-sim
```

<details>
<summary><b>🧩 Optional Capabilities (OCR Engine, Formula, RMBG) - Click to expand</b></summary>

- PaddleOCR (alternative; better for mixed text): Use `paddlepaddle==3.2.2` (avoids a bug in 3.3.0).

```bash
pip install paddlepaddle==3.2.2 paddleocr
```

- Formula (Pix2Text):

```bash
pip install pix2text onnxruntime-gpu
```

- Background Removal (RMBG): `pip install onnxruntime modelscope`, then run `python scripts/setup_rmbg.py`.

</details>

### Phase 3: Configuration & Troubleshooting

#### 1. Final Configuration

Copy the example config and adjust the asset paths:

```bash
cp config/config.yaml.example config/config.yaml
```

Edit `config.yaml` so that `sam3.checkpoint_path` and `sam3.bpe_path` match your `models/` locations.

<details>
<summary><b>🛠️ Before First Run Checklist & Troubleshooting - Click to expand</b></summary>

**Checklist**:

- [ ] Config files copied and model paths set in `config.yaml`
- [ ] SAM3 weights (`sam3.pt`) and BPE vocab placed under `models/`
- [ ] SAM3 library installed via `scripts/setup_sam3.sh`
- [ ] Tesseract or PaddleOCR installed

**Common Issues**:

- "no kernel image is available...": GPU arch mismatch. Upgrade PyTorch or set `sam3.device: "cpu"`.
- "Model file not found at ...rmbg/...": RMBG is optional. Enable it by downloading the model via `scripts/setup_rmbg.py`.
- "PaddleOCR inference failed...": Use `paddlepaddle==3.2.2` or fall back to Tesseract.

</details>

---

## 🔤 Usage

### Command Line Interface (CLI)

Supports image files (PNG, JPG, BMP, TIFF, WebP). To process a single image:

```bash
python main.py -i input/test_diagram.png
```

The output XML will be saved in the `output/` directory. For batch processing, put images in `input/` and run `python main.py` without `-i`.

### Run and test locally

1. **One-time setup**

   ```bash
   git clone https://github.com/BIT-DataLab/Edit-Banana.git && cd Edit-Banana
   python3 -m venv .venv && source .venv/bin/activate   # Linux/macOS; Windows: .venv\Scripts\activate
   pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118   # or CPU build
   pip install -r requirements.txt
   sudo apt install tesseract-ocr tesseract-ocr-chi-sim   # OCR (or equivalent on your OS)
   ```

   Install the SAM3 library and download the model weights and BPE vocab (see Installation & Setup). Then:

   ```bash
   mkdir -p input output
   cp config/config.yaml.example config/config.yaml
   # Edit config/config.yaml: set sam3.checkpoint_path and sam3.bpe_path to your models/ paths
   ```

2. **Test with CLI**

   ```bash
   # Put a diagram image in input/, e.g. input/test.png
   python main.py -i input/test.png
   # Output appears under output/<image_stem>/ (DrawIO XML and intermediates)
   ```

3. **Optional: test the web API**

   ```bash
   python server_pa.py
   # In another terminal:
   curl -X POST http://localhost:8000/convert -F "file=@input/test.png"
   # Or open http://localhost:8000/docs and use the /convert endpoint with a file upload
   ```

---

## ⚙️ Configuration

Customize the pipeline behavior in `config/config.yaml`:

- **sam3**: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, and the maximum number of iteration loops.
- **paths**: Set input/output directories.
- **dominant_color**: Fine-tune color extraction sensitivity.

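
Since a wrong `sam3.checkpoint_path` or `sam3.bpe_path` is an easy mistake to make before the first run, a small pre-flight check can catch it early. The sketch below is illustrative only: the `missing_sam3_assets` function is hypothetical and not part of the repository, and it takes an already parsed config as a plain dictionary. In practice that dictionary would probably come from a YAML loader such as PyYAML's `yaml.safe_load`, which is assumed here and not shown.

```python
from pathlib import Path

# Keys named in this README; any other layout of config.yaml is an assumption.
REQUIRED_SAM3_KEYS = ("checkpoint_path", "bpe_path")


def missing_sam3_assets(config: dict, base_dir: str = ".") -> list[str]:
    """Return human-readable problems with the sam3 asset paths.

    `config` is the parsed content of config/config.yaml. Relative paths
    are resolved against `base_dir`; absolute paths are used as given.
    An empty list means both files were found.
    """
    problems = []
    sam3 = config.get("sam3") or {}
    for key in REQUIRED_SAM3_KEYS:
        value = sam3.get(key)
        if not value:
            problems.append(f"sam3.{key} is not set")
        elif not (Path(base_dir) / value).is_file():
            problems.append(f"sam3.{key} points to a missing file: {value}")
    return problems


if __name__ == "__main__":
    # Example with placeholder paths; replace with your parsed config.
    example = {"sam3": {"checkpoint_path": "models/sam3.pt", "bpe_path": ""}}
    for problem in missing_sam3_assets(example):
        print(problem)
```

Returning a list of messages lets a caller print every problem at once before starting the pipeline, without stopping at the first missing file.
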
---

## 📌 Development Roadmap

| Feature Module | Status | Description |
|---|---|---|
| Core Conversion Pipeline | ✅ Completed | Full pipeline of segmentation, reconstruction and OCR |
| Intelligent Arrow Connection | ⚠️ In Development | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | 📝 Planned | Support custom template import |
| Batch Export Optimization | 📝 Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | 📝 Planned | Support local VLM deployment, independent of APIs |

---

## 🤝 Contribution Guidelines

Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):

1. Fork this repository
2. Create a feature branch (`git checkout -b feature/xxx`)
3. Commit your changes (`git commit -m 'feat: add xxx'`)
4. Push to the branch (`git push origin feature/xxx`)
5. Open a Pull Request

Bug Reports: [Issues](https://github.com/BIT-DataLab/Edit-Banana/issues)

Feature Suggestions: [Discussions](https://github.com/BIT-DataLab/Edit-Banana/discussions)

---

## 📄 License

This project is open-source under the [Apache License 2.0](LICENSE), allowing commercial use and secondary development (with copyright notice retained).

---

## 🌟 Star History

🌟 If this project helps you, please star it to show your support!

[![Star History Chart](https://api.star-history.com/svg?repos=bit-datalab/edit-banana&type=date&legend=top-left)](https://www.star-history.com/#bit-datalab/edit-banana&type=date&legend=top-left)
