LinearRAG

About LinearRAG

LinearRAG is a Graph Retrieval-Augmented Generation (GraphRAG) system designed for efficient processing of large-scale corpora. Accepted at ICLR 2026, it introduces a relation-free graph construction method that consumes no Large Language Model (LLM) tokens during graph building, which reduces cost and speeds up indexing. Unlike traditional approaches that rely on explicit relation extraction, LinearRAG uses lightweight entity recognition and semantic linking to construct its knowledge graph. This architecture supports multi-hop reasoning within a single retrieval pass while preserving context across complex queries. The system has linear time and space complexity, so it scales to large datasets. It is compatible with standard embedding models such as all-mpnet-base-v2 and supports pre-trained NLP models for both general and specialized domains such as medical literature.

LinearRAG: Linear Graph Retrieval-Augmented Generation on Large-scale Corpora

A relation-free graph construction method for efficient GraphRAG. It eliminates LLM token costs during graph construction, which makes building the graph faster and cheaper.

arXiv:2506.08938 HuggingFace GitHub


🎉 News

  • [2026-04-07] Our ProbeRAG for RAG faithfulness is accepted by ACL'26.
  • [2026-04-07] Our BAPO for reliable agentic search is accepted by ACL'26.
  • [2026-04-07] Our LegalGraphRAG for reliable legal reasoning is accepted by ACL'26.
  • [2026-04-07] Our LogicPoison, a GraphRAG attack model, is accepted by ACL'26.
  • [2026-01-26] Our LinearRAG for efficient GraphRAG is accepted by ICLR'26.
  • [2026-01-26] Our GraphRAG Benchmark is accepted by ICLR'26.
  • [2025-10-27] We release LinearRAG, a relation-free graph construction method for efficient GraphRAG.
  • [2025-06-06] We release the GraphRAG Benchmark for evaluating GraphRAG models.
  • [2025-01-21] We release the GraphRAG survey.

🚀 Highlights

  • Context-Preserving: Relation-free graph construction, relying on lightweight entity recognition and semantic linking to achieve comprehensive contextual comprehension.
  • Complex Reasoning: Enables deep retrieval via semantic bridging, achieving multi-hop reasoning in a single retrieval pass without requiring explicit relational graphs.
  • High Scalability: Zero LLM token consumption, faster processing speed, and linear time/space complexity.
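
To make the highlights above concrete, here is a minimal sketch of the general idea of a relation-free graph: passages are linked only through the entities they share, and retrieval hops from passage to passage across those shared entities. It is not the LinearRAG implementation. The helper names (`extract_entities`, `build_graph`, `retrieve`) are hypothetical, a capitalized-word regex stands in for a real NER model, and exact string matching stands in for semantic linking.

```python
import re
from collections import defaultdict, deque


def extract_entities(text):
    # Hypothetical stand-in for an NER model: every capitalized word is an "entity".
    return {w.lower() for w in re.findall(r"\b[A-Z][a-z]+\b", text)}


def build_graph(passages):
    # One pass over the corpus; no relation extraction and no LLM calls.
    entity_to_passages = defaultdict(set)
    passage_to_entities = []
    for idx, text in enumerate(passages):
        entities = extract_entities(text)
        passage_to_entities.append(entities)
        for entity in entities:
            entity_to_passages[entity].add(idx)
    return entity_to_passages, passage_to_entities


def retrieve(query, graph, max_hops=2):
    # Breadth-first expansion: query entity -> passage -> shared entity -> passage.
    entity_to_passages, passage_to_entities = graph
    hops = {}  # passage index -> hop at which it was first reached
    seen_entities = set()
    queue = deque((e, 1) for e in extract_entities(query) if e in entity_to_passages)
    while queue:
        entity, hop = queue.popleft()
        if entity in seen_entities:
            continue
        seen_entities.add(entity)
        for idx in entity_to_passages[entity]:
            if idx not in hops:
                hops[idx] = hop
                if hop < max_hops:
                    queue.extend((e, hop + 1) for e in passage_to_entities[idx])
    return hops


passages = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Paris hosts the Louvre.",
]
graph = build_graph(passages)
result = retrieve("Which country was Curie born in?", graph)
print(result)  # passages 0 and 1 are reached; passage 2 is not
```

Graph construction touches each entity mention once, so its cost grows linearly with corpus size in this toy version. The second passage is reached only through the shared entity "Warsaw", which illustrates multi-hop retrieval without any explicit relation edges.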

Framework Overview


🛠️ Usage

1️⃣ Install Dependencies

Step 1: Install Python packages

pip install -r requirements.txt
(Python 3.9 is recommended.)

Step 2: Download the spaCy language model

python -m spacy download en_core_web_trf

Note: For the medical dataset, you need to install the scientific/biomedical spaCy model:

pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.3/en_core_sci_scibert-0.5.3.tar.gz
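
A quick way to confirm that the models are installed before a long run is sketched below. It relies on the fact that spaCy models are installed as regular Python packages. The helper name `is_model_installed` is hypothetical and not part of this repository.

```python
import importlib.util


def is_model_installed(package_name):
    # spaCy models are installed as ordinary Python packages, so an import
    # lookup is enough to tell whether the download step has been run.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    for name in ("en_core_web_trf", "en_core_sci_scibert"):
        status = "installed" if is_model_installed(name) else "missing"
        print(f"{name}: {status}")
```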

Step 3: Set up your OpenAI API key

export OPENAI_API_KEY="your-api-key-here"
export OPENAI_BASE_URL="your-base-url-here"
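
If you want to fail fast when a variable is missing, a small check such as the following can be run first. This is only a sketch: `missing_vars` is a hypothetical helper, and it assumes the pipeline reads both variables from the environment as the exports above suggest.

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL")


def missing_vars(env=None):
    # Return the required variables that are unset or empty.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("OpenAI environment variables are set.")
```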

Step 4: Download Datasets

Download the datasets from HuggingFace and place them in the dataset/ folder:

git clone https://huggingface.co/datasets/Zly0523/linear-rag
mkdir -p dataset
cp -r linear-rag/* dataset/

Step 5: Prepare Embedding Model

Make sure the embedding model is available at:

model/all-mpnet-base-v2/
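
One possible way to populate that directory is sketched below. It assumes the `sentence-transformers` package is installed and that the intended checkpoint is the public `sentence-transformers/all-mpnet-base-v2` model; neither assumption is confirmed by this README. Both helper names are hypothetical.

```python
from pathlib import Path


def model_dir_ready(path="model/all-mpnet-base-v2"):
    # True when the directory exists and contains at least one entry.
    model_dir = Path(path)
    return model_dir.is_dir() and any(model_dir.iterdir())


def download_model(path="model/all-mpnet-base-v2"):
    # Assumption: sentence-transformers is installed and the checkpoint is the
    # public "sentence-transformers/all-mpnet-base-v2" model on the Hugging Face Hub.
    from sentence_transformers import SentenceTransformer

    SentenceTransformer("sentence-transformers/all-mpnet-base-v2").save(path)
```

Calling `download_model()` once, only when `model_dir_ready()` returns `False`, avoids repeated downloads.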

2️⃣ Quick Start Example

SPACY_MODEL="en_core_web_trf"
EMBEDDING_MODEL="model/all-mpnet-base-v2"
DATASET_NAME="2wikimultihop"
LLM_MODEL="gpt-4o-mini"
MAX_WORKERS=16

python run.py \
    --spacy_model ${SPACY_MODEL} \
    --embedding_model ${EMBEDDING_MODEL} \
    --dataset_name ${DATASET_NAME} \
    --llm_model ${LLM_MODEL} \
    --max_workers ${MAX_WORKERS}
    # Optional: end the line above with a backslash and append --use_vectorized_retrieval to use vectorized, matrix-based retrieval with GPU acceleration when a strong GPU is available; otherwise BFS iteration is used.
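
The same invocation can be driven from Python, which is convenient for scripting several runs. The sketch below uses only the flags shown in the shell example; `build_command` and `run_linearrag` are hypothetical wrapper names, and the script is assumed to be run from the repository root where `run.py` lives.

```python
import subprocess
import sys


def build_command(
    dataset_name="2wikimultihop",
    spacy_model="en_core_web_trf",
    embedding_model="model/all-mpnet-base-v2",
    llm_model="gpt-4o-mini",
    max_workers=16,
    use_vectorized_retrieval=False,
):
    # Mirrors the flags of the shell example; defaults are the values shown there.
    cmd = [
        sys.executable, "run.py",
        "--spacy_model", spacy_model,
        "--embedding_model", embedding_model,
        "--dataset_name", dataset_name,
        "--llm_model", llm_model,
        "--max_workers", str(max_workers),
    ]
    if use_vectorized_retrieval:
        cmd.append("--use_vectorized_retrieval")
    return cmd


def run_linearrag(**kwargs):
    # check=True raises CalledProcessError if run.py exits with a non-zero status.
    subprocess.run(build_command(**kwargs), check=True)
```

For example, `run_linearrag(use_vectorized_retrieval=True)` would launch the quick-start configuration with the optional vectorized retrieval flag.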

🎯 Performance

[Figure: Main results of end-to-end performance.]

[Figure: Efficiency and performance comparison.]

📬 Citation

If you find this work helpful, please consider citing us:

@article{zhuang2025linearrag,
  title={LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora},
  author={Zhuang, Luyao and Chen, Shengyuan and Xiao, Yilin and Zhou, Huachi and Zhang, Yujing and Chen, Hao and Zhang, Qinggang and Huang, Xiao},
  journal={arXiv preprint arXiv:2510.10114},
  year={2025}
}