rag-interview-system

Open source (MIT) · 52 stars · 4 forks · 3 issues · 0 watchers · last commit 1 week ago

About rag-interview-system

A collection of RAG interview questions and answers (286 questions across 18 RAG architectures and 7 production failure modes), plus system design scenarios, architecture patterns, and production-ready concepts.

Platforms

Web Self-hosted iOS

RAG Interview Questions & Answers (2026) — Retrieval-Augmented Generation Interview Prep

286 RAG (Retrieval-Augmented Generation) interview questions and answers for AI engineers, ML engineers, and GenAI/LLM developers. Covers all 18 RAG architectures, system design scenarios, vector databases, embeddings, chunking, reranking, evaluation, and the production failure modes that come up in real LLM engineering interviews.

Star this repo if it helps your interview prep — it keeps the project growing.

What is RAG?

Retrieval-Augmented Generation (RAG) is an LLM architecture that grounds model responses in external knowledge. At indexing time, documents are chunked, embedded, and stored in a vector database. At query time, the most relevant chunks are retrieved via vector search and passed to the LLM as context for generation. RAG reduces hallucination (without eliminating it; see Hallucination Despite Context under Failure Modes), keeps answers current without retraining, and is a widely used production pattern for enterprise LLM applications, which is why it features so heavily in AI engineer and GenAI system design interviews.
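The chunk, embed, store, retrieve, generate flow described above can be sketched in a few lines. This is a minimal illustration, not code from the repo: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and all function names (`chunk`, `embed`, `build_index`, `retrieve`, `build_prompt`) are hypothetical. The final LLM call is omitted; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 12) -> list[str]:
    """Split text into fixed-size word windows (a deliberately naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: list[str]) -> list[tuple[str, Counter]]:
    """Chunk every document and store (chunk text, vector) pairs in memory."""
    return [(c, embed(c)) for d in docs for c in chunk(d)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble the grounded prompt that would be passed to the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


docs = [
    "Reranking uses a cross-encoder to rescore retrieved chunks for precision.",
    "Chunking splits documents into smaller passages before they are embedded.",
]
index = build_index(docs)
question = "How does chunking split documents?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
```

In a real system each toy piece would be swapped for a production component (a learned embedding model, a vector store, an LLM client), but the data flow stays the same.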

Who is this for?

  • AI / ML engineers preparing for RAG, LLM, or GenAI interview rounds
  • Software engineers moving into LLM application development
  • Data scientists facing RAG system design interviews
  • Hiring managers and interviewers building question sets for GenAI roles

📚 Sections

Overview & Concepts · RAG Architecture Interview Questions · Failure Modes & Production Issues · Coming Soon

📖 Overview & Concepts

| # | Topic | Purpose |
|---|-------|---------|
| 00a | Roadmap | RAG maturity model, skill progression, and interview prep pathway |
| 00b | RAG Taxonomy | Classification framework for all 18 architectures |
| 00c | Learning Path | Structured curriculum and study plans |
| 00d | System Design Principles | Production-grade architecture patterns |
| 01a | Embeddings | Embedding models, similarity metrics, and fine-tuning |
| 01b | Chunking Strategies | Document splitting and chunk optimization |
| 01c | Vector Databases | Storage, indexing, and hybrid search |
| 01d | Retrieval Strategies | Dense, sparse, hybrid, and advanced retrieval |
| 01e | Reranking | Cross-encoders and precision filtering |
| 01f | Evaluation Metrics | RAGAS, NDCG, and production monitoring |
| 01g | Prompt Injection Risks | Security and defense strategies |
| 01h | Fine-Tuning for RAG | When and how to fine-tune embeddings and rerankers |
| 01i | Observability & Evaluation Ops | LLM-as-judge, online metrics, tracing, drift alerts |
| 01j | Multi-Tenancy & Access Control | Tenant isolation, document ACLs, leakage surfaces |
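As an illustration of the Evaluation Metrics entry (01f), here is a small sketch of NDCG@k for a single ranked result list. It makes two assumptions that are not taken from the repo: it uses linear gain (the relevance grade itself, not `2^rel - 1`), and it computes the ideal ordering from the retrieved list only, so it presumes every relevant document appears somewhere in that list. The function names are hypothetical.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain: each grade is discounted by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: list[float], k: int) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same grades."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance grades of the retrieved chunks, in the order the retriever returned them.
perfect = ndcg_at_k([3, 2, 1, 0], k=4)   # already ideally ordered
late_hit = ndcg_at_k([0, 0, 1], k=3)     # the only relevant chunk is ranked third
```

A perfectly ordered list scores 1.0. In the second case the single relevant chunk is discounted by `log2(4) = 2`, giving 0.5, which shows how the metric penalizes relevant results that surface late.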

❓ RAG Architecture Interview Questions (18 Types)

| # | Topic | Questions |
|---|-------|-----------|
| 02.01 | Naive / Basic RAG | 12 |
| 02.02 | Advanced RAG | 12 |
| 02.03 | Modular RAG | 12 |
| 02.04 | Agentic RAG | 12 |
| 02.05 | Graph RAG | 12 |
| 02.06 | Corrective RAG (CRAG) | 12 |
| 02.07 | Self-RAG | 12 |
| 02.08 | Speculative RAG | 12 |
| 02.09 | Multi-modal RAG | 12 |
| 02.10 | Long-context RAG | 12 |
| 02.11 | Adaptive RAG | 12 |
| 02.12 | Structured / SQL RAG | 12 |
| 02.13 | RAPTOR | 12 |
| 02.14 | Contextual RAG | 12 |
| 02.15 | LightRAG | 12 |
| 02.16 | RAFT | 12 |
| 02.17 | Cache-Augmented Generation (CAG) | 12 |
| 02.18 | RAG-Fusion | 12 |

RAG Architectures Total: 216 questions

⚠️ Failure Modes & Production Issues

| # | Topic | Questions |
|---|-------|-----------|
| 03.01 | Hallucination Despite Context | 10 |
| 03.02 | Retrieval Failure | 10 |
| 03.03 | Embedding Mismatch | 10 |
| 03.04 | Stale Index Problem | 10 |
| 03.05 | Context Window Overflow | 10 |
| 03.06 | Reranker Failure | 10 |
| 03.07 | Conversational Context Drift | 10 |

Failure Modes Total: 70 questions

Grand Total: 286 questions

Difficulty distribution: ~30 Basic, ~105 Intermediate, ~151 Advanced

All cited papers with arXiv/DOI links: REFERENCES.md

🔄 Coming Soon

Each planned section has a stub README describing what it will contain and how to contribute.

| # | Section | Status |
|---|---------|--------|
| 04 | Patterns | Planned |
| 05 | Graphs | Planned |
| 06 | Labs | Planned |
| 07 | Simulator | Planned |
| 08 | Evaluation | Planned |
| 09 | Tools | Planned |
| 10 | Decision System | Planned |

🗺️ RAG Architecture Types Explained (18 Patterns + 7 Failure Modes)

RAG Architectures (18 types):

Naive RAG
  └── Chunk → Embed → Store → Retrieve → Generate

Advanced RAG
  └── Query rewriting + Hybrid search + Re-ranking

Modular RAG
  └── Plug-and-play pipeline components

Agentic RAG
  └── LLM decides when/how to retrieve (ReAct, FLARE)

Graph RAG
  └── Knowledge graph for entity-aware retrieval

Corrective RAG (CRAG)
  └── Evaluates retrieval quality, falls back to web search

Self-RAG
  └── Model trained to reflect, retrieve, and critique itself

Speculative RAG
  └── Small model drafts → Large model selects best

Multi-modal RAG
  └── Retrieve across text, images, tables, audio

Long-context RAG
  └── Stuff entire docs into large context windows

Adaptive RAG
  └── Query classifier routes to no-retrieval / single-hop / multi-hop

Structured / SQL RAG
  └── Text-to-SQL generation for relational database retrieval

RAPTOR  [NEW]
  └── Recursively clusters and summarizes chunks into a multi-level tree

Contextual RAG  [NEW]
  └── LLM-generated context prefix prepended to each chunk before embedding

LightRAG  [NEW]
  └── Entity-relationship graph + dual-level (local + global) retrieval

RAFT  [NEW]
  └── Fine-tunes the LLM generator on oracle + distractor documents

Cache-Augmented Generation (CAG)  [NEW]
  └── Preloads entire corpus into KV cache — no retrieval step at inference

RAG-Fusion  [NEW]
  └── N query reformulations → N parallel retrievals → RRF merge → generation
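The "RRF merge" step in the RAG-Fusion entry refers to reciprocal rank fusion, which scores each document by summing `1 / (k + rank)` over every ranked list it appears in. The sketch below is an illustration under assumptions, not the repo's code: the document IDs and the function name are hypothetical, and `k = 60` is the commonly used default constant, not a value taken from this README.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the higher it ranks in each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# One ranked list per query reformulation (hypothetical retrieval results).
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF uses only ranks, it can merge lists whose raw scores are not comparable, such as BM25 and dense-vector results. Here `doc_b` comes out first: it ranks first in two lists and second in the third.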

Production Failure Modes (7 critical issues):

Hallucination Despite Context
  └── LLM ignores retrieved docs, generates false claims

Retrieval Failure
  └── Relevant chunks never surface due to semantic gap

Embedding Mismatch
  └── Query-doc embeddings in different semantic spaces

Stale Index Problem
  └── Index contains outdated information, answers are wrong

Context Window Overflow
  └── Too many/large chunks exceed context, forcing truncation

Reranker Failure
  └── Cross-encoder mis-ranks results, buries correct answers

Conversational Context Drift  [NEW]
  └── Multi-turn history poisons the retrieval query via unresolved references
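One common mitigation for Context Window Overflow is to pack whole chunks into a fixed token budget in rank order, instead of concatenating everything and letting the tail be truncated. The sketch below illustrates that idea under stated assumptions: `count_tokens` is a crude whitespace word count standing in for a real tokenizer, and both function names are hypothetical, not taken from the repo.

```python
def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (not a real tokenizer)."""
    return len(text.split())


def pack_context(ranked_chunks: list[str], budget: int) -> list[str]:
    """Keep whole chunks, in rank order, whose combined size fits the budget."""
    packed: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            # Skip a chunk that does not fit; a smaller, lower-ranked one still might.
            continue
        packed.append(chunk)
        used += cost
    return packed


# Chunks ordered best-first by the retriever or reranker.
ranked = ["a b c d", "e f g h i j", "k l"]
context = pack_context(ranked, budget=6)
```

With a budget of 6, the 4-word top chunk is kept, the 6-word second chunk is skipped because it would overflow, and the 2-word third chunk fills the remaining space. Skipping rather than stopping at the first overflow is a design choice: it uses more of the budget, at the cost of admitting lower-ranked chunks.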

💡 How to Use

Four content types:

  1. Overview & Concepts (00_overview/, 01_concepts/) — Reference material, not Q&A

    • Read these first to build foundational understanding
    • Comparison tables, ASCII diagrams, code examples, and system design patterns
    • Use to answer conceptual questions and understand mechanisms deeply
  2. Interview Questions (02_interview_bank/) — 12 questions per architecture

    • Each section contains interview-style Q&A with detailed answers
    • Every section: original 10 questions + Q11 on cost optimization + Q12 on security
    • Questions are tagged with difficulty: [Basic] [Intermediate] [Advanced]
  3. Failure Modes (03_failure_modes/) — 10 questions per failure pattern

    • Seven critical production failure scenarios with diagnostic Q&A
    • Use for system design rounds and production-readiness discussions
  4. CHEATSHEET (cheatsheets/CHEATSHEET.md) — Quick reference

    • All 18 RAG types compared in one table
    • Use during phone screens or quick prep

Study path:

  • 1-week prep: Start with 00_overview/learning_path.md → pick a track → follow the schedule
  • Phone screen: cheatsheets/CHEATSHEET.md + Q1–Q5 from relevant architectures
  • System design round: 00_overview/system_design_principles.md + Q9–Q12 from all files + 03_failure_modes/ for production readiness
  • Deep prep: Read 01_concepts/ files + all 02_interview_bank/ Q&A

🏷️ Topics Covered

Embeddings · Chunking strategies · Vector databases (FAISS, Pinecone, Weaviate, pgvector) · Hybrid search (BM25 + dense) · Reranking & cross-encoders · RAG evaluation (RAGAS, NDCG) · Agentic RAG · Graph RAG · Self-RAG & Corrective RAG · Multi-modal RAG · Text-to-SQL · Prompt injection & RAG security · Hallucination mitigation · LLM observability · Multi-tenancy & access control


Contributing

This repo grows best with real-world signal. If you were asked a RAG question in an interview, open a PR — real questions are prioritized over synthetically generated ones.

See CONTRIBUTING.md for how to submit a question.



License

MIT


See Contributing to add your interview experience to the repo.