About ai-system-design-guide

AI system design guide for engineers building production AI systems and evals.


Published by

ombharatiya

🧠 AI System Design Guide

The Complete Interview & Production Reference

If this guide helps you, follow @ombharatiya on GitHub, X, and LinkedIn to get notified when new chapters, model refreshes, and interview prompts ship.

The living reference for production AI systems. Continuously updated. Interview-ready depth.

A practical, continuously updated guide to AI system design, RAG architectures, LLM engineering, agentic AI, MCP and A2A protocols, and AI engineering interview preparation. Covers production patterns, model selection, evaluation, and real-world case studies from staff-level interviews.

New here? Jump to the 116-question Interview Bank, the RAG Fundamentals chapter, or pick the right LLM for production.

📚 Quick Navigation

I want to...	Start here
Prepare for interviews	Question Bank → Answer Frameworks
Learn AI systems fast	LLM Internals → RAG Fundamentals
Build production RAG	Chunking → Vector DBs → Reranking → Production RAG
Advanced retrieval	Contextual Retrieval → ColBERT → Multi-modal RAG
Design multi-tenant AI	Isolation Patterns → Case Study
Build agents	Agent Fundamentals → MCP & A2A → LangGraph
Tool-use & computer agents	Landscape → OpenClaw → Safety
Autonomous coding agents	Claude Code → OpenCoder Landscape
Pick the right model (2026)	Model Taxonomy → Pricing
Evaluate AI in production	AI Evals Guide (Phoenix/Langfuse) → AI Evals Guide (LangWatch/Langfuse)
Find the best courses to learn AI	Recommended Courses & Learning Paths
Transition from my current role to AI	Role Transition Guide
Understand the 2026 AI job market	Job Market Trends - June 2026
Get a quick answer to a common question	FAQ (RAG, agents, models, eval, inference, memory, security)
Look up a term	Glossary (every term defined)

Pick a path

flowchart TD
    A[New visitor] --> B{Your goal}
    B -->|Interview prep| C[Question Bank]
    B -->|Build RAG| D[RAG Fundamentals]
    B -->|Build agents| E[Agent Fundamentals]
    B -->|Pick a model| F[Model Taxonomy]
    B -->|Evaluate AI| G[AI Evals Guide]
    C --> H[Answer Frameworks]
    D --> I[Chunking + Vector DBs]
    E --> J[MCP and Tool Use]
    F --> K[Pricing 2026]
    G --> L[Phoenix or LangWatch]

🎯 Why This Guide

Traditional books are outdated before they ship. This guide is a living document: it is updated when new models are released and when patterns evolve.

This Guide	Printed Books
June 2026 models (Claude Fable 5, Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, DeepSeek V4 Pro, Llama 4, Kimi K2.6, Qwen 3.6, Mistral Medium 3.5, Gemma 4)	Stuck on GPT-4
MCP 2.0, A2A v1.0, OpenClaw, Computer Use, Agentic RAG, ColBERT, latent reasoning, MoE serving	Does not exist
Real pricing with June 2026 verification dates	Already wrong
Staff-level interview Q&A (116 questions through June 2026) + Job Market Trends	Generic questions

Quick model picker (June 2026): Claude Fable 5 for the capability ceiling ($10/$50 per 1M), Claude Opus 4.8 for tool-use and long-horizon agentic coding, GPT-5.5 for general production, Gemini 3.1 Pro for multimodal, DeepSeek V4 Flash ($0.14/$0.28 per 1M) or V4 Pro ($0.435/$0.87) for cheap frontier-class output, Llama 4 for self-hosted. Full breakdown in Model Taxonomy.
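To compare the picks above on cost, the per-1M prices can be turned into a per-request figure with simple arithmetic. The sketch below assumes each quoted pair is (input, output) USD per 1M tokens, which the picker does not state explicitly. The model keys and the token counts in the usage note are hypothetical.

```python
# Prices copied from the model picker above.
# ASSUMPTION: each pair is (input, output) USD per 1M tokens; confirm in the Pricing chapter.
# The dictionary keys are illustrative labels, not official API model identifiers.
PRICES_PER_1M = {
    "claude-fable-5": (10.00, 50.00),
    "deepseek-v4-flash": (0.14, 0.28),
    "deepseek-v4-pro": (0.435, 0.87),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request for the given token counts."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Scale the per-request cost to a monthly bill at a steady request rate."""
    return request_cost(model, input_tokens, output_tokens) * requests_per_day * days
```

For a hypothetical request with 2,000 input tokens and 500 output tokens, `request_cost("claude-fable-5", 2000, 500)` comes to about $0.045, while the same request on `"deepseek-v4-flash"` comes to about $0.00042. Output-heavy workloads widen that gap, because output tokens carry the higher price in every pair.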

🎯 What This Guide Is (and Is Not)

This guide IS:

A staff-level reference for designing production AI systems (RAG, agents, MCP, eval pipelines, multi-tenant isolation).
An interview-prep companion with 116 real questions, answer frameworks with a worked mock transcript, and nine whiteboard exercises through June 2026.
A living document tracking new model releases, protocol changes, and emerging patterns as they ship.
Opinionated about tradeoffs: latency vs cost, accuracy vs faithfulness, single-agent vs multi-agent.
Free, MIT-licensed, and open to PRs from practitioners.

This guide is NOT:

A tutorial on Python, PyTorch, or basic ML fundamentals (start with a course; see COURSES.md).
A vendor-neutral hedge; it names specific models, prices, and frameworks because real systems require real choices.
A replacement for hands-on building; read it alongside a project, not instead of one.
A research paper digest; it cites papers when they change practice, not for completeness.

📖 Guide Structure

├── 00-interview-prep/           # Questions (116), frameworks, exercises, job-market trends (June 2026)
├── 01-foundations/              # Transformers, attention, embeddings
├── 02-model-landscape/          # Claude Fable 5, Claude Opus 4.8, GPT-5.5, Gemini 3.1, DeepSeek V4, Llama 4, Kimi K2.6, Qwen 3.6
├── 03-training-and-adaptation/  # Fine-tuning, LoRA, DPO, distillation
├── 04-inference-optimization/   # KV cache, PagedAttention, vLLM
├── 05-prompting-and-context/    # Prompt engineering, CoT, Extended Thinking, DSPy, prompt injection
├── 06-retrieval-systems/        # RAG, chunking, GraphRAG, Agentic RAG, ColBERT, Contextual Retrieval
├── 07-agentic-systems/          # MCP 2.0, A2A protocol, multi-agent, computer-use
├── 08-memory-and-state/         # L1-L3 memory tiers, Mem0, caching
├── 09-frameworks-and-tools/     # LangGraph, DSPy, LlamaIndex, Claude Code, OpenCoder
├── 10-document-processing/      # Vision-LLM OCR, multimodal parsing
├── 11-infrastructure-and-mlops/ # GPU clusters, LLMOps, cost management
├── 12-security-and-access/      # RBAC, ABAC, multi-tenant isolation
├── 13-reliability-and-safety/   # Guardrails, red-teaming
├── 14-evaluation-and-observability/ # RAGAS, LangSmith, drift detection
├── 15-ai-design-patterns/       # Pattern catalog, anti-patterns
├── 16-case-studies/             # Real-world architectures with diagrams
├── 17-tool-use-and-computer-agents/ # OpenClaw, Computer Use, tool agents, safety
├── GLOSSARY.md                  # Every term defined
│
├── ai_evals_comprehensive_study_guide.md      # 🔬 Deep-dive: AI Evals (Phoenix + Langfuse)
├── ai_evals_complete_guide_langwatch_langfuse.md  # 🔬 Deep-dive: AI Evals (LangWatch + Langfuse)
├── COURSES.md                   # 🎓 Recommended courses & learning paths
└── TRANSITION_GUIDE.md          # 🔄 Transition from Backend/QA/PM/EM to AI roles

Chapters by AI System Lifecycle Stage

mindmap
  root((AI System Design Guide))
    Foundations
      LLM Internals
      Model Landscape
      Training and Adaptation
    Build
      Prompting and Context
      Retrieval Systems
      Agentic Systems
      Tool Use and Computer Agents
    Operate
      Inference Optimization
      Memory and State
      Frameworks and Tools
      Infrastructure and MLOps
    Govern
      Security and Access
      Reliability and Safety
      Evaluation and Observability
    Apply
      Design Patterns
      Case Studies
      Interview Prep

🔥 Featured Case Studies

Real interview problems with complete solutions and diagrams:

Case Study	Problem	Key Patterns
Real-Time Search	5-minute data freshness at scale	Streaming + Hybrid Search
Coding Agent	Autonomous multi-file changes	Sandboxing + Self-Correction
Multi-Tenant SaaS	Coca-Cola and Pepsi on same infra	Defense-in-Depth Isolation
Customer Support	60% auto-resolution rate	Tiered Routing + Escalation
Document Intelligence	50K contracts/month extraction	Vision-LLM + Parallel Extractors
Recommendation Engine	Personalized explanations at 50M users	ML Ranking + LLM Explanations
Compliance Automation	FDA regulation pre-screening	Claim Extraction + Precedent DB
Voice Healthcare	Real-time clinical note generation	On-Prem ASR + HIPAA
Fraud Detection	100ms decision with explainability	ML + Rules Hybrid
Knowledge Management	2M docs with access control	Permission-Aware RAG
Computer-Use Agent	Expense-report automation across 3 legacy UIs	Firecracker VMs + Action Gate + IPI Defense
Multi-Tenant Fine-Tuning	280 tenants on shared base + per-tenant LoRA	LoRA Hot-Swap + Eval-as-PRD per Tenant
Eval-Gated CI/CD	Block PRs that regress AI quality	Golden Sets + LLM Judges + Statistical Correction
Customer Distillation	Cut $50K/mo frontier spend to $6K with 3-mo payback	Trace-Based Distillation + Canary Rollout
MCP Knowledge Agent	Cross-system answers from Snowflake/Confluence/Jira/Slack	MCP + OAuth Resource Server + Capability Gating

🔬 Bonus Deep-Dive Guides

Two companion guides (3,000+ lines each) cover AI evaluation end to end for engineers, PMs, and QA:

Guide	Platforms Covered	What's Inside
AI Evals: Comprehensive Study Guide	Arize Phoenix + Langfuse	LLM-as-a-Judge, RAG eval, multi-turn eval, production safety, statistical correction with `judgy`, 30-day learning path
AI Evals: LangWatch + Langfuse Guide	LangWatch + Langfuse	Same syllabus with LangWatch's 40+ built-in evaluators, side-by-side platform comparisons, platform choice guidance

Topics covered across both guides:

Tracing and observability setup (Phoenix, LangWatch, Langfuse)
Error analysis: open coding → axial coding → failure mode taxonomy
Building LLM judges with Train/Dev/Test split and ground truth calibration
Code-based evaluators (regex, JSON schema, format validators)
RAG-specific evals: faithfulness, context recall, answer relevance
Multi-step pipeline evaluation and multi-turn conversation eval
Production guardrails, safety monitoring, real-time drift detection
Statistical correction with the `judgy` library
Human annotation best practices and inter-rater reliability
Cost/latency optimization for eval pipelines at scale
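As an illustration of the code-based evaluators in the list above, here is a minimal sketch of a format validator and a regex check using only the Python standard library. The function names, the `{"passed", "reason"}` result shape, and the `[n]` citation convention are assumptions made for this example. They are not the API of Phoenix, LangWatch, or Langfuse.

```python
import json
import re


def eval_json_format(output: str, required_keys: dict) -> dict:
    """Check that a model output parses as a JSON object with the required keys and types."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return {"passed": False, "reason": f"invalid JSON: {exc.msg}"}
    if not isinstance(data, dict):
        return {"passed": False, "reason": "top-level value is not an object"}
    for key, expected_type in required_keys.items():
        if key not in data:
            return {"passed": False, "reason": f"missing key: {key}"}
        if not isinstance(data[key], expected_type):
            return {"passed": False, "reason": f"wrong type for key: {key}"}
    return {"passed": True, "reason": "ok"}


# Hypothetical convention: answers cite sources with bracketed numbers such as [1].
CITATION_PATTERN = re.compile(r"\[\d+\]")


def eval_has_citation(output: str) -> dict:
    """Check that the output contains at least one [n] citation marker."""
    found = CITATION_PATTERN.search(output) is not None
    return {"passed": found, "reason": "ok" if found else "no [n] citation marker"}
```

Checks like these are cheap and deterministic, so they can run on every trace before any LLM judge is involved. For example, `eval_json_format('{"answer": "x"}', {"answer": str, "sources": list})` fails with the reason `missing key: sources`.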

🎓 For Interview Prep

AI engineering and system design interviews ask questions like:

"Design a multi-tenant RAG system where competitors cannot see each other's data."

"Your agent takes 15 steps for a 3-step task. How do you debug it?"

This guide gives you concrete patterns, real tradeoffs, and production failure modes: the depth interviewers expect at senior levels.

➡️ Start with Interview Prep

❓ Frequently Asked Questions

What is AI system design?

AI system design is the discipline of architecting production-grade systems built around LLMs, retrieval, agents, and evaluation. It covers model selection, RAG pipelines, agent orchestration, memory, observability, and safety. See LLM Internals and AI Design Patterns to get oriented.

How do I prepare for an AI engineering interview?

Start with the Question Bank (116 questions through June 2026), then practice with Answer Frameworks and Whiteboard Exercises. Most senior interviews test RAG design, agent debugging, multi-tenant isolation, and cost/latency tradeoffs, all covered in the Case Studies.

What is RAG (Retrieval-Augmented Generation)?

RAG is a pattern where an LLM retrieves relevant context from an external knowledge source (vector DB, search index, graph) before generating an answer, reducing hallucinations and grounding responses in your data. The full pipeline is covered in RAG Fundamentals and scaled in Production RAG at Scale.
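The retrieve-then-generate flow can be sketched in a few lines. This toy version scores documents by bag-of-words cosine similarity instead of using embeddings and a vector DB, and `generate` is a placeholder callable standing in for a real LLM client. Both are simplifications for illustration, not the pipeline from the RAG Fundamentals chapter.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Return the k documents most similar to the query (toy stand-in for a vector search)."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(query_vec, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, contexts: list) -> str:
    """Ground the question in numbered context passages."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return f"Answer using only the context below.\n\nContext:\n{numbered}\n\nQuestion: {query}"


def answer(query: str, documents: list, generate: Callable[[str], str], k: int = 2) -> str:
    """Retrieve first, then generate from the grounded prompt."""
    return generate(build_prompt(query, retrieve(query, documents, k)))
```

Passing `generate=lambda prompt: prompt` makes it easy to inspect exactly what context the model would see, which is often the first step when debugging a poor answer.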

What are AI agents and how are they different from chatbots?

AI agents are LLM-driven systems that plan, call tools, and act over multiple steps to accomplish goals, whereas chatbots typically respond in a single turn. Agents introduce loops, memory, error recovery, and tool-use via protocols like MCP. Start with Agent Fundamentals.
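The loop that separates an agent from a single-turn chatbot can be sketched as follows. Here `decide` is a hypothetical callable standing in for the LLM's planning step, and the action dictionaries (`type`, `tool`, `args`, `answer`) are an assumed shape for this example, not a format defined by the guide or by MCP.

```python
from typing import Callable


def run_agent(goal: str, decide: Callable, tools: dict, max_steps: int = 5):
    """Plan, act, and observe until the policy finishes or the step budget runs out.

    Returns (answer, history); answer is None if the budget is exhausted.
    """
    history = []  # list of (action, observation) pairs, acting as short-term memory
    for _ in range(max_steps):
        action = decide(goal, history)
        if action["type"] == "finish":
            return action["answer"], history
        try:
            tool = tools.get(action["tool"])
            if tool is None:
                raise ValueError(f"unknown tool: {action['tool']}")
            observation = tool(**action["args"])
        except Exception as exc:
            # Error recovery: surface the failure as an observation so the policy can adapt.
            observation = f"error: {exc}"
        history.append((action, observation))
    return None, history


def scripted_decide(goal: str, history: list) -> dict:
    """Toy policy: call the add tool once, then report the result."""
    if not history:
        return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}
    return {"type": "finish", "answer": f"result is {history[-1][1]}"}
```

Calling `run_agent("add 2 and 3", scripted_decide, {"add": lambda a, b: a + b})` runs one tool step and then finishes. The `max_steps` budget is what keeps a confused policy from looping forever, and the recorded `history` is the first thing to inspect when an agent takes far more steps than the task needs.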

What is MCP (Model Context Protocol) and how does it compare to A2A?

MCP is an open protocol that lets LLMs discover and call external tools and data sources in a standardized way. A2A (Agent-to-Agent) is a complementary protocol for inter-agent communication. They solve different layers: MCP is the tool boundary, A2A is the agent boundary. See Tool Use and MCP.
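To make the tool boundary concrete, here is a minimal in-process sketch of the two tool-related requests in MCP, which is built on JSON-RPC 2.0: `tools/list` for discovery and `tools/call` for invocation. The message shapes follow the published MCP specification as understood at the time of writing, so check them against the spec revision you target. The `echo` tool and the `handle_request` dispatcher are hypothetical, and transport, initialization, and auth are left out.

```python
import json

# Hypothetical tool registry; a real server would register its tools through an MCP SDK.
TOOLS = {
    "echo": {
        "description": "Return the input text unchanged.",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        "handler": lambda arguments: arguments["text"],
    }
}


def _error(request_id, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}})


def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC request string and return the response string."""
    request = json.loads(raw)
    request_id, method = request.get("id"), request.get("method")
    params = request.get("params", {})
    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(request_id, -32602, "Unknown tool")  # JSON-RPC "invalid params"
        text = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(request_id, -32601, "Method not found")
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})
```

A client first sends `{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}` to learn what is available, then sends `tools/call` with a `name` and `arguments` that match the advertised `inputSchema`. A2A sits one layer up, where the peer on the other side is another agent, not a tool.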

Which LLM should I use in production: Claude, GPT, Gemini, or open-source?

It depends on latency budget, context length, cost per million tokens, tool-use quality, and data residency. The Model Taxonomy and Pricing chapters give a head-to-head for Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, DeepSeek V4, Llama 4, and others as of June 2026.

How do I evaluate an LLM or RAG system in production?

Combine offline evals (LLM-as-a-judge with ground-truth calibration), online metrics (faithfulness, context recall, answer relevance), and continuous tracing. The companion deep-dives AI Evals: Phoenix + Langfuse and AI Evals: LangWatch + Langfuse walk through this end-to-end.
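One way to make ground-truth calibration concrete is to measure the judge's true-positive and true-negative rates on a human-labeled set, then correct the pass rate it reports on unlabeled traffic. The sketch below uses the standard Rogan-Gladen style correction. It is an illustration of the idea, not the API of any particular library, and the function names are made up for this example.

```python
def judge_rates(human_labels: list, judge_labels: list) -> tuple:
    """Return (TPR, TNR) of the judge measured against human ground truth."""
    pairs = list(zip(human_labels, judge_labels))
    positives = [j for h, j in pairs if h]
    negatives = [j for h, j in pairs if not h]
    if not positives or not negatives:
        raise ValueError("need at least one human-labeled pass and one fail")
    tpr = sum(1 for j in positives if j) / len(positives)
    tnr = sum(1 for j in negatives if not j) / len(negatives)
    return tpr, tnr


def corrected_pass_rate(observed: float, tpr: float, tnr: float) -> float:
    """Estimate the true pass rate from the judge's observed pass rate.

    Solves observed = true * TPR + (1 - true) * (1 - TNR) for true,
    then clips the estimate to [0, 1].
    """
    denominator = tpr + tnr - 1.0
    if denominator <= 0:
        raise ValueError("judge is no better than chance; correction is undefined")
    return min(1.0, max(0.0, (observed + tnr - 1.0) / denominator))
```

With a judge that has TPR 0.9 and TNR 0.8, an observed pass rate of 0.75 corrects to roughly 0.786. In practice, the labeled set used to estimate TPR and TNR should be held out from the examples used to tune the judge prompt, so the rates are not optimistic.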

How do I build a multi-tenant RAG system safely?

Use defense-in-depth: per-tenant indexes or namespaces, query-time access checks, and prompt-layer guards. The Multi-Tenant RAG Isolation chapter and Multi-Tenant SaaS Case Study cover the patterns that hold up in interviews and production.
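The first two layers can be sketched with an in-memory index: a namespace per tenant, plus an access check applied at query time. The `Chunk` and `TenantScopedIndex` names, the role-based ACL, and the keyword matching are all assumptions for illustration. A production system would use a vector DB's namespaces or per-tenant indexes, and would add the prompt-layer guards on top.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    allowed_roles: frozenset


class TenantScopedIndex:
    """Toy index that never searches across tenant namespaces."""

    def __init__(self) -> None:
        self._namespaces = {}  # tenant_id -> list of Chunk

    def add(self, chunk: Chunk) -> None:
        self._namespaces.setdefault(chunk.tenant_id, []).append(chunk)

    def search(self, tenant_id: str, user_roles: set, query: str, k: int = 3) -> list:
        # Layer 1: only the caller's namespace is ever scanned.
        candidates = self._namespaces.get(tenant_id, [])
        terms = set(query.lower().split())
        hits = [c for c in candidates if terms & set(c.text.lower().split())]
        # Layer 2: query-time access check, plus a redundant tenant check in case
        # a chunk was ever written to the wrong namespace.
        permitted = [
            c for c in hits
            if c.tenant_id == tenant_id and c.allowed_roles & set(user_roles)
        ]
        return permitted[:k]
```

The redundant `tenant_id` comparison in layer 2 is the point of defense-in-depth: each layer assumes the one before it may have failed. The `tenant_id` passed to `search` should come from the authenticated session, never from user-supplied text.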

What is agentic RAG?

Agentic RAG combines retrieval with an agent loop that can decide what to search, when to re-query, and when to escalate, instead of running a single fixed retrieve-then-generate pass. See Agentic RAG for the architectures and tradeoffs.
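The difference from a fixed retrieve-then-generate pass can be sketched as a small control loop. The three callables (`search`, `is_sufficient`, `rewrite`) are hypothetical stand-ins for a retriever, an LLM grader, and an LLM query rewriter. The names, signatures, and returned dictionary are assumptions for this example, not the architecture from the Agentic RAG chapter.

```python
from typing import Callable


def agentic_retrieve(question: str,
                     search: Callable[[str], list],
                     is_sufficient: Callable[[str, list], bool],
                     rewrite: Callable[[str, list, str], str],
                     max_attempts: int = 3) -> dict:
    """Search, grade the evidence, and re-query until it is sufficient or the budget runs out."""
    query = question
    collected = []
    for attempt in range(1, max_attempts + 1):
        for passage in search(query):
            if passage not in collected:  # keep evidence from earlier attempts, without duplicates
                collected.append(passage)
        if is_sufficient(question, collected):
            return {"status": "answerable", "contexts": collected, "attempts": attempt}
        # Decide what to search next based on what has been found so far.
        query = rewrite(question, collected, query)
    # Budget exhausted: hand off instead of answering from weak evidence.
    return {"status": "escalate", "contexts": collected, "attempts": max_attempts}
```

Each extra attempt costs at least one more retrieval and one more grading call, so `max_attempts` is where the latency and cost tradeoff of agentic RAG becomes an explicit setting.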

Is this guide free? Can I contribute?

Yes, MIT-licensed and free. PRs are welcome; see Contributing Guide. If you have production failure modes, new model benchmarks, or interview questions to add, open a PR.

How often is this guide updated?

Continuously. New model releases, protocol changes (MCP, A2A), and emerging patterns are added as they ship. Recent additions include Tool-Use and Computer Agents and the June 2026 Job Market Trends.

Can I use this guide if I am transitioning from backend, QA, PM, or EM into AI?

Yes. The Role Transition Guide maps existing skills to AI engineering, MLE, and AI architect tracks, with reading paths per role. Pair it with COURSES.md for curated learning resources.

🔄 Living Book

This guide tracks:

New model releases and real-world performance
Emerging patterns (MCP, Agentic RAG, Flow Engineering)
Updated pricing and rate limits
Deprecations and best practice changes

⭐ Star and Watch the repo to get notified when updates are pushed.

🤝 Contributing

Found outdated info? Have production experience to share? PRs welcome. See Contributing Guide.

👋 Stay Connected

If this guide helps you, the easiest way to support it is to follow along where new chapters and refreshes get announced first:

GitHub: @ombharatiya - follow for the repo, star the project, and watch for new releases.
X / Twitter: @ombharatiya - short takes on model releases, MCP, agents, and interviews.
LinkedIn: ombharatiya - deeper writeups and interview prep tips for senior AI roles.

📄 License

MIT License. See LICENSE.

Built and maintained by Om Bharatiya · GitHub · Twitter · LinkedIn
