Published by sinanuozdemir

Designing and Deploying LLM Pipelines

This repository contains the code for the O'Reilly Live Online Training course Designing and Deploying LLM Pipelines.

In this comprehensive course, machine learning engineers and software developers learn how to transition large language model (LLM) prototypes into fully deployed production systems. Through detailed instruction and real-world case studies, you explore the best practices for integrating LLMs into diverse workflows, ensuring that your models perform effectively in practical applications.

Repository Structure

- data/: Example CSV datasets used in some notebooks
- deploy/: Minimal FastAPI service for model inference, with a Dockerfile and runtime requirements.txt
- images/: Course and README images
- notebooks/: Jupyter notebooks used throughout the course
  - crewai_streamlit/: Streamlit app demonstrating CrewAI, with app.py, its own requirements.txt, and an example secrets.toml
- requirements.txt: Primary Python dependencies for running notebooks locally

Setup Instructions

Using Python 3.11 Virtual Environment

At the time of writing, the course code requires a virtual environment running Python 3.11.

Option 1: Python 3.11 is Already Installed

Step 1: Verify Python 3.11 Installation

python3.11 --version

Step 2: Create a Virtual Environment

python3.11 -m venv .venv

This creates a .venv folder in your current directory.

Step 3: Activate the Virtual Environment

macOS/Linux:
```
source .venv/bin/activate
```
Windows:
```
.venv\Scripts\activate
```

You should see (.venv) in your terminal prompt.

Step 4: Verify the Python Version

python --version
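
If you prefer to check from inside Python (for example, in the first cell of a notebook), the following is a minimal sketch of the same verification. The function names are illustrative and not part of the repository; the check relies on the standard behaviour that `sys.prefix` and `sys.base_prefix` differ inside a venv.

```python
import sys

REQUIRED = (3, 11)  # major.minor version the course targets


def is_required_version(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor matches `required`."""
    return tuple(version_info[:2]) == tuple(required)


def in_virtual_environment(prefix=None, base_prefix=None):
    """Return True when running inside a venv-style environment.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix points at the interpreter the venv was created from.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("Python version:", ".".join(map(str, sys.version_info[:3])))
    print("Matches 3.11:", is_required_version())
    print("Inside a virtual environment:", in_virtual_environment())
```

If either check prints False, re-activate the environment from Step 3 before installing packages.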

Step 5: Install Packages

pip install -r requirements.txt

Step 6: Deactivate the Virtual Environment

deactivate

Option 2: Install Python 3.11

If you don’t have Python 3.11, follow the steps below for your OS.

macOS (Using Homebrew)

brew install python@3.11

Ubuntu/Debian

sudo apt update
sudo apt install python3.11 python3.11-venv

Windows (Using Windows Installer)

Go to Python Downloads.
Download the installer for Python 3.11.
Run the installer and ensure "Add Python 3.11 to PATH" is checked.

Verify Installation

python3.11 --version

You may need to run the following command, with the virtual environment activated, to make it available as a kernel in Jupyter:

python -m ipykernel install --user --name=oreilly-ai-pipelines --display-name "Python (oreilly-ai-pipelines)"

Installing Course Dependencies

Install the main dependencies for notebooks:

pip install -r requirements.txt

Note: The deploy/ service has its own minimal requirements.txt optimized for the API runtime. See the "Running the FastAPI Inference Service" section below if you plan to run the FastAPI service.

Notebooks

Agents and Workflows

- Introduction to LangGraph - Building workflows and agents with LangGraph
  - Reflection to LangGraph - Reflection with LangGraph
  - Planning/Executing with LangGraph - Planning/Executing with LangGraph
  - ReAct with LangGraph - ReAct Agents with LangGraph
    - Using Local LLMs - ReAct Agents with LangGraph with local LLMs
  - Evaluating LangGraph - Evaluating LangGraph
- Introduction to CrewAI - CrewAI 101
- Introduction to OpenAI Agents - OpenAI Agents 101
- Introduction to SmolAgents - HuggingFace's SmolAgents 101

Deployment + Production

- Model Training/Serving with BERT - Fine-tuning and running batch data loads with BERT
- Deploying models with FastAPI - Using FastAPI to deploy a model
- Third party model inference - Using different types of LLM providers
- Using Evaluation to combat AI drift - See how drift affects ML models

Distillation + Quantization

- Quantizing Llama-3 dynamically - Using bitsandbytes to quantize a model on the fly at load time. We will investigate the differences before and after quantization.
  - Working with llama.cpp and GGUF (no GPU) - See how to load a pre-quantized version of Llama to compare speed and memory usage
  - Working with llama.cpp and GGUF (with a GPU)
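
As rough intuition for why quantization changes memory usage, the back-of-the-envelope sketch below estimates the memory needed for model weights alone at different precisions. It is an illustration, not code from the notebooks: the 8-billion-parameter count is an assumption, and the estimate ignores activations, the KV cache, and the small metadata overhead that real quantization formats add.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / GIB


if __name__ == "__main__":
    params = 8_000_000_000  # ASSUMPTION: an 8B-parameter model
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>6}: {weight_memory_gib(params, bits):.1f} GiB")
```

Measured numbers in the notebooks will differ from this estimate, which is why they compare actual speed and memory before and after quantization.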

Running the FastAPI Inference Service (`deploy/`)

The deploy/ folder contains a minimal FastAPI app that loads a DistilBERT sequence classification model and exposes a /predict endpoint.

Local (Uvicorn)

cd deploy
pip install -r requirements.txt
uvicorn api:app --reload

Then open http://localhost:8000/docs to try the API.
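
You can also call the endpoint from code. The sketch below uses only the standard library and assumes that /predict accepts a POST request with a JSON body of the form `{"text": "..."}`. That payload shape is an assumption, not something this README confirms, so check the schema shown at /docs and adjust it if needed. The function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local Uvicorn address used above


def build_predict_request(text, base_url=BASE_URL):
    """Build a JSON POST request for the /predict endpoint.

    ASSUMPTION: the endpoint accepts {"text": "..."}; verify the real
    request schema at /docs before relying on this.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url=BASE_URL, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `predict("This course is great!")` returns whatever JSON the endpoint responds with; the shape of that response depends on deploy/api.py.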

Notes:

- The example model currently loads from the Hugging Face Hub. If you need access to a private model, make sure you are authenticated (e.g., huggingface-cli login) or configure a HUGGING_FACE_HUB_TOKEN in your environment.
- For fully offline usage, you can modify deploy/api.py to load a local model directory instead of pulling from the Hub.
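
One way to approach the offline case is to prefer a local directory when it exists and fall back to the Hub otherwise. The helper below is a hypothetical sketch, not the code in deploy/api.py; the function name and the example paths are assumptions. It relies on two facts about the transformers library: `save_pretrained()` writes a config.json next to the weights, and `from_pretrained()` accepts either a Hub model ID or a local directory path.

```python
from pathlib import Path


def resolve_model_source(local_dir, hub_id):
    """Prefer a local model directory; fall back to a Hub model ID.

    A directory counts as usable only if it contains config.json,
    which save_pretrained() writes alongside the model weights.
    """
    path = Path(local_dir)
    if (path / "config.json").is_file():
        return str(path)
    return hub_id


# Hypothetical usage inside the API module (names are placeholders):
#   source = resolve_model_source("model/", "your-org/your-model")
#   model = AutoModelForSequenceClassification.from_pretrained(source)
```

The tokenizer would need the same treatment, since it is loaded separately from the model.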

Docker

cd deploy
docker build . --tag fastapi-demo:1
# On Apple Silicon, you may need a specific platform:
# docker build . --tag fastapi-demo:1 --platform linux/amd64

docker run -p 80:8000 fastapi-demo:1
# If you built with a custom platform, include it on run:
# docker run -p 80:8000 --platform linux/amd64 fastapi-demo:1

Open http://localhost/docs to access the API docs. No port is needed in the URL because `-p 80:8000` maps host port 80 to port 8000 in the container.

For more details (including Heroku container registry notes), see deploy/README.md.

Streamlit CrewAI Demo (`notebooks/crewai_streamlit/`)

This small Streamlit app demonstrates building agents and tasks with CrewAI.

Setup

cd notebooks/crewai_streamlit
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Add your OpenAI key:

Preferred: create .streamlit/secrets.toml in this directory with:
```
  [general]
  OPENAI_API_KEY = "your-openai-api-key"
```
Alternatively, you can input the key in the UI at runtime.

Run the app:

streamlit run app.py

An example secrets.toml is included for reference in notebooks/crewai_streamlit/.

Data

Sample datasets used by notebooks are provided in data/. Not all notebooks require these files; consult each notebook’s instructions for expected inputs.

References

O’Reilly Live Training: Designing and Deploying LLM Pipelines — details, schedule, and outcomes:
- https://learning.oreilly.com/live-events/designing-and-deploying-llm-pipelines/0642572014796/

Instructor

Sinan Ozdemir

Sinan is a former lecturer of Data Science at Johns Hopkins University and the author of multiple textbooks on data science and machine learning. Additionally, he is the founder of the recently acquired Kylie.ai, an enterprise-grade conversational AI platform with RPA capabilities. He holds a master’s degree in Pure Mathematics from Johns Hopkins University and is based in San Francisco, CA.
