
Designing and Deploying LLM Pipelines
This repository contains the code for the O'Reilly Live Online Training course Designing and Deploying LLM Pipelines.
In this comprehensive course, machine learning engineers and software developers learn how to transition large language model (LLM) prototypes into fully deployed production systems. Through detailed instruction and real-world case studies, you explore the best practices for integrating LLMs into diverse workflows, ensuring that your models perform effectively in practical applications.
Repository Structure
- data/: Example CSV datasets used in some notebooks
- deploy/: Minimal FastAPI service for model inference, with Dockerfile and runtime requirements.txt
- images/: Course/README images
- notebooks/: Jupyter notebooks used throughout the course
  - crewai_streamlit/: Streamlit app demonstrating CrewAI with app.py, its own requirements.txt, and an example secrets.toml
- requirements.txt: Primary Python dependencies for running notebooks locally
Setup Instructions
Using Python 3.11 Virtual Environment
At the time of writing, we need a Python virtual environment with Python 3.11.
Option 1: Python 3.11 is Already Installed
Step 1: Verify Python 3.11 Installation
python3.11 --version
Step 2: Create a Virtual Environment
python3.11 -m venv .venv
This creates a .venv folder in your current directory.
Step 3: Activate the Virtual Environment
- macOS/Linux:
  source .venv/bin/activate
- Windows:
  .venv\Scripts\activate
You should see (.venv) in your terminal prompt.
Step 4: Verify the Python Version
python --version
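The same check can be made from inside Python, which is handy at the top of a notebook. The following is a minimal sketch, not part of the repository; the function name `describe_environment` is hypothetical. It reports whether the interpreter is Python 3.11 and whether it is running inside a virtual environment.

```python
import sys


def describe_environment(required=(3, 11)):
    """Return (version_ok, in_venv) for the running interpreter."""
    # Compare only major.minor, so any 3.11.x release counts as a match.
    version_ok = sys.version_info[:2] == tuple(required)
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix points at the interpreter the venv was created from.
    in_venv = sys.prefix != sys.base_prefix
    return version_ok, in_venv


if __name__ == "__main__":
    version_ok, in_venv = describe_environment()
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print(f"matches 3.11: {version_ok}")
    print(f"inside a virtual environment: {in_venv}")
```

If `in_venv` is False after activation, the shell is probably still resolving `python` to a system interpreter.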
Step 5: Install Packages
pip install -r requirements.txt
Step 6: Deactivate the Virtual Environment
deactivate
Option 2: Install Python 3.11
If you don’t have Python 3.11, follow the steps below for your OS.
macOS (Using Homebrew)
brew install python@3.11
Ubuntu/Debian
sudo apt update
sudo apt install python3.11 python3.11-venv
Windows (Using Windows Installer)
- Go to Python Downloads.
- Download the installer for Python 3.11.
- Run the installer and ensure "Add Python 3.11 to PATH" is checked.
Verify Installation
python3.11 --version
You may also need to register the virtual environment as a Jupyter kernel so that it appears in Jupyter's kernel list:
python -m ipykernel install --user --name=oreilly-ai-pipelines --display-name "Python (oreilly-ai-pipelines)"
Installing Course Dependencies
Install the main dependencies for notebooks:
pip install -r requirements.txt
Note: The deploy/ service has its own minimal requirements.txt optimized for the API runtime. See the Deployment section below if you plan to run the FastAPI service.
Notebooks
Agents and Workflows
- Introduction to LangGraph - Building workflows and agents with LangGraph
- Reflection to LangGraph - Reflection with LangGraph
- Planning/Executing with LangGraph - Planning/Executing with LangGraph
- ReAct with LangGraph - ReAct Agents with LangGraph
- Using Local LLMs - ReAct Agents with LangGraph with local LLMs
- Evaluating LangGraph - Evaluating LangGraph
- Introduction to CrewAI - CrewAI 101
- Introduction to OpenAI Agents - OpenAI Agents 101
- Introduction to SmolAgents - HuggingFace's SmolAgents 101
Deployment + Production
- Model Training/Serving with BERT - Fine-tuning and running batch data loads with BERT
- Deploying models with FastAPI - Using FastAPI to deploy a model
- Third-party model inference - Using different types of LLM providers
- Using Evaluation to combat AI drift - See how drift affects ML models
Distillation + Quantization
- Quantizing Llama-3 dynamically - Using bitsandbytes to quantize a model in real-time on load. We will investigate the differences before and after quantization.
- Working with llama.cpp and GGUF (no GPU) - See how to load a pre-quantized version of Llama to compare speed and memory usage
- Working with llama.cpp and GGUF (with a GPU)
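To get a feel for why quantization changes memory usage, a back-of-the-envelope estimate of weight storage can help. The sketch below is illustrative only and is not taken from the notebooks. The parameter count is an assumed round number, and the estimate ignores activations, the KV cache, and any per-block overhead that a real quantization format adds.

```python
def weight_memory_gib(num_parameters, bits_per_weight):
    """Approximate memory needed to hold the weights alone, in GiB."""
    total_bytes = num_parameters * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


# ASSUMPTION: a round 8 billion parameters, chosen for illustration only.
ASSUMED_PARAMETERS = 8_000_000_000

for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    gib = weight_memory_gib(ASSUMED_PARAMETERS, bits)
    print(f"{label:>5}: {gib:.1f} GiB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts weight storage by a factor of four. Measured savings in the notebooks will differ somewhat.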
Running the FastAPI Inference Service (deploy/)
The deploy/ folder contains a minimal FastAPI app that loads a DistilBERT sequence classification model and exposes a /predict endpoint.
Local (Uvicorn)
cd deploy
pip install -r requirements.txt
uvicorn api:app --reload
Then open http://localhost:8000/docs to try the API.
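The endpoint can also be called from a script. The following is a minimal standard-library sketch. The request body shape (`{"text": ...}`) is an assumption, not something this README specifies, so check the schema shown at /docs and adjust the payload to match. The helper names `build_predict_request` and `predict` are hypothetical.

```python
import json
import urllib.request


def build_predict_request(text, base_url="http://localhost:8000"):
    """Build a POST request for the /predict endpoint."""
    # ASSUMPTION: the service accepts a JSON body of the form {"text": "..."}.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url="http://localhost:8000", timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the Uvicorn server running:
#   print(predict("This course was great!"))
```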
Notes:
- The example model currently loads from the Hugging Face Hub. If you need access to a private model, make sure you are authenticated (e.g., huggingface-cli login) or configure a HUGGING_FACE_HUB_TOKEN in your environment.
- For fully offline usage, you can modify deploy/api.py to load a local model directory instead of pulling from the Hub.
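One way to approach the offline case is to choose the model source at startup. The sketch below is a suggestion rather than the code in deploy/api.py. The `MODEL_DIR` environment variable and the `resolve_model_source` helper are hypothetical names, and the Hub model ID is left as a placeholder because this README does not state it.

```python
import os
from pathlib import Path


def resolve_model_source(hub_id, env_var="MODEL_DIR"):
    """Prefer a local model directory when one is configured and exists."""
    # ASSUMPTION: MODEL_DIR is an illustrative variable name, not one the
    # service is known to read.
    local_dir = os.environ.get(env_var)
    if local_dir and Path(local_dir).is_dir():
        return str(Path(local_dir))
    # Fall back to the Hub identifier, which triggers a download if needed.
    return hub_id


# Usage inside the service (requires the transformers package):
#   from transformers import AutoModelForSequenceClassification, AutoTokenizer
#   source = resolve_model_source("<hub-model-id>")
#   tokenizer = AutoTokenizer.from_pretrained(source)
#   model = AutoModelForSequenceClassification.from_pretrained(source)
```

`from_pretrained` accepts either a Hub ID or a path to a directory previously written by `save_pretrained`, so the rest of the loading code can stay unchanged.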
Docker
cd deploy
docker build . --tag fastapi-demo:1
# On Apple Silicon, you may need a specific platform:
# docker build . --tag fastapi-demo:1 --platform linux/amd64
docker run -p 80:8000 fastapi-demo:1
# If you built with a custom platform, include it on run:
# docker run -p 80:8000 --platform linux/amd64 fastapi-demo:1
Open http://localhost/docs to access the API docs.
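Because the example model is pulled from the Hugging Face Hub, the container may take a while before it answers requests. A small polling helper can wait for the docs page to respond. This is a standard-library sketch; the function names and default timings are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


def wait_until_ready(url, attempts=30, delay=1.0, probe=http_status):
    """Poll url until it returns 200 or the attempts run out."""
    for _ in range(attempts):
        if probe(url) == 200:
            return True
        time.sleep(delay)
    return False


# Usage, after `docker run -p 80:8000 fastapi-demo:1`:
#   wait_until_ready("http://localhost/docs")
```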
For more details (including Heroku container registry notes), see deploy/README.md.
Streamlit CrewAI Demo (notebooks/crewai_streamlit/)
This small Streamlit app demonstrates building agents and tasks with CrewAI.
Setup
cd notebooks/crewai_streamlit
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Add your OpenAI key:
- Preferred: create .streamlit/secrets.toml in this directory with:
  [general]
  OPENAI_API_KEY = "your-openai-api-key"
- Alternatively, you can input the key in the UI at runtime.
Run the app:
streamlit run app.py
An example secrets.toml is included for reference in notebooks/crewai_streamlit/.
Data
Sample datasets used by notebooks are provided in data/. Not all notebooks require these files; consult each notebook’s instructions for expected inputs.
References
- O’Reilly Live Training: Designing and Deploying LLM Pipelines — details, schedule, and outcomes:
Instructor
Sinan Ozdemir
Sinan is a former lecturer of Data Science at Johns Hopkins University and the author of multiple textbooks on data science and machine learning. Additionally, he is the founder of the recently acquired Kylie.ai, an enterprise-grade conversational AI platform with RPA capabilities. He holds a master’s degree in Pure Mathematics from Johns Hopkins University and is based in San Francisco, CA.