Home
Softono
oreilly-hands-on-gpt-llm

oreilly-hands-on-gpt-llm

Open source Jupyter Notebook
143
Stars
97
Forks
2
Issues
5
Watchers
3 months
Last Commit

About oreilly-hands-on-gpt-llm

Mastering the Art of Scalable and Efficient AI Model Deployment

Platforms

Web Self-hosted Docker Kubernetes

Languages

Jupyter Notebook

oreilly-logo

Deploying GPT & Large Language Models

This repository contains code for the O'Reilly Live Online Training for Deploying GPT & LLMs

This course is designed to equip software engineers, data scientists, and machine learning professionals with the skills and knowledge needed to deploy AI models effectively in production environments. As AI continues to revolutionize industries, the ability to deploy, manage, and optimize AI applications at scale is becoming increasingly crucial. This course covers the full spectrum of deployment considerations, from leveraging cutting-edge tools like Kubernetes, llama.cpp, and GGUF, to mastering cost management, compute optimization, and model quantization.


Base Notebooks

Introduction to LLMs and Prompting

Notebook Description
Introduction to 3rd Party Providers Using Together.ai, HuggingFace, and Groq to run LLMs
Prompt Injection Examples See how three kinds of prompt injection attacks can attempt to jailbreak an LLM

Cleaning Data and Monitoring Drift

Notebook Description
Cleaning Data using Deep Learning Using AUM and Cosine Similarity to clean data
Combating AI drift Using Online Learning to combat drift

Evaluating Agents

Notebook Description
Evaluating AI Agents: Task Automation and Tool Integration A basic case study on tool selection accuracy
Positional Bias on Agent Response Evaluation Identifying and evaluating positional bias on multiple LLMs

LangGraph and Agents

Notebook Description
From Prompts to Workflows Why single LLM prompts break on multi-step tasks and how LangGraph provides structure, state, and control flow
LangGraph Basics Foundational LangGraph primitives: StateGraph, nodes, edges, conditional routing, memory, and visualization
Tools and ReAct Agents Tool integration and the ReAct pattern with LangChain, manual StateGraph, and MCP

Advanced Deployment Techniques

Notebook Description
Speculative Decoding Using an assistant model to aid token decoding
Prompt Caching Llama 3 Replicating prompt caching with HuggingFace tools
Distilling BERT Distilling models to optimize for speed/memory
Quantizing Llama-3 dynamically Using bitsandbytes to quantize nearly any LLM on HuggingFace
Working with GGUF (no GPU) Using Llama.cpp to work with models
Working with GGUF (with a GPU) Using Llama.cpp to work with models
DeepSeek Model on GGUF Running a DeepSeek Distilled Llama model using Llama.cpp
Qwen on GGUF with Llama.cpp Running Qwen models using Llama.cpp
K8s GGUF Demo Using embedding models and Llama 3 with GGUF on a GPU
vLLM + Gateway on K8s Production GPU deployment with vLLM, FastAPI gateway, and DigitalOcean K8s

More

Fine-Tuning LLMs

Notebook Description
Finetuning app_reviews with OpenAI
Fine-tuning BERT for app_reviews
Model Freezing with BERT

Prompt Engineering

Notebook Description
Introduction to Prompt Engineering
Advanced Prompt Engineering

Instructor

Sinan Ozdemir Sinan is a former lecturer of Data Science at Johns Hopkins University and the author of multiple textbooks on data science and machine learning. Additionally, he is the founder of the recently acquired Kylie.ai, an enterprise-grade conversational AI platform with RPA capabilities. He holds a master's degree in Pure Mathematics from Johns Hopkins University and is based in San Francisco, CA.