Hands-On-AI-Engineering
<p align="center"> <a href="https://aiengineering.beehiiv.com/"> <img src="assets/theaiengineering_logo.jpeg" alt="Hands-On AI Engineering Banner" width="150"> </a> </p> <div align="center"> # π Hands-On AI Engineering [](https://opensource.org/licenses/MIT) [](CONTRIBUTING.md) </div> A curated collection of practical, production-ready AI projects across multiple modalities, including language models, multimodal models, OCR systems, RAG pipelines, and AI agents. Each project is designed to help you learn, experiment, and build real-world AI applications. ## π Table of Contents - [π― Why This Repository?](#-why-this-repository) - [ποΈ Project Categories](#οΈ-project-categories) - [π Getting Started](#-getting-started) - [π€ Contributing](#-contributing) - [π License](#-license) --- ## π― Why This Repository? - **Learn by Doing**: Each project includes complete code, setup instructions, and documentation - **Production-Ready**: Projects follow best practices and are ready to be adapted for real-world use - **Diverse Use Cases**: From RAG systems to multi-agent workflows and specialized applications - **Multiple Model Providers**: Projects use OpenAI, Anthropic, Google, and open-source models - **Active Community**: Regular updates and new project additions --- ## ποΈ Project Categories ### π€ AI Agents Intelligent ai agents for various automation tasks. - [**Multi-Agent Financial Analyst**](./ai_agents/multi_agent_financial_analyst) β Team of specialized agents for comprehensive financial analysis. - [**FinAgent**](./ai_agents/finagent) β Financial assistant agent for stock market analysis and insights. - [**Daily AI News Digest**](./ai_agents/daily-news-digest) β Automated daily digest from 92 Karpathy-curated tech blogs delivered to Telegram every morning. MiniMax M2.7 scores articles from the last 24 hours and surfaces the 3 most significant stories. - [**Agentic Form Filler**](./ai_agents/agentic-form-filler) β Agentic form-filling agent using Landing AI for layout parsing and MiniMax M2.7 for multi-turn data gathering. - [**AI Travel Planning Agent**](./ai_agents/ai_travel_planning_agent) β Multi-agent travel planner that turns a single natural language request into a complete trip plan with flights, hotels, and a day-by-day itinerary. - [**Competitive Intelligence Agent**](./ai_agents/competitive_intelligence_agent) β Generates strategic sales battlecards by analyzing competitors through the lens of your own business context. - [**Multi-Agent Research Assistant (AG2)**](./ai_agents/multi_agent_research_assistant_ag2) β Multi-agent research pipeline using AG2 where three specialists collaborate to research any topic and produce a structured report. - [**Self-Reflective Agentic RAG**](./ai_agents/agentic_rag_system) β LangGraph RAG system that grades retrieved context, rewrites the query if needed, and generates an answer only once the context passes validation. - [**Agentic SQL Search**](./ai_agents/agentic_sql_search) β Natural language to SQL agent powered by Gemma 4 that writes, executes, and explains queries against an e-commerce database. - [**Stock Portfolio Analyst**](./ai_agents/stock_portfolio_analyst) β Portfolio analysis agent built with Agno and DeepSeek-V4-Flash. Fetches live market data via YFinance and generates a report covering P&L, concentration risk, and rebalancing recommendations. - [**Eagle Eye**](./ai_agents/eagle_eye) β GitHub PR review agent using OpenClaw and Telegram. Fetches diffs via GitHub MCP, performs structured code review with severity ratings, and posts feedback after user approval. - [**CartMate β AI Customer Support Agent**](./ai_agents/ai_customer_support_agent) β Memory-powered e-commerce support agent built with Mem0 and Mistral Small 4 that remembers customers and picks up conversations where they left off. - [**Multi-Agent Coding Assistant**](./ai_agents/multi_agent_coding_assistant) β Four-stage coding pipeline powered by Mistral Small 4 and LangChain. A Planner, Coder, and Reviewer agent collaborate to produce a polished final implementation. - [**Startup Analyst**](./ai_agents/startup_analyst) β Startup due-diligence agent powered by MiniMax M2.5. Scrapes a company's site with Firecrawl and produces an investment-grade report covering market position, financials, team, and risks. - [**Research Team**](./ai_agents/research_team) β Multi-agent research system powered by MiniMax M2.5. Seek searches the web, Scout navigates internal documents, and a team leader synthesises findings into a structured report. - [**GitHub Intelligence Agent**](./ai_agents/github_intelligence_agent) β GitHub research agent powered by Gemini 3 Flash and GitHub's official MCP server. Ask anything about repos, contributors, issues, or codebases. - [**Smolagents Code Agent**](./ai_agents/smolagents_code_agent) β Agentic task runner powered by Mistral Small 4 and HuggingFace smolagents. Writes and executes Python code at each step using DuckDuckGo and Wikipedia. - [**Agent Discovery Agent**](./ai_agents/agent_discovery_agent) β Searches and compares AI agents across NANDA, MCP, Virtuals Protocol, A2A, and ERC-8004 through a single natural language interface. Powered by Gemini 3 Flash. - [**Cal Scheduling Agent**](./ai_agents/cal_scheduling_agent) β Conversational scheduling assistant that manages Cal.com appointments through natural language. Book, reschedule, cancel, and check availability with automatic timezone handling. - [**Hacker News Newsletter Agent**](./ai_agents/hacker_news_newsletter_agent) β Fetches the 10 latest Hacker News stories, scrapes full article content with Trafilatura, generates a structured HTML newsletter with Gemma 4, and delivers it to your inbox via Gmail SMTP. - [**Hotel Finder Agent**](./ai_agents/hotel_finder_agent) β Conversational hotel search agent powered by qwen3.6-flash via Orq.ai and the Trivago MCP Server. Search by location, dates, guest count, price range, star rating, and amenities. - [**Marketing Strategy Agent**](./ai_agents/marketing_strategy_agent) β Multi-agent marketing campaign generator. A Market Analyst (with Serper web search), Strategy Officer, and Creative Director run sequentially to produce market research, a full strategy, and creative campaign content. Powered by deepseek-v4-flash via Orq.ai. - [**Brand Monitor**](./ai_agents/brand_monitor_agent) β Monitors brand mentions across Web, YouTube, Twitter/X, and LinkedIn in a single run. Scrapingdog collects platform data and DeepSeek V4 Flash produces a structured intelligence brief per channel. - [**AI Debate Agent**](./ai_agents/ai_debate_agent) - Two LLM debaters argue opposing sides of any topic you choose. A judge scores each turn and declares a winner. - [**Browser Automation Agent**](./ai_agents/browser_automation_agent) - Takes a natural language instruction and autonomously navigates the web to complete it using browser-use. - [**Documentation QnA Agent**](./ai_agents/documentation_qna_agent) - Chat with any documentation by URL. Uses Fetch MCP and DeepSeek V4 Flash on NVIDIA NIM. - [**Job Posting Agent**](./ai_agents/job_posting_agent) - Generates tailored job postings from a company name and role using DeepSeek V4 Flash on NVIDIA NIM. - [**LangChain Data Agent**](./ai_agents/langchain_data_agent) - Query the Chinook SQLite database in plain English through a conversational Streamlit chat interface. - [**Travel Planner Agent**](./ai_agents/travel_planner_agent) - AI trip planning assistant covering weather, budget, packing lists, and day-by-day itineraries from a single request. - [**Personal Finance Agent**](./ai_agents/personal_finance_agent) - Upload a bank statement CSV, auto-categorize transactions, and ask natural language questions about your spending. Powered by a LangChain tool-calling agent backed by Orq.ai with SQLite persistence. ### πΈ OCR Extracting structure and meaning from visual data and documents. - [**Image-to-Structured-Data Extractor**](./OCR/image_to_structured_data) β Converts images into validated, structured JSON using Mistral Large 3 and Instructor. - [**LaTeX Formula OCR**](./OCR/latex_formula_ocr) β Extracts math formulas from images and PDFs into LaTeX using a local vision-language model. - [**Medical Prescription Digitizer**](./OCR/medical_prescription_digitizer) β Digitizes handwritten or printed prescriptions into structured output using Mistral Large 3, with real-time drug name validation against RxNorm. ### π§ Audio Projects for audio understanding and analysis. - [**Music Explorer**](./audio/music_explorer) β Chat with any audio file or YouTube video using Gemini 3 Flash. Ask for transcriptions, emotion analysis, instrument identification, and timestamp-aware breakdowns. - [**Multilingual Audio Translator**](./audio/multilingual_audio_translator) β Upload or record audio in any language, get it transcribed with faster-whisper, translated via Gemini, and played back as synthesized speech using Kokoro TTS. ### π¬ Multimodal Projects combining vision, video, and language models. - [**GLM-OCR Pro**](./multimodal/glm_ocr_pro) β Structured document extraction using GLM-OCR via Ollama, transforming images and PDFs into formatted Markdown locally. - [**Video Understanding Agent**](./multimodal/video_understanding_agent) β Summarizes YouTube videos into chapters, key takeaways, and action items using Gemini Flash. - [**Multimodal Weather App**](./multimodal/multimodal_weather_app) β Upload a map image and get live weather. Mistral Small 4 identifies the city via vision, then fetches real-time conditions through native tool calling. - [**Multimodal RAG**](./multimodal/multimodal_rag) β RAG system that ingests text, URLs, PDFs, images, audio, and video into a shared ChromaDB index. Gemini Embedding 2 handles retrieval and Gemini 3 Flash generates grounded answers, passing actual file URIs for media sources. - [**Image Question Answering**](./multimodal/image_question_answering) β Upload a PDF, select a page, and ask visual questions answered by Gemma 4 with thinking mode. PyMuPDF renders each page to a full-resolution image for grounded reasoning over charts, tables, and figures. - [**Medical Document Parser**](./multimodal/medical_document_parser) - Extracts a structured clinical profile from medical PDFs and images using Gemma 4 vision. ### π RAG Applications Retrieval-Augmented Generation systems for knowledge-enhanced AI applications. - [**Agentic RAG with O3-Mini & DuckDuckGo**](./rag_apps/agentic_rag_with_o3_mini_and_duckduckgo) β RAG system using O3-Mini with DuckDuckGo for real-time web search. - [**Agentic RAG with Qwen & FireCrawl**](./rag_apps/agentic_rag_with_qwen_and_firecrawl) β RAG system using Qwen and FireCrawl for web scraping and retrieval. - [**Vision RAG**](./rag_apps/vision_rag) β Multimodal RAG system for processing and querying visual content. - [**Clinical RAG with ADE**](./rag_apps/clinical_rag_with_ade) β High-precision clinical RAG using LandingAI ADE for visual-first document parsing and Mistral Large for grounded reasoning. - [**YouTube Transcript RAG**](./rag_apps/youtube_transcript_rag) β Chat with any YouTube video using Whisper transcription, ChromaDB retrieval, and Mistral Small 4, with timestamp-linked answers. - [**GraphRAG Knowledge System**](./rag_apps/graphrag_knowledge_system) β Builds a local knowledge graph from uploaded documents using Mistral Small 4 and NetworkX, supporting both entity-level and thematic queries. - [**Hybrid RAG System**](./rag_apps/hybrid_rag_system) β Indexes documents into a knowledge graph and a vector store in parallel. Mistral Small 4 answers questions with fused context from both retrieval paths. - [**HyDE RAG**](./rag_apps/hyde_rag) β RAG pipeline using Hypothetical Document Embeddings. Gemini 3 Flash generates hypothetical answers, Gemini Embedding 2 embeds and averages them, and the result retrieves more precise chunks from ChromaDB. - [**Rock Music RAG**](./rag_apps/rock_music_rag) β Custom rock music knowledge base built from Wikipedia. Add any band, ask questions across all of them, and get sourced answers powered by BM25 retrieval and Gemma 4. - [**RAG Agent with Database Routing**](./rag_apps/rag_agent_with_database_routing) β Routes queries across three specialized Qdrant databases (products, support, financial) using an Agno router agent. Falls back to a LangGraph ReAct web search agent when no relevant documents are found. - [**Reasoning RAG**](./rag_apps/reasoning_rag) - Ask questions against any web source and get cited answers with a live, step-by-step reasoning trace via Gradio. --- ## π€ Contributing We welcome contributions! Whether you're adding new projects, improving existing ones, or fixing bugs, your help makes this repository better for everyone. ### How to Contribute 1. **Read the guidelines**: Check [CONTRIBUTING.md](CONTRIBUTING.md) for detailed instructions 2. **Create an issue**: Propose your project or improvement 3. **Follow the structure**: Use the appropriate category folder 4. **Submit a PR**: One project per pull request ### Project Structure Requirements - Each project must be in its own folder within the appropriate category - Must include a comprehensive `README.md` (use our [template](.github/README_TEMPLATE.md)) - Must include `requirements.txt` or `pyproject.toml` - Must include `.env.example` for required API keys - Follow snake_case naming convention --- ## π License This repository is licensed under the **MIT License**. See the [LICENSE](./LICENSE) file for details. --- ## π Acknowledgments Thank you to all contributors who have helped build this collection of AI engineering projects! --- <div align="center"> **Built with β€οΈ by the [AI Engineering Community](https://aiengineering.beehiiv.com/)** For sponsorship or collaboration inquiries, reach the maintainer at [[email protected]](mailto:[email protected]). [β¬ Back to Top](#-hands-on-ai-engineering) </div>