AI & ML · project ideas
AI Engineering Project Ideas
Build real AI-powered systems end-to-end, covering prompt engineering, model fine-tuning, RAG pipelines, agents, and production deployment.
CLI Chatbot with OpenAI API
beginner
Build a terminal-based chatbot that maintains multi-turn conversation history using the OpenAI Chat Completions API.
Requirements
- Accept user input in a loop and send messages to the API
- Maintain a rolling conversation history array
- Support a configurable system prompt via a config file
- Display streaming token output in real time
- Handle API errors and rate limits gracefully
REST API integrationPrompt engineeringPython async/streamingError handling
Semantic Document Search Engine
beginner
Ingest a folder of text files, embed them with a sentence-transformer model, and serve a local search UI that returns the most relevant passages.
Requirements
- Chunk documents into overlapping passages
- Generate embeddings using sentence-transformers
- Store vectors in a local FAISS or ChromaDB index
- Expose a simple FastAPI endpoint for query search
- Return top-5 results with similarity scores and source filenames
Text embeddingsVector databasesFastAPIChunking strategiesSimilarity search
RAG Question-Answering App
intermediate
Create a Retrieval-Augmented Generation pipeline that lets users ask questions over a custom PDF knowledge base with cited answers.
Requirements
- Parse and chunk PDFs using LangChain or LlamaIndex
- Embed chunks and store in a persistent vector store
- Retrieve top-k relevant chunks at query time
- Pass retrieved context to an LLM to generate a grounded answer
- Display source citations alongside each answer in a Streamlit UI
RAG architectureLangChain/LlamaIndexPDF parsingContext window managementStreamlit
Fine-Tuned Sentiment Classifier
intermediate
Fine-tune a pre-trained BERT-family model on a domain-specific sentiment dataset and deploy it as a REST API.
Requirements
- Load and preprocess a labeled sentiment dataset using Hugging Face Datasets
- Fine-tune a distilBERT model with Hugging Face Trainer
- Track experiments and metrics with Weights & Biases
- Export the model and tokenizer to a local directory
- Serve predictions via a FastAPI endpoint with confidence scores
Transfer learningHugging Face TransformersExperiment trackingModel serializationAPI deployment
Autonomous Research Agent
intermediate
Build a LangChain agent that can search the web, read URLs, and write a structured research report on any given topic.
Requirements
- Implement a ReAct-style agent with tool-calling support
- Integrate web search (SerpAPI or Tavily) and URL-scraping tools
- Allow the agent to plan multi-step research tasks
- Stream intermediate reasoning steps to the console
- Output a formatted Markdown report saved to disk
Agent frameworksTool use / function callingReAct promptingWeb scrapingLangChain
LLM Evaluation & Benchmarking Suite
intermediate
Build an automated evaluation harness that tests multiple LLM providers on a custom task dataset and produces a comparative scorecard.
Requirements
- Define a JSONL dataset of prompts with reference answers
- Run each prompt against at least two models (e.g. GPT-4o and Claude)
- Score outputs using exact-match, ROUGE, and LLM-as-judge methods
- Aggregate results into a performance report with pass rates and latency
- Visualize results in a Jupyter notebook with comparative charts
LLM evaluationROUGE / metricsLLM-as-judgeBenchmarking designData analysis
Multimodal Image-to-Report Pipeline
advanced
Build a production-grade pipeline that accepts uploaded images, runs vision-language model analysis, and generates structured JSON reports stored in a database.
Requirements
- Accept image uploads via a FastAPI endpoint
- Send images to a vision LLM (GPT-4o or LLaVA) with a structured output prompt
- Parse and validate the model response into a Pydantic schema
- Persist reports to a PostgreSQL database with image metadata
- Add async job queuing with Celery and Redis for scalability
Vision-language modelsStructured output / function callingPydantic validationAsync task queuesPostgreSQL
Self-Hosted RAG Platform with Observability
advanced
Deploy a fully self-hosted RAG system using open-source models, a vector database, and a full observability stack for production monitoring.
Requirements
- Serve an open-source LLM (Mistral or LLaMA) via Ollama or vLLM
- Use Qdrant as the vector store with collection management via API
- Instrument the pipeline with OpenTelemetry traces and LangSmith logging
- Implement hybrid search combining dense and sparse (BM25) retrieval
- Containerize all services with Docker Compose and document runbook
Self-hosted LLM servingHybrid searchObservability / tracingDocker ComposevLLM / Ollama
Multi-Agent Coding Assistant with Human-in-the-Loop
advanced
Build a multi-agent system where a planner agent decomposes coding tasks, executor agents write and run code, and a human approval step gates risky actions.
Requirements
- Implement a planner agent that breaks a user task into subtasks
- Spawn specialized executor agents per subtask using LangGraph or AutoGen
- Execute generated code in an isolated Docker sandbox
- Gate file-write and shell-exec actions behind a human approval prompt
- Aggregate subtask outputs into a final deliverable with an audit log
Multi-agent orchestrationLangGraph / AutoGenSandboxed code executionHuman-in-the-loop designAgent state management