RMRM Full Stack & AI Engineer · All projects · Roadmaps

AI & ML · project ideas

AI Engineering Project Ideas

Build real AI-powered systems end-to-end, covering prompt engineering, model fine-tuning, RAG pipelines, agents, and production deployment.

CLI Chatbot with OpenAI API

beginner

Build a terminal-based chatbot that maintains multi-turn conversation history using the OpenAI Chat Completions API.

Requirements

Accept user input in a loop and send messages to the API
Maintain a rolling conversation history array
Support a configurable system prompt via a config file
Display streaming token output in real time
Handle API errors and rate limits gracefully

REST API integrationPrompt engineeringPython async/streamingError handling

Semantic Document Search Engine

beginner

Ingest a folder of text files, embed them with a sentence-transformer model, and serve a local search UI that returns the most relevant passages.

Requirements

Chunk documents into overlapping passages
Generate embeddings using sentence-transformers
Store vectors in a local FAISS or ChromaDB index
Expose a simple FastAPI endpoint for query search
Return top-5 results with similarity scores and source filenames

Text embeddingsVector databasesFastAPIChunking strategiesSimilarity search

RAG Question-Answering App

intermediate

Create a Retrieval-Augmented Generation pipeline that lets users ask questions over a custom PDF knowledge base with cited answers.

Requirements

Parse and chunk PDFs using LangChain or LlamaIndex
Embed chunks and store in a persistent vector store
Retrieve top-k relevant chunks at query time
Pass retrieved context to an LLM to generate a grounded answer
Display source citations alongside each answer in a Streamlit UI

RAG architectureLangChain/LlamaIndexPDF parsingContext window managementStreamlit

Fine-Tuned Sentiment Classifier

intermediate

Fine-tune a pre-trained BERT-family model on a domain-specific sentiment dataset and deploy it as a REST API.

Requirements

Load and preprocess a labeled sentiment dataset using Hugging Face Datasets
Fine-tune a distilBERT model with Hugging Face Trainer
Track experiments and metrics with Weights & Biases
Export the model and tokenizer to a local directory
Serve predictions via a FastAPI endpoint with confidence scores

Transfer learningHugging Face TransformersExperiment trackingModel serializationAPI deployment

Autonomous Research Agent

intermediate

Build a LangChain agent that can search the web, read URLs, and write a structured research report on any given topic.

Requirements

Implement a ReAct-style agent with tool-calling support
Integrate web search (SerpAPI or Tavily) and URL-scraping tools
Allow the agent to plan multi-step research tasks
Stream intermediate reasoning steps to the console
Output a formatted Markdown report saved to disk

Agent frameworksTool use / function callingReAct promptingWeb scrapingLangChain

LLM Evaluation & Benchmarking Suite

intermediate

Build an automated evaluation harness that tests multiple LLM providers on a custom task dataset and produces a comparative scorecard.

Requirements

Define a JSONL dataset of prompts with reference answers
Run each prompt against at least two models (e.g. GPT-4o and Claude)
Score outputs using exact-match, ROUGE, and LLM-as-judge methods
Aggregate results into a performance report with pass rates and latency
Visualize results in a Jupyter notebook with comparative charts

LLM evaluationROUGE / metricsLLM-as-judgeBenchmarking designData analysis

Multimodal Image-to-Report Pipeline

advanced

Build a production-grade pipeline that accepts uploaded images, runs vision-language model analysis, and generates structured JSON reports stored in a database.

Requirements

Accept image uploads via a FastAPI endpoint
Send images to a vision LLM (GPT-4o or LLaVA) with a structured output prompt
Parse and validate the model response into a Pydantic schema
Persist reports to a PostgreSQL database with image metadata
Add async job queuing with Celery and Redis for scalability

Vision-language modelsStructured output / function callingPydantic validationAsync task queuesPostgreSQL

Self-Hosted RAG Platform with Observability

advanced

Deploy a fully self-hosted RAG system using open-source models, a vector database, and a full observability stack for production monitoring.

Requirements

Serve an open-source LLM (Mistral or LLaMA) via Ollama or vLLM
Use Qdrant as the vector store with collection management via API
Instrument the pipeline with OpenTelemetry traces and LangSmith logging
Implement hybrid search combining dense and sparse (BM25) retrieval
Containerize all services with Docker Compose and document runbook

Self-hosted LLM servingHybrid searchObservability / tracingDocker ComposevLLM / Ollama

Multi-Agent Coding Assistant with Human-in-the-Loop

advanced

Build a multi-agent system where a planner agent decomposes coding tasks, executor agents write and run code, and a human approval step gates risky actions.

Requirements

Implement a planner agent that breaks a user task into subtasks
Spawn specialized executor agents per subtask using LangGraph or AutoGen
Execute generated code in an isolated Docker sandbox
Gate file-write and shell-exec actions behind a human approval prompt
Aggregate subtask outputs into a final deliverable with an audit log

Multi-agent orchestrationLangGraph / AutoGenSandboxed code executionHuman-in-the-loop designAgent state management

Stuck on a build? Our AI tutor reviews your code and unblocks you — without writing it for you.

Open the app — free to start