Model Context Protocol (MCP) — Complete Guide for Backend Engineers

Build Tools, Resources, and AI-Driven Services Using LangChain

Modern LLM-based applications are no longer just about generating text — they need to interact with real systems:

✅ Databases
✅ File systems
✅ Internal microservices
✅ Web APIs
✅ Analytics engines
✅ Cloud services

To support this, Anthropic introduced MCP — Model Context Protocol, an open standard that lets LLMs communicate with tools through a safe, structured API.

This guide gives you:

✅ Clear concepts
✅ Interview-focused explanations
✅ Step-by-step MCP server creation
✅ Examples using LangChain
✅ Text-based architecture diagrams



What Is MCP?

MCP (Model Context Protocol) is a unified protocol that allows AI models to access tools, resources, and files in a structured manner.

Think of it as an API gateway for LLMs.

Instead of relying only on prompts, LLMs can call tools like:

get_weather, search_files, query_database, run_sql, get_customer_orders

MCP provides:

✅ A standard interface
✅ Strong typing
✅ Clear request/response format
✅ Security boundaries
✅ Cross-language interoperability
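
Under the hood, every tool invocation travels as a JSON-RPC 2.0 message. The sketch below (written as Python dicts) shows the general shape of a tools/call request and its response; treat the exact field names as illustrative rather than a verbatim copy of the spec.

# Illustrative JSON-RPC 2.0 exchange for an MCP tool call.
# Field names follow the common tools/call convention; check the MCP spec
# for the authoritative payload shape.
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",
        "arguments": {"city": "Bangalore"},
    },
}

tool_call_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "Bangalore: Partly cloudy, +27°C"}],
    },
}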


High-Level Architecture (Text-Based Diagram)

        ┌────────────────────────┐
        │      LLM / Agent       │
        │  (GPT-4, LangChain,    │
        │   Anthropic, Groq)     │
        └────────────▲───────────┘
                     │ Structured Tool Calls (JSON-RPC)
                     │
        ┌────────────┴───────────┐
        │       MCP Server       │
        │   Tools / Resources    │
        │  Transport: stdio/ws   │
        └────────────▲───────────┘
                     │
     ┌───────────────┼────────────────┐
     │               │                │
┌────┴─────────┐ ┌───┴─────────┐ ┌────┴────────┐
│     APIs     │ │  Databases  │ │  Filesystem │
│ REST/GraphQL │ │  SQL/NoSQL  │ │  Logs/Docs  │
└──────────────┘ └─────────────┘ └─────────────┘

MCP Transport Protocols

MCP defines how an AI agent connects to your server:

✅ 1. stdio (local execution)

  • Uses stdin/stdout for message passing

  • Zero network overhead

  • Ideal for CLI tools, dev workflows

✅ 2. websocket (remote execution)

  • Perfect for cloud microservices

  • Works with Kubernetes, ECS, GKE, etc.

  • Supports multiple LLM clients

✅ 3. HTTP / SSE (remote execution)

  • MCP also defines HTTP-based transports (HTTP with Server-Sent Events,
    and streamable HTTP); servers can additionally sit behind
    Nginx/Envoy/gateway adapters.


Building a Simple MCP Server (LangChain)

Below is a minimal MCP server using LangChain + FastAPI.


✅ Install dependencies

pip install langchain langchain-core fastapi uvicorn mcp-server-fastapi

✅ Step 1: Create Tools

from langchain.tools import tool
import requests, os

@tool
def get_weather(city: str) -> str:
    """Return temperature and weather for a given city."""
    return requests.get(f"https://wttr.in/{city}?format=3").text

@tool
def list_files(folder: str) -> list:
    """List files in a directory."""
    return os.listdir(folder)

✅ Step 2: Create the MCP Server

from mcp_server_fastapi import MCPServer
from fastapi import FastAPI

app = FastAPI()
server = MCPServer(app, title="Utility MCP Server")

server.add_tool(get_weather)
server.add_tool(list_files)

✅ Step 3: Run the MCP Server

uvicorn main:app --host 0.0.0.0 --port 8000

MCP endpoint available at:

ws://localhost:8000/mcp

Exposing Resources

You can expose static or dynamic resources:

from mcp_server_fastapi import resource

@resource("config/app")
def config_resource():
    return {"version": "1.0.0", "env": "production"}

Exposing File-System Resources (Read-Only)

server.mount_folder("/logs", "/var/log/myapp/")

How Agents Call MCP Tools (LangChain)

from mcp_client import MCPClient
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

# Connect to the MCP server and pull its tool definitions
client = MCPClient("ws://localhost:8000/mcp")
tools = client.get_tools()

llm = ChatOpenAI(model="gpt-4.1")

# create_openai_tools_agent requires a prompt with an agent_scratchpad slot
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])

agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

result = executor.invoke({"input": "What is the weather in Bangalore?"})
print(result["output"])

Tool Invocation Flow

User Query → Agent → Selects Tool → MCP Tool Executes → Returns Structured JSON → Agent Summarizes Result

Detailed:

┌───────────────────────────┐
│  User Input: "Weather?"   │
└───────────────┬───────────┘
                │ Reasoning by Agent
                │
    ┌───────────▼───────────┐
    │   Tool Call Chosen    │
    │  get_weather("BLR")   │
    └───────────┬───────────┘
                │ JSON-RPC
                ▼
    ┌──────────────────────┐
    │      MCP Server      │
    │  Executes API calls  │
    └───────────┬──────────┘
                │ JSON Result
                ▼
┌──────────────────────────┐
│ Agent Summarizes Output  │
└──────────────────────────┘

Where Backend Engineers Use MCP

✅ Integrating LLMs with microservices
✅ Allowing safe access to production data
✅ Creating API-driven agents
✅ Building internal developer tooling
✅ Simplifying multi-agent systems
✅ Enabling plug-and-play AI behavior


Interview-Ready Explanation

Q: What problem does MCP solve?
✅ Standardizes how AI models interact with external tools
✅ Makes tool usage safe, typed, predictable
✅ Enables multi-tool, multi-resource workflows

Q: How does an agent know which tool to call?
The LLM sees each tool's schema plus its natural-language description, then uses reasoning and its training to select the correct tool (see the schema sketch below).
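
Concretely, each tool is advertised to the model as a name, a description, and a JSON Schema for its inputs. A minimal sketch (as a Python dict) of what such a definition might look like for the get_weather tool; the shape is typical of MCP-style tool listings, and exact field names may vary.

# Illustrative tool schema as the LLM sees it; field names are
# representative of MCP/OpenAI-style tool definitions, not copied
# verbatim from any one spec.
get_weather_schema = {
    "name": "get_weather",
    "description": "Return temperature and weather for a given city.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Bangalore'"},
        },
        "required": ["city"],
    },
}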

Q: What’s the difference between stdio and websocket?

  • stdio: local, same-machine execution over stdin/stdout

  • websocket: remote execution over the network (cloud/microservices)

Q: What can MCP expose?
✅ tools
✅ resources
✅ file systems


Full Project Structure

mcp-weather-server/
│
├── main.py              # Main MCP server entry
├── tools/
│   ├── weather.py       # Weather tool
│   ├── filesystem.py    # List file tool
│
├── resources/
│   └── config.py        # Sample resource
│
├── requirements.txt
└── README.md

Summary Table

Feature              Description
Tools                Functions the agent can execute
Resources            Static/dynamic information exposed to the LLM
FileSystem           Safe, restricted directory access
Protocols            stdio, WebSocket, HTTP (proxy)
Language Support     Python, JS, Java (soon), Go (soon)
Architecture Style   JSON-RPC 2.0

✅ Final Thoughts

MCP is quickly becoming the standard protocol for LLM-to-system integration.
For backend engineers, knowing MCP gives you a huge advantage in:

✅ AI system design
✅ Multi-agent architectures
✅ Tooling integration
✅ LLM-powered microservices

Building an Intelligent Stock Analysis Agent with MCP, Groq LLM, and Multi-Source Data

A complete walkthrough of my MCP-powered AI agent for real-time stock insights

GitHub Repo: https://github.com/kkvinodkumaran/mcp_agent_stock_demo


Introduction

In the era of LLM-powered automation, we're moving beyond simple “question → answer” chatbots. Modern AI agents plan, reason, select the right tools, and combine multiple data sources to generate deep, actionable insights.

This project — MCP Stock Analysis Agent — demonstrates how to build a fully intelligent stock-analysis workflow using:

Model Context Protocol (MCP)
Groq LLM (ultra-fast inference)
Multi-API data fusion (Yahoo Finance, Tavily, DuckDuckGo)
LLM-driven tool selection and planning
React UI + FastAPI backend + MCP server

It combines real-time market data, historical trends, company fundamentals, and news sentiment into a single, adaptive AI agent.


What Problem Does This Solve?

Traditional stock apps give you raw data: prices, charts, company descriptions, or scattered news articles. But they don’t answer real questions like:

  • “How is Tesla performing lately?”

  • “What’s the recent news about Apple?”

  • “Show me Microsoft’s long-term price trends.”

  • “Give me a full analysis of Nvidia today.”

Users don’t want to assemble multiple APIs or charts themselves.

We solve this by creating an AI agent that understands your query and automatically chooses the right combination of tools.

This agent:

  • Interprets your natural language

  • Determines what data you actually need

  • Calls the right MCP tools and APIs

  • Combines results

  • Generates a clean, human-level summary


Architecture: How the Intelligent Agent Works

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   User Query    │───▶│   Intelligent    │───▶│   MCP Server    │
│ (Natural Lang.) │     │   Agent (LLM)    │     │    (4 Tools)    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                               │                        │
                               ▼                        ▼
                       ┌─────────────────┐     ┌─────────────────┐
                       │   Tavily API    │     │  Yahoo Finance  │
                       │  (News Search)  │     │  (Stock Data)   │
                       └─────────────────┘     └─────────────────┘
                               │
                               ▼
                       ┌─────────────────┐
                       │    Groq LLM     │
                       │   (Planning +   │
                       │    Summary)     │
                       └─────────────────┘

✅ Components

  1. MCP Server (Port 8000)
    Hosts stock-related tools exposed via the Model Context Protocol.

  2. Agent Client (Port 8001)
    An LLM-powered agent that:

    • Understands user intent

    • Selects tools

    • Orchestrates workflow

    • Summarizes insights

  3. React UI (Port 3000)
    A simple frontend to query the agent.


Tools Exposed by the MCP Server

The MCP Server provides 4 main tools:

Tool                 What It Does
get_quote            Real-time stock price + basic metrics
get_stock_history    Historical data (for trend analysis)
get_company_info     Fundamentals, sector, market cap, description
company_news         DuckDuckGo-based recent news

Additionally, the agent can directly call:

Tavily API (Enhanced news search with relevance ranking)
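
To make the tools above concrete, here is a minimal sketch of what a get_quote-style tool could look like on the server side, built on the yfinance library. This is an assumption for illustration; the actual implementation in the repo may differ.

# Hypothetical sketch of a get_quote-style tool using yfinance.
# The real MCP server in the repo may implement this differently.
import yfinance as yf

def get_quote(symbol: str) -> dict:
    """Return the latest close price and basic metrics for a ticker."""
    ticker = yf.Ticker(symbol)
    history = ticker.history(period="1d")            # last trading day
    latest_close = float(history["Close"].iloc[-1])
    return {
        "symbol": symbol,
        "price": latest_close,
        "currency": ticker.info.get("currency", "USD"),
    }

print(get_quote("TSLA"))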


How the Agent Thinks (LLM-Based Planning & Tool Selection)

The heart of the system is Groq LLM, which interprets queries and builds a plan.

Example 1 — Comprehensive Analysis

User: “Give me a complete analysis of Tesla”
Agent reasoning:

“I need price, company fundamentals, historical trends, and recent news.”

✅ Tools Selected

  • get_quote

  • get_company_info

  • get_stock_history

  • search_news_tavily


Example 2 — News-Focused Query

User: “What’s the recent news about Apple?”
Agent reasoning:

“The user only needs news. No market data required.”

✅ Tool Selected

  • search_news_tavily


Example 3 — Technical Analysis

User: “Show me Microsoft price trends.”
Agent reasoning:

“Trend analysis requires historical + current price.”

✅ Tools Selected

  • get_stock_history

  • get_quote


System Architecture Overview

✅ Services

  • MCP Server → provides stock tools

  • Agent Client → coordinates LLM and tools

  • React UI → user interface

✅ Workflow

User Query → LLM Planning → Tool Execution → Data Fusion → AI Summary
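
The planning step can be as simple as asking the LLM to map the query to a list of tool names. Below is a minimal sketch of that idea using the Groq Python client; the prompt wording, tool list, and parsing are assumptions for illustration, not the repo's actual code.

# Hypothetical planning step: ask Groq which tools to run for a query.
# The actual agent in the repo may structure this differently.
import json
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

AVAILABLE_TOOLS = ["get_quote", "get_stock_history", "get_company_info", "search_news_tavily"]

def plan_tools(query: str) -> list[str]:
    """Return the subset of tools the LLM thinks the query needs."""
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[
            {"role": "system",
             "content": f"Pick the tools needed from {AVAILABLE_TOOLS}. "
                        "Reply with a JSON array of tool names only."},
            {"role": "user", "content": query},
        ],
    )
    return json.loads(response.choices[0].message.content)

print(plan_tools("Show me Microsoft price trends."))
# Expected to pick something like ["get_stock_history", "get_quote"]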

Quick Start Guide

✅ Prerequisites

  • Docker & Docker Compose
  • Groq API key (https://console.groq.com)
  • Tavily API key (https://tavily.com)

1️⃣ Clone the Repository

git clone https://github.com/kkvinodkumaran/mcp_agent_stock_demo
cd mcp_agent_stock_demo

2️⃣ Configure .env

GROQ_API_KEY=your_groq_key
TAVILY_API_KEY=your_tavily_key

3️⃣ Start With Docker

docker-compose up --build

Access the system:

  • React UI: http://localhost:3000
  • Agent Client API: http://localhost:8001
  • MCP Server: http://localhost:8000


API Usage

✅ Analyze Endpoint (LLM-Based)

curl -X POST "http://localhost:8001/analyze" \
  -H "Content-Type: application/json" \
  -d '{"query": "Analyze Tesla including recent trends"}'

Response includes:

  • LLM reasoning

  • Tools selected

  • Raw data

  • Final AI summary
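
A hypothetical response shape matching the fields above, shown as a Python dict; the real API may use different key names and nesting.

# Hypothetical /analyze response shape; actual field names may differ.
example_response = {
    "reasoning": "Need price, fundamentals, history and recent news for Tesla.",
    "tools_selected": ["get_quote", "get_company_info", "get_stock_history", "search_news_tavily"],
    "raw_data": {"get_quote": {"symbol": "TSLA", "price": 251.30}},
    "summary": "Tesla is trading at ... with recent news highlighting ...",
}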


Local Development

MCP Server

cd mcp_stock_server
uv sync
uv run python server.py

Agent

cd agent_client
uv sync
uv run uvicorn app.main:app --host 0.0.0.0 --port 8001

UI

cd ui
npm install
npm start

How It All Works (Under the Hood)

✅ Step-by-step pipeline

  1. User sends a natural-language query

  2. Groq LLM interprets the intent

  3. Agent selects required MCP tools

  4. Tools fetch data (Yahoo Finance, Tavily, DuckDuckGo)

  5. Agent merges data from all sources

  6. Groq LLM generates the final summary
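
Steps 5 and 6 of this pipeline can be sketched as a simple data-fusion call: merge all tool outputs into one prompt and ask Groq for the final summary. The function and prompt below are assumptions for illustration, not the repo's exact code.

# Hypothetical data-fusion + summary step (steps 5 and 6 above).
# The real agent may format the prompt and results differently.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def summarize(query: str, tool_results: dict) -> str:
    """Merge all tool outputs into one prompt and ask Groq for a summary."""
    merged = "\n\n".join(f"## {name}\n{result}" for name, result in tool_results.items())
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[
            {"role": "system", "content": "Summarize the data below to answer the user's question."},
            {"role": "user", "content": f"Question: {query}\n\nData:\n{merged}"},
        ],
    )
    return response.choices[0].message.content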


Intelligent Behaviors (Live Examples)

✅ “How is Tesla performing?”

Agent chooses:

  • get_quote

  • search_news_tavily

✅ “Give me Tesla's financial details”

Agent chooses:

  • get_company_info

  • get_quote

✅ “Analyze Tesla’s price trends”

Agent chooses:

  • get_stock_history

  • get_quote


Why MCP?

MCP (Model Context Protocol) is designed for:

Standardized tools
Dynamic discovery
LLM-friendly interfaces
Easy extensibility

This project shows how to expose your own tools for an AI agent.


Docker Deployment

  • All three services run in isolated containers

  • Health checks ensure reliability

  • Logs available via:

docker-compose logs agent-client
docker-compose logs mcp-server

Troubleshooting

LLM not selecting tools?

→ Check GROQ_API_KEY.

News not loading?

→ Check TAVILY_API_KEY.

MCP tools not available?

→ Check:

curl http://localhost:8000/list_tools

A Single-Agent Customer Support RAG System (FastAPI + LangChain + Groq + Chroma + Streamlit)


✅ 1. Problem We Are Solving

Modern customer support teams deal with:

  • Large PDF manuals (device manuals, onboarding guides, support documentation)

  • Repetitive questions (“How to reset?”, “Battery issue?”, “Warranty?”)

  • Delays in retrieving the correct answer

  • High dependency on support staff expertise

Goal:
Automate customer support by allowing users to:

  • Upload their product manuals (PDFs)
  • Ask natural-language questions
  • Receive accurate, contextual answers powered by RAG (Retrieval-Augmented Generation)

This reduces support load and improves response quality.


✅ 2. High-Level Solution

We build a single-agent RAG system:

  1. User uploads a PDF

  2. System extracts text + chunks + embeds

  3. Chunks stored in ChromaDB

  4. User asks a question

  5. System retrieves relevant chunks

  6. Groq LLM generates the answer using retrieved context

The agent does not use multi-agent coordination; instead, it performs the full RAG workflow independently.


✅ 3. Architecture Overview

┌──────────────────────────┐
│       Streamlit UI       │
│  - Upload PDF            │
│  - Ask question          │
└────────────┬─────────────┘
             │
             ▼  (FastAPI Backend)
┌─────────────────────────────────────────┐
│ 1. PDF Loader (PyPDF)                   │
│ 2. Text Splitter                        │
│ 3. Embeddings (HuggingFace)             │
│ 4. Vector Store (ChromaDB)              │
│ 5. Retriever (LangChain)                │
│ 6. Groq LLM (Llama models)              │
└────────────┬────────────────────────────┘
             │
             ▼
┌──────────────────────────┐
│   Final Answer (RAG)     │
│  Retrieved Chunks + LLM  │
└──────────────────────────┘

✅ 4. Technical Implementation

4.1 Backend — FastAPI

Responsibilities:

  • Accept PDF uploads

  • Convert PDF → raw text

  • Split text into chunks

  • Convert chunks → embeddings

  • Store embeddings in Chroma

  • Handle user queries

  • Perform retrieval + LLM answer generation

Key Components:

Component                        Purpose
PyPDF2                           PDF → text extraction
RecursiveCharacterTextSplitter   Splits into meaningful chunks
HuggingFaceEmbeddings            Embedding generation
Chroma                           Vector store for retrieval
Groq LLM                         Generates the final answer
LangChain Retrieval Pipeline     Glue connecting all steps

4.2 RAG Flow (Backend)

✅ Indexing Step (after PDF upload)

PDF → Text → Chunks → Embeddings → ChromaDB
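
A minimal sketch of this indexing step using the LangChain components named in the table above. The class names come from current LangChain packages (langchain-community, langchain-text-splitters, langchain-huggingface, langchain-chroma); the repo's actual code may be organized differently.

# Sketch of the indexing step: PDF -> text -> chunks -> embeddings -> ChromaDB.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

def index_pdf(pdf_path: str, persist_dir: str = "./data") -> Chroma:
    docs = PyPDFLoader(pdf_path).load()                      # PDF -> text pages
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=150
    ).split_documents(docs)                                  # text -> chunks
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    return Chroma.from_documents(chunks, embeddings, persist_directory=persist_dir)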

✅ Query Step (user question)

Question → Retrieve Top Chunks → LLM (Groq) → Answer
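
And a matching sketch of the query step: retrieve the top chunks and let a Groq-hosted Llama model answer from that context. The chain wiring and prompt are assumptions; the backend may compose them differently.

# Sketch of the query step: question -> retrieve top chunks -> Groq LLM -> answer.
# Assumes the vector store built in the indexing sketch above.
from langchain_groq import ChatGroq

def answer_question(vectorstore, question: str) -> str:
    retriever = vectorstore.as_retriever(search_kwargs={"k": 4})   # top-4 chunks
    context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))
    llm = ChatGroq(model="llama-3.1-8b-instant")                   # reads GROQ_API_KEY
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt).content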

4.3 UI — Streamlit

Responsibilities:

  • PDF upload

  • Send file to backend

  • Question textbox

  • Display final LLM answer

Ideal for rapid prototyping and demos.


✅ 5. Environment Setup 

Download the code from: https://github.com/kkvinodkumaran/support_rag_single_agent

Create a .env file in the project root:

GROQ_API_KEY=gsk_xxx
MODEL_NAME=llama-3.1-8b-instant

Note:
  • Get a Groq API key from https://console.groq.com
  • Get a Tavily API key (optional) from https://tavily.com

 


✅ 6. Running the System

6.1 Run using Docker Compose (recommended)

From the project root:

docker compose up --build

Access:


✅ 7. Local Development (uv)

Backend

cd backend
uv sync
GROQ_API_KEY=xxx uv run uvicorn app.main:app --reload

UI

cd ui
uv sync
BACKEND_HOST=localhost BACKEND_PORT=8000 uv run streamlit run streamlit_app.py

✅ 8. Data Persistence

  • All indexed embeddings are stored in ChromaDB

  • Path: ./data

  • Mounted inside backend container at /app/data

  • Safe across restarts


✅ 9. Summary

This project demonstrates:

✅ A single-agent RAG pipeline
✅ Built with modern Python tools (LangChain, Groq, Chroma, FastAPI)
✅ Clean UI via Streamlit
✅ Packaged with uv + Docker Compose
✅ Ready for production-grade expansion

Use this template to build:

  • Customer support bots

  • Internal knowledge base assistants

  • Document Q&A systems

  • Policy & compliance assistants

Multi-Agent RAG System for Backend Engineers


(Groq + LangGraph + FastAPI + Streamlit + ChromaDB — powered by uv & Docker)

A fully local, production-style Multi-Agent RAG workflow, designed for backend engineers, ML enthusiasts, and system design learners.

This project demonstrates how to build a real-world multi-agent system with retrieval, summarization, and report generation, using only lightweight, free-tier external APIs (Groq for LLM inference and Tavily for web search).


✅ 1. What Problem Are We Solving?

 Use Case: E-commerce Competitor Analysis

Companies often need to compare themselves with major competitors (Amazon, Flipkart, Walmart, Alibaba). Doing this manually requires:

  • Searching online sources
  • Collecting competitor information
  • Summarizing insights
  • Combining everything into a structured report

This is slow, manual, and error-prone.

 Our Multi-Agent RAG System Automates This

Given a topic like:

“Amazon competitor analysis”

The system will:

1️⃣ Perform targeted web research
2️⃣ Summarize each result
3️⃣ Store the research in a vector database
4️⃣ Retrieve context
5️⃣ Generate a final executive-level analytical report

All orchestrated via a LangGraph state machine.


✅ 2. Architecture Overview

User Input → Research Agent → Index Agent → Draft Agent → Final RAG Report

Each step is a LangGraph node with defined responsibilities.

┌───────────────────┐
│    User Input     │
│ "Nike ecommerce"  │
└────────┬──────────┘
         │
┌────────▼──────────────┐
│    Research Agent     │
│  - Web search         │
│  - LLM summaries      │
└────────┬──────────────┘
         │
┌────────▼──────────────┐
│     Index Agent       │
│  - Embed text         │
│  - Store in Chroma    │
└────────┬──────────────┘
         │
┌────────▼──────────────┐
│     Draft Agent       │
│  - Retrieve context   │
│  - Generate report    │
└────────┬──────────────┘
         │
┌────────▼────────────────┐
│ Final Competitor Report │
└─────────────────────────┘

✅ 3. Tools Used in the System

✅ LLM Tools

Tool               Purpose
Groq Llama 3       Ultra-fast summarization & report generation
llm_chat wrapper   Safe wrapper around Groq chat.completions

✅ Retrieval Tools

Tool                     Purpose
simple_search (Tavily)   Web search to fetch snippets
HuggingFaceEmbeddings    Convert text → embeddings
ChromaDB                 Local vector database for retrieval

✅ 4. Agents & What Tools They Use

Agent            Node Name   Responsibilities                           Tools Used
Research Agent   research    Web search + summarization                 simple_search, llm_chat
Index Agent      index       Embed text + store in vector DB            HF Embeddings, Chroma add_texts()
Draft Agent      draft       Retrieve context + generate final report   Chroma retrieve(), llm_chat

✅ 5. LangGraph State Machine

✅ State Model

from typing import List, Optional
from pydantic import BaseModel

class PipelineState(BaseModel):
    """Shared state passed between LangGraph nodes."""
    topic: str
    research_notes: List[str] = []
    indexed: bool = False
    context_snippets: List[str] = []
    final_report: Optional[str] = None
    error: Optional[str] = None

✅ Graph Nodes

Node       Purpose
research   Fetch data + summarize
index      Embed + store
draft      RAG + final report

✅ Graph Structure

START → research → index → draft → END
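
A minimal sketch of how this three-node graph could be wired with LangGraph, using the PipelineState model shown above. The node bodies here are stubs and the function names are assumptions; the actual project implements the full research, indexing, and drafting logic.

# Hypothetical LangGraph wiring for START -> research -> index -> draft -> END.
# Node bodies are stubbed; each returns a partial state update as a dict.
from langgraph.graph import StateGraph, START, END

def research_node(state: PipelineState) -> dict:
    return {"research_notes": ["...web search summaries..."]}

def index_node(state: PipelineState) -> dict:
    return {"indexed": True}

def draft_node(state: PipelineState) -> dict:
    return {"final_report": "...final competitor report..."}

graph = StateGraph(PipelineState)
graph.add_node("research", research_node)
graph.add_node("index", index_node)
graph.add_node("draft", draft_node)
graph.add_edge(START, "research")
graph.add_edge("research", "index")
graph.add_edge("index", "draft")
graph.add_edge("draft", END)

app = graph.compile()
result = app.invoke({"topic": "Amazon competitor analysis"})
print(result["final_report"])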

✅ 6. Complete Tech Stack

  • Python 3.11
  • uv
  • FastAPI
  • Streamlit
  • LangGraph
  • Groq LLM
  • ChromaDB
  • HuggingFace Sentence Transformers

✅ 7. Running the System

Prerequisites

  • Docker & Docker Compose (or Python 3.11 + uv for local development)
  • Groq API key
  • Tavily API key


⚙️ Setup Environment

You only need one .env file depending on how you run the system:

Option 1: Docker. Create .env in the root directory:

cp .env.example .env

Edit the root .env file with your API keys:

GROQ_API_KEY=gsk_xxx
TAVILY_API_KEY=tvly_xxx
MODEL_NAME=llama-3.1-8b-instant
EMBED_MODEL=sentence-transformers/all-MiniLM-L6-v2
CHROMA_DIR=./data/chroma
BACKEND_HOST=backend
BACKEND_PORT=8000

Option 2: Local Development

Create .env in the backend/ directory:

cp .env.example backend/.env

Edit backend/.env with your API keys:

GROQ_API_KEY=gsk_xxx
TAVILY_API_KEY=tvly_xxx
MODEL_NAME=llama-3.1-8b-instant
EMBED_MODEL=sentence-transformers/all-MiniLM-L6-v2
CHROMA_DIR=./data/chroma

Note: You don't need both files; choose the location based on your deployment method.


Run With Docker (Recommended)

docker compose up --build

Open:


Local Dev (Optional)

Backend

cd backend
uv sync
uv run uvicorn app:app --reload --port 8000

UI

cd ui
uv sync
uv run streamlit run streamlit_app.py

✅ 8. Notes

  • ChromaDB persists to ./data/chroma
  • Replace Groq with Ollama for full offline mode
  • Extend system with more agents (Planner, Validator, Router)

✅ 9. Summary

This project demonstrates a real, production-grade:

  • Multi-Agent workflow
  • Retrieval-Augmented Generation
  • LangGraph orchestration
  • Chroma-based local vector retrieval
  • Groq Llama inference pipeline
  • Fully local deployment via Docker

