ORCHESTRATION
LangChain
Hierarchical retrieval chain with custom reranking step. Contextual compression ensures GPT-4 only sees the most relevant sentences.
VECTOR STORE
Pinecone
Three namespace strategy: product docs, knowledge base, and FAQs searched in parallel. Metadata filters for version-aware retrieval.
EMBEDDINGS
Hugging Face
e5-large-v2 for document and query embedding. Instruction-tuned for asymmetric search — short query vs. long passage retrieval.
SYNTHESIS
GPT-4 Turbo
Structured answer generation with citation enforcement. System prompt requires every claim to reference a retrieved source chunk.
MEMORY
Persistent Sessions
Redis-backed session memory. 20-turn context window per user. Agent remembers the full conversation without re-ingesting history.
API
FastAPI + Python
Async REST endpoints with streaming support. Response tokens streamed to client for perceived low-latency experience.