SVC—004 ● LIVE RAG / SUPPORT JAN–MAR 2024

Enterprise
RAG
Support

Customer Intelligence Agent

Multi-stage RAG with hierarchical retrieval and context reranking across 100K+ documents. Persistent session memory means the agent never loses context between turns. Resolves 92% of queries without human escalation — each answer cited back to the source document.

92%
RESOLUTION RATE
-60%
RESPONSE TIME
100K+
DOCS INDEXED
LangChainPineconeGPT-4Hugging FaceFastAPIPython
// UI PREVIEW — SUPPORT AGENT INTERFACE
ENTERPRISE SUPPORT AI Support Console
92% resolution rate
INDEX STATS
TOTAL DOCS103,847
INDEXES3 namespaces
CHUNKS2.1M vectors
EMBED MODELhf/e5-large
RERANKERON
MEMORYPersistent
SESSION7 turns
RESOLVED92%
SUPPORT CONVERSATION
USER
How do I configure SSO with Okta for our enterprise plan?
AGENT
To configure Okta SSO: 1. Go to Settings → Security → SSO. 2. Select SAML 2.0. 3. Enter your Okta metadata URL. 4. Map the email attribute to user.email. Your IT admin needs the SP Entity ID from our dashboard.
docs/sso-setup.md §3kb/okta-guide.pdf p.7
USER
What's the SP Entity ID format?
AGENT
It follows the pattern: https://app.yourservice.com/saml/metadata/<org-id>. You'll find your org-id under Settings → Account → Organization.
docs/sso-setup.md §4.2
RETRIEVED CONTEXT
RANK 1 · score 0.96 · sso-setup.md
SAML 2.0 configuration requires the SP Entity ID and ACS URL from the enterprise settings panel...
RANK 2 · score 0.91 · okta-guide.pdf
For Okta integration, map the NameID format to EmailAddress and configure the attribute statements...
RANK 3 · score 0.84 · enterprise-faq.md
Enterprise SSO is available on the Business and Enterprise tiers. Contact your account manager to enable...

The Pipeline

USER
QUERY
HF Embed
Query
Pinecone
Multi-Index
Context
Reranker
GPT-4
Synthesis
CITED
RESPONSE
STEP 01
Query Embed
User query embedded via Hugging Face e5-large model. Same embedding space as document index for semantic alignment.
STEP 02
Multi-Index Search
Query vector searched across 3 Pinecone namespaces (docs, KB, FAQs). Top-20 candidates per namespace retrieved.
STEP 03
Context Reranking
Cross-encoder reranks 60 candidates → top-5. Contextual compression removes irrelevant sentences from each chunk.
STEP 04
GPT-4 Synthesis
Reranked context + session memory injected into GPT-4 prompt. Response generated with mandatory source citation.
STEP 05
Memory Update
Conversation turn appended to persistent Redis session. Memory window: 20 turns. Context always fresh across sessions.
03 // STACK

Built with

ORCHESTRATION
LangChain
Hierarchical retrieval chain with custom reranking step. Contextual compression ensures GPT-4 only sees the most relevant sentences.
VECTOR STORE
Pinecone
Three namespace strategy: product docs, knowledge base, and FAQs searched in parallel. Metadata filters for version-aware retrieval.
EMBEDDINGS
Hugging Face
e5-large-v2 for document and query embedding. Instruction-tuned for asymmetric search — short query vs. long passage retrieval.
SYNTHESIS
GPT-4 Turbo
Structured answer generation with citation enforcement. System prompt requires every claim to reference a retrieved source chunk.
MEMORY
Persistent Sessions
Redis-backed session memory. 20-turn context window per user. Agent remembers the full conversation without re-ingesting history.
API
FastAPI + Python
Async REST endpoints with streaming support. Response tokens streamed to client for perceived low-latency experience.
See the rest
of the deployments.
← ALL PROJECTS