SVC—004 ● LIVE RAG / SUPPORT JAN–MAR 2024

Enterprise
RAG
Support

Customer Intelligence Agent

Multi-stage RAG with hierarchical retrieval and context reranking across 100K+ documents. Persistent session memory means the agent never loses context between turns. Resolves 92% of queries without human escalation — each answer cited back to the source document.

92%

RESOLUTION RATE

-60%

RESPONSE TIME

100K+

DOCS INDEXED

LangChainPineconeGPT-4Hugging FaceFastAPIPython

// UI PREVIEW — SUPPORT AGENT INTERFACE

ENTERPRISE SUPPORT AI Support Console

92% resolution rate

INDEX STATS

TOTAL DOCS103,847

INDEXES3 namespaces

CHUNKS2.1M vectors

EMBED MODELhf/e5-large

RERANKERON

MEMORYPersistent

SESSION7 turns

RESOLVED92%

SUPPORT CONVERSATION

USER

How do I configure SSO with Okta for our enterprise plan?

AGENT

To configure Okta SSO: 1. Go to Settings → Security → SSO. 2. Select SAML 2.0. 3. Enter your Okta metadata URL. 4. Map the email attribute to user.email. Your IT admin needs the SP Entity ID from our dashboard.

docs/sso-setup.md §3kb/okta-guide.pdf p.7

USER

What's the SP Entity ID format?

AGENT

It follows the pattern: https://app.yourservice.com/saml/metadata/<org-id>. You'll find your org-id under Settings → Account → Organization.

docs/sso-setup.md §4.2

RETRIEVED CONTEXT

RANK 1 · score 0.96 · sso-setup.md

SAML 2.0 configuration requires the SP Entity ID and ACS URL from the enterprise settings panel...

RANK 2 · score 0.91 · okta-guide.pdf

For Okta integration, map the NameID format to EmailAddress and configure the attribute statements...

RANK 3 · score 0.84 · enterprise-faq.md

Enterprise SSO is available on the Business and Enterprise tiers. Contact your account manager to enable...

02 // ARCHITECTURE

The Pipeline

USER
QUERY

→

HF Embed
Query

→

Pinecone
Multi-Index

→

Context
Reranker

→

GPT-4
Synthesis

→

CITED
RESPONSE

STEP 01

Query Embed

User query embedded via Hugging Face e5-large model. Same embedding space as document index for semantic alignment.

STEP 02

Multi-Index Search

Query vector searched across 3 Pinecone namespaces (docs, KB, FAQs). Top-20 candidates per namespace retrieved.

STEP 03

Context Reranking

Cross-encoder reranks 60 candidates → top-5. Contextual compression removes irrelevant sentences from each chunk.

STEP 04

GPT-4 Synthesis

Reranked context + session memory injected into GPT-4 prompt. Response generated with mandatory source citation.

STEP 05

Memory Update

Conversation turn appended to persistent Redis session. Memory window: 20 turns. Context always fresh across sessions.

03 // STACK

Built with

ORCHESTRATION

LangChain

Hierarchical retrieval chain with custom reranking step. Contextual compression ensures GPT-4 only sees the most relevant sentences.

VECTOR STORE

Pinecone

Three namespace strategy: product docs, knowledge base, and FAQs searched in parallel. Metadata filters for version-aware retrieval.

EMBEDDINGS

Hugging Face

e5-large-v2 for document and query embedding. Instruction-tuned for asymmetric search — short query vs. long passage retrieval.

SYNTHESIS

GPT-4 Turbo

Structured answer generation with citation enforcement. System prompt requires every claim to reference a retrieved source chunk.

MEMORY

Persistent Sessions

Redis-backed session memory. 20-turn context window per user. Agent remembers the full conversation without re-ingesting history.

API

FastAPI + Python

Async REST endpoints with streaming support. Response tokens streamed to client for perceived low-latency experience.

EnterpriseRAGSupport

The Pipeline

Built with

Enterprise
RAG
Support