SVC-001 · ● LIVE · VOICE AI · JAN–MAR 2025

WebRTC Voice Agent

GPT-4o Realtime

An enterprise-grade phone agent, not a chatbot. It handles real-time voice conversations over WebRTC, maintains contextual memory across the call, and syncs every interaction bidirectionally with HubSpot CRM. Built for production traffic, with sub-800ms end-to-end latency from speech input to voiced response.

95% ACCURACY
<800ms LATENCY
+45% ENGAGEMENT

GPT-4o Realtime · WebRTC · Twilio · ElevenLabs · HubSpot API · AWS Lambda · Redis

// UI PREVIEW — LIVE CALL OPERATIONS DASHBOARD

VOICE AGENT SYSTEM Live Call Dashboard

Call in progress

CALL STATUS

STATUS: ● ACTIVE
DURATION: 04:32
CHANNEL: WebRTC / Twilio
LATENCY: 612ms
MODEL: gpt-4o-realtime
TTS: ElevenLabs
MEMORY: Loaded (7 turns)
INTENT: product_inquiry

LIVE TRANSCRIPT

AGENT: Hi there, this is Alex from the support team. How can I help you today?

CALLER: Yes, I wanted to ask about the enterprise pricing for your platform.

AGENT: Absolutely. Based on your company size I can walk you through our Team and Enterprise tiers. Would you prefer a demo first?

CALLER: A demo would be great, yes. Next week works.

AGENT: Perfect. I'm scheduling that now. You'll get a confirmation at the email on file.

CRM SYNC LOG

04:12  Contact record updated (intent: enterprise_inquiry)
04:18  Lead score raised: 42 → 87
04:29  Meeting scheduled: Demo, Next Tue 3:00 PM
04:30  Webhook fired → HubSpot deal pipeline stage: Qualified
04:31  Email confirmation queued via n8n workflow

02 // ARCHITECTURE

The Pipeline

CALLER → Twilio → WebRTC → FastAPI → GPT-4o Realtime → ElevenLabs TTS → HubSpot CRM

STEP 01

WebRTC Connect

Caller dials in via Twilio. WebRTC stream established to FastAPI backend over secure WSS channel.

STEP 02

Audio Processing

Raw audio chunks are streamed in real time to the GPT-4o Realtime API. There is no intermediate STT step: the model understands voice natively.
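
To make the chunked streaming concrete, here is a minimal sketch of slicing raw PCM audio into fixed-size frames and wrapping each one in an `input_audio_buffer.append` event, the shape the Realtime API accepts on its event interface. The 24 kHz mono PCM16 format and the 20 ms frame size are assumptions for illustration, and the helper names are hypothetical; when audio travels as a WebRTC media track, this framing is handled by the transport instead.

```python
import base64
import json
from typing import Iterator

SAMPLE_RATE_HZ = 24_000  # assumption: 24 kHz mono PCM16 input
BYTES_PER_SAMPLE = 2     # 16-bit samples
FRAME_MS = 20            # assumption: 20 ms frames


def frame_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ, frame_ms: int = FRAME_MS) -> int:
    """Number of bytes in one frame of mono PCM16 audio."""
    return sample_rate_hz * frame_ms // 1000 * BYTES_PER_SAMPLE


def iter_frames(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive frames; the last one may be shorter than `size`."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def to_append_event(frame: bytes) -> str:
    """Serialize one frame as a JSON event carrying base64-encoded audio."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(frame).decode("ascii"),
    })
```

Small frames keep the time-to-first-byte low at the cost of more events per second; the frame size is the main knob to tune here.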

STEP 03

Context & Memory

Redis session store injects prior conversation history. Agent maintains context across the full call duration.
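
As a rough sketch of session-scoped memory, the class below keeps a capped list of turns per call. It uses an in-memory dict as a stand-in so it runs without a server; with Redis, the same pattern would map to `RPUSH` to append, `LTRIM` to cap the list, `LRANGE` to read, and `EXPIRE` to drop the session after the call. The key format, the turn cap, and the class name are assumptions, not details taken from the project.

```python
import json
from collections import defaultdict
from typing import Dict, List

MAX_TURNS = 20  # assumption: cap on how many turns are kept per call


class SessionMemory:
    """In-memory stand-in for a Redis list keyed by call ID."""

    def __init__(self, max_turns: int = MAX_TURNS) -> None:
        self._max_turns = max_turns
        self._store: Dict[str, List[str]] = defaultdict(list)

    @staticmethod
    def _key(call_id: str) -> str:
        return f"call:{call_id}:turns"  # hypothetical key format

    def append_turn(self, call_id: str, role: str, text: str) -> None:
        turns = self._store[self._key(call_id)]
        turns.append(json.dumps({"role": role, "text": text}))  # like RPUSH
        del turns[:-self._max_turns]  # keep only the newest turns, like LTRIM

    def history(self, call_id: str) -> List[dict]:
        """Return stored turns, oldest first (like LRANGE key 0 -1)."""
        return [json.loads(t) for t in self._store.get(self._key(call_id), [])]
```

Capping the list bounds the prompt size on long calls; a production version would likely summarize dropped turns rather than discard them.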

STEP 04

TTS Response

GPT-4o response streamed to ElevenLabs for low-latency neural TTS synthesis. Audio piped back to caller in under 800ms.
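
One way to keep an 800ms target honest is to time each stage of a turn and compare the total against the budget. The sketch below is illustrative only: the stage names and the class are hypothetical, and only the 800ms figure comes from the text above.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

BUDGET_MS = 800.0  # end-to-end target stated for the pipeline


class LatencyTrace:
    """Collects per-stage timings for one conversational turn."""

    def __init__(self, budget_ms: float = BUDGET_MS) -> None:
        self.budget_ms = budget_ms
        self.stages: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and store the result in milliseconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stages[name] = (time.perf_counter() - start) * 1000.0

    def record(self, name: str, elapsed_ms: float) -> None:
        """Store a timing that was measured elsewhere."""
        self.stages[name] = elapsed_ms

    @property
    def total_ms(self) -> float:
        return sum(self.stages.values())

    def within_budget(self) -> bool:
        return self.total_ms < self.budget_ms
```

Usage would look like `with trace.stage("tts"): ...` around each step, followed by a check of `trace.within_budget()` once the first audio byte is sent back.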

STEP 05

CRM Sync

Post-call: intent, lead score, scheduled meetings, and the full transcript are written to HubSpot via webhook. Prior contact history is read from HubSpot before the call, which makes the sync bidirectional.
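
A minimal sketch of the write side: building the request for a HubSpot contact update without sending it. The `PATCH /crm/v3/objects/contacts/{contactId}` path and the `{"properties": {...}}` body are the CRM v3 shape; the property names `call_intent` and `lead_score` are hypothetical custom properties that would need to exist in the portal, and `CallOutcome` is an illustrative type.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CallOutcome:
    contact_id: str
    intent: str
    lead_score: int


def build_contact_update(outcome: CallOutcome) -> Tuple[str, str, str]:
    """Return (method, path, json_body) for a contact property update."""
    path = f"/crm/v3/objects/contacts/{outcome.contact_id}"
    body = json.dumps(
        {
            "properties": {
                "call_intent": outcome.intent,          # hypothetical property
                "lead_score": str(outcome.lead_score),  # hypothetical property
            }
        },
        sort_keys=True,
    )
    return "PATCH", path, body
```

Keeping the request builder separate from the HTTP call makes the payload easy to unit test and to replay if a write fails.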

03 // STACK

Built with

VOICE MODEL

GPT-4o Realtime

Native real-time audio API. Understands speech directly without intermediate STT, enabling natural voice-to-voice flow.

TRANSPORT

WebRTC + Twilio

WebRTC handles peer-to-peer audio streaming. Twilio bridges PSTN callers to the WebRTC infrastructure.

TEXT-TO-SPEECH

ElevenLabs

Neural TTS with custom voice cloning. Streaming output to keep end-to-end latency under 800ms.
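
A common way to keep streaming TTS latency low is to forward text to the synthesizer one sentence at a time instead of waiting for the full response. The sketch below shows that buffering step only; it is an assumption about how such a pipeline could work, not a description of this project's code, and the boundary rule is deliberately naive (abbreviations such as "p.m." would split early).

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed text tokens into sentences ready for synthesis."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush as soon as the buffered text ends a sentence.
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    # Flush any trailing text left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can be sent to the TTS stream immediately, so audio for the first sentence starts while later tokens are still arriving.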

MEMORY

Redis

Session-scoped conversation memory. Persists context between turns so the agent never loses track of the call.

CRM

HubSpot API

Bidirectional sync — reads prior contact history before the call, writes outcomes, scores, and meetings after.

INFRASTRUCTURE

AWS Lambda

Serverless backend for webhook handling and async CRM writes. Auto-scales under production traffic spikes.
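
As an illustration of the webhook side, here is a small Lambda handler that validates an incoming call-outcome payload and acknowledges it. It assumes an API Gateway proxy-style event with a JSON string in `body`; the required field names are hypothetical, and the actual queueing of the CRM write is left as a comment.

```python
import json
from typing import Any, Dict

REQUIRED_FIELDS = ("call_id", "intent")  # hypothetical payload fields


def _response(status: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Validate the webhook body and acknowledge it for async processing."""
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "invalid JSON"})
    if not isinstance(payload, dict):
        return _response(400, {"error": "expected a JSON object"})

    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return _response(422, {"error": "missing fields", "fields": missing})

    # A real deployment would enqueue the CRM write here (for example to a queue)
    # so the webhook returns quickly and retries are handled out of band.
    return _response(202, {"accepted": payload["call_id"]})
```

Returning 202 rather than 200 signals that the write was accepted but not yet applied, which matches the asynchronous CRM writes described above.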
