SVC—005 ● LIVE COMPUTER VISION JAN 2026

Vision PDF Layout Detection Engine

A YOLOv11-powered computer vision pipeline that renders PDF pages as images, detects multi-column layouts and table regions via bounding box inference, then semantically classifies each region using Sentence Transformers + LightSVM. Trained on 2.4K+ annotated document images. CPU-optimized and fully Dockerized for production deployment.

2.4K+ TRAIN IMAGES · CPU OPTIMIZED · 100% DOCKERIZED
YOLOv11 · Sentence Transformers · LightSVM · PyMuPDF · pdfplumber · FastAPI · Roboflow
// UI PREVIEW — DOCUMENT PROCESSING DASHBOARD
VISION LAYOUT ENGINE · Document Analysis · Processing complete
PDF QUEUE: resume_john_doe.pdf, invoice_2024_q3.pdf, contract_acme.pdf, report_annual.pdf, form_w2_2024.pdf
2 done · 1 processing · 2 queued
REGION DETECTION: resume_john_doe.pdf
HEADER 0.97 · SECTION 0.94 · EXPERIENCE 0.91 · SKILLS 0.89
CLASSIFICATION OUTPUT
HEADER: John Doe, Senior Engineer
SECTION: Professional Summary
EXPERIENCE: Work History Block
SKILLS: Technical Skills List
4 regions · conf threshold 0.45

The Pipeline

PDF Input → PyMuPDF Render → YOLOv11 Detect → Sentence Transformers → LightSVM Classify → Semantic JSON
STEP 01
PDF Render
PyMuPDF renders each PDF page to a high-resolution PNG at 150 DPI. pdfplumber extracts raw text and bounding-box coordinates in parallel.
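A minimal sketch of this step, assuming PyMuPDF (imported as `fitz`) and pdfplumber are installed. The function names `points_to_pixels`, `render_page_png`, and `extract_words` are illustrative, not the project's own. PDF coordinates are measured in points (72 per inch), so a 150 DPI render scales them by 150/72; the first helper converts a pdfplumber box into pixel space so it can later be compared with detections on the rendered image.

```python
DPI = 150                 # render resolution stated in the text
POINTS_PER_INCH = 72      # PDF user-space units per inch


def points_to_pixels(bbox, dpi=DPI):
    """Scale an (x0, top, x1, bottom) box from PDF points to image pixels."""
    scale = dpi / POINTS_PER_INCH
    return tuple(round(v * scale, 2) for v in bbox)


def render_page_png(pdf_path, page_index, out_path, dpi=DPI):
    """Render one page to a PNG file with PyMuPDF and return its pixel size."""
    import fitz  # PyMuPDF; imported lazily so the helper above has no dependencies

    with fitz.open(pdf_path) as doc:
        pix = doc[page_index].get_pixmap(dpi=dpi)
        pix.save(out_path)
        return pix.width, pix.height


def extract_words(pdf_path, page_index):
    """Return (text, bbox_in_points) pairs for one page using pdfplumber."""
    import pdfplumber  # lazy import, same reason as above

    with pdfplumber.open(pdf_path) as pdf:
        words = pdf.pages[page_index].extract_words()
    return [(w["text"], (w["x0"], w["top"], w["x1"], w["bottom"])) for w in words]
```

This sketch runs the two engines one after the other and ignores page rotation and crop boxes, which a production pipeline would need to handle.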
STEP 02
YOLO Detection
YOLOv11 runs inference on the rendered page images and detects bounding boxes for text blocks, headers, tables, and columns, keeping detections with confidence ≥ 0.45.
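A hedged sketch of the detection call, assuming the `ultralytics` package and a custom-trained weights file (the `weights_path` argument is a placeholder; the real file name is not given here). The dictionary layout for a detection (`bbox`, `conf`, `label`) is an assumption made for illustration.

```python
CONF_THRESHOLD = 0.45  # confidence threshold stated in the text


def keep_confident(detections, threshold=CONF_THRESHOLD):
    """Keep detections whose confidence meets the threshold, highest first."""
    kept = [d for d in detections if d["conf"] >= threshold]
    return sorted(kept, key=lambda d: d["conf"], reverse=True)


def detect_regions(image_path, weights_path, threshold=CONF_THRESHOLD):
    """Run a YOLO model on one page image and return plain-dict detections."""
    from ultralytics import YOLO  # lazy import: needs the ultralytics package

    model = YOLO(weights_path)  # weights_path: placeholder for the trained weights
    result = model.predict(image_path, conf=threshold, device="cpu")[0]
    detections = []
    for xyxy, conf, cls in zip(result.boxes.xyxy.tolist(),
                               result.boxes.conf.tolist(),
                               result.boxes.cls.tolist()):
        detections.append({
            "bbox": tuple(xyxy),              # pixel coordinates on the rendered page
            "conf": conf,
            "label": result.names[int(cls)],  # class id mapped to its name
        })
    return keep_confident(detections, threshold)
```

Passing `conf` to `predict` already filters inside the model call; `keep_confident` is kept as a separate pure function so the threshold logic can be reused and tested without loading a model.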
STEP 03
Region Extraction
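Overlapping boxes are deduplicated via non-maximum suppression (NMS), which keeps the highest-confidence box in each overlapping group. The surviving boxes are mapped back to the extracted text via coordinate overlap, so each region carries its raw text content.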
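The two operations in this step can be sketched in plain Python. This is an illustration under assumptions, not the project's code: the NMS here is the greedy, class-agnostic variant with an assumed IoU threshold of 0.5, and a word is assigned to a region when its centre point falls inside the box. Boxes and words must already share one coordinate system (for example, both in pixels).

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS: keep the highest-confidence box, drop boxes that overlap it."""
    kept = []
    for det in sorted(detections, key=lambda d: d["conf"], reverse=True):
        if all(iou(det["bbox"], k["bbox"]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


def region_text(bbox, words):
    """Join words whose centre lies inside bbox, top-to-bottom then left-to-right."""
    inside = []
    for text, (x0, y0, x1, y1) in words:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if bbox[0] <= cx <= bbox[2] and bbox[1] <= cy <= bbox[3]:
            inside.append((y0, x0, text))
    return " ".join(text for _, _, text in sorted(inside))
```

Sorting by the raw top coordinate is a simplification: words on the same visual line with slightly different baselines may need to be grouped into lines first.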
STEP 04
Semantic Embed
Each text region is encoded via Sentence Transformers (all-MiniLM-L6-v2) into a 384-dimensional vector capturing its semantic meaning.
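A short sketch of the embedding step, assuming the `sentence-transformers` package is installed and the model can be downloaded or is cached locally. `embed_regions` is an illustrative name. The cosine helper shows how such vectors are usually compared: texts with similar meaning tend to produce vectors that point in similar directions.

```python
import math

MODEL_NAME = "all-MiniLM-L6-v2"  # model named in the text; produces 384-dim vectors


def embed_regions(texts, model=None):
    """Encode region texts into an array of shape (len(texts), 384)."""
    from sentence_transformers import SentenceTransformer  # lazy: heavy dependency

    model = model or SentenceTransformer(MODEL_NAME)
    # Unit-length vectors make cosine similarity a plain dot product downstream.
    return model.encode(texts, normalize_embeddings=True)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

In a long-running service the model would typically be loaded once and passed in through the `model` argument rather than constructed on every call.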
STEP 05
SVM Classify
LightSVM classifies each embedding into a document section type: header, experience, skills, education, summary, table, or footer. Results are returned as structured JSON.
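A sketch of the classify-and-serialise step under stated assumptions. scikit-learn's `LinearSVC` is swapped in as a stand-in linear SVM, since the classifier's own API is not shown here, and the JSON layout (`regions`, `label`, `bbox`, `text`) is a hypothetical schema chosen for illustration.

```python
import json

# Section types listed in the text.
SECTION_TYPES = ["header", "experience", "skills", "education",
                 "summary", "table", "footer"]


def train_classifier(embeddings, labels):
    """Fit a linear SVM on embedding vectors (scikit-learn stand-in)."""
    from sklearn.svm import LinearSVC  # assumption: substitute for the project's SVM

    clf = LinearSVC()
    clf.fit(embeddings, labels)
    return clf


def to_json(regions, predicted_labels):
    """Pair each region with its predicted section type and serialise to JSON."""
    items = []
    for region, label in zip(regions, predicted_labels):
        if label not in SECTION_TYPES:
            raise ValueError(f"unknown section type: {label}")
        items.append({
            "label": label,
            "bbox": list(region["bbox"]),
            "text": region["text"],
        })
    return json.dumps({"regions": items}, indent=2)
```

With a fitted classifier, `to_json(regions, clf.predict(embeddings))` would produce the final payload, one entry per detected region.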
03 // STACK

Built with

DETECTION
YOLOv11
Custom-trained on 2.4K+ annotated document images via Roboflow. Detects layout regions with mAP@0.5 above 0.87 on a held-out test set.
EMBEDDINGS
Sentence Transformers
all-MiniLM-L6-v2 encodes each extracted text region. Lightweight enough for CPU deployment with negligible inference overhead.
CLASSIFIER
LightSVM
SVM classifier trained on the embedding features. Outperforms fine-tuned BERT classifiers while running at 40× lower inference latency on CPU.
PDF PARSING
PyMuPDF + pdfplumber
PyMuPDF for page rendering. pdfplumber for text extraction with position metadata. Dual-engine approach handles scanned and native PDFs.
TRAINING DATA
Roboflow
Annotation pipeline and dataset versioning. Augmentations (rotation, crop, brightness shifts) improve robustness to real-world document variance.
API
FastAPI + Docker
REST endpoint accepts a PDF upload and returns structured JSON with section labels. The Docker image runs CPU-only, with no GPU dependency.
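A minimal sketch of such an endpoint, assuming `fastapi` and `python-multipart` are installed. The route name `/analyze`, the response fields, and the `run_pipeline` placeholder are hypothetical; the actual service's routes and schema are not shown here.

```python
PDF_MAGIC = b"%PDF-"  # well-formed PDF files begin with this header


def is_pdf(data: bytes) -> bool:
    """Cheap sanity check on the upload before running the pipeline."""
    return data.startswith(PDF_MAGIC)


def run_pipeline(data: bytes) -> list:
    """Hypothetical placeholder for render, detect, extract, embed, classify."""
    return []


def create_app():
    """Build the FastAPI app; imports are lazy so the helpers stay dependency-free."""
    from fastapi import FastAPI, File, HTTPException, UploadFile

    app = FastAPI()

    @app.post("/analyze")  # hypothetical route name
    async def analyze(file: UploadFile = File(...)):
        data = await file.read()
        if not is_pdf(data):
            raise HTTPException(status_code=400, detail="Uploaded file is not a PDF")
        return {"filename": file.filename, "regions": run_pipeline(data)}

    return app
```

If this lived in a module named `main.py`, it could be served with `uvicorn main:create_app --factory`, which needs no GPU.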