SVC—005 ● LIVE COMPUTER VISION JAN 2026

Vision PDF Layout

Detection Engine

A YOLOv11-powered computer vision pipeline that renders PDF pages as images, detects multi-column layouts and table regions via bounding box inference, then semantically classifies each region using Sentence Transformers + LightSVM. Trained on 2.4K+ annotated document images. CPU-optimized and fully Dockerized for production deployment.

2.4K+ TRAIN IMAGES · CPU OPTIMIZED · 100% DOCKERIZED

YOLOv11 · Sentence Transformers · LightSVM · PyMuPDF · pdfplumber · FastAPI · Roboflow

// UI PREVIEW — DOCUMENT PROCESSING DASHBOARD

VISION LAYOUT ENGINE Document Analysis

Processing complete

PDF QUEUE

📄 resume_john_doe.pdf ✓
📄 invoice_2024_q3.pdf ✓
📄 contract_acme.pdf ●
📄 report_annual.pdf –
📄 form_w2_2024.pdf –

2 done · 1 processing · 2 queued

REGION DETECTION — resume_john_doe.pdf

HEADER 0.97

SECTION 0.94

EXPERIENCE 0.91

SKILLS 0.89

CLASSIFICATION OUTPUT

HEADER: John Doe — Senior Engineer
SECTION: Professional Summary
EXPERIENCE: Work History Block
SKILLS: Technical Skills List

4 regions · conf threshold 0.45

02 // ARCHITECTURE

The Pipeline

PDF Input → PyMuPDF Render → YOLOv11 Detect → Sentence Transformers → LightSVM Classify → Semantic JSON

STEP 01

PDF Render

PyMuPDF renders each PDF page to high-resolution PNG at 150 DPI. pdfplumber extracts raw text and bounding coordinates in parallel.
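
A minimal sketch of this step, assuming the rendered images and the pdfplumber word boxes must later be compared in one coordinate space. PDF user space uses 72 points per inch, so a box detected on a 150 DPI image is scaled by 72/150 to reach PDF points. The function names, the output file naming, and the sequential (rather than parallel) loop are illustrative assumptions, not the project's code. The `dpi=` argument to `get_pixmap` needs a reasonably recent PyMuPDF release.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

PDF_POINTS_PER_INCH = 72.0  # PDF user space: 72 points per inch


def pixels_to_points(box: Box, dpi: int = 150) -> Box:
    """Convert a pixel-space box on the rendered image to PDF points."""
    return tuple(v * PDF_POINTS_PER_INCH / dpi for v in box)


def render_and_extract(pdf_path: str, dpi: int = 150) -> List[Dict]:
    """Render each page to PNG and collect pdfplumber word boxes (PDF points)."""
    # Imported lazily so pixels_to_points stays dependency-free.
    import fitz  # PyMuPDF
    import pdfplumber

    pages = []
    with fitz.open(pdf_path) as doc, pdfplumber.open(pdf_path) as plumber:
        for index, page in enumerate(doc):
            png_path = f"page_{index:03d}.png"  # hypothetical naming scheme
            page.get_pixmap(dpi=dpi).save(png_path)
            # Each word is a dict with "text", "x0", "top", "x1", "bottom".
            words = plumber.pages[index].extract_words()
            pages.append({"image": png_path, "words": words})
    return pages
```

For example, `pixels_to_points((150, 300, 300, 450))` maps a box on the 150 DPI image to `(72.0, 144.0, 144.0, 216.0)` in PDF points.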

STEP 02

YOLO Detection

YOLOv11 runs inference on rendered page images. Detects bounding boxes for text blocks, headers, tables, and columns at conf ≥ 0.45.
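
The detection step might look roughly like the following, assuming the model is loaded through the Ultralytics Python package and that the weights file path is supplied by the caller. The helper names and the dictionary layout of a detection are hypothetical. The only value taken from the text is the 0.45 confidence threshold.

```python
from typing import Dict, List

CONF_THRESHOLD = 0.45  # confidence threshold quoted in the text


def keep_confident(detections: List[Dict], threshold: float = CONF_THRESHOLD) -> List[Dict]:
    """Drop detections whose confidence is below the threshold."""
    return [d for d in detections if d["conf"] >= threshold]


def detect_regions(image_path: str, weights_path: str) -> List[Dict]:
    """Run a YOLO model on one rendered page and return plain dicts."""
    # Imported lazily so keep_confident works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights_path)  # weights_path: custom-trained weights (assumed)
    result = model.predict(image_path, conf=CONF_THRESHOLD, device="cpu")[0]
    detections = []
    for xyxy, conf, cls in zip(
        result.boxes.xyxy.tolist(),
        result.boxes.conf.tolist(),
        result.boxes.cls.tolist(),
    ):
        detections.append(
            {"box": tuple(xyxy), "conf": conf, "label": result.names[int(cls)]}
        )
    return detections
```

Passing `conf=` to `predict` already filters at inference time. `keep_confident` is only useful when detections come from a cache or when a stricter threshold is applied afterwards.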

STEP 03

Region Extraction

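Detected bounding boxes are mapped back to raw text via coordinate overlap. Overlapping boxes are deduplicated via non-maximum suppression (NMS), which keeps the highest-confidence box in each overlapping group. Each region then receives its raw text content.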
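
As an illustration of this step, here is a dependency-free sketch of greedy NMS followed by word-to-region assignment. It assumes that boxes and words already share one coordinate space, that NMS is class-agnostic, and that a word belongs to a region when at least half of the word's area falls inside it. The 0.5 thresholds and all function names are assumptions for illustration. Word dictionaries use pdfplumber's `x0`, `top`, `x1`, `bottom`, `text` keys.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a: Box, b: Box) -> float:
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)  # area() clamps negative extents to zero


def iou(a: Box, b: Box) -> float:
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(detections: List[Dict], iou_threshold: float = 0.5) -> List[Dict]:
    """Greedy NMS: keep the most confident box, drop boxes that overlap it."""
    kept: List[Dict] = []
    for det in sorted(detections, key=lambda d: d["conf"], reverse=True):
        if all(iou(det["box"], k["box"]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


def assign_words(regions: List[Dict], words: List[Dict], min_overlap: float = 0.5) -> List[Dict]:
    """Attach each word to the region covering the largest share of its box."""
    texts: List[List[str]] = [[] for _ in regions]
    for word in words:
        wbox = (word["x0"], word["top"], word["x1"], word["bottom"])
        warea = area(wbox)
        if warea == 0:
            continue
        shares = [intersection(r["box"], wbox) / warea for r in regions]
        if shares and max(shares) >= min_overlap:
            texts[shares.index(max(shares))].append(word["text"])
    return [dict(r, text=" ".join(t)) for r, t in zip(regions, texts)]
```

Words that fall outside every region are dropped in this sketch. A production pipeline might instead collect them in a fallback region.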

STEP 04

Semantic Embed

Each text region encoded via Sentence Transformers (all-MiniLM-L6-v2) into 384-dim vectors capturing semantic meaning.
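
A small sketch of the embedding step, assuming the `sentence-transformers` package and the `all-MiniLM-L6-v2` checkpoint named above. The whitespace cleanup is an added assumption (text extracted from PDFs often carries stray line breaks), and both function names are hypothetical.

```python
import re
from typing import List


def clean_region_text(text: str) -> str:
    """Collapse whitespace runs (line breaks, tabs) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def embed_regions(texts: List[str]):
    """Encode region texts; returns an array of shape (len(texts), 384)."""
    # Imported lazily so clean_region_text works without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
    return model.encode([clean_region_text(t) for t in texts])
```

In a long-running service the model would normally be loaded once at startup, not on every call as in this sketch.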

STEP 05

SVM Classify

LightSVM classifies each embedding into a document section type: header, experience, skills, education, summary, table, or footer. The output is structured JSON.
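
The classification and output step can be sketched as follows. Since the LightSVM API is not shown here, this example swaps in scikit-learn's `LinearSVC` as a stand-in linear SVM. The JSON layout (`sections`, `type`, `text`, `box`) and the function names are likewise assumptions, not the service's actual schema.

```python
import json
from typing import Dict, List, Sequence

# Section types listed in the text.
SECTION_TYPES = ["header", "experience", "skills", "education", "summary", "table", "footer"]


def train_classifier(embeddings, labels: Sequence[str]):
    """Fit a linear SVM on embedding vectors (stand-in for LightSVM)."""
    # Imported lazily so to_semantic_json works without scikit-learn installed.
    from sklearn.svm import LinearSVC

    clf = LinearSVC()
    clf.fit(embeddings, labels)
    return clf


def to_semantic_json(regions: List[Dict], labels: Sequence[str]) -> str:
    """Combine regions and predicted labels into a JSON document."""
    if len(regions) != len(labels):
        raise ValueError("regions and labels must have the same length")
    sections = []
    for region, label in zip(regions, labels):
        if label not in SECTION_TYPES:
            raise ValueError(f"unknown section type: {label}")
        sections.append(
            {"type": label, "text": region["text"], "box": list(region["box"])}
        )
    return json.dumps({"sections": sections}, indent=2)
```

With a fitted classifier, `to_semantic_json(regions, clf.predict(embeddings))` would produce the final payload for one page.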

03 // STACK

Built with

DETECTION

YOLOv11

Custom-trained on 2.4K+ annotated document images via Roboflow. Detects layout regions with mAP@0.5 above 0.87 on held-out test set.

EMBEDDINGS

Sentence Transformers

all-MiniLM-L6-v2 encodes each extracted text region. Lightweight enough for CPU deployment with negligible inference overhead.

CLASSIFIER

LightSVM

SVM classifier trained on embedding features. Outperforms fine-tuned BERT classifiers at 40× lower inference latency on CPU.

PDF PARSING

PyMuPDF + pdfplumber

PyMuPDF for page rendering. pdfplumber for text extraction with position metadata. Dual-engine approach handles scanned and native PDFs.

TRAINING DATA

Roboflow

Annotation pipeline and dataset versioning. Augmentation: rotation, crop, brightness shifts to improve robustness on real-world document variance.

API

FastAPI + Docker

REST endpoint accepts a PDF upload and returns structured JSON with section labels. The Docker image runs CPU-only, with no GPU dependency.
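
An upload endpoint of this kind could be wired up roughly as below. The route name `/analyze`, the `analyze_pdf` placeholder, and the magic-byte check are all hypothetical; only the use of FastAPI and the "PDF in, JSON out" contract come from the text. FastAPI file uploads also require the `python-multipart` package.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check: PDF files normally start with the bytes %PDF-."""
    return data.startswith(b"%PDF-")


def analyze_pdf(data: bytes) -> dict:
    """Placeholder for the render, detect, embed, classify pipeline."""
    return {"sections": []}


def create_app():
    """Build the FastAPI app (imported lazily so the helpers stay standalone)."""
    from fastapi import FastAPI, File, HTTPException, UploadFile

    app = FastAPI()

    @app.post("/analyze")  # hypothetical route name
    async def analyze(file: UploadFile = File(...)):
        data = await file.read()
        if not looks_like_pdf(data):
            raise HTTPException(status_code=400, detail="Uploaded file is not a PDF")
        return analyze_pdf(data)

    return app
```

Served with an ASGI server such as uvicorn, a client would POST a multipart form with a `file` field and receive the JSON body in response.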
