Enterprise OCR Pipelines

Doc Intelligence

dots.ocr · Batch Processing · Event-Driven

Beyond the Model: Robust Pipeline Integration

In an enterprise environment, selecting a SOTA OCR model is only half the battle; the real challenge, and the real value, lies in the integration. By focusing on a "plug-and-play" architecture for AI agents, this project delivers a complete document-understanding tool that bridges raw multimodal vision and actionable business intelligence.

A key strategic decision was the resource allocation for the `dots.ocr` engine. Although it is a relatively small 3B-parameter model, we dedicate the entire GPU capacity to it. This reflects a real-world usage pattern: users often bulk-submit an entire multi-hundred-page PDF to the service at once. By letting the GPU hold massive concurrent batches, the whole document is processed in parallel with near-instant responsiveness, rather than through page-by-page serial bottlenecks.
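The bulk-submission pattern can be sketched in a few lines of asyncio. This is a minimal illustration, not the project's code: `ocr_page` is a hypothetical stand-in for the per-page call to the GPU service, and the concurrency limit is an assumed tuning knob.

```python
import asyncio

# Hypothetical per-page OCR call; in the real pipeline this would be an
# HTTP request to the dots.ocr GPU service.
async def ocr_page(page_number: int) -> str:
    await asyncio.sleep(0)  # stand-in for network / GPU latency
    return f"text of page {page_number}"

async def ocr_document(num_pages: int, max_concurrency: int = 64) -> list[str]:
    """Submit every page of a document at once, bounded by a semaphore,
    so the GPU sees one large concurrent batch instead of a serial loop."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(page: int) -> str:
        async with sem:
            return await ocr_page(page)

    # gather() preserves page order in the returned list.
    return await asyncio.gather(*(one(p) for p in range(num_pages)))

results = asyncio.run(ocr_document(num_pages=300))
```

The key design point is that the client never loops over pages serially; all 300 requests are in flight at once, and the semaphore only caps how many the GPU service sees simultaneously.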


Event-Driven Architecture Overview

The platform decouples request ingestion from heavy processing using a robust event-driven design:

  • Orchestrator (FastAPI): A lightweight entry point that handles authentication and rate limiting, dropping jobs into an asynchronous queue that absorbs traffic bursts.
  • SQL Persistence & Real-Time Tracking: A relational PostgreSQL ledger maintains a strict Parent-Child relationship. One PDF (Parent) is mapped to N individual page images (Children), allowing for granular, real-time tracking of the document's progress through the pipeline.
  • PDF Fan-Out & Queue Strategy: A dedicated worker utilizes Azure Service Bus to "fan-out" complex documents. It parallelizes the processing by converting pages into independent image jobs, ensuring that the GPU workers are never idle.
  • High-Throughput OCR (GPU): The A100 heavy-lifter, optimized for massive batch sizes (65k tokens per cycle) to maximize hardware bandwidth and eliminate processing stalls.
  • Artifact Streaming: Leveraging Azure Container Registry (ACR) Artifact Streaming to cut 10 GB+ image cold-starts from minutes to seconds, enabling true "on-demand" scaling.
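The Parent-Child ledger described above can be illustrated with a small relational sketch. This uses an in-memory SQLite database as a stand-in for PostgreSQL, and the table and column names are assumptions for illustration, not the project's actual schema.

```python
import sqlite3

# In-memory stand-in for the PostgreSQL ledger.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (              -- Parent: one row per PDF
        id INTEGER PRIMARY KEY,
        filename TEXT NOT NULL,
        page_count INTEGER NOT NULL
    );
    CREATE TABLE pages (                  -- Children: one row per page image
        id INTEGER PRIMARY KEY,
        document_id INTEGER NOT NULL REFERENCES documents(id),
        page_number INTEGER NOT NULL,
        status TEXT NOT NULL DEFAULT 'queued'  -- queued | processing | done
    );
""")

def register_document(filename: str, page_count: int) -> int:
    """Insert one parent row and N child rows (the fan-out bookkeeping)."""
    cur = conn.execute(
        "INSERT INTO documents (filename, page_count) VALUES (?, ?)",
        (filename, page_count),
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO pages (document_id, page_number) VALUES (?, ?)",
        [(doc_id, n) for n in range(page_count)],
    )
    return doc_id

def progress(doc_id: int) -> float:
    """Fraction of child pages finished: the real-time tracking signal."""
    done, total = conn.execute(
        "SELECT SUM(status = 'done'), COUNT(*) FROM pages WHERE document_id = ?",
        (doc_id,),
    ).fetchone()
    return (done or 0) / total

doc = register_document("contract.pdf", page_count=4)
conn.execute(
    "UPDATE pages SET status = 'done' WHERE document_id = ? AND page_number < ?",
    (doc, 2),
)
```

Because every page is a row, the orchestrator can answer "how far along is this document?" with a single aggregate query instead of polling the workers.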
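The fan-out step itself reduces to turning one parent document into N independent messages. The sketch below uses a plain in-process `queue.Queue` in place of Azure Service Bus, and the `PageJob` fields are illustrative assumptions rather than the project's message schema.

```python
from dataclasses import dataclass
from queue import Queue

@dataclass
class PageJob:
    """One independent unit of GPU work; field names are illustrative."""
    document_id: int
    page_number: int
    image_bytes: bytes

def fan_out(document_id: int, page_images: list[bytes], q: Queue) -> int:
    """Split a rendered PDF into per-page jobs, mirroring the Service Bus
    fan-out: each page becomes an independent message so multiple GPU
    workers can pull work in parallel."""
    for n, img in enumerate(page_images):
        q.put(PageJob(document_id, n, img))
    return len(page_images)

jobs: Queue = Queue()
count = fan_out(document_id=1, page_images=[b"p0", b"p1", b"p2"], q=jobs)
```

Since each message carries everything a worker needs (parent id, page number, image), any idle GPU worker can consume any message, which is what keeps the fleet saturated.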

This architecture provides a scalable, enterprise-grade solution for complex financial and legal document extraction, delivering high-speed reasoning directly to the agent layer.