
OCR / Text Extraction

Extract text from images using AI-powered optical character recognition. Supports multiple languages and quality tiers.

API Endpoint

POST /api/v1/tools/ocr

Processing: Synchronous (returns extracted text directly, though progress is reported via SSE if a clientJobId is provided)

Model bundle: ocr (3-4 GB)

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | Yes | - | Image file (multipart) |
| quality | string | No | "balanced" | Quality tier: fast (Tesseract), balanced (PaddleOCR v5), best (PaddleOCR VL) |
| language | string | No | "auto" | Language hint: auto, en, de, fr, es, zh, ja, ko |
| enhance | boolean | No | true | Pre-process image for better OCR accuracy |
| engine | string | No | - | Deprecated. Use quality instead. Maps tesseract to fast, paddleocr to balanced |
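
A client that still carries the deprecated engine value can translate it before sending a request. The following is a minimal client-side sketch, not the server's implementation. The helper name `resolve_quality` is hypothetical, and the rule that an explicit quality takes precedence over engine is an assumption, since the precedence is not documented here.

```python
from typing import Optional

# Mapping stated in the parameter table: deprecated engine names to quality tiers.
ENGINE_TO_QUALITY = {"tesseract": "fast", "paddleocr": "balanced"}
VALID_QUALITIES = ("fast", "balanced", "best")


def resolve_quality(quality: Optional[str] = None,
                    engine: Optional[str] = None) -> str:
    """Return the quality tier to send with a request (hypothetical helper).

    Assumption: an explicit `quality` wins over the deprecated `engine`.
    """
    if quality is not None:
        if quality not in VALID_QUALITIES:
            raise ValueError(f"unknown quality tier: {quality!r}")
        return quality
    if engine is not None:
        if engine not in ENGINE_TO_QUALITY:
            raise ValueError(f"unknown engine: {engine!r}")
        return ENGINE_TO_QUALITY[engine]
    return "balanced"  # documented default for `quality`
```

For example, `resolve_quality(engine="tesseract")` yields the fast tier, and calling it with no arguments yields the documented default.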

Example Request

bash
curl -X POST http://localhost:13490/api/v1/tools/ocr \
  -F "file=@document.png" \
  -F 'settings={"quality":"best","language":"en","enhance":true}'
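
The same request can be assembled from Python using only the standard library. This is a sketch that mirrors the curl example above: the image goes in the file field and the options go in a settings JSON field. The function names `build_ocr_request` and `send` are hypothetical, and the `application/octet-stream` part type is an assumption; the server may expect a specific image MIME type.

```python
import json
import urllib.request
import uuid


def build_ocr_request(image_bytes, filename, settings,
                      base_url="http://localhost:13490"):
    """Build (without sending) a multipart POST for /api/v1/tools/ocr."""
    boundary = uuid.uuid4().hex
    # Part 1: the image, in the "file" field as in the curl example.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed part type
    ).encode("utf-8")
    # Part 2: the options as a JSON string in the "settings" field,
    # followed by the closing boundary.
    settings_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="settings"\r\n\r\n'
        f"{json.dumps(settings)}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    body = file_header + image_bytes + b"\r\n" + settings_part
    return urllib.request.Request(
        f"{base_url}/api/v1/tools/ocr",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def send(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running locally, `send(build_ocr_request(data, "document.png", {"quality": "best"}))["text"]` would return the extracted text, where `data` holds the image bytes.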

Response (200 OK)

json
{
  "jobId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "filename": "document.png",
  "text": "Extracted text content from the image...",
  "engine": "paddleocr-vl"
}

Progress (SSE, optional)

If a clientJobId is provided, progress events are streamed:

event: progress
data: {"phase":"processing","stage":"Recognizing text...","percent":50}
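
On the client side, events in this format can be decoded with a small line-based parser. The sketch below handles only the `event:` and `data:` fields shown above and assumes each data payload is JSON. The name `iter_sse_events` is hypothetical, and how the client subscribes to the stream is not covered here because this page does not specify it.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from lines of an SSE stream."""
    event, data = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        # Other fields (id:, retry:, comments) are ignored in this sketch.
    if data:
        # Flush a final event that was not followed by a blank line.
        yield event, json.loads("\n".join(data))
```

Each yielded payload is a dictionary, so a progress bar could read `payload["percent"]` and `payload["stage"]` directly.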

Notes

  • Requires the ocr model bundle to be installed (3-4 GB).
  • Unlike most AI tools, OCR returns a synchronous JSON response with extracted text (not an image download URL).
  • Uses a fallback chain: if a higher-quality tier crashes (e.g., PaddleOCR segfault), it automatically retries with the next lower tier.
  • If a tier returns empty text without crashing, it also falls back to the next tier.
  • Quality tiers map to engines: fast = Tesseract, balanced = PaddleOCR v5, best = PaddleOCR VL.
  • Supports HEIC/HEIF, RAW, TGA, PSD, EXR, and HDR input formats via automatic decoding.
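
The fallback behavior described in the notes above can be illustrated with a short sketch. This is not the server's code: the function `run_with_fallback` and the `engines` mapping are hypothetical, a Python exception stands in for an engine crash (a real segfault would have to be detected some other way, for example through a subprocess exit status), and treating whitespace-only output as empty is an assumption. What the server does when every tier fails is not documented, so the final branches are also assumptions.

```python
from typing import Callable, Dict, Tuple

TIER_ORDER = ["best", "balanced", "fast"]  # highest to lowest quality


def run_with_fallback(start_tier: str,
                      engines: Dict[str, Callable[[bytes], str]],
                      image: bytes) -> Tuple[str, str]:
    """Try start_tier, then each lower tier, until one returns text.

    Returns (tier_used, text).
    """
    tiers = TIER_ORDER[TIER_ORDER.index(start_tier):]
    last_error = None
    for tier in tiers:
        try:
            text = engines[tier](image)
        except Exception as exc:  # stands in for an engine crash
            last_error = exc
            continue  # fall back to the next lower tier
        if text.strip():
            return tier, text
        # Empty result without a crash: also fall back.
    if last_error is not None:
        # Assumption: surface the last crash if no tier produced text.
        raise RuntimeError("all OCR tiers failed") from last_error
    return tiers[-1], ""  # every tier ran but found no text
```

A request for the best tier whose engine crashes, followed by an empty balanced result, would end up returning the fast tier's output.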