LightOnOCR-2-1B

by LightOn

Open SourceSelf-HostedApache-2.0

SOTA 1B parameter end-to-end OCR (83.2 on OlmOCR-Bench) that beats models 9x larger while being 5x faster

OCRLayout AnalysisTable ExtractionDocument Conversion

🤗 Hugging Face 📄 arXiv Paper 🚀 Try Demo Official Blog

Overview

LightOnOCR-2-1B is a fully differentiable end-to-end vision-language model for OCR. It processes PDFs and images directly to clean, naturally ordered Markdown without requiring external pipelines. With only 1B parameters, it achieves state-of-the-art 83.2 on OlmOCR-Bench, outperforming Chandra-9B by 1.5+ points.

The model handles tables, receipts, forms, multi-column layouts, and math notation with high accuracy. It supports 11 languages (with emphasis on European languages) and offers optional bounding box variants for image localization. With throughput of 5.71 pages/second on a single H100 (~493k pages/day), it costs less than $0.01 per 1,000 pages to run.

LightOnOCR comes in multiple variants: the main model for best OCR quality, base checkpoints for fine-tuning with LoRA/PEFT, bbox variants for image detection, and merged soup variants balancing OCR and localization. Can be served with vLLM for production or Hugging Face Transformers for flexibility.

Strengths

SOTA on OlmOCR-Bench (83.2) - beats Chandra-9B by 1.5+ points
5.71 pages/s on single H100 (~493k pages/day)
5x faster than dots.ocr, 2x faster than PaddleOCR-VL
3.3x faster than Chandra OCR, 1.7x faster than OlmOCR
Less than $0.01 per 1,000 pages
End-to-end: PDF/image in, clean Markdown out
Day-0 Hugging Face Transformers support
Servable via vLLM or Transformers
Base checkpoints for domain-specific fine-tuning (LoRA/PEFT)
Optional bbox variants for image localization
11 language support with European emphasis

Limitations

Optimal at 200 DPI with longest dimension 1540px
11 languages vs 100+ in some alternatives
Newer model - community ecosystem still developing

Best Use Cases

High-volume document processing pipelines
RAG systems requiring clean Markdown from PDFs
Invoice and receipt processing
Scientific document digitization
Multi-column layout extraction
Domain-specific OCR via fine-tuning (legal, medical, financial)