MiniCPM-V 4.5
by OpenBMB
GPT-4o-level 8B multimodal model with state-of-the-art OCR and document parsing
Overview
MiniCPM-V 4.5 is OpenBMB's flagship multimodal model, reported to reach GPT-4o-level performance with only 8 billion parameters. It is built on the Qwen3-8B language model and the SigLIP2-400M vision encoder, and OpenBMB reports that it surpasses proprietary models such as GPT-4o-latest and Gemini-2.0 Pro on vision-language benchmarks.
The model excels at document understanding, with leading performance on OCRBench and OmniDocBench. Using the LLaVA-UHD architecture, it processes high-resolution images of up to 1.8 million pixels while using 4x fewer visual tokens than most MLLMs.
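To make the 1.8-million-pixel figure concrete, here is a minimal sketch of a pre-check that scales a page image down to fit that budget while preserving its aspect ratio. The function name `fit_to_pixel_budget` is hypothetical, and resizing before inference is an assumption made for illustration. The text does not say how the model itself handles larger inputs.

```python
import math

# Pixel budget taken from the description above (1.8 million pixels).
MAX_PIXELS = 1_800_000


def fit_to_pixel_budget(width: int, height: int, max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Return (width, height) scaled down so that width * height <= max_pixels.

    The aspect ratio is preserved up to integer rounding. Images that already
    fit within the budget are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height

    # Scale both sides by the same factor so the area shrinks to the budget.
    scale = math.sqrt(max_pixels / (width * height))
    new_w = max(1, math.floor(width * scale))
    new_h = max(1, math.floor(height * scale))

    # Guard against floating-point rounding pushing the area over the budget.
    while new_w * new_h > max_pixels:
        if new_w >= new_h:
            new_w -= 1
        else:
            new_h -= 1
    return new_w, new_h


if __name__ == "__main__":
    # A 4000x3000 scan (12 million pixels) must shrink; a 1000x1000 crop does not.
    print(fit_to_pixel_budget(4000, 3000))
    print(fit_to_pixel_budget(1000, 1000))
```

The resulting dimensions can be passed to any image library's resize call. Whether downscaling beforehand is preferable to letting the model's own preprocessing handle the image depends on the deployment.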
A unified OCR training approach dynamically corrupts text regions with varying levels of noise, teaching the model to switch adaptively between accurate recognition when the text is still legible and context-based reasoning when it is not. This is designed to avoid the hallucinations that over-augmented OCR data can cause, while achieving top-tier performance with minimal engineering overhead.
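The corruption idea described above can be sketched roughly as follows. This is an illustrative toy, not OpenBMB's training code: the function names, the grayscale list-of-rows image format, the uniform noise model, and the `reasoning_threshold` value are all assumptions. The one thing it takes from the text is that a text region is corrupted at a randomly chosen level while the target stays the original text.

```python
import random


def corrupt_region(image, box, noise_level, rng):
    """Return a copy of `image` with the pixels inside `box` blended toward noise.

    image: list of rows, each a list of grayscale ints in 0..255 (assumed format).
    box: (top, left, bottom, right), with bottom and right exclusive.
    noise_level: 0.0 leaves the region intact, 1.0 replaces it with pure noise.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0, 1]")
    top, left, bottom, right = box
    out = [row[:] for row in image]  # copy rows so the input is not mutated
    for y in range(top, bottom):
        for x in range(left, right):
            noise = rng.randint(0, 255)
            out[y][x] = round((1 - noise_level) * image[y][x] + noise_level * noise)
    return out


def make_training_sample(image, box, text, rng, reasoning_threshold=0.7):
    """Build one sample with a randomly drawn corruption level.

    The target is always the original text. At low noise the model can read it
    from the pixels; at high noise it has to infer it from surrounding context.
    The threshold only labels the sample here and is a hypothetical value.
    """
    noise_level = rng.random()
    corrupted = corrupt_region(image, box, noise_level, rng)
    mode = "context_reasoning" if noise_level >= reasoning_threshold else "recognition"
    return {"image": corrupted, "target": text, "noise_level": noise_level, "mode": mode}


if __name__ == "__main__":
    rng = random.Random(0)
    page = [[200] * 8 for _ in range(8)]  # toy 8x8 "page"
    sample = make_training_sample(page, (2, 2, 6, 6), "Total: 42", rng)
    print(sample["mode"], round(sample["noise_level"], 2))
```

Because the noise level is drawn afresh for each sample, a single dataset covers the whole range from clean recognition to fully masked regions, which is one plausible reading of "dynamically" in the description.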
Strengths
- GPT-4o-level performance with only 8B parameters
- Leading OCRBench scores, surpassing GPT-4o-latest
- State-of-the-art on OmniDocBench for PDF parsing
- High-resolution support up to 1.8M pixels
- Runs on smartphones and consumer GPUs
- Strong handwritten OCR and complex table parsing
Limitations
- 8B model still requires significant memory
- May be slower than specialized OCR-only models
- General-purpose model, not OCR-specialized
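As a rough illustration of the memory limitation, the sketch below estimates the size of the weights alone at several numeric precisions. It uses the nominal 8 billion parameter count from the text and deliberately ignores activations, the KV cache, and framework overhead, so real usage will be higher. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights only, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


NUM_PARAMS = 8e9  # nominal parameter count from the text; the exact count may differ

if __name__ == "__main__":
    for name, bits in [("fp32", 32), ("bf16/fp16", 16), ("int8", 8), ("int4", 4)]:
        print(f"{name:>9}: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

At 16-bit precision the weights alone come to roughly 15 GiB, and 4-bit quantization brings that down to under 4 GiB, which is the kind of reduction that makes consumer-GPU or on-device use plausible.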
Best Use Cases
- Complex PDF and document parsing
- Research paper extraction
- Financial report processing
- Handwritten text recognition
- Mobile and edge deployment