MiniCPM-V 4.5

Overview

MiniCPM-V 4.5 is OpenBMB's flagship multimodal model, delivering GPT-4o-level performance with only 8 billion parameters. Built on Qwen3-8B and SigLIP2-400M, it surpasses proprietary models such as GPT-4o-latest and Gemini-2.0 Pro on vision-language benchmarks, according to OpenBMB's reported results.

The model excels at document understanding, with leading results on OCRBench and OmniDocBench. Using the LLaVA-UHD architecture, it processes high-resolution images of any aspect ratio with up to 1.8 million pixels (for example, 1344×1344) while using 4x fewer visual tokens than most MLLMs.
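To make the resolution and token-budget claims concrete, here is a minimal sketch of LLaVA-UHD-style slicing: the image is split into a grid of encoder-sized slices, choosing the grid whose slices stay closest to the encoder's square input. The 448-pixel slice size, the 64-token per-slice budget, the extra overview thumbnail, and the function names `choose_grid` and `visual_token_count` are all hypothetical assumptions for illustration, not MiniCPM-V 4.5's confirmed configuration.

```python
import math

# Hypothetical settings for illustration; not MiniCPM-V 4.5's confirmed values.
SLICE_SIZE = 448        # assumed vision-encoder input resolution (square)
TOKENS_PER_SLICE = 64   # assumed visual tokens per slice after compression


def choose_grid(width: int, height: int, slice_size: int = SLICE_SIZE) -> tuple[int, int]:
    """Return (cols, rows) whose slices are closest to the encoder's square shape."""
    # Ideal slice count: how many encoder-sized tiles the image area covers.
    ideal = max(1, math.ceil(width * height / (slice_size * slice_size)))
    best, best_err = (1, 1), float("inf")
    # Search slice counts near the ideal and every factorization cols * rows = n.
    for n in sorted({max(1, ideal - 1), ideal, ideal + 1}):
        for cols in range(1, n + 1):
            if n % cols:
                continue
            rows = n // cols
            # |log aspect ratio| of one slice; 0.0 means a perfectly square slice.
            err = abs(math.log((width / cols) / (height / rows)))
            if err < best_err:
                best, best_err = (cols, rows), err
    return best


def visual_token_count(width: int, height: int) -> int:
    """Estimate visual tokens: one budget per slice, plus an overview thumbnail if sliced."""
    cols, rows = choose_grid(width, height)
    slices = cols * rows
    return TOKENS_PER_SLICE * (slices if slices == 1 else slices + 1)


print(choose_grid(1344, 1344), visual_token_count(1344, 1344))  # prints: (3, 3) 640
```

Under these assumptions, a 1344×1344 image (about 1.8 million pixels) becomes a 3×3 grid: nine slices plus one overview, or 640 visual tokens. The token count grows with the number of slices rather than with raw pixel count, which is the lever a compressed per-slice budget pulls.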

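A novel unified OCR training approach dynamically corrupts the text regions of document images with varying levels of noise. When the text remains legible, the model learns to recognize it accurately; when it is heavily obscured, the model learns to reason from the surrounding multimodal context instead. According to OpenBMB, this adaptive switching prevents the hallucinations caused by over-augmented OCR data while achieving top-tier performance with minimal engineering overhead.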
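The corruption idea can be illustrated with a minimal sketch, assuming a grayscale image stored as nested lists with pixel values in [0, 1] and additive Gaussian noise as the corruption. The noise type, the uniform sampling of the noise level, the `legible_threshold` cut-off, and the names `corrupt_text_region` and `make_training_sample` are hypothetical; this is not OpenBMB's implementation, and the real pipeline may corrupt text quite differently.

```python
import random

Image = list[list[float]]  # grayscale rows, values in [0, 1]


def corrupt_text_region(image: Image, box: tuple[int, int, int, int],
                        noise_level: float, rng: random.Random) -> Image:
    """Copy `image` and add Gaussian noise (std = noise_level) inside box only."""
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]  # leave the caller's image untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            noisy = out[y][x] + rng.gauss(0.0, noise_level)
            out[y][x] = min(1.0, max(0.0, noisy))  # clamp to the valid pixel range
    return out


def make_training_sample(image: Image, box: tuple[int, int, int, int], text: str,
                         rng: random.Random, legible_threshold: float = 0.3):
    """Hypothetical sampler: draw a corruption level per example; the target text never changes."""
    noise_level = rng.uniform(0.0, 1.0)
    corrupted = corrupt_text_region(image, box, noise_level, rng)
    # Descriptive tag only: light noise calls for recognition, heavy noise for
    # reasoning from context. The target is still `text` in both cases.
    regime = "recognize" if noise_level < legible_threshold else "infer_from_context"
    return corrupted, text, regime


rng = random.Random(0)
page = [[1.0] * 16 for _ in range(8)]  # blank white "page"
noisy_page, target, regime = make_training_sample(page, (2, 2, 10, 6), "TOTAL: 42", rng)
print(target, regime)
```

In this sketch the target string is identical at every noise level, so the only thing that varies is how much visual evidence supports it; no explicit mode flag is fed to the model.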
