MiniCPM-V 4.5

by OpenBMB

Open Source, Self-Hosted, Apache-2.0

GPT-4o-level 8B multimodal model with state-of-the-art OCR and document parsing

OCR, Layout Analysis, Table Extraction, Data Extraction

Overview

MiniCPM-V 4.5 is OpenBMB's flagship multimodal model, reaching GPT-4o-level performance with only 8 billion parameters. Built on Qwen3-8B and SigLIP2-400M, it is reported to surpass proprietary models such as GPT-4o-latest and Gemini-2.0 Pro on vision-language benchmarks.

The model excels at document understanding, with leading performance on OCRBench and OmniDocBench. Using the LLaVA-UHD architecture, it processes high-resolution images of up to 1.8 million pixels with 4x fewer visual tokens than most MLLMs.
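
To make the 1.8 million pixel figure concrete, the sketch below checks whether an image fits within that budget and, if not, computes a downscaled size that preserves the aspect ratio. It is an illustration only: the function name `fit_to_pixel_budget` is hypothetical, the budget is assumed to be exactly 1,800,000 pixels, and the model's own preprocessing may handle large images differently.

```python
import math

# Assumption: treat the "1.8 million pixels" figure as exactly 1,800,000.
MAX_PIXELS = 1_800_000


def fit_to_pixel_budget(width, height, max_pixels=MAX_PIXELS):
    """Return a (width, height) pair whose area does not exceed max_pixels.

    Images already within the budget are returned unchanged. Larger images
    are scaled down uniformly so the aspect ratio is approximately preserved.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height
    # Area scales with the square of the linear factor, hence the sqrt.
    scale = math.sqrt(max_pixels / (width * height))
    # Rounding both sides down guarantees the area stays within the budget.
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    return new_width, new_height


if __name__ == "__main__":
    for size in [(1000, 1000), (4000, 3000)]:
        fitted = fit_to_pixel_budget(*size)
        print(size, "->", fitted, "area:", fitted[0] * fitted[1])
```

A pre-check like this can be useful when batching scanned pages, since any image above the budget would be reduced before the model sees it.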

A novel unified OCR training approach dynamically corrupts text regions with varying noise, teaching the model to adaptively switch between accurate recognition and context-based reasoning. This eliminates hallucinations from over-augmented OCR data while achieving top-tier performance with minimal engineering overhead.
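
The idea of corrupting text regions with varying noise can be sketched in a few lines. This is not OpenBMB's training code: the function names, the grayscale list-of-lists image format, the uniform sampling of noise levels, and the linear blend with random noise are all assumptions chosen to keep the example small.

```python
import random


def corrupt_region(image, box, level, rng):
    """Blend uniform random noise into one rectangular region.

    image: list of rows of floats in [0, 1] (grayscale).
    box:   (top, left, bottom, right), bottom and right exclusive.
    level: 0.0 leaves the region intact, 1.0 replaces it with pure noise.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    top, left, bottom, right = box
    out = [row[:] for row in image]  # copy so the input is not modified
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = (1.0 - level) * image[y][x] + level * rng.random()
    return out


def corrupt_text_regions(image, boxes, rng):
    """Apply an independently sampled noise level to each text box.

    Returns the corrupted image and the level used for each box. Lightly
    corrupted boxes remain readable (a recognition target), while heavily
    corrupted ones can only be recovered from surrounding context.
    """
    levels = []
    out = image
    for box in boxes:
        level = rng.random()  # assumption: levels drawn uniformly from [0, 1)
        out = corrupt_region(out, box, level, rng)
        levels.append(level)
    return out, levels


if __name__ == "__main__":
    rng = random.Random(0)
    page = [[0.5] * 8 for _ in range(4)]
    noisy, used = corrupt_text_regions(page, [(0, 0, 2, 4), (2, 4, 4, 8)], rng)
    print("noise levels per box:", used)
```

Sampling a fresh level per region is what lets a single training setup cover both ends of the spectrum, from clean recognition to context-based inference.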

Strengths

  • GPT-4o-level performance with only 8B parameters
  • Leading OCRBench scores, surpassing GPT-4o-latest
  • State-of-the-art on OmniDocBench for PDF parsing
  • High-resolution support up to 1.8M pixels
  • Runs on smartphones and consumer GPUs
  • Strong handwritten OCR and complex table parsing

Limitations

  • 8B model still requires significant memory
  • May be slower than specialized OCR-only models
  • General-purpose model, not OCR-specialized
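
As a rough guide to the memory limitation, the weights alone can be estimated from the parameter count and the numeric precision. The sketch below treats the stated 8 billion parameters as a round figure and covers only the weights; activations, the KV cache, and framework overhead add to the total, so real usage will be higher. The function name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Estimate the memory needed to hold model weights, in GiB.

    Covers weights only: activations, KV cache, and runtime overhead
    are not included.
    """
    if num_params < 0 or bits_per_param <= 0:
        raise ValueError("num_params must be >= 0 and bits_per_param > 0")
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    params = 8_000_000_000  # assumption: the round "8B" figure from the text
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: {weight_memory_gib(params, bits):.1f} GiB")
```

At 16-bit precision this comes to roughly 15 GiB for the weights, and about a quarter of that at 4-bit, which is why quantized variants are the usual route for phones and smaller consumer GPUs.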

Best Use Cases

  • Complex PDF and document parsing
  • Research paper extraction
  • Financial report processing
  • Handwritten text recognition
  • Mobile and edge deployment