InternVL 3.5

Overview

InternVL 3.5 is OpenGVLab's open-source multimodal model series, positioned as an alternative to GPT-4o, with strong multimodal perception and reasoning capabilities. The series spans 2B to 78B parameters, so a model size can be matched to the deployment scenario.

The model reports competitive results on nine document-understanding benchmarks (AI2D, ChartQA, TextVQA, DocVQA, InfoVQA, OCRBench, SEED-2-Plus, CharXiv, VCR), outperforming other open-source models and many closed-source ones.

InternVL3 integrates Variable Visual Position Encoding (V2PE) for better long-context understanding. The pre-training corpus covers diverse domains, including OCR, charts, documents, mathematics, knowledge grounding, and multi-turn dialogue.
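To make the idea behind V2PE concrete, here is a minimal sketch. It assumes the core mechanism is that text tokens advance the position index by 1 while visual tokens advance it by a smaller step, so long runs of image tokens use up less of the position range. The function name `v2pe_positions`, the `"text"`/`"image"` labels, and the step of 1/4 are illustrative assumptions, not part of the InternVL code or API.

```python
from fractions import Fraction


def v2pe_positions(token_kinds, visual_step=Fraction(1, 4)):
    """Assign a position index to each token in a mixed sequence.

    token_kinds: sequence of "text" or "image" labels (hypothetical format).
    Text tokens advance the position by 1; any other token is treated as
    visual and advances it by the smaller `visual_step` (an assumed value).
    """
    positions = []
    current = Fraction(0)
    for kind in token_kinds:
        positions.append(current)
        current += 1 if kind == "text" else visual_step
    return positions


# Two text tokens, eight image tokens, then one more text token.
tokens = ["text", "text"] + ["image"] * 8 + ["text"]
print([float(p) for p in v2pe_positions(tokens)])
```

With a step of 1/4, the eight image tokens in this toy sequence advance the position by 2 in total rather than 8, which is the intuition for why a smaller visual step helps long multimodal contexts fit within the position range seen during training.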
