Moonshot family

Kimi

Kimi: K2.6 Thinking ranks #11 of 186 with 131K-token context and $0.57/$2.3 per 1M tokens. Compare K2.6, K2, and Kimi VL by workload.

Top in this family

K2.6 Thinking ranks #11 of 186 on overall quality (QS 99.5) at $0.57/$2.3 per 1M tokens.

Variants: 2
License: Open weights
Provider: Moonshot

★ Most teams should start here

Moonshot Kimi K2

Variant: K2.6 Thinking

The default for text-only workloads: a strong chat-tier model from Moonshot with competitive long-context behavior. Pick Kimi VL when the workload involves image inputs.

Quality Score: 99.5
Input: $0.570/1M
Output: $2.30/1M
Context: 131K
License: Open weights

Best variant by workload

One pick per common job. Pick by what you need to ship — not by which variant has the highest score on a leaderboard you don't use.

Note — picks are framed for direct API usage where cost per million tokens is load-bearing. If you're inside an agent harness (Claude Code, Cursor, etc.) the calculus changes: the harness sets the model, the per-task cost is usually negligible, and the flagship variant tends to win. See our piece on Claude Code for the harness-vs-API framing.

| Workload | Best pick | Why |
|---|---|---|
| General API workhorse | Moonshot Kimi K2 · K2.6 Thinking ($0.570/1M input, $2.30/1M output) | Moonshot's flagship chat model. Strong long-context behavior is the headline differentiator within the family. |
| Document AI / OCR | Kimi-VL-A3B · Thinking | Vision-language variant in the family. Use for layout-aware document workloads where image-grounded extraction beats OCR-then-text-LLM pipelines. |
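
Because these picks assume per-token cost matters, it can help to turn list prices into a per-request and per-month figure before comparing variants. The sketch below does that arithmetic with prices copied from the variant table on this page. The workload numbers (20K input tokens, 1.5K output tokens, 100K requests per month) are hypothetical, and the function names are illustrative. It also assumes that any reasoning tokens a thinking variant emits are already counted in `output_tokens`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """List price in USD per 1M tokens, copied from the variant table."""
    input_per_m: float
    output_per_m: float


PRICES = {
    "K2.6 Thinking": Price(0.57, 2.30),
    "K2.5 Thinking": Price(0.40, 1.90),
    "K2 0905 Preview": Price(0.60, 2.50),
    "DeepSeek V4 Flash": Price(0.098, 0.197),
}


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at list price."""
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


def monthly_cost(price: Price, input_tokens: int, output_tokens: int, requests: int) -> float:
    """USD cost of `requests` identical requests."""
    return request_cost(price, input_tokens, output_tokens) * requests


# Hypothetical workload: 20K tokens in, 1.5K tokens out, 100K requests per month.
for name, price in PRICES.items():
    print(f"{name}: ${monthly_cost(price, 20_000, 1_500, 100_000):,.2f} per month")
```

Swap in your own token counts; for input-heavy workloads the input price dominates, while for long generations the output price does.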

All variants

16 rows: 12 variants across 2 Kimi models, plus 4 cross-family variants from 1 other family (DeepSeek V4) for context. Sorted by quality score (descending).

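| Variant | Model | QS | Rank | Benchmark scores, in column order GPQA, HLE, SWE, SWE-Pro, Terminal, Tau, MCP, AIME (empty cells omitted) | In $/M | Out $/M | Context | Released |
|---|---|---|---|---|---|---|---|---|
| K2.6 Thinking | Kimi K2 | 99.5 | #11/186 | 90.5, 80.2, 58.6 | $0.57 | $2.3 | 131K | Jul 1, 2025 |
| K2.5 Thinking | Kimi K2 | 88.9 | #33/186 | 87.6, 31.5, 76.8, 53.8, 50.8, 64.4, 84.8 | $0.4 | $1.9 | 262K | Jul 1, 2025 |
| Thinking | Kimi K2 | 82.9 | #52/186 | 84.5, 71.3, 35.7, 94.5 | $0.6 | $2.5 | 262K | Jul 1, 2025 |
| 0905 Preview | Kimi K2 | 75.3 | #93/186 | 74.2, 69.2, 51.0 | $0.6 | $2.5 | 262K | Jul 1, 2025 |
| 0711 Preview | Kimi K2 | 73.3 | #102/186 | 75.1, 65.8, 70.6, 49.5 | $0.57 | $2.3 | 131K | Jul 1, 2025 |
| Base | Kimi K2 | 60.9 | #151/186 | 48.1 | $0.57 | $2.3 | 131K | Jul 1, 2025 |
| Instruct | Kimi K2 | 60.5 | #155/186 | 27.7, 27.8 | $0.57 | $2.3 | 131K | Jul 1, 2025 |
| Thinking Turbo | Kimi K2 | - | - | - | $0.57 | $2.3 | 131K | Jul 1, 2025 |
| K2.5 Instant | Kimi K2 | - | - | - | $0.57 | $2.3 | 131K | Jul 1, 2025 |
| K2.6 | Kimi K2 | - | - | - | $0.57 | $2.3 | 131K | Jul 1, 2025 |
| Thinking | Kimi-VL-A3B | - | - | - | - | - | - | Jan 15, 2025 |
| Non-Thinking | Kimi-VL-A3B | - | - | - | - | - | - | Jan 15, 2025 |
| V4 Pro Thinking (cross-family) | DeepSeek V4 | 98.0 | #15/186 | 90.1, 37.7, 80.6, 55.4, 73.6 | $0.435 | $0.87 | 1.0M | Apr 24, 2026 |
| V4 Flash Thinking (cross-family) | DeepSeek V4 | 92.0 | #27/186 | 88.1, 34.8, 79.0, 52.6, 69.0 | $0.098 | $0.197 | 1.0M | Apr 24, 2026 |
| V4 Pro (cross-family) | DeepSeek V4 | 80.9 | #61/186 | 72.9, 7.7, 73.6, 52.1, 69.4 | $0.435 | $0.87 | 1.0M | Apr 24, 2026 |
| V4 Flash (cross-family) | DeepSeek V4 | 78.1 | #78/186 | 71.2, 8.1, 73.7, 49.1, 64.0 | $0.098 | $0.197 | 1.0M | Apr 24, 2026 |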

Benchmark evidence

Every benchmark we track for this family, across capabilities. The headline Quality Score draws from a deliberately narrow, governed panel (64 of 199 rows here feed it); the rest is tracked evidence — recorded and comparable, but not folded into one synthetic score.

| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Moonshot Kimi K2 · K2.6 Thinking | LiveCodeBench · v6 | 89.6 | 2 / 40 | In Quality Score |
| Moonshot Kimi K2 · Thinking | SWE-bench Verified · multilingual_single | 61.1 | 2 / 10 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | SWE-bench Verified · single_agentless | 51.8 | 2 / 7 | In Quality Score |
| Moonshot Kimi K2 · Thinking | AIME 2025 · aime_2025_python | 99.1 | 3 / 7 | In Quality Score |
| Moonshot Kimi K2 · 0905 Preview | SWE-bench Verified · multilingual_single | 55.9 | 4 / 10 | In Quality Score |
| Moonshot Kimi K2 · K2.6 Thinking | Humanity's Last Exam · tools | 54 | 4 / 38 | In Quality Score |
| Moonshot Kimi K2 · Thinking | AIME 2025 | 94.5 | 5 / 88 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | LiveCodeBench · v6 | 85 | 6 / 40 | In Quality Score |

Reasoning

| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Moonshot Kimi K2 · Thinking | AIME 2025 · aime_2025_python | 99.1 | 3 / 7 | In Quality Score |
| Moonshot Kimi K2 · K2.6 Thinking | Humanity's Last Exam · tools | 54 | 4 / 38 | In Quality Score |
| Moonshot Kimi K2 · Thinking | AIME 2025 | 94.5 | 5 / 88 | In Quality Score |
| Moonshot Kimi K2 · 0905 Preview | AIME 2025 · aime_2025_python | 75.2 | 6 / 7 | In Quality Score |
| Moonshot Kimi K2 · K2.6 Thinking | Humanity's Last Exam · hle_text | 34.7 | 6 / 56 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | MMLU Pro | 87.1 | 8 / 86 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | LiveBench | 76.4 | 8 / 110 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | Humanity's Last Exam · tools | 51.8 | 9 / 38 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | Humanity's Last Exam · hle_text | 30.1 | 9 / 56 | In Quality Score |
| Moonshot Kimi K2 · K2.6 Thinking | GPQA Diamond | 90.5 | 11 / 143 | In Quality Score |
| Moonshot Kimi K2 · Thinking | Humanity's Last Exam · hle_text | 23.9 | 14 / 56 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | Humanity's Last Exam · hle | 31.5 | 17 / 90 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | GPQA Diamond | 87.6 | 22 / 143 | In Quality Score |
| Moonshot Kimi K2 · K2.6 | Arena Elo | 1462 | 24 / 158 | In Quality Score |
| Moonshot Kimi K2 · Thinking | Humanity's Last Exam · tools | 44.9 | 24 / 38 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | AIME 2025 | 84.8 | 25 / 88 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | SimpleBench | 46.8 | 25 / 61 | In Quality Score |
| Moonshot Kimi K2 · Thinking | MMLU Pro | 84.6 | 27 / 86 | In Quality Score |
| Moonshot Kimi K2 · K2.6 Thinking | LiveBench | 72.2 | 27 / 110 | In Quality Score |
| Moonshot Kimi K2 · 0905 Preview | Humanity's Last Exam · tools | 21.7 | 35 / 38 | In Quality Score |
| Moonshot Kimi K2 · Thinking | GPQA Diamond | 84.5 | 36 / 143 | In Quality Score |
| Moonshot Kimi K2 · 0905 Preview | Humanity's Last Exam · hle_text | 7.9 | 36 / 56 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | Arena Elo | 1449 | 37 / 158 | In Quality Score |
| Moonshot Kimi K2 · Thinking | SimpleBench | 39.6 | 37 / 61 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | LiveBench | 69.1 | 38 / 110 | In Quality Score |
| Moonshot Kimi K2 · 0905 Preview | MMLU Pro | 81.9 | 40 / 86 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | MMLU Pro | 81.1 | 46 / 86 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | SimpleBench | 26.3 | 48 / 61 | In Quality Score |
| Moonshot Kimi K2 · 0905 Preview | AIME 2025 | 51 | 50 / 88 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | AIME 2025 | 49.5 | 51 / 88 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | Humanity's Last Exam · hle_text | 4.7 | 52 / 56 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Instant | Arena Elo | 1432 | 53 / 158 | In Quality Score |
| Moonshot Kimi K2 · Thinking Turbo | Arena Elo | 1430 | 56 / 158 | In Quality Score |
| Moonshot Kimi K2 · Thinking | LiveBench | 61.6 | 57 / 110 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | GPQA Diamond | 75.1 | 64 / 143 | In Quality Score |
| Moonshot Kimi K2 · 0905 Preview | GPQA Diamond | 74.2 | 66 / 143 | In Quality Score |
| Moonshot Kimi K2 · Base | MMLU Pro | 69.2 | 66 / 86 | In Quality Score |
| Moonshot Kimi K2 · 0905 Preview | Arena Elo | 1418 | 68 / 158 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | Arena Elo | 1417 | 70 / 158 | In Quality Score |
| Moonshot Kimi K2 · Base | GPQA Diamond | 48.1 | 119 / 143 | In Quality Score |
| Moonshot Kimi K2 · Thinking | HMMT Feb 2025 · python | 95.1 | 2 / 6 | Tracked evidence |
| Moonshot Kimi K2 · 0711 Preview | AceBench | 76.5 | 2 / 7 | Tracked evidence |
| Moonshot Kimi K2 · Thinking | Longform Writing | 73.8 | 2 / 5 | Tracked evidence |
| Moonshot Kimi K2 · Thinking | HealthBench | 58 | 2 / 5 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | HMMT Feb 2026 | 92.7 | 3 / 16 | Tracked evidence |
| Moonshot Kimi K2 · Base | GSM8K | 92.1 | 3 / 10 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | SciCode | 52.2 | 3 / 24 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | AIME 2026 | 96.4 | 4 / 19 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | HMMT Feb 2025 | 95.4 | 4 / 44 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | IMO AnswerBench | 86 | 5 / 28 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | WMT24++ | 77.6 | 5 / 6 | Tracked evidence |
| Moonshot Kimi K2 · 0905 Preview | HMMT Feb 2025 · python | 70.4 | 5 / 6 | Tracked evidence |
| Moonshot Kimi K2 · 0905 Preview | Longform Writing | 62.8 | 5 / 5 | Tracked evidence |
| Moonshot Kimi K2 · 0905 Preview | HealthBench | 43.8 | 5 / 5 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | BrowseComp | 83.2 | 7 / 51 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | BrowseComp · context_manage | 74.9 | 7 / 15 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | SciCode | 48.7 | 7 / 24 | Tracked evidence |
| Moonshot Kimi K2 · 0711 Preview | BFCL v3 | 71.1 | 8 / 49 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | IFBench | 70.1 | 8 / 28 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | Global PIQA | 89.3 | 9 / 26 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | MMMU PRO | 79.4 | 9 / 52 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | AIME 2026 | 94.5 | 10 / 19 | Tracked evidence |
| Moonshot Kimi K2 · Thinking | SciCode | 44.8 | 10 / 24 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MMMU PRO | 78.5 | 11 / 52 | Tracked evidence |
| Moonshot Kimi K2 · Thinking | BrowseComp_zh | 62.3 | 11 / 20 | Tracked evidence |
| Moonshot Kimi K2 · Base | SimpleQA | 35.3 | 11 / 40 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | IMO AnswerBench | 81.8 | 12 / 28 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | HMMT Feb 2026 | 81.3 | 12 / 16 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | BrowseComp_zh | 62.3 | 12 / 20 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | HMMT Nov 2025 | 91.1 | 13 / 31 | Tracked evidence |
| Moonshot Kimi K2 · 0711 Preview | MMLU | 89.5 | 14 / 33 | Tracked evidence |
| Moonshot Kimi K2 · 0711 Preview | SimpleQA | 31 | 14 / 40 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MMMLU | 86 | 15 / 38 | Tracked evidence |
| Moonshot Kimi K2 · Thinking | HMMT Feb 2025 | 89.4 | 16 / 44 | Tracked evidence |
| Moonshot Kimi K2 · Thinking | IMO AnswerBench | 78.6 | 17 / 28 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MAXIFE | 72.8 | 17 / 21 | Tracked evidence |
| Kimi-VL-A3B · Thinking | MMMU · mmmu_single | 60.2 | 17 / 22 | Tracked evidence |
| Moonshot Kimi K2 · Base | MMLU | 87.8 | 19 / 33 | Tracked evidence |
| Kimi-VL-A3B · Non-Thinking | MMMU · mmmu_single | 52 | 20 / 22 | Tracked evidence |
| Moonshot Kimi K2 · 0905 Preview | BrowseComp_zh | 22.2 | 20 / 20 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | BrowseComp | 60.6 | 21 / 51 | Tracked evidence |
| Moonshot Kimi K2 · Thinking | BrowseComp | 60.2 | 22 / 51 | Tracked evidence |
| Moonshot Kimi K2 · 0905 Preview | SciCode | 30.7 | 24 / 24 | Tracked evidence |
| Moonshot Kimi K2 · 0905 Preview | IMO AnswerBench | 45.8 | 26 / 28 | Tracked evidence |
| Moonshot Kimi K2 · 0711 Preview | AIME 2024 | 69.6 | 31 / 69 | Tracked evidence |
| Moonshot Kimi K2 · 0711 Preview | HMMT Feb 2025 | 38.8 | 35 / 44 | Tracked evidence |
| Moonshot Kimi K2 · 0905 Preview | HMMT Feb 2025 | 38.8 | 36 / 44 | Tracked evidence |
| Moonshot Kimi K2 · 0711 Preview | BrowseComp | 7.9 | 43 / 51 | Tracked evidence |
| Moonshot Kimi K2 · 0905 Preview | BrowseComp | 7.4 | 45 / 51 | Tracked evidence |

Coding

| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Moonshot Kimi K2 · K2.6 Thinking | LiveCodeBench · v6 | 89.6 | 2 / 40 | In Quality Score |
| Moonshot Kimi K2 · Thinking | SWE-bench Verified · multilingual_single | 61.1 | 2 / 10 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | SWE-bench Verified · single_agentless | 51.8 | 2 / 7 | In Quality Score |
| Moonshot Kimi K2 · 0905 Preview | SWE-bench Verified · multilingual_single | 55.9 | 4 / 10 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | LiveCodeBench · v6 | 85 | 6 / 40 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | SWE-bench Verified · multiple | 71.6 | 7 / 10 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | SWE-bench Verified · multilingual_single | 47.3 | 7 / 10 | In Quality Score |
| Moonshot Kimi K2 · K2.6 Thinking | SWE-bench Verified | 80.2 | 8 / 68 | In Quality Score |
| Moonshot Kimi K2 · Thinking | LiveCodeBench · v6 | 83.1 | 11 / 40 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | Aider (Polyglot) | 60 | 19 / 45 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | SWE-bench Verified | 76.8 | 20 / 68 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | GSO (Global Software Optimization) · opt_at_1 | 2 | 20 / 24 | In Quality Score |
| Moonshot Kimi K2 · 0905 Preview | LiveCodeBench · v6 | 56.1 | 28 / 40 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | LiveCodeBench · v6 | 53.7 | 30 / 40 | In Quality Score |
| Moonshot Kimi K2 · Base | LiveCodeBench · v6 | 26.3 | 37 / 40 | In Quality Score |
| Moonshot Kimi K2 · Thinking | SWE-bench Verified | 71.3 | 42 / 68 | In Quality Score |
| Moonshot Kimi K2 · 0905 Preview | SWE-bench Verified | 69.2 | 43 / 68 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | SWE-bench Verified | 65.8 | 48 / 68 | In Quality Score |
| Moonshot Kimi K2 · K2.6 Thinking | OJ-Bench | 60.6 | 1 / 19 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | OJ-Bench · cpp | 57.4 | 1 / 6 | Tracked evidence |
| Moonshot Kimi K2 · Thinking | OJ-Bench · cpp | 48.7 | 3 / 6 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | SWE-bench Multilingual | 76.7 | 4 / 18 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | SecCodeBench | 61.3 | 5 / 6 | Tracked evidence |
| Moonshot Kimi K2 · 0905 Preview | OJ-Bench · cpp | 25.5 | 6 / 6 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | SWE-bench Multilingual | 73 | 8 / 18 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | NL2Repo | 32 | 8 / 9 | Tracked evidence |
| Moonshot Kimi K2 · 0711 Preview | OJ-Bench | 27.1 | 11 / 19 | Tracked evidence |

Agentic

| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Moonshot Kimi K2 · K2.5 Thinking | MCP Atlas · public_set | 63.8 | 10 / 13 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | τ²-bench · average | 80.2 | 12 / 30 | In Quality Score |
| Moonshot Kimi K2 · K2.5 Thinking | MCP Atlas | 64.4 | 12 / 33 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | τ²-bench · airline | 56.5 | 16 / 29 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | τ²-bench · telecom | 65.8 | 18 / 28 | In Quality Score |
| Moonshot Kimi K2 · 0711 Preview | τ²-bench · retail | 70.6 | 22 / 34 | In Quality Score |
| Moonshot Kimi K2 · K2.6 Thinking | DeepSearchQA | 92.5 | 1 / 7 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | WideSearch | 80.8 | 1 / 13 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | FinSearchComp · t2_t3 | 67.8 | 1 / 2 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | PaperBench | 63.5 | 1 / 2 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | Seal-0 | 57.4 | 1 / 16 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | DeepSearchQA | 77.1 | 2 / 7 | Tracked evidence |
| Moonshot Kimi K2 · Thinking | Seal-0 | 56.3 | 2 / 16 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | MCPMark | 55.9 | 2 / 8 | Tracked evidence |
| Moonshot Kimi K2 · Thinking | FinSearchComp-T3 | 47 | 2 / 5 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | τ³-Bench · telecom | 86.8 | 4 / 6 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | τ³-Bench · airline | 76 | 4 / 6 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | τ³-Bench · banking | 14.9 | 4 / 6 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | τ³-Bench · retail | 72.8 | 5 / 6 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | WideSearch | 72.7 | 5 / 13 | Tracked evidence |
| Moonshot Kimi K2 · 0905 Preview | FinSearchComp-T3 | 10 | 5 / 5 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | BFCL v4 | 68.3 | 6 / 18 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | Toolathlon | 50 | 6 / 31 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MCPMark | 29.5 | 8 / 8 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | OSWorld · verified | 73.1 | 9 / 27 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | CyberGym | 41.3 | 9 / 12 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | τ³-Bench | 66 | 10 / 10 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | DeepPlanning | 14.5 | 14 / 16 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | OSWorld · verified | 63.3 | 15 / 27 | Tracked evidence |
| Moonshot Kimi K2 · 0905 Preview | Seal-0 | 25.2 | 16 / 16 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | Toolathlon | 27.8 | 25 / 31 | Tracked evidence |

Multimodal

| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Moonshot Kimi K2 · K2.6 Thinking | V* | 96.9 | 1 / 23 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MMBench · en_dev_v1_1 | 94.2 | 1 / 24 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | VideoMME | 87.4 | 1 / 4 | Tracked evidence |
| Kimi-VL-A3B · Non-Thinking | ChartQA · test | 87 | 1 / 10 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | SLAKE | 81.6 | 1 / 22 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MotionBench | 70.4 | 1 / 4 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MathVista · mini | 90.1 | 2 / 36 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | MathVision | 87.4 | 2 / 17 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MMVU | 80.4 | 2 / 20 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | LVBench | 75.9 | 2 / 18 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | WorldVQA | 46.3 | 2 / 5 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | Video-MMMU | 86.6 | 3 / 28 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | VideoMME · with_sub | 87.4 | 4 / 22 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | SimpleVQA | 71.2 | 4 / 29 | Tracked evidence |
| Kimi-VL-A3B · Thinking | MathVerse · mini | 61 | 4 / 10 | Tracked evidence |
| Kimi-VL-A3B · Thinking | MathVision · mini | 50.3 | 4 / 10 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | VideoMME · without_sub | 83.2 | 5 / 21 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | BabyVision | 39.8 | 5 / 22 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MathVision | 84.2 | 6 / 17 | Tracked evidence |
| Kimi-VL-A3B · Thinking | HallusionBench | 70.6 | 6 / 33 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MMStar | 80.5 | 7 / 33 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | BabyVision | 36.5 | 7 / 22 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | ZEROBench · sub | 33.5 | 7 / 23 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | ZEROBench | 9 | 7 / 27 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | AI2D · test | 90.8 | 8 / 33 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MLVU · mavg | 85 | 8 / 22 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | DynaMath | 84.4 | 8 / 23 | Tracked evidence |
| Kimi-VL-A3B · Thinking | ChartQA · test | 73.3 | 8 / 10 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | HallusionBench | 69.8 | 8 / 33 | Tracked evidence |
| Kimi-VL-A3B · Non-Thinking | MathVerse · mini | 41.7 | 8 / 10 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | CountBench | 94.1 | 9 / 23 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MedXpertQA · mm | 65.3 | 9 / 31 | Tracked evidence |
| Kimi-VL-A3B · Non-Thinking | MathVision · mini | 28.3 | 9 / 10 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | RealWorldQA | 81 | 10 / 24 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | LingoQA | 68.2 | 10 / 16 | Tracked evidence |
| Moonshot Kimi K2 · K2.6 Thinking | CharXiv Reasoning | 80.4 | 11 / 48 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MVBench | 73.5 | 11 / 18 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | RefCOCO · avg | 87.8 | 12 / 18 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | EmbSpatialBench | 77.4 | 15 / 24 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | V* | 77 | 17 / 23 | Tracked evidence |
| Kimi-VL-A3B · Non-Thinking | HallusionBench | 65.2 | 17 / 33 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | CharXiv Reasoning | 77.5 | 18 / 48 | Tracked evidence |
| Kimi-VL-A3B · Non-Thinking | AI2D · test | 84.6 | 21 / 33 | Tracked evidence |
| Kimi-VL-A3B · Thinking | MathVista · mini | 78.6 | 22 / 36 | Tracked evidence |
| Kimi-VL-A3B · Thinking | MMStar | 69.6 | 22 / 33 | Tracked evidence |
| Kimi-VL-A3B · Thinking | AI2D · test | 81.2 | 27 / 33 | Tracked evidence |
| Kimi-VL-A3B · Non-Thinking | MMStar | 60 | 29 / 33 | Tracked evidence |
| Kimi-VL-A3B · Non-Thinking | MathVista · mini | 67.1 | 32 / 36 | Tracked evidence |

Document/OCR

| Model / Variant | Benchmark | Score | Rank | Scoring |
|---|---|---|---|---|
| Moonshot Kimi K2 · K2.5 Thinking | OCRBench | 92.3 | 2 / 35 | Tracked evidence |
| Moonshot Kimi K2 · K2.5 Thinking | MMLongBench-Doc | 58.5 | 7 / 22 | Tracked evidence |
| Kimi-VL-A3B · Non-Thinking | OCRBench | 86.5 | 12 / 35 | Tracked evidence |
| Kimi-VL-A3B · Thinking | OCRBench | 79.9 | 24 / 35 | Tracked evidence |

Where this family sits in the market

Quality Score vs. input price across the public model catalog. Highlighted dots are this family's variants (same set as the table above).

Providers in the legend: Anthropic, Cohere, DeepSeek, Google, Meta, Microsoft, MiniMax, Mistral, Moonshot, NVIDIA, OpenAI, Qwen, xAI, Zhipu.

Dashed line = Pareto frontier (no model both cheaper and better). Thinking/non-thinking pairs of the same model are connected; line length = cost of reasoning.
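
To make the frontier definition concrete, here is a minimal sketch that computes it over the scored rows of the variant table on this page (input price in $/M, Quality Score). It covers only those 11 rows, not the full 186-model catalog behind the chart, so the result is illustrative rather than a reproduction of the dashed line. The function name is hypothetical.

```python
# (name, input price in USD per 1M tokens, Quality Score), from the variant table.
ROWS = [
    ("K2.6 Thinking", 0.57, 99.5),
    ("K2.5 Thinking", 0.40, 88.9),
    ("K2 Thinking", 0.60, 82.9),
    ("K2 0905 Preview", 0.60, 75.3),
    ("K2 0711 Preview", 0.57, 73.3),
    ("K2 Base", 0.57, 60.9),
    ("K2 Instruct", 0.57, 60.5),
    ("V4 Pro Thinking", 0.435, 98.0),
    ("V4 Flash Thinking", 0.098, 92.0),
    ("V4 Pro", 0.435, 80.9),
    ("V4 Flash", 0.098, 78.1),
]


def pareto_frontier(models):
    """Return names of models that no other model beats on both price and quality.

    Sweep from cheapest to most expensive (best quality first on price ties)
    and keep a model only if it improves on the best quality seen so far.
    """
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    frontier = []
    best_quality = float("-inf")
    for name, _price, quality in ordered:
        if quality > best_quality:
            frontier.append(name)
            best_quality = quality
    return frontier


print(pareto_frontier(ROWS))
```

On this subset the frontier is V4 Flash Thinking, V4 Pro Thinking, and K2.6 Thinking: each step up in quality costs more on input price, and every other row is matched or beaten on both axes by one of those three.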

Self-hosting

These variants ship with open weights, so you can run them on your own hardware or via a hosting provider you control. Pick a variant that fits your GPU memory budget; mixture-of-experts variants are cheaper to serve than their total parameter count suggests, but the full weights still need to fit in memory.

  • Moonshot Kimi K2 · K2.6 Thinking · open weights
  • Kimi-VL-A3B · Thinking · open weights
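
As a rough way to check "fits your GPU memory budget", the sketch below estimates weights-only memory from total parameter count and precision. Parameter counts are not listed on this page, so the 16B figure in the example is a hypothetical placeholder; take real counts from the model card. The estimate ignores KV cache, activations, and runtime overhead, which add to the total, and it uses total (not active) parameters because all experts of a mixture-of-experts model must be resident.

```python
def weight_memory_gib(total_params_billion: float, bits_per_param: int) -> float:
    """Weights-only memory in GiB for a model with the given total parameter count."""
    total_bytes = total_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


def fits(total_params_billion: float, bits_per_param: int,
         gpu_memory_gib: float, gpu_count: int, headroom: float = 0.8) -> bool:
    """True if the weights fit in `headroom` of the combined GPU memory.

    `headroom` is an assumed safety margin that leaves room for KV cache
    and runtime overhead; tune it for your serving stack.
    """
    budget = gpu_memory_gib * gpu_count * headroom
    return weight_memory_gib(total_params_billion, bits_per_param) <= budget


# Hypothetical 16B-total-parameter model on a single 80 GiB GPU.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gib(16, bits):.1f} GiB, "
          f"fits on one 80 GiB GPU: {fits(16, bits, 80, 1)}")
```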

Alternatives to consider

Peer families that solve overlapping problems. Pick by your binding constraint (cost, latency, open weights, vendor lock-in), not by leaderboard order.

Caveats

What this page does not tell you, listed honestly.

  • Quality score not yet computed for: Kimi-VL-A3B. We require a minimum benchmark coverage before scoring; until the gap is filled the row shows a dash.
  • No tracked API pricing for: Kimi-VL-A3B. Variants without hosted-provider pricing are listed for completeness; cost columns show a dash.
  • Context window not declared for: Kimi-VL-A3B.
  • Cross-family models (marked "cross-family" in the variants table) are shown for context only. Their canonical page lives on the family that owns them.

Editor's notes

By Boris · Last verified · AI-assisted, human-reviewed

Why this family matters

Moonshot AI's Kimi line solves two distinct problems with two distinct models. Kimi K2 is the chat-and-tools workhorse, and the K2.6-thinking variant lands at Quality Score 99.5 (#11 of 186 models we track), which puts it inside the open-weights frontier cluster against the strongest open competitors. Kimi VL is the vision-language line.

The structural fact pulling teams onto K2 is the long-context profile. The 0905 checkpoint ships with a 262K-token context window at $0.6 input / $2.5 output per million tokens. The K2.6-thinking variant, the family's strongest on reasoning quality, is listed on this page with a 131K-token window at $0.57 / $2.3, so the highest score and the longest window currently sit on different variants. For teams running document-heavy or long-conversation workloads on the open-API side, weighing that combination is the headline reason to evaluate.

Which variant to start with

Default to moonshot-kimi-k2 for text-only workloads. The 0905 checkpoint at 262K context and $0.6 / $2.5 per million is the practical entry point when window size is the constraint. Move to the K2.6-thinking variant (QS 99.5, $0.57 / $2.3 per million, 131K context on this page) when the workload visibly benefits from explicit reasoning behaviour.

When to deviate:

  • Image-grounded document workloads: consider moonshot-kimi-vl-a3b, the family's vision-language line. Use it when the workload is layout-aware extraction or image-grounded reasoning where running OCR to text and then a chat-tier LLM loses information. Caveat: coverage for Kimi VL in our current index is incomplete (no pricing, no context window, and no Quality Score at last verification; only tracked benchmark evidence). Treat this variant as a directional pick to evaluate against your own data, not a pre-validated recommendation.
  • Long chain-of-thought reasoning: K2.6-thinking is the family's reasoning ceiling on our index. Compare against DeepSeek R1 and Claude Opus 4.5+ thinking on the specific benchmark that matters before committing.
  • Cheapest Moonshot tier: the K2 0711 and base checkpoints are slightly older and priced at $0.57 / $2.3 per million with 131K context. The 0905 update doubles context (131K to 262K) at a slightly higher list price ($0.6 / $2.5) and a higher Quality Score (75.3 versus 73.3); there is no strong reason to start on 0711 unless a deployment is already pinned. The lowest list price in the variant table is K2.5 Thinking at $0.4 / $1.9 with 262K context.

Where the data is weak

We aggregate benchmark scores from multiple sources but coverage on this family is uneven. Specifically:

  • Kimi VL has no pricing, context window, or Quality Score in our current pull. The variant is registered (Kimi-VL-A3B with thinking and non-thinking modes) and has tracked multimodal and OCR benchmark rows, but not enough coverage to be scored. We surface it for completeness; treat it as exploratory until the data fills in.
  • K2 has many minor checkpoints (0711, 0905, base, instruct, thinking, thinking-turbo, K2.5-thinking, K2.6-thinking). The difference between K2 0711 (QS 73.3) and K2.6-thinking (QS 99.5) is large enough that "Kimi K2 quality" is not a useful single number; the variant table is the load-bearing artifact, not a family-level Quality Score.
  • Pricing on this page is the published API list price. Moonshot ships through several inference providers in addition to its own API; list price is a calibration anchor, not the cost ceiling.
  • Long-context behaviour on K2 deserves its own evaluation. A 262K context window in the API does not mean uniform recall across that range; verify with a needle-in-haystack-style test on your specific document distribution before committing.
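
The needle-in-haystack check mentioned in the last bullet can be sketched as follows. This is a minimal harness under stated assumptions, not a standard benchmark implementation: `ask` is a hypothetical callable you would replace with your own model client, the filler text should come from your real document distribution, and `front_loaded_stub` is only a stand-in that simulates a model which loses everything past the first quarter of the prompt.

```python
import random
import re
from typing import Callable, Dict, List


def build_prompt(filler: List[str], needle: str, depth: float, question: str) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end of the filler)."""
    pos = round(depth * len(filler))
    parts = filler[:pos] + [needle] + filler[pos:]
    return "\n\n".join(parts) + "\n\nQuestion: " + question


def recall_by_depth(ask: Callable[[str], str], filler: List[str],
                    depths: List[float], trials: int = 3, seed: int = 0) -> Dict[float, float]:
    """Fraction of trials at each depth where the answer contains the planted code."""
    rng = random.Random(seed)
    results = {}
    for depth in depths:
        hits = 0
        for _ in range(trials):
            code = str(rng.randint(100_000, 999_999))  # fresh needle per trial
            needle = f"The vault access code is {code}."
            prompt = build_prompt(filler, needle, depth, "What is the vault access code?")
            if code in ask(prompt):
                hits += 1
        results[depth] = hits / trials
    return results


def front_loaded_stub(prompt: str) -> str:
    """Stand-in for a model call: only 'reads' the first quarter of the prompt."""
    visible = prompt[: len(prompt) // 4]
    match = re.search(r"access code is (\d+)", visible)
    return match.group(1) if match else "unknown"


filler = [f"Filler paragraph {i} with no relevant facts." for i in range(200)]
scores = recall_by_depth(front_loaded_stub, filler, depths=[0.0, 0.5, 1.0])
print(scores)
```

With a real client in place of the stub, scale the filler toward the context window you plan to use and sweep more depths; a drop in recall at particular depths is the signal to look for.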

If you are making a procurement decision, the variant table on this page is the load-bearing artifact. Cross-check pricing against Moonshot's own docs before you commit.

When to reach for which alternative

  • Cheapest competent long-context API: DeepSeek V4 Flash ships 1M context at $0.098 / $0.197 with QS 78.1, which is a stronger cost-and-context anchor than K2 0905 if quality is acceptable at Flash's tier. K2.6-thinking still beats Flash on raw quality score; the choice depends on whether the score delta or the context-cost delta matters more for the workload.
  • Closed-flagship reasoning at the top end: Claude Opus 4.5+ thinking and full GPT-5 are the anchors to compare against if peak quality on a single reasoning benchmark is the binding constraint.
  • You need vision-language data you can rely on today: the Kimi VL data gap on our index is real; if you cannot wait for our coverage to fill in, evaluate against vision-language variants from families with stronger published benchmarks (e.g. the Qwen3-VL surface in our index, when it ships).

How we score

Quality scores combine multiple public benchmarks (LMArena, LiveBench, SWE-bench, Aider and others) into a single comparable number. Pricing is the published API list price; self-hosted cost depends on your own hardware. We do not accept paid placements.

Author: Boris. Read the full methodology.
