
Command R+ (Cohere)

Corporate OSS · #11 of 12 in Open Source Models
Coverage: 59%
RAG-optimized open model; 128K context; strongest retrieval-augmented performance; built-in citation generation; 10 languages; grounded generation specialist; non-commercial license limits use
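Command R+'s grounded-generation mode pairs the answer text with citation spans pointing back to the retrieved documents. As a minimal sketch of what a client might do with such spans, here is a renderer that inserts inline `[n]` markers; the data shapes are illustrative, not Cohere's actual SDK types:

```python
# Illustrative only: each citation is (start, end, doc_ids), a character
# span over the answer plus the indices of the supporting documents.
# These are NOT Cohere's SDK types; real grounded-generation output
# carries richer citation objects.
def render_citations(text: str, citations: list[tuple[int, int, list[int]]]) -> str:
    """Insert [n] markers after each cited span, working right-to-left
    so earlier character offsets stay valid as markers are inserted."""
    out = text
    for start, end, doc_ids in sorted(citations, key=lambda c: c[0], reverse=True):
        marker = "".join(f"[{i}]" for i in doc_ids)
        out = out[:end] + marker + out[end:]
    return out

answer = "The 128K context window supports long documents."
cites = [(4, 23, [1]), (33, 47, [2])]
print(render_citations(answer, cites))
# → The 128K context window[1] supports long documents[2].
```

Processing spans right-to-left is the one design point worth noting: inserting markers left-to-right would shift every later character offset.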
Benchmarks
0 full, 4 partial of 4
Knowledge (MMLU/GPQA)
Performance on knowledge benchmarks — MMLU, GPQA, ARC. Breadth and depth of world knowledge vs frontier closed-source models.
Partial
Reasoning (MATH/Logic)
Multi-step reasoning, chain-of-thought, MATH benchmark. Dedicated reasoning variants (e.g. DeepSeek-R1, Qwen-reasoning).
Partial
Coding (SWE-Bench/HumanEval)
Code generation, debugging quality. Specialized code variants (Codestral, Qwen-Coder, Granite Code) and SWE-Bench scores.
Partial
Speed & Inference Efficiency
Tokens-per-second on common GPUs, time-to-first-token, memory efficiency. How fast the model runs on typical self-hosted hardware.
Partial
Model Features
1 full, 1 partial of 2
Parameter Size Range
Available sizes from small (1-7B) to large (70B+). Ability to match model size to hardware constraints and use case.
Partial
Multilingual Support
Number of languages supported with strong performance. Non-English capability depth and quality.
Full
Deployment
2 full, 2 partial of 5
License Terms
Apache 2.0 / MIT (fully permissive) vs custom licenses with restrictions (Llama Community License, CC-BY-NC, etc.).
None
Fine-tuning Ecosystem
Ease of fine-tuning with LoRA/QLoRA/full. Availability of training recipes, datasets, community adapters, and fine-tune guides.
Partial
Quantization Support
GGUF, GPTQ, AWQ, and other quantization formats. Quality retention at lower precision. Community-quantized versions available.
Full
Inference Optimization
vLLM, TGI, llama.cpp, TensorRT-LLM, SGLang support. Framework compatibility and serving infrastructure breadth.
Full
Self-Hosting Cost
Cost to self-host. Full = runs on consumer/single GPU (<$1/hr). Partial = needs multi-GPU ($1-5/hr). None = requires GPU cluster ($5+/hr). Considers smallest capable variant.
Partial
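To make the cost tiers above concrete: Command R+ has 104B parameters, so weight memory alone pushes it into the multi-GPU tier even at 4-bit. A rough back-of-envelope sketch (weights only; KV cache and runtime overhead add more, and the bytes-per-parameter figures are approximations):

```python
# Rough weight-memory estimate for serving Command R+ (104B parameters)
# at several quantization levels. Approximate figures, weights only:
# KV cache, activations, and framework overhead are not counted.
PARAMS_B = 104  # Command R+ parameter count, in billions

BYTES_PER_PARAM = {
    "fp16": 2.0,
    "8-bit": 1.0,
    "4-bit (approx. Q4_K_M)": 0.56,  # GGUF Q4_K_M averages ~4.5 bits/weight
}

def weight_vram_gb(params_b: float, bytes_per_param: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_b * 1e9 * bytes_per_param / 1e9

for name, bpp in BYTES_PER_PARAM.items():
    print(f"{name:>24}: ~{weight_vram_gb(PARAMS_B, bpp):.0f} GB")
```

Even the 4-bit estimate (~58 GB) exceeds any single consumer GPU, which is consistent with the Partial (multi-GPU) rating.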
Ecosystem
2 full, 2 partial of 5
Community & Ecosystem
HuggingFace downloads, GitHub stars, community fine-tunes, tooling support, and overall adoption momentum.
Partial
Hosting Platform Access
Available on Together AI, Fireworks, Groq, Replicate, Lepton, and other inference providers. Breadth of cloud hosting options.
Full
Multimodal Variants
Vision, audio, and video model variants within the family. Multimodal capability breadth (e.g. Llama-Vision, Qwen-VL).
None
Context Length
Maximum context window. 128K+ is leading. Long-context variants and quality retention at extended lengths.
Full
Safety & Controllability
Built-in safety training, system prompt adherence, refusal calibration, alignment quality, and safety documentation.
Partial
Top Peers in Open Source Models
1. Llama 3.3 (Meta) · 97%
2. Qwen 2.5 (Alibaba) · 97%
3. Mistral Large / Nemo · 94%