
Command R+ (Cohere)

Corporate OSS · #11 of 12 in Open Source Models
Coverage: 59%
RAG-optimized open model; 128K context; strongest retrieval-augmented performance; built-in citation generation; 10 languages; grounded generation specialist; non-commercial license limits use
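Command R+'s grounded-generation mode pairs the answer text with citation spans pointing back to the retrieved documents. As a minimal sketch of what a client might do with such spans, here is a renderer that inserts inline `[n]` markers; the data shapes are illustrative, not Cohere's actual SDK types:

```python
# Illustrative only: each citation is (start, end, doc_ids), a character
# span over the answer plus the indices of the supporting documents.
# These are NOT Cohere's SDK types; real grounded-generation output
# carries richer citation objects.
def render_citations(text: str, citations: list[tuple[int, int, list[int]]]) -> str:
    """Insert [n] markers after each cited span, working right-to-left
    so earlier character offsets stay valid as markers are inserted."""
    out = text
    for start, end, doc_ids in sorted(citations, key=lambda c: c[0], reverse=True):
        marker = "".join(f"[{i}]" for i in doc_ids)
        out = out[:end] + marker + out[end:]
    return out

answer = "The 128K context window supports long documents."
cites = [(4, 23, [1]), (33, 47, [2])]
print(render_citations(answer, cites))
# → The 128K context window[1] supports long documents[2].
```

Processing spans right-to-left is the one design point worth noting: inserting markers left-to-right would shift every later character offset.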
Benchmarks
0 full, 4 partial of 4
Knowledge (MMLU/GPQA)
Performance on knowledge benchmarks — MMLU, GPQA, ARC. Breadth and depth of world knowledge vs frontier closed-source models.
Partial
Reasoning (MATH/Logic)
Multi-step reasoning, chain-of-thought, MATH benchmark. Dedicated reasoning variants (e.g. DeepSeek-R1, Qwen-reasoning).
Partial
Coding (SWE-Bench/HumanEval)
Code generation, debugging quality. Specialized code variants (Codestral, Qwen-Coder, Granite Code) and SWE-Bench scores.
Partial
Speed & Inference Efficiency
Tokens-per-second on common GPUs, time-to-first-token, memory efficiency. How fast the model runs on typical self-hosted hardware.
Partial
Model Features
1 full, 1 partial of 2
Parameter Size Range
Available sizes from small (1-7B) to large (70B+). Ability to match model size to hardware constraints and use case.
Partial
Multilingual Support
Number of languages supported with strong performance. Non-English capability depth and quality.
Full
Deployment
2 full, 2 partial of 5
License Terms
Apache 2.0 / MIT (fully permissive) vs custom licenses with restrictions (Llama Community License, CC-BY-NC, etc.).
None
Fine-tuning Ecosystem
Ease of fine-tuning with LoRA/QLoRA/full. Availability of training recipes, datasets, community adapters, and fine-tune guides.
Partial
Quantization Support
GGUF, GPTQ, AWQ, and other quantization formats. Quality retention at lower precision. Community-quantized versions available.
Full
Inference Optimization
vLLM, TGI, llama.cpp, TensorRT-LLM, SGLang support. Framework compatibility and serving infrastructure breadth.
Full
Self-Hosting Cost
Cost to self-host. Full = runs on consumer/single GPU (<$1/hr). Partial = needs multi-GPU ($1-5/hr). None = requires GPU cluster ($5+/hr). Considers smallest capable variant.
Partial
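To make the cost tiers above concrete: Command R+ has 104B parameters, so weight memory alone pushes it into the multi-GPU tier even at 4-bit. A rough back-of-envelope sketch (weights only; KV cache and runtime overhead add more, and the bytes-per-parameter figures are approximations):

```python
# Rough weight-memory estimate for serving Command R+ (104B parameters)
# at several quantization levels. Approximate figures, weights only:
# KV cache, activations, and framework overhead are not counted.
PARAMS_B = 104  # Command R+ parameter count, in billions

BYTES_PER_PARAM = {
    "fp16": 2.0,
    "8-bit": 1.0,
    "4-bit (approx. Q4_K_M)": 0.56,  # GGUF Q4_K_M averages ~4.5 bits/weight
}

def weight_vram_gb(params_b: float, bytes_per_param: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_b * 1e9 * bytes_per_param / 1e9

for name, bpp in BYTES_PER_PARAM.items():
    print(f"{name:>24}: ~{weight_vram_gb(PARAMS_B, bpp):.0f} GB")
```

Even the 4-bit estimate (~58 GB) exceeds any single consumer GPU, which is consistent with the Partial (multi-GPU) rating.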
Ecosystem
2 full, 2 partial of 5
Community & Ecosystem
HuggingFace downloads, GitHub stars, community fine-tunes, tooling support, and overall adoption momentum.
Partial
Hosting Platform Access
Available on Together AI, Fireworks, Groq, Replicate, Lepton, and other inference providers. Breadth of cloud hosting options.
Full
Multimodal Variants
Vision, audio, and video model variants within the family. Multimodal capability breadth (e.g. Llama-Vision, Qwen-VL).
None
Context Length
Maximum context window. 128K+ is leading. Long-context variants and quality retention at extended lengths.
Full
Safety & Controllability
Built-in safety training, system prompt adherence, refusal calibration, alignment quality, and safety documentation.
Partial
Top Peers in Open Source Models
1. Llama 3.3 (Meta) · 97%
2. Qwen 2.5 (Alibaba) · 97%
3. Mistral Large / Nemo · 94%