Hybrid SSM+Transformer architecture; 256K-token context window with efficient long-sequence processing; Apache 2.0 license; strong on long-document tasks; lower memory footprint than a pure transformer at equivalent context length
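The memory claim can be sanity-checked with back-of-envelope arithmetic. Attention layers cache keys and values for every token, so that cache grows linearly with context length. SSM layers instead carry a fixed-size recurrent state. The sketch below assumes a 16-bit cache, grouped-query attention with 8 KV heads of dimension 128, 32 layers in total, and a hybrid that keeps attention in only 4 of those 32 layers. All of these numbers, and the function names, are hypothetical illustrations, not this model's published configuration.

```python
# Back-of-envelope memory comparison at long context.
# All layer counts and dimensions are hypothetical, chosen for illustration only.

GIB = 2**30

def kv_cache_bytes(attn_layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Keys and values cached by attention layers; grows linearly with seq_len."""
    return 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_elem

def ssm_state_bytes(ssm_layers: int, d_inner: int, d_state: int,
                    bytes_per_elem: int = 2) -> int:
    """Recurrent SSM state per sequence; constant in seq_len.
    Ignores the small convolution buffer that Mamba-style layers also keep."""
    return ssm_layers * d_inner * d_state * bytes_per_elem

SEQ_LEN = 256 * 1024  # 256K tokens

# Hypothetical pure transformer: all 32 layers are attention.
pure = kv_cache_bytes(attn_layers=32, kv_heads=8, head_dim=128, seq_len=SEQ_LEN)

# Hypothetical hybrid: same depth, but only 4 of 32 layers are attention;
# the other 28 are SSM layers with a fixed-size state.
hybrid = (kv_cache_bytes(attn_layers=4, kv_heads=8, head_dim=128, seq_len=SEQ_LEN)
          + ssm_state_bytes(ssm_layers=28, d_inner=8192, d_state=16))

if __name__ == "__main__":
    print(f"pure transformer cache: {pure / GIB:.2f} GiB")
    print(f"hybrid cache + state:   {hybrid / GIB:.2f} GiB")
```

With these assumed settings the script prints 32.00 GiB for the pure transformer and 4.01 GiB for the hybrid. The saving comes almost entirely from having fewer attention layers, since the SSM state adds only about 7 MiB regardless of context length.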
Benchmarks
Coverage: 1 of 4 categories with full results, 3 with partial results
Knowledge (MMLU/GPQA)
Performance on knowledge benchmarks (MMLU, GPQA, ARC), measuring breadth and depth of world knowledge compared with frontier closed-source models.