DeepSeek R1
DeepSeek R1 is an open-source large language model released in January 2025 and designed for complex reasoning tasks. Its training centers on large-scale reinforcement learning (RL): the precursor model DeepSeek-R1-Zero was trained with pure RL and no supervised fine-tuning, while R1 itself adds only a small cold-start supervised stage before RL. The result is performance comparable to OpenAI's o1 on mathematics, coding, and logical reasoning benchmarks. The model uses a Mixture of Experts (MoE) architecture with 671 billion total parameters, of which only 37 billion are activated per forward pass, keeping inference computationally efficient. Its DeepSeek-V3 base model was reportedly trained for about $5.58 million over roughly 2.78 million GPU hours, dramatically less than comparable models from larger organizations.

What sets DeepSeek R1 apart is the release of its model weights under the permissive MIT license, along with distilled versions ranging from 1.5B to 70B parameters. API pricing of $0.55 per million input tokens and $2.19 per million output tokens is roughly 96% lower than OpenAI o1's, making advanced reasoning accessible at scale.

DeepSeek R1 excels at mathematical problem solving (79.8% on AIME 2024, 97.3% on MATH-500), coding (a 2029 rating on Codeforces), and general reasoning (90.8 on MMLU). During RL training it developed notable emergent behaviors, including self-verification, reflection, and dynamically adjusting the length of its chain of thought to match problem difficulty.
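The efficiency figure above follows from sparse expert activation: each token is routed to only a few experts, so only a small fraction of the total parameters participates in any single forward pass. The following is a minimal sketch of a generic top-k MoE layer in PyTorch, not DeepSeek's actual DeepSeekMoE implementation; the layer sizes, expert count, and top_k value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k Mixture-of-Experts layer: only k experts run per token."""

    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Run each expert only on the tokens that were routed to it.
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue
            out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out

moe = TopKMoE()
tokens = torch.randn(16, 512)
print(moe(tokens).shape)  # torch.Size([16, 512]); only 2 of 8 experts ran per token
```

The same principle, scaled to many more and much larger experts, is what lets a model with 671 billion total parameters keep its per-token compute close to that of a 37-billion-parameter dense model.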
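The pricing above applies to DeepSeek's hosted API, which follows OpenAI-compatible conventions, so the standard openai Python client can be pointed at it via base_url. The sketch below reflects DeepSeek's published API conventions at the time of writing; the model name deepseek-reasoner, the reasoning_content field, and the environment variable name are assumptions that may change.

```python
import os
from openai import OpenAI  # DeepSeek's endpoint is OpenAI-compatible

# Assumes an API key in DEEPSEEK_API_KEY; endpoint and model names follow
# DeepSeek's documentation but are subject to change.
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",  # the R1 reasoning model
    messages=[{"role": "user",
               "content": "How many primes are there between 10 and 30?"}],
)

msg = resp.choices[0].message
# The reasoning model returns its chain of thought separately from the answer.
print(getattr(msg, "reasoning_content", None))
print(msg.content)
```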
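The distilled checkpoints are ordinary dense models published on Hugging Face, so they load with the standard transformers API. The sketch below assumes the repository name deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B (the smallest of the 1.5B-70B family) and that its tokenizer ships a chat template; both are assumptions about the published artifacts.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository name as assumed here; the 1.5B distill is the smallest variant.
repo = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto")

# Build a chat-formatted prompt and generate a (reasoned) answer.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```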