We all know giants like OpenAI’s GPT-4 and Meta’s Llama dominate the world of Artificial Intelligence. But there is a Chinese AI lab shaking up the playing field with its latest innovation: DeepSeek V3. Released on Wednesday by the AI firm DeepSeek, this model is being hailed as one of the most powerful “open” AI challengers to date. Its advanced capabilities, groundbreaking architecture, and controversial limitations have sparked a global conversation about the future of AI.
DeepSeek V3 isn’t your average AI model. Designed for a range of tasks, from writing essays and emails to coding and language translation, it outperforms many rivals in specific benchmarks. What sets it apart? First, its “open” nature.
Unlike closed AI models that can only be accessed through an API (like GPT-4), DeepSeek V3 is available under a permissive license. Developers can download, modify, and even use it for commercial purposes—an increasingly rare approach in a field where closed systems are becoming the norm.
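For the curious, here is a minimal sketch of what that openness looks like in practice, assuming the weights are published on Hugging Face under a repo id such as deepseek-ai/DeepSeek-V3 and loaded with the transformers library; the exact identifier, license terms, and (very substantial) hardware requirements should be checked against the official model card rather than taken from this sketch.

```python
# Illustrative sketch only: loading an openly licensed model from Hugging Face
# with the transformers library. The repo id "deepseek-ai/DeepSeek-V3" and the
# trust_remote_code flag are assumptions; check the official model card for the
# exact identifier, license, and hardware requirements before trying this.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # let transformers pick the checkpoint's dtype
    device_map="auto",       # spread the weights across available GPUs
    trust_remote_code=True,  # the repo may ship custom model code
)

prompt = "Write a short email declining a meeting."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The point is less the specific snippet than the fact that nothing stands between a developer and the weights: no API key, no usage gate, only hardware.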
Despite its open-source approach, DeepSeek remains profitable, an unusual feat in a field dominated by subsidies and losses. Founder Liang Wenfeng sees open source as both a strategy and a cultural statement: “Our value lies in our team, which grows and accumulates know-how. Building an organization and culture that can consistently innovate is our real moat,” he said in an interview.
DeepSeek V3’s technical specs, according to commentators and enthusiasts, are jaw-dropping. With 671 billion parameters (or 685 billion on AI platform Hugging Face), it dwarfs many competitors, including Meta’s Llama 3.1 405B. Parameters are akin to the brain cells of an AI model, helping it make decisions and predictions.
The more parameters, the more nuanced and accurate the model’s capabilities, at least in theory. But here’s the twist: DeepSeek V3 employs a “Mixture of Experts” (MoE) design, which activates only a subset (37 billion) of its total parameters for any given task.
Think of MoE as a team of specialists: instead of involving the entire group for every problem, only the most relevant experts are called in. This makes DeepSeek V3 both efficient and powerful, a combination that has long been a challenge for AI developers to achieve.
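For readers who prefer code to analogies, the toy sketch below illustrates that routing idea in a few lines of numpy. The expert count, top-k value, and vector sizes are arbitrary values chosen for the example and bear no relation to DeepSeek V3's actual configuration.

```python
# A toy, numpy-only sketch of Mixture-of-Experts routing: a router scores every
# expert, but only the top-k experts are actually run for each token. The sizes
# here (8 experts, top-2, 16 dims) are illustrative, not DeepSeek V3's real ones.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is just a small weight matrix in this sketch.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token vector x through its top-k experts only."""
    logits = x @ router                    # score every expert
    top = np.argsort(logits)[-top_k:]      # pick the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # normalize gate weights over the chosen experts
    # Only the selected experts do any computation; the rest stay idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)  # (16,): same output shape, a fraction of the compute
```

Scaled up, this is how a 671-billion-parameter model can run each token through only about 37 billion of them.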
Performance: Outshining the competition
DeepSeek V3’s performance on various benchmarks is nothing short of impressive. Let’s break it down:
In academic testing, DeepSeek V3 scored 88.5 on the MMLU benchmark, which measures how well models can answer complex academic questions. This places it among the best in the industry. On the reading comprehension test DROP, it hit an extraordinary 91.6—far surpassing GPT-4. However, it stumbled on simpler Q&A tasks like SimpleQA, scoring just 24.9.
For developers, DeepSeek V3 is a promising tool. It excelled in coding challenges like HumanEval-Mul (82.6), which tests the AI’s ability to handle diverse coding problems. It also shone on Aider’s Polyglot editing benchmark, scoring 79.7. However, it struggled with more complex real-world coding benchmarks like LiveCodeBench, where its performance lagged slightly behind competitors.
If you’re looking for a math tutor, DeepSeek V3 might be it. The model excelled in solving advanced math problems, scoring 90.2 on the MATH-500 benchmark—a sign that it can tackle even challenging high school or college-level problems.
Unsurprisingly for a model developed in China, DeepSeek V3 is adept at handling Chinese language tasks. It scored an impressive 90.9 on the CLUEWSC benchmark, which tests understanding of nuanced sentences. However, it performed less well on simpler Chinese Q&A tasks, scoring 64.1 on C-SimpleQA.
What’s perhaps most remarkable about DeepSeek V3 is how it was developed.
The model was trained on a staggering 14.8 trillion tokens (equivalent to roughly 11.1 trillion words) in just two months using Nvidia H800 GPUs. These GPUs have recently been restricted from sale to Chinese companies under U.S. export controls, yet DeepSeek managed to use them efficiently.
Even more striking is the cost. Training DeepSeek V3 reportedly cost just $5.5 million—a fraction of what it took to develop models like GPT-4. This raises questions about how DeepSeek achieved such efficiency, especially given the sheer scale of the dataset and model size.
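For a sense of scale, here is a back-of-the-envelope check on that figure. The cluster size and per-GPU-hour rental rate below are assumptions chosen for illustration, not numbers reported by DeepSeek or in this article.

```python
# Rough plausibility check on the reported ~$5.5M training cost.
# Assumptions (not reported figures): ~2,000 H800 GPUs, two months of training,
# priced at an assumed $2 per GPU-hour.
gpus = 2_000   # assumed cluster size
days = 60      # "just two months" of training
rate = 2.0     # assumed USD per GPU-hour

gpu_hours = gpus * days * 24
cost = gpu_hours * rate
print(f"{gpu_hours:,} GPU-hours -> ~${cost / 1e6:.1f}M")  # ~2.9M GPU-hours, ~$5.8M
```

Under those assumptions the arithmetic lands in the same ballpark as the reported figure, which is precisely what makes the claim plausible yet remarkable next to the budgets behind GPT-4-class models.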
Liang noted: “We introduced architectural advancements like MLA (Multi-head Latent Attention) that reduced memory usage and computational demands. This allowed us to bring down costs while maintaining cutting-edge performance.”
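Without wading into the details of MLA itself, a quick comparison shows why compressing keys and values into a small shared latent shrinks the memory bill during inference. All dimensions below are illustrative assumptions, not DeepSeek V3's real attention geometry.

```python
# Back-of-the-envelope comparison of per-token KV-cache size: standard
# multi-head attention versus a latent-compression scheme in the spirit of MLA.
# The dimensions are illustrative assumptions, not DeepSeek V3's real ones.
n_heads, d_head = 32, 128   # assumed attention geometry
d_latent = 512              # assumed size of the compressed KV latent

standard_cache = 2 * n_heads * d_head  # cache full keys and values per token
latent_cache = d_latent                # cache only one shared latent per token

print(f"standard KV cache per token : {standard_cache} values")
print(f"latent-compressed cache     : {latent_cache} values")
print(f"reduction factor            : {standard_cache / latent_cache:.0f}x")
```

Smaller caches mean longer contexts and cheaper serving on the same hardware, which is exactly the kind of efficiency the low training bill hints at.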
These achievements are no accident. Liang highlighted the importance of structural innovation: “Most Chinese companies follow Llama’s framework for fast deployment, but we’ve explored new architectures to stay at the forefront of global standards.”
Not everything about DeepSeek V3 is rosy. As with many Chinese AI systems, the model’s outputs are subject to government oversight. When asked about politically sensitive topics like Tiananmen Square, DeepSeek V3 refuses to answer—a reflection of the Chinese internet regulator’s requirement that AI models embody “core socialist values.” This censorship limits the model’s utility in open discourse and raises concerns about the intersection of AI and geopolitics.
Liang acknowledges these constraints but believes the focus should remain on advancing technology: “Some explorations are inevitable. Chinese AI can’t stay in a perpetual following position. We need to contribute to the global ecosystem.”
Who’s behind DeepSeek?
DeepSeek is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund with a strong focus on AI. High-Flyer has deep pockets, having spent $138 million on one of its server clusters equipped with 10,000 Nvidia A100 GPUs.
The company began exploring fully automated quantitative trading in 2008, employing advanced techniques such as machine learning. A major milestone was reached on October 21, 2016, when the first trade powered by a deep learning algorithm was executed.
By 2017, deep learning was fully integrated into the company’s trading systems. Over the years, the company has gained extensive practical experience in AI-driven quantitative trading, navigating various market conditions. The firm has always been committed to advancing AI algorithm research and continues to invest heavily in this area as a key part of its long-term strategy.
The hedge fund’s founder, Liang Wenfeng, has ambitious goals of achieving “superintelligent” AI and sees closed-source models like GPT-4 as a “temporary moat.” In his words: “In the face of disruptive technology, a closed-source moat is temporary. True progress comes from a relentless pursuit of the hardest problems, and that’s where we find our strength.”
This mindset aligns with the release of DeepSeek V3 as an open model, challenging the dominance of proprietary systems. Liang’s bet seems to be paying off, as DeepSeek V3 positions itself as a serious contender in the global AI arms race.
DeepSeek V3 sets itself apart by embracing open-source principles in a field where most advancements are locked behind proprietary APIs. Liang Wenfeng, DeepSeek’s enigmatic founder, explained: “Open sourcing and publishing papers don’t mean we lose anything. For technologists, being followed is an achievement. Giving is a form of honor, and it attracts talent by fostering a unique culture.”
DeepSeek’s ultimate goal is achieving AGI (Artificial General Intelligence). Liang described their roadmap: “We’re placing bets on three directions: mathematics and code, multimodality, and natural language itself. AGI could be 2, 5, or 10 years away, but it will definitely happen in our lifetime.”
This idealism extends to their belief in collaboration over competition. Liang stated: “Providing cloud services isn’t our primary goal. We aim to create an ecosystem where others build on our foundation, driving societal operational efficiency.”