Ant Group Co., backed by Jack Ma, has made significant strides in cutting the cost of training artificial intelligence models. The company has used Chinese-made semiconductors, including chips from its affiliate Alibaba Group Holding Ltd. and from Huawei Technologies Co., to train models with a machine-learning technique known as Mixture of Experts (MoE). People familiar with the matter said the approach delivered training results comparable to those from Nvidia's H800 chips, which the US currently bars from export to China.
Though Ant continues to rely on Nvidia chips for some AI development, it is increasingly turning to alternatives, including processors from Advanced Micro Devices Inc. and domestic Chinese manufacturers, to power its latest AI models. The effort marks a notable entry for Ant into a global AI race contested by both Chinese and US companies, and it underscores China's ambition to reduce reliance on foreign semiconductors, particularly from Nvidia, which is barred from exporting its most advanced chips to China.
In a research paper published this month, Ant Group claimed its models outperformed those of Meta Platforms Inc. on certain benchmarks, though Bloomberg News has not independently verified the claim. If accurate, the results would mark a step forward for Chinese AI development by slashing the cost of inferencing, the process of running trained models to support AI services.
MoE models split a network into specialized sub-models, or "experts," and activate only a small number of them for each input, cutting the compute needed per token. The technique has gained popularity across the industry, with companies such as Google and the Chinese startup DeepSeek also relying on it. Training MoE models, however, typically depends on high-performance GPUs like those sold by Nvidia, and their cost has proven prohibitive for smaller firms, limiting wider adoption. Ant's research aims to tackle this problem by finding ways to train large language models (LLMs) efficiently without expensive, premium GPUs.
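The routing idea at the heart of MoE is simple enough to sketch in a few lines. The toy example below, written in Python/NumPy with illustrative layer sizes and a top-2 gate (none of it drawn from Ant's paper), shows how a router picks a handful of experts per token so that most of the model's weights sit idle on any given input:

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, D_HIDDEN = 64, 128   # illustrative sizes, not Ant's configuration
N_EXPERTS, TOP_K = 8, 2       # route each token to its top-2 of 8 experts

# Each expert is a small two-layer feed-forward network.
experts = [
    (rng.normal(0, 0.02, (D_MODEL, D_HIDDEN)),
     rng.normal(0, 0.02, (D_HIDDEN, D_MODEL)))
    for _ in range(N_EXPERTS)
]
router = rng.normal(0, 0.02, (D_MODEL, N_EXPERTS))  # gating weights

def moe_forward(x):
    """Send each token through its top-k experts and mix the outputs."""
    logits = x @ router                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]  # chosen expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, top[t]]
        gates = np.exp(scores - scores.max())
        gates /= gates.sum()                       # softmax over chosen experts
        for gate, e in zip(gates, top[t]):
            w1, w2 = experts[e]
            out[t] += gate * (np.maximum(x[t] @ w1, 0.0) @ w2)  # ReLU FFN
    return out

tokens = rng.normal(size=(4, D_MODEL))  # a toy batch of 4 token vectors
print(moe_forward(tokens).shape)        # (4, 64): only 2 of 8 experts ran per token
```

Because only TOP_K of the N_EXPERTS experts run for each token, compute per token scales with the experts actually used rather than with the total parameter count, which is what makes training very large models on less powerful hardware plausible.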
The approach challenges the view of Nvidia CEO Jensen Huang that demand for powerful chips will keep growing even as more efficient models such as DeepSeek's R1 emerge. Huang has consistently championed GPUs with more processing cores, more memory, and more transistors to meet AI's growing computational needs.
China’s AI Innovation Gathers Pace
Ant's efforts highlight the accelerating pace of innovation in China's AI sector. If confirmed, its results would underscore the country's growing self-sufficiency in AI, built on cost-effective, computationally efficient models that work around export restrictions on Nvidia chips. Robert Lea, a senior analyst at Bloomberg Intelligence, said the development signals China's commitment to becoming a leader in AI technology.
In terms of cost savings, Ant's research found that training a model on 1 trillion tokens typically costs about 6.35 million yuan ($880,000) on high-performance hardware. Ant's optimized approach, running on lower-specification hardware, cuts that to about 5.1 million yuan, a saving of roughly 20%.
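The roughly 20% figure follows directly from the two reported costs; a quick check:

```python
# Training costs for 1 trillion tokens, as reported in Ant's paper.
baseline_yuan = 6_350_000    # high-performance hardware
optimized_yuan = 5_100_000   # Ant's approach on lower-specification hardware

saving = baseline_yuan - optimized_yuan
print(f"{saving:,} yuan saved ({saving / baseline_yuan:.1%})")
# -> 1,250,000 yuan saved (19.7%)
```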
The company plans to leverage the advancements made through its Ling-Plus and Ling-Lite models to expand its AI applications in industries such as healthcare and finance. This move comes after Ant’s acquisition of the Chinese healthcare platform Haodf.com earlier this year, strengthening its AI capabilities in the medical field. Additionally, Ant operates the AI-powered life assistant app Zhixiaobao and the financial advisory service Maxiaocai.
In benchmarks testing English-language proficiency, Ant’s Ling-Lite model outperformed Meta’s Llama model. Furthermore, both Ling-Lite and Ling-Plus models surpassed DeepSeek’s counterparts in Chinese-language benchmarks, underscoring the growing competitiveness of Chinese AI models.
Robin Yu, Chief Technology Officer at Shengshang Tech Co., an AI solutions provider in Beijing, remarked, “If you find one point of attack to beat the world’s best kung fu master, you can still say you beat them, which is why real-world application is important.”
Ant’s Open-Source Models
Ant has made both its Ling-Lite and Ling-Plus models open-source. Ling-Lite boasts 16.8 billion parameters, while Ling-Plus includes 290 billion parameters—both considered substantial in the world of large language models. For comparison, experts estimate that OpenAI’s GPT-4.5 has 1.8 trillion parameters, while DeepSeek-R1 has 671 billion.
Despite the progress, Ant’s training process faced challenges. The company reported instability issues during model training, noting that even slight changes in hardware or model structure led to errors and inconsistencies in the model’s performance. These obstacles highlight the complexities involved in developing large-scale AI models, even with innovative approaches.
As Ant continues to refine its AI technologies, its efforts reflect the broader trend of Chinese companies striving to reduce reliance on foreign technology while fostering innovation within the country’s AI industry.