Tencent strikes back! The official version of Hunyuan T1 is released, matching DeepSeek-R1 in hands-on tests at as little as a quarter of the price

The official version of Tencent's Hunyuan T1 is released, with performance comparable to DeepSeek-R1 at a more affordable price.
Core content:
1. Performance overview of Tencent Hunyuan T1 official version and comparison with DeepSeek-R1
2. Hunyuan T1's performance in knowledge Q&A, mathematical reasoning, and complex instruction following
3. Hunyuan T1's price advantage and application prospects in the industry
Zhidongxi reported on March 22 that, the previous evening, Tencent officially upgraded the deep-thinking model in its Hunyuan large model series to the official version, Hunyuan-T1.
T1 is a strong reasoning model developed by Tencent. It generates text at 60 to 80 tokens per second, noticeably faster than DeepSeek-R1 in actual generation.
Its predecessor was the Hunyuan T1-Preview (Hunyuan-Thinker-1-Preview) reasoning model, built on a medium-sized Hunyuan base, which the Hunyuan team launched in the Tencent Yuanbao app in mid-February this year.
Compared with T1-Preview, the official version of T1 is built on TurboS, the industry's first ultra-large-scale Hybrid-Transformer-Mamba MoE fast-thinking base model, released by Tencent Hunyuan in early March. It expands reasoning capability through large-scale post-training and further aligns the model with human preferences. This is also the industry's first lossless application of the hybrid Mamba architecture to ultra-large reasoning models.
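To make the term "hybrid Transformer-Mamba MoE" more concrete, here is a minimal, purely illustrative sketch of what such a layer stack can look like structurally: most layers use a Mamba-style state-space mixer for long contexts, a minority keep full self-attention, and the feed-forward blocks are sparse mixture-of-experts. All layer counts and expert numbers below are invented for illustration; Tencent has not published TurboS/T1 internals.

```python
# Hypothetical layer layout for a hybrid Transformer-Mamba MoE stack.
# All numbers are illustrative assumptions, not disclosed specifications.
def hybrid_layout(n_layers: int = 32, attention_every: int = 8) -> list[dict]:
    layers = []
    for i in range(n_layers):
        # Most layers use a linear-time state-space (Mamba) mixer; every
        # `attention_every`-th layer keeps full self-attention.
        mixer = "attention" if (i + 1) % attention_every == 0 else "mamba"
        layers.append({
            "index": i,
            "mixer": mixer,          # token-mixing block: SSM or self-attention
            "ffn": "moe",            # sparse mixture-of-experts feed-forward
            "experts": 16,           # total experts per MoE layer (made up)
            "active_experts": 2,     # experts routed per token (made up)
        })
    return layers

layout = hybrid_layout()
print(sum(1 for l in layout if l["mixer"] == "attention"),
      "attention layers out of", len(layout))
```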
Currently, T1 is available on Tencent Cloud's official website. The input price is 1 yuan per million tokens, and the output price is 4 yuan per million tokens. The output price is 1/4 of DeepSeek's standard-period price and matches DeepSeek's discount-period price.
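As a quick back-of-the-envelope check of the pricing above, the sketch below estimates the cost of a single call from token counts. The T1 prices (1 yuan and 4 yuan per million tokens) come from the article; the DeepSeek-R1 standard-period output price of 16 yuan per million tokens is only inferred from the "1/4" comparison, and input prices are not compared.

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the cost of one call in yuan for the given token counts."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# T1 prices from the article: 1 yuan / 1M input tokens, 4 yuan / 1M output tokens.
# Example: a request with 2,000 input tokens and 8,000 output tokens.
t1_cost = api_cost(2_000, 8_000, input_price_per_m=1.0, output_price_per_m=4.0)

# DeepSeek-R1 standard-period output price, inferred from the "1/4" comparison
# (assumption): 16 yuan / 1M output tokens.
r1_output_cost = (8_000 / 1_000_000) * 16.0

print(f"T1 cost: {t1_cost:.4f} yuan")                    # 0.0340 yuan
print(f"R1 output-only cost: {r1_output_cost:.4f} yuan") # 0.1280 yuan
```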
▲DeepSeek API price
In the knowledge question-answering scenario, Tencent's Hunyuan research team demonstrated a side-by-side comparison of the outputs of T1 and DeepSeek.
The first prompt was "Can ethyl acetate be mixed with water?" The outputs of T1 and DeepSeek-R1 are similar in both length and conclusions, but T1 generates noticeably faster.
The second challenge concerns scientific and mathematical reasoning, which places more constraints on the model and requires a longer thinking process. From the outputs, T1 and DeepSeek-R1 reach the same conclusion, and T1 is still faster.
The third challenge tests the ability to follow complex instructions. T1 was asked to come up with the second line of a couplet; the first line given in the prompt was "deep and shallow streams of water". The difficulty is that every character of the reply must keep the three-dot water radical and the first four characters must follow an AABB structure. In its thinking process, T1 accurately analyzed the characteristics of the first line and, after many wrong attempts, gave the answer: "surging waves".
The fourth problem is a general task with an open-ended prompt: "Write a WeChat Moments post on the theme of life's long journey." There are no explicit style instructions or requirements.
T1 can also serve as a productivity tool to improve work efficiency. The next demo shows T1's ability to summarize long articles.
The prompt was a roughly 4,000-word news report on Microsoft's acquisition of Blizzard, with T1 asked to summarize the article. In its output, T1 not only summarized the article's main content but also extracted several key figures from the report.
The last demonstration covered the model's role-playing ability. The prompt was: "Please play the role of Li Bai, and use a tone that matches Li Bai's character. Guess a riddle: Complaint is invalid." T1's thinking process focused on analyzing the riddle; after concluding that the answer was "Hao", it delivered the answer in Li Bai's voice and composed a poem.
On public benchmarks covering Chinese and English knowledge as well as competition-level mathematics and logical reasoning, such as MMLU-Pro, CEval, AIME, and Zebra Logic, Hunyuan-T1 performs on par with or slightly above R1. On Tencent's internal human-experience evaluation sets, it also scores slightly higher than R1 in cultural and creative instruction following, text summarization, and agent capabilities.
On MMLU-Pro, which tests a base model's memorization and generalization across a broad range of knowledge, T1's score is second only to o1.
During the post-training phase, Tencent's Hunyuan research team devoted 96.7% of its computing power to reinforcement learning, focusing on improving pure reasoning capability and optimizing alignment with human preferences.
In terms of data, T1's high-quality prompt collection focuses on diverse, complex instructions at varying difficulty levels. Drawing on science problems from around the world, the researchers collected datasets covering mathematics, logical reasoning, science, and code, ranging from basic mathematical reasoning to complex scientific problem solving, and combined them with ground-truth feedback to ensure the model performs well across reasoning tasks.
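The article does not publish the scoring code, but a common way to implement ground-truth feedback for math and science prompts is a rule-based verifier that extracts the model's final answer and compares it with a reference. The sketch below is a hypothetical minimal version; the \boxed{} answer convention and the exact-match normalisation are assumptions, not Tencent's implementation.

```python
import re

def extract_final_answer(response: str) -> str | None:
    """Pull the last \\boxed{...} answer out of a response.
    The \\boxed{} convention is an assumption for illustration."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", response)
    return matches[-1].strip() if matches else None

def ground_truth_reward(response: str, reference: str) -> float:
    """Binary verifiable reward: 1.0 if the extracted answer matches the
    reference after light normalisation, else 0.0."""
    answer = extract_final_answer(response)
    if answer is None:
        return 0.0
    norm = lambda s: s.replace(" ", "").lower()
    return 1.0 if norm(answer) == norm(reference) else 0.0

# Example on a math-style sample.
sample = "Thinking step by step... the sum is \\boxed{42}"
print(ground_truth_reward(sample, "42"))   # 1.0
```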
In terms of the training plan, T1 adopts a curriculum learning approach that gradually increases data difficulty while progressively extending the model's context length, so that reasoning ability improves as the model learns to use tokens efficiently for reasoning.
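A minimal sketch of what such a curriculum schedule could look like is given below; the stage names, difficulty scale, context lengths, and step counts are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class CurriculumStage:
    name: str
    max_difficulty: int   # only sample problems at or below this difficulty
    context_length: int   # maximum sequence length allowed at this stage
    steps: int            # training steps spent in this stage

# Hypothetical 3-stage schedule: harder problems and longer contexts over time.
schedule = [
    CurriculumStage("warmup",   max_difficulty=2, context_length=8_192,  steps=2_000),
    CurriculumStage("core",     max_difficulty=4, context_length=32_768, steps=6_000),
    CurriculumStage("frontier", max_difficulty=5, context_length=65_536, steps=4_000),
]

def select_batch(dataset: list[dict], stage: CurriculumStage, batch_size: int = 32) -> list[dict]:
    """Filter the prompt pool to the current stage's difficulty ceiling."""
    eligible = [ex for ex in dataset if ex["difficulty"] <= stage.max_difficulty]
    return eligible[:batch_size]

# Toy prompt pool with difficulty labels 1..5.
toy_pool = [{"prompt": f"q{i}", "difficulty": (i % 5) + 1} for i in range(100)]
for stage in schedule:
    batch = select_batch(toy_pool, stage)
    print(f"{stage.name}: difficulty<={stage.max_difficulty}, "
          f"ctx={stage.context_length}, steps={stage.steps}, batch={len(batch)}")
```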
In terms of training strategies, the researchers drew on classic reinforcement learning techniques such as data replay and phased policy resets, improving the long-term stability of model training by more than 50%.
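Tencent has not released its training loop, but the two named strategies can be illustrated with a toy loop: a replay buffer mixes earlier rollouts back into each update (data replay), and the policy is rolled back to the last stable snapshot when needed, with the snapshot refreshed at phase boundaries (phased policy reset). The rollouts, update step, and stability check below are stand-ins.

```python
import copy
import random

class ReplayBuffer:
    """Keeps earlier rollouts so they can be mixed back into later updates."""
    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity
        self.items: list = []

    def add(self, rollouts: list) -> None:
        self.items.extend(rollouts)
        self.items = self.items[-self.capacity:]   # drop the oldest entries

    def sample(self, k: int) -> list:
        return random.sample(self.items, min(k, len(self.items)))

def train_with_replay_and_reset(total_steps: int = 300, reset_every: int = 100,
                                replay_ratio: float = 0.25) -> dict:
    """Toy loop illustrating data replay + phased policy resets."""
    policy = {"updates": 0}                        # stand-in for model weights
    snapshot = copy.deepcopy(policy)               # last known-stable snapshot
    buffer = ReplayBuffer()
    for step in range(1, total_steps + 1):
        fresh = [f"rollout-{step}-{i}" for i in range(8)]        # stand-in rollouts
        batch = fresh + buffer.sample(int(len(fresh) * replay_ratio))
        policy["updates"] += len(batch)                          # stand-in RL update
        buffer.add(fresh)
        if random.random() < 0.01:                 # stand-in instability check
            policy = copy.deepcopy(snapshot)       # roll back to the stable snapshot
        if step % reset_every == 0:
            snapshot = copy.deepcopy(policy)       # new anchor at each phase boundary
    return policy

print(train_with_replay_and_reset())
```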
In the human-preference alignment stage, T1 uses a unified reward-system feedback scheme combining self-rewarding (an earlier T1-preview version comprehensively evaluates and scores the model's outputs) with a reward model, guiding the model to improve itself.
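A minimal sketch of such a unified reward signal, assuming the two feedback sources are simply blended into one scalar, is shown below; the callables, weights, and toy scorers are hypothetical and only meant to show the shape of the scheme.

```python
def unified_reward(prompt: str, response: str, self_judge, reward_model,
                   w_self: float = 0.5, w_rm: float = 0.5) -> float:
    """Blend two feedback sources into one scalar reward.

    self_judge:   stand-in for an earlier checkpoint (T1-preview-style) asked
                  to grade the response; hypothetical callable.
    reward_model: stand-in for a trained preference reward model; hypothetical.
    """
    return w_self * self_judge(prompt, response) + w_rm * reward_model(prompt, response)

# Toy scorers so the sketch runs end to end (both return scores in [0, 1]).
def toy_self_judge(prompt: str, response: str) -> float:
    return 0.8 if len(response) > 20 else 0.3

def toy_reward_model(prompt: str, response: str) -> float:
    return 0.6

print(unified_reward("write a Moments post about life's long journey",
                     "a response long enough to pass the toy rubric",
                     toy_self_judge, toy_reward_model))   # 0.7
```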
On this basis, the Tencent Hunyuan team is exploring new research directions to find better ways to reduce large-model hallucinations, lower training costs, and more.