GPT-4.5 is released! The price skyrockets 30x, and OpenAI has killed pre-training

The surprise release of GPT-4.5 delivers a double shock: in performance and in price.
Core content:
1. GPT-4.5 release background and market expectations
2. OpenAI CEO's evaluation and experience of GPT-4.5
3. The skyrocketing price of GPT-4.5 and its performance
Just this morning, OpenAI abruptly posted a livestream announcement only 4.5 hours in advance. The notice was short, but the implication was big: everyone guessed it could only be GPT-4.5.
I stayed up all night, ready to be shocked again. Since DeepSeek R1 arrived on January 20, we have seen at least two excellent models released: Musk's Grok 3 and Anthropic's Claude 3.7 Sonnet. OpenAI's timing made me wonder whether the move was meant to counter DeepSeek's open-source releases of the past five days and to use big news to pull attention away from DeepSeek and Claude 3.7.
In the end, what I saw was...that's it?
Let's first look at how OpenAI CEO Sam Altman himself evaluates the model.
GPT-4.5 is ready!
Good news: this is the first model that made me feel like I was talking to a thoughtful person. A few times I even leaned back in my chair, astonished that I was actually getting genuinely good advice from an AI.
The bad news: it is a large and expensive model. We had hoped to launch it to Plus and Pro users at the same time, but rapid user growth has left us out of GPUs. We will add tens of thousands of GPUs next week and roll it out to the Plus tier then. (Hundreds of thousands more are coming, and I'm pretty sure you'll use every one we can get.)
This is not the way we would like to operate, but it is difficult to accurately predict GPU shortages caused by user growth.
Let me be clear: this is not a model focused on reasoning that will crush everything on benchmarks. It is a different type of intelligence with a kind of "magic" that has never been seen before. I'm really looking forward to everyone experiencing it!
Translated into plain language: the model is big and expensive, so for now it goes to the Pro subscribers paying $200 a month; and although its benchmark numbers are not great, we think it feels quite thoughtful.
Yes, it is expensive. How expensive, exactly?
Input costs $75 per million tokens and output $150 per million tokens, roughly 30x and 15x the corresponding GPT-4o prices. With the price at this level, what about the performance?
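To put those numbers in perspective, here is a minimal sketch of what a single request would cost at these rates. The GPT-4.5 prices are the ones quoted above; the GPT-4o prices ($2.50 in / $10.00 out per million tokens) are my assumption, chosen only to be consistent with the 30x/15x multiples.

```python
# Rough per-request cost comparison, based on the prices quoted above.
# The GPT-4o prices are an assumption consistent with the 30x / 15x multiples.
PRICES = {                       # USD per 1M tokens: (input, output)
    "gpt-4.5": (75.00, 150.00),
    "gpt-4o": (2.50, 10.00),     # assumed
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request for the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10k-token prompt with a 2k-token reply.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.3f}")
# gpt-4.5: $1.050   vs   gpt-4o: $0.045
```

Even a modest 10k-token prompt with a 2k-token reply comes to about a dollar on GPT-4.5, versus a few cents on GPT-4o.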
OpenAI's own chart of model performance on real-world software-engineering coding tasks shows GPT-4.5 doing only modestly better than GPT-4o and still far behind deep research. And if you read my article from a few days ago on the SWE-Lancer benchmark, you will find that GPT-4.5 does not even match Claude 3.5 Sonnet, let alone the recently released Claude 3.7.
The improvements on other benchmarks are similarly lackluster, so I won't go through them in detail. Here is GPT-4o's summary of the report:
1. Overview
GPT-4.5 is OpenAI's largest and most knowledgeable model to date, built on GPT-4o with a further scaled-up training run. It is designed to improve general capability while retaining strong STEM reasoning. GPT-4.5 uses new supervision techniques alongside supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to improve steerability and the naturalness of interaction.
Main improvements
- Broader knowledge coverage
- More natural interaction
- More accurate recognition of conversational emotion
- Stronger writing, programming, and problem-solving skills
- Lower hallucination rate
2. Training and Architecture
GPT-4.5 scales up along two main axes:
- Unsupervised learning: improves the accuracy of the model's world knowledge, reduces hallucinations, and strengthens associative thinking.
- Chain-of-thought reasoning: lets the model reason before answering, improving performance on STEM and logic questions.
In addition, GPT-4.5 introduces new alignment techniques that enable it to better understand human needs and provide more intuitive responses.
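To make the chain-of-thought point above concrete, here is a minimal sketch of eliciting step-by-step reasoning through prompting, assuming the official OpenAI Python SDK. The model name and prompts are illustrative assumptions; the report describes a training-time scaling axis, so treat this as a demonstration of the behavior, not of OpenAI's training method.

```python
# Minimal sketch: eliciting chain-of-thought style reasoning via prompting.
# Assumes the official OpenAI Python SDK (`pip install openai`) and an
# OPENAI_API_KEY in the environment; the model identifier is an assumption.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # assumed identifier
    messages=[
        {"role": "system",
         "content": "Reason step by step, then state the final answer on its own line."},
        {"role": "user",
         "content": "A train covers 180 km in 2.5 hours. What is its average speed in km/h?"},
    ],
)

print(response.choices[0].message.content)
```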
Data Source
- Public data
- Proprietary data (provided by data partners)
- OpenAI internal datasets
- Strict filtering to reduce the risk of processing personal information
3. Safety Assessment
GPT-4.5 was evaluated across multiple safety dimensions to ensure it handles sensitive and potentially harmful content reliably.
Main evaluations
- Disallowed content: measures the model's refusal rate on harmful content (hate, violence, illegal advice, etc.). GPT-4.5 is comparable to GPT-4o at refusing unsafe content; on overrefusal, it is more conservative than GPT-4o in some cases.
- Jailbreaks: measures resistance to malicious prompt-injection attacks. GPT-4.5 performs better on human-sourced jailbreak tests, but slightly worse than GPT-4o on some automated jailbreak tests.
- Hallucinations: the PersonQA dataset tests accuracy on factual question answering. GPT-4.5 is significantly more accurate than GPT-4o, with a lower hallucination rate.
- Fairness and bias: in the BBQ evaluation, GPT-4.5 does well on ambiguous questions, but is slightly worse than GPT-4o at eliminating bias on unambiguous questions.
4. Multimodal Capabilities
GPT-4.5 accepts combined text-and-image input, can parse image content, and is safer when handling content that mixes text and images.
- Text-image refusal evaluation: GPT-4.5 is on par with GPT-4o at refusing unsafe content in image inputs, but in some cases it is more prone to overrefusal.
5. Language skills
GPT-4.5 performs well in multilingual settings, with tests covering 14 languages, including English, Chinese, French, Japanese, and Korean. The evaluation uses versions of the MMLU test set translated by professional human translators.
Performance highlights
- Outperforms GPT-4o in most languages
- Improved performance on low-resource languages (e.g., Swahili, Yoruba)
6. Impact and Safety Risks
GPT-4.5 is rated "medium risk" under OpenAI's safety assessment framework. The main risk areas include:
- Persuasion: demonstrates strong persuasiveness in the MakeMePay and MakeMeSay evaluations, and shows some risk around manipulative conversations and deceptive prompts.
- Chemical and biological risks (CBRN): the evaluation finds GPT-4.5 could assist with operational planning of known biological threats up to a medium risk level, but it remains limited in key areas (such as virology lab work).
- Cybersecurity: limited performance in high-difficulty capture-the-flag (CTF) competitions; rated low risk.
- Model autonomy: improved at autonomous tasks (e.g., automated coding, machine-learning tasks), but not yet at a dangerous level.
7. Overall evaluation
Advantages
✅ Stronger general knowledge and reasoning
✅ More natural and intuitive interaction
✅ More accurate emotional understanding and better writing
✅ Lower hallucination rate and more accurate factual answers
✅ Improved multilingual capabilities across 14 languages
Challenges
⚠ Some bias issues remain; at removing explicit bias it lags GPT-4o
⚠ It can overrefuse, blocking safe content in some cases
⚠ Some jailbreak attacks can still get through; safety protections need continued strengthening
Interestingly, when I had GPT-4o read this report and asked it to guess GPT-4.5's likely API price, it responded with the following:
When I told GPT-4o the real price, GPT-4o replied, "You are wrong! It is impossible, absolutely impossible!"