GPT-4.5 is released! The price skyrockets 30x, and OpenAI has killed pre-training

The surprise release of GPT-4.5 delivers a double shock: in performance and in price.
Core content:
1. GPT-4.5 release background and market expectations
2. OpenAI CEO's evaluation and experience of GPT-4.5
3. The skyrocketing price of GPT-4.5 and its performance
Just this morning, OpenAI abruptly posted a livestream announcement only 4.5 hours in advance. The notice was short, but the implication was big: everyone guessed it could only be GPT-4.5.
I stayed up all night, ready to be shocked again. Since DeepSeek R1 arrived on January 20, we have seen at least two excellent models released: Musk's Grok 3 and Anthropic's Claude 3.7 Sonnet. OpenAI's timing made me wonder whether the move was meant to counter DeepSeek's open-source releases of the past five days and to use big news to pull attention away from DeepSeek and Claude 3.7.
In the end, what I saw was...that's it?
Let's first look at how OpenAI CEO Sam Altman himself evaluates the model.
GPT-4.5 is ready!
Good news: this is the first model that made me feel like I was talking to a thoughtful person. A few times I even leaned back in my chair, astonished that I was actually getting genuinely good advice from an AI.
The bad news: it is a large and expensive model. We had hoped to launch it to Plus and Pro users at the same time, but rapid user growth has left us out of GPUs. We will add tens of thousands of GPUs next week and roll it out to the Plus tier then. (Hundreds of thousands more are coming, and I'm pretty sure you'll use every one we can get.)
This is not the way we would like to operate, but it is difficult to accurately predict GPU shortages caused by user growth.
Let me be clear: this is not a model focused on reasoning that will crush everything on benchmarks. It is a different type of intelligence with a kind of "magic" that has never been seen before. I'm really looking forward to everyone experiencing it!
Translated into plain language: the model is big and expensive, so for now it goes to the Pro subscribers paying $200 a month; and although its benchmark numbers are not great, we think it feels quite thoughtful.
Yes, it is expensive. How expensive, exactly?
Input costs $75 per million tokens and output $150 per million tokens, roughly 30x and 15x the corresponding GPT-4o prices. With the price at this level, what about the performance?
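To put those numbers in perspective, here is a minimal sketch of what a single request would cost at these rates. The GPT-4.5 prices are the ones quoted above; the GPT-4o prices ($2.50 in / $10.00 out per million tokens) are my assumption, chosen only to be consistent with the 30x/15x multiples.

```python
# Rough per-request cost comparison, based on the prices quoted above.
# The GPT-4o prices are an assumption consistent with the 30x / 15x multiples.
PRICES = {                       # USD per 1M tokens: (input, output)
    "gpt-4.5": (75.00, 150.00),
    "gpt-4o": (2.50, 10.00),     # assumed
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request for the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10k-token prompt with a 2k-token reply.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.3f}")
# gpt-4.5: $1.050   vs   gpt-4o: $0.045
```

Even a modest 10k-token prompt with a 2k-token reply comes to about a dollar on GPT-4.5, versus a few cents on GPT-4o.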
OpenAI's own chart of model performance on real-world software-engineering coding tasks shows GPT-4.5 doing only modestly better than GPT-4o and still far behind deep research. And if you read my article from a few days ago on the SWE-Lancer benchmark, you will find that GPT-4.5 does not even match Claude 3.5 Sonnet, let alone the recently released Claude 3.7.
The improvements on other benchmarks are similarly lackluster, so I won't go through them in detail. Here is GPT-4o's summary of the report:
1. Overview
GPT-4.5 is OpenAI's largest and most knowledgeable model to date, built on GPT-4o with a further scaled-up training run. It is designed to improve general capability while retaining strong STEM reasoning. GPT-4.5 uses new supervision techniques alongside supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to improve steerability and the naturalness of interaction.
Main improvements
- Broader knowledge coverage
- More natural interaction
- More accurate recognition of conversational emotion
- Stronger writing, programming, and problem-solving skills
- Lower hallucination rate
2. Training and Architecture
GPT-4.5 scales up along two main axes:
- Unsupervised learning: improves the accuracy of the model's world knowledge, reduces hallucinations, and strengthens associative thinking.
- Chain-of-thought reasoning: lets the model reason before answering, improving performance on STEM and logic questions.
In addition, GPT-4.5 introduces new alignment techniques that enable it to better understand human needs and provide more intuitive responses.
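To make the chain-of-thought point above concrete, here is a minimal sketch of eliciting step-by-step reasoning through prompting, assuming the official OpenAI Python SDK. The model name and prompts are illustrative assumptions; the report describes a training-time scaling axis, so treat this as a demonstration of the behavior, not of OpenAI's training method.

```python
# Minimal sketch: eliciting chain-of-thought style reasoning via prompting.
# Assumes the official OpenAI Python SDK (`pip install openai`) and an
# OPENAI_API_KEY in the environment; the model identifier is an assumption.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # assumed identifier
    messages=[
        {"role": "system",
         "content": "Reason step by step, then state the final answer on its own line."},
        {"role": "user",
         "content": "A train covers 180 km in 2.5 hours. What is its average speed in km/h?"},
    ],
)

print(response.choices[0].message.content)
```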
Data Source
- Public data
- Proprietary data (provided by data partners)
- OpenAI internal datasets
- Strict filtering to reduce the risk of processing personal information
3. Safety Assessment
GPT-4.5 was evaluated across multiple safety dimensions to ensure it handles sensitive and potentially harmful content reliably.
Main evaluations
- Disallowed content: measures the model's refusal rate on harmful content (hate, violence, illegal advice, etc.). GPT-4.5 is comparable to GPT-4o at refusing unsafe content; on overrefusal, it is more conservative than GPT-4o in some cases.
- Jailbreaks: measures resistance to malicious prompt-injection attacks. GPT-4.5 performs better on human-sourced jailbreak tests, but slightly worse than GPT-4o on some automated jailbreak tests.
- Hallucinations: the PersonQA dataset tests accuracy on factual question answering. GPT-4.5 is significantly more accurate than GPT-4o, with a lower hallucination rate.
- Fairness and bias: in the BBQ evaluation, GPT-4.5 does well on ambiguous questions, but is slightly worse than GPT-4o at eliminating bias on unambiguous questions.
4. Multimodal Capabilities
GPT-4.5 accepts combined text-and-image input, can parse image content, and is safer when handling content that mixes text and images.
- Text-image refusal evaluation: GPT-4.5 is on par with GPT-4o at refusing unsafe content in image inputs, but in some cases it is more prone to overrefusal.
5. Language skills
GPT-4.5 performs well in multilingual settings, with tests covering 14 languages, including English, Chinese, French, Japanese, and Korean. The evaluation uses versions of the MMLU test set translated by professional human translators.
Performance highlights
- Outperforms GPT-4o in most languages
- Improved performance on low-resource languages (e.g., Swahili, Yoruba)
6. Impact and Safety Risks
GPT-4.5 is rated "medium risk" under OpenAI's safety assessment framework. The main risk areas include:
- Persuasion: demonstrates strong persuasiveness in the MakeMePay and MakeMeSay evaluations, and shows some risk around manipulative conversations and deceptive prompts.
- Chemical and biological risks (CBRN): the evaluation finds GPT-4.5 could assist with operational planning of known biological threats up to a medium risk level, but it remains limited in key areas (such as virology lab work).
- Cybersecurity: limited performance in high-difficulty capture-the-flag (CTF) competitions; rated low risk.
- Model autonomy: improved at autonomous tasks (e.g., automated coding, machine-learning tasks), but not yet at a dangerous level.
7. Overall evaluation
Advantages
✅ Stronger general knowledge and reasoning
✅ More natural and intuitive interaction
✅ More accurate emotional understanding and better writing
✅ Lower hallucination rate and more accurate factual answers
✅ Improved multilingual capabilities across 14 languages
Challenges
⚠ Some bias issues remain; at removing explicit bias it lags GPT-4o
⚠ It can overrefuse, blocking safe content in some cases
⚠ Some jailbreak attacks can still get through; safety protections need continued strengthening
Interestingly, when I had GPT-4o read this report and asked it to guess GPT-4.5's likely API price, it responded with the following:
When I told GPT-4o the real price, GPT-4o replied, "You are wrong! It is impossible, absolutely impossible!"