An Irresponsible Prediction of the Future of Large AI Models (LLMs)

Written by Caleb Hayes
Updated on: June 13, 2025
Recommendation

An in-depth analysis of future trends in large AI models, revealing the industry's competitive landscape and direction of development.

Core content:
1. Industry experts' predictions about large AI models and an analysis of the current state of play
2. The technical challenges and business opportunities facing large AI models
3. The revolutionary impact of AI Coding and its future prospects

First, let's look at Gary Marcus's predictions from March 11, 2024:
Today, 14 months later, I analyze and comment on each point:
  1. 7–10 GPT-4-level models
    The prediction was right. Google has returned as the king, with the Gemini family overtaking its competitors. OpenAI is still very stable, and its revenue has soared. Anthropic is far ahead in coding. X released a Grok that hallucinates badly. Meta fell behind with Llama 4 and was embarrassed to the point that the team collapsed. In China, essentially only three players remain (DeepSeek, Qwen, and Doubao), plus one in France.
  2. No huge improvements (no GPT-5)
    The prediction was correct. OpenAI did not release GPT-5, and progress on base models was limited. However, the effectiveness of reinforcement-learning post-training was confirmed, the potential of thinking/reasoning models is still being explored, and this in turn drove the rapid development of Agents.
  3. Price war
    The prediction was correct. DeepSeek is the biggest winner, and the price war is a great benefit to the AI application layer.
  4. No one has a moat
    The prediction was correct. In fact, OpenAI understood this long ago. Former Google CEO Eric Schmidt repeatedly made wrong judgments, was proven wrong several times by China's rapid catch-up, then made a fuss, turned into a China-threat theorist, and cemented his position as an American arms dealer. In China, the "six AI dragons" have largely abandoned foundation-model research and development; they rose rapidly and fell just as suddenly.
  5. There is no robust solution to the hallucination problem
    The prediction was right. Hallucination remains unsolved, but in vertical domains it can still be brute-forced into alignment through methods such as prompting, SFT, and RL. This gives some hope to application teams at large companies that must follow the AI trend, and it has also caused Agents to flourish everywhere, greatly easing FOMO and creating a wave of new jobs. Of course, the actual effect is unknown; in many scenarios it is still limited.
  6. Moderate growth in enterprise applications
    The prediction is not entirely correct. Currently, the largest incremental growth in enterprise applications is in AI Coding, because vibe coding lets programmers manually correct hallucinations, so explosive growth has already occurred.
  7. The profit is very small
    Under FOMO, profits don't matter, but small companies are forced to withdraw from the game. The competition over large AI models is comparable to the race for nuclear weapons and semiconductors.

2. My views on the future of AI
The intelligence level of large models has probably stagnated, and there will be no smarter models in the future. This can basically be inferred from the effective stagnation of B200 hardware and of model parameter counts. Going up another order of magnitude would require a leap in both software and hardware plus large-scale mining of new data sources.
RL is just pruning and building new pathways: the model's intelligence has not changed; it has simply learned a better way to solve problems. Of course, this is good for vertical AI. By making good use of GRPO, a team can form an application methodology for small, well-defined scenarios, as the sketch below illustrates.
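As a rough illustration of what "making good use of GRPO" means in practice, here is a minimal sketch of its core trick: advantages are computed relative to a group of sampled answers rather than from a learned value network. This is illustrative PyTorch, not code from any particular library; the function names and the rule-based reward are assumptions, and the KL penalty against a reference model that full GRPO uses is omitted.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: each sampled answer is scored against the
    mean/std of its own group, so no critic/value network is needed.
    rewards: shape (num_prompts, group_size)."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

def grpo_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
              advantages: torch.Tensor, clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate objective over group-relative advantages.
    logp_new/logp_old: summed log-probs of each sampled answer under the
    current policy and the sampling policy."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

# Toy usage: 2 prompts, 4 sampled answers each, rewards from a rule-based checker.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
adv = grpo_advantages(rewards)
```

The reason this fits "small scenarios" is that the group comparison only helps when answers can be scored cheaply and objectively, which is exactly what narrow vertical tasks provide.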
The model's hallucinations are largely due to AI's lack of physical and spatial perception, so combining models with robot bodies should help progress toward AGI. But the problem with Embodied AI is that data collection costs tens of thousands of times more than it does for autonomous driving. The field is in a bubble, and no substantial products will be launched for a long time.
Long context and short-/long-term memory are problems that large models must solve going forward.
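To make the memory point concrete, here is a toy sketch of the pattern most applications fall back on today: a short rolling window of recent turns plus a long-term store queried at answer time. The class and method names are invented for illustration, and the keyword-overlap retrieval is a stand-in for embedding search.

```python
from collections import deque

class AgentMemory:
    """Toy sketch: short-term rolling window + long-term store with
    naive keyword-overlap retrieval (a stand-in for embedding search)."""

    def __init__(self, short_term_size: int = 8):
        self.short_term = deque(maxlen=short_term_size)  # recent turns only
        self.long_term: list[str] = []                   # everything ever seen

    def add(self, text: str) -> None:
        self.short_term.append(text)
        self.long_term.append(text)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Score long-term entries by shared words with the query (toy retrieval).
        q = set(query.lower().split())
        scored = sorted(self.long_term,
                        key=lambda t: len(q & set(t.lower().split())),
                        reverse=True)
        return scored[:k]

    def build_context(self, query: str) -> str:
        # Prompt context = retrieved long-term facts + the short-term window.
        return "\n".join(self.recall(query) + list(self.short_term))
```

This kind of scaffolding works, but it is a workaround, which is why native long context and memory remain open problems for the models themselves.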
AI Coding is revolutionary and the biggest business opportunity in recent years. The reason is that it resembles autonomous driving: professional human drivers/programmers are there to correct the AI's behavior, the rules are clear and quantifiable, verification is cheap, and a closed loop can be formed (see the sketch below). Of course, will the number of programmers shrink? We cannot deceive ourselves: in the long run it certainly will. Yet the number of driving jobs has not decreased. Why? Because of responsibility attribution and end-to-end delivery.
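The "closed loop" is the structural advantage: a test suite is a cheap, objective verifier, so generated code can be checked and corrected automatically. A minimal sketch of that loop, assuming a hypothetical generate_patch() wrapper around whatever code model you use and pytest as the verifier:

```python
import subprocess
from pathlib import Path

def generate_patch(prompt: str) -> str:
    """Stand-in for an LLM call; plug in your own model client here."""
    raise NotImplementedError("hypothetical: call your code model")

def run_tests() -> tuple[bool, str]:
    """Run the test suite; the pass/fail signal is the cheap, objective verifier."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def coding_loop(task: str, target: Path, max_rounds: int = 3) -> bool:
    """Generate -> run tests -> feed failures back, until green or out of rounds."""
    feedback = ""
    for _ in range(max_rounds):
        target.write_text(generate_patch(task + "\n" + feedback))
        ok, log = run_tests()
        if ok:
            return True
        feedback = f"Previous attempt failed tests:\n{log}"
    return False
```

Driving has no equivalent of "rerun the tests": a wrong action on the road cannot be cheaply retried, which is part of why responsibility attribution keeps humans in that seat.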
I am not optimistic about the currently hot AI Agents. At least in e-commerce, most Agents are occasional innovations that merely add icing on the cake. Apart from customer-service responses and research-report writing, which do not require strict accuracy, what tasks are Agents actually good at? Some products even use multi-agent setups; how do they solve alignment and memory synchronization across multiple models?