The second half of AI Agent: from token generation to autonomous experience

Written by
Iris Vance
Updated on: June 27, 2025
Recommendation

The future of AI Agents: the transition from relying on instructions to autonomous action.

Core content:
1. The current limitations of AI agents that rely on human instructions
2. The role of scaffolding frameworks in improving AI capabilities
3. How techniques such as reinforcement learning can drive AI toward autonomous experience


AI agents are actors driven by large language models (LLMs), but the core capability of an LLM is token generation. It is a "brain in a vat of language": it cannot perceive the outside world or interact with it directly, and it has no intrinsic will or spontaneous motivation. At the current stage of the technology, the motivation for an AI agent's actions must therefore be supplied by humans through instructions, prompts, or task definitions.


For an LLM to complete specific tasks, we need to build a "scaffolding agent framework" for it, integrating capabilities such as tool use and memory to provide a runtime environment and external support.


Workflows and API orchestration are two common scaffolding implementations for converting LLM text output into executable actions. The main difference between them is design flexibility (a minimal sketch follows this list):

• Workflow: Usually uses a predefined, deterministic sequence of steps and is suitable for fixed, process-based tasks.

• API Orchestration: Supports more dynamic tool calls, suitable for complex scenarios that require flexible decision-making and adaptation.
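
To make the contrast concrete, here is a minimal Python sketch of the two styles. `call_llm`, the tool registry, and the `TOOL`/`DONE` command format are hypothetical stand-ins, not any particular framework's API:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call to any LLM provider."""
    raise NotImplementedError

# --- Workflow: a predefined, deterministic sequence of steps ---------
def summarize_then_translate(document: str) -> str:
    summary = call_llm(f"Summarize:\n{document}")        # step 1, always runs
    return call_llm(f"Translate to French:\n{summary}")  # step 2, always runs

# --- API orchestration: the LLM chooses the next tool dynamically ----
tools = {
    "search": lambda q: f"(stub) results for {q}",
    "calculator": lambda expr: str(eval(expr)),  # demo only; unsafe in production
}

def orchestrate(task: str, max_steps: int = 5) -> str:
    context = task
    for _ in range(max_steps):
        decision = call_llm(
            f"Task: {context}\nTools: {list(tools)}\n"
            "Reply 'TOOL <name> <input>' or 'DONE <answer>'."
        )
        if decision.startswith("DONE"):
            return decision.removeprefix("DONE").strip()
        _, name, arg = decision.split(" ", 2)      # parse the tool request
        context += f"\n{name} returned: {tools[name](arg)}"
    return context  # fall back to whatever context accumulated
```

The key difference: in the workflow, control flow lives in the code; in orchestration, control flow lives in the model's output, which is what makes it flexible and harder to constrain.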


The "intelligence" of the LLM (i.e., its token generation capability) is crucial in these systems, since it generates the text instructions that guide actions. However, whether an action's motivation is complete and effective depends more on the task goals set by humans and on the design of the scaffolding than on whether the LLM itself can independently "understand" or "describe" motivation.


In the future, the functions of AI agents need to be gradually internalized to reduce dependence on external scaffolding, improving efficiency and consistency. Through techniques such as reinforcement learning (RL), an LLM can gradually fold tool calls and memory requests directly into its token generation process, bringing its behavior closer to autonomous action. For example (a minimal sketch follows this list):

• ReAct: uses prompts to guide the LLM to alternate between thinking and acting; flexible, easy to implement, and suited to rapid deployment.

• ReTool and ReSearch: Internalize tool usage and search strategies into the model’s generative patterns through RL, significantly improving performance on specific tasks.
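
For illustration, here is a minimal sketch of the ReAct loop. The `call_llm` function, the `tools` dict, and the exact prompt wording are assumptions; the Thought/Action/Observation format is a simplified paraphrase of the pattern, not the paper's verbatim prompt:

```python
import re

REACT_PROMPT = """Answer the question. Use this format:
Thought: <your reasoning>
Action: <tool>[<input>]
(The system fills in 'Observation: <result>' after each Action.)
Finish with 'Final Answer: <answer>'.
"""

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")

def react(question: str, call_llm, tools: dict, max_turns: int = 6) -> str:
    transcript = REACT_PROMPT + f"\nQuestion: {question}\n"
    for _ in range(max_turns):
        step = call_llm(transcript)            # model emits Thought + Action
        transcript += step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match:                              # scaffold acts on the model's behalf
            name, arg = match.groups()
            transcript += f"\nObservation: {tools[name](arg)}\n"
    return transcript                          # give up after max_turns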


At this stage, scaffolding is still indispensable. It not only provides a runtime environment for the LLM, but is also responsible for parsing and executing the LLM's outputs and feeding external feedback back into the system, as sketched below. In the future, as RL and model fine-tuning advance, some scaffolding functions are expected to be internalized into the LLM, but further breakthroughs are still needed to reach fully autonomous AI agents. The core role of scaffolding is to compensate for the LLM's limitations, ensure that its outputs can be executed effectively in complex tasks, and serve as the safety guardrails that OpenAI emphasizes.
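
As a sketch of that parse-and-guardrail role, assuming the model is asked to emit tool calls as JSON (the schema and the whitelist here are illustrative, not taken from any cited framework):

```python
import json

ALLOWED_TOOLS = {"search", "calculator"}  # guardrail: explicit whitelist

def safe_execute(raw_output: str, tools: dict) -> str:
    """Parse a model-emitted tool call and run it only if it passes checks."""
    try:
        call = json.loads(raw_output)   # expected: {"tool": ..., "input": ...}
    except json.JSONDecodeError:
        return "error: model output was not valid JSON"
    name = call.get("tool")
    if name not in ALLOWED_TOOLS or name not in tools:
        return f"error: tool '{name}' is not permitted"
    return str(tools[name](call.get("input", "")))
```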



Introduction

Discussion on how to build AI agents | After a year of the large-model wave, "AI Agent" has become the core buzzword of the new technology narrative. LangChain, OpenAI, and Anthropic have each released construction frameworks and guides, laying down scaffolding while redefining what an "intelligent agent" looks like. At the same time, more and more practitioners are confronting a fundamental question:
An LLM is just a linguistic "brain in a vat". Can it really become a motivated agent?

Main text

Silver & Sutton: the AI paradigm enters the era of experience

We must admit that the large language model (LLM) is still a "conditional token generator". Through training it learns to predict the next token given a context, exhibiting a certain "intelligence", but in essence:

  • It has no "will": it does not decide for itself what to do;

  • It has no "goal": it has no real notion of what "success" is;

  • Its "intelligence" plays only a passive role within the prompts and prompt templates designed by humans.

This kind of "brain in a vat" intelligence cannot truly act autonomously. Therefore, we build "agent scaffolding" for it: action tools (Tool Use), a memory system (Memory), and a state feedback mechanism (Observations), and then use reinforcement learning (RL) to let it learn when to use which tools and how to achieve the set goals.

This is what agent framework methodologies such as ReAct, ReTool, and ReSearch are doing: extending the token generation capabilities of language models to the space of continuous actions and problem solving.
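
As a toy illustration in the spirit of ReTool/ReSearch (not their actual training objective), an episode-level reward might pay for task success and lightly charge for each tool call, so an RL-trained policy learns when a tool is worth invoking rather than calling it reflexively:

```python
# Illustrative reward for internalizing tool use via RL.
# The weighting is an assumption, not a published hyperparameter.
def episode_reward(answer_correct: bool, tool_calls: int,
                   tool_cost: float = 0.05) -> float:
    task_reward = 1.0 if answer_correct else 0.0
    return task_reward - tool_cost * tool_calls  # each call must earn its keep
```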


LangChain pointed out in its blog that "the key to an agent is not how powerful its tools are, but whether the LLM can understand and describe task goals and dynamically organize behavior accordingly." In its construction guide, OpenAI proposed that "an agent is a systematic combination of model + tools + instructions"; Anthropic defines "agentic systems" as an agent's overall ability to actively dispatch tools, maintain memory, and behave in a goal-directed way within its environment.

However, these efforts remain confined to frameworks driven by language generation: the LLM "thinks" in the prompt, dispatches tools to "act", and "continues thinking" after receiving feedback. All of this is simulated action within a closed language space.

So, what's next? What we need are "agents in the era of experience".

As Sutton and Silver argue in The Era of Experience:

“Human data is running out, and future advances in intelligence must rely on the interaction experience between agents and the environment.”

They advocate an intelligence paradigm centered on the "stream of experience": agents no longer merely respond to prompts, but have persistent state, long-term interaction with the environment, dynamically evolving goals, and the capacity for self-adjustment. This is precisely the blueprint of experiential intelligence.

This is also surprisingly consistent with the shift proposed by Yao Shunyu in "The Second Half":

"The first half of AI is about building models and benchmarking, while the second half is about thinking about what AI is really for and how we can evaluate its true 'value'."

In the past, we were keen on scoring high on standard benchmarks. Now, we need to build new evaluation logic and new problem definitions to allow agents to prove themselves in real tasks.

We see a shift happening:

  • From static orchestration → dynamic perception and action: the LLM is no longer just a language dispatcher, but an active agent that continuously adapts and optimizes within its environment;

  • From prompt activation → strategy internalization: the agent does not merely respond passively, but develops autonomous behavioral strategies through reinforcement learning, behavioral rewards and penalties, and long-term experience;

  • From human data → self-generated experience: the real breakthrough in intelligence lies not in memorizing all of human knowledge, but in learning from failures and interactions.

This paradigm shift is not only an upgrade of engineering practice, but also a profound questioning of the "essence of intelligence":

What kind of agent deserves to be called "intelligent"?
Does it require desires, goals, and memories?
Should we design a learning mechanism that encourages “experience-driven, long-term growth”?
And are humans willing to give it such freedom?

Perhaps, we can offer this insight:

The future of the AI Agent lies not in how powerful its prompts are or how flashy its tools are, but in whether it can autonomously generate experience, build world models, and pursue long-term utility.

This requires two things:

  1. On the technical level, the scaffolding system must evolve from a "patchwork of functions" into a "cognitive collaboration system";

  2. On the evaluation level, we must shift from "accuracy of a single-round answer" to "effectiveness of multi-turn collaboration", "attainment of long-term goals", and "adaptability in interaction with people" (see the sketch below).
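
As a sketch of what such evaluation could look like in code, here is a minimal harness; the episode fields and metric names are assumptions, chosen to mirror the three criteria above:

```python
from dataclasses import dataclass

@dataclass
class Episode:
    goal_achieved: bool     # long-term goal attainment
    turns: int              # length of the multi-turn collaboration
    interventions: int      # times a human had to step in and correct

def evaluate(episodes: list[Episode]) -> dict[str, float]:
    n = len(episodes)
    assert n > 0, "need at least one episode"
    return {
        "goal_rate": sum(e.goal_achieved for e in episodes) / n,
        "avg_turns": sum(e.turns for e in episodes) / n,
        "intervention_rate": sum(e.interventions > 0 for e in episodes) / n,
    }
```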

ReAct is the first step, ReTool the second, and the "Experience Agent" will be the third.