Is the big model Agent just text art?

The battle between technology and art behind large model agents, exploring the essence of intelligence.
Core content:
1. Is the large model agent just a stack of prompts
? 2. The different views of technical experts and developers
3. The essence of agents and the importance of state management
There has been an interesting debate in the technology circle recently: Is a large model agent just a stack of various prompts? Is an agent like Manus, which looks very smart, essentially using clever prompts to constrain the large model to generate better output? In other words, is this a literary art?
This question has sparked heated discussions among industry experts, with opinions sharply divided. Let's take a look at the different voices.
The clash of views between the two camps
Viewpoint 1: It is just literary art, there is no need to hide it
One developer said bluntly: "No one dares to tell the truth? Yes, it's just a pile of prompts." Another practical developer was even more sharp: "LLM's input is the prompt. No matter what tool or model, it's just to optimize the prompt. Many people hype up the tools for 'optimizing prompts', but they intentionally or unintentionally avoid the fact that their purpose is just to generate a better prompt."
This view holds that Agent can be seen as an upgraded version of prompt engineering. The core lies in how to design prompts, how to split them, and what the order is. Other complex architectural designs are considered "ivory tower nonsense" by them.
Viewpoint 2: It’s much more than just a word game
But technical experts are obviously not convinced. Someone analyzed from an engineering perspective: "Production-level engineering is obviously not as simple as Prompt." Take OpenHands as an example. Connecting to LLM is just one module. The core that really drives the Agent to complete complex tasks is a complete set of event-driven mechanisms, including state machines, event flow frameworks, controllers, etc., and also uses sandbox technologies such as Sandbox.
Another expert summarized it more comprehensively: "A truly usable agent = prompt (language interface) + programmatic arrangement + long-term state/memory + external tool action + self-feedback loop. If any one of these links is missing, it will quickly degenerate from an 'autonomous intelligent agent' to an 'advanced chatbot'."
What is the essence of Agent?
From a technical definition, an agent is essentially a looping system. For a given goal, an AI agent can create tasks, complete tasks, create new tasks, re-prioritize the task list, and loop until the goal is achieved.
The formula is: Agent = LLM + Planning + Tool use + Feedback
This definition reveals a key point: Agent enables LLM to have the ability to achieve goals and complete given goals through a self-motivation cycle.
The key is state management
There is a technical detail that many people overlook: almost all large model APIs are stateless. Large model APIs don’t even have the seven-second memory of a fish. It can’t even remember what its last answer was.
So why do we see that AI chat tools "remember" historical conversations? In fact, the previous historical conversations are re-passed to the big model each time, making it seem to have memory. This is how a stateless API is made stateful.
As applications become more complex, state management becomes more and more important. The real powerful agents compete on state management capabilities.
What Prompt Really Does
A developer who has analyzed the project structure of Manus and OpenManus pointed out that Prompt is indeed very important. It can guide the behavior of large models based on prior knowledge without fine-tuning the large model to achieve the expected business results.
But the point is: Prompt is the lubricant for the large model and other individual components in the Agent system, not the whole thing.
Taking OpenManus as an example, its structure mainly includes:
Agent Flow Tool Prompt
Prompt is just one of those components.
The evolutionary logic of technology
From the perspective of technological development, this debate actually reflects the cognitive differences at different stages:
Primary stage : It really depends on prompt engineering, and the model can perform better through carefully designed prompt words.
Intermediate stage : Begins to introduce tool calls, multi-round conversations, and simple state management.
Advanced stage : Building complete event-driven systems, including complex state machines, memory management, autonomous planning and execution capabilities.
Enterprise-level stage : Engineering issues such as concurrency, fault tolerance, monitoring, and security need to be considered.
Conclusion: Yes and No
Back to the original question: Is the large model Agent text art?
The answer is: yes and no.
In a sense, all interactions with LLM are ultimately implemented through text (prompt), which is indeed a language art. Designing a good prompt requires a deep understanding of the language, clever wording and precise logic.
But simply equating Agent with stacking Prompts is like saying that buildings are stacked bricks - technically correct, but it ignores more important aspects such as design, structure, and engineering.
A real Agent system needs:
Well-designed prompt (text art) Complex state management (systems engineering) Intelligent task planning (algorithm design) Reliable tool calling (interface engineering) Continuous self-optimization (feedback mechanism)
Last words
The value of this debate does not lie in who is right or wrong, but in pushing us to think more deeply about the nature of AI Agents.
For beginners, starting with the Prompt project is indeed a good starting point, which allows you to quickly understand how to interact with AI.
Experienced developers need to think beyond the limitations of Prompt and think about how to build a truly usable intelligent agent from a systems engineering perspective.
Technological progress often spirals upward in such debates. No matter which side you stand on, you have to admit that we are witnessing an exciting era of technological change.
What do you think of the big model Agent