How the von Neumann architecture and hierarchical architectures inspire the building of universal agents

Written by Audrey Miles
Updated on: June 22, 2025
Recommendation

How the von Neumann architecture inspires the construction of general intelligent agents, and a look at the future of artificial intelligence from the perspective of computer history.

Core content:
1. Comparative analysis of the von Neumann architecture and hierarchical architecture
2. The "stored program" principle in the construction of general agents
3. The "programmability" and challenges of modern agents

Yang Fangxian, Founder of 53A / Tencent Cloud Most Valuable Expert (TVP)

Recently, while implementing a system similar to Manus, I have felt a sense of powerlessness. I have tried multiple agent architectures, as well as the currently common modular agent-loop pattern.

But something always seems slightly off. The feeling is subtle: the ideas I have encountered so far do not seem to touch anything deeper.

Lying in bed at night, a thought suddenly occurred to me: is the current situation of agents similar to that of computers before the von Neumann architecture emerged? And what insights might the book "Principles of Everything" offer for today's agents? That is the question I kept turning over.

Computers before von Neumann

Imagine that around World War II, humans invented the first truly electronic computers. They were large, expensive, and powerful, used to calculate ballistics and crack codes. They were the pinnacle of "intelligence" at that time.

But there was a huge problem: they were special-purpose machines.

Take the famous ENIAC. You wanted it to perform a different computing task? Sorry: you had to unplug hundreds of cables, replug them, and flip physical switches, like a telephone switchboard operator.

Changing a program was almost equivalent to redesigning the hardware. This is much like today's agents: you design one for writing articles and another for writing code, and their capabilities are "hard-coded" into specific structures and workflows, making them difficult to reuse or migrate.

It was an era of "hardware-defined functions" and of "programs separated from data". Machines were certainly capable, but they lacked one fundamental thing: a general, universal capability.

That is, until a brilliant scientist, John von Neumann, brought a revolutionary idea.

The first cornerstone: the universal programmability of the von Neumann machine

Von Neumann brilliantly proposed the concept of the "stored program": a computer's "instructions" (that is, its program) and the "data" it processes should be stored in the same memory space.

This sounds simple, but it means a lot.

This means that computers no longer need to switch functions by changing physical circuits. Just by loading different "programs", the same hardware can be transformed from a ballistic calculator to an accounting machine and then to a game console.

The program itself can be read, written, modified, and even generated by the computer itself like data!
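
As a toy illustration, a few lines of Python show what "instructions and data in one memory" buys. The opcode set below is invented purely for this sketch, not the historical instruction set: the program lives in the same list as its data, and one instruction rewrites another at runtime.

```python
# A toy stored-program machine: instructions and data share one "memory" list.
# The opcodes (SET, ADD, PATCH, HALT) are invented purely for illustration.

def run(memory):
    pc = 0                                   # program counter: instructions are just memory entries
    while True:
        op, *args = memory[pc]
        if op == "HALT":
            return memory
        if op == "SET":                      # SET addr value
            addr, value = args
            memory[addr] = value
        elif op == "ADD":                    # ADD addr_a addr_b addr_out
            a, b, out = args
            memory[out] = memory[a] + memory[b]
        elif op == "PATCH":                  # PATCH addr new_instruction: the program edits itself
            addr, new_instruction = args
            memory[addr] = new_instruction
        pc += 1

memory = [
    ("SET", 5, 2),                           # 0: data cell 5 <- 2
    ("SET", 6, 3),                           # 1: data cell 6 <- 3
    ("PATCH", 3, ("ADD", 5, 6, 7)),          # 2: rewrite instruction 3 before reaching it
    ("HALT",),                               # 3: will be replaced by an ADD at runtime
    ("HALT",),                               # 4: actual end of the program
    0, 0, 0,                                 # 5-7: data cells
]

print(run(memory)[7])                        # prints 5: the self-modified program computed 2 + 3
```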

This is the cornerstone of all modern computers we have today - the von Neumann architecture. It gives computers unprecedented versatility, flexibility and programmability.

It can be said that it was the "stored program" principle that transformed the computer from a "special-purpose device" into a "general-purpose tool", and ultimately gave birth to software, operating systems, the Internet, and even the AI we are discussing today.

"What does this mean for Agent?"

For building agents, the revelation of the von Neumann architecture is that a truly universal agent requires a "universal execution engine" like the von Neumann machine.

It cannot be just a package of fixed-function tools, but must be able to flexibly load and execute a variety of operations or "behavior programs" based on the "instructions", "plans" and even "goals" received. Agents need to have basic "programmability".
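
Here is a minimal sketch of what such "programmability" could look like. The behavior-program format (a list of step dictionaries) and the tool names are hypothetical, not taken from any existing framework; the point is only that the plan is plain data handed to one generic engine, rather than logic baked into the engine itself.

```python
# Sketch of a "universal execution engine" for agents: one loop, many behavior programs.
# The program format and the tools (search, summarize) are hypothetical stand-ins.

from typing import Callable, Any

def search(query: str) -> str:
    return f"results for {query!r}"          # stand-in for a real search tool

def summarize(text: str) -> str:
    return text[:40] + "..."                 # stand-in for an LLM call

TOOLS: dict[str, Callable[..., Any]] = {"search": search, "summarize": summarize}

def execute(program: list[dict], state: dict) -> dict:
    """Run a behavior program: each step names a tool, its inputs, and where to store the output."""
    for step in program:
        tool = TOOLS[step["tool"]]
        args = [state[k] for k in step["inputs"]]
        state[step["output"]] = tool(*args)
    return state

# Two different "programs" could run on the same engine, just as two programs run on one CPU.
research_program = [
    {"tool": "search", "inputs": ["topic"], "output": "raw"},
    {"tool": "summarize", "inputs": ["raw"], "output": "answer"},
]
print(execute(research_program, {"topic": "von Neumann architecture"})["answer"])
```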

Many current agents, especially those based on large language models, have the ability to load and execute tools (API calls, function execution) to some extent, which can be regarded as the basis for executing "programs".

But the problem is that their "programs" - those complex task processes, tool selection logic, and command chains in prompts - are often relatively fixed, or in other words, they are "externally" assigned, rather than being deeply understood, reflected upon, or even "modified" by the Agent itself.

This is like stepping back to the time before von Neumann: the computer can execute a program, but the program itself is inert, "plugged in" by hand, and unable to dynamically and fundamentally change its own execution logic at runtime as the situation demands.

Having a powerful execution engine is important, but just like a strong heart, it is only the foundation. To achieve true general intelligence, the agent also needs more complex organization, coordination and deeper "self-cognition".

"The current dilemma of agents: intelligent, but not "universal" and not "autonomous""

Therefore, we see the general dilemma of current agents: they are very "intelligent", demonstrating remarkable capabilities on specific tasks; but they are not "general" enough, often becoming helpless when moved to another domain or when the task deviates slightly; and above all they are not "autonomous" enough, unable to truly understand the logic of their own behavior, let alone modify or optimize it.

Most of them are "process-driven tool combinations" whose behavior logic is hard-coded in code, rule sets, or complex prompt templates. Like ENIAC's cables, they are intricate and sophisticated, yet lack fundamental flexibility.

What are they missing?

  1. "A deep understanding and reflection on the logic of one's own behavior:"  Know "why" you do something, rather than just being "set" to do it.
  2. "The ability to autonomously modify behavioral strategies based on the environment and goals:"  When problems are discovered during execution, one can adjust one's "thinking" or "action" just like a programmer modifies code.
  3. "Flexible selection, combination and even generation of own skill modules:"  Not only can existing tools be called upon, but also the underlying capabilities can be recombined into new "tools" or "skills" as needed.

In other words, they cannot treat their own "behavior programs" as data, reading, analyzing, modifying, and generating them the way a von Neumann machine treats its programs. They are not truly "self-programmable" intelligent entities.

So, in addition to the "universal execution" foundation provided by von Neumann, where can we find inspiration for building general intelligence? Perhaps we should turn our attention to another older and more sophisticated "intelligent" system - nature itself.

"I highly recommend The Principle of Everything."

"Second cornerstone: the hierarchical structure of nature"

Look at nature and look at ourselves.

Life is composed of molecules that form cells, cells that form tissues, tissues that form organs, organs that form systems, and ultimately the complex human body.

The brain has a brainstem responsible for basic physiological needs, a limbic system responsible for emotions, and a cortex responsible for higher-level cognition and rationality.

In a complex ecosystem, there are producers, consumers, and decomposers, forming complex food chains and networks between species.

These systems all present a "hierarchical" organizational structure. Moreover, this hierarchical structure is not formed randomly, but contains profound wisdom:

  • "The more general the bottom layer, the more specialized the top layer:"  The basic functions of the cells that make up life are similar, but the functions of the cerebral cortex are highly differentiated, responsible for specific tasks such as language and logic.
  • "Layers communicate primarily with adjacent layers:"  Most information and control flows occur between adjacent layers, maintaining the locality and modularity of the system.
  • "A few key cross-layer communications:"  Emergency situations (bottom-level perception) can directly trigger high-level alarms (top-level cognition), and high-level goals can also directly affect bottom-level actions.
  • "Simple tasks are processed at a low level, and complex tasks are processed step by step:"  Avoid obstacles (low-level reflexes), feel pleasure or fear (mid-level emotions), and solve math problems (high-level rationality).

This hierarchical structure provides an elegant organizational principle and efficiency guarantee for building complex systems. It is a wisdom about macro-organization and complexity management.

"What does this mean for Agent?"

Applying this natural hierarchical thinking to agent design brings extremely important insights:

  • "Modularity and separation of responsibilities:"  Agents should not be a flat, huge pile of codes or rules. They can be designed into multiple levels, each of which is responsible for information processing and decision-making at different granularity and abstraction levels. This makes the design, understanding and maintenance of Agents feasible.
  • "Gradual abstraction and information flow:"  Raw perceptual data (such as pixels in an image) is processed into features at the lower level, objects and patterns are identified at the middle level, and the higher level abstracts and gives meaning to these objects and patterns. As information flows upward from the bottom level, it is refined layer by layer, and the higher level does not need to process all the details, which greatly improves decision-making efficiency.
  • "Balanced reaction and deliberation:"  Agents can have a "lower layer" (similar to the brain stem or reflex arc) responsible for fast, instinctive reactions, and a "higher layer" (similar to the cerebral cortex) responsible for slow, careful planning. Simple and urgent tasks can be responded to quickly, while complex tasks can activate the higher layer for deep thinking.
  • "Structure and flexibility:"  The main adjacent layer communications provide a clear and orderly information flow and control flow; a few key cross-layer communications provide the flexibility to deal with emergencies and achieve fine control of the upper layer over the lower layer.

An ideal agent should have a clear hierarchical structure like the brain. The bottom layer handles basic perception and action, the middle layer handles skills and local planning, and the top layer handles global goals and strategies. Information flows, abstracts, and makes decisions between these layers.
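
A rough sketch of that layering follows. The layer names, the toy "emergency" check, and the planning rule are all invented for illustration; in a real system the top layer might be an LLM planner and the middle layer a library of learned skills, but the direction of information flow is the point here.

```python
# Sketch of a three-layer agent: reflex layer, skill layer, deliberative layer.
# Normal traffic goes between adjacent layers; one cross-layer channel handles emergencies.

class ReflexLayer:
    """Bottom layer: fast, local reactions to raw observations."""
    def react(self, observation: dict) -> str | None:
        if observation.get("obstacle_distance", 1.0) < 0.1:
            return "stop"                      # handled here, never reaches the upper layers
        return None

class SkillLayer:
    """Middle layer: named skills and local planning over them."""
    def run_skill(self, skill: str, observation: dict) -> str:
        return f"executing skill {skill!r} given {observation}"

class DeliberativeLayer:
    """Top layer: global goals, slow planning, and choice of which skill to invoke."""
    def plan(self, goal: str) -> str:
        return "navigate" if "reach" in goal else "explore"

class HierarchicalAgent:
    def __init__(self):
        self.reflex, self.skills, self.deliberation = ReflexLayer(), SkillLayer(), DeliberativeLayer()

    def step(self, goal: str, observation: dict) -> str:
        reflex_action = self.reflex.react(observation)
        if reflex_action is not None:          # cross-layer shortcut: emergency bypasses planning
            return reflex_action
        skill = self.deliberation.plan(goal)   # top layer decides *what*, middle layer decides *how*
        return self.skills.run_skill(skill, observation)

agent = HierarchicalAgent()
print(agent.step("reach the charger", {"obstacle_distance": 0.05}))  # -> "stop" (reflex path)
print(agent.step("reach the charger", {"obstacle_distance": 2.0}))   # -> skill execution path
```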

The von Neumann architecture can be seen as the "execution engine" within each layer (or of the agent as a whole), while the hierarchical architecture defines how these engines are organized, how they cooperate, and how data flows across different levels of abstraction. An agent that can both execute flexibly and organize itself well is better equipped to cope with a complex, changing world.

Attempts like the Manus agent mentioned earlier do introduce modules and rules, but compared with the deep, multi-scale hierarchy found in nature, the hierarchical abstraction of their information processing and decision-making is not clear enough; they look more like tools and rules invoked on a relatively flat structure. This prompts a further question: is the underlying layer from which these "tools" and "rules" are built really "basic" and "universal" enough?

"The third cornerstone: cellular atomicity"

This is the most thought-provoking and perhaps even the most subversive perspective.

If even agent tools such as "reading and writing files" and "executing commands" seem "high-level" compared to the atomicity at the "cell" level in organisms, then what should be the "ultimate basic unit" that constitutes a truly universal agent?

A human being is an extraordinarily complex organism, yet at bottom it consists of only a few hundred different types of cells. The functions of a single cell are relatively limited: metabolism, division, and response to signals. But it is these "atoms" with limited functions that, through staggering numbers, differentiation, organization, and collaboration, build tissues, organs, and an entire life of boundless complexity.

This is an "emergence" from extreme simplicity to extreme complexity, from atomic to macroscopic.

This brings us to the third and perhaps the most basic cornerstone inspiration for building a general agent: we need more fine-grained "atomic primitives".

If we compare the current agent's tools (such as file_write or search) to the "organs" or "tissues" of an organism, then those "organs" and "tissues" are themselves composed of lower-level, "cell"-level operations. For a truly universal agent, the lowest-level "atomic tools" or "cognitive primitives" should perhaps be closer to the following (a toy sketch follows the list):

  • "Perception primitives:"  Not just "recognize the cat in the image", but "read the color value of a pixel", "capture the waveform at a sound sampling point", "detect whether a button is pressed".
  • "Action primitives:"  Not just "open a file" or "send an email", but "drive an actuator through a tiny displacement", "send a minimal packet over the network", "change the color of a single point on the screen".
  • "Internal operation primitives:"  Not just "call the large-model API", but "read/write a bit in memory", "perform basic logical operations (AND, OR, NOT)", "perform basic arithmetic operations".

With these extremely fine-grained atomic operations, we can theoretically combine any complex perception, cognition, and behavior, just like nature uses cells to build everything. This brings profound insights including:

  • "Combination and emergence:"  The high-level capabilities of agents (such as understanding language, writing complex code, solving a new problem) should not rely on a few preset high-level tools, but should be built from a large number of lower-level, flexibly combinable atomic primitives. The key to intelligence lies in how agents can effectively learn and combine these primitives.
  • "Dynamic construction and adaptation:"  Agent's "tools" or "skill modules" should not be static and fixed, but can be dynamically "assembled", "generated" or even "reshaped" by these atomic primitives according to the task and environment requirements. When faced with new problems, instead of looking for ready-made tools in the tool library, the underlying capabilities are used to dynamically build solutions. This is very close to the ideal state of "self-programming" or "meta-learning".
  • "Strong robustness and fault tolerance:"  Capabilities are built on a large number of fine-grained units, which means that the system may have better redundancy and resilience. Even if a specific implementation path of a high-level function fails, in theory the agent can try to bypass or repair it with a combination of other atomic primitives.
  • "A new understanding of the nature of learning:"  Learning may not just be about adjusting the parameters of high-level modules. Deeper learning may be about how to connect, organize and optimize these atomic primitives to form more efficient and adaptive "cognitive structures."

The "high-level" nature of current agent tools means that their versatility and adaptability are limited to the preset tool set. When encountering new scenarios that existing tools cannot cover, their adaptability may be greatly reduced. The "cell-level" atomicity, on the other hand, points to the underlying construction capabilities required to build truly open and adaptive intelligent agents - like cells, they can differentiate, organize, and build any desired "tissue" or "organ".

"Integrating the three: architectural conjectures leading to the future"

Now, let us try to integrate these three revelations: the powerful "execution foundation" provided by the von Neumann architecture (how a machine runs instructions), the elegant "organizational structure" provided by nature's hierarchies (how complex systems manage information and control in layers), and the ultimate "building blocks" provided by "cellular" atomicity (what basic elements all higher functions are made of). Perhaps then we can glimpse a possible outline of a future general-agent architecture:

  1. "The lowest level (atomic and execution level):"

Contains a set of extremely general and fine-grained perception, action and internal operation primitives. They are the "atoms" that constitute all "cognition" and "behavior" and are driven by a von Neumann-style general execution engine. This layer provides the Agent with the ability to interact with the physical or digital world at the lowest level and perform basic internal calculations.

  2. The middle layer (organization, skills, and planning layer):

Above the atomic layer, various higher-level "modules", "skills" or local "planners" emerge or are constructed through learning and dynamic combination. The construction of these intermediate layers is itself a process of dynamic adaptation.

For example, a "file writing" skill is composed of a series of basic action primitives, and an "image recognition" module is composed of a series of perception primitives and internal processing primitives. These modules are responsible for processing specific types of tasks or information flows and mainly communicate with adjacent layers. They constitute the core library of Agent capabilities, but this library is not fixed, but can grow and reorganize dynamically.

  1. "The highest level (cognition, goals and reflection):"

This is the "cerebral cortex" of the agent, responsible for understanding high-level intentions and goals, making long-term, global plans, coordinating the work of middle-level modules, and conducting self-reflection and learning. It receives key information abstracted from the middle layer and issues high-level instructions to influence lower-level behaviors.

More importantly, this layer needs to be able to treat its own "goals", "plans", "experiences" and even "behavioral logic" as data - just like the von Neumann architecture treats programs. This means that it can reflect on past execution processes (program trajectories), learn from successes and failures, and dynamically modify the combination of middle-layer modules based on these learnings, and even adjust its own planning strategy (modify its own "program").
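
Here is a minimal sketch of this "plan as data" idea, with invented step names and a deliberately trivial repair rule; in a real system the rewriting would be proposed by an LLM or a learned planner rather than a hand-written filter.

```python
# Sketch of a reflective top layer: the plan is data, the execution trace is data,
# and the layer edits its own plan after a failure. Step names and the repair rule are invented.

def run_step(step: str) -> bool:
    return step != "use_missing_tool"            # pretend exactly one step fails

def execute_and_reflect(plan: list[str]) -> list[str]:
    trace = []                                   # the "program trajectory" the agent can inspect
    for step in plan:
        ok = run_step(step)
        trace.append((step, ok))
        if not ok:
            break
    # Reflection: read the trace like data and rewrite the plan (here, a trivial repair rule;
    # a real agent would hand trace + plan to an LLM or planner to propose the edit).
    if any(not ok for _, ok in trace):
        return [s for s in plan if s != "use_missing_tool"] + ["compose_replacement_from_primitives"]
    return plan

plan = ["gather_input", "use_missing_tool", "write_report"]
print(execute_and_reflect(plan))
# -> ['gather_input', 'write_report', 'compose_replacement_from_primitives']
```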

A few key cross-layer communication channels connect the highest layer with the lower layer to respond to emergencies and achieve efficient control. For example, an emergency target set by the upper layer can directly trigger a series of rapid response primitive combinations at the lower layer.

Such an architecture has both powerful underlying execution capabilities (von Neumann) and a clear and efficient organizational structure (hierarchy). Its versatility and adaptability come from the flexible combination of the lowest-level primitives (atomicity) and the highest-level learning, control, and "self-modification" of the combination (deep application of von Neumann's "program is data").

However, I am not sure whether it is appropriate to build an agent along these lines at this stage, because some of the details may require progress in LLMs themselves (such as autonomously deciding whether to engage in deeper reasoning, and autonomously budgeting the cost of that reasoning), rather than relying on agent-engineering optimizations alone.

But I think this is a possible way to build a general agent.