You think AI understands, but it doesn’t

Rethink what AI "understanding" really means, and challenge the conventional picture of intelligence.
Core content:
1. New York University research probes what the "intelligence" of large language models really amounts to
2. An information-theoretic framework tests whether large models understand concepts and meaning
3. The fundamental differences between human and AI conceptual understanding
I recently read an NYU (New York University) study that completely overturned my perception of the "intelligence" of large language models. For a long time we have debated whether LLMs (large language models) really think like humans, and now someone has finally answered the question with a scientific method. The results may surprise you.
The question at the heart of this research is simple yet profound: Do large language models truly understand concepts and meaning, or are they merely performing complex statistical pattern matching?
To answer this question, the research team used classic experiments from cognitive psychology as their benchmark. Instead of crowdsourced data, they relied on rigorous scientific datasets originally built to study how humans actually categorize things, for example how people understand concepts like "bird" or "furniture."
Experimental design: Testing conceptual understanding using an information-theoretic framework
The research team tested more than 30 large language models, including familiar names such as BERT, Llama, Gemma, and Qwen. They used a clever information-theoretic framework to measure the trade-off between two key metrics:
• Compression efficiency: how effectively the model organizes information
• Semantic preservation: how much semantic detail the model retains
This framework reminds me of an analogy: imagine you are organizing a huge library. You could simply shelve every book in alphabetical order (high compression, but subject information is lost), or arrange them by an elaborate subject classification system (more semantic information is preserved, but the organization is more complex).
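To make that trade-off concrete, here is a minimal sketch of how one might measure it. This is my own illustration rather than the study's published method: the embeddings are random placeholders standing in for real model representations, and the two scores (entropy of the cluster assignment as a compression cost, mean item-to-centroid cosine similarity as semantic preservation) are simple stand-ins for the information-theoretic quantities the researchers actually compute.

```python
# A hypothetical compression-vs-meaning sweep over random placeholder embeddings.
import numpy as np
from scipy.stats import entropy
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
items = rng.normal(size=(200, 64))                  # stand-in for real LLM item embeddings
items /= np.linalg.norm(items, axis=1, keepdims=True)

for k in (5, 20, 80):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(items)

    # Compression cost: entropy (bits) of the cluster-size distribution,
    # roughly how many bits it takes to say which concept an item belongs to.
    sizes = np.bincount(labels, minlength=k) / len(labels)
    compression_cost = entropy(sizes, base=2)

    # Semantic preservation: mean cosine similarity of each item to its cluster
    # centroid, i.e. how much fine-grained detail survives the grouping.
    centroids = np.stack([items[labels == c].mean(axis=0) for c in range(k)])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    preservation = float(np.mean(np.sum(items * centroids[labels], axis=1)))

    print(f"k={k:3d}  compression_cost={compression_cost:.2f} bits  preservation={preservation:.3f}")
```

Sweeping k makes the library analogy quantitative: coarser groupings are cheaper to describe but preserve less of each item's meaning.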
Finding 1: Good news—LLMs do form concepts
The first finding was reassuring: LLMs do form broad conceptual categories, and their agreement with human categories far exceeds chance.
More interestingly, the study found that smaller encoder models (such as BERT) actually outperformed much larger models on this measure. That challenges the conventional wisdom that "bigger is better": for this specific task of concept understanding, size is not everything.
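To give a rough sense of what "agreement far exceeding chance" means in practice, one could score model-derived clusters against human category labels with standard clustering-agreement metrics. The labels below are invented, and the study's own scoring may differ; this only shows the shape of such a comparison.

```python
# Hypothetical human category labels and model-derived cluster IDs for the same seven items.
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score

human_categories = ["bird", "bird", "furniture", "bird", "furniture", "tool", "tool"]
model_clusters   = [0, 0, 1, 0, 1, 2, 1]   # e.g. k-means labels over the model's item embeddings

print("AMI:", adjusted_mutual_info_score(human_categories, model_clusters))
print("ARI:", adjusted_rand_score(human_categories, model_clusters))
# Both metrics sit near 0 for chance-level agreement and reach 1 for a perfect match.
```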
This got me thinking: maybe we’ve been pursuing advancements in AI in the wrong direction?
Finding 2: The devil is in the details—LLMs lack an understanding of “typicality”
However, the second finding reveals a key problem: LLMs clearly struggle with fine-grained semantic distinctions.
What is "typicality"? In simple terms, humans know that a robin is more typical of a "bird" than a penguin, and a rose is more typical of a "plant" than a cactus. This understanding helps us make quick judgments and inferences in the complex real world.
But LLMs fall short here. Their internal conceptual structure does not match humans' intuitive sense of graded category membership. It is as if someone could recognize every bird yet had no idea why some birds are "more bird-like" than others.
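One simple, hypothetical way to probe this is to check whether distances in the model's embedding space track human typicality ratings: rank category members by similarity to the category centroid and correlate that ranking with the human one. Everything below (embeddings and ratings alike) is a placeholder, and the paper's own typicality analysis may be set up differently.

```python
# Hypothetical typicality probe: does similarity to the category centroid
# line up with human typicality ratings for "bird"?
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
members = ["robin", "sparrow", "eagle", "penguin", "ostrich"]
embeddings = {m: rng.normal(size=32) for m in members}            # stand-ins for real model embeddings
human_typicality = {"robin": 6.9, "sparrow": 6.5, "eagle": 5.8,   # invented ratings on a 1-7 scale
                    "penguin": 2.4, "ostrich": 2.1}

centroid = np.mean([embeddings[m] for m in members], axis=0)
cosine = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
model_scores = [cosine(embeddings[m], centroid) for m in members]

rho, _ = spearmanr(model_scores, [human_typicality[m] for m in members])
print(f"Spearman rho = {rho:.2f}")  # near 1 would mean human-like typicality rankings
```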
Finding 3: Fundamental Differences—Divergence in Optimization Goals
What shocked me most was the third finding, which reveals a fundamental difference between LLMs and humans:
• LLM strategy: aggressive statistical compression (minimizing redundancy)
• Human strategy: adaptive richness (maintaining flexibility and context)
This difference explains why LLMs can demonstrate impressive capabilities while still missing reasoning that is obvious to humans. They are not "broken"; they are simply optimized for pattern matching rather than for the rich, contextual understanding that humans rely on.
Imagine being asked to store all important information in as little space as possible: you would probably build a highly compressed system. But if you needed to use that information flexibly in all kinds of unpredictable situations, you might choose a more redundant but more flexible way of storing it. That, roughly, is the difference between LLMs and humans.
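Here is a toy version of that storage analogy, written as a single objective with a weight beta that says how much you care about retained detail relative to compactness. This is my own framing, not the paper's formulation, and the numbers are invented purely to show how shifting the weight shifts the preferred granularity.

```python
# Invented per-item costs for organizing the same items into k concepts.
compression_cost = {5: 2.3, 20: 4.3, 80: 6.3}      # bits to name a concept (more concepts = costlier)
information_loss = {5: 0.50, 20: 0.25, 80: 0.15}   # semantic detail lost (fewer concepts = lossier)

for beta in (1.0, 10.0, 40.0):
    # Objective: compactness plus beta-weighted loss of detail.
    best_k = min(compression_cost, key=lambda k: compression_cost[k] + beta * information_loss[k])
    print(f"beta={beta:5.1f} -> prefers k={best_k} concepts")
# A small beta (caring mostly about compactness) favors a few coarse concepts, the
# aggressively compressed LLM-like regime; a large beta favors many fine-grained
# concepts, closer to the "adaptive richness" attributed to humans above.
```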
Deep implications for the development of AI
The significance of this research goes far beyond the academic level. It provides three important insights for the future development of AI:
1. Scaling may not lead to human-like understanding
Current AI development strategies rely heavily on scaling up: bigger models, more data, more computing power. But this research suggests that scaling alone may not lead to true human-like understanding.
We need to rethink: maybe the path to AGI is not bigger models, but smarter architectural design?
2. New architectures are needed to balance compression and semantic richness
The study suggests we need architectures that can balance compression efficiency with semantic richness. This is not merely a technical issue; it is a question of design philosophy.
How can we make AI systems retain more semantic detail and contextual information while staying efficient? Doing so may require entirely new neural network architectures or training methods.
3. Re-examine the optimization goals
Most importantly, this research reminds us to rethink the optimization goals of AI systems. If we want AI to understand the world more the way humans do, we may need to shift its learning objectives away from pure statistical efficiency toward richer objective functions that also reward semantic richness.
Measurement tools: paving the way for future research
This study not only reveals the problem, but also provides tools for solving it. The compression-meaning trade-off measurement framework developed by the research team can be used to guide future AI development and help us build AI systems that are more consistent with human concept representation.
This reminds me of the early days of computer science. When we first started building computers, we focused on computing speed and storage capacity. But over time, we realized that user experience, usability, and human-computer interaction were just as important. Perhaps AI is at a similar turning point right now.
Where cognitive psychology meets AI
As a technology practitioner, I particularly appreciate the way this study combines cognitive psychology with AI research. It reminds us that real progress in AI may require cross-disciplinary collaboration.
Humans have spent millions of years evolving complex cognitive abilities, and we should not expect to replicate these abilities simply by increasing computing resources. Instead, we need to deeply understand the mechanisms of human cognition and then implement similar principles in AI systems.
Final Thoughts: Redefining "Intelligence"
This research made me rethink the definition of "intelligence." Maybe true intelligence is not about how much data a system can process or how complex its computations are, but about understanding the world flexibly and in context, the way humans do.
LLMs already perform well on many tasks, but the way they understand the world is fundamentally different from ours. Recognizing this does not diminish the value of current AI technology; it gives future development a clearer direction.
Future AI systems may need to strike a better balance between statistical efficiency and semantic richness, a challenging but fascinating research direction. As both an observer of and a participant in AI technology, I look forward to that future.
After all, truly understanding how humans think may be the first step toward building truly intelligent machines.