Talking about Agents with Silicon Valley Entrepreneurs: What are the technical bottlenecks for starting an agent business this year?

Explore the core bottlenecks and future development direction of Agent technology in 2025.
Core content:
1. The latest progress and social impact of Agent technology
2. In-depth analysis of the difficulties of Agent technology by Silicon Valley experts
3. Discussion on the commercialization prospects of Agents and opportunities for entrepreneurship
The release of R1, o3, and other reasoning models provides a sufficiently strong technical foundation for the development of Agents. As many people say, 2025 is the real year of the Agent.
However, many questions about Agents remain open: What are the current technical bottlenecks? How should Agents collaborate in the future? Is now a good time to start an Agent business?
On March 16, Global Ready, a global closed-door community under Geek Park, jointly organized a closed-door discussion with Integer Intelligence, inviting AI experts and entrepreneurs from Silicon Valley for an in-depth discussion on the implementation of Agents, the technical difficulties, and the direction of commercialization.
After removing sensitive details, we have compiled the key points of this closed-door discussion. We would like to thank GR member Integer Intelligence for supporting this event.
Main speakers:
Yifeng Yin, host of this Global Ready event, Co-founder of a stealth start-up, ex-HuggingFace
Kecheng Huang, Co-founder & CEO, Emerging AI
Dongxu Huang, Co-founder & CTO, PingCAP
Zheqing (Bill) Zhu, Founder & CEO, Pokee AI
About Integer Intelligence:
Originating from the Computer Innovation Technology Research Institute of Zhejiang University, Integer Intelligence provides an intelligent data engineering platform and dataset construction services (including data collection, data cleaning, and data labeling), covering multiple modalities such as images, video, text, audio, and point clouds. Through AI automation, it improves data labeling efficiency by 500%-1,000%.
About Global Ready:
Global Ready Community is a global innovator community incubated by GeekPark, exploring the infinite possibilities of technology with the world's top innovators.
Community members enjoy efficient connections with 500+ overseas founders, technical experts, and investors, tickets to closed-door events, and other benefits. We hope to become your efficient API for connecting with the world.
01
Agent’s current technical bottlenecks:
More tooling, more context
Yifeng Yin: Today we are talking about agents. The commercial potential of a technology depends mainly on what it can disrupt. So, what problems are agents the best solution for? If we want agents to truly solve these problems, what technological breakthroughs do we need?
Zheqing (Bill) Zhu: I personally think that if we start from first principles and assume that all end-to-end communications in the future will be completed by agents, then human-centric web browsing may disappear, and information transmission and task execution will be completely achieved by collaboration between agents.
This may require several conditions: first, the Internet itself may need to be reconstructed, no longer relying on browser operations but allowing agents to execute tasks directly; second, the execution capabilities of agents must improve dramatically. Current agents, such as those built on Claude or GPT-4o, have very limited execution capabilities. They may hit their limit after calling around 50 tools, and errors compound beyond that. So what we need to solve in the future is: how do we enable agents to autonomously call thousands of tools in unknown environments to complete complex tasks? That may be the direction we need to break through.
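To make the "50 tools" ceiling concrete, here is a minimal sketch of the plan-then-call loop being described. Everything in it is hypothetical: the tool registry, the `plan_next_step` placeholder that stands in for the model's planning call, and the step cap. It only illustrates where tool selection and error compounding become the bottleneck, not any particular product's implementation.

```python
from typing import Callable, Optional

# Hypothetical tool registry: name -> callable. Real deployments would have
# dozens (eventually thousands) of entries; selection accuracy is the issue.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_flights": lambda q: f"flights found for: {q}",
    "book_hotel": lambda q: f"hotel booked for: {q}",
}

def plan_next_step(goal: str, history: list[str]) -> Optional[tuple[str, str]]:
    # Placeholder planner: in reality this is the LLM call that picks
    # (tool_name, tool_input); here it just stops after two steps so the
    # sketch runs end to end.
    if len(history) >= 2:
        return None
    return ("search_flights", goal) if not history else ("book_hotel", goal)

def run_agent(goal: str, max_steps: int = 50) -> list[str]:
    # The ~50-call ceiling shows up here as a hard step cap; in practice the
    # real limit is the planner's accuracy, not this loop.
    history: list[str] = []
    for _ in range(max_steps):
        step = plan_next_step(goal, history)
        if step is None:
            break
        tool_name, tool_input = step
        result = TOOLS[tool_name](tool_input)  # execute the chosen tool
        history.append(f"{tool_name}: {result}")
    return history

print(run_agent("weekend trip to Tokyo"))
```

The hard part Bill points to lives entirely inside `plan_next_step`: choosing the right tool out of a very large catalog, step after step, in an environment the agent has never seen.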
Dongxu Huang: My conclusion is similar to Bill's, but my perspective is slightly different.
When evaluating tools, I pay more attention to what we really need from a human perspective, rather than simply looking at what the tool can do or what its application scenarios are. Just as the Internet was invented to solve the efficiency problems of human communication and information acquisition, agents should also match our core needs. I recently reread the Transformer paper, and the title "Attention is All You Need" is very philosophical. I think it can be extended: from a human perspective, "attention is all we have", and it is the resource we most want to control.
Things like booking flights and hotels are annoying; they demand concentration to compare prices and choose, yet we don't want to waste energy on them. We would rather spend our time and attention on things we enjoy, such as getting outdoors into nature, playing, or reading. So the essence of the agent should be to help us spend our time where we really want to spend it. This is why I think general-purpose or personal-assistant agents will be an important direction. In the past, search engines and mobile applications were simple processes designed around fixed scenarios and lacked flexibility. With the spread of GenAI, we can finally achieve higher flexibility. For example, when ChatGPT first came out, it was just a conversation tool, but now it can orchestrate on the order of 50 tools. I believe it will go a step further and become a real personal assistant: anything you don't want to do can be handed over to it.
To give another example: as a corporate executive, I definitely need a personal assistant, but it is impossible to give every employee one; it is too expensive. Yet the demand is universal. If AI agents can give everyone an assistant and help us allocate attention to where we want to focus, that is the core problem they can solve. So the direction of future agents may be how to amplify human attention management through technology. This is a bit abstract, but I hope it offers a different perspective: what do we, as humans, actually want?
Kecheng Huang: Bill just mentioned that web browsing may be reconstructed, and I have similar thoughts. When ChatGPT appeared and everyone started using it, and again when xAI came out, our team kept asking ourselves: when a new form of interaction emerges, what happens to the old forms? I read a book on communication media which argued that when new media appear, the old media do not disappear completely but are folded into specific scenarios. For example, after computers emerged, pen and paper did not disappear, but their uses became narrower.
I think web browsing, ChatGPT, and agents can be seen as three iterations. When we use Google today, we usually know exactly what we want to find, such as the official website of Abakai AI or of Hugging Face; direct search is the fastest route there. Chatbots like ChatGPT suit ambiguous or exploratory thinking. Agents, I think, are slowly emerging: with improvements in the underlying infrastructure, data accumulation, and exploration of product paradigms, they are beginning to show their potential. Manus, which has been very popular recently, can condense your historical behavior into a framework that helps you handle ambiguous tasks.
From a technical perspective, I think agents have three core capabilities that matter more than chatbots or web pages: first, multimodal understanding, not limited to text, so needs can be understood in multiple forms; second, real-time environmental perception, because executing tasks means calling constantly changing services and agents must adapt dynamically; third, personalized data integration, pulling together personal historical data across modalities and scenarios to truly understand you. This requires efficient data storage and conversion, such as knowledge graphs. These three points make me very excited about the future of agents. Manus is just the beginning, and there will be more exciting things to come.
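As a toy illustration of that third capability, here is a minimal sketch of personal history stored as subject-predicate-object triples, the simplest form of a knowledge graph. The class, predicates, and facts are all invented for illustration; a real system would need richer schemas, multimodal storage, and conversion pipelines.

```python
from collections import defaultdict
from typing import Optional

class PersonalGraph:
    """Tiny triple store: (subject, predicate, object) facts about the user."""
    def __init__(self) -> None:
        self._by_subject = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._by_subject[subject].append((predicate, obj))

    def about(self, subject: str, predicate: Optional[str] = None) -> list[str]:
        facts = self._by_subject[subject]
        return [o for p, o in facts if predicate is None or p == predicate]

graph = PersonalGraph()
graph.add("user", "prefers_airline", "United")     # e.g. learned from past bookings
graph.add("user", "home_airport", "SFO")           # e.g. from calendar and email
graph.add("user", "dislikes", "red-eye flights")   # e.g. from chat history

# Before acting, an agent can pull what it already knows about you:
print(graph.about("user"))                  # all facts
print(graph.about("user", "home_airport"))  # ['SFO']
```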
Dongxu Huang: I particularly agree with the points Kecheng mentioned. Let me summarize briefly. First, we lack an Internet between agents, something like a router, that lets agents communicate efficiently. When I do MCP development now, I find it very troublesome to call more than a dozen services; some need to run locally, some remotely, and context sharing between agents is also insufficient. The second point is memory. Agents will eventually provide personalized services, which is inseparable from data. But the software industry's current data-processing interfaces are not well suited to AI or agents and may need to be reconstructed. For example, how should agents operate on data? The best option today may be SQL, but there may be more AI-friendly ways in the future.
In addition, information exchange between agents differs from that between humans. Humans rely on message passing, but agents do not need such an inefficient method. I saw a demo a while ago in which two AI agents whispered to each other over the phone. It was amusing, but I think it is completely the wrong direction. Memory sharing between agents should not rely on phone calls; just hand over an S3 endpoint and load it straight into memory. So I think two things are missing: an Internet for agents, and a brain for agents, including memory and context management.
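A rough sketch of the "just hand over an S3 endpoint" idea: one agent serializes its working memory to shared object storage and another loads it directly, instead of the two exchanging messages. This assumes standard boto3 with valid AWS credentials; the bucket and key names are hypothetical.

```python
import json
import boto3  # assumes AWS credentials are already configured

s3 = boto3.client("s3")
BUCKET = "shared-agent-memory"        # hypothetical bucket
KEY = "trip-planning/context.json"    # hypothetical key

def publish_memory(memory: dict) -> None:
    """Agent A serializes its working context once; no message ping-pong."""
    s3.put_object(Bucket=BUCKET, Key=KEY,
                  Body=json.dumps(memory).encode("utf-8"))

def load_memory() -> dict:
    """Agent B loads the same context and continues where A left off."""
    body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    return json.loads(body)

publish_memory({"goal": "book flight SFO -> JFK", "constraints": ["no red-eye"]})
print(load_memory())
```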
02
Agents do not create new demand;
they optimize existing solutions
Yifeng Yin: If we want agents to become solutions to certain problems and implement business models with agents as the core, what efforts do we need to make in terms of infrastructure? What kind of infrastructure is needed for agents to be truly implemented?
Zheqing (Bill) Zhu: Agents do not meet a completely new demand; they optimize existing ones. Some things that were originally done by people could in fact be handed over to machines; it is only because the Internet is not fully connected, or AI capabilities are insufficient, that they still rely on manpower.
For example, I published a blog post some time ago. I downloaded the images and text from one platform, organized them into a document, and uploaded it to another platform. The whole process took me two hours. This is exactly what an agent should do: step one, download all the content from Google Docs and save it locally or to storage; step two, upload that information to LinkedIn in the best possible way. But no tool on the Internet can do this in one click. Agents should solve this kind of repetitive labor, letting machines replace people.
I think we need to solve repetitive labor on the Internet first, for example replacing a two-hour or two-day task with 15 seconds. If that can be done, there will definitely be a market, and product-market fit (PMF) will emerge naturally. On infrastructure, I don't think computing power is the gap; what we have now is enough. The core problem is the completeness of the tooling, such as the lack of a unified standard interface. Like the Internet between agents mentioned earlier, even the chain for a single agent to call a tool is not complete: after planning a series of operations, can it find the corresponding tool to execute them? That has not been solved yet. First, the tool chain must be standardized. Whether the provider is a government or an individual, anyone connecting to the agent uses the same format. That may be the first step to implementation. Later comes communication between agents, or the connection between the virtual world and the physical world, which is a more distant future.
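As an illustration of what a "same format for every tool" standard might look like, here is a sketch of two tool descriptors in the name/description/JSON-Schema-parameters shape used by today's function-calling and MCP-style interfaces. The tool names and fields are invented for the Google Docs-to-LinkedIn example above and are not taken from any existing API.

```python
# Hypothetical uniform tool descriptors: any provider, from a government
# portal to an individual developer, would expose tools in the same shape.
download_doc_tool = {
    "name": "download_google_doc",
    "description": "Download a document's text and images to local storage.",
    "parameters": {
        "type": "object",
        "properties": {
            "document_url": {"type": "string"},
            "output_dir": {"type": "string"},
        },
        "required": ["document_url"],
    },
}

publish_post_tool = {
    "name": "publish_linkedin_post",
    "description": "Publish text plus attached images as a LinkedIn post.",
    "parameters": {
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "image_paths": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["text"],
    },
}

# With every tool in one format, the two-hour "download, reorganize,
# re-upload" chore becomes a two-step plan the agent can discover and execute.
TOOL_CATALOG = [download_doc_tool, publish_post_tool]
```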
Dongxu Huang: The tool chain Bill talked about overlaps with my ideas, such as the Internet and memory capabilities between agents. But from a human perspective, I think there is a deeper problem.
For example, if I want to buy a plane ticket, the agent has to go to United's or Booking's website, complete the purchase smoothly, and be able to read my WeChat or access these platforms. This is not a technical problem but a problem of human nature, or of business barriers. When will humans really decide to make the world a better place instead of guarding their own moats, closing APIs, and running all kinds of closed and restricted services? So the biggest obstacle to the implementation of agents may not be technology but human nature. If existing players do not change their thinking, new LinkedIns and new Bookings may simply rise in their place.
Reshaping the market based on an open ecosystem is more important than technological breakthroughs.
Technically, I think we need an agent internet based on trust. Just like the current Internet is based on TCP/IP, the agent ecosystem must also have similar underlying support. Openness and collaboration are key, which is why I am optimistic about open source - it can promote collective wisdom rather than a single company dominating.
Kecheng Huang: Infrastructure is indeed a big topic. In terms of computing power, algorithms and models, open source models are becoming stronger and stronger, and algorithms are also improving. As for data, Dongxu just mentioned how to make industry data fairer and more open, which is something that both enterprises and governments need to promote.
The construction of cloud computing clusters is already mature, but there is still room for optimization in how computing power flows on devices and at the edge. Our mobile phones were designed for the Internet era, but in the agent era, multimodal data and high-concurrency tasks will bring new challenges, which requires more investment.
As for the role of government, supervision is becoming increasingly important in the development of both China and the United States. Domestic large models still need to be registered and approved. How to balance accelerating innovation against maintaining social stability is a difficult problem. Another interesting point is the social backdrop. If agents and embodied robots replace more labor, what happens to the displaced workforce? In our global business we found that China and the United States are both discussing AI intensely, but in Japan and Europe the penetration of AI is not as high, and those mature societies have defense mechanisms. China and the United States currently lack such mechanisms, and I worry there will be more protests like those triggered by the Luobo Kuaipao (Apollo Go) robotaxi rollout in Wuhan.
Zheqing (Bill) Zhu: I am not that worried. From the Industrial Revolution to the Information Age, the population has grown enormously, yet the employment rate has gone up, not down. Once the execution level is taken over by agents, will new space for creative work open up? For example, there are now 20 Hollywood blockbusters a year. Could there be 5 million in the future, with a personalized version for everyone?
03
2025 is the beginning of certainty for Agents
Audience question: Nowadays everyone says that agents and AI assistants are indispensable in life. As a company founder, how do you ensure that your products will not be overtaken by the big companies? How do you position your company for the future?
Zheqing (Bill) Zhu: This is an interesting question, though it sounds a bit like we are expected to win, haha. I don't think this is a winner-takes-all game. Look at the current models: Anthropic is good at code and writing, OpenAI has advantages in reasoning and mathematics, and Perplexity is good at search. There will be many homogeneous products in the market, but they will evolve naturally, find their own comfort zones, and differentiate from one another.
I don’t think it will be a winner-takes-all game in the end. There will definitely be vertical agents and different infrastructures. Each company may use multiple infrastructures at the same time, just like using several major cloud vendors now. Therefore, it is not the first-mover advantage or the strongest technology that will win, but whether you can find the right product-market fit (PMF) and find your own niche to survive.
Dongxu Huang: I agree that AI is the biggest lever for every company in the future, but this lever is not about winning; it is about making our lives and work better. In our company, PingCAP, we have a practice: everyone has to write in their reports how they use LLMs or GenAI to improve efficiency; even the front desk is no exception. Programmers have tasted the benefits of tools like Cursor; I didn't even have to reimburse the expense, they paid for it themselves.
For me, AI is not a competitive point for the main business, but a productivity change for the entire society. All industries will be transformed by AI, whether they like it or not. As for positioning, we are engaged in data infrastructure, which will also be reshaped by AI, but the key is to embrace it, rather than thinking about defeating anyone.
Yifeng Yin: If the capabilities of today's base models stopped here, how much room could engineering and agents create on top of the current level? In other words, how much commercial value could be generated?
Dongxu Huang: I have been thinking about this for the past year, struggling over whether to invest more energy in building agent frameworks and workflows, or to wait for the underlying models to improve. There is an awkward dynamic: if you make the agent workflow very complex early on and then model capabilities suddenly jump, as when OpenAI's latest reasoning model came out, the earlier effort may be wasted. For example, before OpenAI's o1 came out, none of the workflows I built were usable, because the underlying models were too weak. The arrival of o1 made me feel I could finally start doing something useful; it is a starting point.
But o1 is just the first step. So right now I am not building specific frameworks; I am focusing on function calling and tool development to lay a solid foundation. I am looking forward to the next generation, the o2s and o3s, and to DeepSeek's next-generation model. By then the market space will be larger and the certainty will be higher. Back to your question: I think we are only at the beginning. On top of reasoning models like o1, there are already some good business scenarios, and the space will keep growing, but we are still at the start of the certainty curve. Dify, for example, has been successful in recent years, but as the technology advances, the old paradigm may have to be iterated. So my suggestion is to watch the trend while starting to test the waters with existing capabilities.
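As a sketch of that "invest in function calling and tools, not frameworks" approach (not Dongxu's actual code), the following keeps tools as plain Python functions plus a thin dispatcher, so the tool layer survives a swap of the underlying reasoning model. The model-output shape being dispatched is illustrative and not tied to any particular vendor's API.

```python
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"            # stand-in for a real integration

def create_ticket(title: str, priority: str = "normal") -> str:
    return f"Ticket created: {title} ({priority})"

# Plain functions registered by name; no framework required.
FUNCTIONS = {"get_weather": get_weather, "create_ticket": create_ticket}

def dispatch(model_tool_call: str) -> str:
    """Execute a model-emitted call such as
    {"name": "get_weather", "arguments": {"city": "Tokyo"}}
    by routing it to the matching local function."""
    call = json.loads(model_tool_call)
    fn = FUNCTIONS[call["name"]]
    return fn(**call.get("arguments", {}))

print(dispatch('{"name": "get_weather", "arguments": {"city": "Tokyo"}}'))
```

When a stronger reasoning model arrives, only the planner that emits these calls changes; the registered functions and the dispatcher stay as they are.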
Zhao He: Let me add something from a non-technical perspective. This question is really asking: if the model stops improving, can it still do everything? Of course it will improve, but Dongxu's answer from the angle of "whether to wait" is interesting, so let me answer from a different angle.
I have been reflecting: the current models are already very powerful, so why is what gets built still unsatisfying? I see two reasons. First, as Yifeng discussed with me before, Google's model is to have a cash-cow business that supports the scientists, powerful engineers who translate the science into products, and designers who understand user needs. The problem now is that there are many scientists but too few engineers who can translate the underlying technology into practical products. The information transmission chain is broken, and fixing it takes time.
Second, it is hard for the previous generation of Internet technology people to keep a beginner's mindset. When a new technology emerges and threatens their status, the reaction is "how can you be better than me?" The newcomers are either recruited or killed off. That resistance hinders innovation. So even when the technology is sufficient, good things do not come out. This is not a technical bottleneck but a bottleneck of mindset; you have to break through it before the value can be released.