Reflections of an AI entrepreneur: the neglected "fast" and "long"

Written by Clara Bennett
Updated on: June 13, 2025
Recommendation

A profound reflection on AI entrepreneurship: how to find lasting value amid rapid change.

Core content:
1. The glory and reality of AI entrepreneurship in its early stages
2. The importance of "speed" and the impact of user habits from ChatGPT
3. How AI upends traditional sales and customer service to deliver a personalized experience

Yang Fangxian
Founder of 53A / Tencent Cloud Most Valuable Expert (TVP)
When the AI wave had just begun, we charged into the top 30 of the overall US ranking. Looking at the companies on the same screen, they were all well-known large companies, two of which were my former employers in the United States. I felt very proud.
After two years of ups and downs in this wave of AI technology, my understanding has spiraled upward while being repeatedly battered by reality. I gradually realized which product directions were wrong (judged by retention, revenue, moat, and so on), and then pivoted little by little: thinking, practicing, getting feedback, iterating. This article records some of the setbacks and reflections from this stage.
  1. I had previously overlooked the importance of speed. I couldn't feel it in China, but while living in Silicon Valley I found ChatGPT remarkably easy to use: even though the value of a single query was not that high, user habits are shaped by convenience. Perplexity likewise gained its early popularity because of its speed; since it added a Cloudflare check, my usage has dropped significantly. In a 2023 article I argued that ChatGPT is the "91 Mobile Assistant" of the AI era; that judgment was rash. Being smooth and fast-loading is invaluable.


  2. My earlier take that L4 AI will replace white-collar workers was the wrong idea. In fact, 90% of it is expanding the TAM of white-collar work.

    Example: bland.ai can place calls with a realistic voice, walking a pre-built decision tree in which each node is a "prompt cell". A use case: Flexport (a US shipping and trucking company) calls drivers one by one to ask whether they will accept an order, then syncs the answer back to the demand side.
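
    To make the "prompt cell" idea concrete, here is a minimal, hypothetical sketch of an outbound-call agent walking a decision tree. The names (PromptCell, run_call) and the Flexport-style script are illustrative assumptions, not bland.ai's actual API.

```python
# Hypothetical sketch of a phone-agent decision tree where each node is a
# "prompt cell": a small prompt plus the branches to take based on the answer.
# Names and script are illustrative only, not any vendor's real API.
from dataclasses import dataclass, field

@dataclass
class PromptCell:
    name: str
    prompt: str                                    # what the voice agent says/asks
    branches: dict = field(default_factory=dict)   # classified reply -> next cell name

CELLS = {
    "greet": PromptCell(
        "greet",
        "Hi, this is the dispatch assistant. Is this a good time to talk?",
        {"yes": "offer", "no": "end_polite"},
    ),
    "offer": PromptCell(
        "offer",
        "We have a load from Oakland to Fresno tomorrow at 8am. Can you take it?",
        {"yes": "confirm", "no": "end_polite"},
    ),
    "confirm": PromptCell("confirm", "Great, I'll book you in and text the details.", {}),
    "end_polite": PromptCell("end_polite", "No problem, thanks for your time.", {}),
}

def run_call(answer_fn, start: str = "greet") -> list[str]:
    """Walk the decision tree; answer_fn simulates the driver's classified reply."""
    transcript, cell = [], CELLS[start]
    while True:
        transcript.append(f"AGENT: {cell.prompt}")
        if not cell.branches:
            return transcript
        reply = answer_fn(cell.name)               # in production: ASR + LLM intent classification
        transcript.append(f"DRIVER: {reply}")
        cell = CELLS[cell.branches.get(reply, "end_polite")]

# Example: a driver who accepts the load.
print("\n".join(run_call(lambda cell: "yes")))
```

    In a real system the lambda would be replaced by speech recognition plus a model classifying the driver's reply into one of the branch labels.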


    Another example: the industry is used to SMS for win-back and in-product push for engagement, but now you can put a "real" customer-service agent on activation and retention. It is a bit like when we made games: only big spenders got the GS (dedicated in-game customer service) treatment, with full-service operations, and their emotional value and product experience soared. For small spenders and low-value orders, the commercial value was below the cost of hiring people, so only a static, productized experience was possible; now a customized experience (product -> product + sales/ops) is achievable. For example, Superhuman's first activation did not let you explore the product on your own: it was a 30-minute video call, screen-sharing step by step until you reached inbox zero, with a pre-sales person accompanying you to the aha moment, and the payment rate soared. This approach can spread to many more lower-LTV scenarios in the future.


    This directly overturns the distribution dead zone Peter Thiel described in "Zero to One" ten years ago: previously only high-ARPU products could justify a sales force, while low-ARPU products had to rely on marketing-driven distribution. Now low-ARPU products can also offer strong sales and a customized experience!



    Another example is localization. Previously, companies going overseas mostly fought as pure "air force" (tools, content feeds, games) or "airborne troops" (e-commerce), and many "ground force" businesses failed (Didi only won Brazil; OPay shrank in Africa). With L4 AI employees, the point is not to replace the traditional air-force staff but to expand the ground force at low cost ...


  3. How to do workflow capture?

    When a new market emerges, integrate into the incremental business first, then abstract away the models and cloud vendors underneath. Quoting Wang Chuan: better to be high-dimensional abstract grass than a low-dimensional concrete seedling. Creating business value and capturing value are two different things: whoever gets turned into the pipe is commoditized, competition becomes perfect, and excess profits disappear. For example, WeChat turned the carriers into pipes, and ByteDance turned the app stores into pipes.


    Marc Andreessen said in a recent interview that the biggest lesson of the browser wars is that Unix eventually beat every proprietary server OS, and Sun's $100 billion of value ultimately went to zero. Open-source models will likewise turn the model layer into a pipe. The final value sits in the application layer and in workflow capture: it comes back to switching costs and network effects.


  4. I ignored the real practical impact of long context.

    Gemini 2.5 Pro gave me a glimpse: long context really is that useful. The RAG/enterprise-knowledge-base story the industry has told for three years is mostly nonsense, because at runtime the model could only use a sliver of context and never achieved the effect the salespeople promised. (Often the boss of a large SaaS company is the biggest BS salesperson.)


    However, now that genuinely usable 1M-token-context models have arrived, throwing in 30 documents and then having the conversation makes for a completely different product.
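
    As a rough illustration of why this changes the product, here is a minimal, vendor-agnostic sketch; rough_tokens, build_long_context_prompt, and call_model are made-up placeholders standing in for whatever client and tokenizer you actually use.

```python
# Sketch: with a ~1M-token context window you can paste whole documents into the
# prompt instead of retrieving a few RAG snippets. All names are placeholders.
def rough_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic: roughly 4 characters per token

def build_long_context_prompt(question: str, documents: list[str],
                              budget_tokens: int = 1_000_000) -> str:
    """Stuff full documents into the prompt until the context budget is spent."""
    parts, used = [], rough_tokens(question)
    for i, doc in enumerate(documents):
        cost = rough_tokens(doc)
        if used + cost > budget_tokens:
            break                      # with 1M tokens, ~30 typical docs fit easily
        parts.append(f"--- Document {i + 1} ---\n{doc}")
        used += cost
    return "\n\n".join(parts) + f"\n\nQuestion: {question}"

def call_model(prompt: str) -> str:    # placeholder for a real LLM call
    return f"(model sees ~{rough_tokens(prompt)} tokens of context)"

docs = [f"Contract {i}: terms, pricing, SLAs ..." for i in range(30)]
print(call_model(build_long_context_prompt("Which contracts mention SLAs?", docs)))
```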


    In fact, Eric Schmidt spelled this out in his Stanford talk last year: fewer hallucinations, longer context, genuinely usable multimodality and agents. The path was laid out very early, but I only understood it once the models actually arrived. I was really slow to catch on!


  5. Reflection: why did I overlook what "fast", long context, and fewer hallucinations would do to the C-end experience? Why did I make such a mistake?


    A ground-level observation from model practitioners: many requirements written into a PRD are actually meaningless, because you cannot tell from the requirement description alone whether the model can deliver it.


    For example, if you want the model to pull relevant documents from the enterprise knowledge base and give suggestions, the quality, speed, and cost differ enormously depending on the model, the context length, and whether you use RAG or feed in all the documents.


    So it is meaningless for a product manager to write such a requirement in isolation. How do you solve this? The only answer I have is to play with the models a lot and run many A/B experiments, and when experimenting, the base model and prompt parameters must be strictly controlled. If a feature does not work, it may not be that the PRD is wrong; switching to a larger base model might make it work. A sketch of this kind of experiment discipline follows below.
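
    The framework below (ExperimentArm, run_arm, and the simulated judge) is hypothetical; it only illustrates the discipline of pinning the base model, sampling parameters, and context strategy per arm so you can tell whether the PRD is wrong or the model is simply too small.

```python
# Hypothetical A/B harness: each arm fixes the base model, the sampling
# parameters, and the context strategy (RAG snippets vs. full documents).
# The judge() below is a stand-in for human eval or an LLM-as-judge.
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentArm:
    name: str
    model: str          # e.g. "small-base" vs "large-base"; pinned per arm
    temperature: float  # sampling/prompt parameters also pinned per arm
    strategy: str       # "rag_snippets" or "full_documents_in_context"

ARMS = [
    ExperimentArm("A", "small-base", 0.2, "rag_snippets"),
    ExperimentArm("B", "large-base", 0.2, "full_documents_in_context"),
]

def run_arm(arm: ExperimentArm, queries: list[str]) -> float:
    """Return the success rate for one arm over a fixed query set."""
    def judge(query: str) -> bool:
        # Placeholder outcome model: a bigger model plus full context succeeds more often.
        base = 0.55 if arm.model == "large-base" else 0.35
        bonus = 0.2 if arm.strategy == "full_documents_in_context" else 0.0
        return random.random() < base + bonus
    wins = sum(judge(q) for q in queries)
    return wins / len(queries)

random.seed(0)
queries = [f"question {i}" for i in range(200)]
for arm in ARMS:
    print(arm.name, arm.model, arm.strategy, f"success={run_arm(arm, queries):.0%}")
```

    The point is not the numbers (they are simulated) but that any comparison is only meaningful when every variable except the one under test is held fixed.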


    In fact, Zhang Yueguang told us this long ago: what separated Miaoya Camera from all the look-alike AI photo studios was model capability. Keywords: realistic, a good likeness, beautiful. The same is true for DeepSeek. Consumer (C-end) product managers need to make great strides in upgrading their mental models: much of the time the biggest gains come from tuning the model and how it is called and used, not from UI/UX. The model is the superpower and the source of strength.


    Similarly, investors' work has become harder. I have observed that investors usually focus on positioning, traffic, and interaction, but they also need to look at the C-end experience variables driven by fast/long/smart. The investor I admire most is Yuri Milner: he is the only one of these big names who sat down with me and played with the product hands-on, asking me about details like screen latency. Everyone looks at 30-day retention, user-interview summaries, and TAM, but how many actually play with the product themselves? Of the ten mobile-internet companies in China and the US that reached $100 billion, he invested in seven.


  6. Learning from History in the Age of Recommendation Algorithms 

    In the recommendation-algorithm era, the last one standing won, and the strongest product was the short-form UGC ecosystem that maximized the power of the recommendation algorithm. The same may hold in the AI era: the product that maximizes the superpowers of the model will win.


    Sundar Pichai's latest thinking is to do only one thing: make the best model, the best scenarios, and the best business model. Managing 200,000 people while focusing on only one thing is great wisdom.