How should we design the interaction between AI and humans? And why must real innovation be centralized?

Written by
Silas Grey
Updated on: June 28, 2025
Recommendation

Explore the essence of AI interaction design and reveal the core ideas of product innovation.

Core content:
1. Key elements of AI-human interaction design
2. Balancing recall and precision in product design
3. The importance of centralized creation to product innovation

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)


Real product innovation cannot be achieved through cross-departmental collaboration; it can only be achieved through centralized creation.
I will open with this hot take and come back to it at the end.
We don't need vague, pretty words; we need sharp opinions. Otherwise sharing loses its meaning, and there is no real exchange of ideas.



Article structure:

1. First, how to design the interaction between AI and humans

2. What is a better design?

3. Why "real product innovation cannot rely on cross-departmental collaboration, but can only be created in a centralized way"

4. Postscript & Easter eggs

5. Contact information



Let’s first talk about how to design the interaction between AI and humans.

This topic is very broad, so this article will work through an example and try to make one point in the middle of it clear.
I'd rather not dwell on existing products; otherwise, as with the earlier article "Fellou is not an AI browser! Fellou is not Manus," it is easy to be mistaken for an advertisement and the actual content gets overlooked.
So let's take an older example, Google Photos versus the iOS photo album, which was a classic question I asked when interviewing AI product managers many years ago. To be precise, the product managers recruited back then were called product managers for smart products.
I use this example because you have probably used these two products, or something similar: you type a keyword into the search box of your photo album and it finds matching photos. You might search for "horse," for "dog," or for a photo taken with friends on a certain day.
Same feature in both, so what's the difference? You may find that if you have taken a lot of dog photos, typing "dog" on your iPhone turns up plenty of dogs, but some may be missing. The missing ones may be blurry, or they may look like monkeys, such as my favorite monkey-faced terrier, which can even pass for a little lion or a "little devil," as shown in the picture below.


In short, it doesn't look that much like most dogs.
But if you search for "dog" in Google Photos, you'll probably find that it shows you basically all the dogs, though it may also include photos of your stuffed animals, or even a lion cub's butt.
So there is a significant difference between the two in terms of product design tendencies. What is this difference?
We all know there are two classic technical metrics in search products; the same applies to RAG. They are recall and precision. Wikipedia explains them as follows.

Recall = number of correct targets retrieved ÷ total number of correct targets

Precision = number of correct targets retrieved ÷ total number of results returned by the search

Recall is also called the "completeness rate," and precision the "exactness rate"; those names are easier to grasp. Let's use an example to illustrate.

So, you can understand recall like this: if your photo album contains 100 dog photos plus a bunch of cat photos and nothing else, and the system finds all 100 dog photos for you, that's the best case. Recall = 100 / 100 = 100%, and precision is also 100%.

Even if 10 cats get picked up at the same time, recall is still 100%, but precision drops to 100 / (100 + 10) ≈ 91%. Recall only asks whether any real dogs were missed, not whether the results are clean.

But if you found 90 dog photos and missed 10, that's not great: recall = 90 / 100 = 90% (10 photos were missed), though precision could still be 100% as long as all 90 are dogs.

What about precision? If the system returns 90 photos and all 90 are dogs, precision = 90 / 90 = 100% (even if 10 dog photos were never retrieved, precision stays at 100%; it only cares whether what was retrieved is correct), but recall is only 90%.

If the system returns 90 photos but 10 of them are cats, so only 80 are real dogs, that's bad on both counts: precision = 80 / 90 ≈ 89% (10 cats are mixed in), and recall is only 80%.

As shown in the following table.
Scenario

System returns content

Recall

Accuracy

what happened

S

Only 100 real dogs

100%

100%

All the real dogs have been found. All the real dogs have been found.

A

100 real dogs + 10 cats

100%

91%

Not a single dog was missed, but three of the four were cats.

B

90 real dogs (no cats)

90%

100%

Missed 10 dogs, but all were accurate

C

80 real dogs + 10 cats

80%

89%

Leakage and mixing, both indicators dropped

In other words, recall asks whether all the real dogs were caught, while precision asks whether everything caught is a real dog.
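To double-check the table, here is a minimal Python sketch of the same arithmetic. The scenario counts come from the example above; the helper function is just the standard definitions, not any particular library's API.

```python
def recall_precision(true_positives: int, false_negatives: int, false_positives: int):
    """Recall = TP / (TP + FN); Precision = TP / (TP + FP)."""
    recall = true_positives / (true_positives + false_negatives)
    precision = true_positives / (true_positives + false_positives)
    return recall, precision

# Scenario A: all 100 dogs found, 10 cats mixed in
print(recall_precision(100, 0, 10))   # (1.0, ~0.909) -> 100% recall, ~91% precision

# Scenario B: 90 dogs found, 10 missed, no cats
print(recall_precision(90, 10, 0))    # (0.9, 1.0) -> 90% recall, 100% precision

# Scenario C: 80 dogs found, 20 missed, 10 cats mixed in
print(recall_precision(80, 20, 10))   # (0.8, ~0.889) -> 80% recall, ~89% precision
```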
Obviously, the goal of the model is generally to continuously optimize itself so that both can be improved at the same time to reach the optimal state.
But in the real world, if model capability and other conditions stay fixed and you only adjust the confidence threshold, the two are, to a certain extent, in a trade-off. If you want to "leave no dog behind," you have to cast a wider net, and cats will inevitably sneak in; if you want "only real dogs," you have to tighten the mesh, and you may miss some borderline dogs.

| Threshold adjustment only | Recall | Precision |
| --- | --- | --- |
| Lower the threshold / widen the net | ↑ (fewer missed detections) | ↓ (more false positives) |
| Raise the threshold / tighten the net | ↓ (more missed detections) | ↑ (fewer false positives) |

These two indicators are like the two ends of a seesaw. When one side is high, the other side is low. We call this the seesaw effect . This is a common phenomenon in scenarios such as "search".
How to tune it ultimately depends on the business scenario. If you want to "miss nothing," lean toward recall; if you want "only real dogs," lean toward precision.
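To make the seesaw concrete, here is a small self-contained sketch. The confidence scores below are invented for illustration; only the threshold is varied.

```python
# (model confidence score, whether the photo is really a dog)
scored_photos = [
    (0.95, True), (0.90, True), (0.85, False),  # a cat the model half-likes
    (0.80, True), (0.60, True), (0.55, False),
    (0.50, True), (0.30, True),                 # blurry / monkey-faced dogs
]
TOTAL_DOGS = sum(1 for _, is_dog in scored_photos if is_dog)

for threshold in (0.9, 0.7, 0.4):
    returned = [(s, d) for s, d in scored_photos if s >= threshold]
    tp = sum(1 for _, is_dog in returned if is_dog)
    recall = tp / TOTAL_DOGS
    precision = tp / len(returned) if returned else 1.0  # empty result: vacuously precise
    print(f"threshold={threshold}: recall={recall:.0%}, precision={precision:.0%}")

# Output shows the seesaw: lowering the threshold widens the net,
# so recall rises (33% -> 50% -> 83%) while precision falls (100% -> 75% -> 71%).
```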
So, let’s get back to the topic.
Imagine you were designing a product like iOS Photos or Google Photos. In the scenario where the user types text to search the album, given the seesaw effect, how would you choose?
If you haven't thought about this question, you can stop and think about it before reading on.
I'll state their actual choices first; then you decide who did well and who didn't.
iOS Photos generally prioritizes precision, while Google Photos prioritizes recall. "Prioritizes" means pushing that metric higher without letting the other drop too far.
You can play the judge and decide who does better.
Let me say it plainly: Google is right here, and Apple is doing a poor job.

Why? Because in the search scenario, what users need most is to find the photos. Even if a few cats and stuffed toys are mixed in, there is no significant loss: it doesn't stop you from looking at or sharing your dogs, and if you're making an album you can simply delete the non-dog parts.

But if some dogs can't be found, that is hard to accept. Did you not love them enough to take a photo? Is your memory wrong, and you never took one at all? You may even suspect you deleted the photos by mistake, because most users don't know the design logic behind the product; they will assume the problem lies with themselves.

Yes, Apple may have continued its usual design style here, “don’t make mistakes”, while Google continued its search style, “find more”.
But there is a hidden question behind this: how much should we design on the user's behalf? As producers of tools and content, as product developers and interaction designers, are we necessarily right? Should we make every decision for users, deciding everything on their behalf?
Going further, the implicit question is how producers should design better interactions while leaving room for users, rather than assuming users understand less than they do.
This problem is even more prevalent in today's AI product and agent design.
Which decisions should be handed to the AI, and which to the user? How should we interact with users, and how can that interaction be more elegant?
As AI becomes more powerful, product builders are gradually realizing "Less Structure, More Intelligence." But do we need to pretend that AI is omnipotent and knows you better than you know yourself? If not, then asking users to confirm their needs, state their intentions, and state their constraints is not an ugly design. Visual ugliness or elegance can be designed around, but pretending AI is omnipotent is arrogance. Many excellent shipped products have already improved on this, but it can be better.
Back to the photo album example: given that the seesaw effect exists, is achieving what Google Photos does good enough?

What is a better design?

Consider the specific scenario of typing short text into your phone's photo album to search for pictures. Here is a better design, in a few short points; it is simple enough that we won't bother having AI generate an interactive demo. (A rough code sketch follows the list.)

1. After the user enters text, the first screen (or first two screens) of results uses a precision-first strategy: make sure that what is shown is, as far as possible, real dogs.

2. From the second (or third) screen onward, gradually shift to a recall-first strategy. When precision drops to a level that clearly hurts the experience, show a prompt asking, "Expand the search to find as many dog photos as possible?" This prompt can even be triggered when the user scrolls down too quickly.

3. Going a step further, when there are too few results, prompt the user to add more search information, widen the search scope, or search for something else. These touches can also be designed to feel very "human."

4. Also, as I have proposed before, a good AI product should actively build the three elements of "Profile, Preference, Context": understand the user's profile, preferences, and context, and design good feedback signals to collect, so the service becomes smarter, more personalized, more situational, and more humane. There is no end to improving user experience; see the series of articles on #AI product design for details.
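Here is a rough, hypothetical sketch of points 1 and 2 above. The thresholds, page size, and Photo type are invented for illustration, and match_score stands in for whatever confidence the underlying model would provide; this is not any real Photos API.

```python
from dataclasses import dataclass

@dataclass
class Photo:
    path: str
    match_score: float  # hypothetical: model confidence that this photo matches the query

PAGE_SIZE = 24
PRECISION_FIRST_THRESHOLD = 0.8  # first screen(s): only confident matches
RECALL_FIRST_THRESHOLD = 0.4     # later screens: cast a wider net

def results_page(photos: list[Photo], page: int) -> list[Photo]:
    """Return one screen of results, loosening the threshold on later pages."""
    threshold = PRECISION_FIRST_THRESHOLD if page == 0 else RECALL_FIRST_THRESHOLD
    matches = sorted(
        (p for p in photos if p.match_score >= threshold),
        key=lambda p: p.match_score,
        reverse=True,
    )
    # Simplified paging: a real implementation would dedupe against photos
    # already shown on earlier, stricter-threshold screens.
    return matches[page * PAGE_SIZE : (page + 1) * PAGE_SIZE]

def should_offer_expansion(page: int, page_results: list[Photo]) -> bool:
    """Ask "expand the search?" once loose-threshold results start running thin."""
    return page >= 1 and len(page_results) < PAGE_SIZE
```

The design choice here is that the seesaw is resolved over time rather than all at once: the user gets precision when they glance and recall when they dig.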

Yes, interacting with users does not make your product look stupid; it makes it elegant. But this interaction is not like certain product designs:
“Why don’t I just make a control bar for recall and precision and let the user adjust it themselves?”
“Wouldn’t it be better if I could give users a choice between prioritizing precision or recall?”
How is that different from ChatGPT putting a pile of models in the upper-left corner and making users choose?
How is that different from ChatGPT making users pick a model when they retry?
Note: the strategic orientations of Google Photos and iOS Photos described above are based on my experience from a few years ago. It is worth re-checking whether they have since been optimized.

Why real product innovation cannot rely on cross-departmental collaboration but only on centralized creation, and why AI product design must be driven by centralization and organizational innovation

So let's go back to the beginning and answer the question indirectly through the example. Designing the product experience and flow, the interaction, the metric trade-offs, and the solution requires fast closed-loop, fast-iterating, non-linear collaboration; a traditional, overly specialized division-of-labor process hinders the emergence of intelligence. A highly polished user experience is not the job of any single role, whether product manager, UX designer, UI designer, user researcher, or engineer. Worse, under the traditional division of labor, each sticks to their own duties and lacks frequent, deep, mutually trusting communication and co-creation.
The ultimate user experience is never a solo performance by one role; it is an improvised ensemble of many. To keep the jam session flowing, you must bring people together and tear down the walls, and use real OKRs instead of KPIs disguised as OKRs.
Don't treat means as ends, and don't treat skills as professions. (from "AI Helps You Win")

Postscript & Easter Eggs

This article complained about ChatGPT's product design. One likely response: "OpenAI focuses on underlying model capability and is going for AGI; product design doesn't matter, so there's no need to waste resources on it." I'll post a link to the album-design interview question I discussed with Yuanbao; it answered quickly and in the right direction. As long as the question is right, the answer won't be far off. You can see it here.
Of course, driven by multimodal large models, the experience no longer has to rely on the solution given in "What is a better design?", because that is still a preset interaction; it can be smarter. But the design of the whole system still involves human choices.
Yesterday I had dinner with Wenfeng and Mianmian, and we talked about AI/agent product design; this article is a record of that conversation. We also touched on another topic: strategically, avoid the false and attack the real; tactically, avoid the real and attack the false. That is just my opinion, and I may expand on it later.
In addition, I updated a Shortcuts command from my earlier article "I Taught Apple How to Be an Agent." It is genuinely useful, which is why I updated it: double-tap the back of your phone from any interface (such as WeChat) and it automatically creates a calendar event for you, after which iOS calculates your departure time in real time. Ji Gang said that because of this command, he has switched back to the native iOS calendar. As long as you lean on the native ecosystem, the experience will be great. Next, let's see whether the AI features Apple releases play out according to the script I wrote.