On the future of attack and defense in LLM-generated text detection

An in-depth discussion of the challenges and future trends in detecting LLM-generated text.
Core content:
1. Quality improvements in LLM-generated text and the recognition of AI traces
2. The current state and challenges of detection technology
3. The relationship between text-quality evaluation and detection technology
This is not a technical article; it does not discuss specific technical solutions for the current attack and defense of LLM text detection.
0. Introduction
I recently tried the AIGC detection capability of VIP (the service my school uses to check graduation theses), and found that a workflow I currently use, one I believe produces high-quality text, is often judged as AI-generated. Of course, this shows the detector does its job well.
But in my view the quality of the manuscripts I generate is actually very high, with none of the obvious AI traces left by low-end models. They are probably flagged simply because their wording leans toward the "average" wording style of academic writing, which makes them easy to identify.
This got me thinking about this question.
Of course, this article only discusses text. Images, voice, and video are separate issues.
1. The quality of LLM-generated manuscripts and traces of AI
Before 2025, content generated by LLMs still had some easily identifiable characteristics, the so-called AI flavor. For example, the R1 model was particularly fond of high-sounding terms such as "quantum mechanics." In my opinion, many of these problems stem from weak model capabilities or from issues in the post-training alignment stage.
But if you use an LLM to generate text seriously, the results can actually be very good. My current highest-quality workflow is to use o1 Pro to conceive the content first, then have GPT-4.5 do a round of writing, and finally have GPT-4.5 do a round of review and redundancy removal. I think this workflow works very well. Of course, if you deliberately look for tells you can still spot a few characteristics, or at least notice that its writing habits differ from mine. But for conveying information it is already very good. I recently reposted some full-text podcast transcripts, and the summary sections at the beginning were generated this way. In the actual generation process I gave manual instructions about which parts to extract; it is not a fully automatic process, but the text itself does come from this workflow. If you are interested, you can compare and try to tell the difference.
In addition, the full transcripts of the podcasts I reposted were also generated by an LLM workflow, and the model that wrote the final text was Claude 3.5 Sonnet. I chose it mainly to balance cost and quality: GPT-4.5 would read more human, but its API cost is higher for me, so I am sticking with this setup.
For example: Full interview with Google Labs VP Josh Woodward in Chinese
But even text generated by GPT-4.5 is still recognizable, which surprised me. After thinking it over, I believe the tell is still the "average" wording. This is an almost inevitable property of LLM-generated content; it varies somewhat with the prompt and the task, but it is hard to eliminate completely.
2. My view on detecting LLM-generated text
I personally think that no matter what kind of text it is, whether written or dictated by a person, generated directly by an LLM, or produced by a complex workflow, the way to evaluate it should be the quality of the text itself. Nowadays people rarely make serious grammatical errors or produce completely incomprehensible text, so the quality of a text now depends mainly on the content and information it expresses.
But in my opinion, current LLM text detection does not measure this. It detects whether the text matches "average" wording habits, because that feature is easier to identify, even though it has become somewhat difficult for humans to spot.
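To make the "average wording" point concrete, here is a minimal sketch of the perplexity-and-burstiness heuristic that many early detectors were built around. This is only my illustration under assumed tooling (Hugging Face transformers and GPT-2), not VIP's actual method: it scores how "unsurprising" a text looks to a public language model, since LLM output tends to sit uniformly close to the model's average predictions while human writing varies more.

```python
# Sketch of a perplexity/burstiness heuristic (illustrative only, not any
# platform's real detector). Requires: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Exponentiated token-level cross-entropy under GPT-2.
    # Lower values mean the text is closer to "average" wording.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(input_ids=enc.input_ids, labels=enc.input_ids)
    return torch.exp(out.loss).item()

def burstiness(sentences: list[str]) -> float:
    # Variance of per-sentence perplexity. Human writing tends to mix
    # surprising and unsurprising sentences; LLM output is more uniform.
    scores = [perplexity(s) for s in sentences if s.strip()]
    if len(scores) < 2:
        return 0.0
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)
```

A real detector is of course far more sophisticated, but low perplexity combined with low burstiness is exactly the "average wording" signature discussed here.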
To pass this kind of test, you actually need to add more personal features and some of the randomness of human wording. In my opinion this does not meaningfully improve the value of the text; it just adds a different flavor. It is like the difference between machine-produced sugar and handmade sugar: handmade sugar, like hand-written text, can command a higher price because of its higher production cost, as long as someone is willing to pay a premium for it. But I do not agree that the price of cheaply produced text should be suppressed institutionally. Even when the text is output by an LLM, the choice of what content to produce, what instructions to give, and which of several generated results to keep may all be done manually. Under such a detection scheme, the value of this manual work goes unrecognized, and I think it has a very large impact on the quality of the text. Differences in value should be based on the actual quality of the text's content, not on whether its wording carries certain personalized human characteristics.
I don’t know which side currently has the upper hand in this attack and defense, but I have no automated way to get past VIP’s detection method, and I don’t know whether the same method has been applied to other content platforms.
Assuming the defensive (detection) side currently holds the advantage, then as LLMs become more widely used, more places will adopt this easily available signal as a major criterion for judging content quality. In my opinion that is a step in the wrong direction. In this sense, I think it is important to maintain a balance of power between offense and defense; only then will society as a whole be forced to judge the quality of the content itself rather than wording habits.
If we can find a way to better model each person's wording habits and randomness and inject them into generated text, then over the next 2-3 years, as LLMs are applied more fully to content production, the market value here will be very large, and it will naturally grow with the whole AIGC market.
3. Some technical judgments
Assuming a team starts working on the offensive side of generated-text detection, that is, modifying text so that it passes an opponent's LLM-generation detector, then I expect they must be doing both offense and defense: to verify that an attack works, you need a strong detector of your own to test against. It is hard for me to imagine a team that only knows how to attack but is weak at defense.
Of course, there are some tricky ways to fool detectors, but I don't think these methods are sustainable. The long-term solution is to understand the difference between human-written text and the "average" wording.
From the platforms' perspective, their offense-and-defense teams are likely to be strong at defense but weak at offense. In the future they may also strengthen their capabilities by acquiring some of these offensive companies.
My current judgment about these individual differences is that one part is each person's habitual wording and expression, and the other part is some randomness in wording and conception. I think habitual wording and expression account for the larger share, and that this component drifts as a person ages.
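As a toy illustration of these two components, here is a sketch under my own assumptions: the "habit" part modeled as stable function-word usage rates (a classic stylometric fingerprint), and the "randomness" part as variance in sentence length. The feature set is invented for illustration and is not taken from any real stylometry system.

```python
# Toy decomposition of personal style into "habit" and "randomness"
# components (illustrative assumptions only).
import re
from collections import Counter

# A small, illustrative set of English function words; a person's stable
# usage rates of such words are a classic stylometric fingerprint.
FUNCTION_WORDS = {"the", "of", "and", "to", "in", "that", "it", "is", "was", "for"}

def style_profile(text: str) -> dict:
    tokens = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    total = max(len(tokens), 1)
    counts = Counter(tokens)
    lengths = [len(s.split()) for s in sentences] or [0]
    mean_len = sum(lengths) / len(lengths)
    var_len = sum((n - mean_len) ** 2 for n in lengths) / len(lengths)
    return {
        # Habitual component: relative frequencies of common function words.
        "function_word_rates": {w: counts[w] / total for w in sorted(FUNCTION_WORDS)},
        # Random component: how much sentence length fluctuates.
        "sentence_length_mean": mean_len,
        "sentence_length_variance": var_len,
    }
```

An "attack" in the sense above would then mean steering generated text toward a target person's profile, including its variance, not just toward the average.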
4. Text and non-text, and other digital assets
I think text and other modalities differ significantly in how they are used. Fabricating a seemingly real but non-existent video, photo, or audio clip can affect people deeply; that is determined by human nature.
But text does not actually carry that much of this kind of weight, and people do not trust text to the same degree. Whether a person writes a piece of text himself or obtains it by other means, as long as he has read the content and approves of it, publishing it should be taken as agreement with the content and a willingness to endorse it to some degree. This has no direct relationship to whether the text matches his personal writing habits.
For non-text modalities, I am in favor of adding AI-generated tags to content that is very close to real life, and not adding them to content that is obviously not a real photo or video.
But for content like text, simple data tables, structured content, and so on, I think the principle should be signing and taking responsibility: signing and publishing content is taken as endorsing its viewpoint. I am not inclined to praise a piece of text just because a real person wrote it when the text itself is rubbish, nor am I inclined to pan content just because it was generated by AI.
This discussion can be further extended to other digital assets, such as web pages, executable programs, and 3D models.
Do we want to bar a web page, a program, or an app from making a profit simply because it was produced by AI coding?
Should the value of a 3D model be capped at a very low level, through legal or policy means, just because it was not made step by step by humans?
I don't agree with this logic, nor do I expect such a world.
5. What is the value of personalization?
The value of digital assets should be determined by their use value, or by supply and demand, especially in a future where the supply of digital assets becomes more and more abundant. Some people will prefer to buy purely handmade products, but I don't think everyone can be forced to use handmade products or to pay more for them. Most people can simply use a good-enough product without restricting how it was produced.
Right now the focus of attack and defense is on identifying this average wording style, which grants an extra premium to certain personalized characteristics of individuals. Although I don't agree with this, it is an interesting phenomenon.
Human nature is to follow the crowd, yet now we are required to display individuality and uniqueness in order to show that the results we deliver were not generated by AI, much like the situation college graduates currently face. This is a bit absurd, but it may be a reality we have to live with, at least until offense and defense are evenly matched.
A. Endnote
If any readers want to work in this field, or want to start a business in it, and recognize its future value, please contact me.
(Most of this article was written using voice input, and I have tried my best to correct recognition errors. Because I was speaking rather than typing, it may read as more colloquial. Given the theme of this article, though, that should make it a text that reads less like LLM output.)