Why do large models struggle to reason their way through a junior high school math problem?

Where large models fall short in mathematical reasoning: the challenge of a first-year junior high school math problem.
Core content:
1. Why large models still perform unsatisfactorily in mathematical reasoning
2. A detailed analysis of a first-year junior high school math problem and how to approach it
3. A comparison of how different large models solved it, with a summary of their errors
In recent years, the progress of large models in mathematical reasoning has attracted attention, and the major vendors all claim that their models lead in mathematical ability. However, when I tested them on a junior high school math problem, their performance was uneven and at times surprisingly clumsy.
The problem
The following problem comes from the winter vacation homework for the first year of junior high school. Try it yourself first and see whether you can solve it.
Problem statement: In a laboratory, three cylindrical containers A, B, and C stand on a horizontal table (the containers are tall enough), and the ratio of their base radii is 1:2:1. Two identical tubes connect the containers at a height of 5 cm (that is, the bottom of each tube is 5 cm above the bottom of the containers). At the start, only container A holds water, and its water level is 1 cm, as shown in the figure. If equal amounts of water are poured into B and C simultaneously each minute, and the water level of B rises by 5/6 cm after 1 minute of filling, after how many minutes of filling will the water levels of A and B differ by 0.5 cm?
You probably have your own ideas by now. On closer inspection, the main topics here are the area and volume calculations for circles and cylinders and linear equations in one variable; the additional topic is the principle of connected vessels. With a general understanding of these topics, or just common sense about them, the problem should be fairly easy. Even if you cannot cover every case, you should be able to find at least one or two of the answers.
The answer is at the end of the article.
After working out the solution myself, I gave the problem to several large models.
How the models solved it
For this test I used the models from my previous article, "Use one sentence to understand the basics of these large models". The prompt is fairly simple, as shown below:
Please solve the following junior high school math problem and give a detailed analysis. The problem is as follows:
----
Problem statement: In a laboratory, three cylindrical containers A, B, and C stand on a horizontal table (the containers are tall enough), and the ratio of their base radii is 1:2:1. Two identical tubes connect the containers at a height of 5 cm (that is, the bottom of each tube is 5 cm above the bottom of the containers). At the start, only container A holds water, and its water level is 1 cm. If equal amounts of water are poured into B and C simultaneously each minute, and the water level of B rises by 5/6 cm after 1 minute of filling, after how many minutes of filling will the water levels of A and B differ by 0.5 cm?
----
To keep the models from retrieving the original problem from the Internet, I turned off online search in every chat interface that offers that option.
Because most of the models produce very long reasoning, fitting it into the article as long or segmented screenshots is impractical, so I simply show it as screen recordings.
DeepSeek
I have been using the official chat recently, and it feels like a return to single-turn question and answer: more than 80% of the time the first question goes through, but a follow-up question gets a "server is busy" response.
Tongyi Qianwen
ChatGPT
iFlytek Spark
Tongyi Qianwen
Comparison
Judging by the results, DeepSeek got one answer right and another wrong, while the other models failed almost entirely.
When I then gave the models the correct answer and asked them to analyze and compare it with their own, ChatGPT only did a brief analysis and did not recalculate the problem, which was closer to what my prompt asked for; iFlytek Spark and Kimi 1.5 reanalyzed the whole process and arrived at the correct answer. Tongyi Qianwen's answer was the same as DeepSeek's first answer, and DeepSeek itself kept returning "server is busy" again and again.
Some thoughts
The cause may lie in the problem itself, in the fact that the calculation involves a physical phenomenon, or in a lack of junior high school problems in these models' training data; whatever the reason, the result was lengthy reasoning, calculation errors, and a flawed understanding of the physics. This does not mean that the mathematical ability of large models is worthless, but it does mean that they still need further optimization for specific scenarios.
For AI researchers, the open question is how to make large models more accurate and efficient in mathematical reasoning. DeepSeek's surge in popularity at the beginning of the year opened up more business possibilities and brought a lot of technical innovation, but once we set aside the overused benchmark datasets that models are tuned on to climb the leaderboards, true reinforcement learning and autonomous reasoning may still have a long way to go.
Answer to the original problem
The three cylindrical containers A, B, and C (tall enough) have base radii in the ratio 1:2:1, so their base areas are in the ratio 1:4:1. B and C receive the same amount of water each minute, and the water level in B rises by 5/6 cm after one minute of filling. Therefore, in one minute the water level in C rises by:
5/6 × 2^2 = 10/3 cm
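The calculation above can be checked with exact fractions. The following is a minimal Python sketch, not part of the original answer: the names RADIUS_RATIO, B_RISE_PER_MIN, TUBE_HEIGHT, and rise_rate are made up for illustration, and it assumes a container receives only its own inflow stream.

```python
from fractions import Fraction

# Values taken from the problem statement.
RADIUS_RATIO = {"A": 1, "B": 2, "C": 1}  # base radius ratio A : B : C
B_RISE_PER_MIN = Fraction(5, 6)          # cm per minute in container B
TUBE_HEIGHT = Fraction(5)                # cm, height of the tubes


def rise_rate(container: str) -> Fraction:
    """Rise in cm/min when `container` receives one inflow stream alone."""
    # Equal volume per minute; base area scales with the radius squared,
    # so the rise rate scales with (r_B / r_container) ** 2.
    area_ratio = Fraction(RADIUS_RATIO["B"], RADIUS_RATIO[container]) ** 2
    return B_RISE_PER_MIN * area_ratio


c_rate = rise_rate("C")
print(c_rate)                # 10/3
print(TUBE_HEIGHT / c_rate)  # 3/2
```

The second value is only the time for C, starting empty and filled by its own stream, to reach the tube height of 5 cm. What happens after that depends on how the containers are connected in the figure.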
Suppose that after t minutes of filling, the water levels of A and B differ by 0.5 cm. There are three cases: