With So Many AI Models Out There, How Do You Choose? (Part 1)

Written by
Silas Grey
Updated on: June 15, 2025
Recommendation

Is it difficult to choose a large model? The OpenRouter rankings can help you.

Core content:
1. An introduction to OpenRouter, the large-model aggregation platform, and its rankings
2. How the rankings are based on each model's total prompt and completion tokens
3. An analysis of each model's usage share across 13 usage scenarios

 
Yang Fangxian
Founder of 53A / Tencent Cloud Most Valuable Expert (TVP)

This article sets out to answer one question: with so many large models on the market, how should we choose one for our own work scenarios?

 

Here are the OpenRouter rankings:

https://openrouter.ai/rankings

 

AI enthusiasts will be familiar with OpenRouter, an AI large-model aggregation platform that brings together top models from manufacturers such as OpenAI, Anthropic, Google, and DeepSeek.
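One reason the platform aggregates so much traffic is that it exposes an OpenAI-compatible chat-completions endpoint, so switching between models is largely a matter of changing the `model` field. A minimal sketch of building such a request body (the endpoint URL is OpenRouter's public one; the helper name and prompt are illustrative):

```python
import json

# OpenRouter's OpenAI-compatible chat endpoint (per its public docs).
API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> str:
    """Build the JSON body for an OpenAI-style chat completion call."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

# The same body shape works for any model aggregated on the platform.
print(build_request("openai/gpt-4o-mini", "Say hello in French."))
```

Sending the request additionally needs an `Authorization: Bearer <your-key>` header, omitted here.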

 

Its ranking is based on the sum of prompt and completion tokens for each model, normalized using the GPT-4 tokenizer for comparison.

 

What does that mean?

Prompt and completion tokens are the number of tokens in the prompt the user sends to the model plus the number of tokens the model generates in its response.

Token counts are calculated uniformly with the GPT-4 tokenizer, which provides a single standard for comparing models.
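As a rough sketch of the metric, here a toy whitespace tokenizer stands in for the GPT-4 tokenizer (in practice OpenRouter normalizes with the real tokenizer, available through OpenAI's tiktoken library as the `cl100k_base` encoding):

```python
def count_tokens(text: str) -> int:
    # Toy stand-in: split on whitespace. The real GPT-4 tokenizer
    # splits text far more finely, into subword pieces.
    return len(text.split())

def total_usage(prompt: str, completion: str) -> int:
    # The ranking metric: prompt tokens + completion tokens.
    return count_tokens(prompt) + count_tokens(completion)

print(total_usage("Explain recursion briefly.",
                  "Recursion is when a function calls itself."))  # 3 + 7 = 10
```

Normalizing every model's traffic with one tokenizer matters because different vendors' tokenizers would otherwise count the same text differently.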

 

The picture below shows the GPT-4 tokenizer: enter any text and the corresponding token count is displayed.

Website: https://platform.openai.com/tokenizer

 

The OpenRouter ranking statistics are updated every 10 minutes.

This frequent refresh helps keep the comparison fair and current.

 

Let's take a look at the ranking, which is divided into 13 usage scenarios.

The bar chart clearly shows the usage share of the different models.

Note that this is not a ranking of large-model performance, but a ranking of large-model token usage.

It shows which large models people prefer in each field, as well as how that usage changes over time.

 

For example, in the field of programming, the most commonly used model is not Claude-3.7-Sonnet, nor the newly crowned Gemini-2.5-Pro, but GPT-4o-mini. In actual usage, the most powerful model is not necessarily the one people reach for: price, ease of use, and fit for the scenario all matter.
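That trade-off can be sketched as a small selection helper. The prices (USD per 1M tokens) and capability scores below are purely hypothetical placeholders for illustration; check each provider's pricing page for real numbers:

```python
# Hypothetical prices and rough capability scores, for illustration only.
MODELS = {
    "gpt-4o-mini":       {"price": 0.15, "capability": 7},
    "claude-3.7-sonnet": {"price": 3.00, "capability": 9},
    "gemini-2.5-pro":    {"price": 1.25, "capability": 9},
}

def pick_model(max_price: float, min_capability: int):
    """Cheapest model under a price cap that meets a capability floor."""
    candidates = [
        (name, info["price"]) for name, info in MODELS.items()
        if info["price"] <= max_price and info["capability"] >= min_capability
    ]
    return min(candidates, key=lambda kv: kv[1])[0] if candidates else None

print(pick_model(max_price=1.0, min_capability=7))
```

Under these toy numbers, a tight budget selects GPT-4o-mini even though stronger models exist, which mirrors the usage pattern the rankings show.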

 

In programming, usage of the mainstream large models is fairly evenly split, but in the fields of technology and science GPT-4o-mini dominates outright, accounting for 88.5% of usage.

If you are a scientific researcher, it is worth trying GPT-4o-mini first to see how it performs, and then comparing other large models.

 

In the frequently used translation scenario, Gemini-1.5-Flash-8B is the most used. Over the past two months, usage of Gemini-2.0-Flash has also grown, reaching about 40%.

 

With so many large models available today, choosing one suited to your own work scenario is more cost-effective than simply choosing the strongest performer.

 

The OpenRouter LLM Rankings are a good reference point when choosing a large model.