Google has released Gemma 3, an open source model that beats DeepSeek-V3/R1 on deployment efficiency. In the score ranking below, the dots represent estimated NVIDIA H100 GPU requirements: Gemma 3 27B ranks higher yet needs only one GPU (a TPU also works), while the DeepSeek-V3/R1 models each require 32.

Here are its eight highlights:

1. Built on Gemini 2.0 technology.
2. A full family of sizes: 1B, 4B, 12B, and 27B.
3. “The world’s best single-chip model” (a single GPU or TPU); a rough memory estimate follows this list.
4. The best non-reasoning open source model: on LMArena it outperforms Llama-405B, DeepSeek-V3, and o3-mini, second only to DeepSeek-R1.
5. Multimodal: the 4B and larger models have "advanced text and visual reasoning capabilities" and can "analyze images, text, and short videos."
6. A 128K-token context window, out of the box.
7. Supports more than 35 languages, with pretraining covering more than 140.
8. No GPUs required: the 27B model was trained on 6,144 TPUv5p chips.
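To put the single-accelerator claim in perspective, here is a rough weight-memory estimate. The bytes-per-parameter figures are generic precision assumptions rather than numbers from Google or DeepSeek, and KV cache, activations, and runtime overhead are ignored.

```python
# Back-of-the-envelope weight memory for a dense 27B model versus a 671B-parameter MoE,
# illustrating why one ~80 GB accelerator can host Gemma 3 27B while DeepSeek-V3/R1
# needs a multi-GPU cluster. Estimates only: the precision choices are assumptions.

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory (GB) needed just to hold the model weights."""
    return params_billions * bytes_per_param  # 1e9 params * bytes / 1e9 bytes-per-GB

models = {"Gemma 3 27B (dense)": 27, "DeepSeek-V3/R1 (671B total params)": 671}
precisions = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

for name, size_b in models.items():
    estimates = ", ".join(
        f"{label}: {weight_memory_gb(size_b, nbytes):.0f} GB"
        for label, nbytes in precisions.items()
    )
    print(f"{name} -> {estimates}")

# Gemma 3 27B fits on a single 80 GB H100 even in bf16 (~54 GB), while the 671B total
# parameter count alone explains the multi-GPU requirement for DeepSeek-V3/R1.
```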
The entire training run used no GPU cards. On training cost, the technical report says each model configuration was optimized to minimize training-step time. For the vision encoder, the embeddings of each image are precomputed and fed directly into training, so the vision tower adds nothing to the language model's training cost (a sketch of the idea follows below). The report does not, however, state a specific training-cost figure.

Gemma 3 27B is a dense model that beat DeepSeek-V3 (671B total / 37B active parameters) to become the No. 1 non-reasoning open source model, though it still trails the reasoning model DeepSeek-R1. It also cracks the top ten when pitted against the cutting-edge closed-source models.

(Evaluation of the Gemma 3 27B IT model in Chatbot Arena (Chiang et al., 2024). All models were evaluated blind, in side-by-side comparisons scored by human raters, and each model's score is based on the Elo rating system. The Gemma-3-27B-IT numbers are preliminary results received on March 8, 2025.)
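To make the precomputed-embedding point concrete, here is a minimal sketch of the idea. It is not Gemma's actual training code; `vision_encoder`, `language_model`, and the `image_embeddings` keyword are hypothetical stand-ins for whatever the real pipeline uses.

```python
# Illustrative sketch: run a frozen vision encoder once, offline, and cache its outputs,
# so the language-model training step never invokes the vision tower.
import torch


@torch.no_grad()
def precompute_image_embeddings(vision_encoder, image_batches):
    """Run the frozen vision encoder once and cache the resulting embeddings."""
    vision_encoder.eval()
    cached = [vision_encoder(pixels).cpu() for pixels in image_batches]
    return torch.cat(cached)  # in practice these would be written to disk and reloaded


def lm_training_step(language_model, optimizer, text_tokens, cached_embeddings):
    """One language-model step that consumes cached image embeddings directly.

    The vision encoder is never called here, so the per-step cost matches
    text-only training.
    """
    outputs = language_model(input_ids=text_tokens, image_embeddings=cached_embeddings)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```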
On training data: the 27B model was trained on 14 trillion tokens, the 12B on 12 trillion, the 4B on 4 trillion, and the 1B on 2 trillion. The token budgets were increased to accommodate the mix of image and text data used during pretraining. The share of multilingual data was also raised to improve language coverage: both monolingual and parallel data were added, and a strategy inspired by Chung et al. (2023) is used to handle the uneven distribution of data across languages.

Google's open source approach has been clear from the start: release the small open source Gemma models alongside the proprietary frontier Gemini models, with Gemma aimed at on-device deployment on Android. This release continues that approach, and Gemma 3 is now the open source model best suited to on-device deployment.

So far, Google's closed-source models still beat DeepSeek-V3/R1 on both API pricing and deployment efficiency.

Now we wait for DeepSeek-R2 to arrive, the sooner the better.