Before AI is implemented, you need to invest in several databases

Written by
Jasper Cole
Updated on: June 28, 2025
Recommendation

Learn the key steps to take before deploying AI, and why investing in databases matters.

Core content:
1. Why database investment is necessary before AI can be deployed
2. Case analysis of OpenAI's database investments
3. Database technology routes and what domestic large-model companies are doing

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)

Before AI can be deployed in the enterprise, you need to invest in several databases. Wherever the data is, AI will follow. An enterprise's core data lives in its databases, so the ability to connect to those databases has to be in place, and optimized, before AI is deployed. If large-model vendors want a big slice of the enterprise market, they must first invest in several databases.


1. Why OpenAI invested in databases


In April 2025, Kevin Weil, Chief Product Officer of OpenAI, participated in the latest funding round of database developer Supabase Inc., which valued the company at US$2 billion. Supabase can be understood as a cloud-hosted "plus" version of PostgreSQL.


On June 21, 2024, OpenAI acquired Rockset, which is known for its real-time analytics and query capabilities. Its technical route combines structured database capabilities with vector retrieval.

Structurally, Supabase and Rockset can form a transactional database + analytical database combination. Through specific tools, interfaces, and protocols, AI can access, analyze, and even operate on the database, and so truly enter the enterprise workflow.
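As a rough illustration of what "accessing the database through a tool" can look like, here is a minimal sketch in Python. It is not how Supabase, Rockset, or OpenAI implement anything: the `orders` table, the `run_readonly_query` function, and the use of the standard-library `sqlite3` module as a stand-in database are all assumptions made for this example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("acme", 120.0), ("acme", 80.0), ("globex", 50.0)],
    )
    conn.commit()
    return conn


def run_readonly_query(conn, sql, params=()):
    """Hypothetical 'tool' an AI agent could call: SELECT statements only."""
    # Simplistic guard for illustration; a real deployment would rely on
    # database roles and permissions rather than a string check.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    # Return rows as dictionaries so the result is easy to serialize for a model.
    return [dict(zip(columns, row)) for row in cur.fetchall()]


conn = make_demo_db()
rows = run_readonly_query(
    conn,
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY customer",
)
print(rows)
```

The point of the sketch is the shape of the interface: the model never holds a raw connection, it only calls a narrow function whose inputs and outputs are plain data.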



2. The core of a database for AI is “real-time”


Back when big data platforms were at the height of their popularity, product routes that integrate OLTP and OLAP already existed. Why, then, does OpenAI keep investing in new database products? The key is "real-time".

Traditional big data platforms have long been criticized for not being real-time, which gave rise to a class of data products whose core selling point is being "ETL-free". When AI is applied in the enterprise, especially for inference tasks, data freshness is a key metric. Take online marketing and financial risk control, two fields where AI is already well established: if the data reaches the AI a day or even a few hours late, the customer is already gone and the risk has already materialized.
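The risk-control case can be made concrete with a toy sketch. Everything in it is hypothetical (the `Txn` record, the `LIMIT` threshold, the timestamps); it only shows how the same check gives different answers depending on how fresh the visible data is.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    ts: int        # seconds since the start of the day (hypothetical clock)
    account: str
    amount: float


LIMIT = 1000.0  # hypothetical per-day spending limit


def total_spent(txns, account, visible_until):
    """Sum the account's spending over transactions the system can already see."""
    return sum(
        t.amount for t in txns if t.account == account and t.ts <= visible_until
    )


txns = [
    Txn(ts=100, account="a1", amount=400.0),
    Txn(ts=200, account="a1", amount=500.0),
    Txn(ts=300, account="a1", amount=300.0),
]

now = 300
last_batch_load = 150  # the batch pipeline last ran at ts=150

# A batch-loaded view only sees data up to the last load.
batch_view = total_spent(txns, "a1", visible_until=last_batch_load)
# A real-time view sees everything up to the current moment.
realtime_view = total_spent(txns, "a1", visible_until=now)

print(batch_view > LIMIT)     # False: the stale view misses the breach
print(realtime_view > LIMIT)  # True: the fresh view catches it
```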

Based on the features that Supabase and Rockset are praised for, a DB for AI should have the following characteristics:

  • Core: real-time data responses and data consistency
  • Low-code tooling with no need to write SQL, which makes it easy to design interfaces for AI access
  • Cloud native, with high concurrency
  • Distributed, and even deployable at the edge, so that AI can process data in real time
  • An embedded, in-process operator design similar to DuckDB, which improves data analysis and processing capabilities
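To give a feel for the "no need to write SQL" point, here is a minimal sketch of an interface that accepts a structured request and generates the SQL itself. It is not Supabase's actual API: the `SCHEMA` allow-list, the `build_select` helper, and the `customers` table are assumptions, and `sqlite3` again stands in for a real database.

```python
import sqlite3

# Hypothetical allow-list: the tables and columns the generated interface exposes.
SCHEMA = {"customers": {"id", "name", "region"}}


def build_select(table, columns, filters):
    """Turn a structured request into a parameterized SELECT (equality filters only)."""
    if table not in SCHEMA:
        raise ValueError(f"unknown table: {table}")
    if not columns:
        raise ValueError("at least one column is required")
    unknown = (set(columns) | set(filters)) - SCHEMA[table]
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if filters:
        # Values are passed as parameters, never interpolated into the SQL text.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    return sql, list(filters.values())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customers (name, region) VALUES (?, ?)",
    [("Ann", "east"), ("Bob", "west"), ("Cy", "east")],
)

sql, params = build_select("customers", ["name"], {"region": "east"})
result = [row[0] for row in conn.execute(sql + " ORDER BY id", params)]
print(sql)
print(result)
```

A caller, human or AI, only supplies a table name, a column list, and filters; because names are checked against the allow-list and values are bound as parameters, the request cannot reach anything the interface does not expose.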

3. Are domestic large-model companies investing in databases?


As far as I know, no domestic large-model company has directly invested in database software. However, the domestic large-model landscape is dominated by the major cloud vendors, and each has its own approach:

  • Alibaba: Tongyi + PolarDB and other Alibaba Cloud database products
  • Baidu: Wenxin + Doris (now Apache Doris), etc.
  • Huawei: GaussDB, etc.
  • DeepSeek: although it has not directly invested in a database, it has open-sourced 3FS, which works directly at the file system layer and integrates DuckDB's embedded engine
  • Zhipu: cooperation with Milvus

4. Technical routes for DB for AI


  • HTAP, such as MySQL HeatWave
  • "Plus" versions of open source databases, such as Pigsty, one of the community distributions of PostgreSQL