Deep integration of CAMEL-AI and OceanBase vector database

Written by
Silas Grey
Updated on: June 27, 2025
Recommendation

CAMEL-AI and OceanBase join forces to create a new intelligent ecosystem based on large language models.

Core content:
1. The flexibility and scalability of the CAMEL AI framework
2. CAMEL's RAG technology and its search enhancement capabilities
3. The technical advantages of OceanBase as a vector database


1. Introduction to CAMEL AI

CAMEL (Communicative Agents for Mind Exploration of Large Language Model Society) is the earliest multi-agent framework based on large language models (LLMs). It has now developed into a general framework for building and using LLM-based agents to solve real-world tasks. The CAMEL team believes that studying these agents on a large scale can provide valuable insights into understanding their behaviors, capabilities, and potential risks. To facilitate research in this area, CAMEL implements and supports various types of agents, tasks, prompts, models, and simulation environments.

The core advantage of the CAMEL framework lies in its flexibility and scalability, supporting multiple types of agent interaction modes:

  1. Single-agent system: a single agent that independently completes a specific task
  2. Multi-agent collaboration: multiple agents collaborate and interact to jointly solve complex problems
  3. Role-playing: agents play specific roles to simulate real-world interaction scenarios

As an open source framework, CAMEL provides a rich set of tools and components that enable researchers and developers to easily build, test, and deploy LLM-based intelligent agent applications.
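As a minimal sketch of the single-agent mode, the snippet below creates one agent and asks it a question. It assumes a recent camel-ai release in which ChatAgent and its step method accept plain strings; older versions may require BaseMessage objects, so treat the exact call signatures as an assumption rather than the definitive API.

# Minimal single-agent sketch. Assumes a recent camel-ai release where ChatAgent
# and step() accept plain strings; older releases may require BaseMessage objects.
from camel.agents import ChatAgent

agent = ChatAgent(system_message="You are a helpful research assistant.")
response = agent.step("Briefly explain what a multi-agent framework is.")
print(response.msgs[0].content)  # the agent's reply text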

2. CAMEL's RAG and Graph RAG Capabilities

The CAMEL framework has built-in, powerful Retrieval-Augmented Generation (RAG) capabilities, a crucial component of current LLM applications. RAG allows the model to retrieve and reference external knowledge when generating answers, improving their accuracy and relevance. CAMEL's RAG implementation mainly includes:

1. Support for multiple retrieval methods

CAMEL supports multiple retrieval methods, including:

  • Vector retrieval: a retrieval method based on semantic similarity
  • BM25 retrieval: a traditional retrieval method based on keyword matching
  • Hybrid retrieval: combines the advantages of vector retrieval and keyword retrieval (a toy scoring sketch follows this list)
  • Rerank retrieval: reranks the initial search results to improve relevance
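To make the hybrid idea concrete, the toy sketch below blends a cosine-similarity score with a simple keyword-overlap score. It is a conceptual illustration in plain Python, not CAMEL's internal implementation, and the weighting factor alpha is an arbitrary illustrative choice.

# Toy illustration of hybrid retrieval: blend a vector-similarity score with a
# keyword-overlap score. Conceptual sketch only, not CAMEL's implementation.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, doc):
    q_terms, d_terms = set(query.lower().split()), set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def hybrid_score(query_vec, doc_vec, query_text, doc_text, alpha=0.7):
    # alpha weights semantic similarity against keyword overlap
    return alpha * cosine(query_vec, doc_vec) + (1 - alpha) * keyword_score(query_text, doc_text)

# Example: score a single candidate document against a query
print(hybrid_score([0.1, 0.9], [0.2, 0.8], "vector database", "OceanBase vector database"))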

2. AutoRetriever

CAMEL's AutoRetriever component can automatically select the most suitable retrieval method and handle document parsing, segmentation, and embedding, greatly simplifying the development of RAG applications. Users only need to provide the query and content, and the system completes the rest automatically.
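A minimal usage sketch, assuming the AutoRetriever interface found in recent camel-ai releases (parameter names such as contents and the available StorageType values may differ between versions):

# Minimal AutoRetriever sketch; parameter names may differ between camel-ai versions.
from camel.retrievers import AutoRetriever
from camel.types import StorageType

auto_retriever = AutoRetriever(
    vector_storage_local_path="local_data/",   # where the default local store keeps its data
    storage_type=StorageType.QDRANT,           # any supported backend can be selected here
)

# Provide a query and the content sources; the retriever handles parsing,
# chunking, embedding, and retrieval automatically.
retrieved_info = auto_retriever.run_vector_retriever(
    query="What is CAMEL-AI?",
    contents=["https://www.camel-ai.org/"],
    top_k=1,
)
print(retrieved_info)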

3. Graph RAG Capabilities

CAMEL also implements graph-based RAG, an important extension of traditional RAG:

  • Knowledge graph integration: combines structured knowledge graphs with unstructured text
  • Relational retrieval: considers not only content similarity but also the relationships between entities
  • Automatic knowledge graph construction: uses an agent to automatically extract entities and relationships and build a knowledge graph (see the sketch after this list)
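As a hedged sketch of the automatic construction step, the snippet below uses KnowledgeGraphAgent and UnstructuredIO as they appear in recent camel-ai releases; class and method names may differ in older versions.

# Hedged sketch of automatic knowledge-graph extraction with CAMEL.
# Class/method names follow recent camel-ai releases and may differ in older versions.
from camel.agents import KnowledgeGraphAgent
from camel.loaders import UnstructuredIO

uio = UnstructuredIO()
element = uio.create_element_from_text(
    text="CAMEL-AI integrates OceanBase as a vector database for its RAG pipeline."
)

kg_agent = KnowledgeGraphAgent()
# parse_graph_elements=True asks the agent to return structured nodes and relationships
graph_elements = kg_agent.run(element, parse_graph_elements=True)
print(graph_elements)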

These RAG capabilities enable CAMEL to handle more complex knowledge retrieval and reasoning tasks, providing richer and more accurate information support for intelligent agents.

3. Why choose OceanBase as the vector database for CAMEL-AI?

When implementing an efficient RAG system, the choice of vector database is crucial. The CAMEL framework chooses to support OceanBase as the vector database, mainly based on the following technical advantages:

1. Excellent high-dimensional vector processing capabilities

OceanBase's vector index supports up to 4096-dimensional vectors by default, covering the needs of most mainstream embedding models on the market. More importantly, this upper limit is configurable and extensible, so researchers can safely choose higher-dimensional models for better results instead of sacrificing accuracy through dimensionality reduction forced by database limits.

In the CAMEL implementation, the OceanBaseStorage class fully exploits this advantage, allowing users to configure vector storage flexibly according to the output dimension of their embedding model.

2. Native hybrid search: both precision and efficiency

One of OceanBase's killer features is that its vector index natively supports hybrid retrieval. In the CAMEL implementation, users can perform precise scalar filtering and efficient vector similarity search at the same time in a single query:

results = self._client.ann_search(
    table_name=self.table_name,
    vec_data=query.query_vector,
    vec_column_name="embedding",
    distance_func=distance_func,
    with_dist=True,
    topk=query.top_k,
    output_column_names=["id", "embedding", "metadata"],
)

The advantages of this hybrid search are obvious:

  • Precise: the filtering scope is defined before the search, ensuring you find exactly what you want and avoiding missed results
  • Efficient: filtering is handled directly at the index layer, avoiding the overhead of secondary filtering at the application layer and speeding up queries
  • Simple: no need to write complex SQL statements; the API is concise and clear
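As a hedged sketch of combining a scalar filter with the vector search in a single call, the snippet below extends the query above. It assumes that the underlying pyobvector client's ann_search accepts a where_clause argument taking SQLAlchemy conditions; both the argument and the filter column used here are illustrative assumptions and should be checked against the pyobvector documentation.

# Hedged sketch: scalar filtering combined with vector search in one call.
# Assumes pyobvector's ann_search accepts a where_clause of SQLAlchemy conditions;
# the filter below is purely illustrative.
from sqlalchemy import text

results = self._client.ann_search(
    table_name=self.table_name,
    vec_data=query.query_vector,
    vec_column_name="embedding",
    distance_func=distance_func,
    with_dist=True,
    topk=query.top_k,
    output_column_names=["id", "embedding", "metadata"],
    where_clause=[text("id LIKE 'doc_%'")],  # hypothetical scalar filter
)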

3. Intelligent space reclamation mechanism

OceanBase is built on an LSM-Tree-based storage architecture, which takes a distinctive approach to handling data inserts, deletions, and space reclamation. Its reclamation mechanism is more complete and automated, making it friendlier to data types such as vectors, which are large and may be updated frequently.

In the CAMEL implementation, users hardly need to worry about space reclamation: OceanBase handles it smoothly and efficiently in the background, curbing database bloat and greatly reducing the operations and maintenance burden.

4. Innate advantages of distributed architecture

OceanBase is a natively distributed database, with inherent advantages in horizontal scalability and high availability under high concurrency and large data volumes. This gives frameworks such as CAMEL, which need to process large amounts of vector data, room to scale in the future without worrying about performance bottlenecks as data grows.

4. Implementation and Effect of OceanBase in CAMEL

The CAMEL framework fully implements support for the OceanBase vector database, mainly through the OceanBaseStorage class, which provides comprehensive vector storage and retrieval capabilities.

1. Complete vector operation support

The OceanBaseStorage class implements all methods of the BaseVectorStorage interface, providing complete vector operation support (a short sketch follows the list below):

  • Vector addition: supports batch insertion of vector records, automatically handling IDs and metadata
  • Vector deletion: supports deleting vector records by ID, both numeric and non-numeric
  • Vector query: supports similarity-based vector queries with rich query options
  • Status query: supports querying the state of the vector store, including vector dimension and count
  • Clear operation: supports clearing all records from the vector store
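A hedged sketch exercising these operations is shown below; the VectorRecord and VectorDBQuery types and the import paths follow recent camel-ai releases and may vary slightly between versions, and the connection values are placeholders.

# Hedged sketch of the basic vector operations; import paths follow recent
# camel-ai releases and may differ slightly between versions.
from camel.storages import OceanBaseStorage, VectorDBQuery, VectorRecord

# Connection values below are placeholders for illustration
storage = OceanBaseStorage(
    vector_dim=4,
    table_name="camel_demo",
    uri="oceanbase-host:2881",
    user="root@test",
    password="password",
    db_name="test",
)

# Add: batch-insert vector records; IDs and metadata payloads are handled automatically
records = [
    VectorRecord(vector=[0.1, 0.2, 0.3, 0.4], payload={"text": "hello"}),
    VectorRecord(vector=[0.4, 0.3, 0.2, 0.1], payload={"text": "world"}),
]
storage.add(records)

# Query: similarity search returning the top_k closest records
results = storage.query(VectorDBQuery(query_vector=[0.1, 0.2, 0.3, 0.4], top_k=1))

# Status: inspect the stored vector dimension and record count
status = storage.status()
print(status.vector_dim, status.vector_count)

# Delete by ID, then clear all records
storage.delete(ids=[records[0].id])
storage.clear()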

2. Seamless integration with CAMEL search system

OceanBase storage is seamlessly integrated with CAMEL's retrieval system. Through VectorRetriever and AutoRetriever, users can easily use OceanBase for document retrieval:

# Imports and the embedding model below follow recent camel-ai releases;
# exact module paths may vary between versions
from camel.embeddings import OpenAIEmbedding
from camel.retrievers import VectorRetriever
from camel.storages import OceanBaseStorage

# Embedding model used to vectorize documents and queries
embedding_model = OpenAIEmbedding()

# Use OceanBase as vector storage
storage = OceanBaseStorage(
    vector_dim=embedding_model.get_output_dim(),
    table_name="camel_documents",
    uri="oceanbase-host:2881",
    user="root@test",
    password="password",
    db_name="test",
)

# Create a retriever backed by the OceanBase storage
retriever = VectorRetriever(
    embedding_model=embedding_model,
    storage=storage,
)

# Process documents: parse, chunk, embed, and store them
document_path = "path/to/document.pdf"  # illustrative local path
retriever.process(content=document_path)

# Query related content
results = retriever.query(query="My Query", top_k=5)

3. Results in practical applications

In practical applications, OceanBase, as a vector database for CAMEL, has demonstrated excellent performance and stability:

  • Query performance: excellent query speed on large-scale vector collections, especially in hybrid query scenarios
  • Storage efficiency: thanks to OceanBase's storage architecture, vector data is stored more compactly with higher space utilization
  • Easy operation and maintenance: automated space reclamation and management reduce the O&M burden
  • Good scalability: as data volume grows, OceanBase scales smoothly while maintaining stable performance

5. Future Prospects and Cooperation Directions

The cooperation between CAMEL-AI and OceanBase has just begun, and there is still broad room for development in the future:

1. Multi-modal RAG support

Expand the application of OceanBase in CAMEL to support storage and retrieval of multimodal data:

  • Image vector storage: store and retrieve feature vectors of images
  • Audio vector storage: support vectorization and retrieval of audio content
  • Cross-modal retrieval: enable cross-modal retrieval such as text-to-image and image-to-text

2. Expansion of enterprise-level application scenarios

For enterprise-level application scenarios, the integration of CAMEL and OceanBase will be further optimized:

  • Private deployment solution: provide a complete private deployment solution to meet data security requirements
  • Industry-specific solutions: develop specialized solutions for specific industries such as finance, healthcare, and law
  • Large-scale multi-agent systems: build large-scale multi-agent knowledge-sharing systems based on OceanBase

3. Performance optimization and expansion

Continue to optimize the integration performance of CAMEL and OceanBase:

  • Query optimization : optimize specific query patterns to improve retrieval efficiency
  • Batch processing optimization : Optimize the processing flow of large batches of data
  • New feature support : timely support for new features and functions of OceanBase

Summary

As a powerful multi-agent framework, CAMEL provides solid technical support for its RAG and Graph RAG capabilities by integrating the OceanBase vector database. With its high-dimensional vector processing, native hybrid retrieval, intelligent space reclamation, and distributed architecture, OceanBase has become an ideal choice for CAMEL to achieve efficient knowledge retrieval.

CAMEL now fully supports OceanBase and has demonstrated excellent performance and stability in real applications. Going forward, the cooperation between the two parties will deepen further, with broader exploration in Graph RAG, multimodal RAG, and enterprise-level applications, providing more powerful technical support for AI applications.