Dify v1.1.0 released: Use metadata to "label" knowledge bases, double RAG search efficiency

Dify v1.1.0 introduces metadata filtering for knowledge bases. By narrowing retrieval to the documents that matter, it makes knowledge retrieval far more efficient.
Core content:
1. Metadata filtering improves knowledge base retrieval efficiency
2. The application value of metadata in RAG scenarios
3. The operation process and configuration method of metadata filtering
Today, we are pleased to announce the release of Dify v1.1.0, which introduces a new feature: using "metadata" as a knowledge filter. By leveraging custom metadata attributes, metadata filtering improves both the efficiency and the accuracy of retrieving relevant data from the knowledge base. Previously, users could only search across an entire data set. They could not filter results or control access according to specific needs, which made it hard to pinpoint the most relevant information quickly. Adding metadata is equivalent to labeling and categorizing the data, so retrieval can be restricted to the documents that actually match. For users who manage massive amounts of information in RAG (retrieval-augmented generation) scenarios, metadata matters even more, because it helps them organize and access that information effectively.
What is metadata filtering?
Metadata is essentially "data about data". It provides additional context or attribute tags for the primary data, making search and retrieval more precise. For example, in a document management system, metadata may include document name, author, creation date, etc. With this structured information, the system is able to filter based on specific conditions, thereby retrieving relevant content more accurately.
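To make the idea concrete, here is a minimal sketch in plain Python. It is not Dify's API: the Document class, the filter_by_metadata helper, and the sample documents are all hypothetical and only illustrate how attribute tags let a system keep the documents that satisfy a condition.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document: primary content plus a dict of metadata attributes."""
    content: str
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(docs, **conditions):
    """Keep documents whose metadata equals every given key/value pair."""
    return [
        doc for doc in docs
        if all(doc.metadata.get(key) == value for key, value in conditions.items())
    ]


# Sample data invented for illustration.
docs = [
    Document("Q3 sales summary", {"document_name": "q3_sales.pdf", "author": "alice"}),
    Document("Onboarding guide", {"document_name": "onboarding.md", "author": "bob"}),
]

matches = filter_by_metadata(docs, author="alice")
```

Only the document whose author attribute equals "alice" survives the filter; the content itself is never inspected.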
Metadata filtering: making RAG applications more powerful
Metadata filtering can significantly improve the accuracy of searches in RAG applications, helping users locate the documents they need quickly and cutting down on irrelevant results. It also strengthens data security through "access control", ensuring that only users with appropriate permissions can view sensitive information. In addition, because it limits the query scope precisely, metadata filtering improves search performance and saves computing resources. This kind of customization is particularly useful in the enterprise: it improves the user experience, makes it easier to find the desired content among a large number of documents, and is intuitive to operate.
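The performance point can be sketched as "filter first, then rank". The example below is an illustration under stated assumptions, not how Dify implements retrieval: word_overlap is a toy stand-in for a real similarity score, and the department field and sample documents are hypothetical.

```python
def word_overlap(query, text):
    """Toy relevance score: number of distinct words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query, docs, department, top_k=1):
    """Apply the metadata filter before scoring, so fewer documents are ranked."""
    candidates = [doc for doc in docs if doc["department"] == department]
    ranked = sorted(candidates, key=lambda doc: word_overlap(query, doc["text"]), reverse=True)
    return ranked[:top_k], len(candidates)


# Sample data invented for illustration.
docs = [
    {"department": "marketing", "text": "Marketing project report for the spring campaign"},
    {"department": "rnd", "text": "R&D project report on retrieval quality"},
    {"department": "rnd", "text": "Lab safety checklist"},
]

top, scored_count = retrieve("project report", docs, department="rnd")
```

Here only two of the three documents are scored at all, and the marketing report can never appear in the results, however similar its text is to the query.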
The following diagram compares different access-control settings and shows how metadata filtering enables more fine-grained access management. The example uses three filtering conditions: privacylevel, uploader, and update_date. Adjusting privacylevel controls which users can access the RAG 2.0 roadmap, so administrators can decide precisely who may retrieve or view a given piece of information, improving data access efficiency while preserving security.
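A rough sketch of that example in plain Python follows. The three field names come from the text above; everything else is an assumption made for illustration, including the sample values, the dates, and the convention that a larger privacylevel means a more sensitive document (flip the comparison if your scale runs the other way).

```python
from datetime import date

# Sample records invented for illustration; only the field names come from the article.
documents = [
    {"title": "RAG 2.0 roadmap", "privacylevel": 3, "uploader": "pm_team", "update_date": date(2025, 3, 1)},
    {"title": "Public FAQ", "privacylevel": 1, "uploader": "support", "update_date": date(2025, 2, 10)},
]


def visible_titles(docs, max_privacylevel, uploader=None, updated_since=None):
    """Return titles a user may retrieve, given a clearance and optional extra filters."""
    titles = []
    for doc in docs:
        if doc["privacylevel"] > max_privacylevel:
            continue  # assumed convention: higher level = more restricted
        if uploader is not None and doc["uploader"] != uploader:
            continue
        if updated_since is not None and doc["update_date"] < updated_since:
            continue
        titles.append(doc["title"])
    return titles


low_clearance_view = visible_titles(documents, max_privacylevel=1)
high_clearance_view = visible_titles(documents, max_privacylevel=3)
```

With these assumptions, a user with clearance 1 retrieves only the public FAQ, while a user with clearance 3 can also retrieve the roadmap.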
In short, metadata acts as an intelligent knowledge filter: by adding contextual attributes and access control to data, it enables smarter, safer, and more efficient information retrieval. This matters most in RAG systems, which have to balance the privacy of knowledge against its relevance.
How to use metadata filtering to make knowledge retrieval more accurate?
Step 1: Add metadata to documents in the knowledge base
Users can add and manage metadata for documents in the knowledge base. Each document is automatically assigned some default metadata when it is created (such as file name, uploader, and upload date). Users can also add new metadata fields manually, setting each field's name and data type, and batch-edit the metadata of existing documents. Tagging documents in this way attaches more structured information to them, which makes later search and management more efficient.
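The sketch below models this step in plain Python. It is not Dify's data model or API: KnowledgeDocument, add_field, set_value, and batch_edit are hypothetical names, and the mapping of field types to Python types is an assumption.

```python
from datetime import datetime, timezone

# Assumed mapping from metadata field types to Python types.
FIELD_TYPES = {"string": str, "number": (int, float), "time": datetime}


class KnowledgeDocument:
    """Hypothetical stand-in for a knowledge-base document."""

    def __init__(self, file_name, uploader):
        # Default metadata assigned when the document is created.
        self.schema = {"file_name": "string", "uploader": "string", "upload_date": "time"}
        self.metadata = {
            "file_name": file_name,
            "uploader": uploader,
            "upload_date": datetime.now(timezone.utc),
        }

    def add_field(self, name, field_type):
        if field_type not in FIELD_TYPES:
            raise ValueError(f"unsupported field type: {field_type}")
        self.schema[name] = field_type

    def set_value(self, name, value):
        expected = FIELD_TYPES[self.schema[name]]
        # bool is a subclass of int, so reject it explicitly for every field type.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{name} expects a {self.schema[name]} value")
        self.metadata[name] = value


def batch_edit(documents, name, field_type, value):
    """Add the field where it is missing, then set the same value on every document."""
    for doc in documents:
        if name not in doc.schema:
            doc.add_field(name, field_type)
        doc.set_value(name, value)


docs = [KnowledgeDocument("roadmap.pdf", "alice"), KnowledgeDocument("faq.md", "bob")]
batch_edit(docs, "department", "string", "marketing")
```

Declaring a type per field is what later allows filters to offer type-appropriate comparisons, such as thresholds on numbers or before/after on times.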
Step 2: Configure metadata filtering in the application
Users can find the configuration entry for metadata filtering in the "Context" section of a Chatbot, or in the knowledge retrieval node in Chatflow and Workflow, and use it to filter and retrieve information based on metadata attributes. Two filtering modes are available: automatic and manual. In automatic mode, the system extracts and generates filtering conditions from the user's query. In manual mode, users set filtering conditions according to the metadata field type (string, number, or time) and choose whether multiple conditions are combined with AND or OR.
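Manual mode can be pictured as a list of conditions plus an AND/OR switch. The following sketch is an assumption-laden illustration in plain Python: the operator names, the tuple layout, and the sample metadata are hypothetical and do not reflect Dify's actual configuration format. Automatic mode is not sketched, since it depends on how the system interprets the query.

```python
import operator
from datetime import date

# Hypothetical operator names, grouped loosely by field type.
OPERATORS = {
    "is": operator.eq,                                      # string / number
    "is not": operator.ne,
    "contains": lambda actual, expected: expected in actual,  # string
    ">": operator.gt,                                       # number
    "<": operator.lt,
    "before": operator.lt,                                  # time
    "after": operator.gt,
}


def matches(metadata, conditions, logic="and"):
    """Evaluate (field, operator, value) conditions and combine them with AND or OR."""
    if logic not in ("and", "or"):
        raise ValueError("logic must be 'and' or 'or'")
    results = [
        field_name in metadata and OPERATORS[op](metadata[field_name], expected)
        for field_name, op, expected in conditions
    ]
    return all(results) if logic == "and" else any(results)


# Sample metadata and conditions invented for illustration.
metadata = {"department": "marketing", "privacy_level": 2, "upload_date": date(2025, 3, 1)}
conditions = [
    ("department", "is", "marketing"),
    ("privacy_level", ">", 3),
]

and_result = matches(metadata, conditions, logic="and")
or_result = matches(metadata, conditions, logic="or")
```

The same two conditions reject the document under AND (the privacy_level test fails) and accept it under OR (the department test passes), which is why the choice of relationship matters as much as the conditions themselves.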
Three major metadata types and application scenarios
We currently support three types of metadata: string, number, and time. They can be combined flexibly to suit the scenario. Here are some examples:
String metadata: improve contextual relevance
String metadata can filter out large amounts of information that is irrelevant to the query, so that results are returned more accurately. For example, when a user searches for "project report" and documents carry metadata tags such as "marketing department" or "R&D department", documents matching the relevant tag can be presented first.
Number metadata: enforce access control
Number metadata can restrict access to documents based on predefined criteria. For example, users can be limited to retrieving documents with a privacy level above a certain threshold, ensuring secure and compliant data access.
Time metadata: manage document versions
Time metadata can distinguish old versions of a document from new ones. When content is updated and re-uploaded, filtering by time retrieves the latest version first. Filtering on a single uploader as well makes it easy to compare and test versions uploaded in several batches, while keeping document processing consistent.
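For the versioning case, a minimal sketch in plain Python might look like the following. The field names mirror the default metadata mentioned earlier (file name, uploader, upload date), while the latest_versions helper and the sample records are hypothetical.

```python
from datetime import datetime

# Sample records invented for illustration.
records = [
    {"file_name": "pricing.md", "uploader": "alice", "upload_date": datetime(2025, 1, 10), "text": "Old pricing"},
    {"file_name": "pricing.md", "uploader": "alice", "upload_date": datetime(2025, 3, 2), "text": "New pricing"},
    {"file_name": "faq.md", "uploader": "bob", "upload_date": datetime(2025, 2, 1), "text": "FAQ"},
]


def latest_versions(items, uploader=None):
    """Keep the newest upload per file name, optionally for one uploader, newest first."""
    newest = {}
    for item in items:
        if uploader is not None and item["uploader"] != uploader:
            continue
        current = newest.get(item["file_name"])
        if current is None or item["upload_date"] > current["upload_date"]:
            newest[item["file_name"]] = item
    return sorted(newest.values(), key=lambda item: item["upload_date"], reverse=True)


latest_texts = [item["text"] for item in latest_versions(records)]
alice_texts = [item["text"] for item in latest_versions(records, uploader="alice")]
```

The outdated pricing document is dropped because a newer upload with the same file name exists, and restricting the filter to one uploader isolates that user's versions for comparison.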