Dify workflow → Knowledge retrieval | Question classification

Written by
Jasper Cole
Updated on: June 29, 2025

Dify workflows combine intelligent knowledge retrieval with accurate question classification to power efficient AI Q&A.

Core content:
1. The Dify knowledge retrieval function, which retrieves relevant text from the knowledge base
2. The question classifier, which automatically matches classification labels based on user input
3. Configuration guidance and a sample workflow for building a smarter, more accurate AI Q&A system



Functional Overview

  • Retrieve text content related to the user's question from the knowledge base as context for downstream LLM nodes
  • Application scenario: Building an AI question-answering system (RAG) based on external data/knowledge

Basic application process

User question → Knowledge base search → Recall relevant text → LLM generates answer

Typical example: Knowledge base question and answer application
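
The flow above can be read as a single function. Below is a minimal sketch in Python; retrieve_chunks and call_llm are hypothetical placeholders standing in for the knowledge retrieval and LLM nodes, not Dify APIs:

# Minimal sketch of the flow above. Both helpers are hypothetical
# placeholders for Dify nodes.

def retrieve_chunks(question: str) -> list[dict]:
    # Placeholder for the knowledge retrieval node.
    return [{"content": "Example segment recalled from the knowledge base."}]

def call_llm(prompt: str) -> str:
    # Placeholder for the LLM node.
    return "(generated answer)"

def answer(user_question: str) -> str:
    chunks = retrieve_chunks(user_question)               # knowledge base search
    context = "\n\n".join(c["content"] for c in chunks)   # recall relevant text
    prompt = ("Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {user_question}")
    return call_llm(prompt)                               # LLM generates the answer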

Configuration Guide

1. Query variable configuration

  • Select a query variable that represents the user's question (usually sys.query)
  • The knowledge base query is limited to a maximum of 200 characters (see the retrieval sketch after this guide)

2. Knowledge base selection

  • The target knowledge base must be created in Dify's knowledge base section in advance

3. Recall Mode

  • Starting September 1, the system will be forcibly switched to the "multi-channel recall" mode
  • The N-choose-1 recall mode is no longer recommended

4. Downstream node connection

  • Typically connected to an LLM node
  • Bind the knowledge retrieval node's output to the context variable of the LLM node
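
Outside the visual editor, the same retrieval step can be exercised directly over HTTP. The sketch below uses Python requests; the endpoint path and payload are assumptions based on Dify's knowledge API, and the dataset ID and API key are placeholders. The truncation mirrors the 200-character query limit from step 1:

import requests

API_BASE = "https://api.dify.ai/v1"   # or your self-hosted instance
DATASET_ID = "your-dataset-id"        # hypothetical placeholder
API_KEY = "your-knowledge-api-key"    # hypothetical placeholder

def retrieve(query: str) -> dict:
    query = query[:200]  # enforce the 200-character query limit (step 1)
    # Endpoint and payload are assumptions based on Dify's knowledge API.
    resp = requests.post(
        f"{API_BASE}/datasets/{DATASET_ID}/retrieve",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"query": query},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()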

Output variable description

{
  "result": {
    "content": "The retrieved text segment",
    "title": "Section title",
    "link": "Original link",
    "icon": "Logo icon",
    "metadata": "Additional metadata"
  }
}
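
A downstream step typically flattens this structure into a single context string. A short sketch, assuming result may hold either one segment or a list of segments with the fields shown above:

def build_context(result) -> str:
    # Accept either a single segment object or a list of them.
    segments = result if isinstance(result, list) else [result]
    parts = []
    for seg in segments:
        source = seg.get("title") or seg.get("link") or "unknown source"
        parts.append(f"[{source}] {seg['content']}")
    return "\n\n".join(parts)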

Downstream node configuration specifications

  1. Context variable association
  • Bind the result variable to the context variable of the LLM node
  • Insert the context variable placeholder into the prompt
  2. Operation logic
  • When there are retrieval results: the context variable is filled automatically, and the LLM answers based on the knowledge base content
  • When there are no retrieval results: the context variable is empty, and the LLM answers the question directly
  3. Extended function support
  • Supports the application-side citation and attribution feature
  • Can display source information (title, link, etc.) for retrieved text segments

Tip: This configuration supports both knowledge augmentation and attribution back to the original source. It is recommended to design a sensible knowledge-citation format in the prompt.
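
One way to apply the tip is to number each retrieved segment and ask the model to cite those numbers. The prompt wording below is illustrative, not a Dify default; it also reproduces the operation logic above by falling back to a direct answer when nothing is retrieved:

def build_prompt(question: str, segments: list[dict]) -> str:
    if not segments:  # no retrieval results: let the LLM answer directly
        return question
    numbered = "\n".join(
        f"[{i}] {s['content']} (source: {s.get('title', 'untitled')})"
        for i, s in enumerate(segments, start=1)
    )
    return ("Answer the question using the numbered context segments below, "
            "citing them as [1], [2], ... where used.\n\n"
            f"{numbered}\n\nQuestion: {question}")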


Dify workflow → Question classification

By defining category descriptions, the question classifier can use an LLM to infer which category best matches the user input and output the classification result, providing more precise information to downstream nodes.

Common usage scenarios include:

  • Customer Service Conversation Intent Classification
  • Product review categories
  • Batch mail classification

In a typical product customer service question-and-answer scenario, the question classifier can serve as a prerequisite for knowledge base retrieval, classifying the user's input question intent and, after classification, directing different downstream knowledge bases to query related content in order to accurately answer the user's questions.

Sample workflow template: a product customer service scenario defines the following three categories (a routing sketch follows the list):

  • Category 1 : After-sales related issues
  • Category 2 : Issues related to product operation and use
  • Category 3 : Other issues
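
A sketch of the routing this template implies. The knowledge base names are hypothetical placeholders, and classify stands in for the question classifier node:

def classify(question: str) -> str:
    # Placeholder for the question classifier node.
    return "Other issues"

# Map each category to the knowledge base its branch queries.
ROUTES = {
    "After-sales related issues": "after_sales_kb",
    "Issues related to product operation and use": "product_usage_kb",
    "Other issues": None,  # no knowledge base; the LLM answers directly
}

def route(user_question: str) -> str | None:
    category = classify(user_question)
    return ROUTES.get(category)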

Application example: when users enter different questions, the classifier automatically assigns them to the configured category labels/descriptions:

  • “How do I set up address book contacts on iPhone 14?” → “Issues related to product operation and use”
  • “How long is the warranty period?” → “After-sales related issues”
  • “What’s the weather like today?” → “Other issues”

How to configure

  1. Select the input variable: the content to be classified; file variables are also supported as input. In customer service Q&A scenarios this is usually the user's question, i.e. sys.query.
  2. Select an inference model : Based on the natural language classification and inference capabilities of the large language model, select an appropriate model to improve the classification effect.
  3. Write category labels/descriptions : Manually add multiple categories and write keywords or descriptions to help the large language model understand the classification basis.
  4. Select downstream nodes: route each classification result to its corresponding downstream process path.
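
Conceptually, steps 2 and 3 amount to a classification prompt assembled from the category labels and descriptions. Dify builds this internally; the sketch below only illustrates the idea, and the descriptions are examples:

CATEGORIES = {
    "After-sales related issues": "warranty, returns, repairs",
    "Issues related to product operation and use": "setup, features, how-to",
    "Other issues": "anything that fits neither category",
}

def classification_prompt(user_input: str) -> str:
    labels = "\n".join(f"- {name}: {desc}" for name, desc in CATEGORIES.items())
    return ("Classify the input into exactly one of the categories below and "
            "reply with the category name only.\n\n"
            f"Categories:\n{labels}\n\nInput: {user_input}")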

Advanced Settings

  • Instructions : Add additional instructions (such as richer classification criteria) in advanced settings to enhance classification capabilities.
  • Memory : When enabled, input will include chat history to improve question understanding during conversational interactions.
  • Image analysis : only applicable to LLMs with image recognition capabilities, allowing the input of image variables.
  • Memory Window: when off, the system dynamically filters the chat history based on the model's context window; when on, the number of conversation turns included can be controlled precisely.
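
A sketch of what the memory window controls, assuming history is a list of (user, assistant) message pairs; the dynamic, context-window-based filtering Dify applies when the window is off is simplified here to keeping everything:

def apply_memory_window(history: list[tuple[str, str]],
                        window: int | None) -> list[tuple[str, str]]:
    if window is None:
        # Window off: Dify filters history dynamically by the model's
        # context window; simplified here to passing everything through.
        return history
    # Window on: keep exactly the last `window` conversation turns.
    return history[-window:]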

Output variables

  • class_name: stores the category predicted by the classification model. After classification completes, this variable holds the matched category label and can be referenced by downstream nodes to execute the corresponding logic.