Dify workflow → Knowledge retrieval | Question classification

Written by
Jasper Cole
Updated on: June 29, 2025

Dify workflows combine intelligent knowledge retrieval with accurate question classification to power efficient AI Q&A.

Core content:
1. The Dify knowledge retrieval function, which retrieves relevant text from the knowledge base
2. The question classifier, which automatically matches classification labels based on user input
3. Configuration guidance and a sample workflow for building a smarter, more accurate AI Q&A system



Functional Overview

  • Retrieve text content related to the user's question from the knowledge base as context for downstream LLM nodes
  • Application scenario: Building an AI question-answering system (RAG) based on external data/knowledge

Basic application process

User question → Knowledge base search → Recall relevant text → LLM generates answer

Typical example: Knowledge base question and answer application
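
The flow above can be read as a single function. Below is a minimal sketch in Python; retrieve_chunks and call_llm are hypothetical placeholders standing in for the knowledge retrieval and LLM nodes, not Dify APIs:

# Minimal sketch of the flow above. Both helpers are hypothetical
# placeholders for Dify nodes.

def retrieve_chunks(question: str) -> list[dict]:
    # Placeholder for the knowledge retrieval node.
    return [{"content": "Example segment recalled from the knowledge base."}]

def call_llm(prompt: str) -> str:
    # Placeholder for the LLM node.
    return "(generated answer)"

def answer(user_question: str) -> str:
    chunks = retrieve_chunks(user_question)               # knowledge base search
    context = "\n\n".join(c["content"] for c in chunks)   # recall relevant text
    prompt = ("Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {user_question}")
    return call_llm(prompt)                               # LLM generates the answer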

Configuration Guide

1. Query variable configuration

  • Select a query variable that represents the user's question (usually sys.query)
  • The knowledge base query is limited to a maximum of 200 characters (see the retrieval sketch after this guide)

2. Knowledge base selection

  • The target knowledge base must be created in Dify's knowledge base section in advance

3. Recall Mode

  • Starting September 1, the system will be forcibly switched to the "multi-channel recall" mode
  • The N-choose-1 recall mode is no longer recommended

4. Downstream node connection

  • Typically connected to an LLM node
  • Bind the knowledge retrieval node's output to the context variable of the LLM node
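
Outside the visual editor, the same retrieval step can be exercised directly over HTTP. The sketch below uses Python requests; the endpoint path and payload are assumptions based on Dify's knowledge API, and the dataset ID and API key are placeholders. The truncation mirrors the 200-character query limit from step 1:

import requests

API_BASE = "https://api.dify.ai/v1"   # or your self-hosted instance
DATASET_ID = "your-dataset-id"        # hypothetical placeholder
API_KEY = "your-knowledge-api-key"    # hypothetical placeholder

def retrieve(query: str) -> dict:
    query = query[:200]  # enforce the 200-character query limit (step 1)
    # Endpoint and payload are assumptions based on Dify's knowledge API.
    resp = requests.post(
        f"{API_BASE}/datasets/{DATASET_ID}/retrieve",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"query": query},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()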

Output variable description

{
  "result": {
    "content": "The retrieved text segment",
    "title": "Section title",
    "link": "Original link",
    "icon": "Logo icon",
    "metadata": "Additional metadata"
  }
}
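
A downstream step typically flattens this structure into a single context string. A short sketch, assuming result may hold either one segment or a list of segments with the fields shown above:

def build_context(result) -> str:
    # Accept either a single segment object or a list of them.
    segments = result if isinstance(result, list) else [result]
    parts = []
    for seg in segments:
        source = seg.get("title") or seg.get("link") or "unknown source"
        parts.append(f"[{source}] {seg['content']}")
    return "\n\n".join(parts)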

Downstream node configuration specifications

  1. Context variable association
  • Bind the result variable to the context variable of the LLM node
  • Insert the context variable placeholder into the prompt
  2. Operation logic
  • When there are retrieval results: the context variable is filled automatically, and the LLM answers based on the knowledge base content
  • When there are no retrieval results: the context variable is empty, and the LLM answers the question directly
  3. Extended function support
  • Supports the application-side citation and attribution feature
  • Can display source information (title, link, etc.) for retrieved text segments

Tip: This configuration supports both knowledge augmentation and attribution back to the original source. It is recommended to design a sensible knowledge-citation format in the prompt.
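
One way to apply the tip is to number each retrieved segment and ask the model to cite those numbers. The prompt wording below is illustrative, not a Dify default; it also reproduces the operation logic above by falling back to a direct answer when nothing is retrieved:

def build_prompt(question: str, segments: list[dict]) -> str:
    if not segments:  # no retrieval results: let the LLM answer directly
        return question
    numbered = "\n".join(
        f"[{i}] {s['content']} (source: {s.get('title', 'untitled')})"
        for i, s in enumerate(segments, start=1)
    )
    return ("Answer the question using the numbered context segments below, "
            "citing them as [1], [2], ... where used.\n\n"
            f"{numbered}\n\nQuestion: {question}")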


Dify workflow → Question classification

By defining category descriptions, the question classifier can use an LLM to infer which category best matches the user input and output the classification result, providing more precise information to downstream nodes.

Common usage scenarios include:

  • Customer Service Conversation Intent Classification
  • Product review categories
  • Batch mail classification

In a typical product customer service question-and-answer scenario, the question classifier can serve as a prerequisite for knowledge base retrieval, classifying the user's input question intent and, after classification, directing different downstream knowledge bases to query related content in order to accurately answer the user's questions.

Sample workflow template: a product customer service scenario defines the following three categories (a routing sketch follows the list):

  • Category 1 : After-sales related issues
  • Category 2 : Issues related to product operation and use
  • Category 3 : Other issues
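
A sketch of the routing this template implies. The knowledge base names are hypothetical placeholders, and classify stands in for the question classifier node:

def classify(question: str) -> str:
    # Placeholder for the question classifier node.
    return "Other issues"

# Map each category to the knowledge base its branch queries.
ROUTES = {
    "After-sales related issues": "after_sales_kb",
    "Issues related to product operation and use": "product_usage_kb",
    "Other issues": None,  # no knowledge base; the LLM answers directly
}

def route(user_question: str) -> str | None:
    category = classify(user_question)
    return ROUTES.get(category)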

Application example: when users enter different questions, the classifier automatically assigns them to the configured category labels/descriptions:

  • “How do I set up address book contacts on iPhone 14?” → “Issues related to product operation and use”
  • “How long is the warranty period?” → “After-sales related issues”
  • “What’s the weather like today?” → “Other issues”

How to configure

  1. Select the input variable: the content to be classified; file variables are also supported as input. In customer service Q&A scenarios this is usually the user's question, i.e. sys.query.
  2. Select an inference model : Based on the natural language classification and inference capabilities of the large language model, select an appropriate model to improve the classification effect.
  3. Write category labels/descriptions : Manually add multiple categories and write keywords or descriptions to help the large language model understand the classification basis.
  4. Select downstream nodes: route each classification result to its corresponding downstream process path.
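
Conceptually, steps 2 and 3 amount to a classification prompt assembled from the category labels and descriptions. Dify builds this internally; the sketch below only illustrates the idea, and the descriptions are examples:

CATEGORIES = {
    "After-sales related issues": "warranty, returns, repairs",
    "Issues related to product operation and use": "setup, features, how-to",
    "Other issues": "anything that fits neither category",
}

def classification_prompt(user_input: str) -> str:
    labels = "\n".join(f"- {name}: {desc}" for name, desc in CATEGORIES.items())
    return ("Classify the input into exactly one of the categories below and "
            "reply with the category name only.\n\n"
            f"Categories:\n{labels}\n\nInput: {user_input}")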

Advanced Settings

  • Instructions : Add additional instructions (such as richer classification criteria) in advanced settings to enhance classification capabilities.
  • Memory : When enabled, input will include chat history to improve question understanding during conversational interactions.
  • Image analysis : only applicable to LLMs with image recognition capabilities, allowing the input of image variables.
  • Memory Window: when off, the system dynamically filters the chat history based on the model's context window; when on, the number of conversation turns included can be controlled precisely.
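
A sketch of what the memory window controls, assuming history is a list of (user, assistant) message pairs; the dynamic, context-window-based filtering Dify applies when the window is off is simplified here to keeping everything:

def apply_memory_window(history: list[tuple[str, str]],
                        window: int | None) -> list[tuple[str, str]]:
    if window is None:
        # Window off: Dify filters history dynamically by the model's
        # context window; simplified here to passing everything through.
        return history
    # Window on: keep exactly the last `window` conversation turns.
    return history[-window:]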

Output variables

  • class_name: stores the category predicted by the classification model. After classification completes, this variable holds the matched category label and can be referenced by downstream nodes to execute the corresponding logic.