How to implement image question answering with RAGFlow: principle analysis + detailed steps (with source code)

Written by
Caleb Hayes
Updated on: July 3rd, 2025
Recommendation

Deploy an image server in its own container alongside RAGFlow to implement local image access.

Core content:
1. Gradio local image rendering mechanism
2. RAGFlow framework features and deployment methods
3. Detailed steps and source code analysis for implementing image question answering

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)

How to display the original document image in RAGFlow or dify?

This is a question I have been asked recently. The previous article, "RAG maintenance case sharing: How to implement 'text + picture' answer presentation", showed how to render local image paths in the lightweight Gradio framework.

Displaying relevant images from the original document in the answer makes the information richer, and it is of real practical value in many scenarios.

This article uses RAGFlow as an example to explain how to implement local image access by deploying an image server in an independent container alongside RAGFlow.

The implementation in Dify is similar; you can use this article as a reference when testing.

Videos and articles are better together

Below, enjoy:

1

   

Two mechanisms of Gradio local image rendering

1.1

   

Static file service mounting

In the code, the FastAPI application mounts the static file directory:

app.mount("/static", StaticFiles(directory="static"), name="static")

This line of code exposes the local "static" directory as the "/static" path of the HTTP service, making all files (including images) in the static directory accessible through HTTP URLs.

Note: The source code of the previous issue is in the Knowledge Planet

1.2

   

Markdown rendering function

The chat component enables Markdown rendering:

chatbot = gr.Chatbot(label="Chatbot", height=750, avatar_images=("images/user.jpeg", "images/tongyi.png"), render_markdown=True)

The render_markdown=True parameter enables the Chatbot component to render Markdown-formatted text (including image links) as HTML.

1.3

   

Workflow Description

When the RAG system generates an answer that contains a Markdown image reference to a file in the static directory (such as ![image description](/static/images/example.jpg)), Gradio automatically renders it as an HTML image element, and the browser then fetches and displays the image through an HTTP request.
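As a rough illustration of this workflow, the sketch below turns a local file under the static directory into the Markdown image reference that the Chatbot would render. The helper name to_markdown_image and its defaults ("static" mounted at "/static") are assumptions chosen to match the mount shown earlier; they are not part of Gradio or FastAPI.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import quote


def to_markdown_image(local_path: str, alt: str = "image description",
                      static_dir: str = "static", url_prefix: str = "/static") -> str:
    """Build a Markdown image reference for a file inside static_dir.

    Hypothetical helper: assumes static_dir is mounted at url_prefix.
    """
    # Path relative to the static directory; raises ValueError if the file is outside it.
    relative = Path(local_path).relative_to(static_dir)
    # URLs always use forward slashes; percent-encode spaces and similar characters.
    url_path = quote(PurePosixPath(url_prefix, *relative.parts).as_posix())
    return f"![{alt}]({url_path})"


print(to_markdown_image("static/images/example.jpg"))
# prints: ![image description](/static/images/example.jpg)
```

A string like this can be appended to the generated answer text; the rendering itself is still done by the Chatbot component.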

1.4

   

Summary of key points

File system to HTTP conversion:

Gradio/FastAPI maps local file system paths to HTTP URLs

Markdown rendering:

The Chatbot component can parse and render image references in Markdown

Browser request mechanism:

The browser initiates an HTTP request to obtain the image based on the image URL in the rendered HTML.

In terms of use cases, Gradio is mainly designed for prototype development and demonstration, pursuing a rapid development experience. The demo released earlier is also intended to help everyone understand the basic implementation principles first.

Because RAGFlow lacks a mechanism to automatically convert local paths to HTTP paths and provide static file services, we need to find another way.

2

   

Features of frameworks like RAGFlow

Before discussing the options for rendering local images in RAGFlow, let's review a few characteristics of frameworks like RAGFlow so that we can choose the right remedy:

2.1

   

Docker container isolation design

Frameworks such as Dify and RAGFlow are usually designed to run in a containerized environment, which limits direct access to the host file system for security reasons. Docker containers are isolated from the host file system, which is also one of the core security features of containerization technology.

2.2

   

Enterprise-level architecture considerations

These frameworks are designed with more complex production environment requirements in mind:

Front-end and back-end separation architecture: stricter front-end and back-end separation, suitable for distributed deployment

Specialized storage solutions: Expect to use specialized object storage services instead of local file systems

Multi-node deployment support: In a distributed system, direct access to the local file system can lead to data consistency issues

2.3

   

Security considerations

Automatically exposing the local file system as an HTTP service may pose security risks:

Directory traversal attacks: If not configured properly, sensitive files may be accessed

File upload security: Strict control over what types of files can be uploaded and accessed

Permission control: Fine-grained file access permission control is required in enterprise environments
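To make the directory traversal risk concrete, here is a minimal sketch of the kind of check a static file service needs before opening a requested file. The function name resolve_inside is hypothetical, and this is only one common way to do the check, not how Gradio, Dify, or RAGFlow implement it.

```python
from pathlib import Path
from typing import Optional


def resolve_inside(base_dir: str, requested: str) -> Optional[Path]:
    """Return the resolved path if it stays inside base_dir, otherwise None."""
    base = Path(base_dir).resolve()
    # Treat the request as relative to base_dir; resolve() collapses ".." segments
    # and follows symlinks, so the comparison is made on the real location.
    candidate = (base / requested.lstrip("/")).resolve()
    if candidate != base and base not in candidate.parents:
        return None  # the request escaped the served directory
    return candidate


print(resolve_inside("static", "images/example.jpg") is not None)  # True
print(resolve_inside("static", "../secret.txt"))                   # None
```

A request such as "images/../../etc/passwd" is rejected in the same way, because the resolved path no longer has the served directory as an ancestor.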

Personally, I suspect these platforms expect users to handle static resources in production with a dedicated object storage service (such as S3 or OSS) rather than a simple local file service. For large-scale deployments, that is indeed the more reasonable architectural choice.

3

   

Comparison of four implementation solutions

In theory there are four feasible solutions. In my tests the first two proved impractical and the last two worked, with the last one being the most cost-effective.

3.1

   

Cloud Storage Solutions

  • Upload the image to a cloud storage service (such as Alibaba Cloud OSS)

  • Replace local image paths with cloud storage URLs during document preprocessing

  • RAGFlow only processes text containing cloud storage URLs

This is the solution I tested first. Its advantage is that it is quick and simple to set up, but its limitation is just as obvious: every image in the original documents must first be turned into a publicly accessible URL, which is unrealistic in a production environment.

In addition, configuring public access through a domain name on Alibaba Cloud OSS is cumbersome, which I believe will discourage most people. It is enough to know this method exists; there is no need to try it.

3.2

   

Base64 embedding

Embedding images into the document as Base64 for vectorization looks like a more direct approach, since it sidesteps the question of how the image address is accessed. However, it has several obvious flaws that cannot be overcome:

Context length limit

This is the most serious problem. Base64 encoding significantly increases the data size. For example, a medium-quality image may take up thousands or even tens of thousands of tokens, which can easily exhaust the context window of the model. In my tests, I often got an error at the embedding step and could not proceed.

Vectorization efficiency is reduced

Text embedding models are not designed to process large blocks of base64 encoding. These seemingly random character sequences have no practical meaning for semantic vectorization and may lead to problems such as decreased retrieval accuracy and pollution of the vector space by meaningless data.

Storage and computational overhead

Base64 encoding significantly increases storage requirements and processing burdens. Not only does it slow down retrieval, but the cost of token-based billing models will also increase significantly.

Consider a specific example. A normal 300KB image converted to Base64 becomes about 400KB of text, because Base64 emits 4 characters for every 3 bytes. That is far more than 5,000 tokens: at a rough 4 characters per token it is on the order of 100,000. A single such image can therefore exceed the context limit of many models, let alone a document containing 10 of them. In short, you don't need to try this approach.
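The size arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the zero-filled buffer stands in for a real image file, and the figure of about 4 characters per token is an assumption, since actual tokenizers vary and often do worse on Base64 text.

```python
import base64

image_bytes = bytes(300 * 1024)          # stand-in for a 300KB image file
encoded = base64.b64encode(image_bytes)  # Base64 emits 4 characters per 3 input bytes

encoded_kb = len(encoded) / 1024
# Assumption: about 4 characters per token; real tokenizers differ.
approx_tokens = len(encoded) // 4

print(f"{encoded_kb:.0f}KB of text, roughly {approx_tokens:,} tokens")
# prints: 400KB of text, roughly 102,400 tokens
```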

3.3

   

Developing framework plugins

  • Study the plugin development framework of Dify/RAGFlow

  • Develop a plugin to implement local file system access and HTTP service

  • Configure the framework to use this plugin to process image URLs

Note that RAGFlow does not yet support plugin development, and its Agent Studio does not support custom Python components as of v0.17.2.

The official Dify plugin development website provides detailed documentation, including initializing development tools, calling predefined models or integrating custom models, and encapsulating business code as plugins. This part will not be expanded in this article. If you want to know more, please leave a comment; if there is enough interest, I will consider publishing a dedicated article later.

 Dify plugin documentation: https://docs.dify.ai/zh-hans/plugins/quick-start/develop-plugins 


Let me digress here. A few days ago, there was a question in Knowledge Planet: How to introduce the dialogue generation and answer component in the RAGFlow agent process into MCP without affecting the original function?

Because the latest version of RAGFlow Agent Studio does not support adding custom Python nodes, I originally planned to create a simple API service as a bridge between MCP and RAGFlow. Later, after a simple test, I found that using Dify as the main framework, creating custom tools in Dify, and calling RAGFlow's API through HTTP might be a better approach. This tutorial is expected to be published before mid-April (yet another promise I now have to keep).

3.4

   

Independent containerization of image servers (recommended solution)

How it works

  • Create a separate HTTP server dedicated to serving image files

  • Containerize the image server and run it in the same network as the RAGFlow container

  • Replace local image paths with HTTP URLs during document preprocessing
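The three steps above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the implementation used in the rest of the article: the names rewrite_image_paths and serve_images, the IMAGE_BASE_URL value, and port 8000 are all hypothetical.

```python
import re
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Assumption: the image server is reachable at this address from the user's browser.
IMAGE_BASE_URL = "http://localhost:8000"

# Matches Markdown images whose target is not already an http(s) URL.
_LOCAL_IMAGE = re.compile(r"!\[([^\]]*)\]\((?!https?://)([^)\s]+)\)")


def rewrite_image_paths(markdown: str, base_url: str = IMAGE_BASE_URL) -> str:
    """Replace local image paths in Markdown with URLs on the image server."""
    def _replace(match: re.Match) -> str:
        alt, path = match.group(1), match.group(2)
        # Drop a leading "./" or "/" so the path joins cleanly onto base_url.
        clean = path.removeprefix("./").lstrip("/")
        return f"![{alt}]({base_url.rstrip('/')}/{clean})"
    return _LOCAL_IMAGE.sub(_replace, markdown)


def serve_images(directory: str, port: int = 8000) -> None:
    """Serve files from directory over HTTP until interrupted."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    with ThreadingHTTPServer(("0.0.0.0", port), handler) as server:
        server.serve_forever()


if __name__ == "__main__":
    print(rewrite_image_paths("See ![pump](images/pump.png) for details."))
    # prints: See ![pump](http://localhost:8000/images/pump.png) for details.
```

In a container setup, serve_images would run as the entry point of the image server container, and the preprocessing step would call rewrite_image_paths on each document before it is uploaded to the knowledge base. Because the browser is what finally fetches the image, the base URL has to be an address the browser can reach, not only a container-internal hostname.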