AI Large-Model Knowledge Question-Answering System Architecture Diagram

Explore the core technical architecture of an AI large-model question-answering system and see the end-to-end design from knowledge processing to intelligent answering.
Core content:
1. Five key steps in knowledge base construction: from original documents to vectorized storage
2. Intelligent retrieval and generation process in the online question-answering stage
3. The core advantages and modular design ideas of the RAG architecture
This article walks through an AI large-model knowledge question-answering architecture diagram. The main contents are as follows:
1. Overview of the overall process
The AI large-model knowledge question-answering architecture diagram shows the core process of building and using a knowledge question-answering system based on a large language model. The process is divided into two major stages: knowledge base construction (offline processing) and knowledge question answering (online use).
2. Knowledge Base Construction Stage
1. Knowledge source preparation
● The starting point is the raw knowledge material in various document formats (such as Word, PDF, TXT, etc.).
2. Text Extraction
● Extract the plain-text content from the source documents, strip the formatting information, and keep only the core text (TXT), as sketched below.
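As a minimal illustration, here is a text-extraction sketch using the pypdf library (one of many possible parsers; the filename below is hypothetical):

```python
# Minimal text-extraction sketch. Assumes the pypdf package is installed
# (pip install pypdf); any equivalent parser works the same way.
from pypdf import PdfReader

def extract_text_from_pdf(path: str) -> str:
    """Read a PDF and return its plain-text content, formatting stripped."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

plain_text = extract_text_from_pdf("knowledge_source.pdf")
```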
3. Text Slicing
● Cut the extracted text into smaller, semantically coherent segments (chunks). Chunk size must balance information completeness against retrieval efficiency; common splitting rules include by paragraph, by a fixed number of characters, or at sentence boundaries (see the sketch below).
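A simple fixed-size chunker with a small overlap, continuing from the extraction sketch; the sizes here are illustrative defaults, not values taken from the diagram:

```python
# A simple character-window chunker with overlap.
def slice_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Cut text into fixed-size chunks; the overlap keeps sentences that
    straddle a boundary intact in at least one chunk."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size].strip()
        if piece:
            chunks.append(piece)
    return chunks

chunks = slice_text(plain_text)  # plain_text from the extraction sketch above
```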
4. Vectorization
● Use a dedicated vectorization model (an embedding model such as text-embedding-ada-002) to convert each text chunk into a fixed-length numerical vector that represents its semantics. Texts with similar meanings map to vectors that lie closer together in the vector space.
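A hedged sketch of this step, assuming the OpenAI Python SDK and the text-embedding-ada-002 model named above (an OPENAI_API_KEY environment variable is assumed; any embedding model works the same way):

```python
# Embedding sketch using the OpenAI Python SDK (pip install openai).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> list[list[float]]:
    """Map each text to a fixed-length vector encoding its semantics."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in resp.data]

vectors = embed(chunks)  # chunks from the slicing sketch above
```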
5. Vector Storage
● Store the vector for each chunk, together with the original text (and often metadata), in a vector database optimized for similarity search (such as ChromaDB, Faiss, Milvus, or Pinecone). This is the foundation for efficient similarity search later.
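An illustrative storage sketch using Faiss, one of the databases listed above (pip install faiss-cpu). L2-normalizing the vectors makes an inner-product index equivalent to cosine similarity:

```python
import numpy as np
import faiss

dim = len(vectors[0])              # e.g. 1536 for text-embedding-ada-002
index = faiss.IndexFlatIP(dim)     # exact inner-product index

matrix = np.asarray(vectors, dtype="float32")
faiss.normalize_L2(matrix)         # in-place normalization
index.add(matrix)

# Keep the original chunk text alongside the vectors, addressed by row id.
id_to_chunk = {i: chunk for i, chunk in enumerate(chunks)}
```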
3. Knowledge Question-Answering Stage
1. User question
● The user enters a natural-language question (the original question).
2. Question vectorization
● Use the same vectorization model as in knowledge base construction to convert the user's original question into a numerical vector representing its semantics.
3. Similarity matching and retrieval
● In the vector database, compare the user's question vector against all chunk vectors in the library (commonly by cosine similarity or dot product) and retrieve the top-K chunks most semantically similar to the question, where K is a tunable parameter (see the retrieval sketch below).
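A minimal retrieval function, continuing the sketches above (reusing embed, index, and id_to_chunk). Note that the question must be embedded with the same model used at build time; the example question is hypothetical:

```python
def retrieve(question: str, k: int = 3) -> list[str]:
    """Embed the question with the SAME model used at build time, then
    return the k chunks with the highest cosine similarity."""
    q = np.asarray(embed([question]), dtype="float32")
    faiss.normalize_L2(q)
    _scores, ids = index.search(q, k)   # top-k search over all chunk vectors
    return [id_to_chunk[int(i)] for i in ids[0] if i != -1]

top_chunks = retrieve("What does the warranty cover?")
```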
4. Prompt combination and enhancement
● Combine the original question with the text of the retrieved top-K chunks to form a richer, context-grounded prompt, optionally refining the retrieved text to remove redundancy first. This retrieve-then-augment pattern is the heart of retrieval-augmented generation (RAG); designing the combined prompt is an exercise in prompt engineering.
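One possible way to splice the question and retrieved chunks together; the template wording is an assumption, not taken from the diagram:

```python
PROMPT_TEMPLATE = """Answer the question using ONLY the context below.
If the context does not contain the answer, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, chunks: list[str]) -> str:
    """Splice the retrieved chunks and the original question into one prompt."""
    context = "\n\n---\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt("What does the warranty cover?", top_chunks)
```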
5. Large Models Generate Answers
● Feed the combined, augmented prompt into a large language model (such as the GPT, Claude, or Llama series). The model produces the final natural-language answer by combining its understanding and generation capabilities with the knowledge built in during training.
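A hedged generation sketch, reusing the OpenAI client from the embedding step; the model name here is illustrative and any capable chat model can be substituted:

```python
def generate_answer(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # low temperature favors faithful, grounded answers
    )
    return resp.choices[0].message.content

print(generate_answer(prompt))
```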
6. Result Output
● The answer generated by the large model is returned to the user.
4. Summary of Key Ideas
1. RAG Architecture
● It embodies the core idea of retrieving relevant knowledge fragments (Retrieval), augmenting the prompt with that knowledge (Augmentation), and letting the large model generate the answer (Generation). This mitigates the tendency of large models to hallucinate and their inability to reference the latest or domain-specific knowledge.
2. Vectorization and Similarity Search
● Vectorization is the key technology that lets computers compare text by meaning, and vector databases make it possible to quickly retrieve semantically relevant content from massive knowledge collections.
3. Modular design
● Dividing the system into well-defined modules (knowledge base construction, text extraction, vectorization, vector storage, retrieval, prompt engineering, and the large model) makes technology selection and iteration easier: for example, the embedding model can be upgraded, or the large model replaced, independently of the rest (see the interface sketch below).
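As a sketch of that modularity, hypothetical Python Protocol interfaces for the three swappable components; these names are illustrative, not part of the diagram:

```python
from typing import Protocol

class Embedder(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class VectorStore(Protocol):
    def add(self, vectors: list[list[float]], chunks: list[str]) -> None: ...
    def search(self, vector: list[float], k: int) -> list[str]: ...

class AnswerModel(Protocol):
    def generate(self, prompt: str) -> str: ...

# Any embedder, store, or model satisfying these Protocols can be dropped
# in without touching the rest of the pipeline.
```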
4. Balance between efficiency and accuracy
● Chunk size, the number of retrieved chunks (K), the choice of embedding model, the prompt construction method, and similar parameters must all be tuned to balance retrieval efficiency, content relevance, and final answer quality; an illustrative configuration follows.
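One way to gather these tunable knobs in a single place; every default here is an assumption for the sketch, to be tuned against real data:

```python
from dataclasses import dataclass

@dataclass
class RagConfig:
    chunk_size: int = 500                 # larger = more context, coarser retrieval
    chunk_overlap: int = 50               # protects sentences cut at boundaries
    top_k: int = 3                        # more chunks = higher recall, longer prompt
    embedding_model: str = "text-embedding-ada-002"
    temperature: float = 0.0              # lower = more faithful to the context
```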