Understanding Large Language Model Knowledge Enhancement in One Article: Knowledge Injection (Prompt + Finetune + RAG)

Written by
Jasper Cole
Updated on: June 15, 2025

An in-depth look at the key techniques of large language model knowledge enhancement, aimed at improving how models perform in specific domains.

Core content:
1. The limitations of large language models and the need for knowledge injection
2. Data layer injection (Prompt): guiding the model to absorb new knowledge through prompts
3. Model layer injection (Finetune): model fine-tuning and parameter-efficient fine-tuning techniques


Although general-purpose large language models (such as DeepSeek and Qwen) have broad knowledge coverage and basic reasoning capabilities, they still have the following limitations:

(1) Knowledge gaps: they struggle to cover fine-grained, frequently updated facts (such as treatment options for rare diseases or the latest clinical guidelines);

(2) Weak reasoning: performance falls short on complex reasoning chains, counterintuitive logic, and ethical judgments;

(3) Domain bias: professional fields such as medicine and finance require vertical models to meet high-precision demands.

Through knowledge injection, that is, data layer injection (Prompt), model layer injection (Finetune), and reasoning layer injection (RAG), a model's performance in specific scenarios can be improved significantly.


1. Data layer injection (Prompt)

Data layer injection - the knowledge "bibimbap" method
By "mixing" domain knowledge or task instructions into the input prompt, the model can absorb new knowledge without any modification to its structure, and is guided toward accurate responses in the simplest possible way.
(1) Core objectives
With data as the carrier, the model "eats" knowledge during training or inference, much like stirring seasoning (knowledge) into rice (data).
(2) Implementation ideas
Data layer injection (the knowledge "bibimbap" method) is what we commonly call prompt engineering; it lets the model absorb new knowledge without any structural modification.
Just as asking a friend for help means clearly stating "what to do, how to do it, and what result you want", prompt engineering teaches the large language model (LLM) to understand your needs: it is the "task instruction manual" you hand to the LLM.
(3) Prompt Engineering
Prompt engineering is familiar to everyone and used every day: by designing the input prompt, the model is guided to draw on external knowledge when answering questions.
It is like "using conversation skills to improve efficiency". In daily work, whether you add constraints when asking an AI to search for information or adjust the phrasing of a question when writing a report, you are guiding the output by optimizing the input (data layer injection).
For example, design a prompt template that embeds domain knowledge (such as "According to Article XX of the Civil Code, the contract terms should ______").
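A minimal sketch of such a template, assuming an OpenAI-compatible chat client (the model name and the legal snippet below are placeholders, not part of the original article):

```python
# Prompt-level knowledge injection: domain knowledge is placed directly
# in the prompt; no model weights are changed. Assumes an
# OpenAI-compatible client; model name and legal text are placeholders.
from openai import OpenAI

client = OpenAI()

DOMAIN_KNOWLEDGE = (
    "Article XX of the Civil Code: a contract is formed when the parties "
    "reach consensus through offer and acceptance."  # placeholder snippet
)

PROMPT_TEMPLATE = """You are a contract-law assistant.
Reference knowledge:
{knowledge}

Question: {question}
Answer strictly according to the reference knowledge above."""

def ask(question: str) -> str:
    prompt = PROMPT_TEMPLATE.format(knowledge=DOMAIN_KNOWLEDGE, question=question)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("When is a contract formed?"))
```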

2. Model layer injection (Finetune)

Model layer injection - the knowledge "hardware upgrade"
By directly modifying the model's underlying knowledge or parameter structure, the model evolves from "factory settings" into a "domain expert", making knowledge recall more efficient and accurate.
(1) Core objectives
Modify the model's parameters or structure so that knowledge is solidified into the neural network, the equivalent of giving the model a "hardware upgrade".
(2) Implementation ideas
Model layer injection (the knowledge "hardware upgrade") is what we commonly call model fine-tuning. It is essentially a "customized upgrade" of a pre-trained model: further training on domain-specific data turns the model from a "generalist" into an "expert".
(3) PEFT (Parameter Efficient Fine-tuning)
A common approach to fine-tuning is PEFT (parameter-efficient fine-tuning), which adapts the model to a specific task at very low cost by optimizing only a small subset of parameters (such as low-rank matrices or adapters) instead of all of them.
Method 1: LoRA (Low-Rank Adaptation)
Low-rank matrices are injected alongside the weight matrices of the pre-trained model (cutting trainable parameters by more than 90%); fine-tuning optimizes only these low-rank matrices, leaving the rest of the model untouched.
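A minimal LoRA sketch using the Hugging Face peft library (the model name and hyperparameters are illustrative assumptions, not values from the article):

```python
# LoRA sketch with Hugging Face's peft: only the injected low-rank
# matrices are trained; the base weights stay frozen.
# Model name and hyperparameters below are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")

lora_cfg = LoraConfig(
    r=8,                # rank of the low-rank update matrices
    lora_alpha=16,      # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # typically a small fraction of all params
# ...train with your usual Trainer / training loop on domain data...
```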
Method 2: QLoRA (Quantized Low-Rank Adaptation)
QLoRA combines LoRA with quantization: the pre-trained model is quantized to low precision (such as 4-bit) before LoRA adapters are trained on top of it, with minimal loss in model accuracy.
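A minimal QLoRA sketch, assuming transformers with the bitsandbytes package installed (model name and settings are again placeholders):

```python
# QLoRA sketch: load the base model in 4-bit (NF4) precision, then
# attach LoRA adapters as above. Requires the bitsandbytes package;
# model name and settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, used by QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # higher-precision compute dtype
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B", quantization_config=bnb_cfg
)

model = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))
model.print_trainable_parameters()
```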

3. Reasoning Layer Injection (RAG)

Reasoning layer injection - the knowledge "real-time plug-in"
By dynamically searching external knowledge bases and splicing the latest information into the input prompt in real time, the large language model evolves from "reciting answers from memory" to "writing while consulting references". Its output becomes professional and accurate as well as natural and fluent, putting an end to "confident nonsense" (hallucinated replies).
(1) Core objectives
When generating an answer, the model dynamically searches an external knowledge base and splices the retrieved results into the input prompt in real time, which is equivalent to installing a "real-time plug-in" on the model.
(2) Implementation ideas
Reasoning layer injection (the knowledge "real-time plug-in") is what we commonly call RAG (retrieval-augmented generation): vectorize the user's question → search the knowledge base → return relevant fragments, then feed "question + retrieved results" into the large language model to generate the answer.
(3) RAG (Retrieval-Augmented Generation)
RAG combines information retrieval with text generation: the large language model retrieves relevant information from an external knowledge base in real time and splices it into the input prompt to produce more accurate and useful answers. Its three steps (sketched in code after this list) are:
  • Retrieve: capture information fragments from the external knowledge base that are highly relevant to the question, providing a real-time knowledge basis for generation.
  • Augment: splice the retrieved information into the input prompt, injecting external knowledge into the generative model to improve the professionalism and accuracy of the answer.
  • Generate: combine the retrieved information with the original question to produce a coherent, natural, and accurate answer.
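A minimal end-to-end sketch of retrieve → augment → generate. A toy in-memory corpus and bag-of-words cosine similarity stand in for a real embedding model and vector database; every document and name below is an illustrative assumption:

```python
# RAG sketch: retrieve -> augment -> generate. A toy in-memory corpus
# and bag-of-words cosine similarity replace a real embedding model and
# vector database; all documents and names are illustrative assumptions.
from collections import Counter
import math

from openai import OpenAI  # same OpenAI-compatible client assumed earlier

KNOWLEDGE_BASE = [
    "Article XX of the Civil Code: a contract is formed upon offer and acceptance.",
    "LoRA fine-tuning optimizes low-rank matrices instead of all model weights.",
    "QLoRA quantizes the base model to 4-bit before attaching LoRA adapters.",
]

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 2) -> list[str]:
    q = _vec(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: _cosine(q, _vec(d)), reverse=True)
    return ranked[:k]

def answer(question: str) -> str:
    fragments = retrieve(question)                      # 1. retrieve
    prompt = (                                          # 2. augment
        "Answer using only the reference fragments below.\n"
        + "\n".join(f"- {f}" for f in fragments)
        + f"\n\nQuestion: {question}"
    )
    resp = OpenAI().chat.completions.create(            # 3. generate
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(answer("How does QLoRA differ from LoRA?"))
```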