Large Model RAG Practice | Generate Answers with Citation Sources

Written by
Clara Bennett
Updated on: June 28, 2025
Recommendation

Master enterprise-grade retrieval-augmented generation techniques to improve the credibility and transparency of generated content.

Core content:
1. The importance and core ideas of RAG technology in enterprise-level applications
2. Implementation steps: from structured search to user interface integration
3. Improving transparency: strategies for inline citations and dynamic management of citation lists

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)

In enterprise application scenarios, we typically use Retrieval-Augmented Generation (RAG), which requires the large language model (LLM) to generate content strictly from information retrieved from the knowledge base, without fabricating anything.


Accurately marking sources inside the LLM-generated content, and listing all cited sources afterwards, greatly improves the credibility and transparency of the answer.


This article will introduce a "hybrid solution".


1. Core Idea


As the LLM generates content, it marks sources synchronously with inline citations. The generated content itself does not include a list of citation sources; instead, the system builds and displays that list dynamically. This hybrid approach is more flexible and keeps the management of citation sources under the system's control.


  • Structured content management: first, store the retrieved content and its metadata (such as title, author, etc.) in a structured format to ensure accurate and easy referencing.

  • Inline citation generation: through the prompt, instruct the LLM to generate answers with inline citations such as [1], [2], each pointing to a data source.

  • Dynamic reference-list management: the citation list is not generated directly by the LLM; it is built dynamically from the citation tags found in the answer.


This way, the list of citation sources can be flexibly displayed in a variety of ways, such as a collapsible list, a tooltip, or a direct link to the original document.


2. Implementation steps


We can generate answers with citation sources in the following four steps: structuring the retrieved content, prompt engineering, dynamic management of the citation list, and user interface integration.


1. Structure the retrieved content


Use a retrieval system (such as vector search or keyword search) to obtain document fragments (chunks) relevant to the user's question. Make sure each fragment carries the metadata of its original document.


Save the retrieved content in a structured format. Each piece of content should contain the following fields:

  • ID: unique identifier, such as 1, 2, 3

  • Content: Precise snippet relevant to the user’s question

  • Metadata: including the original document’s title, author, date, URL, etc.


Example JSON data structure:

[
  {
    "id": 1,
    "content": "The sun generates energy through nuclear fusion in its core.",
    "metadata": {
      "title": "The principle of nuclear fusion in the sun",
      "author": "NASA",
      "url": "https://www.nasa.gov/sun"
    }
  },
  {
    "id": 2,
    "content": "The sun is mainly composed of hydrogen and helium.",
    "metadata": {
      "title": "Composition of the sun",
      "author": "Wikipedia",
      "url": "https://zh.wikipedia.org/wiki/太阳"
    }
  }
]


If multiple snippets come from the same source, it is recommended to merge them. This reduces duplicate citations and provides more complete context.
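A minimal sketch of this merging step, assuming the chunk structure shown above and treating the source URL as the identity of a source (chunk IDs are reassigned after merging):

```python
from collections import OrderedDict

def merge_chunks_by_source(chunks):
    """Merge retrieved chunks that share the same source URL.

    Each chunk is a dict with "content" and "metadata" keys, as in
    the JSON example above; merged chunks are renumbered 1..n.
    """
    merged = OrderedDict()
    for chunk in chunks:
        key = chunk["metadata"]["url"]  # assumption: URL identifies the source
        if key in merged:
            # Same source: append the snippet to the existing entry
            merged[key]["content"] += " " + chunk["content"]
        else:
            merged[key] = {"content": chunk["content"],
                           "metadata": chunk["metadata"]}
    # Reassign sequential IDs so citations stay 1, 2, 3, ...
    return [{"id": i, **entry} for i, entry in enumerate(merged.values(), start=1)]
```

This keeps each source's snippets together, so a single citation number covers all of them.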


2. Prompt engineering


Provide a clear prompt instructing the LLM to generate answers with inline citations and not to generate a list of citation sources.


Example Prompt:


You are an intelligent assistant. Please answer the user's question based on the information retrieved below.
Add inline citations to your answer in the format [n], where n is the number of the source, for example: [1], [2], or [1][2], [1][3][4].
Please do not generate a list of cited sources.

Retrieved information:
1. "The sun generates energy through nuclear fusion in its core." (Source: "The principle of nuclear fusion in the sun", NASA, https://www.nasa.gov/sun)
2. "The sun is mainly composed of hydrogen and helium." (Source: "The composition of the sun", Wikipedia, https://zh.wikipedia.org/wiki/太阳)

User question: What is the sun made of? How does it generate energy?

Answer:


Please note that this is just an example. Different LLMs differ in comprehension and generation ability, so tune the prompt for the model you are using until it stably produces the expected output.
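As a sketch, the prompt above can be assembled programmatically from the structured chunks of step 1 (the instruction wording here is an assumption to be tuned per model):

```python
def build_prompt(chunks, question):
    """Assemble a citation-aware prompt from structured chunks.

    Each chunk is a dict with "id", "content", and "metadata" keys,
    as in the JSON example above.
    """
    lines = [
        "You are an intelligent assistant. Please answer the user's question "
        "based on the information retrieved below. Add inline citations in the "
        "format [n], where n is the number of the source. "
        "Do not generate a list of cited sources.",
        "",
        "Retrieved information:",
    ]
    for chunk in chunks:
        meta = chunk["metadata"]
        # Number each snippet so the model can cite it as [n]
        lines.append(f'{chunk["id"]}. "{chunk["content"]}" '
                     f'(Source: "{meta["title"]}", {meta["author"]}, {meta["url"]})')
    lines += ["", f"User question: {question}", "Answer:"]
    return "\n".join(lines)
```

Building the prompt from the same structured data that later drives the reference list keeps citation numbers and sources guaranteed to match.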


3. Dynamically manage reference lists


In the first step, we have stored the retrieved content and metadata as structured data to facilitate the subsequent dynamic generation of reference lists.


When LLM generates an answer, it does not use all the retrieved information, but only the information relevant to the user's question. Therefore, we need to parse the inline reference tags (such as [1], [2]) in the LLM-generated content and extract the corresponding reference information from the previously stored metadata.
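The snippets below assume a `metadata` lookup keyed by the citation number as a string (matching what the regex extracts). Under that assumption, it can be built from the structured data of step 1:

```python
def build_metadata_index(chunks):
    """Map each chunk ID (as a string, to match regex output) to its metadata.

    Each chunk is a dict with "id" and "metadata" keys, as in the
    JSON example from step 1.
    """
    return {str(chunk["id"]): chunk["metadata"] for chunk in chunks}
```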


Extract inline citation tags

import re

citations = re.findall(r"\[(\d+)\]", answer)


Dynamically generating reference lists

citation_list = []
for citation in sorted(set(citations), key=int):  # numeric sort, not lexicographic
    source = metadata.get(citation, {})
    if source:
        citation_list.append(
            f"{citation}. [{source['title']}]({source['url']}) - {source['author']}"
        )


Output the reference list

print("Sources:")
print("\n".join(citation_list))


Sample output:

Sources:
1. [The principle of nuclear fusion in the sun](https://www.nasa.gov/sun) - NASA
2. [The composition of the sun](https://zh.wikipedia.org/wiki/太阳) - Wikipedia


4. User Interface Integration


The LLM-generated answer is streamed to the user as it is produced, with inline citations such as [1], [2] embedded in the output text.


After the answer is output, the reference list is dynamically extracted and generated based on the reference tags in the output content, and can be displayed in the following ways:


  • Expandable list: display the full list of citations below the answer

  • Tooltip: display source information when the user hovers over or clicks an inline citation

  • Direct link: an inline citation jumps directly to the corresponding source page
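As one sketch of the tooltip and direct-link options, the inline markers can be rewritten into HTML anchors whose `title` attribute the browser shows on hover (assuming the string-keyed `metadata` dict from the previous step):

```python
import html
import re

def render_answer_html(answer, metadata):
    """Replace [n] markers with <a> links carrying a title attribute.

    `metadata` maps citation numbers (as strings) to metadata dicts with
    "title" and "url" keys. Markers with no matching source are left as-is.
    """
    def to_link(match):
        n = match.group(1)
        source = metadata.get(n)
        if not source:
            return match.group(0)  # unknown marker: leave untouched
        # href gives the direct link; title gives a hover tooltip
        return (f'<a href="{html.escape(source["url"])}" '
                f'title="{html.escape(source["title"])}">[{n}]</a>')

    return re.sub(r"\[(\d+)\]", to_link, answer)
```

A richer tooltip (author, snippet preview) would typically be rendered client-side from the same structured metadata.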


UI Example:

Answer: The sun is mainly composed of hydrogen and helium[2]. It generates energy through nuclear fusion in its core[1].

[Expand citation sources]
Sources:
1. [The principle of nuclear fusion in the sun](https://www.nasa.gov/sun) - NASA
2. [The composition of the sun](https://zh.wikipedia.org/wiki/太阳) - Wikipedia


3. Continuous Optimization


With the scheme above, we can automatically generate answers with cited sources by combining structured processing of retrieved data, prompt engineering, and the LLM's comprehension and generation capabilities.


Combined with the application scenario, we can further make the following optimizations:


  • Ensure citation accuracy: Verify the consistency of inline citation markers and reference lists to avoid omissions or errors.

  • Manage long answers: for answers with many citations, display the most important references first, and collapse or list the rest separately.

  • Improve user experience: Provide multiple ways to display citations, such as tooltips or links, based on user needs.
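As a sketch of the first optimization, a small check can confirm that every inline marker in the answer maps to a known source (assuming the same string-keyed `metadata` dict as above):

```python
import re

def validate_citations(answer, metadata):
    """Return the set of inline markers with no matching metadata entry.

    An empty set means every citation in the answer is accounted for;
    a non-empty set flags markers the LLM invented or mis-numbered.
    """
    cited = set(re.findall(r"\[(\d+)\]", answer))
    return cited - set(metadata.keys())
```

Running this check before rendering lets the system strip or flag dangling markers instead of showing a citation that points nowhere.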


Using this solution, we can efficiently generate answers with reference sources, which not only ensures the transparency of the generation process, but also provides a flexible user interaction method. This implementation method is simple and easy to use and can be widely used in scenarios such as question-answering systems and knowledge base assistants.