RAG Frontier Progress: Multi-Abstraction-Level Chunking and the Alignment-Oriented ARM Retrieval Method

This post surveys recent progress in RAG, covering a chunking scheme built on multiple abstraction levels of granularity and the alignment-oriented ARM retrieval method.
Core content:
1. The challenge of chunk segmentation in RAG
2. A detailed look at the multi-abstraction-level chunk granularity scheme
3. The ARM scheme and its results on real question-answering datasets
Continuing our look at RAG, this post covers a RAG scheme that introduces chunks at multiple levels of granularity, mainly to address the chunk-segmentation problem, and the ARM scheme, which introduces an alignment mechanism to address the question-decomposition problem. Both are well worth reading.
Specialization and systematization lead to deeper thinking. Let's dig in together.
1. A RAG solution with multiple abstraction levels of chunk granularity
Existing RAG methods usually rely on fixed-length chunks to support question answering, which can fragment information and leave it incomplete; retrieving too much information, on the other hand, can cause the "lost in the middle" problem and exceed the token limit. The common workaround is chunk optimization: strategies such as fixed-size chunking, recursive chunking, and sliding windows.
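To make the sliding-window strategy concrete, here is a minimal sketch; the chunk size and overlap values are illustrative, not taken from any of the papers discussed here:

```python
# Minimal sketch of fixed-size chunking with a sliding window (overlap),
# one of the common chunk-optimization strategies mentioned above.
# chunk_size and overlap are illustrative values, not from any paper.
def sliding_window_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if not window:
            break
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```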
There is also a newer approach that plays with abstraction levels; see the recent work "Multiple Abstraction Level Retrieve Augment Generation" (https://arxiv.org/pdf/2501.16952).
The core idea is as follows:
In terms of indexing, the reference documents are first preprocessed into chunks at four abstraction levels: document-level chunks, section-level chunks, paragraph-level chunks, and multi-sentence-level chunks.
For the document-level and section-level chunks, a map-reduce method generates summary information to reduce length and sharpen focus: a summary is first generated for each paragraph, these paragraph summaries are aggregated into section-level summaries, and the section-level summaries are in turn aggregated into a document-level summary. The summaries become the content of the section-level and document-level chunks, while the lower-level chunks use the original text extracted from the document directly.
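A minimal sketch of this map-reduce indexing step might look like the following, where `summarize` is a naive placeholder for an LLM summarization call and the nested document structure is an assumed representation (neither is from the paper):

```python
def summarize(text: str) -> str:
    # Placeholder: a real pipeline would call an LLM here. As a naive,
    # runnable stand-in, keep just the first sentence.
    return text.split(". ")[0]

def build_chunks(document: dict) -> dict:
    """document = {"title": str, "sections": [{"paragraphs": [str, ...]}, ...]}"""
    section_chunks, paragraph_chunks = [], []
    for section in document["sections"]:
        # Map: summarize each paragraph of the section.
        para_summaries = [summarize(p) for p in section["paragraphs"]]
        paragraph_chunks.extend(section["paragraphs"])
        # Reduce: aggregate paragraph summaries into a section-level summary.
        section_chunks.append(summarize("\n".join(para_summaries)))
    # Reduce again: aggregate section summaries into a document-level summary.
    document_chunk = summarize("\n".join(section_chunks))
    return {
        "document": [document_chunk],
        "section": section_chunks,
        "paragraph": paragraph_chunks,
        # Multi-sentence chunks would come from further splitting paragraphs.
    }
```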
In terms of retrieval, the Linq-Embed-Mistral embedding model generates embedding vectors for the question and the text chunks, and similarity is computed via cosine similarity. The similarity scores are converted into probabilities with a softmax, and chunks are selected as long as their cumulative probability does not exceed a preset threshold.
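In code, the softmax-plus-cumulative-threshold selection could look roughly like this; the threshold value of 0.9 is illustrative, not the paper's setting:

```python
import numpy as np

# Sketch of the retrieval-side selection: cosine similarity -> softmax ->
# keep chunks while the cumulative probability stays under a threshold.
def select_chunks(question_emb: np.ndarray, chunk_embs: np.ndarray,
                  threshold: float = 0.9) -> list[int]:
    # Cosine similarity between the question and every chunk embedding.
    q = question_emb / np.linalg.norm(question_emb)
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    sims = c @ q
    # Convert similarities into a probability distribution via softmax.
    probs = np.exp(sims - sims.max())
    probs /= probs.sum()
    # Greedily take the highest-probability chunks while the cumulative
    # probability does not exceed the threshold.
    order = np.argsort(-probs)
    selected, cum = [], 0.0
    for i in order:
        if cum + probs[i] > threshold:
            break
        selected.append(int(i))
        cum += probs[i]
    return selected
```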
On the generation side, the retrieved text chunks are fed into the Vicuna-13B-v1.3 model together with the input question to generate the final answer.
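For completeness, here is a minimal sketch of assembling the generation prompt from the selected chunks; the template is an assumption, not the paper's exact prompt:

```python
# Hypothetical prompt assembly: retrieved chunks plus the question are
# concatenated into a single context-grounded prompt for the generator.
def build_prompt(question: str, chunks: list[str]) -> str:
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```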
In terms of effectiveness, it performed well on a question-answering dataset in the glycoscience domain, improving answer accuracy by 25.739% over the traditional single-level RAG method.
2. ARM: a retrieval scheme with an alignment mechanism
Also from recent work: existing LLM-based question answering (RAG) systems lack an understanding of the available data and how it is organized when decomposing questions, which hurts retrieval performance; and although iterative RAG methods do interact with the dataset, each step depends on the result of the previous one, which is inefficient and prone to reasoning drift.
With that in mind, consider another recent work, "Can we Retrieve Everything All at Once? ARM: An Alignment-Oriented LLM-based Retrieval Method" (https://arxiv.org/pdf/2501.18539), which proposes ARM, an alignment-oriented LLM-based retrieval method for the retrieval problem of complex open-domain questions.
Specifically, it includes the following steps:
First, tables and paragraphs (as well as data objects of other modalities, such as images) are treated uniformly as text data objects. Each serialized data object is split into chunks, an embedding is computed for each chunk, and an N-gram set is used for representation and indexing.
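A rough sketch of this serialization-and-indexing step, with an assumed object format and n = 3 for the N-grams (both assumptions, not the paper's settings):

```python
# Sketch of indexing serialized data objects by their N-grams.
def ngrams(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def index_objects(objects: list[dict]) -> dict:
    """objects: [{"id": str, "text": serialized table/paragraph}, ...] (assumed format)"""
    index = {}
    for obj in objects:
        for gram in ngrams(obj["text"].lower().split()):
            # Map each N-gram to the set of object ids that contain it.
            index.setdefault(gram, set()).add(obj["id"])
    return index
```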
Second, the LLM is guided to generate a reasoning process with multiple intermediate steps. The first step extracts keywords from the user question independently, to determine the key information needed to answer it. Then, through constrained decoding, these keywords are restated using N-grams that actually occur in the data objects: decoding a left bracket starts the alignment of a keyword, and decoding a right bracket marks its completion.
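Constrained decoding proper restricts the model's logits to valid continuations at each step; as a loose, purely illustrative stand-in, a string-similarity match against the indexed N-grams captures the same "keyword to corpus N-gram" idea:

```python
import difflib

# Illustrative approximation only: a string-similarity lookup stands in for
# constrained decoding against the model's logits.
def align_keyword(keyword: str, index: dict) -> str:
    candidates = [" ".join(gram) for gram in index]  # indexed N-grams as strings
    best = difflib.get_close_matches(keyword.lower(), candidates, n=1, cutoff=0.0)
    # Brackets mark the start and end of one aligned keyword, mirroring the
    # left/right bracket convention of the paper's constrained decoding.
    return f"({best[0]})" if best else f"({keyword})"
```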
Then, information alignment is performed: each decoded N-gram is used as a query to retrieve relevant text chunks from the dataset with the BM25 algorithm. The embedding similarity between the user question and each serialized object is then computed and combined with the BM25 score into a final ranking score. The most relevant objects are selected by this score to form a base set of retrieved objects, which serves as the basis for the LLM to continue generating the reasoning process.
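A sketch of the hybrid scoring, using the rank-bm25 package for the BM25 side; the mixing weight alpha and the max-normalization of BM25 scores are assumptions, not values from the paper:

```python
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Sketch of the hybrid ranking: BM25 scores from the aligned N-gram queries,
# combined with embedding similarity between the question and each object.
def rank_objects(ngram_queries, object_texts, question_emb, object_embs, alpha=0.5):
    bm25 = BM25Okapi([t.lower().split() for t in object_texts])
    # Sum BM25 scores over all aligned N-gram queries.
    bm25_scores = np.zeros(len(object_texts))
    for query in ngram_queries:
        bm25_scores += bm25.get_scores(query.lower().split())
    # Cosine similarity between the question and each serialized object.
    q = question_emb / np.linalg.norm(question_emb)
    o = object_embs / np.linalg.norm(object_embs, axis=1, keepdims=True)
    emb_scores = o @ q
    # Normalize BM25 to [0, 1] so the two signals are comparable, then mix.
    if bm25_scores.max() > 0:
        bm25_scores = bm25_scores / bm25_scores.max()
    return alpha * bm25_scores + (1 - alpha) * emb_scores
```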
Next, structural alignment is performed, reasoning about the complete set of retrieved objects and how they are organized, so as to match the required information and fully answer the question. The problem is formulated as a mixed-integer programming (MIP) problem: select k objects from the given list of retrieved objects so as to maximize both the relevance between the question and the selected objects and the compatibility among the selected objects.
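The MIP can be sketched with PuLP as follows, where `rel[i]` (question-object relevance) and `compat[i][j]` (pairwise compatibility) are assumed to come from the earlier scoring steps; this is a minimal sketch of the selection objective, not the paper's exact formulation:

```python
import pulp  # pip install pulp

# Sketch of the structural-alignment MIP: choose exactly k objects so that
# question-object relevance plus pairwise compatibility is maximized.
def select_objects(rel, compat, k):
    n = len(rel)
    prob = pulp.LpProblem("structural_alignment", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(n)]
    # y[i, j] = 1 iff both objects i and j are selected (linearized product).
    y = {(i, j): pulp.LpVariable(f"y_{i}_{j}", cat="Binary")
         for i in range(n) for j in range(i + 1, n)}
    prob += (pulp.lpSum(rel[i] * x[i] for i in range(n))
             + pulp.lpSum(compat[i][j] * y[i, j] for (i, j) in y))
    prob += pulp.lpSum(x) == k  # select exactly k objects
    for (i, j), yij in y.items():
        prob += yij <= x[i]
        prob += yij <= x[j]
        prob += yij >= x[i] + x[j] - 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [i for i in range(n) if x[i].value() == 1]
```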
Finally, each draft produced by the MIP solver is serialized into a string and injected into the LLM's decoding process via constrained decoding. Each draft contains the selected objects and the connections between them; during decoding, the LLM self-verifies whether the selected objects cover the different aspects of the question and are correctly connected, with constrained decoding ensuring factuality. For each draft, beam search generates multiple reasoning processes, and each object's votes are weighted by model confidence (such as the average of the logits). An object's final confidence is the weighted sum of its confidence-weighted votes and its normalized vote count, and the final set of data objects is selected by this confidence: objects with high confidence are more likely to be chosen, ensuring the quality and accuracy of the final answer.
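The vote aggregation at the end might look like this minimal sketch, where each draft carries a model-confidence score and the mixing weight `lam` is an assumption:

```python
from collections import defaultdict

# Sketch of the final vote aggregation: each beam-search draft votes for the
# objects it uses, weighted by model confidence (e.g. the mean of the logits).
def aggregate_confidence(drafts, lam=0.5):
    """drafts: [(confidence: float, objects: list[str]), ...] (assumed format)"""
    weighted, votes = defaultdict(float), defaultdict(int)
    for conf, objs in drafts:
        for obj in objs:
            weighted[obj] += conf   # confidence-weighted vote
            votes[obj] += 1         # raw vote count
    total_votes = sum(votes.values()) or 1
    # Final confidence: weighted sum of the confidence-weighted votes and
    # the normalized raw vote count, as described above.
    return {obj: lam * weighted[obj] + (1 - lam) * votes[obj] / total_votes
            for obj in weighted}
```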
That said, as the paper itself notes, while ARM performs well in most cases, in some extreme cases the model may forget information generated in earlier iterations, or get stuck in a loop of searching for similar keywords even after the relevant objects have been retrieved.
Summary
This post introduced two works on RAG: one is a RAG scheme that introduces multi-level chunk granularity, and the other is ARM, an alignment-oriented LLM-based retrieval method for complex open-domain questions. The core of both is the process design.