RAG system development 01: Using rig to call Ollama models

Written by Clara Bennett
Updated on: July 16, 2025
Recommendation

Explore RAG system development in the Rust language ecosystem, from basic environment setup to model calling; this series will take you to a deeper understanding step by step.

Core content:
1. How to set up the Rust development environment
2. Use RsProxy to configure Rustup and crates.io mirrors
3. Install Ollama models and choose project development tools


This is a series of articles introducing how to develop a RAG system based on the Rust language ecosystem. This first article mainly introduces how to use rig [1] to call an Ollama model.

Project Preparation

Setting up your Rust development environment

It is recommended to use RsProxy to set up the Rust development environment. The steps are very simple:

1. Set the Rustup mirror by adding the following to ~/.zshrc or ~/.bashrc:
export RUSTUP_DIST_SERVER="https://rsproxy.cn"
export RUSTUP_UPDATE_ROOT="https://rsproxy.cn/rustup"
2. Install Rust (first complete the environment variable setup in step 1, then source the rc file or restart the terminal for it to take effect):
curl --proto '=https' --tlsv1.2 -sSf https://rsproxy.cn/rustup-init.sh | sh
3. Set the crates.io mirror by adding the following to ~/.cargo/config.toml:
[source.crates-io]
replace-with = 'rsproxy-sparse'
[source.rsproxy]
registry = "https://rsproxy.cn/crates.io-index"
[source.rsproxy-sparse]
registry = "sparse+https://rsproxy.cn/index/"
[registries.rsproxy]
index = "https://rsproxy.cn/crates.io-index"
[net]
git-fetch-with-cli = true
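
After completing these three steps, a quick sanity check confirms the toolchain is installed (the exact version numbers will differ depending on when you install):

rustc --version
cargo --version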

Install Ollama and download models

For detailed installation and usage, please refer to my previous article: Running deepseek-r1 locally, a concise LLM installation guide.
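
If you want to follow along with the rest of this article, you can pull the chat models used below in advance (these tags match the models referenced later; substitute any model you prefer):

ollama pull qwen2.5:latest
ollama pull deepseek-r1:latest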

Create a project

We recommend using VSCode as a development tool and installing the rust-analyzer plugin. Execute the following commands in the command-line terminal to create a Rust project and add the necessary crates:

cargo new fusion-rag
cd fusion-rag
cargo add rig-core --features derive
cargo add tokio --features full

Now that the project has been created, you can open it in VSCode:

code .

The default main.rs generated by Cargo should compile and run successfully.
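
For reference, the main.rs generated by cargo new is just a hello-world program:

fn main() {
    println!("Hello, world!");
}

Running cargo run should print "Hello, world!", which confirms that the project builds.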

Using rig-core

Accessing the Ollama API in OpenAI-compatible mode

Edit main.rs and modify it to the following code:

use rig::{completion::Prompt, providers};

#[tokio::main]
async fn main() -> Result<(), Box<dyn core::error::Error>> {
    let client = providers::openai::Client::from_url("ollama", "http://localhost:11434/v1");
    let v1 = client
        .agent("qwen2.5:latest") // or .agent("deepseek-r1:latest")
        // preamble sets the `system` part of the conversation, usually the prompt that establishes the chat context
        .preamble("You are an AI assistant. You are better at logical reasoning and conversations in Chinese and English.")
        .build();

    // prompt sets the `user` part of the conversation, i.e. the content of each turn
    let response = v1.prompt("Which is bigger, 1.1 or 1.11?").await?;
    println!("Answer: {}", response);
    Ok(())
}

Running the program will give you the following output:

Answer: In numerical comparison, when 1.1 and 1.11 are compared, it can be seen that 1.11 is greater than 1.1.

The specific mathematical comparison process is as follows:

- First, compare the first digit after the decimal point. In this case both are "1", so this digit is equal.
- Then, continue to the next digit, the second digit after the decimal point. For 1.1 there is no digit at this position, so we assume it is 0 (in practice it is usually padded with zeros), which means 1.1 is equivalent to 1.10. Comparing "1.10" and "1.11", it is clear that "1.11" is greater than "1.10".

So, the conclusion is: 1.11 is greater than 1.1.

Tip: Using the deepseek-r1:latest model can give you more detailed answers (including the thought process), but it requires more resources and the output will be longer. Readers can choose the model that suits them.

Implementing RAG with an Embedding Model

nomic-embed-text model

nomic-embed-text is a model specifically designed for generating text embeddings. Text embedding is the process of converting text data into vector representations; these vectors capture the semantic information of the text and are very useful in many natural language processing tasks, such as information retrieval (finding documents semantically similar to a query), text classification, and cluster analysis. You can download this model using the following command:

ollama pull nomic-embed-text

Add crate dependencies

cargo add serde

Implementing RAG logic

Edit main.rs and update it to the following code:

use rig::{
    completion::Prompt, embeddings::EmbeddingsBuilder, providers,
    vector_store::in_memory_store::InMemoryVectorStore, Embed,
};
use serde::Serialize;

// Data that needs to be RAG processed. A vector search must be performed on the `definitions` field,
// so we mark that field with `#[embed]` and derive the `Embed` trait for `WordDefinition`.
#[derive(Embed, Serialize, Clone, Debug, Eq, PartialEq, Default)]
struct WordDefinition {
    id: String,
    word: String,
    #[embed]
    definitions: Vec<String>,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn core::error::Error>> {
    const MODEL_NAME: &str = "qwen2.5";
    const EMBEDDING_MODEL: &str = "nomic-embed-text";
    let client = providers::openai::Client::from_url("ollama", "http://localhost:11434/v1");
    let embedding_model = client.embedding_model(EMBEDDING_MODEL);

    // Generate embedding vectors for all document definitions using the specified embedding model
    let embeddings = EmbeddingsBuilder::new(embedding_model.clone())
        .documents(vec![
            WordDefinition {
                id: "doc0".to_string(),
                word: "flurbo".to_string(),
                definitions: vec![
                    "1. *flurbo* (noun): A flurbo is a green alien that lives on a cold planet.".to_string(),
                    "2. *flurbo* (noun): A fictional digital currency that originated from the animated series Rick and Morty.".to_string(),
                ],
            },
            WordDefinition {
                id: "doc1".to_string(),
                word: "glarb glarb".to_string(),
                definitions: vec![
                    "1. *glarb glarb* (noun): glarb glarb is an ancient tool used by the ancestors of the inhabitants of the planet Jiro to cultivate the land.".to_string(),
                    "2. *glarb glarb* (noun): a fictional creature found in a remote swamp on the planet Glibbo in the Andromeda Galaxy.".to_string(),
                ],
            },
        ])?
        .build()
        .await?;

    // Create a vector store from these embeddings
    let vector_store = InMemoryVectorStore::from_documents(embeddings);

    // Create a vector store index
    let index = vector_store.index(embedding_model);

    let rag_agent = client
        .agent(MODEL_NAME)
        .preamble(
            "You are a dictionary assistant, helping users understand the meaning of words.
            Below you will find additional non-standard word definitions that may be useful.",
        )
        .dynamic_context(1, index)
        .build();

    // Prompt and print the response
    let response = rag_agent.prompt("What does \"glarb glarb\" mean?").await?;
    println!("{}", response);

    Ok(())
}

Run the program to see the effect; you should get output like the following:

cargo run -q
In the definition given, "glarb glarb" has the following two meanings:

1. **Noun**: This is an ancient tool used by the ancestors of the inhabitants of the planet Jiro to cultivate the land.
2. **Noun**: A fictional creature found in a remote swamp on the planet Glibbo in the Andromeda Galaxy.

Note that this is based on the document definition provided, and "glarb glarb" may be two different nouns with different meanings and contexts.

When we comment out the .dynamic_context(1, index) line and run it again, the output is as follows:

cargo run -q
Sorry, "glarb glarb" is not a known word or expression with no clear meaning in the standard language. It could be a typo or a made-up phrase for a specific situation. More context is needed to determine the exact meaning. If you see this phrase in a game, book, or special community, you may need to refer to the rules or explanation in that context.

As you can see, the .dynamic_context(1, index) call searches the vector store for the documents most similar to the input prompt and adds them to the prompt, thus achieving the RAG (Retrieval-Augmented Generation) effect.
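
As a minimal sketch based on the example above, the first argument to dynamic_context controls how many of the most similar documents are retrieved; raising it to 2 would pull both sample definitions into the prompt:

let rag_agent = client
    .agent(MODEL_NAME)
    .preamble("You are a dictionary assistant ...")
    // retrieve the top 2 most similar documents instead of 1
    .dynamic_context(2, index)
    .build();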

Summary

This article briefly introduced how to use the rig-core library to implement RAG with Ollama's local models. This is a basic example; actual applications may require adjustments and extensions based on your requirements. More detailed introductions and examples will follow, such as document parsing (PDF, Word, Excel, PPT) and persistent data storage. Please stay tuned.