The best OCR is here? Introduction to Mistral AI OCR

Written by

Iris Vance

Updated on: July 8, 2025

Brief Introduction

Until recently, the best tool for recognizing mathematical formulas and converting them to code was Mathpix. It supports many format conversions, such as PDF or PNG to TeX, Markdown, and DOCX, and it handles both whole-PDF conversion and screenshot recognition. Its ecosystem covers the mobile Snip app, a desktop client, a web version, and a browser plug-in, which makes the user experience extremely smooth.

However, the free version has limited usage, and the paid subscription is expensive ($50 a year); in addition, the web version occasionally suffers from an unstable network connection.

Now Mistral OCR has appeared, billed as the world's most powerful OCR tool. It comes from Mistral AI, a French AI startup that can be thought of as the European counterpart of DeepSeek. The price is also very low: the OCR function can convert roughly a thousand pages of PDF for about 1 US dollar.

Usage Examples

File upload

Go directly to the chat interface on the official website (you may need to register an account with your mobile phone number):

https://chat.mistral.ai/chat

Upload the file in the dialog box, as you would with other large language models, and enter "Convert to markdown". For example, here I uploaded a paper related to DeepSeek-R1. After a few seconds you get the converted markdown code. You can then use a markdown editor such as Typora or Obsidian to view it or to convert it to PDF, DOCX, and other formats (you may need to install pandoc separately).
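If you prefer to script the last step instead of exporting from an editor, the sketch below shows one way to call pandoc from Python. It assumes pandoc is already installed and on your PATH; the function names build_pandoc_command and convert_markdown are my own, chosen only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(md_path: str, target_ext: str = "docx") -> list[str]:
    """Build the pandoc command line that converts md_path to the target format."""
    source = Path(md_path)
    # Write the output next to the input, swapping only the file extension.
    output = source.with_suffix("." + target_ext.lstrip("."))
    return ["pandoc", str(source), "-o", str(output)]


def convert_markdown(md_path: str, target_ext: str = "docx") -> Path:
    """Run pandoc on md_path and return the path of the converted file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH; install it first")
    command = build_pandoc_command(md_path, target_ext)
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(command, check=True)
    return Path(command[-1])
```

Calling convert_markdown("paper.md") should leave a paper.docx next to the markdown file. PDF output usually also needs a PDF engine such as a LaTeX distribution, so DOCX is the simpler target to try first.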

Web version effect display

For more demonstrations, see the official website:

https://mistral.ai/news/mistral-ocr

The original PDF file, followed by the converted result displayed in Typora (theme: Newsprint):

As you can see, the result is very good except for the pictures. If you want to extract the pictures and keep them in their corresponding positions, you need to use the method described in the next section.

Advanced Configuration

Besides processing files in the chat interface on the official website, you can also batch-process them through API calls. Thanks to @nicekate for providing Python code that calls the Mistral API from your local machine to process files; there is also a corresponding demonstration video on Bilibili.

GitHub repository address: https://github.com/nicekate/mistral-ocr
Bilibili video: [Testing Mistral OCR: The world's best document understanding model?] https://www.bilibili.com/video/BV1Bw92YiEEH

The configuration is simple: apply for your own API key, clone the repository above, and fill in the key.

Apply for an API key

Click "API Keys" in the menu bar on the left side of the console, then click "Create new key" in the upper right corner and copy it.

https://console.mistral.ai/home
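Once you have a key, it may be worth keeping it out of the source file so it is not committed or shared by accident. The sketch below reads it from an environment variable instead. The variable name MISTRAL_API_KEY and the helper load_api_key are assumptions of mine, not something the repository requires.

```python
import os


def load_api_key(env_var: str = "MISTRAL_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    # Strip whitespace so a key pasted with a trailing space or newline still works.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Mistral API key"
        )
    return key
```

With this in place, a script can use API_KEY = load_api_key() rather than a hard-coded string.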

Download Python code

First clone the above repository locally:

git clone https://github.com/nicekate/mistral-ocr.git

Then install the dependencies:

pip install mistralai

In pdf_ocr.py, just modify the API key and PDF file path (lines 72-73):

API_KEY = "Fill in your own api key"

PDF_PATH = "xxx.pdf"

Run the file, and in the same directory you will find an output folder named ocr_results_xxx, which contains the converted markdown files and image files.
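For readers who want to see roughly what a script like this does, the following is a minimal sketch of my own, not the repository's code. It assumes the mistralai SDK exposes files.upload, files.get_signed_url and ocr.process as described in Mistral's documentation, that each page of the response carries a markdown string plus a list of images with id and image_base64 fields, and that the markdown refers to each image by its id. The names decode_image and ocr_pdf_to_markdown and the output layout are hypothetical, and the details may differ between SDK versions.

```python
import base64
from pathlib import Path


def decode_image(image_base64: str) -> bytes:
    """Decode a base64 image string, dropping an optional data-URI prefix."""
    # Assumption: the string may look like "data:image/jpeg;base64,<payload>".
    if image_base64.startswith("data:") and "," in image_base64:
        image_base64 = image_base64.split(",", 1)[1]
    return base64.b64decode(image_base64)


def ocr_pdf_to_markdown(pdf_path: str, api_key: str, out_dir: str) -> Path:
    """Upload one PDF, run OCR on it, and save the markdown plus extracted images."""
    # Imported lazily so decode_image can be used without the SDK installed.
    from mistralai import Mistral

    client = Mistral(api_key=api_key)
    source = Path(pdf_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Upload the file, then request a temporary URL the OCR endpoint can read.
    uploaded = client.files.upload(
        file={"file_name": source.name, "content": source.read_bytes()},
        purpose="ocr",
    )
    signed = client.files.get_signed_url(file_id=uploaded.id)

    response = client.ocr.process(
        model="mistral-ocr-latest",
        document={"type": "document_url", "document_url": signed.url},
        include_image_base64=True,
    )

    pages = []
    for page in response.pages:
        # Save each image under its id so the links in the markdown resolve.
        for image in page.images:
            (out / image.id).write_bytes(decode_image(image.image_base64))
        pages.append(page.markdown)

    md_file = out / f"{source.stem}.md"
    md_file.write_text("\n\n".join(pages), encoding="utf-8")
    return md_file
```

Requesting the images as base64 in the same response is what lets the pictures be written to disk and stay in their original positions in the markdown, which the chat interface does not do.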

Local conversion effect display

Theme: Typora GitHub

Batch Processing

Sometimes I need to convert several PDF files at once, so I added a batch processing function on top of this script, which converts all PDF files in one directory.

The specific modifications are as follows:

Added a get_pdf_files_in_directory function that scans a specified folder and returns the paths of all PDF files in it.
In the __main__ block, the PDF files are now collected automatically from the folder instead of the path being specified manually.
If the folder contains no PDF files, the user is notified.

The following is the newly added code snippet:

import os


# process_pdf() is assumed to be the single-file function from the repository's pdf_ocr.py.
def process_pdfs(pdf_paths: list, api_key: str) -> None:
    """Run OCR on each PDF in turn, continuing with the next file if one fails."""
    for pdf_path in pdf_paths:
        try:
            output_dir = process_pdf(pdf_path, api_key)
            print(f"File {pdf_path} processed, results saved in: {output_dir}")
        except Exception as e:
            print(f"Error processing file {pdf_path}: {e}")


def get_pdf_files_in_directory(directory: str) -> list:
    """Get all PDF file paths in the specified directory"""
    pdf_files = []
    for file in os.listdir(directory):
        # Compare in lower case so files named ".PDF" are also found.
        if file.lower().endswith(".pdf"):
            pdf_files.append(os.path.join(directory, file))
    return pdf_files


if __name__ == "__main__":
    # Example
    API_KEY = "your_mistral_api_key"
    DIRECTORY = "your_pdf_file"  # Name of the folder containing the PDF files

    # Get all PDF files in the folder
    PDF_PATHS = get_pdf_files_in_directory(DIRECTORY)
    if not PDF_PATHS:
        print(f"No PDF file was found in directory {DIRECTORY}.")
    else:
        process_pdfs(PDF_PATHS, API_KEY)
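If your PDFs are spread over subfolders, the directory scan can be extended a little. The variant below is only a sketch using pathlib; the name find_pdfs and the recursive flag are hypothetical and not part of the repository. It can descend into subfolders, ignores the case of the extension, and returns the paths in sorted order so repeated runs process files in the same sequence.

```python
from pathlib import Path


def find_pdfs(directory: str, recursive: bool = False) -> list[str]:
    """Return sorted paths of PDF files in directory, optionally including subfolders."""
    root = Path(directory)
    # rglob("*") walks the whole tree; iterdir() lists only the top level.
    candidates = root.rglob("*") if recursive else root.iterdir()
    # Compare the suffix in lower case so "REPORT.PDF" is picked up as well.
    return sorted(
        str(path)
        for path in candidates
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

The result is a plain list of strings, so it can be passed to process_pdfs in place of the output of get_pdf_files_in_directory.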