WeClone: Fine-tuning a large language model using WeChat chat logs

Written by Caleb Hayes
Updated on: June 19, 2025
Recommendation

Use personal WeChat chat records to fine-tune a large language model and create your own intelligent digital avatar.

Core content:
1. A full-pipeline solution from chat records to model fine-tuning
2. Chat-record training, high-quality voice cloning, and WeChat bot binding
3. A technical guide covering environment setup, data preparation, and personalized optimization

 
Recommended by Yang Fangxian, Founder of 53A and Most Valuable Expert of Tencent Cloud (TVP)

 

WeClone fine-tunes a large language model (LLM) on your personal WeChat chat records to create a unique digital avatar. It provides a full-pipeline solution: from chat data to model fine-tuning, from text generation to voice cloning, and from training to deployment, so that your digital avatar not only "speaks your words" but also "sounds like you".

 

Key Features

  • Chat-history training: fine-tune the large language model on WeChat chat history so that it imitates the way you speak.

  • High-quality voice cloning: a 0.5B-parameter model and a 5-second voice sample can generate voices with up to 95% similarity.

  • WeChat bot binding: connect your digital avatar to WeChat, with support for automatic text and voice replies.

  • Data preprocessing tools: scripts convert chat records into training data, filtering sensitive information by default.

  • Model personalization: LoRA fine-tuning makes the model match your language habits more closely.

Installation and Usage

Hardware requirements

Currently the project uses the chatglm3-6b model by default and fine-tunes it with LoRA in the SFT stage, which requires about 16GB of video memory. You can also use other models and methods supported by LLaMA Factory, which use less video memory, but you will need to modify the template's system prompt and other related configuration yourself.

Estimated video memory required:

| Method                          | Precision | 7B    | 14B   | 30B   | 70B    | xB     |
| ------------------------------- | --------- | ----- | ----- | ----- | ------ | ------ |
| Full (bf16 or fp16)             | 32        | 120GB | 240GB | 600GB | 1200GB | 18x GB |
| Full (pure_bf16)                | 16        | 60GB  | 120GB | 300GB | 600GB  | 8x GB  |
| Freeze/LoRA/GaLore/APOLLO/BAdam | 16        | 16GB  | 32GB  | 64GB  | 160GB  | 2x GB  |
| QLoRA                           | 8         | 10GB  | 20GB  | 40GB  | 80GB   | x GB   |
| QLoRA                           | 4         | 6GB   | 12GB  | 24GB  | 48GB   | x/2 GB |
| QLoRA                           | 2         | 4GB   | 8GB   | 16GB  | 24GB   | x/4 GB |
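The table's last column compresses to a rule of thumb: multiply the model size x (in billions of parameters) by a per-method factor. A minimal sketch of that arithmetic follows; note it is only a rough estimate (the per-size columns in the table are the authoritative numbers, and real usage also depends on sequence length, batch size, and activation overhead):

```python
# Rough VRAM estimate (GB) following the table's "xB" column,
# where x is the model size in billions of parameters.
FACTORS = {
    "full_fp32": 18,    # full fine-tuning, 32-bit
    "full_bf16": 8,     # full fine-tuning, pure_bf16
    "lora_16bit": 2,    # Freeze/LoRA/GaLore/APOLLO/BAdam
    "qlora_8bit": 1,
    "qlora_4bit": 0.5,
    "qlora_2bit": 0.25,
}

def estimate_vram_gb(params_billion: float, method: str) -> float:
    """Multiply the parameter count (in billions) by the method's factor."""
    return params_billion * FACTORS[method]

print(estimate_vram_gb(6, "lora_16bit"))  # chatglm3-6b with LoRA -> 12.0
```

The 12GB figure for chatglm3-6b is below the ~16GB the project reports in practice, which illustrates why these factors should be treated as a lower bound.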

Environment Construction

It is recommended to use uv, a very fast Python environment manager. After installing uv, use the following commands to create a new Python environment and install the dependencies. Note that this does not include the dependencies for the xcodec (audio cloning) feature:

git clone https://github.com/xming521/WeClone.git
cd WeClone
uv venv .venv --python=3.9
source .venv/bin/activate
uv pip install --group main -e . 

Notice

Training- and inference-related configurations are unified in the file settings.json
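As a rough illustration of what lives in settings.json, here is a hypothetical fragment. The keys largely mirror LLaMA Factory's training arguments, but treat the exact names and values below as assumptions, not the project's actual file:

```json
{
  "model_name_or_path": "./chatglm3-6b",
  "per_device_train_batch_size": 4,
  "gradient_accumulation_steps": 8,
  "num_train_epochs": 3,
  "lora_rank": 8,
  "lora_dropout": 0.1
}
```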

Data preparation

Please use PyWxDump to extract WeChat chat records. After downloading the software and decrypting the database, click Chat Backup and set the export type to CSV. You can export multiple contacts or group chats. Then place the exported folder wxdump_tmp/export under the ./data directory as ./data/csv, i.e. put the folders of different people's chat records together. The sample data is located in data/example_chat.csv.

Data preprocessing

By default, the project removes mobile phone numbers, ID numbers, email addresses, and website addresses from the data. It also provides a blocked-word list, blocked_words, to which you can add words and sentences that need to be filtered (by default, the entire sentence containing a blocked word is removed). Execute the script ./make_dataset/csv_to_json.py to process the data.
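The cleaning behavior described above can be sketched roughly as follows. The regex patterns and the blocked-word entries here are illustrative assumptions; the project's real rules live in its preprocessing script:

```python
import re
from typing import Optional

# Illustrative patterns -- not the project's actual ones.
PATTERNS = [
    re.compile(r"1[3-9]\d{9}"),              # mainland-China mobile numbers
    re.compile(r"\d{17}[\dXx]"),             # 18-digit ID numbers
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email addresses
    re.compile(r"https?://\S+"),             # website addresses
]
BLOCKED_WORDS = ["password"]  # hypothetical blocked_words entries

def clean_message(text: str) -> Optional[str]:
    """Return the cleaned message, or None if it contains a blocked word."""
    for word in BLOCKED_WORDS:
        if word in text:
            return None  # drop the entire sentence, as the default behavior does
    for pat in PATTERNS:
        text = pat.sub("", text)
    text = re.sub(r"\s{2,}", " ", text)  # collapse gaps left by removals
    return text.strip()

print(clean_message("reach me at foo@example.com"))  # -> reach me at
```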

When the same person answers with multiple sentences in a row, there are three ways to handle it:

| Script                                              | Processing                                        |
| --------------------------------------------------- | ------------------------------------------------- |
| csv_to_json.py                                      | Concatenate the sentences with commas             |
| csv_to_json - single sentence answer.py (deprecated) | Keep only the longest answer as the final data    |
| csv_to_json - single sentence multiple rounds.py    | Place earlier sentences in the prompt's 'history' |
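The three strategies above can be sketched as follows; the function names and the exact 'history' structure are illustrative, not the scripts' actual code:

```python
from typing import Dict, List

def join_with_commas(replies: List[str]) -> str:
    """csv_to_json.py: concatenate consecutive replies with commas."""
    return ",".join(replies)

def keep_longest(replies: List[str]) -> str:
    """Deprecated single-sentence variant: keep only the longest reply."""
    return max(replies, key=len)

def as_history(replies: List[str]) -> Dict[str, object]:
    """Multi-round variant: earlier replies go into the prompt 'history'."""
    return {"history": replies[:-1], "reply": replies[-1]}

replies = ["ok", "see you at eight", "bring the charger"]
print(join_with_commas(replies))  # ok,see you at eight,bring the charger
print(keep_longest(replies))      # see you at eight
print(as_history(replies))
```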

Model Download

First download the ChatGLM3 model from Hugging Face. If you encounter problems downloading from Hugging Face, you can use the ModelScope community via the method below; subsequent training and inference must then also set USE_MODELSCOPE_HUB=1 first in order to use the ModelScope model. Since the model is large, the download takes a while; please be patient.

export USE_MODELSCOPE_HUB=1 # Windows uses `set USE_MODELSCOPE_HUB=1`
git lfs install
git clone https://www.modelscope.cn/ZhipuAI/chatglm3-6b.git

The modeling_chatglm.py file from ModelScope needs to be replaced with the Hugging Face version.

Configure parameters and fine-tune the model

  • (Optional) Modify settings.json to select another locally downloaded model.

  • Modify per_device_train_batch_size and gradient_accumulation_steps to adjust video memory usage.

  • You can modify parameters such as num_train_epochs, lora_rank, and lora_dropout according to the quantity and quality of your own dataset.
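The two batch parameters above trade peak memory against step size: their product (times the GPU count) is the effective batch size, so halving one and doubling the other keeps training behavior roughly constant while lowering VRAM. A one-line sketch of that arithmetic:

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Effective batch size = per-device batch * accumulation steps * GPU count."""
    return per_device * grad_accum * num_gpus

# Halving per_device_train_batch_size and doubling
# gradient_accumulation_steps keeps the effective batch size constant.
print(effective_batch_size(4, 8))   # -> 32
print(effective_batch_size(2, 16))  # -> 32
```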

Single card training

Run src/train_sft.py for SFT-stage fine-tuning. My loss only dropped to about 3.5; reducing it too much may lead to overfitting. I used about 20,000 consolidated valid data entries.

python src/train_sft.py

Multi-card training

uv pip install deepspeed
deepspeed --num_gpus=<number of GPUs> src/train_sft.py

Hands-on practice

Scenario 1: Simple inference using the browser demo

python ./src/web_demo.py 

Scenario 2: Inference using the API

python ./src/api_service.py
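The AstrBot section later treats this service as an OpenAI-compatible endpoint (it mentions a base URL like http://172.17.0.1:8005/v1), so a plain HTTP client should be able to query it too. The host, port, and route below are assumptions based on that section; a sketch using only the standard library:

```python
import json
from urllib import request

# Assumed endpoint -- adjust host/port to match your deployment.
URL = "http://127.0.0.1:8005/v1/chat/completions"

def build_chat_request(prompt: str) -> request.Request:
    """Build an OpenAI-style chat-completion request for the local service."""
    payload = {
        "model": "gpt-3.5-turbo",  # placeholder name, as in the AstrBot setup
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("hello")
print(req.full_url, req.get_method())
# Sending it requires the service to be running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```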

Scenario 3: Testing with common chat questions

python ./src/api_service.py
python ./src/test_model.py

Scenario 4: Deploy to a chatbot

AstrBot Solution

AstrBot is an easy-to-use, multi-platform LLM chatbot and development framework. It supports QQ, QQ Channels, Telegram, WeChat, WeCom (Enterprise WeChat), and Feishu.

Steps:

  1. Deploy AstrBot

  2. Deploy the messaging platform in AstrBot

  3. Execute python ./src/api_service.py to start the API service

  4. Add a new service provider in AstrBot: select OpenAI as the type, fill in the API Base URL according to how AstrBot is deployed (for a Docker deployment it may be http://172.17.0.1:8005/v1), and enter gpt-3.5-turbo as the model name

  5. The model does not support tool calls after fine-tuning, so first turn off the default tools by sending the command /tool off reminder on the messaging platform; otherwise the fine-tuned effect will not appear.

  6. Set the system prompt in AstrBot according to the default_system value used during fine-tuning.