Fine-tuning DeepSeek-R1-32B on a single RTX 4090

Updated on: July 13, 2025
Recommendation
Fine-tune the DeepSeek-R1-Distill-Qwen-32B model on a single RTX 4090 to handle medical data efficiently.
Core content:
1. How to fine-tune the DeepSeek-R1-Distill-Qwen-32B model on a single RTX 4090
2. Using Unsloth and LoRA to keep GPU memory usage low enough for the fine-tuning to fit
3. Complete fine-tuning results and code, run on the Beilian cloud computing platform
Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)
- Total steps: 9,288 (see the quick check below)
- Total training epochs: 3.0
- Records per epoch: 24,772
- Training time: 28 hours, 28 minutes, 37 seconds (102,517.8411 seconds)
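These numbers follow directly from the batch configuration used below (per-device batch size 2, gradient accumulation 4, 3 epochs); a quick sanity check of the step count:

records_per_epoch = 24_772
effective_batch = 2 * 4                                   # per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = records_per_epoch // effective_batch    # 3,096 optimizer steps per epoch
print(steps_per_epoch * 3)                                # 9,288 total steps over 3 epochs, matching the trainer log
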
import wandb

# Log in to wandb.ai for experiment tracking
wandb.login(key="your wandb.ai API token")

# Initialize the wandb run
run = wandb.init(
    project='Lora-R1-Distill-Qwen on Medical COT Dataset',
    job_type="training",
    anonymous="allow",
)
####################################################################################################
# 1. Load the model
# Load the model with Unsloth's optimized FastLanguageModel
from unsloth import FastLanguageModel

max_seq_length = 4096   # maximum sequence length
dtype = None            # data type; None means automatic selection
load_in_4bit = True     # load the model with 4-bit quantization to save GPU memory

# Load the pre-trained model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    # model_name = "unsloth/DeepSeek-R1-Distill-Qwen-7B",
    model_name = "/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    local_files_only = True,   # avoid network access
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = hf_token,
)
print(model)
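To confirm that the 4-bit load really fits within the 4090's 24 GB, an optional check (not part of the original script) is to print the GPU memory reserved right after loading:

import torch
# Peak GPU memory reserved so far by PyTorch, in GiB
print(f"Reserved GPU memory: {torch.cuda.max_memory_reserved() / 1024**3:.2f} GB")
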
####################################################################################################
# 2. Define the prompt template and run one inference before fine-tuning
prompt_style = """Below are instructions describing the task, along with input that provides more context.
Please write a response that appropriately completes this request.
Before answering, think about the question carefully and create a step-by-step chain of thought to ensure your response is logical and accurate.
### Instruction:
You are a medical professional with expertise in clinical reasoning, diagnosis, and treatment planning.
Please answer the following medical question.
### Question:
{}
### Response:
<think>{}"""

train_prompt_style = prompt_style + """
</think>
{}"""
# Medical question for testing
question = "A 70-year-old male patient was admitted to the hospital for chest pain and vomiting for 16 hours. The electrocardiogram showed ST segment elevation of 0.1~0.3mV in the inferior wall leads and right chest leads. After fluid infusion, the blood pressure dropped to 80/60mmHg. The patient had symptoms of dyspnea and inability to lie flat. Physical examination revealed a large number of bubbling sounds in both lungs. In this case, what is the most appropriate drug treatment?"

# Set the model to inference mode
FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate an answer
outputs = model.generate(
    input_ids = inputs.input_ids,
    attention_mask = inputs.attention_mask,
    max_new_tokens = 1200,
    use_cache = True,
)
response = tokenizer.batch_decode(outputs)
print("### Model inference results before fine-tuning: ")
print(response[0].split("### Response:")[1])
####################################################################################################
# 3. Process the dataset
EOS_TOKEN = tokenizer.eos_token  # end-of-sequence token to append to each example

# Formatting function applied to examples in the dataset
def formatting_prompts_func(examples):
    # Extract the questions, chains of thought, and answers from the examples
    inputs  = examples["Question"]     # list of medical questions
    cots    = examples["Complex_CoT"]  # list of chain-of-thought reasoning
    outputs = examples["Response"]     # list of answers
    # Store the formatted texts
    texts = []
    # Combine each question, chain of thought, and answer into the training format
    for input, cot, output in zip(inputs, cots, outputs):
        # Fill the train_prompt_style template and append the end-of-sequence token
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    # Return a dictionary with the formatted text column
    return {
        "text": texts,
    }

# Load the dataset and apply the formatting
from datasets import load_dataset, load_from_disk

dataset = load_dataset(
    "json",  # the data is stored as JSON
    data_files="/datasets/FreedomIntelligence/medical-o1-reasoning-SFT/medical_o1_sft_Chinese.json",
    # split="train[0:500]",  # only take the first 500 records
    trust_remote_code=True,  # allow remote code behavior for compatibility
)
# If a DatasetDict is returned, take the "train" split
if isinstance(dataset, dict):
    dataset = dataset["train"]

dataset = dataset.map(formatting_prompts_func, batched=True)
print(dataset)  # inspect the dataset structure
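Before training, it is worth eyeballing one formatted record to confirm that the question, chain of thought, and answer landed in the right slots (an optional check, not in the original script):

print(dataset[0]["text"][:600])  # first part of one formatted training example
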
####################################################################################################
# 4. Configure training parameters and start training
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 8137,
    use_rslora = False,
    loftq_config = None,
)
print(model)
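With r = 32 applied to the seven projection modules of all 64 layers, the trainable parameter count reported later in the training log (268,435,456) can be reproduced by hand: each LoRA adapter adds r × (in_features + out_features) weights, using the dimensions visible in the print(model) output.

r, hidden, kv, ffn = 32, 5120, 1024, 27648
per_layer = (
    r * (hidden + hidden)      # q_proj
    + r * (hidden + kv)        # k_proj
    + r * (hidden + kv)        # v_proj
    + r * (hidden + hidden)    # o_proj
    + r * (hidden + ffn)       # gate_proj
    + r * (hidden + ffn)       # up_proj
    + r * (ffn + hidden)       # down_proj
)
print(per_layer * 64)          # 268,435,456 trainable LoRA parameters
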
# Configure training parameters and initialize the trainer
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Initialize the SFT trainer
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",   # name of the text field in the dataset
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,          # number of parallel processes for dataset preprocessing, to improve CPU utilization
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,                    # warmup steps: gradually ramp up the learning rate
        learning_rate = 2e-4,                # learning rate
        lr_scheduler_type = "linear",        # linear learning-rate scheduler
        # max_steps = 200,                   # maximum number of training steps (one step = one batch)
        fp16 = not is_bfloat16_supported(),  # use fp16 if bf16 is not supported
        bf16 = is_bfloat16_supported(),      # use bf16 if supported
        logging_steps = 10,                  # log every 10 steps
        optim = "adamw_8bit",                # 8-bit AdamW saves GPU memory with almost no impact on training quality
        weight_decay = 0.01,                 # weight-decay regularization to prevent overfitting
        seed = 8137,                         # random seed
        output_dir = "outputs",              # where to save checkpoints and training logs
        run_name = "medical-o1-sft-experiment",  # explicit wandb run name to avoid warnings
    ),
)
# Start training
print(f"trainer.args.max_steps: {trainer.args.max_steps}")
print(f"trainer.args.num_train_epochs: {trainer.args.num_train_epochs}")
trainer.train()
print(f"Total training steps: {trainer.state.max_steps}")
print(f"Total epochs: {trainer.state.epoch}")
####################################################################################################
# 5. Run inference with the fine-tuned model
FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate an answer
outputs = model.generate(
    input_ids = inputs.input_ids,            # input token id sequence
    attention_mask = inputs.attention_mask,  # attention mask marking valid input positions
    max_new_tokens = 1200,                   # maximum number of new tokens to generate
    use_cache = True,                        # use the KV cache to speed up generation
)
response = tokenizer.batch_decode(outputs)
print("### Model inference results after fine-tuning: ")
print(response[0].split("### Response:")[1])
####################################################################################################
# 6. Save the model
new_model_local = "DeepSeek-R1-Medical-COT-Qwen-32B"
model.save_pretrained(new_model_local)
tokenizer.save_pretrained(new_model_local)

# Save the merged 16-bit model
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit")

# Save as a GGUF model
# model.save_pretrained_gguf("DeepSeek-R1-Qwen-32B-Medical-COT-GGUF", tokenizer)
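Once merged, the saved directory can be loaded like any ordinary Hugging Face checkpoint for inference. A minimal sketch (assuming enough GPU/CPU memory for the 16-bit weights; device_map="auto" lets accelerate place them):

from transformers import AutoModelForCausalLM, AutoTokenizer
merged_dir = "DeepSeek-R1-Medical-COT-Qwen-32B"
tokenizer = AutoTokenizer.from_pretrained(merged_dir)
model = AutoModelForCausalLM.from_pretrained(merged_dir, device_map="auto", torch_dtype="auto")
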
The complete log is as follows:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Tracking run with wandb version 0.19.6
wandb: Run data is saved locally in /workspace/wandb/run-20250212_150918-mvocwedu
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run ruby-wind-2
wandb: View project at https://wandb.ai/xlxkming-none/Lora-R1-Distill-Qwen%20on%20Medical%20COT%20Dataset
wandb: View run at https://wandb.ai/xlxkming-none/Lora-R1-Distill-Qwen%20on%20Medical%20COT%20Dataset/runs/mvocwedu
Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth Zoo will now patch everything to make training faster!
INFO 02-12 15:09:30 __init__.py:190] Automatically detected platform cuda.
==((====))== Unsloth 2025.2.4: Fast Qwen2 patching. Transformers: 4.48.3.
   \\   /|    GPU: NVIDIA GeForce RTX 4090. Max memory: 23.65 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 8.9. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading checkpoint shards: 100%|██████████| 8/8 [00:16<00:00, 2.07s/it]
Unsloth 2025.2.4 patched 64 layers with 64 QKV layers, 64 O layers and 64 MLP layers.
/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B does not have a padding token! Will use pad_token = <|vision_pad|>.
Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(152064, 5120, padding_idx=151654)
    (layers): ModuleList(
      (0-63): 64 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear4bit(in_features=5120, out_features=5120, bias=True)
          (k_proj): Linear4bit(in_features=5120, out_features=1024, bias=True)
          (v_proj): Linear4bit(in_features=5120, out_features=1024, bias=True)
          (o_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear4bit(in_features=5120, out_features=27648, bias=False)
          (up_proj): Linear4bit(in_features=5120, out_features=27648, bias=False)
          (down_proj): Linear4bit(in_features=27648, out_features=5120, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((5120,), eps=1e-05)
        (post_attention_layernorm): Qwen2RMSNorm((5120,), eps=1e-05)
      )
    )
    (norm): Qwen2RMSNorm((5120,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=5120, out_features=152064, bias=False)
)

### Model inference results before fine-tuning:
<think>
Well, this problem looks a bit complicated, but I will analyze it step by step. First, I need to understand the patient's condition and test results, then determine the possible diagnosis, and finally choose the appropriate drug treatment. The patient is a 70-year-old male who came to the doctor because of chest pain and vomiting for 16 hours. The electrocardiogram shows ST elevation in the inferior leads and right chest leads, which may indicate myocardial infarction, especially inferior and right ventricular infarction. Because inferior myocardial infarction is usually related to right coronary artery obstruction, and ST elevation in the right chest leads may involve the right ventricle.
Next, the patient's blood pressure dropped to 80/60 mmHg after fluid infusion, which may mean hypotension, but the blood pressure dropped instead after fluid infusion, which may be because the heart function is impaired and cannot pump blood effectively, leading to cardiogenic shock. At the same time, the patient has difficulty breathing and cannot lie flat. Physical examination found a lot of bubbling sounds in both lungs, which may indicate pulmonary edema, especially cardiogenic pulmonary edema, because the heart cannot pump blood effectively, causing fluid to accumulate in the lungs.
Now, I need to determine the specific situation of the patient. Inferior and right ventricular infarction may cause a decrease in the heart's pumping function, especially right ventricular dysfunction, which affects the heart's output and causes hypotension and pulmonary edema. In this case, the patient's hemodynamic state may be unstable and requires emergency treatment.
Next, consider drug treatment. Typically, for myocardial infarction, we use antiplatelet drugs (such as aspirin), anticoagulants (such as heparin or ticagrelor), and beta-blockers, ACEIs, or ARBs to improve heart function and reduce heart workload. However, the patient now has low blood pressure, and ACEIs may not be suitable for use, because ACEIs may further lower blood pressure and cause hypotension to worsen. In addition, the patient has pulmonary edema and may need diuretics to reduce fluid accumulation in the lungs. However, diuretics may cause further reduction of blood volume, thereby aggravating hypotension, which may not be suitable for the current situation.
Considering the patient's hypotension and pulmonary edema, positive inotropic drugs such as dopamine or dobutamine may be needed to enhance cardiac contractility and improve cardiac output, thereby raising blood pressure and reducing pulmonary edema. At the same time, the use of other drugs may need to be adjusted to avoid further affecting blood pressure. In addition, the patient may need mechanical ventilation support, especially if the dyspnea is severe and cannot lie flat, and noninvasive ventilation or intubation may be required. But this may be beyond the scope of current drug treatment.
In summary, the patient's situation may involve an inferior and right ventricular myocardial infarction, resulting in cardiogenic shock and pulmonary edema. In this case, the most appropriate medical management may include the use of positive inotropes (such as dopamine or dobutamine) to improve cardiac function, while continuing antiplatelet and anticoagulant therapy, but carefully adjusting to avoid worsening hypotension. Diuretics may also be required to reduce pulmonary edema, but they need to be used under monitoring to prevent hypovolemia. Of course, specific situations may require further evaluation, such as cardiac ultrasound to determine the function of the right ventricle and the presence of mechanical complications, such as ventricular septal perforation or papillary muscle insufficiency. In addition, interventional treatment, such as coronary angiography and stenting, may be required to restore blood flow and improve cardiac function. But according to the problem, it is mainly in the medical management, so the focus should be on the use of positive inotropes and supportive care, while monitoring and adjusting the use of other drugs.
</think>
For this patient's condition, the most appropriate drug treatment is as follows:
1. **Antiplatelet and anticoagulant therapy**: Continue to use aspirin and clopidogrel (or ticagrelor), and give heparin anticoagulation to prevent further thrombosis.
2. **Positive inotropic drugs**: Use dopamine or dobutamine to enhance cardiac contractility, improve cardiac output, increase blood pressure, and reduce pulmonary edema.
3. **Diuretics**: Use diuretics (such as furosemide) under monitoring to reduce pulmonary edema, but be careful to avoid hypovolemia.
4. **Avoid the use of ACEI or ARB**: Due to the patient's low blood pressure, temporarily avoid the use of ACEI or ARB to prevent further lowering of blood pressure.
5. **Monitoring and supportive treatment**: Closely monitor the patient's vital signs, perform mechanical ventilation support if necessary, and consider interventional treatment (such as coronary angiography and stent implantation) to restore blood flow.
In summary, the focus of drug treatment is to use positive inotropic drugs and supportive therapy, while continuing antiplatelet and anticoagulant therapy to improve cardiac function and hemodynamic status.
<|end of sentence|>

Dataset({ features: ['Question', 'Complex_CoT', 'Response', 'text'], num_rows: 24772 })
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Qwen2ForCausalLM(
      (model): Qwen2Model(
        (embed_tokens): Embedding(152064, 5120, padding_idx=151654)
        (layers): ModuleList(
          (0-63): 64 x Qwen2DecoderLayer(
            (self_attn): Qwen2Attention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=5120, out_features=5120, bias=True)
                (lora_dropout): ModuleDict( (default): Identity() )
                (lora_A): ModuleDict( (default): Linear(in_features=5120, out_features=32, bias=False) )
                (lora_B): ModuleDict( (default): Linear(in_features=32, out_features=5120, bias=False) )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              . . .
            )
...
Training process and result log:
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
\\ /| Num examples = 24,772 | Num Epochs = 3
O^O/ \_/ \ Batch size per device = 2 | Gradient Accumulation steps = 4
\ / Total batch size = 8 | Total steps = 9,288
"-____-" Number of trainable parameters = 268,435,456
trainer.args.max_steps: -1
trainer.args.num_train_epochs: 3.0
{'loss': 2.149, 'grad_norm': 0.16384364664554596, 'learning_rate': 0.00019989227620381345, 'epoch': 0.0}
{'loss': 1.5362, 'grad_norm': 0.07211203873157501, 'learning_rate': 0.0001996768286114403, 'epoch': 0.01}
{'loss': 1.4647, 'grad_norm': 0.07446285337209702, 'learning_rate': 0.00019946138101906713, 'epoch': 0.01}
...
{'loss': 1.39, 'grad_norm': 0.08653779327869415, 'learning_rate': 0.0001977378002800819, 'epoch': 0.04}
...
{'loss': 1.2627, 'grad_norm': 0.1181635782122612, 'learning_rate': 0.00013590434126898633, 'epoch': 0.96}
...
{'loss': 1.1951, 'grad_norm': 0.11674296855926514, 'learning_rate': 0.00013224173219864268, 'epoch': 1.02}
...
{'loss': 1.071, 'grad_norm': 0.1962611824274063, 'learning_rate': 3.1843154152752344e-05, 'epoch': 2.52}
...
{'loss': 0.9945, 'grad_norm': 0.17683860659599304, 'learning_rate': 2.2794355273079824e-05, 'epoch': 2.66}
...
{'loss': 1.1104, 'grad_norm': 0.21208912134170532, 'learning_rate': 6.032532586448347e-07, 'epoch': 2.99}
{'loss': 1.0957, 'grad_norm': 0.2164667695760727, 'learning_rate': 3.8780566627167944e-07, 'epoch': 2.99}
{'loss': 1.101, 'grad_norm': 0.21290326118469238, 'learning_rate': 1.723580738985242e-07, 'epoch': 3.0}
100%|██████████| 9288/9288 [28:28:37<00:00, 11.04s/it]
{'train_runtime': 102517.8411, 'train_samples_per_second': 0.725, 'train_steps_per_second': 0.091, 'train_loss': 1.210533706973484, 'epoch': 3.0}
Total training steps: 9288
Total epochs: 2.999192636848054
### Model inference results after fine-tuning:
<think>The patient is a 70-year-old male who came to the hospital with chest pain and vomiting, which first made me think that it might be related to the heart. The electrocardiogram showed ST segment elevation in the inferior leads and right chest leads, which might mean an inferior myocardial infarction. Next, his blood pressure dropped to 80/60 mmHg after fluid infusion, which is very low, and he also had difficulty breathing, could not lie flat, and had bubbling sounds in his lungs, which made me suspect that he had cardiogenic shock or acute heart failure.
Cardiogenic shock and acute heart failure usually require rapid treatment because they are life-threatening. The first thing to think about is to stabilize his hemodynamic status. Usually, we use positive inotropic drugs in this situation, such as dobutamine, because it can increase myocardial contractility, increase cardiac output, and help improve symptoms of hypotension and dyspnea.
But wait, the patient has hypotension and pulmonary edema, which makes me think that it may not just be a simple cardiogenic shock, but also a fluid overload problem. If it is fluid overload, using a diuretic such as furosemide may be more effective because it can help remove the excess fluid, reduce pulmonary edema, and reduce the workload on the heart.
On second thought, the patient's hypotension is severe and he cannot lie flat, which may indicate that the heart is very poorly pumping. In this case, it may be more appropriate to use a positive inotropic drug to increase the force of the heart's contractions. Dobutamine can increase cardiac output while improving hypotension, which may be a better choice at this time.
Oh, by the way, the patient's electrocardiogram shows inferior ST-segment elevation, which may indicate right ventricular infarction. Right ventricular infarction may cause cardiogenic shock and requires special attention. In this case, it may be more appropriate to use dobutamine to enhance cardiac contractility and increase cardiac output.
In summary, considering the patient's severe hypotension, dyspnea, and the possibility of right ventricular infarction, it is most appropriate to use dobutamine to quickly improve the hemodynamic status. Well, this should be a wise choice.
</think>
In this case, the patient presented with hypotension, dyspnea, inability to lie flat, and bubbling sounds in the lungs, symptoms that suggest possible cardiogenic shock or acute heart failure. The electrocardiogram showed ST segment elevation in the inferior leads and right chest leads, suggesting possible inferior wall myocardial infarction, possibly involving the right ventricle.
For this situation, the most appropriate drug treatment is to use positive inotropic drugs to improve the heart's pumping function and increase cardiac output, thereby improving the symptoms of hypotension and dyspnea. Dobutamine is a commonly used positive inotropic drug that can increase myocardial contractility and cardiac output, while also dilating blood vessels to a certain extent, reducing cardiac afterload, and helping to improve the patient's hemodynamic state.
Therefore, considering the patient's current hemodynamic instability and possible right ventricular infarction, the use of dobutamine is a reasonable and necessary choice. This drug intervention can quickly help stabilize the patient's condition and buy time for subsequent treatment. <|end of sentence|>
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 303.83 out of 503.72 RAM for saving.
Unsloth: Saving model... This might take 5 minutes...
0%| | 0/64 [00:00<?, ?it/s]
We will save to Disk and not RAM now.
100%|██████████| 64/64 [01:34<00:00, 1.47s/it]
Unsloth: Saving tokenizer... Done.
Done.
wandb:
wandb: View run ruby-wind-2 at: https://wandb.ai/xlxkming-none/Lora-R1-Distill-Qwen%20on%20Medical%20COT%20Dataset/runs/mvocwedu
wandb: Find logs at: wandb/run-20250212_150918-mvocwedu/logs
Peak resource usage (LoRA rank 32):
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0   Off |                  Off |
| 76%   64C    P2           392W /  450W  | 24176MiB / 24564MiB    |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
I tested LoRA rank 8 earlier, and it does not use much less GPU memory than rank 32:
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0   Off |                  Off |
| 82%   65C    P2           394W /  450W  | 21246MiB / 24564MiB    |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+