
2025 iThome 鐵人賽

DAY 10
Generative AI

LLM - Zero to Hero or Give Up? Series, Part 10

【Day10】Hands-On Fine-Tuning a Model with Unsloth


Introduction

Yesterday's post introduced Unsloth, a tool for quickly fine-tuning LLMs, together with the trade-offs of its methods (LoRA and QLoRA). Today we put it into practice: prepare a simple dataset, fine-tune llama3.2:3b with it, then convert the trained model into a format Ollama can use and run it locally.

Walkthrough

How much memory does fine-tuning actually require?

Model parameters   QLoRA (4-bit) VRAM   LoRA (16-bit) VRAM
3B                 3.5 GB               8 GB
7B                 5 GB                 19 GB
8B                 6 GB                 22 GB
9B                 6.5 GB               24 GB
11B                7.5 GB               29 GB
14B                8.5 GB               33 GB
27B                22 GB                64 GB
32B                26 GB                76 GB
40B                30 GB                96 GB
70B                41 GB                164 GB
81B                48 GB                192 GB
90B                53 GB                212 GB
405B               237 GB               950 GB

Table source: the Unsloth official site.

Starting the fine-tune

  • Model: llama3.2:3b
  • Fine-tuning method: QLoRA (chosen for speed; 4-bit precision cuts memory usage, but can also hurt model quality)
  • Dataset: python-docs-zh-tw

Dataset preparation

  • data.json has been uploaded to GitHub and can be downloaded directly
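Once downloaded, the file can be loaded with the Hugging Face datasets library. A minimal sketch; the exact record structure inside data.json is an assumption here:

from datasets import load_dataset

# Load the prepared JSON file as a training split.
dataset = load_dataset("json", data_files="data.json", split="train")

# Inspect one record before formatting; the field names inside data.json
# (e.g. a "conversations" list) are assumptions about this dataset.
print(dataset[0])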

Code excerpts

Loading the model

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!

    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",

    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit" # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
  • Setting load_in_4bit = True loads the model in 4-bit precision to reduce memory usage

Configuring the PEFT (LoRA) parameters

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
  • Wraps the base model with a LoRA adapter
  • The settings follow the original Unsloth template; nothing was tuned specially (a quick sanity check is sketched below)
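As a quick sanity check (not in the original notebook; the method comes from the underlying PEFT model that get_peft_model returns), you can confirm that only the adapter weights are trainable:

# Prints trainable vs. total parameter counts; with r = 16 only a small
# fraction of the 3B base weights should be marked trainable.
model.print_trainable_parameters()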

Dataset format

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 July 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

**Iterators terminating on the shortest input sequence:**<|eot_id|><|start_header_id|>assistant<|end_header_id|>

**在最短輸入序列 (shortest input sequence) 處終止的疊代器:**<|eot_id|>
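This template does not have to be assembled by hand. Below is a sketch of how the "text" field consumed by the trainer can be produced with Unsloth's chat-template helper, following the standard Unsloth notebook pattern; the "conversations" field name is an assumption about how data.json stores its role/content pairs:

from unsloth.chat_templates import get_chat_template

# Attach the Llama 3.1 chat template (also used by Llama 3.2) to the tokenizer.
tokenizer = get_chat_template(tokenizer, chat_template = "llama-3.1")

def formatting_prompts_func(examples):
    # "conversations" is assumed to hold lists of {"role": ..., "content": ...} messages.
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False)
             for convo in convos]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched = True)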

Training configuration

from trl import SFTConfig, SFTTrainer
from transformers import DataCollatorForSeq2Seq
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    packing = False, # Can make training 5x faster for short sequences.
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)
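Training is then launched the usual way; with max_steps = 60 this is only a short demonstration run rather than a full epoch:

# Start fine-tuning; returns runtime and loss statistics.
trainer_stats = trainer.train()
print(trainer_stats.metrics)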


Converting the trained model to GGUF

For reasons I haven't pinned down, the llama.cpp bundled with Unsloth fails when converting to GGUF; I had to clone llama.cpp myself and build it locally before the conversion would succeed. The bundled convert_hf_to_gguf.py also came out malformed, so I downloaded the latest convert_hf_to_gguf.py manually and used that instead.

# Clone and build llama.cpp locally (Colab shell cells, hence the leading "!").
!git clone --recursive https://github.com/ggerganov/llama.cpp
!(cd llama.cpp; cmake -B build; cmake --build build --config Release)
# Replace the bundled conversion script with the latest upstream version.
!wget https://raw.githubusercontent.com/ggml-org/llama.cpp/refs/heads/master/convert_hf_to_gguf.py -O llama.cpp/convert_hf_to_gguf.py
# Export the fine-tuned model as a q4_k_m-quantized GGUF file.
model.save_pretrained_gguf("/content/drive/MyDrive/lora_model/q4_k_m", tokenizer, quantization_method = "q4_k_m")

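To actually run the result in Ollama, as promised in the introduction, a minimal Modelfile pointing at the exported GGUF is enough. The exact .gguf file name below is an assumption (check the output directory after the export), and "my-llama32" is just a placeholder model name:

# Point a Modelfile at the exported GGUF (the file name is an assumption).
echo 'FROM ./lora_model/q4_k_m/unsloth.Q4_K_M.gguf' > Modelfile

# Register the model with Ollama and chat with it locally.
ollama create my-llama32 -f Modelfile
ollama run my-llama32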

Key takeaways

  • How much memory fine-tuning an LLM with Unsloth requires
  • Preparing a simple dataset and fine-tuning llama3.2:3b with it
  • Converting the trained model into GGUF format
  • Working around the Unsloth-to-GGUF conversion issue (building llama.cpp manually)
