Training / Fine-Tuning

LoRA

Intermediate [3/5]
Low-Rank Adaptation (Low-Rank Adapters)

Definition

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that adds small trainable matrices to frozen pretrained weights. Instead of updating all parameters, LoRA decomposes the weight update into a product of two low-rank matrices, dramatically reducing the number of trainable parameters and the memory required for training.

LoRA enables fine-tuning large models on consumer hardware while maintaining quality close to full fine-tuning.
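As a concrete illustration, a minimal LoRA-style linear layer can be sketched in PyTorch as follows. This is a simplified sketch, not how the peft library implements it; LoRALinear and its argument names are made up for illustration.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal sketch of a linear layer with a LoRA adapter (illustrative only)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # freeze pretrained W (and bias)
        d, k = base.out_features, base.in_features
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)  # small random Gaussian init
        self.B = nn.Parameter(torch.zeros(d, r))         # zero init, so BA = 0 at start
        self.scale = alpha / r

    def forward(self, x):
        # h = Wx + (alpha/r) * B(Ax): frozen base path plus trainable low-rank update
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T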

Key Concepts

  • Low-rank decomposition: ΔW = BA where B and A are small matrices
  • Rank (r): Controls adapter capacity (typically 4-64)
  • Frozen base: Original weights unchanged during training
  • Mergeable: Can merge adapters into base weights for inference
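The "mergeable" property can be checked directly: folding BA into W produces a single weight matrix, so merged inference has no extra cost. A small sketch with arbitrary shapes chosen for illustration:

import torch

d, k, r, alpha = 64, 64, 8, 8
W = torch.randn(d, k)                 # frozen pretrained weight
A = torch.randn(r, k) * 0.01          # trained LoRA factors
B = torch.randn(d, r) * 0.01
x = torch.randn(k)

adapter_out = W @ x + (alpha / r) * (B @ (A @ x))   # adapter path at inference
W_merged = W + (alpha / r) * (B @ A)                # fold the update into the base weight
merged_out = W_merged @ x

print(torch.allclose(adapter_out, merged_out, atol=1e-5))  # True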

Examples

Mathematics
How LoRA Works
LORA DECOMPOSITION:
  Original weight matrix: W ∈ R^(d×k)
  Example: d=4096, k=4096 → 16M parameters

FULL FINE-TUNING:
  W' = W + ΔW
  ΔW has 16M trainable parameters

LORA:
  W' = W + BA   where B ∈ R^(d×r), A ∈ R^(r×k)
  For r=8: (4096×8) + (8×4096) = 65K parameters!

PARAMETER REDUCTION:
  Full:  d × k     = 16,777,216 params
  LoRA:  d×r + r×k =     65,536 params
  Reduction: 256× fewer parameters!

FORWARD PASS:
  h = Wx + BAx
    = Wx + B(Ax)     # efficient: small matrix mults
       ↑      ↑
    frozen  trainable

INITIALIZATION:
  A: random Gaussian (small values)
  B: zero matrix
  → BA = 0 at start (no change to model)

SCALING FACTOR (α):
  output = Wx + (α/r) × BAx
  α controls update magnitude
  Common: α = r (so α/r = 1)
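A quick numeric check of the properties above (a sketch using torch; the shapes follow the d=k=4096, r=8 example):

import torch

d, k, r = 4096, 4096, 8
W = torch.randn(d, k)              # frozen pretrained weight
A = torch.randn(r, k) * 0.01       # random Gaussian init
B = torch.zeros(d, r)              # zero init

x = torch.randn(k)
print(torch.equal(W @ x, W @ x + B @ (A @ x)))   # True: BA = 0 at start, output unchanged

full_params = d * k                # 16,777,216
lora_params = d * r + r * k        # 65,536
print(full_params // lora_params)  # 256× fewer trainable parameters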
Implementation
Using LoRA for Fine-Tuning
PEFT LIBRARY (HuggingFace):

from peft import LoraConfig, get_peft_model

# Configure LoRA
lora_config = LoraConfig(
    r=16,                      # rank
    lora_alpha=32,             # scaling
    target_modules=[           # which layers
        "q_proj", "k_proj", "v_proj", "o_proj"
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply to model
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# "trainable params: 4M || all params: 7B || 0.06%"

# Train normally
trainer = Trainer(model=model, ...)
trainer.train()

# Save only LoRA weights (~50 MB vs 14 GB)
model.save_pretrained("my-lora-adapter")

# Load and merge for inference
model = model.merge_and_unload()

MEMORY COMPARISON (7B model):

Full fine-tuning:
  - Model:      14 GB (FP16)
  - Gradients:  14 GB
  - Optimizer:  28 GB (Adam)
  - Total:     ~56 GB → needs 4× A100

LoRA (r=16):
  - Model:        14 GB (frozen, no grads)
  - LoRA params:  ~50 MB
  - Gradients:    ~50 MB
  - Optimizer:   ~100 MB
  - Total:       ~15 GB → single 24 GB GPU!
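To reuse a saved adapter later, it can be loaded back onto the base model with PEFT's PeftModel before optionally merging. A short sketch; the model name and adapter path here are assumptions carried over from the example above:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the frozen base model, then attach the saved LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base_model, "my-lora-adapter")

# Optionally fold the adapter into the base weights for zero-overhead inference
model = model.merge_and_unload()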

Interactive Exercise

Calculate LoRA Parameters

A model has weight matrices of size 8192×8192. How many trainable parameters does LoRA add with r=32?
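To check your answer, the formula from the Mathematics section can be applied directly (a small sketch; lora_params is a made-up helper name):

def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters added by one LoRA adapter: B (d×r) plus A (r×k)."""
    return d * r + r * k

print(lora_params(8192, 8192, 32))  # parameters added per adapted weight matrix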

Pro Tips
  • Start with r=8-16 for most tasks; increase if underfitting
  • Apply LoRA to attention layers (q, k, v, o) for best results
  • Multiple LoRA adapters can be swapped at runtime for different tasks
  • LoRA works well combined with quantization (see QLoRA)
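The last tip, combining LoRA with quantization, typically looks like the following with the transformers and peft libraries. A sketch under the assumption that bitsandbytes is installed; the model name is a placeholder:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit (QLoRA-style), then add LoRA adapters on top
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config
)
base_model = prepare_model_for_kbit_training(base_model)

lora_config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)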

Related Terms