Training / Fine-Tuning

LoRA

Intermediate [3/5]
Low-Rank Adaptation (Low-Rank Adapters)

Definition

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that adds small trainable matrices to frozen pretrained weights. Instead of updating all parameters, LoRA decomposes the weight update into a product of two low-rank matrices, dramatically reducing the number of trainable parameters and the memory required for training.

LoRA enables fine-tuning large models on consumer hardware while maintaining quality close to full fine-tuning.
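As a concrete illustration, a minimal LoRA-style linear layer can be sketched in PyTorch as follows. This is a simplified sketch, not how the peft library implements it; LoRALinear and its argument names are made up for illustration.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal sketch of a linear layer with a LoRA adapter (illustrative only)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # freeze pretrained W (and bias)
        d, k = base.out_features, base.in_features
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)  # small random Gaussian init
        self.B = nn.Parameter(torch.zeros(d, r))         # zero init, so BA = 0 at start
        self.scale = alpha / r

    def forward(self, x):
        # h = Wx + (alpha/r) * B(Ax): frozen base path plus trainable low-rank update
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T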

Key Concepts

  • Low-rank decomposition: ΔW = BA where B and A are small matrices
  • Rank (r): Controls adapter capacity (typically 4-64)
  • Frozen base: Original weights unchanged during training
  • Mergeable: Can merge adapters into base weights for inference
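The "mergeable" property can be checked directly: folding BA into W produces a single weight matrix, so merged inference has no extra cost. A small sketch with arbitrary shapes chosen for illustration:

import torch

d, k, r, alpha = 64, 64, 8, 8
W = torch.randn(d, k)                 # frozen pretrained weight
A = torch.randn(r, k) * 0.01          # trained LoRA factors
B = torch.randn(d, r) * 0.01
x = torch.randn(k)

adapter_out = W @ x + (alpha / r) * (B @ (A @ x))   # adapter path at inference
W_merged = W + (alpha / r) * (B @ A)                # fold the update into the base weight
merged_out = W_merged @ x

print(torch.allclose(adapter_out, merged_out, atol=1e-5))  # True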

Examples

Mathematics
How LoRA Works
LORA DECOMPOSITION:
  Original weight matrix: W ∈ R^(d×k)
  Example: d=4096, k=4096 → 16M parameters

FULL FINE-TUNING:
  W' = W + ΔW
  ΔW has 16M trainable parameters

LORA:
  W' = W + BA   where B ∈ R^(d×r), A ∈ R^(r×k)
  For r=8: (4096×8) + (8×4096) = 65K parameters!

PARAMETER REDUCTION:
  Full:  d × k     = 16,777,216 params
  LoRA:  d×r + r×k =     65,536 params
  Reduction: 256× fewer parameters!

FORWARD PASS:
  h = Wx + BAx
    = Wx + B(Ax)     # efficient: small matrix mults
       ↑      ↑
    frozen  trainable

INITIALIZATION:
  A: random Gaussian (small values)
  B: zero matrix
  → BA = 0 at start (no change to model)

SCALING FACTOR (α):
  output = Wx + (α/r) × BAx
  α controls update magnitude
  Common: α = r (so α/r = 1)
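A quick numeric check of the properties above (a sketch using torch; the shapes follow the d=k=4096, r=8 example):

import torch

d, k, r = 4096, 4096, 8
W = torch.randn(d, k)              # frozen pretrained weight
A = torch.randn(r, k) * 0.01       # random Gaussian init
B = torch.zeros(d, r)              # zero init

x = torch.randn(k)
print(torch.equal(W @ x, W @ x + B @ (A @ x)))   # True: BA = 0 at start, output unchanged

full_params = d * k                # 16,777,216
lora_params = d * r + r * k        # 65,536
print(full_params // lora_params)  # 256× fewer trainable parameters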
Implementation
Using LoRA for Fine-Tuning
PEFT LIBRARY (HuggingFace):

from peft import LoraConfig, get_peft_model

# Configure LoRA
lora_config = LoraConfig(
    r=16,                      # rank
    lora_alpha=32,             # scaling
    target_modules=[           # which layers
        "q_proj", "k_proj", "v_proj", "o_proj"
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply to model
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# "trainable params: 4M || all params: 7B || 0.06%"

# Train normally
trainer = Trainer(model=model, ...)
trainer.train()

# Save only LoRA weights (~50 MB vs 14 GB)
model.save_pretrained("my-lora-adapter")

# Load and merge for inference
model = model.merge_and_unload()

MEMORY COMPARISON (7B model):

Full fine-tuning:
  - Model:      14 GB (FP16)
  - Gradients:  14 GB
  - Optimizer:  28 GB (Adam)
  - Total:     ~56 GB → needs 4× A100

LoRA (r=16):
  - Model:        14 GB (frozen, no grads)
  - LoRA params:  ~50 MB
  - Gradients:    ~50 MB
  - Optimizer:   ~100 MB
  - Total:       ~15 GB → single 24 GB GPU!
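To reuse a saved adapter later, it can be loaded back onto the base model with PEFT's PeftModel before optionally merging. A short sketch; the model name and adapter path here are assumptions carried over from the example above:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the frozen base model, then attach the saved LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base_model, "my-lora-adapter")

# Optionally fold the adapter into the base weights for zero-overhead inference
model = model.merge_and_unload()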

Interactive Exercise

Calculate LoRA Parameters

A model has weight matrices of size 8192×8192. How many trainable parameters does LoRA add with r=32?
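To check your answer, the formula from the Mathematics section can be applied directly (a small sketch; lora_params is a made-up helper name):

def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters added by one LoRA adapter: B (d×r) plus A (r×k)."""
    return d * r + r * k

print(lora_params(8192, 8192, 32))  # parameters added per adapted weight matrix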

Pro Tips
  • Start with r=8-16 for most tasks; increase if underfitting
  • Apply LoRA to attention layers (q, k, v, o) for best results
  • Multiple LoRA adapters can be swapped at runtime for different tasks
  • LoRA works well combined with quantization (see QLoRA)
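The last tip, combining LoRA with quantization, typically looks like the following with the transformers and peft libraries. A sketch under the assumption that bitsandbytes is installed; the model name is a placeholder:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit (QLoRA-style), then add LoRA adapters on top
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config
)
base_model = prepare_model_for_kbit_training(base_model)

lora_config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)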

Related Terms