QLoRA combines 4-bit quantization of the base model with LoRA adapters, enabling fine-tuning of large models on a single GPU. The base model is loaded in 4-bit precision while LoRA adapters are trained in higher precision (BF16/FP16).
QLoRA makes it possible to fine-tune models with 65B+ parameters on a single 48GB GPU, while matching the quality of full 16-bit fine-tuning on standard benchmarks.
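The core mechanism can be sketched in plain NumPy: the base weight is frozen in a blockwise 4-bit representation (a simplified absmax scheme standing in for NF4), while full-precision low-rank factors `A` and `B` carry the trainable update. All names and the quantization details below are illustrative, not the actual bitsandbytes/PEFT implementation.

```python
import numpy as np

def quantize_4bit(w, block_size=64):
    """Blockwise absmax 4-bit quantization (a simplified stand-in for NF4).

    Each block is scaled by its absolute max and rounded to one of 15
    symmetric levels; only the int codes and per-block scales are stored.
    """
    flat = w.reshape(-1, block_size)
    scales = np.abs(flat).max(axis=1, keepdims=True) + 1e-12
    codes = np.round(flat / scales * 7).astype(np.int8)  # levels in [-7, 7]
    return codes, scales

def dequantize_4bit(codes, scales, shape):
    """Reconstruct an approximate weight matrix from codes and scales."""
    return (codes.astype(np.float32) / 7 * scales).reshape(shape)

class QLoRALinear:
    """Frozen 4-bit base weight plus a trainable low-rank LoRA update."""

    def __init__(self, w, r=8, alpha=16, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.shape = w.shape
        # Base weight is quantized once and never updated during training.
        self.codes, self.scales = quantize_4bit(w)
        d_out, d_in = w.shape
        # LoRA adapters stay in full precision; they are the only trained params.
        self.A = rng.normal(0, 0.01, size=(r, d_in)).astype(np.float32)
        self.B = np.zeros((d_out, r), dtype=np.float32)  # zero-init: no initial change
        self.scaling = alpha / r

    def forward(self, x):
        # Dequantize on the fly for the matmul, as QLoRA does at compute time.
        w_hat = dequantize_4bit(self.codes, self.scales, self.shape)
        return x @ w_hat.T + (x @ self.A.T) @ self.B.T * self.scaling
```

Because `B` starts at zero, the adapted layer initially reproduces the (dequantized) base model exactly; gradients flow only into `A` and `B`, which is what keeps the memory footprint small.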