Definition: Efficient fine-tuning technique that inserts small trainable matrices into a frozen model, drastically reducing the resources needed to adapt LLMs.
— Source: NERVICO, Product Development Consultancy
What is LoRA
LoRA (Low-Rank Adaptation) is an efficient fine-tuning technique that enables adapting large language models without modifying their original weights. Instead of updating all model parameters during training, LoRA inserts small low-rank matrices into the model’s layers and only trains these additional matrices. This reduces the number of trainable parameters by 99% or more, making fine-tuning of models with billions of parameters accessible on modest hardware.
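To make the parameter reduction concrete, here is a back-of-the-envelope calculation for a single weight matrix; the 4096 width and rank 8 are illustrative assumptions (typical of 7B-class models), not figures from this glossary entry:

```python
# Trainable parameters for one 4096 x 4096 weight matrix:
# full fine-tuning vs. LoRA with rank r = 8 (illustrative sizes).
d = 4096              # layer width (assumed)
r = 8                 # LoRA rank (assumed)

full = d * d          # parameters updated by full fine-tuning
lora = d * r + r * d  # parameters in A (d x r) plus B (r x d)

print(f"full: {full:,}")                         # 16,777,216
print(f"lora: {lora:,}")                         # 65,536
print(f"fraction trainable: {lora / full:.4%}")  # 0.3906%
```

With these assumed sizes, the trainable fraction is under 0.4% of the layer, consistent with the "99% or more" reduction stated above.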
How It Works
LoRA decomposes the weight update into two low-rank matrices, A and B, whose rank r is much smaller than the original dimensions. For a layer with a weight matrix W of dimension d x d, instead of updating the full matrix, LoRA learns A (d x r) and B (r x d), where r can be as low as 4 or 8. The effective weight becomes W + AB (in practice the product is scaled by a factor alpha/r), where AB approximates the full update. The original weights remain frozen, and the trained A and B are stored as a separate adapter file that can be swapped, or merged into W for inference, without modifying the base model.
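The mechanics above can be sketched in a few lines of NumPy. Dimensions and rank are toy values chosen for the sketch, and the alpha/r scaling is omitted for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                            # layer width and LoRA rank (toy sizes)

W = rng.standard_normal((d, d))         # frozen pretrained weights (never updated)
A = rng.standard_normal((d, r)) * 0.01  # trainable, small random init
B = np.zeros((r, d))                    # trainable, zero init -> AB = 0 at start

x = rng.standard_normal((1, d))         # a single input row vector

# At initialization the adapted layer matches the base layer exactly,
# because B = 0 makes the update AB vanish.
assert np.allclose(x @ W + (x @ A) @ B, x @ W)

# Pretend a few gradient steps have changed B (training updates only A and B).
B = rng.standard_normal((r, d)) * 0.01

# Forward pass with the adapter: y = x(W + AB) = xW + (xA)B
y_adapter = x @ W + (x @ A) @ B

# For deployment, AB can be merged into W so inference costs nothing extra.
W_merged = W + A @ B
assert np.allclose(x @ W_merged, y_adapter)
```

Note that keeping the factored form (xA)B during training avoids ever materializing a full d x d update, which is where the memory savings come from.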
Why It Matters
LoRA democratizes LLM fine-tuning. Fully fine-tuning a 70B-parameter model requires on the order of a terabyte of GPU memory for weights, gradients, and optimizer states, spread across many devices, plus days of compute. With LoRA, only the small adapter matrices need gradients and optimizer states, so the same model can be fine-tuned on a single high-memory GPU in a few hours; combined with 4-bit quantization (QLoRA), even a 24 GB consumer card can suffice. Additionally, since the base model is not modified, a company can maintain multiple LoRA adapters for different tasks on top of the same base model, optimizing storage and simplifying deployment.
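The multi-adapter deployment pattern can be illustrated with the same NumPy sketch; the task names and dimensions here are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 32, 4                          # toy layer width and LoRA rank

W = rng.standard_normal((d, d))       # one frozen base model, shared by all tasks

# Each task gets its own small (A, B) pair, stored separately from W.
adapters = {
    "support-bot": (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
    "summarizer":  (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
}

def forward(x, task):
    """Apply the frozen base weights plus the selected task's adapter."""
    A, B = adapters[task]
    return x @ W + (x @ A) @ B

x = rng.standard_normal((1, d))
# Swapping tasks changes only which AB is added; W is never copied or modified.
y1 = forward(x, "support-bot")
y2 = forward(x, "summarizer")
assert not np.allclose(y1, y2)
```

Each entry in `adapters` stands in for a separate adapter file on disk; serving many tasks costs one copy of W plus a few megabytes per task.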
Practical Example
A company wants to adapt Llama 3 to answer questions about their internal documentation. Full fine-tuning would require 4 A100 GPUs for 3 days. With LoRA (rank 16), they train a 50 MB adapter on a single GPU in 4 hours. The adapter is loaded on top of the base model in production and achieves 97% of full fine-tuning performance.
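As a rough sanity check on the adapter size in this example, the arithmetic below assumes a Llama-style architecture (32 transformer blocks, four 4096 x 4096 attention projections adapted per block, fp16 storage); these details are assumptions for illustration, not specifics from the example:

```python
# Rough LoRA adapter size for a Llama-style model at rank 16 (assumed figures).
layers = 32             # transformer blocks (assumed)
targets = 4             # adapted matrices per block: q, k, v, o projections
d, r = 4096, 16         # hidden size and LoRA rank
bytes_per_param = 2     # fp16

params = layers * targets * (d * r + r * d)
size_mb = params * bytes_per_param / 1024**2
print(f"{params:,} params ~= {size_mb:.0f} MB")
```

Under these assumptions the adapter comes out to roughly 32 MB, the same tens-of-megabytes ballpark as the 50 MB adapter in the example (the exact figure depends on which modules are adapted and the storage precision).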
Related Terms
- Fine-Tuning - General model adjustment process that LoRA optimizes
- QLoRA - Combination of LoRA with quantization for greater efficiency
- LLM - Language models adapted with LoRA
Last updated: February 2026
Category: Artificial Intelligence
Related to: Fine-Tuning, QLoRA, Model Adaptation, Parameter Efficient Training
Keywords: lora, low-rank adaptation, efficient fine-tuning, parameter efficient, model adaptation, peft, adapters, frozen weights