Technical Glossary

Model Distillation

Definition: Technique for transferring knowledge from a large model (teacher) to a smaller one (student), reducing inference costs while preserving most of the performance.

— Source: NERVICO, Product Development Consultancy

What Is Model Distillation?

Model distillation is a compression technique that transfers knowledge from a large, powerful model (teacher) to a smaller, more efficient model (student). The student model learns to replicate the teacher’s behavior, capturing not only the correct answers but also the probability distribution over possible responses. The result is a lighter model that typically retains 90–98% of the original performance at a fraction of the computational cost.

How It Works

During the distillation process, the teacher model generates responses for a large training dataset. Instead of training the student only on the correct labels (hard labels), the student is trained to replicate the teacher’s full probability distribution (soft labels). These soft labels carry rich information about the relationships between classes that the teacher has learned. The loss function combines KL divergence between the teacher and student distributions with cross-entropy on the correct labels, balancing fidelity to the teacher with accuracy on the ground truth.
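The combined objective described above can be sketched in NumPy. The temperature parameter, a standard part of the distillation recipe though not named in this entry, softens both distributions so the student sees more of the teacher’s relative rankings; the values of `alpha` and `temperature` here, like the function names, are illustrative assumptions rather than a fixed recipe:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature softens the distribution."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-label term and a hard-label term.

    alpha balances fidelity to the teacher against the ground truth;
    both alpha and temperature are illustrative defaults.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the teacher and student distributions (soft labels)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
    # Cross-entropy against the correct class (hard label)
    ce = -np.log(softmax(student_logits)[hard_label])
    # The T^2 factor rescales the soft-label gradients, as in the
    # standard knowledge-distillation formulation
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce
```

With identical teacher and student logits the KL term vanishes, leaving only the hard-label cross-entropy, which is a quick sanity check on an implementation like this.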

Why It Matters

The most powerful language models are often too expensive to run in production at scale. Distillation enables companies to leverage large-model quality while reducing inference costs by 5x to 20x. This makes it viable to deploy AI on edge devices, mobile applications, and scenarios where latency or compute budget is a constraint.

Practical Example

A company needs an intent classification model for its chatbot processing 100,000 daily queries. Using GPT-4 directly would cost thousands of dollars monthly. Through distillation, they generate GPT-4 responses for 50,000 examples and train a 7B parameter model that achieves 95% of GPT-4’s accuracy at one-tenth of the per-query cost.
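As a back-of-the-envelope check of the example above, the monthly savings can be computed directly. The per-query prices below are illustrative assumptions chosen to match the entry’s “one-tenth of the per-query cost” claim, not real vendor pricing:

```python
# Hypothetical per-query costs; illustrative assumptions, not vendor pricing.
TEACHER_COST_PER_QUERY = 0.010   # large teacher model served via API
STUDENT_COST_PER_QUERY = 0.001   # distilled 7B student, one-tenth per query

DAILY_QUERIES = 100_000  # chatbot volume from the example
DAYS_PER_MONTH = 30

teacher_monthly = TEACHER_COST_PER_QUERY * DAILY_QUERIES * DAYS_PER_MONTH
student_monthly = STUDENT_COST_PER_QUERY * DAILY_QUERIES * DAYS_PER_MONTH

print(f"Teacher: ${teacher_monthly:,.0f}/month")               # $30,000/month
print(f"Student: ${student_monthly:,.0f}/month")               # $3,000/month
print(f"Savings: {teacher_monthly / student_monthly:.0f}x")    # 10x
```

At these assumed prices the one-time cost of generating 50,000 teacher responses for the distillation dataset is recovered within the first days of serving the student.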

  • LLM - Large models that serve as teachers in distillation
  • Fine-Tuning - Related model adjustment technique
  • Quantization - Complementary model compression technique

Last updated: February 2026
Category: Artificial Intelligence
Related to: LLM, Model Compression, Inference Optimization, Fine-Tuning
Keywords: model distillation, knowledge distillation, teacher student, model compression, inference optimization, soft labels, KL divergence

Need help with product development?

We help you accelerate your development with cutting-edge technology and best practices.