PatentBrief · Patent Brief · US 10289962

How to Shrink Large AI Models Using Knowledge Distillation

A method for teaching small, efficient AI models to mimic the complex decision-making patterns of much larger, more powerful neural networks.

Patent Number

US 10289962

Status

Active

Filing Date

June 4, 2015

Grant Date

May 14, 2019

Expiration

~June 2035 (estimated)

Claims

Assignee

Google LLC

Inventors

Oriol Vinyals, Geoffrey E. Hinton, Jeffrey A. Dean

Citations

4 forward · 3 backward

What it covers

This patent describes a process called knowledge distillation. First, a large 'cumbersome' model is trained on a dataset to learn complex patterns. Then a smaller 'distilled' model is trained not just to predict the correct answer but to match the probability distribution (the 'soft outputs') that the large model assigns across all classes. The soft outputs are produced with a 'temperature constant' higher than 1, which flattens the distribution so that the relative probabilities of the incorrect answers become visible. Those relationships carry more information than a simple right-or-wrong label. As a result, the smaller model can reach performance close to that of the large model while being faster and lighter, which makes it suitable for mobile devices.
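The training signal described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the patent's claimed implementation: the function names, the default `temperature=4.0`, and the `alpha` weighting between the soft-output term and the hard-label term are all assumptions chosen for the example.

```python
import numpy as np


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores (logits) into probabilities, softened by `temperature`."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=4.0, alpha=0.5):
    """Illustrative loss for one example: soft-output term plus hard-label term.

    `alpha` and the default `temperature` are hypothetical values for this sketch.
    """
    # Soft targets: the cumbersome model's distribution at temperature > 1.
    soft_targets = softmax_with_temperature(teacher_logits, temperature)
    # The distilled model is evaluated at the same temperature.
    student_soft = softmax_with_temperature(student_logits, temperature)
    soft_loss = -np.sum(soft_targets * np.log(student_soft))

    # Ordinary cross-entropy against the correct class, at temperature 1.
    student_hard = softmax_with_temperature(student_logits, 1.0)
    hard_loss = -np.log(student_hard[hard_label])

    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# Hypothetical logits for a three-class problem; class 0 is the correct answer.
teacher_logits = [5.0, 2.0, 0.0]
matching_student = [5.0, 2.0, 0.0]
mismatched_student = [0.0, 2.0, 5.0]

loss_match = distillation_loss(matching_student, teacher_logits, hard_label=0)
loss_mismatch = distillation_loss(mismatched_student, teacher_logits, hard_label=0)
```

In this sketch a student whose logits agree with the teacher's gets a lower loss than one that ranks the classes differently. Setting `alpha=0.0` would reduce the loss to hard labels only, which is the case the brief lists as outside the patent's scope.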

What it doesn't cover

- Does not cover training models from scratch without a pre-existing cumbersome model.
- Does not cover hardware-specific optimization techniques like model quantization or pruning.
- Does not cover methods where the distilled model is trained using only hard labels (e.g., just the correct class) instead of soft outputs.
- Does not cover architectures where the distilled model has more parameters than the cumbersome model.

The clever bit

The innovation is using a 'temperature' parameter to soften the output distribution, which reveals the 'dark knowledge': the subtle hints about how the big model views the similarities between different categories.
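The softening step is commonly written as a softmax over logits divided by the temperature. The notation below is assumed for illustration rather than quoted from the patent: z_i is the raw score (logit) for class i, T is the temperature, and q_i is the resulting probability.

```latex
q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
```

With T = 1 this is the ordinary softmax. As a worked example with made-up logits (5, 2, 0), T = 1 gives probabilities of roughly (0.947, 0.047, 0.006), while T = 4 gives roughly (0.569, 0.269, 0.163). The higher temperature makes it much easier to see that the model considers the second class a closer match than the third.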

Why it matters

This technique is fundamental to modern AI deployment. It allows companies like Google to run sophisticated language models and image classifiers on smartphones and edge devices that lack the massive computing power required by the original, cumbersome models. It effectively bridges the gap between research-grade supercomputing and consumer-grade hardware.

Real-world examples

1. Mobile versions of Google Translate
2. On-device voice recognition on Android phones
3. Lightweight image classification models for mobile apps

Generated by PatentBrief · Not legal advice · patentbrief.org
