# How to Shrink Large AI Models Using Knowledge Distillation

> A method for teaching small, efficient AI models to mimic the complex decision-making patterns of much larger, more powerful neural networks.

- **Patent:** US 10289962
- **Original title:** Training distilled machine learning models
- **Owner:** Google LLC
- **Granted:** 2019
- **Status:** Active
- **Times cited:** 4
- **Field:** AI/ML, software, consumer electronics

## What it does

This patent describes a process called knowledge distillation. First, a large 'cumbersome' model is trained on a dataset to learn complex patterns. Then a smaller 'distilled' model is trained not just to predict the correct answer but to match the probability distribution (the 'soft outputs') that the large model produces. Raising a 'temperature constant' above 1 during training flattens that distribution, so the probabilities the large model assigns to incorrect answers become large enough to learn from. Those relative probabilities carry more information than a simple right-or-wrong label. As a result, the smaller model can approach the large model's performance while being much faster and lighter, which suits mobile devices.
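The training signal described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names, the NumPy dependency, the temperature of 4.0, and the example logits are all assumptions chosen for readability. It computes the cross-entropy between the large model's softened outputs and the small model's softened outputs, which is lower when the small model's distribution is closer to the large model's.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by a temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

def soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between softened teacher and student distributions.

    The temperature default is an illustrative assumption, not a value
    taken from the patent.
    """
    teacher_probs = softmax_with_temperature(teacher_logits, temperature)
    student_probs = softmax_with_temperature(student_logits, temperature)
    per_example = -(teacher_probs * np.log(student_probs + 1e-12)).sum(axis=-1)
    return float(per_example.mean())

# Hypothetical logits for one example with three classes.
teacher_logits = np.array([[9.0, 4.0, 1.0]])
good_student = np.array([[8.5, 4.2, 0.9]])  # close to the teacher
poor_student = np.array([[1.0, 9.0, 4.0]])  # ranks the classes differently

loss_good = soft_target_loss(good_student, teacher_logits)
loss_poor = soft_target_loss(poor_student, teacher_logits)
print(f"good student loss: {loss_good:.3f}, poor student loss: {loss_poor:.3f}")
```

In a real training loop this loss would be minimized with respect to the small model's parameters, usually through an automatic-differentiation framework rather than plain NumPy.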

## What it does NOT cover

- Does not cover training models from scratch without a pre-existing cumbersome model.
- Does not cover hardware-specific optimization techniques like model quantization or pruning.
- Does not cover methods where the distilled model is trained using only hard labels (e.g., just the correct class) instead of soft outputs.
- Does not cover architectures where the distilled model has more parameters than the cumbersome model.

## The clever bit

The innovation is using a 'temperature' parameter to soften the output distribution, which reveals the 'dark knowledge': the subtle hints about how the big model views the similarities between different categories.
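To see what softening looks like, the short Python sketch below applies a softmax at two temperatures to the same set of scores. The class names, the logits, and the temperature of 5.0 are hypothetical values picked for illustration, and `soften` is a helper name introduced here, not a term from the patent.

```python
import math

def soften(logits, temperature):
    """Softmax over logits after dividing each one by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores from a large model looking at a photo of a cat.
classes = ["cat", "dog", "truck"]
logits = [10.0, 6.0, 1.0]

sharp = soften(logits, temperature=1.0)  # nearly all probability on "cat"
soft = soften(logits, temperature=5.0)   # "dog" and "truck" become visible

for name, p_sharp, p_soft in zip(classes, sharp, soft):
    print(f"{name:>5}: T=1 -> {p_sharp:.4f}   T=5 -> {p_soft:.4f}")
```

At the higher temperature the ranking of the classes is unchanged, but "dog" keeps a noticeably larger share than "truck". That difference is the kind of similarity signal the smaller model can learn from.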

## Real-world examples

1. Mobile versions of Google Translate
2. On-device voice recognition on Android phones
3. Lightweight image classification models for mobile apps

## Why it matters

This technique is fundamental to modern AI deployment. It allows companies like Google to run sophisticated language models and image classifiers on smartphones and edge devices that lack the massive computing power required by the original, cumbersome models. It effectively bridges the gap between research-grade supercomputing and consumer-grade hardware.

## Frequently asked questions

### What does patent US 10289962 cover?

A method for teaching small, efficient AI models to mimic the complex decision-making patterns of much larger, more powerful neural networks.

### Who owns patent US 10289962?

Google LLC owns this patent, granted in 2019.

### When does this patent expire?

This patent is expected to expire on May 14, 2039, when the invention enters the public domain.

### What is patent US 10289962 cited by?

This patent has been cited by 4 later patents that build on its ideas.

### What problem does this patent solve?

Large, cumbersome models need more computing power than smartphones and edge devices can provide. Distillation lets companies like Google run sophisticated language models and image classifiers on that hardware by training a much smaller model to approach the large model's performance. It effectively bridges the gap between research-grade supercomputing and consumer-grade hardware.

### What does this patent NOT cover?

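The patent does not cover training models from scratch without a pre-existing cumbersome model, hardware-specific optimization techniques like model quantization or pruning, training the distilled model on hard labels alone, or architectures where the distilled model has more parameters than the cumbersome model.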

**Full plain-English explainer:** https://patentbrief.org/patent/us/10289962/deep-q-networks-dqn

**Original patent:** https://patents.google.com/patent/US10289962

---

_Source: PatentBrief — https://patentbrief.org. Patent facts are from public records; the plain-English explanation is PatentBrief's._
