# How to Shrink Large AI Models Using Knowledge Distillation

> A method for teaching small, efficient AI models to mimic the complex decision-making patterns of much larger, more powerful neural networks.

- **Patent:** US 10289962
- **Original title:** Training distilled machine learning models
- **Owner:** Google LLC
- **Granted:** 2019
- **Status:** Active
- **Times cited:** 4
- **Field:** AI/ML, software, consumer electronics

## What it does

This patent describes a process called knowledge distillation. First, a large 'cumbersome' model is trained on a dataset to learn complex patterns. Then a smaller 'distilled' model is trained, not just to predict the correct answer, but to reproduce the probability distribution (the 'soft outputs') that the large model assigns across all possible answers. Both sets of soft outputs are computed with a 'temperature constant' higher than 1, which flattens the distribution so that the relative probabilities of the incorrect answers become a visible part of the training signal. Those relative probabilities carry more information than a simple right-or-wrong label. As a result, the smaller model can reach performance close to the large model while being much faster and lighter, which suits it to mobile devices.
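The training signal described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the patent's implementation: the function names `softmax` and `distillation_loss`, the temperature of 2.0, and all logit values are assumptions chosen for the example. It computes the cross-entropy between the cumbersome model's soft outputs and the distilled model's soft outputs, both taken at the same temperature.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between teacher and student soft outputs.

    The temperature value is an assumption for illustration only.
    """
    soft_targets = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student_probs))

# Hypothetical logits for a single example over three classes.
teacher_logits = [9.0, 6.0, 1.0]
close_student = [8.5, 5.5, 1.5]  # nearly agrees with the teacher
far_student = [1.0, 6.0, 9.0]    # ranks the classes in reverse

loss_close = distillation_loss(teacher_logits, close_student)
loss_far = distillation_loss(teacher_logits, far_student)
print(f"close student loss: {loss_close:.3f}")
print(f"far student loss:   {loss_far:.3f}")
```

A student whose soft outputs resemble the teacher's gets a lower loss than one that disagrees, so minimizing this quantity over a dataset pushes the small model toward the large model's full distribution rather than only its top answer.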

## What it does NOT cover

- Does not cover training models from scratch without a pre-existing cumbersome model.
- Does not cover hardware-specific optimization techniques like model quantization or pruning.
- Does not cover methods where the distilled model is trained using only hard labels (e.g., just the correct class) instead of soft outputs.
- Does not cover architectures where the distilled model has more parameters than the cumbersome model.

## The clever bit

The innovation is using a 'temperature' parameter to soften the large model's output distribution. Softening reveals the 'dark knowledge': the subtle hints about how similar the large model considers different categories to be, which are nearly invisible when almost all of the probability sits on a single answer.
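To make the softening effect concrete, the sketch below applies a temperature-scaled softmax to one set of made-up logits at two temperatures. The helper name `soften`, the class labels, the logit values, and the choice of T=5 are all hypothetical and are not taken from the patent.

```python
import math

def soften(logits, temperature):
    """Softmax over logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output of a large model for a photo of a cat.
labels = ["cat", "dog", "truck"]
logits = [9.0, 6.0, 1.0]

sharp = soften(logits, 1.0)  # ordinary softmax
soft = soften(logits, 5.0)   # softened with a higher temperature

for label, p_sharp, p_soft in zip(labels, sharp, soft):
    print(f"{label:>5}  T=1: {p_sharp:.4f}  T=5: {p_soft:.4f}")
```

At T=1 nearly all of the probability lands on 'cat' and the other two classes are hard to tell apart. At T=5 the ranking is unchanged, but 'dog' receives clearly more probability than 'truck', which is the kind of similarity information a hard label cannot express.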

## Real-world examples

1. Mobile versions of Google Translate
2. On-device voice recognition on Android phones
3. Lightweight image classification models for mobile apps

## Why it matters

This technique is widely used in AI deployment. It lets companies like Google run language models and image classifiers on smartphones and edge devices that lack the computing power the original, cumbersome models require. In effect, it narrows the gap between data-center hardware and consumer-grade hardware.

## Frequently asked questions

### What does "How to Shrink Large AI Models Using Knowledge Distillation" cover?

A method for teaching small, efficient AI models to mimic the complex decision-making patterns of much larger, more powerful neural networks.

### Who owns patent US 10289962?

Google LLC owns this patent, granted in 2019.

### When does this patent expire?

This patent is expected to expire on May 14, 2039, when the invention enters the public domain.

### How many patents cite US 10289962?

This patent has been cited by 4 later patents that build on its ideas.

### What problem does this patent solve?

Large, cumbersome models are often too slow and resource-hungry to run on smartphones and edge devices. This patent's technique lets companies like Google train a much smaller model that approaches the large model's performance, so language models and image classifiers can run on consumer-grade hardware.

### What does this patent NOT cover?

The patent does not cover training models from scratch without a pre-existing cumbersome model.

**Full plain-English explainer:** https://patentbrief.org/patent/us/10289962/deep-q-networks-dqn

**Original patent:** https://patents.google.com/patent/US10289962

---

_Source: PatentBrief — https://patentbrief.org. Patent facts are from public records; the plain-English explanation is PatentBrief's._


## Related patents

Semantically similar inventions in the PatentBrief corpus:

- [Adapting AI Models to Fit Device Resources](https://patentbrief.org/patent/us/20220383078/data-processing-method-and-related-device) — This patent describes how a computer system can automatically shrink a large artificial intelligence model, specifically a "transformer" type, to fit the available computing power of a phone or other device.
- [How Devices Train Shared AI Models While Keeping Your Data Private](https://patentbrief.org/patent/us/12443890/partially-local-federated-learning) — This patent describes a method for training a machine learning model across many devices, where each device keeps some parts of the model and its data private, only sharing updates for the common, global parts of the model.
- [Training AI Models Together with Unlabeled Data Using a Teacher](https://patentbrief.org/patent/us/20220012637/federated-teacher-student-machine-learning) — This patent describes a way for multiple AI systems to learn together from data that hasn't been manually labeled, using a 'teacher' AI to create temporary labels for a 'student' AI.
- [How Projection Neural Networks Speed Up AI Predictions](https://patentbrief.org/patent/us/11544573/llama-large-language-model-architecture) — A method for making artificial intelligence models faster and more efficient by using fixed, non-trainable projections to simplify complex data before processing.
- [How to Update AI on Small Devices with Slow Internet](https://patentbrief.org/patent/us/20250363357/systems-and-methods-for-deploying-and-updating-neural-networks-at-the-edge-of-a-) — This patent describes a method for efficiently updating artificial intelligence models on small, internet-connected devices, like smart cameras, by sending only the changes, or 'patches,' instead of the entire updated model, which saves bandwidth.