# How to Make AI Run Faster on Smaller Computer Chips

> A method to shrink complex AI models by converting their high-precision math into simpler, faster formats that run efficiently on mobile devices.

- **Patent:** US 10373050
- **Original title:** Fixed point neural network based on floating point neural network quantization
- **Owner:** Qualcomm Inc
- **Granted:** 2019
- **Status:** Active
- **Times cited:** 17
- **Field:** semiconductors, AI/ML, consumer electronics, telecommunications

## What it does

This patent describes a way to compress artificial intelligence models, specifically neural networks, so they can run on hardware with limited power, like smartphones. It works by converting 'floating point' numbers, which are precise but computationally expensive, into 'fixed point' numbers, which are much simpler for a processor to handle. The core mechanism involves analyzing the statistical 'moments' of the data flowing through the network, such as the mean (average) and the variance (spread). By shifting these values so that their distribution is centered on zero (a 'zero-mean distribution') and adjusting the network's math functions to account for the shift, the system aims to keep the model accurate after it has been simplified.
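The conversion described above can be sketched in a few lines of Python. This is an illustration only, not the patent's claimed procedure: the function names, the 8-bit width, and the rule of covering three standard deviations around the mean are all assumptions made for the example.

```python
import math
import statistics


def quantize_zero_mean(values, total_bits=8, num_std=3.0):
    """Quantize floats to signed fixed-point codes after removing the mean.

    Hypothetical helper: the range rule (mean +/- num_std standard
    deviations) is an assumption for illustration.
    Returns (codes, mean, frac_bits).
    """
    mean = statistics.fmean(values)   # first moment
    std = statistics.pstdev(values)   # square root of the second central moment
    max_abs = max(num_std * std, 1e-12)

    # One sign bit; the remaining bits are split between the integer part
    # (enough to cover max_abs) and the fractional part.
    int_bits = max(0, math.ceil(math.log2(max_abs)))
    frac_bits = total_bits - 1 - int_bits
    scale = 2.0 ** frac_bits

    lo = -(2 ** (total_bits - 1))
    hi = 2 ** (total_bits - 1) - 1
    # Shift to zero mean, scale, round, and clip to the representable range.
    codes = [min(hi, max(lo, round((v - mean) * scale))) for v in values]
    return codes, mean, frac_bits


def dequantize(codes, mean, frac_bits):
    """Map fixed-point codes back to approximate floating-point values."""
    scale = 2.0 ** frac_bits
    return [c / scale + mean for c in codes]


codes, mean, frac_bits = quantize_zero_mean([10.0, 10.5, 11.0, 9.5, 9.0])
print(codes, mean, frac_bits)  # [0, 16, 32, -16, -32] 10.0 5
```

Because the mean is stored separately, all eight bits describe the spread around it. Values that fall outside the assumed three-standard-deviation range are clipped.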

## What it does NOT cover

- Does not cover general neural network training methods that do not involve quantization.
- Does not cover hardware-specific circuit designs for performing multiplication or addition.
- Does not cover techniques that use retraining or fine-tuning to recover accuracy after quantization.
- Does not cover non-neural network machine learning models like support vector machines.

## The clever bit

Instead of just rounding numbers, the patent shifts the entire distribution of data to be centered around zero. None of the limited fixed-point range is then spent on a constant offset, which reduces the rounding error introduced when you switch from high-precision to low-precision math.
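A small experiment makes the effect of centering concrete. The sketch below is not taken from the patent: the data is made up, and the helper name and the power-of-two scaling rule are assumptions. It measures the round-trip error of 8-bit fixed-point quantization on values clustered near 100, with and without subtracting the mean first.

```python
import math
import statistics


def fixed_point_rms_error(values, total_bits=8, center=False):
    """RMS error of a signed fixed-point round trip (hypothetical helper)."""
    offset = statistics.fmean(values) if center else 0.0
    shifted = [v - offset for v in values]
    max_abs = max(abs(s) for s in shifted) or 1.0

    # Largest power-of-two scale that keeps every shifted value in range.
    max_code = 2 ** (total_bits - 1) - 1
    frac_bits = math.floor(math.log2(max_code / max_abs))
    scale = 2.0 ** frac_bits

    restored = [round(s * scale) / scale + offset for s in shifted]
    squared = [(r - v) ** 2 for r, v in zip(restored, values)]
    return math.sqrt(statistics.fmean(squared))


# Made-up data: 101 values spread evenly between 99.5 and 100.5.
data = [100.0 + 0.01 * i for i in range(-50, 51)]
raw_error = fixed_point_rms_error(data, center=False)
centered_error = fixed_point_rms_error(data, center=True)
print(f"uncentered: {raw_error:.4f}, centered: {centered_error:.4f}")
```

Without centering, the format needs all of its integer bits to reach 100 and has no fractional bits left, so every value collapses to a whole number. With centering, the same eight bits only need to cover about plus or minus 0.5, and the error drops by roughly two orders of magnitude.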

## Real-world examples

1. Qualcomm Snapdragon mobile processors
2. On-device AI image processing
3. Mobile voice assistants
4. Edge computing AI accelerators

## Why it matters

As AI models grew larger, they became too heavy for mobile processors to run in real time. This patent provides a systematic way for companies like Qualcomm to ensure their mobile chips can handle advanced AI tasks, such as image recognition or voice processing, without draining the battery or overheating the device.

## Frequently asked questions

### What does patent US 10373050 cover?

A method to shrink complex AI models by converting their high-precision math into simpler, faster formats that run efficiently on mobile devices.

### Who owns patent US 10373050?

Qualcomm Inc owns this patent, granted in 2019.

### When does this patent expire?

This patent is expected to expire on August 6, 2039, when the invention enters the public domain.

### How often has patent US 10373050 been cited?

This patent has been cited by 17 later patents that build on its ideas.

### What problem does this patent solve?

As AI models grew larger, they became too heavy for mobile processors to run in real time. This patent provides a systematic way for companies like Qualcomm to ensure their mobile chips can handle advanced AI tasks, such as image recognition or voice processing, without draining the battery or overheating the device.

### What does this patent NOT cover?

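It does not cover general neural network training methods that do not involve quantization, hardware-specific circuit designs for multiplication or addition, techniques that rely on retraining or fine-tuning to recover accuracy after quantization, or machine learning models that are not neural networks, such as support vector machines.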

**Full plain-English explainer:** https://patentbrief.org/patent/us/10373050/tensor-processing-unit-tpu

**Original patent:** https://patents.google.com/patent/US10373050

---

_Source: PatentBrief — https://patentbrief.org. Patent facts are from public records; the plain-English explanation is PatentBrief's._


## Related patents

Semantically similar inventions in the PatentBrief corpus:

- [How Projection Neural Networks Speed Up AI Predictions](https://patentbrief.org/patent/us/11544573/llama-large-language-model-architecture) — A method for making artificial intelligence models faster and more efficient by using fixed, non-trainable projections to simplify complex data before processing.
- [How a Chip Uses Memory to Speed Up AI Calculations](https://patentbrief.org/patent/us/11741188/hardware-accelerated-discretized-neural-network) — This patent describes a specialized computer chip that uses non-volatile memory and analog signals to quickly perform calculations for artificial intelligence, especially for neural networks that need to remember past information.
- [Adapting AI Models to Fit Device Resources](https://patentbrief.org/patent/us/20220383078/data-processing-method-and-related-device) — This patent describes how a computer system can automatically shrink a large artificial intelligence model, specifically a "transformer" type, to fit the available computing power of a phone or other device.
- [How to Automatically Expand Neural Networks by Adding New Nodes](https://patentbrief.org/patent/us/10832138/gpt-language-model-pre-training) — A method for growing artificial intelligence models by identifying underperforming parts of a network and adding new nodes based on the behavior of existing ones.
- [How to Update AI on Small Devices with Slow Internet](https://patentbrief.org/patent/us/20250363357/systems-and-methods-for-deploying-and-updating-neural-networks-at-the-edge-of-a-) — This patent describes a method for efficiently updating artificial intelligence models on small, internet-connected devices, like smart cameras, by sending only the changes, or 'patches,' instead of the entire updated model, which saves bandwidth.