# Training AI on Private Data Without Seeing It

> This patent describes a way to train artificial intelligence models using private data stored on many separate computers, by generating fake data that mimics the real data's patterns, so the private data itself never leaves its original location.

- **Patent:** US 12518214
- **Original title:** Distributed machine learning systems including generation of synthetic data
- **Owner:** Nant Holdings IP
- **Granted:** 2026
- **Status:** Active
- **Times cited:** 0
- **Field:** software, AI/ML, telecommunications, finance, biotech, healthcare

## What it does

This patent outlines a system for distributed machine learning in which private data stays where it is stored. Imagine many computers, each holding sensitive information such as patient health records. A central system sends a 'task' definition to these private computers. On each one, a 'modeling agent' uses the local private data to create synthetic (fake) data that mimics the real data's patterns, then trains a 'proxy model' on that synthetic data. The central system collects the proxy model data from multiple private servers and compares it. If the proxy model data from different servers is similar in shape or properties, it is combined into a 'global model.' If it differs, that may signal a problem with the original private data, such as corruption or missing information.
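The compare-then-combine step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the patent's method: each proxy model is reduced to a tuple of numeric parameters, "similar" means every parameter lies within a fixed `tolerance` of the per-parameter median, and the global model is a plain average. The function name `aggregate_proxies` and the server names are hypothetical.

```python
from statistics import mean, median


def aggregate_proxies(proxies, tolerance=0.5):
    """Combine similar proxy models and flag the ones that differ.

    proxies: dict mapping a server name to a tuple of model parameters.
    Returns (global_model, flagged_servers).
    """
    n_params = len(next(iter(proxies.values())))
    # Assumed similarity reference: the per-parameter median across servers.
    center = [median(p[i] for p in proxies.values()) for i in range(n_params)]

    similar, flagged = {}, []
    for name, params in proxies.items():
        distance = max(abs(a - b) for a, b in zip(params, center))
        if distance <= tolerance:
            similar[name] = params
        else:
            # A large difference may point to corrupt or missing private data.
            flagged.append(name)

    if not similar:
        return None, flagged
    # Assumed aggregation rule: average each parameter over the similar models.
    global_model = tuple(
        mean(p[i] for p in similar.values()) for i in range(n_params)
    )
    return global_model, flagged


# Made-up (slope, intercept) proxy parameters from four hypothetical servers.
proxies = {
    "hospital_a": (2.0, 1.0),
    "hospital_b": (2.1, 0.9),
    "hospital_c": (1.9, 1.1),
    "hospital_d": (7.5, -3.0),
}
global_model, flagged = aggregate_proxies(proxies)
print(flagged)  # ['hospital_d']
```

Only model parameters reach this function, which matches the idea that the central system works with proxy model data rather than with the private records themselves.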

## What it does NOT cover

- Systems where private data is de-identified or exposed to unauthorized systems.
- Systems that directly transmit the original local private data to a non-private server.
- Training AI models solely on synthetic data that does not originate from private data distributions.
- Systems where the proxy model data is not compared between different private data servers.
- Aggregating models without first generating synthetic data based on private data distributions.

## The clever bit

The core innovation is generating synthetic data that reproduces the statistical distributions and patterns of the private data. Proxy models are trained on this synthetic data, so knowledge can be shared and aggregated into a global model without exposing the original, sensitive private data.
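As a rough sketch of that idea on a single private server, the example below summarizes private records as a few statistics, samples synthetic records from those statistics, and fits a proxy model on the synthetic records only. The choices here are assumptions for illustration and are not taken from the patent: records are (x, y) pairs, the distribution is modeled as a two-variable Gaussian, and the proxy model is a least-squares line. The function names and the stand-in "private" data are hypothetical.

```python
import math
import random
import statistics


def summarize(pairs):
    """Reduce (x, y) records to means, standard deviations, and correlation."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs) / (len(pairs) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def generate_synthetic(pairs, n, rng):
    """Sample n synthetic records from a Gaussian fitted to the private ones."""
    mx, my, sx, sy, r = summarize(pairs)
    spread = math.sqrt(max(0.0, 1.0 - r * r))
    synthetic = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Correlate y with x so the synthetic data keeps the private pattern.
        synthetic.append((mx + sx * z1, my + sy * (r * z1 + spread * z2)))
    return synthetic


def train_proxy(pairs):
    """Fit a least-squares line y = slope * x + intercept."""
    mx, my, sx, sy, r = summarize(pairs)
    slope = r * sy / sx
    return slope, my - slope * mx


rng = random.Random(0)
# Made-up stand-in for private records: y is roughly 2x + 1 plus noise.
private = [(float(x), 2.0 * x + 1.0 + rng.gauss(0, 1)) for x in range(50)]

synthetic = generate_synthetic(private, 2000, rng)
proxy_slope, proxy_intercept = train_proxy(synthetic)  # trained on synthetic only
```

Only `proxy_slope` and `proxy_intercept` would need to leave the server in this sketch; the `private` list is never transmitted, and the synthetic records are fresh samples rather than copies of private ones.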

## Real-world examples

1. Training medical diagnostic AI using data from multiple hospitals without sharing patient records.
2. Developing fraud detection models across different financial institutions.
3. Collaborative AI research on sensitive datasets in academic settings.

## Why it matters

This patent addresses a critical challenge in modern AI development: using sensitive data, such as patient health information, to train models without violating privacy regulations such as HIPAA. It enables collaborative AI training across organizations that cannot share raw data, potentially accelerating research in fields like healthcare and finance.

## Frequently asked questions

### What does "Training AI on Private Data Without Seeing It" cover?

This patent describes a way to train artificial intelligence models using private data stored on many separate computers, by generating fake data that mimics the real data's patterns, so the private data itself never leaves its original location.

### Who owns patent US 12518214?

Nant Holdings IP owns this patent, granted in 2026.

### When does this patent expire?

This patent is expected to expire on April 21, 2043, when the invention enters the public domain.

### What problem does this patent solve?

This patent addresses a critical challenge in modern AI development: using sensitive data, such as patient health information, to train models without violating privacy regulations such as HIPAA. It enables collaborative AI training across organizations that cannot share raw data, potentially accelerating research in fields like healthcare and finance.

### What does this patent NOT cover?

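Among other exclusions, it does not cover systems where private data is de-identified or exposed to unauthorized systems, or systems that transmit the original local private data directly to a non-private server.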

**Full plain-English explainer:** https://patentbrief.org/patent/us/12518214/distributed-machine-learning-systems-including-generation-of-synthetic-data

**Original patent:** https://patents.google.com/patent/US12518214

---

_Source: PatentBrief — https://patentbrief.org. Patent facts are from public records; the plain-English explanation is PatentBrief's._


## Related patents

Semantically similar inventions in the PatentBrief corpus:

- [How to Automatically Detect and Fix Changes in AI Model Data](https://patentbrief.org/patent/us/10599957/systems-and-methods-for-detecting-data-drift-for-data-used-in-machine-learning-m) — This patent describes a system that automatically notices when the real-world data an AI model sees changes, causing its predictions to become less accurate, and then fixes the model.
- [How Computers Calculate Probabilities in Large Knowledge Bases](https://patentbrief.org/patent/us/9361579/large-scale-probabilistic-ontology-reasoning) — A method for finding answers in a database of uncertain facts by ignoring probabilities to find a solution first, then calculating how likely that solution is based on the underlying evidence.
- [AI System That Learns Normal Email Use to Spot and Stop Cyber Threats](https://patentbrief.org/patent/us/11606373/cyber-threat-defense-system-protecting-email-networks-with-machine-learning-mode) — This 2023 patent describes an AI system that learns how your company normally uses email and then automatically takes action to stop cyber threats that behave unusually.
- [How Multiple AI Models Detect Unusual Behavior on Computer Networks](https://patentbrief.org/patent/us/12438891/anomaly-detection-based-on-ensemble-machine-learning-model) — This patent describes a computer system that uses several artificial intelligence models working together to spot unusual and potentially dangerous activity from users or devices on a computer network.
- [How to Make Artificial Intelligence Explain Its Own Decisions](https://patentbrief.org/patent/us/10824959/explainers-for-machine-learning-classifiers) — A system that helps complex machine learning models explain why they made a specific decision by turning their data into simple, readable rules.
