Training AI on Private Data Without Seeing It
This patent describes a way to train artificial intelligence models on private data held by many separate computers. Each computer generates synthetic data that mimics the patterns of its real data, so the private data itself never leaves its original location.
Patent Number: US 12518214
Status: Active
Filing Date: April 21, 2023
Grant Date: January 6, 2026
Expiration: April 21, 2043
Claims: 25
Assignee: Nant Holdings IP
Inventors: Christopher W. Szeto, Stephen Charles Benz, Nicholas J. Witchey
Citations: 0 forward · 52 backward
What it covers
This patent outlines a system for distributed machine learning in which private data stays put. Imagine many computers, each holding sensitive information such as patient health records. A central system sends a 'task' definition to these private computers. On each one, a 'modeling agent' uses the local private data to create synthetic (fake) data that mimics the real data's distributions and patterns, then trains a 'proxy model' on that synthetic data. The central system collects the resulting proxy model data from multiple private servers and compares it. If the proxy model data from different servers is similar in shape or properties, it is combined into a 'global model.' If it differs, that may signal a problem with the underlying private data, such as corruption or missing information.
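The compare-then-combine step can be pictured with a short sketch. It assumes, hypothetically, that each server's proxy model data is a plain vector of model parameters, that similarity is measured by cosine similarity against the element-wise median of all servers' vectors, and that similar models are merged by simple averaging. The function names, the 0.9 threshold, and the server names are illustrative assumptions, not details taken from the patent.

```python
import math
import statistics


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length parameter vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def aggregate_proxy_models(proxy_params, threshold=0.9):
    """Hypothetical central-server step: merge similar proxy models, flag the rest.

    proxy_params maps a server id to that server's proxy model parameter vector.
    Returns (global_params, flagged_ids); global_params is None if nothing passes.
    """
    vectors = list(proxy_params.values())
    # Element-wise median across servers serves as a robust reference vector.
    reference = [statistics.median(column) for column in zip(*vectors)]

    accepted, flagged = [], []
    for server_id, params in proxy_params.items():
        if cosine_similarity(params, reference) >= threshold:
            accepted.append(params)
        else:
            # A dissimilar proxy model may point to corrupt or missing private data.
            flagged.append(server_id)

    if not accepted:
        return None, sorted(flagged)
    # Unweighted average of the accepted parameter vectors forms the global model.
    global_params = [sum(column) / len(accepted) for column in zip(*accepted)]
    return global_params, sorted(flagged)


if __name__ == "__main__":
    proxies = {
        "hospital_a": [1.0, 2.0, 0.5],
        "hospital_b": [1.1, 1.9, 0.6],
        "hospital_c": [-3.0, 0.1, 4.0],  # deliberately unlike the others
    }
    global_model, flagged = aggregate_proxy_models(proxies)
    print(flagged)  # ['hospital_c']
```

Using the median as the reference is one design choice among many: with three or more servers, a single outlier cannot pull the reference toward itself. The summary above does not say which similarity measure or aggregation rule the patent actually uses.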
What it doesn't cover
- Systems where private data is de-identified or exposed to unauthorized systems.
- Systems that directly transmit the original local private data to a non-private server.
- Training AI models solely on synthetic data that does not originate from private data distributions.
- Systems where the proxy model data is not compared between different private data servers.
- Aggregating models without first generating synthetic data based on private data distributions.
The clever bit
The core innovation is generating synthetic data that captures the essence of the private data's distributions and patterns. This synthetic data is then used to train proxy models, allowing knowledge to be shared and aggregated into a global model without ever exposing the original, sensitive private data.
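The private-server side can be illustrated with a minimal sketch, assuming the private data is a table of numeric (x, y) pairs and that a bivariate Gaussian is an adequate summary of its distribution. The Gaussian generator and the least-squares line used as the proxy model are illustrative stand-ins, not the patent's method, and all function names are hypothetical. Only the fitted mean and covariance are used to generate synthetic data, and only the proxy model's slope and intercept would be sent onward.

```python
import math
import random


def fit_bivariate_gaussian(pairs):
    """Estimate mean and covariance of private (x, y) pairs; runs on the private server."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    vx = sum((x - mx) ** 2 for x, _ in pairs) / (n - 1)
    vy = sum((y - my) ** 2 for _, y in pairs) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in pairs) / (n - 1)
    return (mx, my), (vx, cxy, vy)


def sample_synthetic(mean, cov, count, seed=0):
    """Draw synthetic (x, y) pairs from the fitted Gaussian via a 2x2 Cholesky factor."""
    rng = random.Random(seed)
    (mx, my), (vx, cxy, vy) = mean, cov
    l11 = math.sqrt(vx)
    l21 = cxy / l11
    l22 = math.sqrt(max(vy - l21 ** 2, 0.0))  # clamp tiny negative rounding error
    synthetic = []
    for _ in range(count):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        synthetic.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
    return synthetic


def train_proxy_model(pairs):
    """Fit y = slope * x + intercept by least squares, using synthetic pairs only."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Stand-in for private records that stay on this server: y is roughly 2x + 1 plus noise.
    private = []
    for _ in range(500):
        x = data_rng.uniform(0.0, 10.0)
        private.append((x, 2.0 * x + 1.0 + data_rng.gauss(0.0, 0.5)))

    mean, cov = fit_bivariate_gaussian(private)
    synthetic = sample_synthetic(mean, cov, count=5000)
    slope, intercept = train_proxy_model(synthetic)
    # Only these proxy model parameters would be sent to the central server.
    print(f"proxy model: y = {slope:.2f} * x + {intercept:.2f}")
```

A Gaussian keeps only the means, variances, and covariance of the private data, so the synthetic x values here are bell-shaped even though the stand-in private x values are uniform. A linear proxy model depends only on those moments, so it still recovers roughly the same slope and intercept; richer data would need a richer generator.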
Why it matters
This patent addresses a critical challenge in modern AI development: accessing and utilizing sensitive data, such as patient health information, for training without violating privacy regulations like HIPAA. It enables collaborative AI training across organizations that cannot share raw data, potentially accelerating research in fields like healthcare and finance.
Real-world examples
1. Training medical diagnostic AI using data from multiple hospitals without sharing patient records.
2. Developing fraud detection models across different financial institutions.
3. Collaborative AI research on sensitive datasets in academic settings.
Generated by PatentBrief · Not legal advice · patentbrief.org
US 12518214 · 2026