PatentBrief · Patent Brief · US 12518214

Training AI on Private Data Without Seeing It

This patent describes a way to train artificial intelligence models using private data stored on many separate computers, by generating fake data that mimics the real data's patterns, so the private data itself never leaves its original location.

Patent Number

US 12518214

Status

Active

Filing Date

April 21, 2023

Grant Date

January 6, 2026

Expiration

April 21, 2043

Claims

Assignee

Nant Holdings IP

Inventors

Christopher W. Szeto, Stephen Charles Benz, Nicholas J. Witchey

Citations

0 forward · 52 backward

What it covers

This patent outlines a distributed machine learning system in which private data never leaves the server that holds it. Imagine many computers, each holding sensitive information such as patient health records. A central system sends a 'task' definition to these private computers. On each one, a 'modeling agent' uses the local private data to create synthetic (fake) data that mimics the real data's patterns, then trains a 'proxy model' on that synthetic data. The system collects the resulting proxy model data from multiple private servers and compares it. If the proxy model data from different servers is similar in shape or properties, it is combined into a 'global model.' If it differs, that may signal a problem with the underlying private data, such as corruption or missing information.
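As a rough illustration of this flow, the Python sketch below simulates three private servers. Nearly everything in it is an assumption made for illustration and not taken from the patent: the bivariate Gaussian used to summarize the private data, the least-squares line standing in for the proxy model, the median-based similarity check, the tolerance value, and all function names.

```python
import math
import random
import statistics


def fit_distribution(private_rows):
    """Summarize private (x, y) rows as a bivariate Gaussian (assumed choice)."""
    xs = [x for x, _ in private_rows]
    ys = [y for _, y in private_rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    var_x, var_y = statistics.variance(xs), statistics.variance(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in private_rows)
    cov_xy /= len(private_rows) - 1
    return mean_x, mean_y, var_x, var_y, cov_xy


def sample_synthetic(dist, n, rng):
    """Draw n synthetic rows from the fitted distribution, not from real rows."""
    mean_x, mean_y, var_x, var_y, cov_xy = dist
    slope = cov_xy / var_x
    resid_sd = math.sqrt(max(var_y - cov_xy ** 2 / var_x, 0.0))
    rows = []
    for _ in range(n):
        x = rng.gauss(mean_x, math.sqrt(var_x))
        y = mean_y + slope * (x - mean_x) + rng.gauss(0.0, resid_sd)
        rows.append((x, y))
    return rows


def train_proxy(rows):
    """Fit y = slope * x + intercept by least squares on synthetic rows only."""
    mean_x, mean_y, var_x, _, cov_xy = fit_distribution(rows)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def modeling_agent(private_rows, rng, n_synthetic=2000):
    """Runs on the private server; only proxy parameters are returned."""
    dist = fit_distribution(private_rows)
    synthetic = sample_synthetic(dist, n_synthetic, rng)
    return train_proxy(synthetic)


def aggregate(proxies, tolerance=0.25):
    """Average proxies whose slope is near the median; flag the others."""
    median_slope = statistics.median([slope for slope, _ in proxies])
    accepted, flagged = [], []
    for index, (slope, _) in enumerate(proxies):
        if abs(slope - median_slope) <= tolerance:
            accepted.append(index)
        else:
            flagged.append(index)
    global_model = (
        statistics.fmean(proxies[i][0] for i in accepted),
        statistics.fmean(proxies[i][1] for i in accepted),
    )
    return global_model, flagged


def run_demo(seed=0):
    """Two healthy servers share a pattern; the third holds corrupted labels."""
    rng = random.Random(seed)
    servers = []
    for _ in range(2):
        xs = [rng.gauss(5.0, 2.0) for _ in range(500)]
        servers.append([(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)) for x in xs])
    # Corrupted server: y is unrelated to x.
    servers.append([(rng.gauss(5.0, 2.0), rng.gauss(0.0, 5.0)) for _ in range(500)])
    proxies = [modeling_agent(rows, rng) for rows in servers]
    return aggregate(proxies)


if __name__ == "__main__":
    model, flagged_servers = run_demo()
    print("global model (slope, intercept):", model)
    print("servers flagged for review:", flagged_servers)
```

In this sketch the raw rows never leave `modeling_agent`; the central `aggregate` step sees only proxy parameters, which is the separation the summary describes. A real system would likely use a richer generative model and a more careful similarity measure than a slope comparison.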

What it doesn't cover

- Systems where private data is de-identified or exposed to unauthorized systems.
- Systems that directly transmit the original local private data to a non-private server.
- Training AI models solely on synthetic data that does not originate from private data distributions.
- Systems where the proxy model data is not compared between different private data servers.
- Aggregating models without first generating synthetic data based on private data distributions.

The clever bit

The core innovation is generating synthetic data that captures the essence of the private data's distributions and patterns. This synthetic data is then used to train proxy models, allowing knowledge to be shared and aggregated into a global model without ever exposing the original, sensitive private data.

Why it matters

This patent addresses a critical challenge in modern AI development: accessing and utilizing sensitive data, such as patient health information, for training without violating privacy regulations like HIPAA. It enables collaborative AI training across organizations that cannot share raw data, potentially accelerating research in fields like healthcare and finance.

Real-world examples

1. Training medical diagnostic AI using data from multiple hospitals without sharing patient records.
2. Developing fraud detection models across different financial institutions.
3. Collaborative AI research on sensitive datasets in academic settings.

Generated by PatentBrief · Not legal advice · patentbrief.org
