Training AI on Private Data Without Seeing It
This patent describes a way to train artificial intelligence models using private data stored on many separate computers, by generating fake data that mimics the real data's patterns, so the private data itself never leaves its original location.
Original patent title: “Distributed machine learning systems including generation of synthetic data”
This patent describes a way to train artificial intelligence models using private data stored on many separate computers, by generating fake data that mimics the real data's patterns, so the private data itself never leaves its original location. Granted to Nant Holdings IP in 2026 with 25 claims, and it is expected to expire in 2043.
Key facts
Coverage
What does this patent actually cover?
This patent outlines a system for distributed machine learning where private data stays put. Imagine many computers, each holding sensitive information like patient health records. A central system sends a 'task' definition to these private computers. Each private computer's 'modeling agent' uses its local private data to create synthetic, or fake, data that mimics the real data's patterns. It then trains a 'proxy model' on this synthetic data. The system then collects this proxy model data from multiple private servers. If the data from different servers looks similar in shape or properties, it's combined into a 'global model.' If the data looks different, it might signal a problem with the original private data, like corruption or missing information.
The gap
What does this patent NOT cover?
- Systems where private data is de-identified or exposed to unauthorized systems.
- Systems that directly transmit the original local private data to a non-private server.
- Training AI models solely on synthetic data that does not originate from private data distributions.
- Systems where the proxy model data is not compared between different private data servers.
- Aggregating models without first generating synthetic data based on private data distributions.
These exclusions are unique to PatentBrief — derived from the actual claim language, not patent-office boilerplate.
What made this novel
The core innovation is generating synthetic data that captures the essence of the private data's distributions and patterns. This synthetic data is then used to train proxy models, allowing knowledge to be shared and aggregated into a global model without ever exposing the original, sensitive private data.
The Patent Drawing

Schematic visualization of the patent's claim structure. Hand-drawn diagrams in progress for each landmark patent.
Where you've seen this
Real-world examples
Training medical diagnostic AI using data from multiple hospitals without sharing patient records.
Developing fraud detection models across different financial institutions.
Collaborative AI research on sensitive datasets in academic settings.
Why it matters
The bigger picture
This patent addresses a critical challenge in modern AI development: accessing and utilizing sensitive data, such as patient health information, for training without violating privacy regulations like HIPAA. It enables collaborative AI training across organizations that cannot share raw data, potentially accelerating research in fields like healthcare and finance.
Filed
April 21, 2023
Granted
January 6, 2026
Market context
Who's building on this
Companies in this space
Companies and research institutions focused on federated learning and privacy-preserving AI are actively developing similar technologies. This includes major cloud providers like Google, Microsoft, and Amazon, as well as specialized AI startups exploring secure multi-party computation and differential privacy techniques.
Market impact
This patent's approach to privacy-preserving distributed machine learning is crucial for unlocking the value of sensitive datasets. It enables new forms of collaboration and data utilization that were previously impossible due to privacy concerns, potentially leading to more robust and accurate AI models across various industries.
Claim 1 — Plain English
What this patent covers
This patent outlines a system for distributed machine learning where private data stays put. Imagine many computers, each holding sensitive information like patient health records. A central system sends a 'task' definition to these private computers. Each private computer's 'modeling agent' uses its local private data to create synthetic, or fake, data that mimics the real data's patterns. It then trains a 'proxy model' on this synthetic data. The system then collects this proxy model data from multiple private servers. If the data from different servers looks similar in shape or properties, it's combined into a 'global model.' If the data looks different, it might signal a problem with the original private data, like corruption or missing information.
The clever bit
The core innovation is generating synthetic data that captures the essence of the private data's distributions and patterns. This synthetic data is then used to train proxy models, allowing knowledge to be shared and aggregated into a global model without ever exposing the original, sensitive private data.
What it does not cover
- Systems where private data is de-identified or exposed to unauthorized systems.
- Systems that directly transmit the original local private data to a non-private server.
- Training AI models solely on synthetic data that does not originate from private data distributions.
- Systems where the proxy model data is not compared between different private data servers.
- Aggregating models without first generating synthetic data based on private data distributions.
Patent timeline
Application submitted to the patent office
Application published, typically 18 months after filing
Patent officially issued
Patent enters public domain
PatentBrief Score
Impact Score
Early stage
Citation count
0/40
No citations yet
Claim breadth
17/20
Very broad protection
Recency
20/20
Granted within 5 years
Assignee scale
0/20
Independent or smaller assigneeassigneeThe entity that owns the patent — usually the inventor's employer or a company.Read more →
PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.
Heuristic Value Estimate
What this patent might be worth
$47K – $150K
Midpoint $94K · 16.9 yr remaining · industry ×1.6
Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.
The original legal language
Original claims
25 claims as filed with the patent office.
Concepts involved
Citations
Patent lineage
Cite this patent
Szeto, C. W., Benz, S. C., & Witchey, N. J. (2026). Training AI on Private Data Without Seeing It (U.S. Patent No. 12,518,214). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/12518214/distributed-machine-learning-systems-including-generation-of-synthetic-data
Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.
Embed
Add this patent to your site
Drop this plain-English patent card into any blog post or article — free, no signup. It always links back to the full breakdown here.
<div data-patentlens-widget data-patent-number="US12518214"></div> <script src="https://patentbrief.org/embed.js" async></script>
Stay in the loop
Get a weekly digest of new patents.
One email per week. No spam. Unsubscribe anytime.
Keep exploring
Related patents you should know
US 4683195 · 1987
How to Make Billions of Copies of a DNA Segment
This patent describes the Polymerase Chain Reaction (PCR), a method to rapidly create many copies of a specific piece of DNA or RNA, enabling its detection and analysis.
Cetus Corp
US 8697359 · 2014
How to Edit Genes in Human Cells Using an Engineered CRISPR System
This patent describes an engineered CRISPR-Cas9 system for precisely cutting DNA in eukaryotic cells to change how genes work, opening the door for gene editing in complex organisms.
Massachusetts Institute of Technology
US 7657849 · 2010
How the iPhone's Slide-to-Unlock Gesture Works
Apple's 2010 patent describes unlocking a device by dragging a specific graphical image across the touchscreen along a predefined path, a gesture that became iconic with the original iPhone.
Apple Inc
US 4733665 · 1988
How Doctors Implant a Permanent Stent Using a Balloon
This patent describes the method for placing a permanent, expandable wire mesh tube inside a blood vessel or other body tube using a balloon-tipped catheter to widen it and keep it open.
Expandable Grafts Partnership
US 4405829 · 1983
How RSA Public-Key Encryption Keeps Digital Messages Secret
This patent describes the foundational RSA algorithm, a method for securely sending messages where anyone can encrypt a message using a public key, but only the intended recipient can decrypt it using a secret private key.
Massachusetts Institute of Technology
US 4575330 · 1986
How 3D Printers Build Objects Layer by Layer from Liquid
This patent describes the foundational method for 3D printing, where a machine builds a three-dimensional object layer by layer by hardening a liquid material with light or other energy.
UVP Inc
Semantically similar
You might also find these interesting
US 9361579 · 2016 · International Business Machines Corp
How Computers Calculate Probabilities in Large Knowledge Bases
US 12438891 · 2025 · Cisco Technology
How Multiple AI Models Detect Unusual Behavior on Computer Networks
US 10824959 · 2020 · Amazon Technologies Inc
How to Make Artificial Intelligence Explain Its Own Decisions
US 10607134 · 2020
How AI Learns to Control Game Characters Based on Their Surroundings
More to explore
More in Software & Internet
US 4405829 · 1983 · Massachusetts Institute of Technology
How RSA Public-Key Encryption Keeps Digital Messages Secret
US 6285999 · 2001 · Leland Stanford Junior University
How Websites Get Ranked by Importance
US 5960411 · 1999 · Amazon com Inc
How Amazon's One-Click Ordering Works for Online Purchases
US 7669123 · 2010 · Facebook Inc
Displaying Friends' Activities in a Social Network Feed
New to patents?
Common Questions
Frequently Asked Questions
What does Training AI on Private Data Without Seeing It cover?
This patent describes a way to train artificial intelligence models using private data stored on many separate computers, by generating fake data that mimics the real data's patterns, so the private data itself never leaves its original location.
Who owns patent US 12518214?
Nant Holdings IP owns this patent, granted in 2026.
When does this patent expire?
This patent is expected to expire on April 21, 2043, when the invention enters the public domain.
What problem does this patent solve?
This patent addresses a critical challenge in modern AI development: accessing and utilizing sensitive data, such as patient health information, for training without violating privacy regulations like HIPAA. It enables collaborative AI training across organizations that cannot share raw data, potentially accelerating research in fields like healthcare and finance.
What does this patent NOT cover?
Systems where private data is de-identified or exposed to unauthorized systems.
Patent monitoring






