PatentBrief

How to Train AI Models with Fake Data Using Generative Networks

This patent describes a method for training artificial intelligence models using specially generated fake, or 'synthetic,' data created by a generative adversarial network, ensuring the synthetic data is high-quality and safe for training.

Active · Expires 2043 · Owned by Capital One Services · Invented by Austin Walters, Kate Key, Mark Watson + 6 more

Original patent title: “Data model generation using generative adversarial networks”

Plain-English explanation by Sahi · Last reviewed: June 25, 2026

The patent is owned by Capital One Services, has 25 claims, and is expected to expire in 2043.

Coverage

What does this patent actually cover?

The patent outlines a system for generating data models, like those used in AI, by first creating synthetic data. A "model optimizer" receives a request to build a data model (Claim 21). It then sets up computing resources and uses a "generative network" to create a synthetic dataset. This generative network includes a "decoder network" that takes simplified "decoder input data" from a "code space" and transforms it into more complex "decoder output data" in a "sample space" (Claim 21). The generative network is trained to ensure the synthetic data's structure, or "schema," matches that of real "reference data" (Claim 22). Before training, the model optimizer can check the synthetic data's quality by calculating scores for statistical correlation, data similarity, or overall data quality compared to the real data (Claim 23). If the synthetic data meets certain quality standards (Claim 24), the computing resources then use it to train the actual data model. Finally, this trained data model can be used to process real "production data" (Claim 21). For example, a bank could use this to generate fake customer transaction data that looks real but contains no actual customer information, then train a fraud detection AI on this fake data.
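The decoder and schema steps described above can be pictured with a small sketch. This is an illustration only, not the patent's implementation: the column names, the two-dimensional code space, and the linear decoder with fixed random weights (standing in for a trained network) are all assumptions made for the example.

```python
import random

# Hypothetical column names; the summary does not give the reference schema.
REFERENCE_SCHEMA = ("amount", "hour_of_day", "merchant_risk", "is_foreign")
CODE_DIM = 2                        # assumed dimensionality of the code space
SAMPLE_DIM = len(REFERENCE_SCHEMA)  # dimensionality of the sample space


def make_decoder(code_dim, sample_dim, seed=0):
    """Build a toy linear decoder; fixed random weights stand in for a trained network."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(code_dim)] for _ in range(sample_dim)]
    bias = [rng.uniform(-1, 1) for _ in range(sample_dim)]

    def decode(code):
        # Map one code-space vector to one sample-space vector.
        if len(code) != code_dim:
            raise ValueError("code vector has the wrong dimensionality")
        return [sum(w * c for w, c in zip(row, code)) + b
                for row, b in zip(weights, bias)]

    return decode


def generate_synthetic(decode, n_rows, seed=1):
    """Draw random code vectors and decode each one into a synthetic record."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        code = [rng.gauss(0.0, 1.0) for _ in range(CODE_DIM)]
        rows.append(dict(zip(REFERENCE_SCHEMA, decode(code))))
    return rows


def schema_matches(rows, schema):
    """Check that every record has exactly the reference columns, in order."""
    return all(tuple(row) == tuple(schema) for row in rows)


decoder = make_decoder(CODE_DIM, SAMPLE_DIM)
synthetic = generate_synthetic(decoder, n_rows=100)
print(len(synthetic), schema_matches(synthetic, REFERENCE_SCHEMA))
```

In this toy setup the code space (2 dimensions) is smaller than the sample space (4 dimensions), so each short random vector is expanded into a full record with the same columns as the reference data.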

The gap

What does this patent NOT cover?

  • Does not cover generating synthetic data without using a generative network that specifically includes a decoder network transforming data from a code space to a sample space. (Claim 21)
  • Does not cover training data models directly with real-world, non-synthetic datasets. (Claim 21)
  • Does not cover synthetic data generation where the output data's schema does not match a reference dataset's schema. (Claim 22)
  • Does not cover methods that do not evaluate the synthetic dataset using at least one of a statistical correlation score, a data similarity score, or a data quality score. (Claim 23)
  • Does not cover scenarios where the 'code space' for the decoder input data has a dimensionality equal to or greater than the 'sample space' of the decoder output data. (Claim 25)
  • Does not cover systems that do not employ a 'model optimizer' to manage the request, resource provisioning, and evaluation steps. (Claim 21)
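The evaluation step named in the list above (a statistical correlation score checked before training) might look something like the following sketch. The scoring formula, the 0.9 threshold, and the function names are assumptions for illustration; the summary does not say how the patent computes its scores or what standard the data must meet.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlation_score(reference, synthetic):
    """Hypothetical score in [0, 1]: 1.0 when pairwise column correlations agree."""
    columns = list(reference)
    gaps = []
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            real = pearson(reference[a], reference[b])
            fake = pearson(synthetic[a], synthetic[b])
            gaps.append(abs(real - fake))
    if not gaps:
        return 1.0
    # Each gap is at most 2 (from -1 to +1), so halve the mean to stay in [0, 1].
    return 1.0 - (sum(gaps) / len(gaps)) / 2.0


def passes_quality_gate(score, threshold=0.9):
    """Assumed gate: only datasets scoring at or above the threshold go on to training."""
    return score >= threshold


reference = {"amount": [1, 2, 3, 4, 5], "risk": [2, 4, 6, 8, 10]}
similar = {"amount": [10, 20, 30, 40, 50], "risk": [1, 2, 3, 4, 5]}
dissimilar = {"amount": [1, 2, 3, 4, 5], "risk": [10, 8, 6, 4, 2]}

for name, candidate in (("similar", similar), ("dissimilar", dissimilar)):
    score = correlation_score(reference, candidate)
    print(name, round(score, 3), passes_quality_gate(score))
```

A score like this only checks that relationships between columns are preserved. It says nothing about whether the synthetic records leak real customer information, which would need a separate check.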

These exclusions are unique to PatentBrief — derived from the actual claim language, not patent-office boilerplate.

Key facts

Patent number: US 20230297446
Status: Active
Field: AI & Machine Learning
Assignee: Capital One Services
Inventors: Austin Walters, Kate Key, Mark Watson and 6 others
Filed: 2023
Expires: 2043
Claims: 25
Times cited: 0
Litigation: None on record
Value: $31K to $100K (Minimal)

What made this novel

The novelty lies in the structured process of generating synthetic data using a specific generative network architecture (decoder/encoder, code/sample space) and then rigorously evaluating its quality against real data's statistical properties and schema before using it for model training. This ensures the synthetic data is both realistic enough for effective training and sufficiently distinct for privacy.

The Patent Drawing

Representative figure: Data model generation using generative adversarial networks (US 20230297446)

Where you've seen this

Real-world examples

01 · AI model training platforms for financial services
02 · Healthcare data anonymization tools
03 · Synthetic data generators for fraud detection systems
04 · Machine learning development environments requiring privacy-preserving data
05 · Cloud-based AI/ML training services

Why it matters

The bigger picture

This patent addresses a critical need in AI development: training powerful models without compromising sensitive real-world data. By generating high-quality synthetic data, organizations like Capital One can develop and test AI solutions more rapidly and securely. This approach helps comply with privacy regulations and reduces the risks associated with handling confidential information, enabling innovation in data-sensitive industries.

Filed

May 22, 2023

Market context

Who's building on this

Companies in this space

Capital One Services, the assignee, is actively developing and applying AI and machine learning solutions, particularly in financial services. Other major financial institutions like JPMorgan Chase and Wells Fargo, along with cloud providers such as Amazon Web Services, Google Cloud, and Microsoft Azure, are also investing heavily in synthetic data generation capabilities to train their AI models while adhering to strict data privacy regulations.

Market impact

This technology enables the development of AI models in highly regulated industries by providing a pathway to train on data that mimics real-world characteristics without exposing sensitive information. It helps reduce the time and cost associated with data acquisition and anonymization, fostering innovation in areas like fraud detection, risk assessment, and personalized financial services. This approach could become a standard practice for AI model development where data privacy is paramount.

Patent timeline

Filing

Application submitted to the patent office

Expiration

Patent enters public domain

PatentBrief Score

Impact Score

Limited data

Citation count: 0/40 (No citations yet)
Claim breadth: 17/20 (Very broad protection)
Recency: 0/20 (Older than 20 years)
Assignee scale: 0/20 (Independent or smaller assignee)

PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.

Heuristic Value Estimate

What this patent might be worth

Minimal

$31K to $100K

Midpoint $62K · 16.9 yr remaining · industry ×1.6

Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.

The original legal language

Original claims

25 claims as filed with the patent office.

Concepts involved

Claim · Prior art · Non-obviousness · Novelty · Specification · Assignee · Patent term

Cite this patent

Walters, A., Key, K., Watson, M., Goodsitt, J., Pham, V., Truong, A., Taylor, K., Farivar, R., & Abad, F. A. T. How to Train AI Models with Fake Data Using Generative Networks (U.S. Patent No. 20,230,297,446). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/20230297446/data-model-generation-using-generative-adversarial-networks

Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.

Embed

Add this patent to your site

Drop this plain-English patent card into any blog post or article — free, no signup. It always links back to the full breakdown here.

<div data-patentlens-widget data-patent-number="US20230297446"></div>
<script src="https://patentbrief.org/embed.js" async></script>

Keep exploring

Related patents you should know

US 4683195 · 1987

How to Make Billions of Copies of a DNA Segment

This patent describes the Polymerase Chain Reaction (PCR), a method to rapidly create many copies of a specific piece of DNA or RNA, enabling its detection and analysis.

Cetus Corp

US 8697359 · 2014

How to Edit Genes in Human Cells Using an Engineered CRISPR System

This patent describes an engineered CRISPR-Cas9 system for precisely cutting DNA in eukaryotic cells to change how genes work, opening the door for gene editing in complex organisms.

Massachusetts Institute of Technology

US 7657849 · 2010

How the iPhone's Slide-to-Unlock Gesture Works

Apple's 2010 patent describes unlocking a device by dragging a specific graphical image across the touchscreen along a predefined path, a gesture that became iconic with the original iPhone.

Apple Inc

US 4733665 · 1988

How Doctors Implant a Permanent Stent Using a Balloon

This patent describes the method for placing a permanent, expandable wire mesh tube inside a blood vessel or other body tube using a balloon-tipped catheter to widen it and keep it open.

Expandable Grafts Partnership

US 4965188 · 1990

How to Make Many Copies of a DNA Piece with Heat

This patent describes the Polymerase Chain Reaction (PCR) method, a technique to make millions of copies of a specific DNA segment using a heat-resistant enzyme and repeated temperature changes.

Cetus Corp

US 4235871 · 1980

How to Encapsulate Active Materials in Lipid Bubbles Efficiently

This patent describes a method for trapping biologically active substances inside tiny, multi-layered fat bubbles called liposomes, using a specific water-in-oil emulsion and gel-forming process to improve how much material gets captured.

Individual

Common Questions

Frequently Asked Questions

What does "How to Train AI Models with Fake Data Using Generative Networks" cover?

This patent describes a method for training artificial intelligence models using specially generated fake, or 'synthetic,' data created by a generative adversarial network, ensuring the synthetic data is high-quality and safe for training.

Who owns patent US 20230297446?

This patent is owned by Capital One Services.

When does this patent expire?

This patent is expected to expire on May 22, 2043, when the invention enters the public domain.

What problem does this patent solve?

This patent addresses a critical need in AI development: training powerful models without compromising sensitive real-world data. By generating high-quality synthetic data, organizations like Capital One can develop and test AI solutions more rapidly and securely. This approach helps comply with privacy regulations and reduces the risks associated with handling confidential information, enabling innovation in data-sensitive industries.

What does this patent NOT cover?

Does not cover generating synthetic data without using a generative network that specifically includes a decoder network transforming data from a code space to a sample space. (Claim 21)

Same assignee

More from Capital One Services

US 10599957 · 2020

How to Automatically Detect and Fix Changes in AI Model Data

Last reviewed: June 25, 2026 · PatentBrief is not a law firm and this is not legal advice.