PatentBrief · Patent Brief · US 20230297446

How to Train AI Models with Fake Data Using Generative Networks

This patent describes a method for training artificial intelligence models on specially generated fake, or "synthetic," data created by a generative network, with quality checks applied to the synthetic data before it is used for training.

Patent Number

US 20230297446

Status

Active

Filing Date

May 22, 2023

Grant Date

—

Expiration

May 22, 2043

Claims

Assignee

Capital One Services

Inventors

Austin Walters, Kate Key, Mark Watson, Jeremy Goodsitt, Vincent Pham, Anh Truong, Kenneth Taylor, Reza Farivar, Fardin Abdi Taghi Abad

Citations

0 forward · 0 backward

What it covers

The patent outlines a system for generating data models, like those used in AI, by first creating synthetic data. A "model optimizer" receives a request to build a data model (Claim 21). It then sets up computing resources and uses a "generative network" to create a synthetic dataset. This generative network includes a "decoder network" that takes simplified "decoder input data" from a "code space" and transforms it into more complex "decoder output data" in a "sample space" (Claim 21). The generative network is trained to ensure the synthetic data's structure, or "schema," matches that of real "reference data" (Claim 22). Before training, the model optimizer can check the synthetic data's quality by calculating scores for statistical correlation, data similarity, or overall data quality compared to the real data (Claim 23). If the synthetic data meets certain quality standards (Claim 24), the computing resources then use it to train the actual data model. Finally, this trained data model can be used to process real "production data" (Claim 21). For example, a bank could use this to generate fake customer transaction data that looks real but contains no actual customer information, then train a fraud detection AI on this fake data.
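The generation step described above can be illustrated with a small sketch. This is not the patent's implementation: the decoder here is an untrained, fixed random map, and the names `make_decoder`, `generate_synthetic`, `CODE_DIM`, and `SAMPLE_DIM` are hypothetical, chosen only to show decoder input data from a small code space becoming decoder output data in a larger sample space.

```python
import math
import random

CODE_DIM = 2    # size of the code space (assumed value for illustration)
SAMPLE_DIM = 4  # size of the sample space (assumed value for illustration)


def make_decoder(code_dim, sample_dim, seed=0):
    """Build a toy decoder: a fixed random affine map followed by tanh.

    A real generative network would learn these weights during training;
    here they are random so the example stays self-contained.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(code_dim)]
               for _ in range(sample_dim)]
    bias = [rng.uniform(-1, 1) for _ in range(sample_dim)]

    def decode(code):
        if len(code) != code_dim:
            raise ValueError("decoder input must have code_dim values")
        # One output value per sample-space dimension.
        return [math.tanh(sum(w * c for w, c in zip(row, code)) + b)
                for row, b in zip(weights, bias)]

    return decode


def generate_synthetic(decode, n_rows, code_dim, seed=1):
    """Draw random points in the code space and decode each into a row."""
    rng = random.Random(seed)
    return [decode([rng.gauss(0, 1) for _ in range(code_dim)])
            for _ in range(n_rows)]


decoder = make_decoder(CODE_DIM, SAMPLE_DIM)
synthetic_rows = generate_synthetic(decoder, n_rows=5, code_dim=CODE_DIM)
```

Each of the five rows has four values even though only two random numbers were drawn per row, which is the sense in which the decoder expands a compact code into a fuller sample.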

What it doesn't cover

- Does not cover generating synthetic data without using a generative network that specifically includes a decoder network transforming data from a code space to a sample space. (Claim 21)
- Does not cover training data models directly with real-world, non-synthetic datasets. (Claim 21)
- Does not cover synthetic data generation where the output data's schema does not match a reference dataset's schema. (Claim 22)
- Does not cover methods that do not evaluate the synthetic dataset using at least one of a statistical correlation score, a data similarity score, or a data quality score. (Claim 23)
- Does not cover scenarios where the "code space" for the decoder input data has a dimensionality equal to or greater than the "sample space" of the decoder output data. (Claim 25)
- Does not cover systems that do not employ a "model optimizer" to manage the request, resource provisioning, and evaluation steps. (Claim 21)
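Two of the conditions in the list above, the schema match (Claim 22) and the smaller code space (Claim 25), are simple enough to express as checks. The sketch below is only an illustration of those two conditions, not a test for whether a system falls inside the claims. The helper names and the idea of representing a schema as a mapping from column name to type name are assumptions made for this example.

```python
def schema_of(rows):
    """Infer a simple schema (column name -> type name) from the first row."""
    first = rows[0]
    return {name: type(value).__name__ for name, value in first.items()}


def schemas_match(synthetic_rows, reference_rows):
    """True when both datasets have the same columns with the same types."""
    return schema_of(synthetic_rows) == schema_of(reference_rows)


def code_space_is_smaller(code_dim, sample_dim):
    """True when the code space has strictly fewer dimensions."""
    return code_dim < sample_dim


# Hypothetical rows, invented for illustration only.
reference = [{"amount": 12.5, "merchant": "grocer", "is_fraud": False}]
synthetic_ok = [{"amount": 80.0, "merchant": "cafe", "is_fraud": True}]
synthetic_bad = [{"amount": "80.0", "merchant": "cafe"}]  # wrong type, missing column
```

Here `schemas_match(synthetic_ok, reference)` holds, while `synthetic_bad` fails because `amount` is a string and `is_fraud` is missing.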

The clever bit

The novelty lies in the structured process: generate synthetic data with a specific generative network architecture (a decoder network that maps a code space to a sample space), then evaluate that data against the real data's schema and statistical properties before using it for model training. The aim is synthetic data that is realistic enough for effective training while containing no actual records from the real dataset.
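The evaluation step can be sketched as a score plus a threshold gate. The brief does not say how the patent computes its statistical correlation score, so the formula below (comparing the Pearson correlation of two columns in the real and synthetic data) and the 0.9 threshold are assumptions chosen for illustration, as are all function names and sample values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_score(real, synthetic):
    """Near 1.0 when both datasets show the same column correlation,
    near 0.0 when the correlations are opposite (assumed scoring rule)."""
    return 1.0 - abs(pearson(*real) - pearson(*synthetic)) / 2.0


def passes_quality_gate(score, threshold=0.9):
    """Only synthetic data that clears the threshold goes on to training."""
    return score >= threshold


# Each dataset is a pair of columns; values are invented for illustration.
real = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 8.0])
good_synthetic = ([1.5, 2.5, 3.5, 4.5], [3.0, 5.1, 6.9, 9.2])
bad_synthetic = ([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.1, 2.0])

good_score = correlation_score(real, good_synthetic)
bad_score = correlation_score(real, bad_synthetic)
```

With these values `good_synthetic` clears the gate because its columns rise together as in the real data, while `bad_synthetic` is rejected because its relationship is reversed.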

Why it matters

This patent addresses a critical need in AI development: training powerful models without compromising sensitive real-world data. By generating high-quality synthetic data, organizations like Capital One can develop and test AI solutions more rapidly and securely. This approach helps comply with privacy regulations and reduces the risks associated with handling confidential information, enabling innovation in data-sensitive industries.

Real-world examples

1. AI model training platforms for financial services
2. Healthcare data anonymization tools
3. Synthetic data generators for fraud detection systems
4. Machine learning development environments requiring privacy-preserving data
5. Cloud-based AI/ML training services

Generated by PatentBrief · Not legal advice · patentbrief.org

US 20230297446 · 2026