# How to Train AI Models with Fake Data Using Generative Networks

> This patent describes a method for training artificial intelligence models on generated, or 'synthetic,' data created by a generative adversarial network, with quality checks on the synthetic data before it is used for training.

- **Patent:** US 20230297446
- **Original title:** Data model generation using generative adversarial networks
- **Owner:** Capital One Services
- **Status:** Active
- **Times cited:** 0
- **Field:** AI/ML, software, finance, telecommunications, consumer electronics

## What it does

The patent outlines a system that builds data models, such as those used in AI, from synthetic data. A "model optimizer" receives a request to build a data model, provisions computing resources, and uses a "generative network" to create a synthetic dataset (Claim 21). The generative network includes a "decoder network" that takes "decoder input data" from a "code space" and transforms it into "decoder output data" in a "sample space" (Claim 21). The code space has fewer dimensions than the sample space (Claim 25), and the generative network is trained so that the structure, or "schema," of the synthetic data matches that of real "reference data" (Claim 22).

Before the data model is trained, the model optimizer can evaluate the synthetic dataset against the reference data by calculating a statistical correlation score, a data similarity score, or a data quality score (Claim 23). If the synthetic data meets the required quality criteria (Claim 24), the computing resources use it to train the data model. The trained data model can then be used to process real "production data" (Claim 21).

For example, a bank could generate synthetic customer transaction data that looks real but contains no actual customer information, then train a fraud detection model on that synthetic data.
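The flow described above can be pictured with a small Python sketch. It is illustrative only and is not the patent's implementation: the toy linear decoder is untrained (a real generative network would learn its weights), and the names `make_decoder`, `similarity_score`, and `build_model`, the dimensions, the score formula, and the threshold are all assumptions made for this example.

```python
import random
import statistics

CODE_DIM = 2    # assumed size of the code space
SAMPLE_DIM = 4  # assumed size of the sample space (larger than CODE_DIM)


def make_decoder(seed=0):
    """Return a toy linear decoder mapping code space to sample space."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(CODE_DIM)]
               for _ in range(SAMPLE_DIM)]

    def decode(code):
        # Each output column is a weighted sum of the code-space inputs.
        return [sum(w * c for w, c in zip(row, code)) for row in weights]

    return decode


def generate_synthetic(decode, n_rows, seed=1):
    """Sample random code vectors and decode them into synthetic rows."""
    rng = random.Random(seed)
    return [decode([rng.gauss(0, 1) for _ in range(CODE_DIM)])
            for _ in range(n_rows)]


def similarity_score(synthetic, reference):
    """Hypothetical score in (0, 1]; 1.0 means identical column means."""
    gaps = []
    for col in range(SAMPLE_DIM):
        syn_mean = statistics.fmean(row[col] for row in synthetic)
        ref_mean = statistics.fmean(row[col] for row in reference)
        gaps.append(abs(syn_mean - ref_mean))
    return 1.0 / (1.0 + statistics.fmean(gaps))


def build_model(reference, threshold=0.5):
    """Generate data, score it, and 'train' only if the score passes."""
    decode = make_decoder()
    synthetic = generate_synthetic(decode, n_rows=200)
    score = similarity_score(synthetic, reference)
    if score < threshold:
        return None, score  # quality gate failed: do not train
    # Stand-in for training: the "model" is just the column means.
    model = [statistics.fmean(row[c] for row in synthetic)
             for c in range(SAMPLE_DIM)]
    return model, score


# Stand-in "reference data" for the demo, drawn from the same toy decoder.
reference = generate_synthetic(make_decoder(), n_rows=200, seed=2)
model, score = build_model(reference)
```

The design point the sketch tries to capture is the ordering: the quality score is computed and checked before any training happens, so a synthetic dataset that fails the check never reaches the model.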

## What it does NOT cover

- Does not cover generating synthetic data without using a generative network that specifically includes a decoder network transforming data from a code space to a sample space. (Claim 21)
- Does not cover training data models directly with real-world, non-synthetic datasets. (Claim 21)
- Does not cover synthetic data generation where the output data's schema does not match a reference dataset's schema. (Claim 22)
- Does not cover methods that do not evaluate the synthetic dataset using at least one of a statistical correlation score, a data similarity score, or a data quality score. (Claim 23)
- Does not cover scenarios where the 'code space' for the decoder input data has a dimensionality equal to or greater than the 'sample space' of the decoder output data. (Claim 25)
- Does not cover systems that do not employ a 'model optimizer' to manage the request, resource provisioning, and evaluation steps. (Claim 21)
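Two of the limits above, the schema match (Claim 22) and the smaller code space (Claim 25), are simple enough to express as checks. The sketch below is a hedged illustration: the patent summary does not say how a schema is represented, so treating it as a mapping from column name to value type is an assumption, as are the function names.

```python
def schema_of(rows):
    """Infer an assumed schema (column name -> type name) from the first row."""
    first = rows[0]
    return {name: type(value).__name__ for name, value in first.items()}


def schema_matches(synthetic_rows, reference_rows):
    """True when both datasets have the same columns with the same types."""
    return schema_of(synthetic_rows) == schema_of(reference_rows)


def code_space_is_smaller(code_dim, sample_dim):
    """True when the code space has strictly fewer dimensions."""
    return code_dim < sample_dim


# Hypothetical rows for illustration only.
reference_rows = [{"amount": 12.5, "merchant": "grocer", "is_fraud": False}]
synthetic_rows = [{"amount": 48.0, "merchant": "cafe", "is_fraud": True}]
bad_rows = [{"amount": "48.0", "merchant": "cafe"}]  # wrong type, missing column

ok_schema = schema_matches(synthetic_rows, reference_rows)
bad_schema = schema_matches(bad_rows, reference_rows)
```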

## The clever bit

The novelty lies in the structured process: synthetic data is generated with a specific generative network architecture (a decoder network mapping a code space to a sample space), and its quality is then evaluated against the schema and statistical properties of real data before it is used for model training. The aim is synthetic data that is realistic enough for effective training while containing no actual customer information.
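To make the idea of comparing statistical properties concrete, here is one possible form a statistical correlation score could take. The formula is an assumption for illustration, since the summary above does not define it: the sketch compares the Pearson correlation of each column pair in the synthetic data with the same pair in the reference data and maps the average gap to a score between 0 and 1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither list is constant (a constant column would divide by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_score(synthetic, reference):
    """Hypothetical score in [0, 1]; higher means closer correlation structure.

    Both arguments map column name -> list of numbers and must share columns.
    """
    columns = sorted(reference)
    gaps = []
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            syn_corr = pearson(synthetic[a], synthetic[b])
            ref_corr = pearson(reference[a], reference[b])
            gaps.append(abs(syn_corr - ref_corr))
    # Each gap lies in [0, 2], so dividing by 2 keeps the score in [0, 1].
    return 1.0 - sum(gaps) / (2 * len(gaps))


# Hypothetical columns for illustration only.
reference = {"amount": [1, 2, 3, 4], "balance": [2, 4, 6, 8]}
faithful = {"amount": [1, 2, 3, 4], "balance": [2, 4, 6, 8]}
inverted = {"amount": [1, 2, 3, 4], "balance": [8, 6, 4, 2]}

high = correlation_score(faithful, reference)
low = correlation_score(inverted, reference)
```

A synthetic dataset that preserves the relationship between columns scores near 1, while one that reverses it scores near 0, which is the kind of signal a quality threshold could act on.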

## Real-world examples

1. AI model training platforms for financial services
2. Healthcare data anonymization tools
3. Synthetic data generators for fraud detection systems
4. Machine learning development environments requiring privacy-preserving data
5. Cloud-based AI/ML training services

## Why it matters

This patent addresses a critical need in AI development: training powerful models without compromising sensitive real-world data. By generating high-quality synthetic data, organizations like Capital One can develop and test AI solutions more rapidly and securely. This approach helps comply with privacy regulations and reduces the risks associated with handling confidential information, enabling innovation in data-sensitive industries.

## Frequently asked questions

### What does "How to Train AI Models with Fake Data Using Generative Networks" cover?

This patent describes a method for training artificial intelligence models on generated, or 'synthetic,' data created by a generative adversarial network, with quality checks on the synthetic data before it is used for training.

### Who owns patent US 20230297446?

This patent is owned by Capital One Services.

### When does this patent expire?

This patent is expected to expire on May 22, 2043, when the invention enters the public domain.

### What problem does this patent solve?

This patent addresses a critical need in AI development: training powerful models without compromising sensitive real-world data. By generating high-quality synthetic data, organizations like Capital One can develop and test AI solutions more rapidly and securely. This approach helps comply with privacy regulations and reduces the risks associated with handling confidential information, enabling innovation in data-sensitive industries.

### What does this patent NOT cover?

Among other limits, the patent does not cover generating synthetic data without a generative network that specifically includes a decoder network transforming data from a code space to a sample space (Claim 21). The full list appears under "What it does NOT cover" above.

**Full plain-English explainer:** https://patentbrief.org/patent/us/20230297446/data-model-generation-using-generative-adversarial-networks

**Original patent:** https://patents.google.com/patent/US20230297446

---

_Source: PatentBrief — https://patentbrief.org. Patent facts are from public records; the plain-English explanation is PatentBrief's._
