PatentBrief — patentbrief.org

US 11170293

How AI Systems Learn to Predict and Act Simultaneously

A method for training AI models that combines supervised learning for prediction with reinforcement learning for decision-making in a single, coordinated system.

Granted 2021 · Active · Expires 2035 · Owned by Microsoft Technology Licensing LLC · Invented by Ji He, Prabhdeep Singh, Lihong Li + 5 more

Original patent title: “Multi-model controller”

Plain-English explanation by Sahi · Last reviewed June 15, 2026

Granted to Microsoft Technology Licensing LLC in 2021 with 15 claims and 4 forward citations.

Coverage

What does this patent actually cover?

This patent describes a system that uses two types of neural networks working in tandem to improve how an AI makes decisions. First, a Recurrent Neural Network (RNN) processes data to create a 'state' (a summary of what is happening) and predicts a result. Second, a Q-Network (QN) looks at that state to suggest the best action to take. The system then uses a 'reference result' (the actual outcome) to update the RNN using supervised learning, while simultaneously updating the Q-Network using reinforcement learning based on the action taken and the new state. This allows the system to get better at both understanding its environment and choosing the right actions over time.
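The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the patented implementation: the class names TinyRNN and TinyQNetwork, the one-unit network sizes, the learning rates, and the toy environment (where the reference result is 2*x and the reward is derived from it) are all hypothetical and do not come from the patent.

```python
import math
import random


class TinyRNN:
    """One-unit recurrent predictor (hypothetical): keeps a scalar state and predicts a result."""

    def __init__(self):
        self.w_h, self.w_x, self.w_o = 0.5, 0.5, 0.1
        self.h = 0.0

    def step(self, x):
        # Fold the new observation into the running state, then predict from it.
        self.h = math.tanh(self.w_h * self.h + self.w_x * x)
        return self.h, self.w_o * self.h  # (state, predicted result)

    def supervised_update(self, state, predicted, reference, lr=0.1):
        # Gradient step on 0.5 * (predicted - reference)**2 with respect to w_o only.
        self.w_o -= lr * (predicted - reference) * state


class TinyQNetwork:
    """Linear Q-function over the RNN state (hypothetical): one weight and bias per action."""

    def __init__(self, n_actions):
        self.w = [0.0] * n_actions
        self.b = [0.0] * n_actions

    def q_values(self, state):
        return [w * state + b for w, b in zip(self.w, self.b)]

    def select_action(self, state, epsilon, rng):
        # Epsilon-greedy: mostly pick the highest-valued action, sometimes explore.
        if rng.random() < epsilon:
            return rng.randrange(len(self.w))
        q = self.q_values(state)
        return q.index(max(q))

    def rl_update(self, state, action, reward, next_state, lr=0.1, gamma=0.9):
        # One-step Q-learning update using the action taken and the new state.
        target = reward + gamma * max(self.q_values(next_state))
        td_error = target - self.q_values(state)[action]
        self.w[action] += lr * td_error * state
        self.b[action] += lr * td_error
        return td_error


def train(steps=200, seed=0):
    rng = random.Random(seed)
    rnn, qn = TinyRNN(), TinyQNetwork(n_actions=2)
    x = rng.uniform(-1, 1)
    state, predicted = rnn.step(x)
    errors = []
    for _ in range(steps):
        action = qn.select_action(state, epsilon=0.1, rng=rng)
        # Toy environment (assumption): the reference result is 2*x, and the
        # reward is 1 when the action matches the sign of that reference result.
        reference = 2.0 * x
        reward = 1.0 if (action == 1) == (reference > 0) else 0.0
        errors.append(abs(predicted - reference))
        rnn.supervised_update(state, predicted, reference)  # supervised learning
        x = rng.uniform(-1, 1)
        next_state, next_predicted = rnn.step(x)
        qn.rl_update(state, action, reward, next_state)  # reinforcement learning
        state, predicted = next_state, next_predicted
    return rnn, qn, errors
```

Calling train() runs both updates on every step, so the state fed to the Q-network always comes from the same RNN that is being corrected against the reference result. A real system would use full-sized networks and backpropagate through the recurrent weights as well.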

The gap

What does this patent NOT cover?

Does not cover systems that rely solely on supervised learning without a reinforcement learning feedback loop.
Does not cover static models that do not update their parameters (the 'second' model versions) based on new observation values.
Does not cover non-recurrent neural networks that lack the ability to maintain state information over time.
Does not cover systems that do not use a communications interface to receive external reference result values.

These exclusions are unique to PatentBrief — derived from the actual claim language, not patent-office boilerplate.

Key facts

Patent number	US 11170293
Status	Active
Field	AI & Machine Learning
Assignee	Microsoft Technology Licensing LLC
Inventors	Ji He, Prabhdeep Singh, Lihong Li and 5 others
Filed	2015
Granted	2021
Claims	15
Times cited	4
Litigation	None on record

Value · $48K–$154K · Minimal

What made this novel

The innovation lies in using the same reference result to update both the predictive model (RNN) and the decision-making model (QN) in a coordinated loop, ensuring the 'state' information used for decision-making is constantly being refined by the predictive model's accuracy.

Where you've seen this

Real-world examples

Autonomous robotics navigation

Real-time industrial process control

Automated trading systems

Adaptive game AI agents

Why it matters

The bigger picture

This patent relates to the 'credit assignment' problem in AI: working out which earlier predictions and actions deserve credit for a later outcome when a system learns from both immediate feedback and long-term goals. By linking prediction (RNN) and action (QN) training, it helps bridge the gap between simple pattern recognition and complex decision-making. The approach is aimed at autonomous agents that must navigate dynamic environments where rewards are delayed or sparse.

Filed

December 30, 2015

Granted

November 9, 2021

Market context

Who's building on this

Companies in this space

Microsoft continues to integrate these types of coordinated learning architectures into their Azure AI and robotics research divisions. Other major cloud and AI labs, such as Google DeepMind and OpenAI, utilize similar architectures for deep reinforcement learning agents that require both world-modeling and policy optimization.

Market impact

This patent claims a specific architecture for hybrid learning that pairs supervised and reinforcement learning updates, a combination widely used in modern reinforcement learning. Its claims bear on how companies structure the 'training loop' in autonomous systems and may influence how developers design feedback mechanisms for AI agents in industrial and digital environments.

Patent timeline

Filing · Dec 30, 2015

Application submitted to the patent office

Publication · Nov 9, 2021

Application published, typically 18 months after filing

Grant · Nov 9, 2021

Patent officially issued

PatentBrief Score

Impact Score

Strong

Citation count	14/40	Early citations
Claim breadth	10/20	Broad claims
Recency	20/20	Granted within 5 years
Assignee scale	20/20	Major company or institution

PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.
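The page does not say how the four components combine into the overall score. As a rough check, and assuming they are simply added together (an assumption, not something PatentBrief states), the arithmetic looks like this:

```python
# Component scores as (points earned, points available), copied from the
# breakdown on this page. Summing them is an assumption about how the
# overall score is formed.
components = {
    "Citation count": (14, 40),
    "Claim breadth": (10, 20),
    "Recency": (20, 20),
    "Assignee scale": (20, 20),
}

earned = sum(points for points, _ in components.values())
available = sum(maximum for _, maximum in components.values())
print(f"{earned}/{available}")  # prints 64/100
```

Under that assumption the components total 64 out of 100, with recency and assignee scale contributing most of the points.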

Heuristic Value Estimate

What this patent might be worth

Minimal

$48K – $154K

Midpoint $96K · 9.4 yr remaining · industry ×1.6

Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.

Claim text not yet imported for this patent

The original legal language

Original claims

15 claims as filed with the patent office.

Concepts involved

Claim, Prior art, Non-obviousness, Novelty, Specification, Assignee, Patent term

Citations

Patent lineage

Cites earlier patents

earlier patents this invention cites as foundations

Cited by later patents

later patents that build on this invention

Cite this patent

He, J., Singh, P., Li, L., He, X., Gao, J., Chen, J., Deng, L., & Li, X. (2021). Multi-model controller (U.S. Patent No. 11,170,293). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/11170293/alphago-policy-and-value-networks

Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.

Embed

Add this patent to your site

Drop this plain-English patent card into any blog post or article — free, no signup. It always links back to the full breakdown here.

<div data-patentlens-widget data-patent-number="US11170293"></div>
<script src="https://patentbrief.org/embed.js" async></script>

Semantically similar

You might also find these interesting

US 10282665 · 2019 · Sony Corp

How AI Agents Learn to Pick the Best Future Actions

US 11836577 · 2023 · Amazon Technologies

Training Robot AI Models Faster Using Smart Simulations

US 11651227 · 2023 · SRI International Inc

How to Force AI to Follow Logical Rules During Training

US 10607134 · 2020

How AI Learns to Control Game Characters Based on Their Surroundings

More to explore

Frequently Asked Questions

What does “How AI Systems Learn to Predict and Act Simultaneously” cover?

A method for training AI models that combines supervised learning for prediction with reinforcement learning for decision-making in a single, coordinated system.

Who owns patent US 11170293?

Microsoft Technology Licensing LLC owns this patent, granted in 2021.

When does this patent expire?

This patent is expected to expire in 2035, about 20 years after its December 30, 2015 filing date, when the invention enters the public domain.

What is patent US 11170293 cited by?

This patent has been cited by 4 later patents that build on its ideas.

What does this patent NOT cover?

Does not cover systems that rely solely on supervised learning without a reinforcement learning feedback loop.

Same assignee

More from Microsoft Technology Licensing LLC

US 12217035 · 2025

How to Safely Shut Down Microservices Without Breaking Apps

US 11062228 · 2021

How AI Learns New Tasks Using Old Data Labels

US 10543427 · 2020

How Game Controllers Change Button Functions Using Plug-in Accessories

US 10402375 · 2019

How Operating Systems Display Cloud File Status Icons

Last reviewed: June 15, 2026 · PatentBrief is not a law firm and this is not legal advice.