PatentBrief

How AI Systems Learn to Predict and Act Simultaneously

A method for training AI models that combines supervised learning for prediction with reinforcement learning for decision-making in a single, coordinated system.

Granted 2021 · Active · Expires 2035 · Owned by Microsoft Technology Licensing LLC · Invented by Ji He, Prabhdeep Singh, Lihong Li + 5 more

Original patent title: “Multi-model controller”

Plain-English explanation by Sahi · Last reviewed June 15, 2026

Granted to Microsoft Technology Licensing LLC in 2021 with 15 claims and 4 forward citations.

Key facts

Patent number: US 11170293
Status: Active
Field: AI & Machine Learning
Assignee: Microsoft Technology Licensing LLC
Inventors: Ji He, Prabhdeep Singh, Lihong Li and 5 others
Filed: 2015
Granted: 2021
Claims: 15
Times cited: 4
Litigation: None on record
Value: $48K to $154K (Minimal)

Coverage

What does this patent actually cover?

This patent describes a system that uses two types of neural networks working in tandem to improve how an AI makes decisions. First, a Recurrent Neural Network (RNN) processes data to create a 'state' (a summary of what is happening) and predicts a result. Second, a Q-Network (QN) looks at that state to suggest the best action to take. The system then uses a 'reference result' (the actual outcome) to update the RNN using supervised learning, while simultaneously updating the Q-Network using reinforcement learning based on the action taken and the new state. This allows the system to get better at both understanding its environment and choosing the right actions over time.
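
The data flow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the patented implementation: the function names (`rnn_step`, `q_values`, `select_action`), the scalar state, and every weight are hypothetical and chosen only to keep the example small.

```python
import math

def rnn_step(state, observation, w_state=0.5, w_obs=0.5, w_out=0.8):
    """Toy recurrent cell: fold the new observation into the running state.

    Returns the new state (the 'summary of what is happening') and a
    predicted result read out from that state. All weights are made up.
    """
    new_state = math.tanh(w_state * state + w_obs * observation)
    predicted_result = w_out * new_state
    return new_state, predicted_result

def q_values(state, q_weights):
    """Toy linear Q-network: one score per candidate action."""
    return [w * state + b for (w, b) in q_weights]

def select_action(state, q_weights):
    """Pick the index of the action with the highest Q-value."""
    scores = q_values(state, q_weights)
    return max(range(len(scores)), key=scores.__getitem__)

# Two hypothetical actions, each with a (weight, bias) pair.
q_weights = [(1.0, 0.0), (-1.0, 0.1)]

state = 0.0
for observation in [0.2, 0.9, -0.4]:
    state, prediction = rnn_step(state, observation)
    action = select_action(state, q_weights)
    print(f"state={state:.3f} prediction={prediction:.3f} action={action}")
```

The point of the sketch is the ordering: the recurrent step produces the state first, and the action choice reads only that state, so any improvement in the state carries over to the decision.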

The gap

What does this patent NOT cover?

  • Does not cover systems that rely solely on supervised learning without a reinforcement learning feedback loop.
  • Does not cover static models that do not update their parameters (the 'second' model versions) based on new observation values.
  • Does not cover non-recurrent neural networks that lack the ability to maintain state information over time.
  • Does not cover systems that do not use a communications interface to receive external reference result values.

These exclusions are unique to PatentBrief — derived from the actual claim language, not patent-office boilerplate.

What made this novel

The innovation lies in using the same reference result to update both the predictive model (RNN) and the decision-making model (QN) in a coordinated loop, ensuring the 'state' information used for decision-making is constantly being refined by the predictive model's accuracy.
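
A minimal sketch of such a coordinated update, assuming a linear read-out for the predictor and a tabular Q-function standing in for the Q-network. Treating the reference result directly as the reward is an assumption of this example, not something the summary above specifies, and all names (`supervised_update`, `q_learning_update`, `coordinated_update`) are hypothetical.

```python
def supervised_update(w_out, state, reference_result, lr=0.1):
    """One gradient step on squared error for prediction = w_out * state."""
    prediction = w_out * state
    error = prediction - reference_result
    # Gradient of 0.5 * error**2 with respect to w_out is error * state.
    return w_out - lr * error * state

def q_learning_update(q_table, s, action, reward, s_next, lr=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a table (modified in place)."""
    td_target = reward + gamma * max(q_table[s_next])
    q_table[s][action] += lr * (td_target - q_table[s][action])

def coordinated_update(w_out, q_table, state, s, action, s_next, reference_result):
    """Use one reference result to drive both updates in the same step."""
    new_w_out = supervised_update(w_out, state, reference_result)
    reward = reference_result  # ASSUMPTION: the observed outcome doubles as the reward
    q_learning_update(q_table, s, action, reward, s_next)
    return new_w_out

# Hypothetical discrete state labels "s0" and "s1", two actions each.
q_table = {"s0": [0.0, 0.0], "s1": [0.0, 0.0]}
w_out = coordinated_update(
    0.5, q_table, state=1.0, s="s0", action=1, s_next="s1", reference_result=1.0
)
print(round(w_out, 3), q_table["s0"])
```

A real system would replace the table with a neural Q-network and backpropagate the supervised error through the recurrent weights as well; the sketch only shows that a single observed outcome can feed both learning signals.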

Multi-model controller (Primary claim) · AI/ML · software · mechanical · telecommunications

Schematic visualization of the patent's claim structure. Hand-drawn diagrams in progress for each landmark patent.

Where you've seen this

Real-world examples

  01. Autonomous robotics navigation
  02. Real-time industrial process control
  03. Automated trading systems
  04. Adaptive game AI agents

Why it matters

The bigger picture

This patent touches on the 'credit assignment' problem in AI: working out which earlier actions deserve credit when a reward arrives late or only occasionally. By linking prediction (RNN) and action (QN) training, the system can keep refining its picture of the environment from prediction errors even when rewards are sparse, which helps bridge the gap between simple pattern recognition and complex decision-making. That makes the approach relevant to autonomous agents that must navigate dynamic environments where rewards are delayed or sparse.

Filed: December 30, 2015
Granted: November 9, 2021

Market context

Who's building on this

Companies in this space

Microsoft continues to integrate these types of coordinated learning architectures into their Azure AI and robotics research divisions. Other major cloud and AI labs, such as Google DeepMind and OpenAI, utilize similar architectures for deep reinforcement learning agents that require both world-modeling and policy optimization.

Market impact

This patent claims a specific architecture for hybrid learning, in which a supervised predictor and a reinforcement-learning decision-maker are trained in one loop. While it remains active, it gives its owner exclusive rights over the claimed training loop, so it may influence how developers design feedback mechanisms for AI agents in industrial and digital environments.

Patent timeline

Filing: Application submitted to the patent office
Publication: Application published, typically 18 months after filing
Grant: Patent officially issued

PatentBrief Score

Impact Score: Strong

  • Citation count: 14/40 (Early citations)
  • Claim breadth: 10/20 (Broad claims)
  • Recency: 20/20 (Granted within 5 years)
  • Assignee scale: 20/20 (Major company or institution)

PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.
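
As a quick check on the component scores, the snippet below adds them up. It assumes the overall score is a plain sum of the four components; the page does not state how the components are combined or where the "Strong" threshold sits, so treat this as illustrative arithmetic only.

```python
# Component scores as (points earned, points available), taken from the list above.
components = {
    "Citation count": (14, 40),
    "Claim breadth": (10, 20),
    "Recency": (20, 20),
    "Assignee scale": (20, 20),
}

# ASSUMPTION: the overall score is the simple sum of the components.
total = sum(earned for earned, _ in components.values())
maximum = sum(available for _, available in components.values())

print(f"{total}/{maximum}")  # prints 64/100
```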

Heuristic Value Estimate

What this patent might be worth

Minimal: $48K to $154K

Midpoint $96K · 9.5 yr remaining · industry ×1.6

Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.
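
The "9.5 yr remaining" input can be reproduced from the dates on this page. The sketch below assumes the standard 20-year term measured from the filing date with no term adjustment or extension, which is a simplification; the helper names are hypothetical.

```python
from datetime import date

FILED = date(2015, 12, 30)      # filing date from this page
REVIEWED = date(2026, 6, 15)    # "last reviewed" date from this page
TERM_YEARS = 20                 # ASSUMPTION: 20 years from filing, no adjustments

def expected_expiry(filed, term_years=TERM_YEARS):
    """Expiry date under a flat term counted from the filing date."""
    return filed.replace(year=filed.year + term_years)

def years_remaining(as_of, expiry):
    """Remaining term in years, using an average year length of 365.25 days."""
    return (expiry - as_of).days / 365.25

expiry = expected_expiry(FILED)
remaining = years_remaining(REVIEWED, expiry)
print(expiry.isoformat(), round(remaining, 1))
```

Under these assumptions the expiry falls at the end of 2035 and the remaining term rounds to 9.5 years, matching the figure shown above.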

The original legal language

Original claims

15 claims as filed with the patent office.

Concepts involved

Claim · Prior art · Non-obviousness · Novelty · Specification · Assignee · Patent term

Citations

Patent lineage

Cites earlier patents: 16 (earlier patents this invention cites as foundations)
Cited by later patents: 4 (later patents that build on this invention)

Cite this patent

He, J., Singh, P., Li, L., He, X., Gao, J., Chen, J., Deng, L., & Li, X. (2021). Multi-model controller (U.S. Patent No. 11,170,293). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/11170293/alphago-policy-and-value-networks

Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.

Common Questions

Frequently Asked Questions

What does How AI Systems Learn to Predict and Act Simultaneously cover?

A method for training AI models that combines supervised learning for prediction with reinforcement learning for decision-making in a single, coordinated system.

Who owns patent US 11170293?

Microsoft Technology Licensing LLC owns this patent, granted in 2021.

When does this patent expire?

This patent is expected to expire in 2035, 20 years after its December 30, 2015 filing date, when the invention enters the public domain.

What is patent US 11170293 cited by?

This patent has been cited by 4 later patents that build on its ideas.

What problem does this patent solve?

It targets the 'credit assignment' problem: working out which earlier actions deserve credit when rewards are delayed or sparse. Linking prediction (RNN) training with action (QN) training lets an agent keep improving its picture of the environment while it learns which actions to take.

What does this patent NOT cover?

Does not cover systems that rely solely on supervised learning without a reinforcement learning feedback loop.

Last reviewed: June 15, 2026 · PatentBrief is not a law firm and this is not legal advice.