How AI Systems Learn to Predict and Act Simultaneously
A method for training AI models that combines supervised learning for prediction with reinforcement learning for decision-making in a single, coordinated system.
Original patent title: “Multi-model controller”
Granted to Microsoft Technology Licensing LLC in 2021 with 15 claims and 4 forward citations.
Key facts
Coverage
What does this patent actually cover?
This patent describes a system that uses two types of neural networks working in tandem to improve how an AI makes decisions. First, a Recurrent Neural Network (RNN) processes data to create a 'state' (a summary of what is happening) and predicts a result. Second, a Q-Network (QN) looks at that state to suggest the best action to take. The system then uses a 'reference result' (the actual outcome) to update the RNN using supervised learning, while simultaneously updating the Q-Network using reinforcement learning based on the action taken and the new state. This allows the system to get better at both understanding its environment and choosing the right actions over time.
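The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `TinyRNN`, `TinyQ`, and `train`, the one-weight recurrent cell, the linear Q-function, the one-step (truncated) gradient, the toy environment, and the reward rule are all hypothetical and were chosen only to keep the example self-contained.

```python
import math
import random


class TinyRNN:
    """Toy recurrent predictor (hypothetical): one scalar state, three weights."""

    def __init__(self, rng):
        self.w_in = rng.uniform(-0.5, 0.5)
        self.w_rec = rng.uniform(-0.5, 0.5)
        self.w_out = rng.uniform(-0.5, 0.5)

    def step(self, x, prev_state):
        """Return (new state, predicted result) for observation x."""
        state = math.tanh(self.w_in * x + self.w_rec * prev_state)
        return state, self.w_out * state

    def supervised_update(self, x, prev_state, reference, lr=0.05):
        """One gradient step on 0.5 * (prediction - reference)^2.

        Assumption: the gradient is truncated to a single time step,
        so prev_state is treated as a constant.
        """
        state, prediction = self.step(x, prev_state)
        error = prediction - reference
        d_pre = error * self.w_out * (1.0 - state * state)  # back through tanh
        self.w_out -= lr * error * state
        self.w_in -= lr * d_pre * x
        self.w_rec -= lr * d_pre * prev_state
        return 0.5 * error * error


class TinyQ:
    """Toy linear Q-function over the RNN state (hypothetical)."""

    def __init__(self, n_actions):
        self.w = [0.0] * n_actions
        self.b = [0.0] * n_actions

    def q_values(self, state):
        return [w * state + b for w, b in zip(self.w, self.b)]

    def act(self, state, rng, epsilon=0.1):
        """Epsilon-greedy action selection."""
        if rng.random() < epsilon:
            return rng.randrange(len(self.w))
        q = self.q_values(state)
        return q.index(max(q))

    def td_update(self, state, action, reward, next_state, gamma=0.9, lr=0.05):
        """One-step Q-learning update using the action taken and the new state."""
        target = reward + gamma * max(self.q_values(next_state))
        td_error = target - self.q_values(state)[action]
        self.w[action] += lr * td_error * state
        self.b[action] += lr * td_error
        return td_error


def train(steps=200, seed=0):
    """Run the coupled loop on a made-up environment with observations in {-1, +1}."""
    rng = random.Random(seed)
    rnn, qnet = TinyRNN(rng), TinyQ(n_actions=2)
    x, prev_state, losses = rng.choice([-1.0, 1.0]), 0.0, []
    state, _ = rnn.step(x, prev_state)
    for _ in range(steps):
        action = qnet.act(state, rng)            # QN suggests an action from the state
        reference = x                            # hypothetical reference result
        reward = 1.0 if action == (1 if x > 0 else 0) else 0.0  # hypothetical reward
        losses.append(rnn.supervised_update(x, prev_state, reference))
        next_x = rng.choice([-1.0, 1.0])
        next_state, _ = rnn.step(next_x, state)  # new state from the updated RNN
        qnet.td_update(state, action, reward, next_state)
        prev_state, state, x = state, next_state, next_x
    return rnn, qnet, losses
```

Calling `train(steps=200, seed=0)` returns the two toy models and the per-step supervised losses. Whether the loss falls on a given run depends on the seed and learning rates, so treat the numbers as illustrative only.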
The gap
What does this patent NOT cover?
- Does not cover systems that rely solely on supervised learning without a reinforcement learning feedback loop.
- Does not cover static models that do not update their parameters (the 'second' model versions) based on new observation values.
- Does not cover non-recurrent neural networks that lack the ability to maintain state information over time.
- Does not cover systems that do not use a communications interface to receive external reference result values.
What made this novel
The innovation lies in using the same reference result to update both the predictive model (RNN) and the decision-making model (QN) in a coordinated loop, ensuring the 'state' information used for decision-making is constantly being refined by the predictive model's accuracy.
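One way to make the two coupled updates concrete is to write them as losses. The forms below are the standard squared-error and one-step Q-learning objectives, used here as an assumption. The source does not state the patent's exact loss functions, and the symbols ($h_t$ for the RNN state, $\hat{y}_t$ for the predicted result, $y_t$ for the reference result, $r_t$ for a reward signal, $\gamma$ for a discount factor) are introduced only for illustration.

```latex
\begin{aligned}
(h_t, \hat{y}_t) &= \mathrm{RNN}_\theta(x_t, h_{t-1}) \\
\mathcal{L}_{\text{sup}}(\theta) &= \tfrac{1}{2}\,(\hat{y}_t - y_t)^2 \\
\mathcal{L}_{\text{RL}}(\phi) &= \tfrac{1}{2}\,\bigl(r_t + \gamma \max_{a'} Q_\phi(h_{t+1}, a') - Q_\phi(h_t, a_t)\bigr)^2
\end{aligned}
```

Because $Q_\phi$ takes the RNN state $h_t$ as its input, any improvement in the predictor changes what the Q-network sees, which is the coupling the paragraph above describes.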
Where you've seen this
Real-world examples
Autonomous robotics navigation
Real-time industrial process control
Automated trading systems
Adaptive game AI agents
Why it matters
The bigger picture
This patent is significant because it bears on the 'credit assignment' problem in AI: working out which earlier predictions and actions were responsible for an outcome that may only arrive later. By linking prediction (RNN) and action (QN) training, it helps bridge the gap between simple pattern recognition and complex decision-making. That makes it relevant to autonomous agents that must navigate dynamic environments where rewards are delayed or sparse.
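A short example can make the delayed-reward issue concrete. The sketch below computes discounted returns, a standard reinforcement learning quantity, for a made-up episode in which the only reward arrives at the last step. The function name `discounted_returns`, the episode, and the discount factor of 0.9 are illustrative assumptions, not details from the patent.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical episode: three steps with no feedback, then a reward of 1.
episode = [0.0, 0.0, 0.0, 1.0]
print(discounted_returns(episode))
```

The earliest step receives a return of roughly 0.729, the last step exactly 1.0. Earlier actions get some credit for the late reward, discounted by how far away it was.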
Filed
December 30, 2015
Granted
November 9, 2021
Market context
Who's building on this
Companies in this space
Microsoft continues to integrate these types of coordinated learning architectures into its Azure AI and robotics research divisions. Other major cloud and AI labs, such as Google DeepMind and OpenAI, use similar architectures for deep reinforcement learning agents that require both world-modeling and policy optimization.
Market impact
This patent formalizes a specific architecture for hybrid learning, an approach that has become common in modern reinforcement learning. As a granted patent, it gives the assignee exclusive rights over the claimed 'training loop', which can influence how developers design feedback mechanisms for AI agents in industrial and digital environments.
Patent timeline
Application submitted to the patent office
Application published, typically 18 months after filing
Patent officially issued
PatentBrief Score
Impact Score: Strong
Citation count: 14/40 (early citations)
Claim breadth: 10/20 (broad claims)
Recency: 20/20 (granted within 5 years)
Assignee scale: 20/20 (major company or institution)
PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.
Heuristic Value Estimate
What this patent might be worth
$48K – $154K
Midpoint $96K · 9.5 yr remaining · industry ×1.6
Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.
The original legal language
Original claims
15 claims as filed with the patent office.
Cite this patent
He, J., Singh, P., Li, L., He, X., Gao, J., Chen, J., Deng, L., & Li, X. (2021). How AI Systems Learn to Predict and Act Simultaneously (U.S. Patent No. 11,170,293). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/11170293/alphago-policy-and-value-networks
Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.
Frequently Asked Questions
Who owns patent US 11170293?
Microsoft Technology Licensing LLC owns this patent, granted in 2021.
When does this patent expire?
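US utility patents generally expire 20 years after their earliest filing date, not their grant date. With a filing date of December 30, 2015, this patent would be expected to expire around December 30, 2035, plus any patent term adjustment granted by the USPTO, at which point the invention enters the public domain.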
What is patent US 11170293 cited by?
This patent has been cited by 4 later patents that build on its ideas.