# How AI Systems Learn to Predict and Act Simultaneously

> A method for training AI models that combines supervised learning for prediction with reinforcement learning for decision-making in a single, coordinated system.

- **Patent:** US 11170293
- **Original title:** Multi-model controller
- **Owner:** Microsoft Technology Licensing LLC
- **Granted:** 2021
- **Status:** Active
- **Times cited:** 4
- **Field:** AI/ML, software, mechanical, telecommunications

## What it does

This patent describes a system that uses two types of neural networks working in tandem to improve how an AI makes decisions. First, a Recurrent Neural Network (RNN) processes data to create a 'state' (a summary of what is happening) and predicts a result. Second, a Q-Network (QN) looks at that state to suggest the best action to take. The system then uses a 'reference result' (the actual outcome) to update the RNN using supervised learning, while simultaneously updating the Q-Network using reinforcement learning based on the action taken and the new state. This allows the system to get better at both understanding its environment and choosing the right actions over time.
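The loop described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the class names `TinyRNN` and `LinearQ`, the `train` function, the sine-wave toy environment, the reward rule (+1 when the action matches the direction of the reference result), and the use of a standard Q-learning update are all hypothetical choices made for the example.

```python
import math
import random


class TinyRNN:
    """Hypothetical one-unit recurrent predictor (stand-in for the RNN)."""

    def __init__(self, rng):
        self.w_x = rng.uniform(-0.5, 0.5)  # input weight
        self.w_h = rng.uniform(-0.5, 0.5)  # recurrent weight
        self.w_o = rng.uniform(-0.5, 0.5)  # output (prediction) weight

    def step(self, x, h_prev):
        """Return (new state, predicted result)."""
        h = math.tanh(self.w_x * x + self.w_h * h_prev)
        return h, self.w_o * h

    def supervised_update(self, x, h_prev, reference, lr=0.05):
        """One gradient step on 0.5 * (prediction - reference)^2.

        Truncated to a single time step: h_prev is treated as a constant.
        Returns the loss measured before the update.
        """
        h, y_hat = self.step(x, h_prev)
        err = y_hat - reference
        grad_pre = err * self.w_o * (1.0 - h * h)  # gradient at the tanh input
        self.w_o -= lr * err * h
        self.w_x -= lr * grad_pre * x
        self.w_h -= lr * grad_pre * h_prev
        return 0.5 * err * err


class LinearQ:
    """Hypothetical Q-network: Q(s, a) = w[a] * s + b[a]."""

    def __init__(self, n_actions):
        self.w = [0.0] * n_actions
        self.b = [0.0] * n_actions

    def values(self, s):
        return [w * s + b for w, b in zip(self.w, self.b)]

    def act(self, s, rng, epsilon=0.1):
        """Epsilon-greedy action selection from the current state."""
        if rng.random() < epsilon:
            return rng.randrange(len(self.w))
        q = self.values(s)
        return q.index(max(q))

    def td_update(self, s, a, reward, s_next, gamma=0.9, lr=0.05):
        """One Q-learning step toward reward + gamma * max_a' Q(s_next, a')."""
        target = reward + gamma * max(self.values(s_next))
        td_err = target - self.values(s)[a]
        self.w[a] += lr * td_err * s
        self.b[a] += lr * td_err
        return td_err


def train(steps=2000, seed=0):
    """Run the coordinated loop on a toy sine-wave signal (an assumption)."""
    rng = random.Random(seed)
    rnn, qn = TinyRNN(rng), LinearQ(n_actions=2)
    h_prev, losses = 0.0, []
    for t in range(steps):
        x = math.sin(0.3 * t)                # observation
        reference = math.sin(0.3 * (t + 1))  # reference result (actual outcome)
        state, _prediction = rnn.step(x, h_prev)
        action = qn.act(state, rng)          # 1 = "signal rises", 0 = "falls"
        reward = 1.0 if (action == 1) == (reference > x) else -1.0
        # Supervised update of the predictive model from the reference result.
        losses.append(rnn.supervised_update(x, h_prev, reference))
        # New state from the updated RNN, then the reinforcement update.
        next_state, _ = rnn.step(reference, state)
        qn.td_update(state, action, reward, next_state)
        h_prev = state
    return rnn, qn, losses
```

Calling `train()` returns the updated predictor, the updated Q-network, and the per-step prediction losses. The point of the sketch is the ordering inside the loop: the reference result first corrects the predictor, and the Q-network then learns from a next state produced by the corrected predictor.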

## What it does NOT cover

- Does not cover systems that rely solely on supervised learning without a reinforcement learning feedback loop.
- Does not cover static models that never update their parameters (that is, never produce the 'second' version of each model) based on new observation values.
- Does not cover non-recurrent neural networks that lack the ability to maintain state information over time.
- Does not cover systems that do not use a communications interface to receive external reference result values.

## The clever bit

The innovation lies in using the same reference result to update both the predictive model (RNN) and the decision-making model (QN) in a coordinated loop, ensuring the 'state' information used for decision-making is constantly being refined by the predictive model's accuracy.
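One way to write this coordinated loop down, offered as a sketch and not as the patent's own notation: every symbol here is an assumption introduced for illustration. Let $x_t$ be the observation, $h_t$ the RNN state, $\hat{y}_t$ the predicted result, $y_t$ the reference result, $a_t$ the chosen action, and $r_t$ a reward derived from $y_t$.

$$
\begin{aligned}
(h_t, \hat{y}_t) &= f_\theta(x_t, h_{t-1}) \\
a_t &= \arg\max_{a} Q_\phi(h_t, a) \\
\mathcal{L}_{\text{sup}}(\theta) &= \tfrac{1}{2}\,(\hat{y}_t - y_t)^2 \\
\mathcal{L}_{\text{RL}}(\phi) &= \tfrac{1}{2}\,\bigl(r_t + \gamma \max_{a'} Q_\phi(h_{t+1}, a') - Q_\phi(h_t, a_t)\bigr)^2
\end{aligned}
$$

Minimizing $\mathcal{L}_{\text{sup}}$ refines $\theta$, which changes the states $h_t$ that the Q-network sees; minimizing $\mathcal{L}_{\text{RL}}$ refines $\phi$ given those states. The standard Q-learning target is used here as a stand-in, since the summary above does not specify the exact reinforcement learning rule.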

## Real-world examples

1. Autonomous robotics navigation
2. Real-time industrial process control
3. Automated trading systems
4. Adaptive game AI agents

## Why it matters

This patent touches on the 'credit assignment' problem in AI: working out which earlier predictions and actions deserve credit when feedback arrives late or only occasionally. By linking prediction (RNN) and action (QN) training, it helps bridge the gap between simple pattern recognition and complex decision-making. That makes the approach relevant to autonomous agents that must navigate dynamic environments where rewards are delayed or sparse.

## Frequently asked questions

### What does "How AI Systems Learn to Predict and Act Simultaneously" cover?

A method for training AI models that combines supervised learning for prediction with reinforcement learning for decision-making in a single, coordinated system.

### Who owns patent US 11170293?

Microsoft Technology Licensing LLC owns this patent, granted in 2021.

### When does this patent expire?

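PatentBrief lists an expected expiration date of November 9, 2041, after which the invention would enter the public domain. The actual date depends on the filing date, any patent term adjustment, and payment of maintenance fees.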

### What is patent US 11170293 cited by?

Four later patents cite this patent.

### What problem does this patent solve?

The patent touches on the 'credit assignment' problem in AI: working out which earlier predictions and actions deserve credit when feedback arrives late or only occasionally. By linking prediction (RNN) and action (QN) training, it helps bridge the gap between simple pattern recognition and complex decision-making. That makes the approach relevant to autonomous agents that must navigate dynamic environments where rewards are delayed or sparse.

### What does this patent NOT cover?

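It does not cover systems that rely solely on supervised learning without a reinforcement learning feedback loop, static models that never update their parameters, non-recurrent neural networks that cannot maintain state information over time, or systems that do not use a communications interface to receive external reference result values.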

**Full plain-English explainer:** https://patentbrief.org/patent/us/11170293/alphago-policy-and-value-networks

**Original patent:** https://patents.google.com/patent/US11170293

---

_Source: PatentBrief — https://patentbrief.org. Patent facts are from public records; the plain-English explanation is PatentBrief's._


## Related patents

Semantically similar inventions in the PatentBrief corpus:

- [How AI Agents Learn to Pick the Best Future Actions](https://patentbrief.org/patent/us/10282665/action-selection-with-a-reward-estimator-applied-to-machine-learning) — A method for an AI agent to predict which actions will yield the highest rewards by analyzing past experiences and refining its decision-making model.
- [Training Robot AI Models Faster Using Smart Simulations](https://patentbrief.org/patent/us/11836577/reinforcement-learning-model-training-through-simulation) — This patent describes a cloud service that helps train artificial intelligence models for robots by running simulations, even suggesting improvements to the AI's learning rules before starting.
- [How to Force AI to Follow Logical Rules During Training](https://patentbrief.org/patent/us/11651227/muzero) — A system that uses a dual-headed neural network to ensure AI models obey specific logical rules by embedding those rules directly into the training process.
- [How AI Learns to Control Game Characters Based on Their Surroundings](https://patentbrief.org/patent/us/10607134/artificially-intelligent-systems-devices-and-methods-for-learning-andor-using-an-avatars-circumstances-for-autonomous-avatar-operation) — A system that allows digital characters to automatically perform actions by matching their current environment to previously learned experiences stored in a database.
- [Training AI Models Across Different Computers](https://patentbrief.org/patent/us/12574477/distributed-deep-learning-using-a-distributed-deep-neural-network) — This 2026 patent describes a way to train AI models on one computer, send a version to another computer for further training with private data, and then update the original model with the improvements.