PatentBrief · Patent Brief · US 11170293

How AI Systems Learn to Predict and Act Simultaneously

A method for training AI models that combines supervised learning for prediction with reinforcement learning for decision-making in a single, coordinated system.

Patent Number

US 11170293

Status

Active

Filing Date

December 30, 2015

Grant Date

November 9, 2021

Expiration

~December 2035 (estimated)

Claims

Assignee

Microsoft Technology Licensing LLC

Inventors

Ji He, Prabhdeep Singh, Lihong Li, Xiaodong He, Jianfeng Gao, Jianshu Chen, Li Deng, Xiujun Li

Citations

4 forward · 16 backward

What it covers

This patent describes a system that uses two types of neural networks working in tandem to improve how an AI makes decisions. First, a Recurrent Neural Network (RNN) processes data to create a 'state' (a summary of what is happening) and predicts a result. Second, a Q-Network (QN) looks at that state to suggest the best action to take. The system then uses a 'reference result' (the actual outcome) to update the RNN using supervised learning, while simultaneously updating the Q-Network using reinforcement learning based on the action taken and the new state. This allows the system to get better at both understanding its environment and choosing the right actions over time.
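The training loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation. The toy sine-wave observations, the rule that turns the reference result into a reward for the chosen action, the hyperparameters, and all names (`TinyRNN`, `LinearQNetwork`, `train`) are hypothetical. For brevity, the RNN's recurrent weights stay fixed and only its output layer is trained (an echo-state-style simplification rather than full backpropagation through time). The Q-Network is a linear function of the RNN state and is updated with a standard Q-learning target.

```python
import math
import random


class TinyRNN:
    """Hypothetical Elman-style RNN: h_t = tanh(W_h h_{t-1} + W_x x_t), prediction = v . h_t.

    Simplification (assumption): W_h and W_x stay fixed; only the readout v is trained.
    """

    def __init__(self, n_in, n_hidden, rng):
        self.W_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.W_x = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.v = [0.0] * n_hidden
        self.h = [0.0] * n_hidden

    def step(self, x):
        """Consume one observation vector and return (state, prediction)."""
        self.h = [
            math.tanh(sum(w * h for w, h in zip(self.W_h[i], self.h))
                      + sum(w * xi for w, xi in zip(self.W_x[i], x)))
            for i in range(len(self.h))
        ]
        prediction = sum(vi * hi for vi, hi in zip(self.v, self.h))
        return list(self.h), prediction

    def supervised_update(self, state, prediction, reference, lr):
        """One gradient step on 0.5 * (prediction - reference)^2 with respect to the readout v."""
        err = prediction - reference
        self.v = [vi - lr * err * si for vi, si in zip(self.v, state)]
        return err


class LinearQNetwork:
    """Hypothetical linear Q-Network: Q(s, a) = w_a . s, where s is the RNN state."""

    def __init__(self, n_state, n_actions):
        self.w = [[0.0] * n_state for _ in range(n_actions)]

    def q_values(self, state):
        return [sum(wi * si for wi, si in zip(row, state)) for row in self.w]

    def act(self, state, epsilon, rng):
        """Epsilon-greedy action selection over the RNN state."""
        if rng.random() < epsilon:
            return rng.randrange(len(self.w))
        q = self.q_values(state)
        return max(range(len(q)), key=q.__getitem__)

    def td_update(self, state, action, reward, next_state, gamma, lr):
        """Semi-gradient Q-learning step toward reward + gamma * max_a' Q(next_state, a')."""
        target = reward + gamma * max(self.q_values(next_state))
        td_error = target - self.q_values(state)[action]
        self.w[action] = [wi + lr * td_error * si for wi, si in zip(self.w[action], state)]
        return td_error


def train(steps=2000, seed=0):
    rng = random.Random(seed)
    rnn, qn = TinyRNN(1, 8, rng), LinearQNetwork(8, 2)
    state, prediction = rnn.step([math.sin(0.0)])  # state for the first observation
    sq_errors, rewards = [], []
    for t in range(steps):
        x_t = math.sin(0.3 * t)
        reference = math.sin(0.3 * (t + 1))  # assumed "reference result": the next value
        action = qn.act(state, epsilon=0.1, rng=rng)  # 0 = "down", 1 = "up" (hypothetical)
        # Assumed reward rule: +1 if the action matches the direction of the reference result.
        reward = 1.0 if (action == 1) == (reference > x_t) else -1.0
        err = rnn.supervised_update(state, prediction, reference, lr=0.05)  # supervised step
        next_state, prediction = rnn.step([reference])  # new observation -> new state
        qn.td_update(state, action, reward, next_state, gamma=0.5, lr=0.01)  # RL step
        state = next_state
        sq_errors.append(err * err)
        rewards.append(reward)
    return sq_errors, rewards


if __name__ == "__main__":
    errs, rews = train()
    print("mean squared error, first 100 steps:", sum(errs[:100]) / 100)
    print("mean squared error, last 100 steps:", sum(errs[-100:]) / 100)
```

`train()` returns the per-step squared prediction errors and rewards. In this simplified sketch the state features come from fixed recurrent weights, so only the RNN's prediction improves. In a fully trained RNN, the supervised updates would also reshape the state that the Q-Network consumes.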

What it doesn't cover

—Does not cover systems that rely solely on supervised learning without a reinforcement learning feedback loop.
—Does not cover static models that do not update their parameters based on new observation values to produce updated ('second') versions of the models.
—Does not cover non-recurrent neural networks that lack the ability to maintain state information over time.
—Does not cover systems that do not use a communications interface to receive external reference result values.

The clever bit

The innovation lies in using the same reference result to drive updates to both the predictive model (RNN) and the decision-making model (QN) in a coordinated loop. As a result, the 'state' information used for decision-making is continually refined as the predictive model becomes more accurate.
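One way to make the coordinated updates concrete is to write the two objectives side by side. This assumes a squared-error supervised loss and a Q-learning-style target, neither of which is specified in the summary above. In the notation below, $x_t$ is the observation, $r_t$ the reference result, $s_t$ the RNN state, $\hat{y}_t$ the RNN's prediction, $a_t$ the chosen action, and $\gamma$ a discount factor. $R_t = g(a_t, r_t)$ is a hypothetical reward derived from the reference result.

```latex
\begin{aligned}
(s_t, \hat{y}_t) &= f_\theta(s_{t-1}, x_t) \\
\mathcal{L}_{\text{SL}}(\theta) &= \tfrac{1}{2}\,(\hat{y}_t - r_t)^2 \\
y_t^{\text{TD}} &= R_t + \gamma \max_{a'} Q_\phi(s_{t+1}, a') \\
\mathcal{L}_{\text{RL}}(\phi) &= \tfrac{1}{2}\,\bigl(y_t^{\text{TD}} - Q_\phi(s_t, a_t)\bigr)^2
\end{aligned}
```

Because $r_t$ appears in both losses, directly in $\mathcal{L}_{\text{SL}}$ and through $R_t$ in $\mathcal{L}_{\text{RL}}$, a single reference result drives both updates.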

Why it matters

This patent is significant because it addresses the 'credit assignment' problem in AI, where a system must learn from both immediate feedback and long-term goals. By linking the training of prediction (RNN) and action (QN), it helps bridge the gap between simple pattern recognition and complex decision-making. This makes it a useful building block for autonomous agents that must operate in dynamic environments where rewards are delayed or sparse.

Real-world examples

1. Autonomous robotics navigation
2. Real-time industrial process control
3. Automated trading systems
4. Adaptive game AI agents

Generated by PatentBrief · Not legal advice · patentbrief.org

US 11170293 · 2026