How AI Systems Learn to Predict and Act Simultaneously
A method for training AI models that combines supervised learning for prediction with reinforcement learning for decision-making in a single, coordinated system.
Patent Number
US 11170293
Status
Active
Filing Date
December 30, 2015
Grant Date
November 9, 2021
Expiration
~December 2035 (estimated)
Claims
15
Assignee
Microsoft Technology Licensing LLC
Inventors
Ji He, Prabhdeep Singh, Lihong Li, Xiaodong He, Jianfeng Gao, Jianshu Chen, Li Deng, Xiujun Li
Citations
4 forward · 16 backward
What it covers
This patent describes a system that uses two types of neural networks working in tandem to improve how an AI makes decisions. First, a Recurrent Neural Network (RNN) processes data to create a 'state' (a summary of what is happening) and predicts a result. Second, a Q-Network (QN) looks at that state to suggest the best action to take. The system then uses a 'reference result' (the actual outcome) to update the RNN using supervised learning, while simultaneously updating the Q-Network using reinforcement learning based on the action taken and the new state. This allows the system to get better at both understanding its environment and choosing the right actions over time.
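The loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation. Several choices were made only to keep the example short, and all of them are hypothetical: the class names `TinyRNN` and `LinearQ`, the sine-wave environment, the two-action "rise/fall" task, the reward rule, the learning rates, and the one-step truncated backpropagation. On each step the RNN turns an observation into a state and a prediction, and the Q-network picks an action from that state. Once the reference result arrives, the RNN takes a supervised step and the Q-network takes a Q-learning step using the action and the new state.

```python
import math
import random


def signal(t: int) -> float:
    """Hypothetical environment: a sine wave; the reference result is its next value."""
    return math.sin(0.3 * t)


class TinyRNN:
    """Elman-style RNN: h_t = tanh(w_x*x_t + W_h.h_{t-1} + b); prediction y_hat = v.h_t."""

    def __init__(self, hidden: int, rng: random.Random):
        def u() -> float:
            return rng.uniform(-0.5, 0.5)
        self.w_x = [u() for _ in range(hidden)]
        self.W_h = [[0.5 * u() for _ in range(hidden)] for _ in range(hidden)]
        self.b = [0.0] * hidden
        self.v = [u() for _ in range(hidden)]

    def step(self, x, h_prev):
        h = [math.tanh(self.w_x[i] * x
                       + sum(w * hp for w, hp in zip(self.W_h[i], h_prev))
                       + self.b[i])
             for i in range(len(self.b))]
        y_hat = sum(vi * hi for vi, hi in zip(self.v, h))
        return h, y_hat

    def supervised_update(self, x, h_prev, h, y_hat, y_ref, lr):
        """One SGD step on 0.5*(y_hat - y_ref)**2, backpropagated one time step only."""
        err = y_hat - y_ref
        for i in range(len(h)):
            dpre = err * self.v[i] * (1.0 - h[i] ** 2)  # gradient through tanh
            self.v[i] -= lr * err * h[i]
            self.w_x[i] -= lr * dpre * x
            self.b[i] -= lr * dpre
            for j in range(len(h_prev)):
                self.W_h[i][j] -= lr * dpre * h_prev[j]


class LinearQ:
    """Q(s, a) = W[a].s, trained with one-step Q-learning."""

    def __init__(self, state_dim: int, n_actions: int):
        self.W = [[0.0] * state_dim for _ in range(n_actions)]

    def q(self, s):
        return [sum(w * si for w, si in zip(row, s)) for row in self.W]

    def act(self, s, eps, rng):
        if rng.random() < eps:  # epsilon-greedy exploration
            return rng.randrange(len(self.W))
        qs = self.q(s)
        return max(range(len(qs)), key=qs.__getitem__)

    def rl_update(self, s, a, r, s_next, gamma, lr):
        td = r + gamma * max(self.q(s_next)) - self.q(s)[a]  # TD error
        self.W[a] = [w + lr * td * si for w, si in zip(self.W[a], s)]


def train(steps: int = 3000, hidden: int = 8, seed: int = 0):
    rng = random.Random(seed)
    rnn, qnet = TinyRNN(hidden, rng), LinearQ(hidden, n_actions=2)
    h_prev, x = [0.0] * hidden, signal(0)
    h, y_hat = rnn.step(x, h_prev)                   # RNN: state + prediction
    sq_errors, rewards = [], []
    for t in range(steps):
        a = qnet.act(h, eps=0.1, rng=rng)            # QN: action from the state
        y_ref = signal(t + 1)                        # reference result arrives
        r = 1.0 if (a == 1) == (y_ref > x) else 0.0  # hypothetical reward from y_ref
        sq_errors.append((y_hat - y_ref) ** 2)
        rewards.append(r)
        rnn.supervised_update(x, h_prev, h, y_hat, y_ref, lr=0.02)  # supervised step
        h_next, y_hat_next = rnn.step(y_ref, h)      # next observation -> new state
        qnet.rl_update(h, a, r, h_next, gamma=0.5, lr=0.02)         # RL step
        h_prev, h, x, y_hat = h, h_next, y_ref, y_hat_next
    return sq_errors, rewards


if __name__ == "__main__":
    errs, rews = train()
    print(sum(errs[:200]) / 200, sum(errs[-200:]) / 200)
```

Comparing the average squared error over the first and last 200 steps is a quick check that the supervised side is learning. A practical system would more likely use a deep-learning framework and full backpropagation through time; the sketch is only meant to show the order in which the two models are updated.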
What it doesn't cover
- Does not cover systems that rely solely on supervised learning without a reinforcement learning feedback loop.
- Does not cover static models that never update their parameters (that is, never produce the updated 'second' versions of the models) in response to new observation values.
- Does not cover non-recurrent neural networks that lack the ability to maintain state information over time.
- Does not cover systems that do not use a communications interface to receive external reference result values.
The clever bit
The innovation lies in using the same reference result to update both the predictive model (RNN) and the decision-making model (QN) in a coordinated loop, so the 'state' information that drives decisions keeps improving as the predictive model becomes more accurate.
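Written as update rules, the coordination can be sketched as below. The notation is assumed for illustration and is not taken from the patent claims: f with parameters θ is the RNN, Q with parameters φ is the Q-network, x_t is an observation, s_t the state, ŷ_t the prediction, y_t the reference result, R a hypothetical function that turns the reference result and the chosen action into a reward, α and β are learning rates, and γ is a discount factor. The greedy action is shown; exploration is omitted.

```latex
\begin{aligned}
(s_t,\ \hat{y}_t) &= f_\theta(x_t,\ s_{t-1}) \\
a_t &= \arg\max_{a} Q_\phi(s_t, a) \\
\theta &\leftarrow \theta - \alpha\, \nabla_\theta\, \tfrac{1}{2}\big(\hat{y}_t - y_t\big)^2 \\
r_t &= R(y_t,\ a_t) \\
\phi &\leftarrow \phi + \beta \Big( r_t + \gamma \max_{a'} Q_\phi(s_{t+1}, a') - Q_\phi(s_t, a_t) \Big) \nabla_\phi Q_\phi(s_t, a_t)
\end{aligned}
```

Both updates consume y_t: the RNN step uses it directly as a supervised target, and the Q-network step uses it through the reward r_t. Because s_{t+1} is produced by the freshly updated RNN, better predictions feed straight into the states the Q-network learns from.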
Why it matters
This patent is significant because it targets the 'credit assignment' problem in AI: working out which earlier predictions and actions deserve credit for an outcome when a system must learn from both immediate feedback and long-term goals. By linking the training of prediction (RNN) and action (QN), it helps bridge the gap between simple pattern recognition and complex decision-making. It is a building block for autonomous agents that must navigate dynamic environments where rewards are delayed or sparse.
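The credit-assignment point can be made concrete with a discounted return, the standard reinforcement-learning quantity that a Q-network learns to estimate. The sketch below is generic and not taken from the patent; the function name `discounted_returns` and the discount factor of 0.9 are illustrative choices. With a single reward arriving only at the last step, discounting spreads credit back to earlier steps, giving less credit the further back a step lies.

```python
def discounted_returns(rewards: list[float], gamma: float) -> list[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, working backwards from the final step."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # fold in this step's reward plus discounted future
        returns[t] = g
    return returns


# A sparse, delayed reward: nothing until the final step.
print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.9))  # approximately [0.729, 0.81, 0.9, 1.0]
```

A one-step Q-learning update never sees this whole sequence at once; it bootstraps toward the same values by repeatedly using the next state's estimate, which is why the quality of the RNN-produced state matters for delayed rewards.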
Real-world examples
1. Autonomous robotics navigation
2. Real-time industrial process control
3. Automated trading systems
4. Adaptive game AI agents
Generated by PatentBrief · Not legal advice · patentbrief.org
US 11170293 · 2026