PatentBrief · Patent Brief · US 10282665

How AI Agents Learn to Pick the Best Future Actions

A method for an AI agent to predict which actions will yield the highest rewards by analyzing past experiences and refining its decision-making model.

Patent Number

US 10282665

Status

Active

Filing Date

June 12, 2015

Grant Date

May 7, 2019

Expiration

June 12, 2035

Claims

Assignee

Sony Corp

Inventors

Yoshiyuki Kobayashi

Citations

0 forward · 6 backward

What it covers

This patent describes a system where an AI agent learns from its history to make better decisions. It records 'action history data': the state the agent was in, the action it took, and the reward it received. The system uses this data to build a 'reward estimator' that predicts how much reward a future action might generate. By comparing the predicted rewards of the possible next actions, the agent selects and executes the one with the highest estimated reward. Because the estimator is rebuilt from the growing history, the agent's choices can improve as it gathers more data.
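The record, estimate, and select loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented method: the state and action names are made up, and a simple per-pair average of observed rewards stands in for the patent's learned reward estimator.

```python
# Hypothetical action history: (state, action, reward) records.
history = [
    ("near_wall", "turn_left", 1.0),
    ("near_wall", "forward", -1.0),
    ("near_wall", "turn_left", 0.5),
    ("open_space", "forward", 1.0),
]

def build_reward_estimator(history):
    """Return a function that estimates reward as the mean observed reward.

    Assumption for this sketch: a running average per (state, action) pair
    replaces the learned estimator described in the patent.
    """
    totals = {}
    counts = {}
    for state, action, reward in history:
        key = (state, action)
        totals[key] = totals.get(key, 0.0) + reward
        counts[key] = counts.get(key, 0) + 1

    def estimate(state, action):
        key = (state, action)
        if key not in counts:
            return 0.0  # unseen pairs default to 0.0 (an arbitrary choice)
        return totals[key] / counts[key]

    return estimate

def select_action(estimate, state, candidate_actions):
    """Pick the candidate action with the highest estimated reward."""
    return max(candidate_actions, key=lambda action: estimate(state, action))

estimate = build_reward_estimator(history)
best = select_action(estimate, "near_wall", ["forward", "turn_left", "turn_right"])
print(best)
```

In this toy history, "turn_left" has the highest average reward in the "near_wall" state, so it is the action selected. Rebuilding the estimator after each new record is what lets the choice change as data accumulates.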

What it doesn't cover

- Does not cover general machine learning algorithms that do not specifically use reward estimation based on action history data.
- Does not cover hardware-specific implementations, as the claims focus on the logical process performed by a CPU.
- Does not cover reinforcement learning methods that rely solely on trial-and-error without a basis-function-based reward estimator.

The clever bit

The system uses 'basis functions' to transform raw state and action data into 'feature amount vectors,' which allows the AI to map complex, high-dimensional experiences into a space where it can more easily calculate and predict rewards.
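To make the idea concrete, here is a small Python sketch of basis functions turning a raw (state, action) pair into a feature vector, with a linear model fitted on those vectors to estimate reward. Everything specific here is an assumption for illustration: the three basis functions, the numeric encoding of state and action, the sample data, and the use of ordinary least squares, which stands in for whatever fitting procedure the patent actually claims.

```python
# Hypothetical basis functions: each maps a raw (state, action) pair to one number.
# Here state is a distance reading and action is a speed setting (both assumptions).
BASIS_FUNCTIONS = [
    lambda state, action: state,
    lambda state, action: action,
    lambda state, action: state * action,
]

def feature_vector(state, action):
    """Apply every basis function to get the feature vector for one pair."""
    return [phi(state, action) for phi in BASIS_FUNCTIONS]

def fit_weights(features, rewards):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    n = len(features[0])
    a = [[sum(f[i] * f[j] for f in features) for j in range(n)] for i in range(n)]
    b = [sum(f[i] * y for f, y in zip(features, rewards)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    weights = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(a[i][k] * weights[k] for k in range(i + 1, n))
        weights[i] = (b[i] - partial) / a[i][i]
    return weights

def estimate_reward(weights, state, action):
    """Linear reward estimate: dot product of weights and the feature vector."""
    return sum(w * x for w, x in zip(weights, feature_vector(state, action)))

# Made-up action history: (state, action, reward).
history = [
    (1.0, 0.0, 0.5),
    (0.0, 1.0, -1.0),
    (1.0, 1.0, 1.5),
    (2.0, 1.0, 4.0),
    (1.0, 2.0, 2.5),
]
features = [feature_vector(s, a) for s, a, _ in history]
rewards = [r for _, _, r in history]
weights = fit_weights(features, rewards)

# Compare candidate actions in state 1.0 and keep the best estimate.
best_action = max([0.0, 1.0, 2.0], key=lambda a: estimate_reward(weights, 1.0, a))
print(weights, best_action)
```

The product term state * action is the kind of feature a raw input does not expose directly. Once it is part of the feature vector, a plain linear model can capture an interaction between state and action that it could not represent from the raw values alone.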

Why it matters

Reward estimation of this kind is relevant to autonomous systems such as robotics and game AI, where an agent must act in complex environments. By formalizing how an agent scores the expected reward of each candidate next move, the patent gives Sony a claimed framework for decision-making in unpredictable scenarios. It reflects a move toward structured, data-driven behavior in automated agents.

Real-world examples

1. Autonomous robot navigation in warehouses
2. Non-player character (NPC) behavior in video games
3. Automated resource management in cloud computing

Generated by PatentBrief · Not legal advice · patentbrief.org
