Action selection with a reward estimator applied to machine learning
Patent Number
US 10282665
Status
Active
Filing Date
June 12, 2015
Grant Date
May 7, 2019
Expiration
~June 2035 (estimated)
Claims
14
Assignee
Sony Corp
Inventors
Yoshiyuki Kobayashi
Citations
0 forward · 6 backward
What it covers
The patent describes an information processing apparatus that includes a reward estimator generator. The generator uses action history data as learning data and, through machine learning, generates a reward estimator that estimates a reward value from inputted state data and action data. The action history data includes state data expressing a state, action data expressing an action taken by an agent, and a reward value expressing the reward obtained as a result of that action.
The reward estimator generator includes three components: a basis function generator, which generates a plurality of basis functions; a feature amount vector calculator, which calculates feature amount vectors by inputting the state data and action data from the action history data into the basis functions; and an estimation function calculator, which uses regressive/discriminative learning to calculate an estimation function that estimates the reward value in the action history data from the feature amount vectors. The reward estimator includes the plurality of basis functions and the estimation function.
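The pipeline in the abstract (basis functions, then feature amount vectors, then a fitted estimation function) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the scalar state and action, the hand-picked polynomial basis functions (the patent describes a generator that produces them), ridge-regularized least squares as a stand-in for the regressive learning step, the synthetic toy history, and all function names. Choosing the action with the highest estimated reward follows the patent's title rather than any claim text.

```python
import random

# Hypothetical basis functions over a scalar state s and scalar action a.
# The patent describes generating these; here they are fixed by hand.
BASIS = [
    lambda s, a: 1.0,
    lambda s, a: s,
    lambda s, a: a,
    lambda s, a: s * a,
    lambda s, a: s * s,
    lambda s, a: a * a,
]

def features(s, a):
    """Feature amount vector: every basis function applied to (s, a)."""
    return [phi(s, a) for phi in BASIS]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_reward_estimator(history, ridge=1e-6):
    """Fit a linear estimation function on the feature vectors (assumed: ridge regression)."""
    X = [features(s, a) for s, a, _ in history]
    y = [r for _, _, r in history]
    k = len(BASIS)
    # Normal equations: (X^T X + ridge * I) w = X^T y
    XtX = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * r for x, r in zip(X, y)) for i in range(k)]
    w = solve(XtX, Xty)
    # The reward estimator bundles the basis functions with the fitted weights.
    return lambda s, a: sum(wi * fi for wi, fi in zip(w, features(s, a)))

def select_action(estimator, s, actions):
    """Pick the candidate action with the highest estimated reward."""
    return max(actions, key=lambda a: estimator(s, a))

# Synthetic action history (state, action, reward); reward peaks when a == s.
rng = random.Random(0)
history = []
for _ in range(200):
    s = rng.uniform(-1.0, 1.0)
    a = rng.uniform(-1.0, 1.0)
    history.append((s, a, -(a - s) ** 2))

estimator = fit_reward_estimator(history)
best = select_action(estimator, 0.5, [-1.0, -0.5, 0.0, 0.5, 1.0])
```

Because the toy reward is a quadratic in s and a, the six polynomial basis functions can represent it exactly, so the fitted estimator should prefer the action closest to the state. With real action history the basis functions would have to be searched for or generated, which is the part the patent assigns to the basis function generator.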
Generated by PatentBrief · Not legal advice · patentbrief.org