Quickly Learning to Make Good Decisions
Published on Jul 28, 2015 · 4186 Views
A fundamental goal of artificial intelligence is to create agents that learn to make good decisions as they interact with a stochastic environment. Some of the most exciting and valuable potential applications ...
Chapter list
00:00 Quickly Learning to Make Good Decisions
00:07 The Matrix
00:49 Goal: Systems that can transform human capacity and well-being - 1
01:00 Data
01:21 Goal: Systems that can transform human capacity and well-being - 2
01:24 Agents Making Decisions as They Interact with People
01:54 Data = People
02:56 Outline - 1
03:26 Markov Decision Process (MDP) - 1
03:41 Markov Decision Process (MDP) - 2
03:49 Markov Decision Process (MDP) - 3
04:05 Policy π: s → a
04:14 Compute Optimal Policy π: s → a
04:27 Reinforcement Learning
04:40 Multi-armed Bandits
05:02 POMDP
05:20 Exploration: Improve Model Parameter Estimates
05:43 Exploitation: Act Assuming Current Model Estimates Are Correct
06:09 Outline - 2
06:19 A Simple Question ...
08:07 Why Model Uncertainty?
08:48 Predictive Model Using Spectral Learning & Method of Moments
09:24 Method of Moments: Distribution Described by its Moments
10:22 Accuracy Estimates for Latent Variable Modeling
11:01 Acting in Partially Observable Domains
11:26 Partially Observable Reinforcement Learning - 1
11:50 Partially Observable Reinforcement Learning - 2
13:07 Probably Approximately Correct RL for Fully Observable MDPs
14:25 Many Simple & Computationally Cheap RL Algorithms are PAC!
15:01 Optimistic Partially Observable RL
15:27 (Episodic) PAC POMDP Reinforcement Learning
16:20 1st* PAC POMDP Algorithm
16:55 Partially Observable Optimistic Model-Based (POOMB) RL
17:29 Two Room: Strong Optimism Helps
18:36 Tiger: Price of Too Much False Optimism
19:19 How Much Optimism is Needed?
20:50 Outline - 3
21:00 Transfer/Multitask/Lifelong Learning
21:11 Reasonable Approximation
21:30 Using for News Personalization
21:48 Outline - 4
22:03 Using the Past to Fake the Future
22:19 Old Data - 1
23:13 Offline Stationary Policy Evaluation - 1
23:37 Offline Stationary Policy Evaluation - 2
23:55 Offline Nonstationary Policy Evaluation
24:22 Problems with Building a Simulator (Models)
25:30 Use Data Directly To Evaluate Algorithm
25:34 Non-Stationary Policy Evaluation for Multi-armed Bandits
26:01 Queues: Outcomes of Pulling Arms
26:18 Offline Bandit Algorithm Performance Evaluation with Queues - 1
26:19 Offline Bandit Algorithm Performance Evaluation with Queues - 2
26:42 Offline Bandit Algorithm Performance Evaluation with Queues - 3
26:52 Offline Bandit Algorithm Performance Evaluation with Queues - 4
26:52 Offline Bandit Algorithm Performance Evaluation with Queues - 5
26:54 Offline Bandit Algorithm Performance Evaluation with Queues - 6
26:54 Offline Bandit Algorithm Performance Evaluation with Queues - 7
26:55 Offline Bandit Algorithm Performance Evaluation with Queues - 8
27:25 Can Compare Estimated* Online Performance of Algorithms
27:35 Game Example - 1
27:50 Game Example - 2
28:07 Using Prior Data to Evaluate New Algorithm: Better Data Efficiency
29:06 Algorithm Performance Evaluated Using Offline Data
29:25 Offline Nonstationary Policy Evaluation for Reinforcement Learning - 1
29:38 Old Data
29:56 Queues: Outcomes For (s,a) Pairs
30:09 Queue Evaluator: Select Action
30:19 Queue Evaluator: Pull From Queue
30:21 Queue Evaluator: Receive r & s'
30:25 Per State Rejection Sampler Evaluator - 1
30:46 Per State Rejection Sampler Evaluator - 2
30:52 Per State Rejection Sampler Evaluator - 3
31:01 Per State Rejection Sampler Evaluator - 4
31:36 Game Example - 3
31:51 Desirable Evaluator Properties
32:53 No Revealed Randomness
33:18 Revealed Randomness
33:42 Offline Nonstationary Policy Evaluation for Reinforcement Learning - 2
33:54 Outline - 5
34:03 Quickly Learning to Make Good Decisions
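
The chapters from 25:34 to 28:07 describe evaluating a bandit algorithm offline by placing each logged arm-pull outcome into a per-arm queue and replaying the candidate algorithm against those queues. The sketch below is a minimal illustration of that queue-based replay idea, assuming logged data gathered by pulling arms uniformly at random; the names (evaluate_with_queues, EpsilonGreedy) are illustrative and not taken from the talk.

import random
from collections import defaultdict, deque

def evaluate_with_queues(algorithm, logged_data):
    # Build one FIFO queue of logged rewards per arm. Assuming the log was
    # collected by pulling arms uniformly at random, each queue is an
    # unbiased sample stream from that arm's reward distribution.
    queues = defaultdict(deque)
    for arm, reward in logged_data:
        queues[arm].append(reward)

    history = []
    while True:
        arm = algorithm.select_arm()        # candidate algorithm picks an arm
        if not queues[arm]:                 # no logged outcome left for it: stop
            break
        reward = queues[arm].popleft()      # reveal the next logged outcome
        algorithm.update(arm, reward)       # algorithm learns from that outcome
        history.append((arm, reward))
    return history

class EpsilonGreedy:
    # Tiny bandit algorithm used only to drive the evaluator in this sketch.
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def select_arm(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Synthetic log: 1000 uniform-random pulls of two Bernoulli arms (0.3 and 0.6).
log = []
for _ in range(1000):
    arm = random.randrange(2)
    log.append((arm, float(random.random() < (0.3 if arm == 0 else 0.6))))

plays = evaluate_with_queues(EpsilonGreedy(n_arms=2), log)
print(len(plays), "replayed steps, mean reward:",
      sum(r for _, r in plays) / len(plays))

Because the evaluator only ever reveals outcomes that were actually logged, the replayed run behaves like an online run of the algorithm on held-out data, which is the comparison the "Can Compare Estimated* Online Performance of Algorithms" chapter refers to.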