Online Reinforcement Learning and Sequential Forecasting and Partial Feedback
author: Peter Auer, University of Leoben
| Time | Slide |
| 0:00 | Pump-Priming Projects “Online Performance of Reinforcement Learning” & “Sequential Forecasting and Partial Feedback” |
| 0:11 | Online performance of reinforcement learning with internal reward functions - 1 |
| 0:12 | Online performance of reinforcement learning with internal reward functions - 2 |
| 0:13 | Online performance of reinforcement learning with internal reward functions - 3 |
| 0:14 | Online performance of reinforcement learning with internal reward functions - 4 |
| 0:16 | Online performance of reinforcement learning with internal reward functions - 5 |
| 0:17 | Online performance of reinforcement learning with internal reward functions - 6 |
| 0:40 | Sequential Forecasting and Partial Feedback: Applications to Machine Learning |
| 1:42 | Activities (partial list) |
| 2:30 | Scientific outcome (partial list) |
| 2:46 | ML algorithms for parameter optimization: UCT |
| 4:02 | Example for parameter optimization with UCT - 1 |
| 4:12 | Example for parameter optimization with UCT - 2 |
| 4:34 | Example for parameter optimization with UCT - 3 |
| 5:11 | Example for parameter optimization with UCT - 4 |
| 5:41 | Example for parameter optimization with UCT - 5 |
| 6:09 | Example for parameter optimization with UCT - 6 |
| 6:21 | Application of UCT parameter optimization - 1 |
| 6:45 | Application of UCT parameter optimization - 2 |
| 7:04 | Apprenticeship learning using inverse reinforcement learning and gradient methods - 1 |
| 7:25 | Apprenticeship learning using inverse reinforcement learning and gradient methods - 2 |
| 7:35 | Apprenticeship learning using inverse reinforcement learning and gradient methods - 3 |
| 7:51 | Apprenticeship learning using inverse reinforcement learning and gradient methods - 4 |
| 8:07 | Solving the IRL task |
| 9:33 | IRL – Experiments |
| 11:02 | Reinforcement Learning |
| 12:32 | Undiscounted online regret - 1 |
| 13:04 | Undiscounted online regret - 2 |
| 13:54 | Bounds on the regret - 1 |
| 15:01 | Bounds on the regret - 2 |
| 15:55 | Relation to other work: PAC-like bounds - 1 |
| 16:49 | Relation to other work: PAC-like bounds - 2 |
| 17:12 | Relation to other work: PAC-like bounds - 3 |
| 17:46 | Relation to other work: PAC-like bounds - 4 |
| 18:43 | Relation to other work: log T bounds - 1 |
| 18:59 | Relation to other work: log T bounds - 2 |
| 19:14 | Relation to other work: log T bounds - 3 |
| 19:40 | Relation to other work: log T bounds - 4 |
| 20:48 | The UCRL algorithm: Upper Confidence Reinforcement Learning |
| 22:18 | Details of UCRL: The bias - 1 |
| 22:37 | Details of UCRL: The bias - 2 |
| 22:50 | Details of UCRL: The bias - 3 |
| 23:09 | Details of UCRL: The bias - 4 |
| 23:27 | Details of UCRL: The bias - 5 |
| 23:39 | Details of UCRL: Analysis - 1 |
| 23:42 | Details of UCRL: Analysis - 2 |
| 23:45 | Details of UCRL: Analysis - 3 |
| 23:48 | Details of UCRL: Analysis - 4 |
| 23:52 | Details of UCRL: Analysis - 5 |
| 24:00 | Future research - 1 |
| 24:05 | Future research - 2 |
| 24:32 | Future research - 3 |
| 24:42 | Future research - 4 |