Pascal Symposium meeting

Online Reinforcement Learning and Sequential Forecasting and Partial Feedback

author: Peter Auer, University of Leoben
Slides
0:00 Pump-Priming Projects “Online Performance of Reinforcement Learning” & “Sequential Forecasting and Partial Feedback”
0:11 Online performance of reinforcement learning with internal reward functions - 1
0:12 Online performance of reinforcement learning with internal reward functions - 2
0:13 Online performance of reinforcement learning with internal reward functions - 3
0:14 Online performance of reinforcement learning with internal reward functions - 4
0:16 Online performance of reinforcement learning with internal reward functions - 5
0:17 Online performance of reinforcement learning with internal reward functions - 6
0:40 Sequential Forecasting and Partial Feedback: Applications to Machine Learning
1:42 Activities (partial list)
2:30 Scientific outcome (partial list)
2:46 ML algorithms for parameter optimization: UCT
4:02 Example for parameter optimization with UCT - 1
4:12 Example for parameter optimization with UCT - 2
4:34 Example for parameter optimization with UCT - 3
5:11 Example for parameter optimization with UCT - 4
5:41 Example for parameter optimization with UCT - 5
6:09 Example for parameter optimization with UCT - 6
6:21 Application of UCT parameter optimization - 1
6:45 Application of UCT parameter optimization - 2
7:04 Apprenticeship learning using inverse reinforcement learning and gradient methods - 1
7:25 Apprenticeship learning using inverse reinforcement learning and gradient methods - 2
7:35 Apprenticeship learning using inverse reinforcement learning and gradient methods - 3
7:51 Apprenticeship learning using inverse reinforcement learning and gradient methods - 4
8:07 Solving the IRL task
9:33 IRL – Experiments
11:02 Reinforcement Learning
12:32 Undiscounted online regret - 1
13:04 Undiscounted online regret - 2
13:54 Bounds on the regret - 1
15:01 Bounds on the regret - 2
15:55 Relation to other work: PAC-like bounds - 1
16:49 Relation to other work: PAC-like bounds - 2
17:12 Relation to other work: PAC-like bounds - 3
17:46 Relation to other work: PAC-like bounds - 4
18:43 Relation to other work: log T bounds - 1
18:59 Relation to other work: log T bounds - 2
19:14 Relation to other work: log T bounds - 3
19:40 Relation to other work: log T bounds - 4
20:48 The UCRL algorithm: Upper Confidence Reinforcement Learning
22:18 Details of UCRL: The bias - 1
22:37 Details of UCRL: The bias - 2
22:50 Details of UCRL: The bias - 3
23:09 Details of UCRL: The bias - 4
23:27 Details of UCRL: The bias - 5
23:39 Details of UCRL: Analysis - 1
23:42 Details of UCRL: Analysis - 2
23:45 Details of UCRL: Analysis - 3
23:48 Details of UCRL: Analysis - 4
23:52 Details of UCRL: Analysis - 5
24:00 Future research - 1
24:05 Future research - 2
24:32 Future research - 3
24:42 Future research - 4
