Online Markov Decision Processes under Bandit Feedback

Published on Mar 25, 2011
3126 Views

We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary. The goal of the learning agent is to compete
