Online Markov Decision Processes under Bandit Feedback

Published on 2011-03-25, 3138 views

Gergely Neu

We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary. The goal of the learning agent is to compete

Knowledge 4 All Foundation Video Journal Volume 1

Related categories

Markov Processes
