Future Information Minimization as PAC Bayes regularization in Reinforcement Learning

Published on 2012-01-25, 4822 views

Naftali Tishby

Interactions between an organism and its environment are commonly treated in the framework of Markov Decision Processes (MDPs). While the standard MDP formulation aims at maximizing expected future rewards (value) …
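The abstract's baseline, an MDP agent that maximizes expected future reward (value), is usually solved by iterating the Bellman optimality equation. The following is a minimal sketch of standard value iteration, assuming a hypothetical two-state, two-action MDP with discount factor 0.9. The transition tensor `P`, the reward matrix `R`, and the function `value_iteration` are illustrative and not taken from the lecture.

```python
import numpy as np

# Hypothetical toy MDP (not from the lecture): 2 states, 2 actions.
# P[a, s, s'] = probability of moving from s to s' under action a.
# R[s, a]     = expected immediate reward for taking action a in state s.
P = np.array([
    [[0.9, 0.1],   # action 0 from state 0
     [0.2, 0.8]],  # action 0 from state 1
    [[0.1, 0.9],   # action 1 from state 0
     [0.7, 0.3]],  # action 1 from state 1
])
R = np.array([
    [0.0, 1.0],    # state 0: rewards for actions 0, 1
    [2.0, 0.0],    # state 1: rewards for actions 0, 1
])
gamma = 0.9  # discount factor (assumed)


def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Apply the Bellman optimality operator until V stops changing."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    policy = Q.argmax(axis=1)  # greedy deterministic policy
    return V, Q, policy


V, Q, policy = value_iteration(P, R, gamma)
print("V* =", V)
print("greedy policy =", policy)
```

For this toy MDP the greedy policy takes action 1 in state 0 and action 0 in state 1. This shows only the standard value-maximizing baseline that the abstract starts from.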

New Frontiers in Model Order Selection


Presentation

Future Information Minimization as PAC Bayes regularization in Reinforcement Learning (00:00)

[Partially Observed] Markov Decision Processes (01:01)

Reinforcement Learning revisited (02:26)

The Agent Learns a Policy (02:51)

Graphical model for the perception-action-cycle (04:22)

Information pickup: I-gains and Bellman optimality (10:30)

Bellman meets Shannon (10:32)

Decision-sequences and information (11:22)

Decision/action sequences and information (12:19)

Proof idea: Recursive Information-chain rules ... (13:54)

Application: Uncertainty reduction and source coding (19:02)

Application: Huffman coding and Bellman optimality (20:15)

Application: Sequential hypothesis testing (20:51)

How much information is needed for valuable behavior? (21:28)

Value (extrinsic) and Information (intrinsic) ... (21:36)

Combining (future) Value and Information (26:07)

Trading Value and (future) Information (26:09)

Information bounded RL (29:53)

Maze (32:00)

More complex maze ... - 1 (34:29)

Animation - 1 (34:42)

Animation - 2 (34:59)

Animation - 3 (35:00)

More complex maze ... - 2 (35:27)

Global convergence theorem (35:33)

PAC-Bayes Generalization Theorem (McAllester 2001) (36:38)

PAC-Bayes Robustness Theorem for I-RL (37:28)

The optimal tradeoff between Value and Future Information (40:08)

Beyond MDP: Extracting only valuable past information (41:05)

Sequential Information gathering (41:16)

Conclusions (44:36)
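The slide titles "Trading Value and (future) Information" and "Information bounded RL" name a value-information trade-off, but the listing gives no formulas. The following is a minimal sketch assuming one common formalization, which is not taken from the slides. The agent maximizes expected discounted reward minus (1/β) times the information cost log π(a|s)/ρ(a) of each action relative to a state-independent action prior ρ. The prior ρ is re-estimated as the policy's action marginal in a Blahut-Arimoto-style outer loop. The two-state MDP, the uniform state weighting used for ρ, and the functions `soft_value_iteration`, `info_bounded_policy`, and `mutual_information` are all hypothetical.

```python
import numpy as np

# Hypothetical toy MDP (not from the lecture): P[a, s, s'] and R[s, a].
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.1, 0.9], [0.7, 0.3]],  # action 1
])
R = np.array([[0.0, 1.0], [2.0, 0.0]])
gamma = 0.9  # discount factor (assumed)


def soft_value_iteration(P, R, gamma, beta, rho, n_iter=500):
    """Information-regularized Bellman iteration for a fixed action prior rho.

    V(s)    = (1/beta) * log sum_a rho(a) * exp(beta * Q(s, a))
    Q(s, a) = R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')
    """
    log_rho = np.log(rho)[None, :]
    V = np.zeros(R.shape[0])
    for _ in range(n_iter):
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        z = beta * Q + log_rho
        z_max = z.max(axis=1, keepdims=True)
        # Numerically stable log-sum-exp over actions.
        V = (z_max[:, 0] + np.log(np.exp(z - z_max).sum(axis=1))) / beta
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    z = beta * Q + log_rho
    # pi(a|s) is proportional to rho(a) * exp(beta * Q(s, a)).
    pi = np.exp(z - z.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)
    return V, pi


def info_bounded_policy(P, R, gamma, beta, n_outer=50):
    """Alternate soft value iteration with re-estimating the action prior."""
    n_actions = R.shape[1]
    rho = np.full(n_actions, 1.0 / n_actions)
    for _ in range(n_outer):
        V, pi = soft_value_iteration(P, R, gamma, beta, rho)
        rho = pi.mean(axis=0)  # action marginal, assuming uniform p(s)
    return V, pi, rho


def mutual_information(pi, rho):
    """I(S;A) in nats, assuming uniform p(s) (consistent with rho above)."""
    ratio = np.where(pi > 0, pi / rho[None, :], 1.0)
    return float((pi * np.log(ratio)).sum(axis=1).mean())


for beta in (0.01, 50.0):
    V, pi, rho = info_bounded_policy(P, R, gamma, beta)
    print(f"beta={beta}: I(S;A)={mutual_information(pi, rho):.4f} nats, "
          f"greedy actions={pi.argmax(axis=1)}")
```

Under these assumptions, a large β recovers the greedy value-maximizing policy at a cost of nearly log 2 nats of state-action information, the maximum possible with two actions. A small β pushes π(a|s) toward ρ(a) and the information toward zero at the expense of value. This is the qualitative trade-off the slide titles describe, though the lecture's exact objective may differ.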