How to discount Information: Information flow in sensing-acting systems and the emergence of hierarchies
Published on Oct 16, 2012 · 4028 views
We argue that a consistent formulation of optimal sensing and control must include information terms, yielding an extension of the standard POMDP setting. To make the standard reward/cost terms consistent …
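To make the abstract's claim concrete: when an information cost is added to the standard reward term, the Bellman backup becomes a free-energy recursion with a closed-form softmax solution. The sketch below is a minimal illustration on an assumed toy MDP, not code from the lecture; the information term is taken to be the KL divergence of the policy from a fixed prior `rho`, and `beta` sets the exchange rate between value and information.

```python
import numpy as np

def free_energy_value_iteration(P, R, rho, beta=1.0, gamma=0.9, iters=200):
    """Free-energy value iteration trading reward against information cost.

    P[a, s, t] : transition probabilities (action, state, next state)
    R[s, a]    : immediate rewards
    rho[a]     : fixed prior policy (the KL reference)
    beta       : trade-off between extrinsic value and information
    """
    n_actions, n_states, _ = P.shape
    F = np.zeros(n_states)                      # free-energy value per state
    for _ in range(iters):
        # Q[s, a] = immediate reward + discounted expected future value
        Q = R + gamma * np.einsum('ast,t->sa', P, F)
        # Soft (log-sum-exp) Bellman backup: the optimal policy is a
        # Boltzmann reweighting of the prior, solved in closed form.
        Z = rho[None, :] * np.exp(beta * Q)     # unnormalized policy weights
        F = np.log(Z.sum(axis=1)) / beta
    pi = Z / Z.sum(axis=1, keepdims=True)       # optimal stochastic policy
    return F, pi
```

As `beta` grows, the KL penalty becomes negligible and the policy approaches the greedy deterministic optimum; as `beta` shrinks, the policy collapses toward the prior `rho`, which is the "trading value and information" behavior discussed in the talk.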
Chapter list
00:00 How to discount Information
01:08 The Perception-Action Cycle
07:16 Perception-Action Cycles
08:44 The essence of the cycle
12:23 The brain's primary task: making valuable predictions
13:27 Hierarchies and reverse hierarchies
17:55 We see what we expect to see
18:28 Key assumptions
18:29 Outline
22:29 Part 1
22:36 Markov Decision Processes
24:26 Reinforcement Learning revisited
24:52 The Agent Learns a Policy
26:33 Policy Iteration
26:49 Information gathering
26:53 Bellman meets Shannon
27:04 Graphical model for the perception-action cycle (1)
31:41 Graphical model for the perception-action cycle (2)
34:35 Decision sequences and information (1)
36:08 Decision sequences and information (2)
37:42 Proof idea
44:15 Application
45:12 Simple example: the 3-coin problem
46:34 Huffman coding and Bellman optimality
46:35 How much information is needed?
46:36 Value (extrinsic) and Information (intrinsic)…
48:08 Some Control-Information Dualities
48:10 Combining (future) Value and Information
48:13 Trading Value and (future) Information
50:19 Free Energy minimization
50:20 Simple sum rules
54:45 Graphs
56:52 Global convergence theorem
57:31 PAC-Bayes Generalization Theorem
57:33 PAC-Bayes Robustness Theorem for I-RL (1)
57:46 PAC-Bayes Robustness Theorem for I-RL (2)
57:47 Moving through a mind field (1)
57:50 Moving through a mind field (2)