Online Learning and Bregman Divergences

Published on 2007-02-25

11778 Views

Manfred K. Warmuth

L 1: Introduction to Online Learning (Predicting as good as the best expert, Predicting as good as the best linear combination of experts, Additive versus multiplicative family of updates)

L 2: Breg

MLSS 2006 - Taipei

Presentation

Online Learning and Bregman Divergences 00:06

Bregman Divergences [Br,CL,Cs] 01:01

Exponential Family of Distributions 01:42

Expectation parameter 04:10

Primal & Dual Parameters 05:55

Gaussian (unit variance) 06:50

Bernoulli 08:39

Poisson 10:40

Manfred1_Page_25 12:07

Bregman Div. as Rel. Ent. between Distributions 16:26

Area unchanged When Slide Flipped 17:23

Dual divergence for Bernoulli 19:56

Dual divergence for Poisson 22:35

Dual matching loss for sigmoid transfer func. 24:02

Example: Gaussian density estimation 27:15

Derivation of Updates 29:12

On-line Algorithm [AW] 31:52

Alternate Motivation of Same On-Line Update 32:27

Shrinkage Towards Initial 34:46

Key Lemma [AW] 37:29

Main Theorem 37:55

Bounds for the Forward Algorithm 38:15

Why Bregman divergences? 42:17

General setup of on-line learning 43:15

Minimax Algorithm for T Trials 43:52

Gaussian 45:51

Last-step Minimax 47:35

Last-step Minimax: Bernoulli 49:23

Synopsis of methods 52:12

Content of this tutorial 56:47

Simple conversions 56:54

Expected loss bounds [HW] 57:00

Tail bound [CCG] 01:03:20

Application: Adaptive Channel Equalization 01:04:22

Application: Caching [GBW] 01:07:02

Caching Policies 01:07:58

Which Policy to Choose? 01:09:02

Characteristics Vary with Time 01:09:44

Randomly Permuted Request Stream 01:11:12

Want "Adaptive" Policy 01:12:27