## Learning through Exploration

author: John Langford, Yahoo! Research
author: Alina Beygelzimer, IBM Watson Research Center
published: Oct. 1, 2010,   recorded: July 2010,   views: 1723

# Slides

- A Tutorial on Learning through Exploration
- Example of Learning through Exploration
- Another Example: Clinical Decision Making
- The Contextual Bandit Setting (1)-(5)
- Basic Observation #1
- Basic Observation #2
- Outline: online, stochastic
- Idea 1: Follow the Leader (1)-(3)
- Idea 2: Explore then Follow the Leader (EFTL) (1)-(2)
- Theorem, Proof
- Unknown T
- Idea 3: Exponential Weight Algorithm for Exploration and Exploitation with Experts
- Theorem [Auer et al. '95] (1)-(2)
- EXP4 can be modified to succeed with high probability
- Summary so far
- Outline, Argmax Regression
- Approach 1: The Regression Approach (1)-(2)
- Proof sketch: Fix x (graph)
- Approach 2: Importance-Weighted Classification Approach (Zadrozny '03) (1)-(2)
- Approach 3: The Offset Trick for K = 2 (two actions) (1)-(2)
- Induced binary distribution D (1)
- Induced binary distribution D - Example 1 (1)-(2)
- Induced binary distribution D - Example 2 (1)-(2)
- Induced binary distribution D - Example 3 (1)-(2)
- Analysis for K = 2
- Denoising for K > 2 arms
- Training on example (x, 3, 0.75, 0.5) (1)-(3)
- Denoising with K arms: Analysis
- A Comparison of Approaches
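Several of the slide topics above (the importance-weighted classification approach, the offset trick) build on the inverse-propensity idea: a reward observed for the chosen action is divided by the probability of choosing it, which yields an unbiased estimate of any policy's value from partial-feedback logs. A minimal sketch of that estimator (generic illustration, not code from the tutorial):

```python
def ips_value(logged, policy):
    """Inverse-propensity estimate of a target policy's value.

    `logged` is a list of (x, a, r, p) tuples: context, action the
    logging policy took, observed reward, and the probability with
    which that action was chosen. Rewards of unchosen actions are
    never seen; reweighting by 1/p corrects for the logging bias.
    """
    total = 0.0
    for x, a, r, p in logged:
        if policy(x) == a:        # reward counts only when the target
            total += r / p        # policy agrees with the logged action
    return total / len(logged)

# Toy log: one context, two actions chosen uniformly (p = 0.5);
# action 0 always pays 1, action 1 always pays 0.
logged = [(0, 0, 1.0, 0.5), (0, 1, 0.0, 0.5)]
estimate = ips_value(logged, policy=lambda x: 0)   # evaluate "always act 0"
```

On this toy log the estimate is exactly 1.0, the true value of the "always act 0" policy, even though that policy was only followed on half the logged rounds.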


Part 1 1:22:31

Part 2 1:02:18

# Description

This tutorial is about learning through exploration. The goal is to learn how to make decisions in partial-feedback settings, where an agent repeatedly observes some context, chooses an action, and then learns how that action paid off (but never sees how other actions would have paid off). We plan to cover all aspects of this general problem: learning, evaluation, limits on the ability to learn in this setting, and the relationship to traditional supervised learning.
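The observe/choose/learn loop described above can be sketched as an epsilon-greedy contextual bandit. Everything concrete here (two contexts, three actions, Bernoulli reward probabilities) is an illustrative assumption, not from the tutorial; the point is that only the chosen action's reward ever reaches the learner.

```python
import random

def contextual_epsilon_greedy(T=10000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy loop for the contextual bandit setting.

    Two contexts, K = 3 actions, Bernoulli rewards (all assumed for
    illustration). Each round: observe a context, pick an action
    (explore with prob. epsilon, else exploit current estimates),
    and observe the reward of that action only.
    """
    rng = random.Random(seed)
    K = 3
    contexts = [0, 1]
    # Hidden per-context success probability of each action.
    p = {0: [0.2, 0.5, 0.8], 1: [0.9, 0.4, 0.1]}
    total = {(x, a): 0.0 for x in contexts for a in range(K)}
    count = {(x, a): 0 for x in contexts for a in range(K)}

    reward_sum = 0.0
    for _ in range(T):
        x = rng.choice(contexts)              # observe context
        if rng.random() < epsilon:            # explore uniformly
            a = rng.randrange(K)
        else:                                 # exploit best estimate so far
            a = max(range(K),
                    key=lambda a: total[(x, a)] / max(count[(x, a)], 1))
        r = 1.0 if rng.random() < p[x][a] else 0.0   # partial feedback
        total[(x, a)] += r
        count[(x, a)] += 1
        reward_sum += r
    return reward_sum / T

avg = contextual_epsilon_greedy()
```

With the best action paying about 0.8-0.9 per context and a 10% exploration rate, the average reward should settle well above the 0.5 a uniformly random policy would earn.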