Reinforcement Learning

Automatic Discovery and Transfer of MAXQ Hierarchies

author: Neville Mehta, Oregon State University

Description

We present an algorithm, HI-MAT (Hierarchy Induction via Models And Trajectories), that discovers MAXQ task hierarchies by applying dynamic Bayesian network models to a successful trajectory from a source reinforcement learning task. HI-MAT discovers subtasks by analyzing the causal and temporal relationships among the actions in the trajectory. Under appropriate assumptions, HI-MAT induces hierarchies that are consistent with the observed trajectory and have compact value-function tables employing safe state abstractions. We demonstrate empirically that HI-MAT constructs compact hierarchies that are comparable to manually engineered hierarchies and that facilitate a significant speedup in learning when transferred to a target task.
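
As a rough illustration of what "analyzing the causal and temporal relationships among the actions in the trajectory" can look like, the Python sketch below links two steps of a trajectory whenever both touch the same state variable and no step in between does. It is a simplified toy under stated assumptions, not the HI-MAT implementation: the function name `causal_edges`, the taxi-like action names, and the hand-written relevance sets are all hypothetical. In the setting described above, the information about which variables an action checks or changes would come from the dynamic Bayesian network action models.

```python
from typing import Dict, List, Set, Tuple


def causal_edges(
    trajectory: List[str],
    relevant: Dict[str, Set[str]],
) -> List[Tuple[int, int, str]]:
    """Return edges (i, j, v): steps i < j both touch variable v,
    and no step strictly between them touches v.

    `relevant` is an assumed stand-in for the DBN action models: it maps
    each action name to the variables that action checks or changes.
    """
    edges: List[Tuple[int, int, str]] = []
    last_touch: Dict[str, int] = {}  # variable -> index of the latest step touching it
    for j, action in enumerate(trajectory):
        for v in sorted(relevant[action]):  # sorted for a deterministic edge order
            if v in last_touch:
                edges.append((last_touch[v], j, v))
            last_touch[v] = j
    return edges


# Hypothetical toy trajectory and relevance sets (not taken from the talk).
toy_trajectory = ["goto_passenger", "pickup", "goto_destination", "putdown"]
toy_relevant = {
    "goto_passenger": {"taxi_loc"},
    "pickup": {"taxi_loc", "pass_loc"},
    "goto_destination": {"taxi_loc"},
    "putdown": {"taxi_loc", "pass_loc"},
}

toy_edges = causal_edges(toy_trajectory, toy_relevant)
for src, dst, var in toy_edges:
    print(f"{toy_trajectory[src]} --{var}--> {toy_trajectory[dst]}")
```

In this toy, the edge from pickup to putdown labeled with the passenger variable skips over the navigation step, which hints at how such annotations could expose groups of actions that belong to the same subtask.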

Slides
0:00 Automatic Discovery and Transfer of MAXQ Hierarchies
0:07 Motivation (1)
2:06 Motivation (2)
2:46 Our Approach: HI-MAT
3:11 Markov Decision Process
3:49 Dynamic Bayesian Network (DBN) (1)
4:26 Dynamic Bayesian Network (DBN) (2)
5:18 Hierarchical RL: MAXQ Framework
6:30 MAXQ: Execution Semantics (1)
6:39 MAXQ: Execution Semantics (2)
6:45 MAXQ: Execution Semantics (3)
6:54 MAXQ: Execution Semantics (4)
7:07 MAXQ: Execution Semantics (5)
7:16 MAXQ: Execution Semantics (6)
7:22 MAXQ: Execution Semantics (7)
7:23 MAXQ: Execution Semantics (8)
7:26 Hierarchy Learning Problem
8:12 Desired Properties
9:28 HI-MAT Algorithm (1)
9:39 HI-MAT Algorithm (2)
9:47 HI-MAT Algorithm (3)
10:03 HI-MAT Algorithm (4)
10:47 HI-MAT Algorithm (5)
11:04 HI-MAT Algorithm (6)
11:26 HI-MAT Algorithm (7)
11:47 HI-MAT Algorithm (8)
11:57 HI-MAT Algorithm (9)
12:04 HI-MAT Algorithm (10)
12:15 HI-MAT Algorithm (11)
12:43 HI-MAT Algorithm (12)
13:05 HI-MAT Algorithm (13)
13:17 HI-MAT Algorithm (14)
13:29 HI-MAT Algorithm (15)
13:53 HI-MAT Algorithm (16)
14:09 HI-MAT Algorithm (17)
14:29 Empirical Evaluation: Hypotheses
15:33 Experimental Setup: Taxi (1)
16:04 Experimental Setup: Taxi (2)
16:49 Results: Taxi (1)
16:59 Results: Taxi (2)
17:29 Results: Taxi (3)
18:04 Results: Taxi (4)
18:19 Results: Taxi (5)
18:33 Experimental Setup: Wargus (1)
18:43 Experimental Setup: Wargus (2)
18:53 Induced Wargus Hierarchy
18:57 Hand-Built Wargus Hierarchy
19:10 Induced Wargus Hierarchy
19:13 VISA’s Wargus Hierarchy
19:17 Results: Wargus (1)
19:33 Results: Wargus (2)
20:13 Contribution of the Trajectory
21:01 Modified Bitflip Domain
21:33 Modified Bitflip Domain: Example
21:50 VISA’s Causal Graph
22:05 Modified Bitflip CAT
22:12 Hierarchy Comparison
22:31 Results: 7-bit Modified Bitflip
22:38 Conclusion
