Trade-Offs in Sampling-Based Adversarial Planning
published: July 21, 2011, recorded: June 2011, views: 176
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
The Upper Confidence bounds for Trees (UCT) algorithm has in recent years captured the attention of the planning and game-playing community due to its notable success in the game of Go. However, attempts to reproduce similar levels of performance in domains that are the forte of Minimax-style algorithms have been largely unsuccessful, making any comparative studies of the two hard. In this paper, we study UCT in the game of Mancala, which to our knowledge is the first domain where both search algorithms perform quite well with minimal enhancement. We focus on the three key components of the UCT algorithm in its purest form - targeted node expansion, state value estimation via playouts and averaging backups - and look at their contributions to the overall performance of the algorithm. We study the trade-offs involved in using alternate ways to perform these steps. Finally, we demonstrate a novel hybrid approach to enhancing UCT, that exploits its superior decision accuracy in regions of the search space with few terminal nodes.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !