TD-Gammon
TD-Gammon was a computer backgammon program developed in 1992 by Gerald Tesauro at IBM's Thomas J. Watson Research Center. Its name comes from the fact that it is an artificial neural net trained by a form of temporal-difference learning, specifically TD-lambda.
TD-Gammon achieved a level of play just slightly below that of the top human backgammon players of the time. It explored strategies that humans had not pursued and led to advances in the theory of correct backgammon play.
Algorithm for play and learning
During play, TD-Gammon examines at each turn all possible legal moves and all their possible responses (two-ply look-ahead), feeds each resulting board position into its evaluation function, and chooses the move that leads to the board position with the highest score. In this respect, TD-Gammon is no different from almost any other computer board-game program. TD-Gammon's innovation was in how it learned its evaluation function.
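The move-selection step can be sketched in Python. This is a minimal sketch under stated assumptions, not TD-Gammon's code: the callbacks `legal_moves`, `apply_move` and `evaluate` are hypothetical placeholders for a real move generator and the trained network, `evaluate` is assumed to score a position from the program's own point of view, and the opponent's responses are combined by assuming the opponent picks the reply worst for the program and averaging over its 21 distinct dice rolls.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

P = TypeVar("P")  # hypothetical position type
M = TypeVar("M")  # hypothetical move type
Roll = Tuple[int, int]

# The 21 distinct rolls of two dice. Doubles have probability 1/36; each
# non-double can come up in two orders, so it has probability 2/36.
ROLLS: List[Roll] = [(a, b) for a in range(1, 7) for b in range(a, 7)]


def roll_probability(roll: Roll) -> float:
    return 1 / 36 if roll[0] == roll[1] else 2 / 36


def two_ply_value(
    position: P,
    legal_moves: Callable[[P, Roll], List[M]],
    apply_move: Callable[[P, M], P],
    evaluate: Callable[[P], float],
) -> float:
    """Value of a position in which the opponent is on roll.

    Assumption: for each opponent roll the opponent picks the reply that is
    worst for us; the results are averaged over the roll probabilities.
    """
    expected = 0.0
    for roll in ROLLS:
        replies = legal_moves(position, roll)
        if replies:
            worst = min(evaluate(apply_move(position, r)) for r in replies)
        else:
            worst = evaluate(position)  # opponent cannot move and passes
        expected += roll_probability(roll) * worst
    return expected


def choose_move(
    position: P,
    roll: Roll,
    legal_moves: Callable[[P, Roll], List[M]],
    apply_move: Callable[[P, M], P],
    evaluate: Callable[[P], float],
) -> Optional[M]:
    """Return the legal move with the highest two-ply value (None if no move)."""
    candidates = legal_moves(position, roll)
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda m: two_ply_value(apply_move(position, m), legal_moves, apply_move, evaluate),
    )
```

Replacing the `two_ply_value` call with a plain `evaluate(apply_move(position, m))` would give a cheaper one-ply search; the two-ply version evaluates every opponent reply for all 21 rolls per candidate move.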
TD-Gammon's learning algorithm is as follows:
1. Each training example is a complete transcript of one game: all board positions from beginning to end, together with a vector Y of four bits indicating the outcome of the game: White wins normally, Black wins normally, White wins a gammon, or Black wins a gammon. (Backgammons are ignored because of their extreme rarity.)
2. The final Y vector is compared against the evaluation of the final board position, and the weights in the neural net are updated to bring the evaluation function closer to Y. Then, writing Y(t) for the network's evaluation of the position at time t, the weights are updated for each preceding board position to bring Y(t) closer to Y(t+1), the evaluation of the position that follows it.
Thus the evaluation function is made to be more and more internally consistent: the evaluation of any board position is made to grow closer to the evaluation of the board after the following move (hence "temporal-difference learning").
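The training procedure can be sketched as TD(λ) with eligibility traces. To keep the sketch short, it swaps the neural network for a linear model with four sigmoid outputs (an assumption, not Tesauro's architecture), and it walks forward through the game, passing each temporal difference back to earlier positions through decaying traces, which is the standard incremental form of TD(λ). The `alpha` (learning rate) and `lam` (trace decay) values are illustrative. With `lam=0`, each position is nudged only toward the evaluation of the position that follows it, and the last one toward the actual outcome, as in the steps above.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def evaluate(W, x):
    """Hypothetical linear evaluator standing in for the neural net.

    Returns four values in (0, 1), one per outcome bit of Y.
    """
    return sigmoid(W @ x)


def td_lambda_game(W, positions, outcome, alpha=0.1, lam=0.7):
    """Update W in place from one self-play game using TD(lambda).

    positions: feature vectors x_0 ... x_T in the order they occurred.
    outcome:   the final 4-component vector Y observed at the end of the game.
    """
    traces = np.zeros_like(W)  # one eligibility trace per weight
    for t, x in enumerate(positions):
        y = evaluate(W, x)
        # Gradient of each sigmoid output w.r.t. its own row of W: y_k (1 - y_k) x.
        traces = lam * traces + np.outer(y * (1.0 - y), x)
        if t + 1 < len(positions):
            target = evaluate(W, positions[t + 1])  # next position's evaluation
        else:
            target = outcome  # last position is compared with the actual result
        delta = target - y  # the temporal difference, one per output
        W += alpha * delta[:, None] * traces
    return W
```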
Experiments and stages of training
Unlike previous neural-net backgammon programs such as Neurogammon (also written by Tesauro), where an expert trained the program by supplying the "correct" evaluation of each position, TD-Gammon was at first programmed "knowledge-free". In early experimentation, using only a raw board encoding with no human-designed features, TD-Gammon reached a level of play comparable to Neurogammon: that of an intermediate-level human backgammon player.
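A raw, knowledge-free board encoding can be illustrated as follows. The layout is an assumption based on a commonly cited description of TD-Gammon's early input scheme (four units per point per player, plus units for checkers on the bar, checkers borne off, and whose turn it is, 198 in all); the function names are hypothetical.

```python
from typing import List


def encode_point(n: int) -> List[float]:
    """Four input units for one player's checkers on one point (truncated unary)."""
    return [
        1.0 if n >= 1 else 0.0,
        1.0 if n >= 2 else 0.0,
        1.0 if n >= 3 else 0.0,
        (n - 3) / 2.0 if n > 3 else 0.0,  # graded unit for checkers beyond three
    ]


def encode_board(
    white: List[int],
    black: List[int],
    white_bar: int,
    black_bar: int,
    white_off: int,
    black_off: int,
    white_to_move: bool,
) -> List[float]:
    """Raw encoding: 24 points x 2 players x 4 units, then bar, borne-off and turn units."""
    units: List[float] = []
    for counts in (white, black):
        if len(counts) != 24:
            raise ValueError("expected checker counts for all 24 points")
        for n in counts:
            units.extend(encode_point(n))
    units += [white_bar / 2.0, black_bar / 2.0]
    units += [white_off / 15.0, black_off / 15.0]
    units += [1.0, 0.0] if white_to_move else [0.0, 1.0]
    return units
```

No unit says anything about blots, primes or pip counts; any such concepts have to be discovered by the network during self-play.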
Even though TD-Gammon discovered insightful features on its own, Tesauro wondered whether its play could be improved by adding hand-designed features like Neurogammon's. Indeed, the self-training TD-Gammon with expert-designed features soon surpassed all previous computer backgammon programs. With 80 hidden units, it stopped improving after about 1,500,000 games of self-play.
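The network shape can be sketched as a single hidden layer of 80 sigmoid units feeding four sigmoid outputs, one per outcome bit of Y. The class name, the weight initialization, and the choice to append hand-designed features to the raw inputs are assumptions for illustration; the article does not describe Neurogammon's features, so `expert_features` is only a placeholder vector.

```python
import numpy as np

N_HIDDEN = 80   # hidden-layer size reported for the self-trained TD-Gammon
N_OUTPUTS = 4   # one output per outcome bit of Y


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


class EvaluationNet:
    """Hypothetical fully connected net: inputs -> 80 sigmoid units -> 4 sigmoid outputs."""

    def __init__(self, n_inputs: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Small random initial weights; self-play training would adjust them.
        self.W1 = rng.normal(0.0, 0.1, size=(N_HIDDEN, n_inputs))
        self.b1 = np.zeros(N_HIDDEN)
        self.W2 = rng.normal(0.0, 0.1, size=(N_OUTPUTS, N_HIDDEN))
        self.b2 = np.zeros(N_OUTPUTS)

    def evaluate(self, raw_inputs, expert_features):
        # Hand-designed features are simply appended to the raw board encoding.
        x = np.concatenate([raw_inputs, expert_features])
        hidden = sigmoid(self.W1 @ x + self.b1)
        return sigmoid(self.W2 @ hidden + self.b2)
```

For example, `EvaluationNet(198 + 10).evaluate(raw, feats)`, with a 198-unit raw encoding and ten hypothetical features, returns four values between 0 and 1.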
Advances in backgammon theory
TD-Gammon's exclusive training through self-play (rather than tutelage) enabled it to explore strategies that humans previously hadn't considered or had ruled out erroneously. Its success with unorthodox strategies had a significant impact on the backgammon community. For example, for the opening move, the conventional wisdom was that given a roll of 2-1, 4-1, or 5-1, White should move a single checker from point 6 to point 5. Known as "slotting", this technique trades the risk of a hit for the opportunity to develop an aggressive position. TD-Gammon found that the more conservative play of 24-23 was superior. Tournament players began experimenting with TD-Gammon's move and found success. Within a few years, slotting had disappeared from tournament play, though it is now reappearing for 2-1.
Backgammon expert Kit Woolsey found that TD-Gammon's positional judgement, especially its weighing of risk against safety, was superior to his own or any human's.
TD-Gammon's excellent positional play was undercut by occasional poor endgame play. The endgame requires a more analytic approach, sometimes with extensive lookahead, and TD-Gammon's limitation to two-ply lookahead put a ceiling on what it could achieve in this part of the game. TD-Gammon's strengths and weaknesses were the opposite of those of symbolic artificial intelligence programs and most computer software in general: it was good at matters that require an intuitive "feel" but bad at systematic analysis.