In developing software designed to play some simple strategy games against a human opponent, I used several popular techniques for evaluating possible moves and choosing the best one for the computer at each turn of the game, including the standard "minimax" algorithm.
To assess whether one move is better than another, each possible board position in the game (referred to as a "node") is assigned a numerical value indicating whether the position is favorable to one player (a positive value), or the other player (a negative value), or not favorable to either side (a value of zero). Thus, one player is always trying to steer the game to the higher values. That maximizing player will hereafter be referred to as "MAX", and is usually designated to represent the computer. The opponent, "MIN" — usually a human — is always trying to steer the game to the nodes with lower values.
Two nodes are connected by the player's move that changes the board from the earlier position to the next. These connections can be thought of as "branches" forming what is termed the "search tree" of potential paths of future play.
In a nutshell, minimax operates by considering all the legal moves available to whichever player whose turn it is at that node. For each one of those resulting nodes, it considers all the legal replies available to the opponent, and for each one of those, all the legal replies, etc. — as far down into the search tree as possible. Minimax looks forward into the game's potential future as far as allowed by the limits of the computer's technical capacity or, more often, the limits of the human player's patience for the computer to "Make a move, slowpoke!"
At all MAX nodes, the move selected is the one that offers the highest value, and at all MIN nodes, the lowest value. So at each level (or "depth"), going deeper into the search tree, the algorithm is either maximizing or minimizing — hence the name "minimax".
Minimax works quite well, but the number of nodes that must be examined and evaluated grows exponentially with the number of possible moves at each depth. For instance, when using minimax for tic-tac-toe, within the search tree, the first player to move has nine squares from which to choose, but for each one of those nodes, must consider only eight remaining countermoves, and then just seven in reply, etc., down to just one possible move. However, if we ignore the fact that some of those permutations end earlier than nine moves due to a winning row, then the total number of nodes that the program must consider is 9 factorial (362,880). When using minimax for playing chess, the first player to move (White) has 20 possible moves from which to choose. As the position opens up and more pieces can move around, then that number usually increases quickly. According to Wikipedia, "An average position typically has thirty to forty possible moves" and "The game-tree complexity of chess" is 10 to the power of 120 (a 1 following by 120 zeroes)! Much of that phenomenal difference between chess and tic-tac-toe is that in the former, pieces can move around, unlike tic-tac-toe (and some other strategy games, such as Reversi).
As a consequence of these massive numbers of nodes that must be examined, most computers playing a game of any complexity have the resources to look ahead in the search tree to only a limited number of levels ("plies"), thus impairing their competitive performance. Consequently, game programmers will utilize any known method to significantly reduce the number of nodes without reducing the quality of analysis.
Alpha-beta pruning tries to alleviate this burden by proactively discarding branches of the search tree that a player would not choose because it has already found one of equal or better value. The technique operates by keeping two values, alpha and beta. Alpha is the best MAX value yet found and beta is the best MIN value yet found. Both are passed down the search tree to lower levels, but not up to higher levels. If the current best MIN node value (beta) is less than or equal to alpha, then the remaining MIN nodes can be skipped because MAX would not choose that branch and instead would choose the alpha branch. Similarly, if the current best MAX node value (alpha) is greater than or equal to beta, then the remaining MAX nodes can be skipped because MIN would not choose that branch.
In practice, this technique drastically reduces the number of nodes that must be generated and evaluated — by about 85%, in my testing — thus allowing the program to look farther ahead and play a stronger game.
Alpha-beta pruning has traditionally been considered a flawless and key part of any efficient searching strategy that supposedly never changes the final outcome. But in practice I have encountered an unexpected problem which results in one player choosing an ultimately losing move instead of one that ties the game.
Imagine this scenario: The maximizing player, MAX, has two moves to evaluate at level L. The first move results in a tie, so its value is set to 0 and alpha is set to 0, since we know MAX will not need to settle for less than a tie. In searching the second MAX move, with alpha = 0, at level L+1, the minimizing player, MIN, has two reply moves to evaluate. The second MIN move is a forced win for MIN. However, the first MIN reply results in a tie, so its value is 0 and beta is set to 0. This triggers an alpha-beta cutoff for MIN because beta (0) is less than or equal to alpha (also 0). In other words, the remaining MIN replies are pruned away because presumably MAX would not choose a branch of the search tree that appears no better than one found earlier — in this scenario, that very first MAX move.
As a result, that second MIN move (a forced win for MIN) is never seen, and in turn that second MAX move is given a value of 0. Thus the computer might not choose that first MAX move (a tie) but instead the second MAX move, resulting in an avoidable loss for MAX in the actual game.
Here's a more concrete example: A tic-tac-toe board can be represented as a 3x3 grid of squares numbered 0 to 8:
Let's say that the MIN player chooses square 8. The MAX player has eight possible replies. The first four replies examined — squares 0, 1, 2, and 3 — are all evaluated to eventually result in a loss for MAX. Square 4 is evaluated as a tie, with a value of 0, and thus alpha is set to 0. Square 5 is next evaluated. As part of that process, the alpha value of 0 is passed down, and the possible replies by MIN are evaluated one at a time. Square 0 is evaluated for MIN as a tie, and thus beta is set to 0. The next possible reply by MIN, square 1, is also a tie. At this point, both beta and alpha are 0, triggering an alpha-beta pruning of the remaining MIN replies. These include squares 4, 6, and 7, all of which allow MIN an easy win. But these are never seen, because of the alpha-beta cut-off. A value of 0 is passed up the search tree. Consequently, MAX incorrectly sees square 5 as a tie just as good as square 4, even though playing at square 5 would likely result in a loss.
Does this not appear to be a fatal weakness of alpha-beta pruning? It certainly was in the game software as I was developing it — exhibited by the computer choosing suicidal moves whenever the human played at a square that was further down the list of possible moves and after a square that resulted in a tie game.
Several counterarguments come to mind. The first one asks, "Is this not merely a special case — and perhaps even a rare one — in which both the earlier MAX node and its MIN reply are evaluated as ties?" No, not only in my experience is that situation not a rarity, but as long as the two moves share any identical value (not necessarily zero) and it is low enough to be assigned to beta, then the premature cut-off still occurs. I chose zero for the node values for clarity and because that is what I observed happening in my own game testing.
Another possible objection is one I received from another person — someone who clearly has never written minimax code: "Why would MAX choose a branch almost guaranteed to lead to a loss, and not always stay with the earlier and safer branch?" This naïve confusion results from thinking about the problem while already knowing the unavoidable loss that will stem from MAX choosing that bad branch. The computer program, on the other hand, sees only each move (and its node evaluation) one at a time, and not the whole tree. Unlike a human, the program has no way to foresee that the second move is an awful choice because the losing path is pruned as a result of beta, and thus that second move is incorrectly evaluated as safe. That's the whole point of this apparent problem.
The final objection is much stronger: "In the scenario presented, wouldn't MAX choose the earlier move and not the later one — since the latter move appears to be no more promising than the former move — thereby avoiding the losing branch?" That may happen, yet it may not, if for example the program chooses randomly among any equally-attractive moves in order to add some variety to the play by eschewing the same deterministic branches game after game — as I do in my own code.
It is possible that my analysis is flawed somehow. If it is and you see the flaw, I would appreciate your feedback.