In order to explore this wild and wacky world of adversarial or game search, we're going to have to introduce a game. It's a simple game for two players, and it's called hexapawn, for reasons which will become obvious.
The rules of hexapawn (at least in its original form) are as follows. The game is played on a three-by-three board. Each player starts with three pawns, white along the top row and black along the bottom row, and white always moves first. A pawn may either move one square straight ahead onto an empty square, or capture an enemy pawn one square diagonally ahead, just like a pawn in chess (minus the two-square first move). You win by advancing one of your pawns to the far side of the board, by capturing all of your opponent's pawns, or by leaving your opponent with no legal move. The starting position looks like this:
W W W
- - -
B B B
As we said, white always gets to move first. This gives
white three possible initial moves, which are represented in
this way:
W W W
- - -
B B B
/ | \
/ | \
/ | \
/ | \
/ | \
/ | \
- W W W - W W W -
W - - - W - - - W
B B B B B B B B B
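Before we go any further, it might help to pin all of this down in code. Here's a minimal Python sketch of one way to represent a hexapawn board and generate legal moves. The representation (three strings of 'W', 'B', and '-', top row first) and the names START, place, and moves are just my choices for illustration; nothing about them is dictated by the game. Run on the starting board, the sketch prints exactly the three white moves shown above.

START = ("WWW", "---", "BBB")

def place(board, r, c, nr, nc):
    # Return a new board with the pawn at (r, c) moved to (nr, nc),
    # overwriting whatever was on the destination square.
    rows = [list(row) for row in board]
    rows[nr][nc], rows[r][c] = rows[r][c], '-'
    return tuple(''.join(row) for row in rows)

def moves(board, player):
    # Yield every board reachable in one move by `player` ('W' or 'B').
    # White moves down the board (toward row 2), black moves up.
    step = 1 if player == 'W' else -1
    enemy = 'B' if player == 'W' else 'W'
    for r in range(3):
        for c in range(3):
            if board[r][c] != player:
                continue
            ahead = r + step
            if not 0 <= ahead <= 2:
                continue
            if board[ahead][c] == '-':        # advance onto an empty square
                yield place(board, r, c, ahead, c)
            for dc in (-1, 1):                # capture diagonally ahead
                if 0 <= c + dc <= 2 and board[ahead][c + dc] == enemy:
                    yield place(board, r, c, ahead, c + dc)

for b in moves(START, 'W'):
    print('\n'.join(' '.join(row) for row in b), end='\n\n')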
But of course, white doesn't get to make all three moves.
White has to choose one and go with it. So let's say that
white opts for that move on the left. Our resulting state
space then looks like this:
W W W
- - -
B B B
/
/
/
/
/
/
- W W
W - -
B B B
Everything was fine up until now. Now it's our turn. What
will we do? Well, what we'd like to do is look at all of our
possible next moves and make the best one, right? Sure. So
let's see what our options are on this move:
W W W
- - -
B B B
/
/
/
/
/
/
- W W
W - -
B B B
/ | \
/ | \
/ | \
/ | \
/ | \
/ | \
/ | \
- W W - W W - W W
B - - W B - W - B
B - B B - B B B -
Can we tell from just this which possible next move is the
best one? Maybe. That one on the left looks sort of nice,
since it leaves us with one more pawn than our opponent. But
we really can't tell just by looking at these different
boards which move is likely to lead to a win for us. Maybe
we could get a better idea of which of our three possible
moves is the best one by looking even further ahead to see
what white might do on the next turn:
W W W
- - -
B B B
/
/
/
/
/
/
- W W
W - -
B B B
/ | \
/ | \
/ | \
/ | \
/ | \
/ | \
/ | \
- W W - W W - W W
B - - W B - W - B
B - B B - B B B -
/ | \ / \ / | \
/ | \ / \ / | \
/ | \ / \ / | \
/ | \ / \ / | \
- - W - - W - W - - W - - W - - W W - - W - - W
W - - B W - B - W W B W W W - - - B W W B W - W
B - B B - B B - B B - B B - B B W - B B - B B -
                    ^           ^
                    |           |
                  white       white
                   wins        wins
Well now, maybe we know a little more than before. In fact,
we can see two boards there that indicate victory for our
opponent, and we could probably make some reasonable attempts
at estimating how close the other boards might be to either a
win or a loss for us. However, we could also look ahead yet
another move, and then another, and so on until we had mapped
out all the possibilities. The problem with doing this is
that it's going to cost us lots of computational resources.
This may not be a big deal when we're playing hexapawn with
only three pawns on a side, but it will be a big deal if we
extend the game to eight pawns on a side, for example. Or
maybe instead of hexapawn variations, we're playing something
like chess. Then the computational expense becomes
prohibitive, so we're going to have to settle on some
arbitrary cutoff for looking ahead in this or any game.
Since I'm running out of room to display all the possible
boards at the same level, let's make life easy for me and set
our arbitrary cutoff for looking ahead at two levels, or two
moves ahead. But if we cut the search off before the game is
decided, we'll need some way of estimating how good or bad
the boards at that cutoff level are.
Providing that estimate is the job of something called the "static board evaluation function". A static board evaluation function takes as input the current state of a game (i.e., the board) and returns a value corresponding to the "goodness" of that current state or board. By "goodness" we mean how close that board is to a victory for us---the closer, the better. A simple static board evaluation function might return, say, a positive number if the board is good for us, a negative number if the board is not good for us (but is consequently good for our opponent), and maybe a zero if the board is neither bad nor good for either player.
How might we design such a function? Here's a weak first attempt at one. Since we're playing on the black side of the board, we'll have the function return a +10 if the board is such that black wins. And we'll have it return a -10 if white wins. (If we were playing on the white side of the board, we'd want it to return a +10 if white won, and a -10 if black won.) Since we win if we can get one of our pawns across the board to the other side, we should have the function take that into account too. So if neither side has won, let's have our function return the number of black pawns with clear paths in front of them minus the number of white pawns with clear paths in front of them. Oh, and since we win if our opponent's pawns are all removed from the board, let's have the function incorporate that. We'll have the function count the number of black pawns on the board, subtract the number of white pawns, add that number to the previous number, and return that result. There, that wasn't so bad, was it?
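Here's what that might look like as a Python sketch, reusing the board representation and the moves generator assumed earlier (the name evaluate is again just my choice). Note that, like the function we just described, it treats a board where black has no legal move as a win for white, so it assumes it's black's turn; that's the only kind of board we'll ever apply it to.

def evaluate(board):
    # Static board evaluation from black's point of view, assuming it
    # is black's (our) turn to move.
    flat = ''.join(board)
    if 'W' in board[2] or 'B' not in flat:
        return -10    # white got across, or our pawns are all gone
    if 'B' in board[0] or 'W' not in flat:
        return 10     # we got across, or white's pawns are all gone
    if next(moves(board, 'B'), None) is None:
        return -10    # it's our turn and we can't move: white wins

    def clear_path(rows, col):
        # True if every square of column `col` in `rows` is empty.
        return all(board[r][col] == '-' for r in rows)

    score = 0
    for r in range(3):
        for c in range(3):
            if board[r][c] == 'B' and clear_path(range(r - 1, -1, -1), c):
                score += 1    # a black pawn with a clear path ahead
            elif board[r][c] == 'W' and clear_path(range(r + 1, 3), c):
                score -= 1    # a white pawn with a clear path ahead
    # Plus the material difference: black pawns minus white pawns.
    return score + flat.count('B') - flat.count('W')

Applied to the eight bottom-most boards in the tree below, this function returns exactly the values you'll see there: 0, 1, 1, -10, -1, -10, 0, and -1.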
W W W
- - -
B B B
/
/
/
/
/
/
- W W
W - -
B B B
/ | \
/ | \
/ | \
/ | \
/ | \
/ | \
/ | \
- W W - W W - W W
B - - W B - W - B
B - B B - B B B -
/ | \ / \ / | \
/ | \ / \ / | \
/ | \ / \ / | \
/ | \ / \ / | \
- - W - - W - W - - W - - W - - W W - - W - - W
W - - B W - B - W W B W W W - - - B W W B W - W
B - B B - B B - B B - B B - B B W - B B - B B -
  0     1     1    -10   -1    -10    0    -1
The two boards that have been assigned a value of -10 are, of
course, the boards that represent victories for white. In
the case of the leftmost of those two boards, it's a victory
for white because it's now our turn (i.e., black's turn) and
we can't move any of our pawns. The rightmost of these two
boards is a victory for white because white has moved one of
its pawns all the way across the board.
But let's take a look at another board. The one at the very left, for example, has been given a value of zero. Yet it's easy for us to see that, since it's our turn, and we only have one black pawn that we can move, if we just move that one black pawn forward one space, we'll have blocked any possible move by white and we win the game. So why isn't that board given a value of +10? Because in order to figure that out, our static board evaluation function would have to look ahead one more move. But a static board evaluation function is exactly that---static. It doesn't look ahead.
If we set a limit on the number of moves we want to look ahead in order to play the game in a reasonable amount of time, but then we have our board evaluation function look even further ahead, we're going to eat up additional computing resources that we were trying to save, and we're also going to end up writing the same code twice. So there's absolutely no advantage to having the board evaluation function look ahead an additional move or two or three---instead, we should just readjust our original depth cutoff so that it allows us to look more moves into the future.
Now let's go back and look at those bottom two levels in our hexapawn state space:
- W W - W W - W W
B - - W B - W - B
B - B B - B B B -
/ | \ / \ / | \
/ | \ / \ / | \
/ | \ / \ / | \
/ | \ / \ / | \
- - W - - W - W - - W - - W - - W W - - W - - W
W - - B W - B - W W B W W W - - - B W W B W - W
B - B B - B B - B B - B B - B B W - B B - B B -
  0     1     1    -10   -1    -10    0    -1
What can we do with those numbers that have been assigned to
the boards? Those boards all represent possible results of a
move by white. Those numbers can be used to tell us which of
those moves white is more likely to make. For example, in
the leftmost subtree, we might guess that white is more
likely to make the move that results in a board with value 0
than the moves that result in boards with value 1, because a
board with value 0 is better for white than a board with
value 1, which favors us. That assumes, of course, that we
trust our evaluation function. Similarly, in the middle and
rightmost subtrees, white is going to prefer the moves that
result in a board with a value of -10 (a victory for white),
right? We can indicate those preferences by taking the
minimum values among those board values in each subtree and
propagating them up one level:
- W W
W - -
B B B
/ | \
/ | \
/ | \
/ | \
/ | \
/ | \
/ | \
    - W W       - W W       - W W
  0 B - -   -10 W B -   -10 W - B
    B - B       B - B       B B -
/ | \ / \ / | \
/ | \ / \ / | \
/ | \ / \ / | \
/ | \ / \ / | \
- - W - - W - W - - W - - W - - W W - - W - - W
W - - B W - B - W W B W W W - - - B W W B W - W
B - B B - B B - B B - B B - B B W - B B - B B -
  0     1     1    -10   -1    -10    0    -1
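In code, that propagation step is nothing more than taking a minimum per subtree. Here it is applied to the leaf values from the tree above (transcribed by hand, just to show the step in isolation):

# Leaf values from the tree above, grouped by parent subtree
# (left, middle, right).
leaf_values = [[0, 1, 1], [-10, -1], [-10, 0, -1]]

# White is assumed to always make the move that's worst for us, so
# each parent board inherits the minimum of its children's values.
print([min(children) for children in leaf_values])   # [0, -10, -10]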
OK, now how can we use that information? We use it in almost
exactly the same way as we did before. We can figure out
which move we should make of the three that are available to
us by finding the maximum of the values that we just
propagated upward. One of those values was a 0 and the other
two were -10. Of course, the 0 is better for us, so we'd
choose to make that move.
Let's go back and see what we've done here. First we started with some arrangement of pawns on the board and the knowledge that it was our turn. We generated all the moves we might make, and then we generated all the moves that our opponent could make after we made our move. We arbitrarily chose to look only two moves ahead, but we could have looked further if we wanted to give up the computational resources to do so. We then applied our static board evaluation function to the bottom-most boards (i.e., the terminal leaves on the tree that is our state space) and assigned a numeric value corresponding to "goodness" to each of those boards. Those bottom-most boards are each the result of a possible move by white. We assumed that white would always make the best move it possibly could, so we propagated the minimum values up from the leaves to the immediate parents. And then we assumed that we would want to make the best possible move that we could, and we chose that move by selecting the maximum of the values that had just been propagated upwards.
Because we chose to look ahead only two moves, the first propagation was of minimum values from the very bottom level, followed by a propagation of maximum values upward from that level. If we had chosen to look ahead three moves, we'd first propagate maximum values from the bottom, then minimums, then maximums. If we were looking ahead four moves, we'd start with minimums, then maximums, then minimums, then maximums. And so on, and so on. The procedure that we just described has a name, "minimax", and it's the heart of game-playing computer programs.
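Here's what minimax might look like as a Python sketch, built on the moves generator and evaluate function assumed earlier (minimax and best_move are again just names I've chosen). The depth parameter counts moves ahead, and the maximizing flag alternates between our turn and white's, which gives exactly the alternation of minimums and maximums just described.

def minimax(board, depth, maximizing):
    # Backed-up value of `board`, looking `depth` moves ahead.
    # `maximizing` is True when it's our (black's) turn to move.
    if 'B' in board[0] or 'W' not in ''.join(board):
        return 10     # we already won
    if 'W' in board[2] or 'B' not in ''.join(board):
        return -10    # white already won
    player = 'B' if maximizing else 'W'
    successors = list(moves(board, player))
    if not successors:
        # Whoever is to move and has no legal move loses.
        return -10 if maximizing else 10
    if depth == 0:
        # The arbitrary cutoff: fall back on the static evaluator.
        # (With an even depth limit it's always black's turn here,
        # which is exactly what our evaluate function assumes.)
        return evaluate(board)
    values = [minimax(b, depth - 1, not maximizing) for b in successors]
    return max(values) if maximizing else min(values)

def best_move(board, depth=2):
    # Our move: the child whose backed-up value is the maximum.
    # (Assumes we have at least one legal move.)
    return max(moves(board, 'B'),
               key=lambda b: minimax(b, depth - 1, False))

Called with depth=2 on the board we reached after white's opening move, ("-WW", "W--", "BBB"), best_move backs up the values 0, -10, and -10 for our three options and returns the board with value 0, the leftmost one in the tree, just as we chose by hand.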
The minimax procedure relies on two assumptions. First, there must be an adequate and reasonably accurate board evaluation technique. It doesn't have to be perfect, but it does have to work more often than not. The second assumption is that the relative merit of a move becomes more obvious as you search deeper and deeper into the state space. Why? Because if that weren't so, there wouldn't be any value in doing the search in the first place. But keep in mind that for any given game, or at least any given implementation, one or the other (or both) of these assumptions may not be true.
Next time, we'll look at an addition to the minimax procedure which might save us some time and space.