ICGA Journal

Vol. 25, No. 1 - March 2002

Table of Contents

Table of Contents 1
Towards Strong Solutions (H.J. van den Herik) 1
Gradual Abstract Proof Search (T. Cazenave) 3
Alpha-Beta-Conspiracy Search (D. McAllester and D. Yuret) 16
A Lockless Transposition-Table Implementation for Parallel Search (R.M. Hyatt and T. Mann) 36
Information for Contributors 40
News, Information, Tournaments, and Reports: 41
Report on the First Jenazon Cup (I. Althöfer) 41
The Match Van Wely vs. REBEL CENTURY 4.0 (J. van Reek and J.W.H.M. Uiterwijk) 45
Go-moku Still Alive (A. Nosovsky) 47
Special Sessions on Heuristic Search and Computer Playing at JCIS 2002 (K-H. Chen) 49
The 11th International Paderborn Computer-Chess Championship (U. Lorenz) 50
The 7th Computer Olympiad (Maastricht, The Netherlands) 51
Call for Papers Computer-Games Workshop (Maastricht, The Netherlands) 52
The 10th World Computer-Chess Championship (H.J. van den Herik) 53
Tournament Rules for the World Computer-Chess Championship 2002 (The Board of ICCA) 54
The ICCA Triennial Meeting (The Board of ICCA) 55
A Change to the ICCA Constitutions and ICCA By-Laws (The Board of ICCA) 55
Propositions for a New ICGA Board (The Board of ICCA) 57
Financial Overview (Treasurer ICCA Board) 58
The 2000 ChessBase Best-Publication Award (The Board of ICCA) 59
The ICGA Journal Awards 2000 and 2001 (The Board of ICCA) 59
The 2002 Herschberg Best-Annotation Award (The Board of ICCA) 60
Calendar of Computer-Games Events in 2001-2003 60
Tablebase of Contents, continued (J.W.H.M. Uiterwijk) 61
The Swedish Rating List (T. Karlsson) 63
How the ICGA Journal Reaches You 64



Many researchers are attracted by the idea of developing a method that is able to solve a game. When one investigates this research aim more closely, two possible approaches come to the fore. First, there are the game-playing researchers who love the game and would like to know everything about it. They usually have expert knowledge of that specific game and aim at developing a game-dependent method that solves their game. Second, there are the AI specialists, well known for their deep understanding of the knowledge-and-search approaches as practised in the game-playing community. They know that they have a powerful tool at their disposal, and they search for an application domain, typically a game not yet solved.

Solving a game is something like revealing a mystery. However, in everyday research terminology it is nothing more than determining a property with regard to the outcome of the game. In his thesis Victor Allis distinguished three types of game solutions: ultra-weak, weak and strong. An ultra-weak solution means that the game-theoretic value of the initial position of the game has been established (e.g., Hex is a win for the first player). A weak solution means that for the initial position a strategy has been determined to obtain at least the game-theoretic value (e.g., the winning strategy for Go-moku and Renju). A strong solution means that for all legal positions a strategy has been determined to obtain the game-theoretic value (e.g., the chess endgame databases).

The current issue of the Journal deals with solving games, as can be read from Tristan Cazenave's contribution, Alexander Nosovsky's call for solutions, and Eugene Nalimov's brief scientific biography. This is the place to warn the reader explicitly that (1) different games may have the same name and (2) the same game may have different names. We provide two examples and encourage our readers from now on to read all related articles with this warning in mind. According to Lasker (1934) the Japanese game of Go-moku has two restrictions (see p. 47 of this issue), but in the Western world these restrictions have disappeared. The Japanese version is now called Renju, and Go-moku has ousted the correct name of Five in a Row. Moreover, there are many other names, mostly country-dependent (see again p. 47). Other instances of two names for the same game are the pairs Phutball and Philosopher's Football, and Atari-Go and Ponnuki-Go.

Tristan Cazenave is a researcher who clearly belongs to the group of scientists that possess a powerful tool and are searching for an appropriate application domain. In the past Cazenave developed the technique of Abstract Proof Search (APS), and he has recently extended it to Gradual Abstract Proof Search (GAPS). To see how powerful it is, he went to the world of "small games", i.e., games that are pocket editions of the original game. This well-known approach was used to solve Domineering and has been applied to Othello (solving 6x6 Othello) as well as to chess and checkers (endgame databases). Cazenave has now solved two versions of Phutball, namely for a 9x9 and an 11x11 grid. The original game, invented by Conway, is usually played on a 19x15 grid (or on a 19x19 Go board). So, there is still some way to go. Nevertheless the performance deserves our admiration.

Cazenave's technique allowed him to solve some smaller versions of the game Atari-Go, too. There he solved 6x6 Atari-Go with a crosscut. The "official" definition of Atari-Go is for 9x9 or even 19x19 boards (without a crosscut). Assuming it is possible to go down this road, there is certainly still a very long way to go. Personal communication with fellow researcher Erik van der Werf revealed the latter's research activities in the domain of Ponnuki-Go. Van der Werf, independently, has solved the plain 5x5 version and all 6x6 versions with a symmetrical centre, a crosscut being among them.

Cazenave's technique for solving these games is a generalisation of threat-space search, a technique in which at any time the opponent has only a limited set of replies. Hence, the search algorithm amounts to applying single-agent search to two-player games. Victor Allis called an analogous generalisation Dependency-Based Search, a name which has never been adopted by other researchers, though it describes in general terms remarkably well how to solve a game. The essence is twofold: for the knowledge expert it is the true application of what the expert knows, and for the search expert it is a straightforward doubling of the search depth.

Reviewing the current state of the art, your Editor concludes that two different tasks remain for the game researchers. The first is to continue the process of solving "little brother" games, such as 13x13 Phutball, 6x6 Atari-Go without a crosscut, and 6-piece chess endgames with Pawns, as advocated by Cazenave and Nalimov. The second is to start extensive research on "solving" all legal positions of a game, as Nosovsky calls for with respect to Renju. The latter can be tested in problem-solving competitions. Although the future of this Journal is not specifically bound to solving games, as is proved by the current contributions on Alpha-Beta-Conspiracy Search and transposition tables, we wholeheartedly encourage researchers to report on their domain-dependent optimisations, since we consider them a major step towards a deeper understanding of the intricacies of a game.

Jaap van den Herik


Tristan Cazenave1

Paris, France


Gradual Abstract Proof Search (GAPS) is a new 2-player search technique. It has been used to prove that 11x11 Phutball is a win for the first player in 25 plies or fewer, and that 6x6 Atari-Go with a crosscut in the centre is a win for the first player in 15 plies or fewer. In the paper, I give domain-dependent optimizations for the two games. Moreover, I describe theoretical and experimental comparisons with other related algorithms.

1 Labo IA, Université Paris 8, 2 rue de la Liberté, 93526, St-Denis, France. Email: cazenave@ai.univ-paris8.fr.


David McAllester1 and Deniz Yuret2

New Providence NJ, USA


We introduce a variant of α-β search in which each node is associated with two depths rather than one. The purpose of α-β search is to find strategies for each player that together establish a value for the root position. A max strategy establishes a lower bound and the min strategy establishes an upper bound. It has long been observed that forced moves should be searched more deeply. Here we make the observation that in the max strategy we are only concerned with the forcedness of max moves and in the min strategy we are only concerned with the forcedness of min moves. This leads to two measures of depth --- one for each strategy --- and to a two-depth variant of α-β called ABC search. The two-depth approach can be formally derived from conspiracy theory and the structure of the ABC procedure is justified by two theorems relating ABC search and conspiracy numbers.

1 AutoReason, 85 Magnolia Drive, New Providence NJ 07974, USA. Email: mcallester@autoreason.com.
2 Koc University, Rumeli Feneri Yolu, 80910 Sariyer, Istanbul, Turkey. Email: dyuret@ku.edu.tr.



Robert M. Hyatt1 and Timothy Mann2


Birmingham AL, USA

Palo Alto CA, USA


Parallel search programs use traditional alpha-beta tree searching, which includes the concept of transposition/refutation tables. Storing these entries in parallel can result in corrupt data that causes significant problems for the search. The common solution is an atomic lock/unlock protecting a critical section, but this incurs a significant performance penalty on many architectures. The paper describes a new approach that completely eliminates the need for any synchronization through locks, while still avoiding the problem of retrieving and using corrupted data.

1. Introduction

Assume that an entry in the transposition table (entry t) contains three pieces of data: the hash signature (the value used to confirm that this hash-table entry represents the right position), the best move, and the search result. In Crafty, a hash entry contains more information, but these three pieces are the important ones for this discussion.

Now further assume that the hash signature requires 64 bits to store, and that the move and score can be combined into one 64-bit word. This gives us two 64-bit words for a single transposition-table entry. Name these two words W1(k) and W2(k) for chess position k. For the remainder of this discussion, assume that 64-bit memory read/write operations are atomic, that is, the entire 64-bit value is read/written in one transaction. A 128-bit read/write is therefore broken up into two 64-bit reads/writes, and is not atomic since it becomes two distinct cpu-to-memory transactions.

When the program determines that it is time to store an entry into the transposition table, it computes the transposition-table index by taking the low-order N bits of the hash signature and using this as the table address where the 128-bit entry is to be stored. The problem arises when two different hash signatures match in the rightmost N bits, so that both address the same transposition-table entry (that is, W1(j)&MASK matches W1(k)&MASK exactly, where positions j and k are different). For positions j and k, we then have W1(j) and W2(j) as one entry, and W1(k) and W2(k) as the second entry. What happens if two threads (on two or more processors) try to store these two table entries at approximately the same time, or if one thread tries to store at this table entry while another simultaneously tries to read it? Both will address the same 128 bits in the transposition table. However, recall that W1(j) != W1(k) and W2(j) != W2(k) (j and k are two totally different chess positions that just happen to map to the same table entry because their hash signatures have the same low-order bits in W1). The main concern is that there is no atomic way to store both W1(k) and W2(k) in one memory operation, nor is there a way to read W1(k) and W2(k) completely without a small chance that another thread changes one of the two parts, but not both, before the read is completed. When two threads try to store two different 128-bit entries at the same table address, or one thread reads the 128-bit entry while another thread is writing to the same entry, the resulting data will look like one of the following:

W1(j) W2(j) [ok]
W1(k) W2(k) [ok]
W1(j) W2(k) [bad]
W1(k) W2(j) [bad]

The above happens because when any thread tries to store an entry, it has to do two (or more) memory writes. And if two threads try to store in the same entry concurrently, then any of the above can happen since the stores can be done in any of 4 different orders. (It should be noted that this same problem occurs when one thread is trying to read the 128 bits from memory to access the information, while another thread is simultaneously writing different information to that hash entry for another position because memory writes and reads can be interleaved in several different orders.)

The problem becomes an issue when the data is used. Of the four possibilities above, the last two leave the entry in a state where a thread can match its hash signature against W1(p) (where p is either j or k) and get a match, even though the stored W2 does not belong to that signature. For any position p in the table, it is essential that W1(p) is paired with W2(p), because the information in W2(p) is trusted.

This problem was first seen in Cray Blitz right after it was parallelized in 1983. The initial thought was: who cares which entry gets stored; there is no real reason to prefer one over the other, since it cannot be determined (now) which one will be more important in the future. But testing and debugging exposed the problem rather quickly. For safety, Cray Blitz tested the best-move part of W2(p) to make sure it was a legal move, as making an illegal move (like castling) would create an extra Rook or King on the board. During testing, the program would occasionally report an error message from the move-validation routine that said "Illegal move from hash table." It was then necessary to discover why this was happening, because it never happened when testing with a serial (non-parallel) search.

We discovered that one thread would attempt to store W1(j) and W2(j) at entry X in the table. Simultaneously, another thread would attempt to store W1(k) and W2(k) at entry X, or it would attempt to read W1(k) and W2(k) from entry X. Since the processors queue up memory reads and writes and they proceed independently, there was no way to be sure that the first process completed both writes before the second would start its own reads/writes. Now the entry could have either W1(k) and W2(j) or W1(j) and W2(k), either of which is a serious problem.

The obvious solution is to use an atomic lock around the memory reads and writes, so that once a single thread writes W1(j) at table entry X, it is impossible for any other thread to read/write anything in the table until the first thread completes the entry by writing W2(j) as well. This neatly solved the problem, and the errors no longer occurred.

The problem with the above solution is that on machines with more than two processors, the locks begin to affect performance. On machines with slow atomic-locking facilities (most machines today, excepting the Cray, are terribly slow at this operation) the penalty is even more pronounced. We wanted a better-performing option than the atomic-lock solution provided.

Another solution, used by at least one parallel program, is simply to ignore the problem. If the program carefully validates the best move before playing it on the board, then the board corruption problem can be eliminated. Of course, the program ends up using a score from a wrong position every now and then, which can still lead to errors, although the errors are likely to be non-catastrophic since the board does not get corrupted. We tried this approach for a while, but with small transposition tables and fast hardware, the probability of errors seemed too high to ignore.

2. The lockless strategy

One more observation is important to the solution of the problem. With no lock, it is possible to find (a) no move or score if the current hash signature does not match the table-entry signature, (b) a correct move/score for this position if the signatures match, or (c) a wrong move/score if the signatures match but the wrong W2 value is present. Of the three, only the last one has to be avoided.

Our approach is to modify slightly the way we store entries, as follows:

T = W1(p) & MASK;
W1(T) = W1(p) ^ W2(p);
W2(T) = W2(p);

In the above, T is the table address and <W1(p),W2(p)> is the value we want to store at table position T for the current position p. Now the question is: how does that work, and how does it cure our problem?

First, when we later do a table probe for a position p, we want to find W1(p) in the table, matching the value we stored previously. We take W1(T) from the table and exclusive-or it with W2(T), which should give us the original W1(p) value. If we store W1(j) and W2(j) in the table at entry T, then we really store W1(j)^W2(j) as the first word and W2(j) as the second word. If we come back later we will match properly, since W1(j)^W2(j)^W2(j) is simply W1(j) (exclusive-oring a value twice with the same value gives back the original value). Let us try an error case, where we store W1(j), then W1(k) and W2(k), followed by W2(j) (the interleaving that caused the error previously). At table entry T, we store W1(j)^W2(j) in the first word, and then we are interrupted. The second thread comes along and stores W1(k)^W2(k) in the first word and W2(k) in the second word. The first thread continues and stores W2(j) in the second word. We now have W1(k)^W2(k) as the first word in the table at entry T, and W2(j) as the second.

Now let us suppose we reach position j again and go to the table to see if we can match the current hash signature with W1. We take the table entry T and compute W1(T)^W2(T), which is W1(k)^W2(k)^W2(j). Note that we originally exclusive-or'ed W1(k) with W2(k), but we then lost W2(k) and had it replaced by W2(j). Computing W1(k)^W2(k)^W2(j) does not give W1(k) again; it gives something else entirely. The current thread then decides "no match here" and continues searching without using the corrupted entry.

The same thing happens if the other thread tries to find position k. It too will fail, since W1 needs its paired W2 to decrypt the original W1 value. What we get, then, is a simple methodology that eliminates the error condition, because we will never get a match if either W1 or W2 is changed independently. Since W1 depends on W2 to decrypt it, the two must remain paired in the table or neither will be used until they are overwritten.

3. Conclusions

The first thing that should be obvious is that when two threads try to store at the same table entry concurrently, most of the time you get either W1(j) and W2(j) or W1(k) and W2(k). It is much rarer to see the non-paired error cases where W1(j) is paired with W2(k) or vice versa. This implies that most of the time, this is not an issue at all.

However, on occasion, two threads try a simultaneous update and the effect is a "lost" table entry, because it cannot be matched by W1(j) or W1(k) since the matching W2 part was lost. The disadvantage is that we get an entry that is totally unusable, but in exchange we will never get an entry that is corrupted and provides a bad score or move. Of course, this entry will be overwritten at some point with useful data, so the penalty is a short-lived one.

With today's large memories, it is not uncommon to have millions of transposition-table entries. When you analyze the probability of two threads storing at the same entry concurrently, the odds are very small. When you add in the odd timing that tickles this bug, the odds are reduced even further. In fact, it is likely that this happens just a few times per game with four processors or fewer. The problem is that when it does happen, the search would normally get a bad score which could change the tree. Or, even more importantly, it could get an illegal move that would corrupt data structures and likely lose a game. The current scheme eliminates the locking overhead completely, and guarantees that no corrupted entries will actually be used by the search.

This methodology has been used in Crafty for over two years with good results. On machines where the previous atomic locking methodology caused some performance problems, these problems were completely eliminated when the locks were removed. Yet we can still trust the best move and score to be 100% accurate with no danger of incorrect values.

This methodology works just as well on transposition tables with characteristics different from the ones used in the actual implementation. For example, there is no real requirement that an entry have a 64-bit signature and a 64-bit data value. The approach would work well with a stored 32-bit signature plus 64 bits of information to go with it. Only the XOR operation would need to be changed to make sure that W2 is still needed to decrypt the W1 signature.

4. References

Hyatt, R.M. (1998). The Dynamic Tree-Splitting Parallel Search Algorithm. ICCA Journal, Vol. 20, No. 1, pp. 3-19.

Nelson, H. (1985). Hash Tables in Cray Blitz. ICCA Journal, Vol. 8, No. 1, pp. 3-13.

Slate, D. and Atkin, L. (1977). CHESS 4.5 - the Northwestern University Chess Program. Chess Skills in Man and Machine (ed. P.W. Frey), 2nd ed., pp. 82-118. Springer-Verlag, New York, N.Y. ISBN 0-387-90790-4.

Hyatt, R.M., Nelson, H., and Gower, A. (1990). Cray Blitz. Computers, Chess, and Cognition (eds. T.A. Marsland and J. Schaeffer). Springer-Verlag, New York, N.Y. ISBN 0-387-97415-6.

1 Computer and Information Sciences, University of Alabama at Birmingham, 115A Campbell Hall, UAB Station, Birmingham, AL 35294-1170, USA. Email: hyatt@cis.uab.edu.

2 Compaq Systems Research Center. VMware, Inc., 3145 Porter Drive, Palo Alto, CA 94304, USA. Email: tim@tim-mann.org.