The Annual Computer Poker Competition took place a few weeks ago at AAAI, the main AI conference. It looks like there were about 30 bots submitted, both from academics and recreational programmers from around the world. There were 6 different competitions overall: three different variations of poker (two-player limit, two-player no-limit, and three-player limit, all Texas Hold'em), and for each variation two different scoring metrics (total bankroll (tbr), and bankroll instant runoff (iro)).
For each metric, each pair/threesome of bots played several matches, each consisting of 3000 duplicate hands. Duplicate hands work by dealing out a hand, then dealing out the same hand with the bots and positions reversed. The memories of the programs are erased between the duplications, so they can't remember the first play of the hand. This is a well-known technique for reducing variance, so that fewer hands are needed to obtain a desired level of statistical significance. Several of these 3000-duplicate-hand matches are played between each set of bots until a desired level of significance is obtained.
The tbr metric works by playing all the entrants against each other, and ranking the bots by their total profits. The iro metric is a bit more complicated. First, all entrants play against each other, and the bot with the worst head-to-head record is eliminated. Then, the records of all remaining bots are recomputed, and the remaining bot with the worst record is eliminated. This is continued until only one bot remains. (These descriptions were for the two-player competitions, and the three-player competition was pretty similar).
The different scoring rules are designed to encourage different types of algorithmic innovations. In particular, the iro encourages equilibrium or approximate-equilibrium bots. If a bot actually played an equilibrium, it would beat or break even against every other bot in expectation, and would win the iro competition in theory if enough matches were played. On the other hand, an equilibrium bot might not exploit weaker opponents as much as exploitative bots that use learning/opponent modeling. So this competition really encourages both game-theoretically "optimal" strategies, as well as exploitative strategies that do very well against weak opponents. I think both of these problems -- playing optimally and exploiting weak opponents -- are very important problems in AI.
While the competition is really great for the field, and provides a rigorous experimental testbed for these important problems, it is not without significant issues in my opinion.
My main criticism is with respect to a selection bias that prevents the tbr metric from being very meaningful. In particular, almost all of the bots submitted to the competitions (especially the two-player limit competition, which has received the most attention over the years and is the most competitive), are approximate-equilibrium bots and are quite good. The reason for this is pretty simple; if a good university spends a lot of time on a program for the competition, it will probably be very good (especially since most of the strong programs publish papers detailing the approach used). And if for some reason a team produces a bot that isn't very good, the team can just choose to not submit the bot and avoid the embarrassment of a poor finish. It is not hard to assess how good/bad your bot is too, since bots from previous competitions are made available for testing. So in short, it is very unlikely a team will submit a weak bot.
In addition, if only one or two weak bots are submitted, these bots will totally skew the results, and essentially determine the winner, since the winrates against the weak bot(s) will be significantly larger than the profits/losses against the stronger bots. So the results are not robust at all against the addition of a weak bot. Ideally we'd like to be able to make more general conclusions from the competition than just that the winning bot was especially good at exploiting one specific bot.
To put this in perspective, recall Axelrod's experiments described in The Evolution of Cooperation. To summarize briefly, he ran a competition for the repeated prisoners' dilemma game, which drew tons of entries from people in many different fields (e.g., econ, math, cs, psych, sociology). The submissions contained a very wide variety of strategies, and the winner ended up being Tit for Tat (TFT) -- a strategy that works by first cooperating, then playing whatever its opponent did the previous round. This was very significant, as it suggested that not only can cooperation emerge in this game (while game-theoretic analysis would suggest otherwise), but one of the most cooperative strategies actually has the highest performance against a wide variety of "reasonable" strategies.
Now imagine what would happen if all of the entries submitted were "rational." In fact, for the finitely-repeated prisoner's dilemma (Axelrod's experiments were repeated 200 times with each opponent pair), always defecting is the unique Nash equilibrium, and is also the only strategy surviving iterative dominance. So game theory would clearly suggest playing always defect. If everyone did this, then everyone would break even (and get a very low payoff). In fact, against a field of people playing Always Defect, TFT would actually come in last, since it would lose the first round before getting the (D,D) payoff for all later rounds. If this had happened, then it is unlikely TFT would have received all of the attention it did, and we might have greatly different views towards the possibility of cooperation in this setting.
I feel like a similar phenomenon is happening in the poker bankroll competitions. The current set of submitted bots just doesn't provide a good mechanism for identifying strong exploitative bots in a meaningful or robust way. This is definitely not a fault of the organizers of the competition in any way. But I think it is definitely an issue that needs to be addressed.
One idea to address this would be to reuse old versions of bots, which presumably are well behind the state of the art. But this can be problematic. First, the authors of those bots might have graduated and the code/jar files might not be available. Also, this might give an unfair advantage to the teams that wrote these bots, since they have access to the code and can determine how to better exploit them. Finally, these bots also might not demonstrate the same kinds of mistakes that many humans make.
One further idea would be to try to somehow learn some models of average human play from actual hand histories. I imagine this would involve getting access to tons of data, and again I'm not sure if the personal benefits would outweigh the time (though it would benefit all of the other competition entrants too). Also, this could potentially lead to a pretty cool paper if new algorithmic techniques were developed, i.e., "Learning human play in poker." On the other hand, it could also lead to a terrible paper, i.e., "The development of an extremely mediocre poker bot by standard techniques." In any case, right now this seems like the most promising solution to me, especially with the availability of large databases of human play from online poker.
Anyway, I should probably say that I'm not just being results-oriented and venting because we did poorly in the competition. In fact, CMU's team came in first in the no-limit bankroll competition, though we did worse than I hoped in the limit bankroll competition. Results aside, I think there are some major theoretical shortcomings of the current competition mechanism that should be addressed, though overall it is a great success.
Subscribe to:
Post Comments (Atom)
This is different than prisoners' dilemma game because if you play hu you and your opponent cannot cooperate so that both of you earn more unlike the prisoners' dilemma.
ReplyDeleteOnly thing that would happen is evolution of collusion. That is exactly what happend to prisoners' dilemma competition. One both would intentionally lose money to a friendly bot.
I mean, the game of poker itself isn't very similar to the prisoner's dilemma. But the tournament structure and issues that arise are very similar. My main point is that both tournaments need weak competitors in order for the results to be meaningful, and this is pretty tricky to do in the poker competition.
ReplyDelete