Category Archives: 扑克AI

德州扑克AI ——(吴昊 熊兵兵 合译)

德州扑克AI ——(吴昊 熊兵兵 合译)

  作者:Mick West

(该英文文献所在的网址为http://rrurl.cn/lQlN3B

原文链接:http://www.cnblogs.com/tuanzang/archive/2013/03/27/2985497.html

本文最初发表于“Inner Product”中的“游戏开发者”栏目,时间是2005年11月,我最近编写了为扑克牌的AI编写了一系列。我一开始以为这将是一件容易的事,但它证明了这比我想象的要复杂很多。本文为初露头角的扑克AI程序员提供了一个基础的,一个简单的实现。无制约的德州扑克AI,覆盖一副牌力量的评估的基本知识和下注的基本知识。通过运用本文提供的一些原理,你可以很快地模拟出一个强悍的德州扑克AI,并且对你将来做牌类的AI的每一步到底做什么有一个坚实的基础,我这里假设你懂得纸牌的基本术语。

(如图,这是一个可以计算德州扑克的牌的赔率的计数器)

德州扑克(TEXAS HOLDEM)

  游戏AI的目标,我认为有两点:

(1)    让玩家有一种有趣和令人愉快的体验。

(2)    从属于目标(1),也就是说在玩家拥有了一种有趣而愉快的体验的基础上,尽量让他们得到一种“挑战”,而不像游戏《植物大战僵尸》一样,玩家没有一丝一毫的挫败感。

  扑克的数据类型

  你需要考虑利用一些数据结构来标识这些状态信息(这一点,我已经在吴昊品游戏核心算法 Round 15之吴昊教你玩德州扑克中做到了,方法就是位标识存储),以下就是利用位/字节对如下信息的一些存储(更好的存储方式,本文留给了读者自己去想)

花色(suit)是在0—3范围的整型变量,梅花=0,方块=1,红心=2,黑桃=3

点数(rank)是在0—12范围的整型变量,其中,令2(deuce)=0,3=1……13(King)=11,1(Ace)=12,每一个花色都有13个不同大小的点数

一张纸牌(card)是在0—51范围内的整型变量,我们提出如下公式

card = suit*13 + rank.

Suit = card/13

Rank = card%13

我们利用64bit的空间来存储一手牌(实际上,空间上面依然有一些浪费,其中的52bit被使用,而有14bit被留作陪葬品了)

我们利用一个整型变量来描述你手上的牌型(poker type), 其中 0= no pair, 1=pair, 2=two pair, 3=trips, 4=straight, 5=flush, 6=full house, 7=quads, 8=straight flush.

  牌值分析

  我们利用一个32位的整型变量来表征一手牌的值,它表示一手牌的相对价值和一手牌的实力。通过两手牌的值,你可以判断出哪一手牌的实力更强悍。

我们利用6个半字节(4位)来表征一手牌,其中,最重要的四位代表牌的牌型,后面的5个半字节量来表征不同等级的牌在牌值分析中的价值。

例如:

样例1:AH QD 4S KH 8C是一个没有对子的牌型(有时候我们说成是散牌或者是高A),所以,所以,type设置为0,剩下的五个ranks按照五张牌的递减顺序排列,(A,K,Q,8,4)被翻译为以下的五个数:12,11,10,6,2(或者对于16进制来说,为C,B,A,6,2),再结合高位的牌型0标识,给出了一个32bit的整型变量:0x000CBA62.这里,我们需要注意两点:(1)我们的这种数据结构忽略了花色的信息,但是,唯有我们在分析同花顺的时候,才有必要了解到高位信息。(2)注意到两个高位的牌值都为0.

样例2和样例3的解释同理,所以,我在这里就忽略了。

计算牌值

  我们现在需要的就是得到一手牌,然后计算以下这手牌的牌值。

这牵扯到牌型,插入半字节变量的牌的等级,以上。

一手拥有着四种花色(梅花,方块,红桃,黑桃)都有13bit(对于每一种花色来说),13bit可以提供仅仅8192种组合,我们可以通过预处理8K的表中的若干像这样的位(在13bit内部的)(如果你有五个或者更多的相同花色的牌,那你就得到了一个同花顺),或者是一手牌中的任何强悍的牌,你也可以从一个特别的bit组合中预处理出最高的五张牌,作为起步牌。

如果你要去计算等级(草花  方块  红桃 黑桃),那么该行列值就应该是一个位字段。这个位字段会为你所拥有的至少一个牌设定一个值。在这里设定的这个值是你所拥有的,也是一个特定的值。我们计算出每个草花方块 红桃 黑桃中的设定片段的数字值,并减去在每一个特定的行列值中的数字值,得到重复的行列数值,以此用来作为确定你有什么类型的底牌的基础。

例如:如果你有2D AS AH 2C 2H,你可以迅速确认你有五张牌,只有两个独特的rank,重要的是,你必须有一个葫芦(福尔豪斯)或者是一个铁支。更多的简单测试将几乎决定你要什么。整个评估函数将包括像这样的测试,逐步削减可能的牌型。

因为这个函数更多地包含了位运算,表查询和简单比较,它会变得非常快(位运算的优势嘛),它也非常适合于微调优化,确切的实现将取决于目标体系结构,你可能可以利用一些特定的处理器指令,这样会变得更有效率。

计算一手牌的力量

一手牌的力量计算是你这手牌可以赢的概率,给你底牌,明牌和留在对手手中的牌,一手牌的力量是一个介于0.0(彻底地输)和1.0(彻底地赢)之间的一个浮点数,例如,HS为33%的话,说明你有33%的概率可以赢。

一个最简单和最方便的手段来计算HS的方法就是模拟许多许多次游戏的过程,计算你的牌可以赢的次数(这有点像数学建模里面的黑箱操作),比如你模拟1000次这个游戏,在模拟中,你赢了423 次,那你可以近似的确定,你赢这场游戏的概率(HS值)为0.423.

模拟整个过程是很简单的:

(1)设置分数0

(2)移除你所知的牌(底牌和明牌)

(3)重复1000次(或者更多吧,这取决于CPU的资源和期望得到的精确程度)

(4)随机化剩下的组

(5)对付你对手的底牌,和你剩下的公共牌

(6)评估所有的手牌,看看谁有最好的!

(7)如果你有最好的话,那么加上1/(和你拥有相同命运的牌值的人的数目)(这通常是1)

(8)结束if循环

(9)结束1000次模拟

你这一手牌的力量=你所得的分数/你进行模拟实验的总次数

更精确的考虑我觉得几乎没有必要,所以,在这里也略去。

(如图,此为2011年百度之星的总决赛,当年的题目就是德州扑克的AI)

POT的赔率

POT的赔率的计算=你为了叫牌下的注/(你叫牌下的注+POT内的钱的总数)

回报率

回报率指的是,你如果要下这手牌,可以得到的金额与你下注的比值(引入了牌力的大小)

回报率=一手牌的力量/POT的赔率

我们的基本策略就是如果回报率大于1的话,我们就将拍拿在手上。

对于弃牌(FOLD)/叫牌(CALL)/加倍(RAISE)的选

对于每一个Round(不同于吴昊系列的Round,这里指的是一次游戏)的下注中计算机都需要决定是否需要弃牌/叫牌/加倍(被称为FCR决定),忽略目前叫加倍有多大的价值,我们得到一个比率量(返回值RR),这里提供一个基于一定可能性的既简单又非常实用的映射(映射的两个量为RR和FCR)

如果RR<0.8,那么95%选择弃牌,0%选择叫牌,5%选择加倍(这里加倍的目的是为了虚张声势)

如果RR<1.0,那么80%选择弃牌,5%选择叫牌,15%选择加倍(虚张声势,虚张声势!!!)

如果RR<1.3,那么0%选择弃牌,60%选择叫牌,40%选择加倍

另外,如果RR>=1.3,那么0%选择弃牌,30%选择叫牌,70%选择加倍

如果弃牌和叫牌的数量都为0的话,那么,叫牌吧!

不要过于在意以上列出的精确的百分比,这些数目将决定于你计算你的一手牌的力量值,你也许想过通过上一轮下注的多少来改变你目前的下注,你甚至想通过改变这些数值来创建具有不同个性的对手。

利用这个非常简单的介于RR和FCR之间的映射决定可以让你拥有一个令人惊讶的既有道理的又有娱乐性的AI玩家。他们将趋于玩强有力的手牌。他们也许偶尔会虚张声势,他们也不会轻易因为他们手上的牌太好而感到惊讶,他们也会在虚张声势地叫加倍之后处理薄弱的手牌,他们也会坚持寻找一个合理的机会来得到一个同花顺或者是铁支,让游戏的娱乐性大为增强。

没有一种情况是必胜或者是必败的,这是一个非常重要的道理,这说明你永远都不能根据你的AI对手的行动来揣测出他的牌(除非他们弃牌,这种信息也不能真正帮到你),如果他们加倍的话,那你可是要小心了,他们可能是有一手非常好的牌,但是也是有1比20的概率,他们的手上可能只有一手非常非常差劲的牌。

筹码保护

在你还有很多钱而且盲注比较小的时候,这个简单的规则可以支持的工作。但是,当你的筹码稀释,盲注增加的之后,你就得考虑一下你的金钱的可持续性了。同样地,偶尔,那些玩家也会“全力投入”,他们会赌上自己筹码内的所有的金钱,所以我们必须让AI变得更有逻辑性,来防止在筹码内的金钱很少的时候,不让差的叫牌发生。

假设你有AD,2D,公共牌是QC,KC,2C,那么你有一对牌,但是也有可能是同花顺,在POT内又500美元,赌注是100美元,对手为两个玩家,但是,这是你最后的100美元。POT赔率为100/600=0.1666,你的一手牌的力量为0.297,所以你的回报率为1.8.如果你将这种情景一遍一遍地重复,你将又可能每次得到平均80%的回报率。然而,这是你最后的100美元了,你有70%的概率会失去一切,那么,不要再下注了!

来处理这些事情,我们可以给予一个简单的启发式:

如果我的建议赌注将大大地维持我现在的筹码,那么在我有一次很有自信的赢的机会的时候,不要去下注。

可以部分地描述为:

如果(筹码-下注)<(盲注*4)并且(HS<0.5)那么就弃牌

含义是如果叫牌会让你只剩下不到四倍的盲注,那就不用叫牌,除非你有50%以上的胜算。

扑克是一个复杂的游戏,有着非常多种类的不可思议的情况等你去处理。我建议你让这些极少数个别的情况越少越好,这样可以减少游戏中更少的风险漏洞,但是,我们可以利用一些启发式算法(经验法则)来处理这种模糊的情况,让AI的逻辑具备更多的复杂性,大大提高可玩性。

(我有一个朋友是华中科技大学的百度俱乐部的,他当年也来参观了总决赛,听说得第一名的一个人是利用了一个无理手,让玩家们不断出牌,自己坚决不出,到了最后大家都没有好牌之后才出自己的牌,利用这种奇葩的AI战术取得了非常好的效果!)

  测试扑克的智能性

  平均来说,和一般玩家快速地玩单付德州扑克游戏只需要大约30分钟的时间。理想情况下,通过自然人玩家和智能玩家来竞争你才能完成你的测试,并且找出其中包含的问题。不幸的是,由于玩家的随意性正在一步步得到解决,玩家很容易通过低于标准杆逻辑获得幸运的牌号并硬的游戏,或者甚至于通过有缺陷的逻辑也可以实现这么一点。我已经发现至少需要10场比赛来开始得到对于AI玩家的素质的清晰了解,通过超过100次的游戏才能真正确定这种素质。

这对于测试项目来说往往会造成一种不理智的负担,并在获取AI 玩家身上发生的变化上引入一个非常长时间的时延。  解决的办法是自动化测试。认证机构应该设定不同的变种AI玩家,以使得不同变种的AI可以在一个设定的速度非常高的游戏中互相对战。你也应该编写一些简单的扑克AI的组合,如AI,它总是适于所有的,其他那些易于用手而不是用手指。然后, 你对AI的松紧程度进行设置来应对这些对手,同时确保其赢得适当比例的比赛。如果你写的评价和模拟得当,那么你应该能够在一秒种左右时间内模拟一整场比赛(您可能要减少反复的模拟,以加快测试速度)。

自然人测试的最好用途就是去试图使他们找到AI的利用性,然后你可以编纂到一个临时的AI对手,包括利用此漏洞的测试套件。你进而可以调整你的AI,直到探测到它失败的漏洞,同时仍然能够打败所有其他(标准)的对手。

Programming Poker AI

Programming Poker AI

This article was originally published in the “Inner Product” column in Game Developer Magazine, November 2005

I recently programmed the AI for the World Series of Poker, developed by Left Field Productions and published by Activision. I started out thinking it would be an easy task. But it proved a lot more complex than I initially thought.

This article for the budding poker AI programmer provides a foundation for a simple implementation of No-Limit Texas Holdem Poker AI, covering the basics of hand strength evaluation and betting. By following the recipe set out here, you will quickly become able to implement a reasonably strong poker AI, and have a solid foundation on which to build. I assume you are familiar with the basic terminology of poker.

TEXAS HOLDEM

The goal of any game playing AI is twofold. The primary purpose is to allow the player to have a fun and enjoyable experience. The secondary purpose, subordinate to the first, is to play a strong enough game to provide sufficient challenge to the majority of players in your intended audience.

POKER DATA TYPES

You will need an implementation of the following data types. I’m going to describe them at the bit/byte implementation level, leaving the high level abstraction up to you.

A “suit” is an integer in the range 0..3, where 0=Clubs, 1=Diamonds, 2=Hearts, 3=Spades

A “rank” is an integer in the range 0..12, where 0 = 2 (deuce), 1 = 3, 11 = King, 12 = Ace. This is the cards in a suit arranged in rank order

A “card” is an integer in the range 0..51, hence
card = suit*13 + rank.
Suit = card/13
Rank = card%13

A “Hand” is a 52 bit data type, where each bit represents a single card. This can be stored as four 16 bit words for ease of use, where each 16 bit word represents the potential cards in one suit (using 13 of the 16 bits) (figure 1)

A “Hand Type” is an integer representing the type of poker hand you have, where 0= no pair, 1=pair, 2=two pair, 3=trips, 4=straight, 5=flush, 6=full house, 7=quads, 8=straight flush.

ENCODING HAND VALUES

A “Hand Value” is a 32 bit integer representing the relative value or strength of any hand of cards. By comparing two hand values, you can see which hand is stronger in a game of poker.
The hand value can conveniently be represented as a series of six 4-bit nibbles, where the most significant nibble represents the Hand Type, then the next five nibbles represent the different ranks of the cards in the order of significance to the hand value. (figure. 2)

Example 1: AH QD 4S KH 8C is a “no pair” hand type (sometimes called a “high card”, or in this case “Ace high”). So, the hand type nibble is set to 0. The remaining nibbles in the Hand Value are filled out with the ranks of the five cards in descending order. (A, K, Q, 8, 4), which translated into rank indices: 12,11,10,6,2 (or C,B,A,6,2 in hexadecimal), and when combined with the hand type (0) in the high nibble, gives us a 32 bit integer: 0x000CBA62.

The individual suits of the cards are basically ignored in the final hand value. The only time suit is significant is when it contributes to a flush. Also, note the top two nibbles of the Hand Value are always zero.

Example 2: 4D JD 3D 4C AD is a pair of fours, with Ace, Jack, Three kickers. The hand type is a pair, (type 1), then the ranks follow, starting with the rank of the pair, then the ranks of the kickers, so 4,A,J,3, which gives us 0x0012C910.

Example 3: 7C, 6C, 5C, 4C, 3D is a straight (type 4). More specifically it’s a seven high straight. The only rank of import here is the seven (rank 5). So the hand value is encoded as 0×00450000. We save ourselves a bunch of instructions in ignoring the four low cards after we’ve determined it is a straight.

Look at the resultant hand values of the above examples, you can clearly see how the better hands always have a higher hand value, making determining the wining hand a simple comparison.

CALCULATING HAND VALUES

What we now need is a function that takes a hand, and returns a hand value. This involves determining the hand type, then inserting the nibbles for the hand ranks, as above.

A hand is four words (clubs, diamonds, hearts, spades) of 13 bits each. 13 bits can be arranged in just 8192 combination, which means we can accelerate the evaluation of a hand by pre-calculating 8K tables of things like the number of bits set in a (13 bit) word (if you have five or more of the same suit, then you’ve got a flush), or the highest card of any straight in the hand. You can also pre-calculate a table of the highest five cards from a particular bit combination, which you can then use to set the kicker cards.

If you calculate ranks = (hearts | diamonds | clubs | spades) then the value ranks is a bit-field with a bit set for every card rank that you have at least one of. The number of bits set here is the number of unique ranks you have. We calculate the number of bits in each of hearts, diamonds, clubs and spades, and subtract the number of bits in the unique ranks, giving the number of duplicated ranks, to be used as the basis of determining what type of hand you have.

Example: if you have 2D AS AH 2C 2H, you can very quickly determine that you have five cards, that there are just two unique ranks, and hence you must have either a full house or four of a kind. A few more simple tests will determine exactly what you have. The entire evaluation function will consist of tests like this, gradually whittling down the possible hand types.

Since the function consists mostly of bitwise operations, table lookups and simple comparisons, it is going to be very fast. It’s also very amenable to fine tuning optimization, and the exact implementation will depend on the target architecture. You may be able to take advantage of some processor specific instructions to greatly improve the efficiency.

CALCULATING HAND STRENGTH

Hand strength is the probability that you will win the hand, given your hole cards, the community cards, and the opponents who remain in the hand. Hand strength is a floating point number between 0.0 (certain loss) and 1.0 (certain win). For example, a HS of 0.33 means you have a 33% chance of winning.

The easiest and most flexibly way of calculating the HS is to simulate the progress of the game a very large number of time, and count the number of those times you win. Say you simulate the game 1,000 times, and in the simulation, you win 423 games, then you have a high degree of certainty of having an approximate HS of 423/1000, or 0.423.

The procedure for simulating a game is very simple:

Create a pack of cards
Set score = 0
Remove the known cards (your hole cards, and any community cards)
Repeat 1000 times (or more, depending on CPU resources and desired accuracy)
Shuffle the remaining pack
Deal your opponent’s hole cards, and the remaining community cards
Evaluate all hands, and see who has the best hands
If you have the best hand then
Add 1/(number of people with the same hand value) to your score (usually 1)
End if
end repeat
Hand Strength = score/number of loops (1000 in this case).

To be more accurate, we have to run our simulation with people dropping out if they are dealt hole cards below a certain threshold. In practice, the determination of if a player stays in or not in a simulation is a probabilistic function of the strength of their hole cards, their table position, their stack size, the blind size and their previous behavior. For now we can just modify the simulation, so after dealing the opponents hole cards, remove any non-blind players with hole cards worse than, say, a pair of sixes. While not particularly elegant, it will still give you a useful number.

POT ODDS

The pot odds number is the ratio of your bet or call to the size of the pot after you bet (the amount you will win). For example, if the bet is $20, and there is $40 in the pot, then the pot odds are 20/(20+40) = 0.333.

RATE OF RETURN

Rate of return is the “on average” proportion of how much you will multiply your bet by, if you stay in the hand.

Rate of Return = Hand Strength / Pot Odds.

The base strategy we implement is to mostly stay in hands with a rate of return greater than 1.

THE FOLD/CALL/RAISE DECISION

For each round of betting the computer needs to decide if it is going to fold, call or raise (The FCR decision). Ignoring the question for the moment of how much to raise for now, then given a Rate of Return (RR), it’s possible to provide a very simple (yet useful) mapping between RR and FCR.

If RR < 0.8 then 95% fold, 0 % call, 5% raise (bluff)
If RR < 1.0 then 80%, fold 5% call, 15% raise (bluff)
If RR <1.3 the 0% fold, 60% call, 40% raise
Else (RR >= 1.3) 0% fold, 30% call, 70% raise
If fold and amount to call is zero, then call.

Don’t pay too much attention to the precise percentages listed above, the numbers will depend on the way you calculate your hand strength, and you’ll want to vary them depending on which betting round you are in. You will also want to vary these numbers to create players with different personalities.

Using this very simple mapping between the RR and the FCR decision can give you a surprisingly reasonable and entertaining player. They will tend to play strong hands, they will occasionally bluff, they won’t scare easy if their hand is good, and they will abandon weak hands when raised, and they will stick around on a reasonable chance of a flush or straight draw, making for entertaining gameplay.

The fact that none of the percentages is 100% is also important. That means you can never deduce the hand strength of your AI opponent based on their actions (unless they fold, where the information does not really help you). If they raise, then they could have any kind of hand strength – probably a good one, but it might be the 1 in 20 times when they are bluffing with a very weak hand.

STACK PROTECTION

The simple rules above work well when your stack of chips is large and the blinds are small. However as your stack shrinks and the blinds increase then the amount of money you need to commit to stay in a hand can become a very substantial proportion of your stack. Also, occasionally other players might go “all-in”, betting their entire stack of chips, so we need some logic to prevent the AI from making bad calls when short stacked.

Say you have AD, 2D and the flop is QC, KC, 2C. So you have a pair of twos, but there is a possible flush out there. There is $500 in the pot and the bet is $100 to stay in against two player, but it’s your last $100. The pot odds are 100/600 = 0.1666, your hand strength is 0.297, so your rate of return is about 1.8. So if you could play this situation over and over again you would make on average an 80% profit each time. However, it’s your last $100, and you have about a 70% chance of loosing everything. Don’t make that bet!

To handle this we can use a simple heuristic, along the lines of:

“If my proposed bet will substantially commit my stack, then don’t do it unless I have a strong chance of winning”

which might be implemented in part by:

“if (stack- bet) < (blind * 4) and (HS < 0.5) then fold”

Meaning if the call would leave you with less than four times the big blind, then don’t call unless you have a greater than 50% chance of winning.

Poker is a complex game, with a surprisingly large number of different types of situations like this that you have to handle somehow. I recommend you have as few special cases as possible, as it reduced the risk of an exploit being introduced into the game via some obscure special case. However, you should anticipate a number of heuristics (rules of thumb) being hard coded into the AI logic.

TESTING POKER AI

Playing a quick single table game of Texas Holdem takes around 30 minutes on average with human players. Ideally you would perform your testing by having human players play against the AI and trying to find problems with it. Unfortunately, due to the random hands being dealt, it’s very easy for one player to simply get lucky and win the game with sub-par logic, or even flawed logic. I’ve found it takes at least ten games to begin to get a clear picture of the qualities of an AI player, and more like a hundred games to be really sure. This often creates an unreasonably burden on the testing department, and introduces a very long delay in getting feedback on AI changes.

The solution is automated testing. The AI should be set up so that different variants of AI can play against each other in a very high speed set of games. You should also code a few simplistic poker AI’s into the mix, such as an AI that always goes all in, or another that simply always raises with a hand better than a pair of fives. Then you set your AI loose against these opponents, and make sure that it wins the appropriate percentage of games. If you coded your evaluation and simulation appropriately, then you should be able to simulate an entire game in about a second. (You might want to reduce the iterations of the simulation a bit to speed up testing).

The best use of your human testers is to try to get them to find an exploit of the AI, then you can codify this exploit into a temporary AI opponent to include in your test suite. You can then tweak your AI until it defeats the exploit, while still being able to defeat all the other (standard) opponents.

MORE WORK

What I’ve set out here is just a foundation for poker AI. By following the process laid out here you will get a reasonably strong and entertaining opponent. Here’s a quick list of the topics you might want to look into

• Pre-flop hand strength tables
• Opponent modeling.
• Implied Odds.
• Personality modeling
• Positional play
• Probabilistic search space
• Game theory and Nash Equilibrium.

RESOURCES:

– Sklansky, David, The Theory of Poker, 1999, Two Plus Two Publishing. – Provides various discussion of pot odds, implied odds, etc, with many heuristics that might be useful.
– The University of Alberta Computer Poker Research Group:
http://www.cs.ualberta.ca/~games/poker/ A number of research papers on implementing poker AI.
– Hold’em Killer, Evin Peretz, http://www.holdemkiller.blogspot.com/ – A blog on implementing poker AI.
– Poker-Eval, http://freshmeat.net/projects/poker-eval/ – A GPL Licensed poker hand evaluation library.

zr9558's Blog