# How AI is helping detect fraud and fight criminals

http://venturebeat.com/2017/02/18/how-ai-is-helping-detect-fraud-and-fight-criminals/

AI is about to go mainstream. It will show up in the connected home, in your car, and everywhere else. While it’s not as glamorous as the sentient beings that turn on us in futuristic theme parks, the use of AI in fraud detection holds major promise. Keeping fraud at bay is an ever-evolving battle in which both sides, good and bad, are adapting as quickly as possible to determine how to best use AI to their advantage.

There are currently three major ways that AI is used to fight fraud, and they correspond to how AI has developed as a field. These are:

1. Rules and reputation lists
2. Supervised machine learning
3. Unsupervised machine learning

## Rules and reputation lists

Rules and reputation lists exist in many modern organizations today to help fight fraud and are akin to “expert systems,” which were first introduced to the AI field in the 1970s. Expert systems are computer programs that encode the rules of domain experts. They’re easy to get up and running and are human-understandable, but they’re also limited by their rigidity and high manual effort.

A “rule” is a human-encoded logical statement used to detect fraudulent accounts and behavior. For example, an institution may put in place a rule that states, “If the account is purchasing an item costing more than $1000, is located in Nigeria, and signed up less than 24 hours ago, block the transaction.” Reputation lists, similarly, are based on what you already know is bad. A reputation list is a list of specific IPs, device types, and other single characteristics, each with a corresponding reputation score. If an account comes from an IP on the bad reputation list, you block it.

While rules and reputation lists are a good first attempt at fraud detection and prevention, they are easily gamed by cybercriminals. Digital services abound these days, and these companies make the sign-up process frictionless, so it takes fraudsters very little time to create dozens, or even thousands, of accounts. They then use these accounts to probe the boundaries of the rules and reputation lists in place. Easy access to cloud hosting services, VPNs, anonymous email services, device emulators, and mobile device flashing makes it easy to come up with unsuspicious attributes that slip past reputation lists.

Since the 1990s, expert systems have fallen out of favor in many domains, losing out to more sophisticated techniques. Clearly, there are better tools at our disposal for fighting fraud. However, a significant number of fraud-fighting teams in modern companies still rely on this rudimentary approach for the majority of their fraud detection, leading to massive human-review overhead, false positives, and sub-optimal detection results.

## Supervised machine learning (SML)

Machine learning is a subfield of AI that attempts to address the rigidity of previous approaches. Rather than encoding what these computer programs should look for, as expert systems do, researchers wanted the machines to learn from data.
Machine learning began to make big strides in the 1990s, and by the 2000s it was being used effectively to fight fraud as well. Applied to fraud, supervised machine learning (SML) represents a big step forward. It’s vastly different from rules and reputation lists because instead of looking at just a few features with simple rules and gates in place, all features are considered together. There’s one downside to this approach: an SML model for fraud detection must be fed historical data to learn what fraudulent accounts and activity look like versus what good accounts and activity look like. The model can then weigh all of the features associated with an account to make a decision, but it can only find fraud that is similar to previous attacks. Many sophisticated modern-day fraudsters are still able to get around these SML models.

That said, SML applied to fraud detection is an active area of development, because there are many SML models and approaches. For instance, applying neural networks to fraud can be very helpful because it automates feature engineering, an otherwise costly step that requires human intervention. This approach can decrease the incidence of false positives and false negatives compared to other SML models, such as SVMs and random forests, since the hidden neurons can encode many more feature combinations than a human could.

## Unsupervised machine learning (UML)

Compared to SML, unsupervised machine learning (UML) has cracked fewer domain problems, and historically it hasn’t been able to help much with fraud detection. Common UML approaches (e.g., k-means and hierarchical clustering, unsupervised neural networks, and principal component analysis) have not been able to achieve good results for fraud detection.
An unsupervised approach to fraud can be difficult to build in-house, since it requires processing billions of events together and there are no effective out-of-the-box unsupervised models. However, some companies have made strides in this area. The reason it can be applied to fraud is the anatomy of most fraud attacks. Normal user behavior is chaotic, but fraudsters work in patterns, whether they realize it or not; they work quickly and at scale. A fraudster isn’t going to try to steal $100,000 in one go from an online service. Rather, they make dozens to thousands of accounts, each of which may yield a profit of a few cents to several dollars. But those activities will inevitably create patterns, and UML can detect them.

The main benefits of using UML are:

• You can catch new attack patterns earlier
• All of the accounts are caught, stopping the fraudster from making any money
• Chance of false positives is much lower, since you collect much more information before making a detection decision
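The pattern-detection idea is easy to illustrate: accounts created by the same fraudster tend to share infrastructure. The sketch below links accounts that share any attribute value and takes connected components. The account data and attribute names are hypothetical, and this is only a simplified flavor of unsupervised correlation analysis, not any vendor's actual algorithm.

```python
from collections import defaultdict

# Hypothetical sign-up records: account -> attributes observed at sign-up.
accounts = {
    "a1": {"ip": "5.5.5.5", "email_domain": "tmail.io", "device": "emu-01"},
    "a2": {"ip": "5.5.5.5", "email_domain": "tmail.io", "device": "emu-02"},
    "a3": {"ip": "5.5.5.6", "email_domain": "tmail.io", "device": "emu-01"},
    "u1": {"ip": "9.1.2.3", "email_domain": "gmail.com", "device": "iphone-7"},
}

# Index accounts by each (attribute, value) pair they share.
by_attr = defaultdict(list)
for acct, attrs in accounts.items():
    for key, value in attrs.items():
        by_attr[(key, value)].append(acct)

# Union-find: merge accounts that share any attribute value.
parent = {a: a for a in accounts}

def find(a):
    while parent[a] != a:
        a = parent[a]
    return a

for group in by_attr.values():
    for other in group[1:]:
        parent[find(other)] = find(group[0])

clusters = defaultdict(set)
for a in accounts:
    clusters[find(a)].add(a)

# Large clusters of near-identical accounts are the fraud-ring signature.
print(sorted(map(sorted, clusters.values())))  # → [['a1', 'a2', 'a3'], ['u1']]
```

Normal users mostly end up in singleton clusters; a ring of mass-registered accounts collapses into one large group that can be blocked together.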

## Putting it all together

Each approach has its own advantages and disadvantages, and you can benefit from each method. Rules and reputation lists can be implemented cheaply and quickly without AI expertise. However, they have to be constantly updated and will only block the most naive fraudsters. SML has become an out-of-the-box technology that can consider all the attributes of a single account or event, but it’s still limited in that it can’t find new attack patterns. UML is the next evolution, as it can find new attack patterns, identify all of the accounts associated with an attack, and provide a full global view. On the other hand, it’s not as effective at stopping individual fraudsters with low-volume attacks and is difficult to implement in-house. Still, it’s certainly promising for companies looking to block large-scale or constantly evolving attacks.

A healthy fraud detection system often employs all three major ways of using AI to fight fraud. When they’re used together properly, it’s possible to benefit from the advantages of each while mitigating the weaknesses of the others.

AI in fraud detection will continue to evolve, well beyond the technologies explored above, and it’s hard to even grasp what the next frontier will look like. One thing we know for sure, though, is that the bad guys will continue to evolve along with it, and the race is on to use AI to detect criminals faster than they can use it to hide.

Catherine Lu is a technical product manager at DataVisor, a full-stack online fraud analytics platform.

# DeepMind Could Bring The Best News Recommendation Engine

Reinforcement Learning, a key Google DeepMind algorithm, could overhaul news recommendation engines and greatly improve user stickiness. After beating a Go grandmaster, the algorithm could become the engine of choice for true personalization.

Demis Hassabis, DeepMind’s founder and CEO, is a great communicator who gives captivating lectures; this one at Oxford University, delivered on February 24th, is among his best. The 40-year-old PhD in Cognitive Neuroscience and Computer Science graduate from MIT and Harvard offers this explanation of his work:

“The core of what we do focuses around what we call Reinforcement Learning. And that’s how we think about intelligence at DeepMind.

[Hassabis then shows the following diagram]

We start with the agent system, the A.I. That agent finds itself in some kind of environment, trying to achieve a goal. In a real-world environment, the agent could be a robot or, in a virtual environment, an avatar.

The agent interacts with the environment in two ways. Firstly, through observations received via its sensory operators. We currently use vision, but we are starting to think about other modalities.

One of the jobs of the agent system is to build the best possible model of the environment out there, based only on these incomplete and noisy observations that [the agent] is receiving in real time. And it keeps updating its model in the face of new evidence.

Once it has built this model, the second job of the agent is to make predictions about what is going to happen next. If you can make predictions, you can start planning what to do. So if the agent is trying to achieve a goal, it will have a set of actions available. The decision-making problem is to pick the action that would be best to take toward the goal.

Once the agent has decided, based on its model and its planned trajectories, it executes actions that may or may not make changes in the environment, and that drives the observations…”

Reinforcement Learning is a highly complex process. First, the observed environment is very noisy, incomplete, and largely made up of unstructured data. When DeepMind decided to tackle basic Atari games like Breakout and Pong, the input was nothing but raw pixels, and the output was predictions — likely target position — and then actions — racket placement. All of the above aimed at maximizing the subsequent reward: survival and score. After a few hundred games, the machine was able to devise, on its own, creative strategies that surprised even its creators (read here, or view this video, time code 10:27).

Over time, the tests will migrate to more complex environments, such as 3D games, in which it becomes harder to distinguish the pattern of a wall from a useful piece of information.

A rather challenging signal-to-noise environment

DeepMind’s future goals involve dealing with very large and complex sets of data, in fields such as genomics, climate, energy, or macroeconomics.

Regardless of the nature of the input stream, the principle is roughly the same. The A.I. system relies on a deep neural network to filter raw sensory data and form meaningful patterns to be analyzed. It then builds an optimized statistical model, updates it in real time, and derives the best possible actions from the set of observations available at a given moment. Then the whole system loops back.
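That observe → model → predict → act loop can be sketched with tabular Q-learning on a toy problem. Everything here (the corridor environment, the learning rate, the exploration rate) is an illustrative stand-in, not DeepMind's deep reinforcement learning system:

```python
import random

# Toy environment: a 5-cell corridor; reward 1.0 for reaching the right end.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # step left or step right

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

# The agent's "model" is a Q-table: its prediction of future reward for
# each (state, action) pair, updated from noisy, incomplete experience.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

rng = random.Random(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        # Act: mostly exploit the current predictions, sometimes explore.
        if rng.random() < 0.2:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, reward, done = step(s, a)
        # Update the model in the face of new evidence (Q-learning rule).
        best_next = max(Q[(nxt, act)] for act in ACTIONS)
        Q[(s, a)] += 0.5 * (reward + 0.9 * best_next - Q[(s, a)])
        s = nxt

# The learned policy should walk straight toward the goal (+1 in every state).
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)
```

The loop is the whole point: observations update the model (the Q-table), the model makes predictions (expected future reward), and the predictions drive the next action.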

How does this connect to improving news production?

Before we get into this, let’s roll back a little bit.

For a news production system, recommending related stories or videos is the most efficient way to increase reader engagement. For media who rely on advertising, sold on CPM or on a per-click basis, raising the number of page views per reader has a direct impact on ad revenue. Paid-for media are less sensitive to page views, but reminding readers of the breadth and depth of an editorial production is a key contributor to a news brand’s status — and a way to underline its economic value.

But there is a problem: today’s news recommendation systems are often terrible.

To display related stories or videos, publishers unwilling to invest in smart, fine-tuned systems have settled for engines based on crude semantic analysis of content. Hence embarrassing situations arise: a piece about a major pedophile cover-up by the French clergy will trigger suggestions about child care, or a story on the state of an intellectual debate will suggest a piece on spelling, or another about waste management. The worst are stories automatically proposed by Outbrain or Taboola and picked up all over the web: not only are they the same everywhere, but they tap into the same endless pool of click-bait items. The only virtue of these two systems is the direct cash-back for the publisher.

Something needs to be done to improve recommendation systems. A.I. and Reinforcement Learning offer a promising path.

In Demis Hassabis’ demonstration, the important words are: Environment, Observations, Models, Predictions, and Actions. Let’s consider these keywords in the context of news production and consumption.

The Environment is dual. The external side is built on the general news cycle. At any given moment, automatically assessing a topic’s weight is reasonably easy, but detecting low-noise signals is much trickier, even though they are critical to predicting how the news cycle will evolve. As for the internal news environment, it is simply the output of the various content produced (or curated) by the newsroom.

Observations are multiple: they include the vast range of available analytics, again at two levels: how a piece of content is faring in general (against the background noise or the competition), and how each individual reader behaves. Here, things become interesting.

Models are then fed with a mix of statistical and behavioral data such as: “Stories [x] containing [semantic footprint] perform well against context [of this type of information].” Or: “Reader #453.09809 is currently interested in [topics], but she has on her radar this [low-noise topic] that is small but growing.”

Predictions detect both the content and the topics that have the best lift in the news cycle and pique the reader’s interest, dynamically and in real time.

Actions will then range from putting stories or videos in front of the broad audience to targeting them, more specifically, at the individual level. Personalization will shift from passive (the system serves stories based on a presumed and generally static reader profile) to dynamic, based on current and predicted interest.

Demis Hassabis makes clear that enhanced personalization is on DeepMind’s roadmap:

“Personalization doesn’t work very well. It currently amounts to averaging the crowd as opposed to adapting to the individual human.”

It would be unrealistic to expect a news outlet to develop such an A.I.-based recommendation engine on its own, but we could easily foresee companies already working on A.I. selling one as SaaS (Software as a Service).

A new generation of powerful recommendation engines could greatly benefit the news industry. It would ensure much higher reader loyalty, reinforce brand trust (it recommends stories that are both good and relevant), and help build deeper and more consistent news packages while giving a new life to archives.

Who will jump on the opportunity? Probably those who are the most prone to invest in tech. I would guess Buzzfeed and the Jeff Bezos-owned Washington Post.

— frederic.filloux@mondaynote.com

# Unsupervised Machine Learning: An Anti-Fraud Approach Beyond Rules Engines and Supervised Machine Learning

Rules Engines

How They Work

IF(user_email=type_free_email_service) AND (comment_character_count ≥ 150 per sec) {

flag user_account as spammer

mute comment

}

1. Above 1000 – Deny (e.g. reject the transaction, suspend the account)
2. Below 300 – Accept (e.g. confirm the order, approve the content)
3. Between 300 and 1000 – Flag for additional review and place into a manual review queue

➜ Advantages

➜ Limitations

➜ How It Works

➜ Advantages

➜ Limitations

DataVisor’s correlation-analysis approach groups together accounts whose fraudulent behavior is similar. Another detection technique, anomaly detection, flags as fraudulent any user who does not match the behavioral profile of good users. It works on the assumption that bad users are isolated individuals or small groups outside the normal population. The chart below shows fraudsters F1 and F3, fraud group F2, and good-user groups G1 and G2. An anomaly-detection model can only find such isolated fraud; it faces great challenges in identifying large-scale group fraud. In this respect, the advantage of unsupervised analysis over anomaly detection is obvious.

DataVisor uses its unsupervised analysis algorithms alongside rules engines and machine learning models. For customers, this comprehensive detection delivers lists of fraudulent accounts while also providing new fraud-detection models and helping customers create new detection rules. When DataVisor’s detection discovers that a customer is facing a new, previously unknown type of fraud, the unsupervised detection can serve as an effective early warning.

# Texas Hold’em Poker AI (translated by Wu Hao and Xiong Bingbing)

Author: Mick West

(The original English article can be found at http://rrurl.cn/lQlN3B)

(Pictured: a calculator that computes the odds of Texas Hold’em hands)

In my view, game AI has two goals:

(1) Give the player a fun and enjoyable experience.

(2) Subordinate to goal (1): on top of that fun and enjoyable experience, give the player a real “challenge,” unlike a game such as Plants vs. Zombies, where the player never feels the slightest frustration.

Poker Data Types

You will need data structures to represent this state information (I did this with bit-flag storage in “Wu Hao Teaches You Texas Hold’em,” Round 15 of Wu Hao’s series on core game algorithms). Below is how some of this information can be stored using bits and bytes (better storage schemes are left for the reader to devise):

card = suit*13 + rank.

Suit = card/13

Rank = card%13

Hand Value Analysis

We use a 32-bit integer to represent the value of a hand; it expresses the hand’s relative value and strength. By comparing two hand values, you can tell which hand is stronger.

What we now need is to take a hand and then compute its value.

(1) Set score = 0

(2) Remove the known cards (your hole cards and the face-up cards)

(3) Repeat 1000 times (or more, depending on CPU resources and the desired accuracy)

(4) Shuffle the remaining pack

(5) Deal your opponents’ hole cards and the remaining community cards

(6) Evaluate all the hands and see who has the best one!

(7) If you have the best hand, add 1/(the number of people whose hand value ties yours) (this is usually 1)

(8) End the if block

(9) End the 1000 simulations

(Pictured: the 2011 Baidu Star finals, where that year’s problem was a Texas Hold’em AI)

Pot Odds

Pot odds = the bet you make to call / (your call bet + the total money in the pot)

(A friend of mine from the Baidu club at Huazhong University of Science and Technology also attended the finals. I heard that the first-place winner used an “unreasonable” strategy: he let the other players keep betting while steadfastly holding back, only playing his own cards at the end once nobody had good hands left. This oddball AI tactic worked remarkably well!)

Testing Poker AI

On average, a quick single-table game of Texas Hold’em with typical players takes about 30 minutes. Ideally, you would test by having human players compete against the AI and look for the problems it contains. Unfortunately, because the hands dealt are random, it is easy for a player with sub-par, or even flawed, logic to get lucky cards and win the game. I have found it takes at least ten games to begin to get a clear picture of an AI player’s quality, and more like a hundred games to be really sure of it.

# Programming Poker AI


This article was originally published in the “Inner Product” column in Game Developer Magazine, November 2005

I recently programmed the AI for the World Series of Poker, developed by Left Field Productions and published by Activision. I started out thinking it would be an easy task. But it proved a lot more complex than I initially thought.

This article for the budding poker AI programmer provides a foundation for a simple implementation of No-Limit Texas Holdem Poker AI, covering the basics of hand strength evaluation and betting. By following the recipe set out here, you will quickly become able to implement a reasonably strong poker AI, and have a solid foundation on which to build. I assume you are familiar with the basic terminology of poker.

TEXAS HOLDEM

The goal of any game playing AI is twofold. The primary purpose is to allow the player to have a fun and enjoyable experience. The secondary purpose, subordinate to the first, is to play a strong enough game to provide sufficient challenge to the majority of players in your intended audience.

POKER DATA TYPES

You will need an implementation of the following data types. I’m going to describe them at the bit/byte implementation level, leaving the high level abstraction up to you.

A “suit” is an integer in the range 0..3, where 0=Clubs, 1=Diamonds, 2=Hearts, 3=Spades

A “rank” is an integer in the range 0..12, where 0 = 2 (deuce), 1 = 3, and so on up to 11 = King and 12 = Ace. These are the cards in a suit arranged in rank order.

A “card” is an integer in the range 0..51, hence
card = suit*13 + rank.
Suit = card/13
Rank = card%13

A “Hand” is a 52 bit data type, where each bit represents a single card. This can be stored as four 16 bit words for ease of use, where each 16 bit word represents the potential cards in one suit (using 13 of the 16 bits) (figure 1)

A “Hand Type” is an integer representing the type of poker hand you have, where 0= no pair, 1=pair, 2=two pair, 3=trips, 4=straight, 5=flush, 6=full house, 7=quads, 8=straight flush.
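These data types translate almost directly into code. Here is a minimal sketch (in Python, rather than the C-style implementation the article implies) of the card encoding and the four-words-of-13-bits hand representation; the helper names are mine:

```python
SUITS = ["Clubs", "Diamonds", "Hearts", "Spades"]
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "T", "J", "Q", "K", "A"]

def make_card(suit, rank):
    return suit * 13 + rank      # card = suit*13 + rank, an integer in 0..51

def card_suit(card):
    return card // 13            # Suit = card/13 (integer division)

def card_rank(card):
    return card % 13             # Rank = card%13

def make_hand(cards):
    # A hand as four 13-bit words, one per suit, one bit per rank (figure 1).
    words = [0, 0, 0, 0]
    for card in cards:
        words[card_suit(card)] |= 1 << card_rank(card)
    return words

kh = make_card(2, 11)  # King of Hearts: suit 2, rank 11
print(kh, SUITS[card_suit(kh)], RANKS[card_rank(kh)])  # → 37 Hearts K
```

Storing the hand per suit makes the later flush and straight checks simple mask operations on 13-bit words.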

ENCODING HAND VALUES

A “Hand Value” is a 32 bit integer representing the relative value or strength of any hand of cards. By comparing two hand values, you can see which hand is stronger in a game of poker.
The hand value can conveniently be represented as a series of six 4-bit nibbles, where the most significant nibble represents the Hand Type, and the next five nibbles represent the ranks of the cards in order of their significance to the hand value. (figure 2)

Example 1: AH QD 4S KH 8C is a “no pair” hand type (sometimes called a “high card”, or in this case “Ace high”). So, the hand type nibble is set to 0. The remaining nibbles in the Hand Value are filled out with the ranks of the five cards in descending order. (A, K, Q, 8, 4), which translated into rank indices: 12,11,10,6,2 (or C,B,A,6,2 in hexadecimal), and when combined with the hand type (0) in the high nibble, gives us a 32 bit integer: 0x000CBA62.

The individual suits of the cards are basically ignored in the final hand value. The only time suit is significant is when it contributes to a flush. Also, note the top two nibbles of the Hand Value are always zero.

Example 2: 4D JD 3D 4C AD is a pair of fours, with Ace, Jack, Three kickers. The hand type is a pair, (type 1), then the ranks follow, starting with the rank of the pair, then the ranks of the kickers, so 4,A,J,3, which gives us 0x0012C910.

Example 3: 7C, 6C, 5C, 4C, 3D is a straight (type 4). More specifically, it’s a seven-high straight. The only rank of import here is the seven (rank 5). So the hand value is encoded as 0x00450000. We save ourselves a bunch of instructions by ignoring the four low cards once we’ve determined it is a straight.

Looking at the resultant hand values of the above examples, you can clearly see how the better hands always have a higher hand value, making determining the winning hand a simple comparison.
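The nibble packing used in the three examples can be written out directly. A sketch (the function and constant names are mine):

```python
# Hand types, per the article: 0 = no pair .. 8 = straight flush.
NO_PAIR, PAIR, TWO_PAIR, TRIPS, STRAIGHT, FLUSH, FULL_HOUSE, QUADS, STRAIGHT_FLUSH = range(9)

def hand_value(hand_type, ranks):
    """Pack a hand type plus up to five significant ranks into a 32-bit value.

    ranks are rank indices (0 = deuce .. 12 = ace), most significant first.
    """
    value = hand_type
    ranks = list(ranks) + [0] * (5 - len(ranks))   # pad unused low nibbles
    for r in ranks:
        value = (value << 4) | r
    return value

# Example 1: AH QD 4S KH 8C -> no pair, ranks A K Q 8 4 -> 0x000CBA62
print(hex(hand_value(NO_PAIR, [12, 11, 10, 6, 2])))  # → 0xcba62
# Example 2: 4D JD 3D 4C AD -> pair of fours (rank 2), kickers A J 3 -> 0x0012C910
print(hex(hand_value(PAIR, [2, 12, 9, 1])))          # → 0x12c910
# Example 3: seven-high straight (rank 5) -> 0x00450000
print(hex(hand_value(STRAIGHT, [5])))                # → 0x450000
```

Because the type sits in the highest used nibble and the ranks follow in significance order, a plain integer comparison ranks any two hands correctly.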

CALCULATING HAND VALUES

What we now need is a function that takes a hand, and returns a hand value. This involves determining the hand type, then inserting the nibbles for the hand ranks, as above.

A hand is four words (clubs, diamonds, hearts, spades) of 13 bits each. 13 bits can be arranged in just 8192 combinations, which means we can accelerate the evaluation of a hand by pre-calculating 8K tables of things like the number of bits set in a (13-bit) word (if you have five or more of the same suit, then you’ve got a flush), or the highest card of any straight in the hand. You can also pre-calculate a table of the highest five cards from a particular bit combination, which you can then use to set the kicker cards.

If you calculate ranks = (hearts | diamonds | clubs | spades) then the value ranks is a bit-field with a bit set for every card rank that you have at least one of. The number of bits set here is the number of unique ranks you have. We calculate the number of bits in each of hearts, diamonds, clubs and spades, and subtract the number of bits in the unique ranks, giving the number of duplicated ranks, to be used as the basis of determining what type of hand you have.

Example: if you have 2D AS AH 2C 2H, you can very quickly determine that you have five cards, that there are just two unique ranks, and hence you must have either a full house or four of a kind. A few more simple tests will determine exactly what you have. The entire evaluation function will consist of tests like this, gradually whittling down the possible hand types.

Since the function consists mostly of bitwise operations, table lookups and simple comparisons, it is going to be very fast. It’s also very amenable to fine tuning optimization, and the exact implementation will depend on the target architecture. You may be able to take advantage of some processor specific instructions to greatly improve the efficiency.
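As a concrete sketch of the duplicate-counting trick described above (this is the narrowing step only, not the article's full evaluation function; the function names are mine):

```python
def popcount(x):
    return bin(x).count("1")

def classify_by_duplicates(clubs, diamonds, hearts, spades):
    """Narrow down a five-card hand's type from bit counts alone."""
    ranks = clubs | diamonds | hearts | spades       # one bit per unique rank
    # total cards minus unique ranks = number of duplicated ranks
    dups = sum(map(popcount, (clubs, diamonds, hearts, spades))) - popcount(ranks)
    return {0: "no pair, straight, or flush",
            1: "one pair",
            2: "two pair or trips",
            3: "full house or quads"}[dups]

# 2D AS AH 2C 2H — deuce is rank 0 (bit 0), ace is rank 12 (bit 12)
clubs, diamonds = 1 << 0, 1 << 0
hearts, spades = (1 << 0) | (1 << 12), 1 << 12
print(classify_by_duplicates(clubs, diamonds, hearts, spades))  # → full house or quads
```

A few further tests (does the pair rank appear three times? is there a straight or flush among the no-duplicate hands?) then pin down the exact type.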

CALCULATING HAND STRENGTH

Hand strength is the probability that you will win the hand, given your hole cards, the community cards, and the opponents who remain in the hand. Hand strength is a floating point number between 0.0 (certain loss) and 1.0 (certain win). For example, a HS of 0.33 means you have a 33% chance of winning.

The easiest and most flexible way of calculating the HS is to simulate the progress of the game a very large number of times, and count the number of those times you win. Say you simulate the game 1,000 times and win 423 of them; then you can be quite confident of an approximate HS of 423/1000, or 0.423.

The procedure for simulating a game is very simple:

Create a pack of cards
Set score = 0
Remove the known cards (your hole cards, and any community cards)
Repeat 1000 times (or more, depending on CPU resources and desired accuracy)
Shuffle the remaining pack
Deal your opponent’s hole cards, and the remaining community cards
Evaluate all hands, and see who has the best hands
If you have the best hand then
Add 1/(number of people with the same hand value) to your score (usually 1)
End if
end repeat
Hand Strength = score/number of loops (1000 in this case).
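The procedure translates almost line for line into code. In this sketch the hand evaluator is a deliberately crude high-card stub so the example stays self-contained; you would swap in the full 32-bit hand-value function described earlier. The function names are mine:

```python
import random

def stub_hand_value(hole, board):
    # Deliberately crude placeholder: high card only. Replace with the real
    # 32-bit hand evaluator for correct results.
    return max(card % 13 for card in hole + board)

def hand_strength(my_hole, community, n_opponents, trials=1000, seed=1):
    rng = random.Random(seed)
    known = set(my_hole) | set(community)
    deck = [c for c in range(52) if c not in known]   # remove the known cards
    score = 0.0
    for _ in range(trials):
        rng.shuffle(deck)                             # shuffle the remaining pack
        idx = 0
        opp_holes = []
        for _ in range(n_opponents):                  # deal opponents' hole cards
            opp_holes.append(deck[idx:idx + 2])
            idx += 2
        board = community + deck[idx:idx + 5 - len(community)]
        mine = stub_hand_value(my_hole, board)
        values = [mine] + [stub_hand_value(h, board) for h in opp_holes]
        best = max(values)
        if mine == best:
            score += 1.0 / values.count(best)         # split ties evenly
    return score / trials                             # HS = score / loops

# Pocket aces heads-up, pre-flop (cards 12 and 25 are the aces of clubs and diamonds):
hs = hand_strength([12, 25], [], n_opponents=1)
print(round(hs, 3))
```

Even with the stub evaluator, the structure is right: more trials buys accuracy at the cost of CPU, exactly the trade-off the procedure notes.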

To be more accurate, we have to run our simulation with players dropping out if they are dealt hole cards below a certain threshold. In practice, the determination of whether a player stays in during a simulation is a probabilistic function of the strength of their hole cards, their table position, their stack size, the blind size, and their previous behavior. For now we can just modify the simulation so that, after dealing the opponents’ hole cards, we remove any non-blind players with hole cards worse than, say, a pair of sixes. While not particularly elegant, it will still give you a useful number.

POT ODDS

The pot odds number is the ratio of your bet or call to the size of the pot after you bet (the amount you will win). For example, if the bet is $20 and there is $40 in the pot, then the pot odds are 20/(20+40) = 0.333.

RATE OF RETURN

Rate of return is the “on average” proportion of how much you will multiply your bet by, if you stay in the hand.

Rate of Return = Hand Strength / Pot Odds.

The base strategy we implement is to mostly stay in hands with a rate of return greater than 1.
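Both numbers are one-liners. A sketch using the article's $20-into-$40 example; the 0.5 hand strength in the last line is an arbitrary illustration:

```python
def pot_odds(bet, pot):
    # Your call as a fraction of the pot after you call: bet / (bet + pot).
    return bet / (bet + pot)

def rate_of_return(hand_strength, odds):
    # "On average" multiplier on your bet if you stay in the hand.
    return hand_strength / odds

odds = pot_odds(20, 40)                       # the $20 bet into a $40 pot
print(round(odds, 3))                         # → 0.333
print(round(rate_of_return(0.5, odds), 2))    # → 1.5
```

A rate of return above 1 means the hand is, on average, profitable to continue with, which is exactly the threshold the base strategy uses.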

THE FOLD/CALL/RAISE DECISION

For each round of betting, the computer needs to decide if it is going to fold, call, or raise (the FCR decision). Ignoring for the moment the question of how much to raise, it’s possible, given a Rate of Return (RR), to provide a very simple (yet useful) mapping between RR and FCR.

If RR < 0.8 then 95% fold, 0% call, 5% raise (bluff)
If RR < 1.0 then 80% fold, 5% call, 15% raise (bluff)
If RR < 1.3 then 0% fold, 60% call, 40% raise
Else (RR >= 1.3) 0% fold, 30% call, 70% raise
If fold and amount to call is zero, then call.
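A sketch of that mapping, using the percentages from the table above (in a real game you would vary them by betting round and by AI personality):

```python
import random

def fold_call_raise(rr, to_call, rng=random):
    # Map rate of return to an action via the percentage table in the text.
    roll = rng.random() * 100
    if rr < 0.8:
        action = "fold" if roll < 95 else "raise"                        # 5% bluff
    elif rr < 1.0:
        action = "fold" if roll < 80 else ("call" if roll < 85 else "raise")
    elif rr < 1.3:
        action = "call" if roll < 60 else "raise"
    else:
        action = "call" if roll < 30 else "raise"
    if action == "fold" and to_call == 0:
        return "call"   # never fold when checking is free
    return action

random.seed(7)
print(fold_call_raise(1.5, to_call=100))
```

Because every branch is probabilistic, an observer can never back out the AI's hand strength from a single action, which is the property the next paragraph relies on.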

Don’t pay too much attention to the precise percentages listed above; the numbers will depend on the way you calculate your hand strength, and you’ll want to vary them depending on which betting round you are in. You will also want to vary these numbers to create players with different personalities.

Using this very simple mapping between the RR and the FCR decision can give you a surprisingly reasonable and entertaining player. They will tend to play strong hands, they will occasionally bluff, they won’t scare easily if their hand is good, they will abandon weak hands when raised, and they will stick around given a reasonable chance of a flush or straight draw, making for entertaining gameplay.

The fact that none of the percentages is 100% is also important. That means you can never deduce the hand strength of your AI opponent based on their actions (unless they fold, where the information does not really help you). If they raise, then they could have any kind of hand strength – probably a good one, but it might be the 1 in 20 times when they are bluffing with a very weak hand.

STACK PROTECTION

The simple rules above work well when your stack of chips is large and the blinds are small. However as your stack shrinks and the blinds increase then the amount of money you need to commit to stay in a hand can become a very substantial proportion of your stack. Also, occasionally other players might go “all-in”, betting their entire stack of chips, so we need some logic to prevent the AI from making bad calls when short stacked.

Say you have AD, 2D and the flop is QC, KC, 2C. So you have a pair of twos, but there is a possible flush out there. There is $500 in the pot and the bet is $100 to stay in against two players, but it’s your last $100. The pot odds are 100/600 = 0.1666, your hand strength is 0.297, so your rate of return is about 1.8. So if you could play this situation over and over again, you would make on average an 80% profit each time. However, it’s your last $100, and you have about a 70% chance of losing everything. Don’t make that bet!

To handle this we can use a simple heuristic, along the lines of:

“If my proposed bet will substantially commit my stack, then don’t do it unless I have a strong chance of winning”

which might be implemented in part by:

“if (stack- bet) < (blind * 4) and (HS < 0.5) then fold”

Meaning if the call would leave you with less than four times the big blind, then don’t call unless you have a greater than 50% chance of winning.
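The heuristic is a one-line predicate; the function name is mine:

```python
def protect_stack(stack, bet, big_blind, hand_strength):
    # "If the call would leave me with less than four big blinds,
    #  don't call unless I have a greater than 50% chance of winning."
    return (stack - bet) < (big_blind * 4) and hand_strength < 0.5  # True = fold

# The example above: last $100 with HS 0.297 -> fold, despite the 1.8 rate of return.
print(protect_stack(stack=100, bet=100, big_blind=10, hand_strength=0.297))  # → True
```

With a deep stack the same hand and bet pass the check, and the normal rate-of-return logic takes over.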

Poker is a complex game, with a surprisingly large number of different situations like this that you have to handle somehow. I recommend you have as few special cases as possible, as that reduces the risk of an exploit being introduced into the game via some obscure special case. However, you should anticipate a number of heuristics (rules of thumb) being hard-coded into the AI logic.

TESTING POKER AI

Playing a quick single-table game of Texas Holdem takes around 30 minutes on average with human players. Ideally you would perform your testing by having human players play against the AI and trying to find problems with it. Unfortunately, due to the random hands being dealt, it’s very easy for one player to simply get lucky and win the game with sub-par logic, or even flawed logic. I’ve found it takes at least ten games to begin to get a clear picture of the qualities of an AI player, and more like a hundred games to be really sure. This often creates an unreasonable burden on the testing department, and introduces a very long delay in getting feedback on AI changes.

The solution is automated testing. The AI should be set up so that different variants of the AI can play against each other in a very high-speed set of games. You should also code a few simplistic poker AIs into the mix, such as one that always goes all in, or another that simply always raises with a hand better than a pair of fives. Then you set your AI loose against these opponents, and make sure that it wins the appropriate percentage of games. If you coded your evaluation and simulation appropriately, then you should be able to simulate an entire game in about a second. (You might want to reduce the iterations of the simulation a bit to speed up testing.)

The best use of your human testers is to try to get them to find an exploit of the AI, then you can codify this exploit into a temporary AI opponent to include in your test suite. You can then tweak your AI until it defeats the exploit, while still being able to defeat all the other (standard) opponents.

MORE WORK

What I’ve set out here is just a foundation for poker AI. By following the process laid out here you will get a reasonably strong and entertaining opponent. Here’s a quick list of topics you might want to look into:

• Pre-flop hand strength tables
• Opponent modeling
• Implied odds
• Personality modeling
• Positional play
• Probabilistic search space
• Game theory and Nash equilibrium

RESOURCES:

– Sklansky, David, The Theory of Poker, 1999, Two Plus Two Publishing. Provides various discussions of pot odds, implied odds, etc., with many heuristics that might be useful.
– The University of Alberta Computer Poker Research Group, http://www.cs.ualberta.ca/~games/poker/ – A number of research papers on implementing poker AI.
– Hold’em Killer, Evin Peretz, http://www.holdemkiller.blogspot.com/ – A blog on implementing poker AI.
– Poker-Eval, http://freshmeat.net/projects/poker-eval/ – A GPL-licensed poker hand evaluation library.

# Unsupervised Analytics: Moving Beyond Rules Engines and Learning Models


Rules engines, machine learning models, ID verification, reputation lookups (e.g. email and IP blacklists and whitelists), or unsupervised analytics? I’m often asked which one to use, and whether you should go with only one over the others. Each has a place where it provides value, and you should anticipate incorporating some combination of these fraud solutions, along with solid domain expertise, to build a fraud management system that best accounts for your business, products, and users. With that said, rules engines and learning models are two of the major foundational components of a company’s fraud detection architecture. I’ll explain how they work, discuss the benefits and limitations of each, and highlight the demand for unsupervised analytics that can go beyond rules engines and machine learning to catch new fraud that has yet to be seen.

Rules Engines


How they work

Rules engines separate the operational business logic from the application code, enabling non-engineering fraud domain experts (e.g. Trust & Safety or Risk analysts) with SQL/database knowledge to manage the rules themselves. So what types of rules are effective? A rule can be as straightforward as a few lines of logic: if A and B, then do C. For example:

```
IF (user_email = type_free_email_service)
   AND (comment_character_count ≥ 150 per sec) {
    flag user_account as spammer
    mute comment
}
```

Rules engines can also employ weighted scoring mechanisms. For example, in the table below each rule has a score value, positive or negative, assigned by an analyst. The points for all triggered rules are added together to compute an aggregate score. Rules engines can then drive business operation workflows based on score thresholds. In a typical workflow, there could be three actions to take based on the score range:

1. Above 1000 – Deny (e.g. reject a transaction, suspend the account)
2. Below 300 – Accept (e.g. order is ok, approve the content post)
3. Between 300 and 1000 – Flag for additional review and place into a manual review bin
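As a sketch, the scoring workflow above might look like the following in Python. The rules, scores, and event fields are hypothetical; only the Deny/Accept/Review thresholds of 1000 and 300 come from the workflow above:

```python
# Minimal weighted rules engine sketch (hypothetical rules and event fields).

RULES = [
    # (description, predicate, score)
    ("free email provider", lambda e: e["email_domain"] in {"mailinator.com"},  400),
    ("anonymous proxy IP",  lambda e: e["ip_type"] == "anonymous_proxy",        700),
    ("established account", lambda e: e["account_age_days"] > 365,             -200),
]

def score_event(event):
    """Sum the scores of all rules that trigger on this event."""
    return sum(score for _, pred, score in RULES if pred(event))

def route(event):
    """Map the aggregate score onto the three workflow actions."""
    s = score_event(event)
    if s > 1000:
        return "deny"
    if s < 300:
        return "accept"
    return "review"

event = {"email_domain": "mailinator.com",
         "ip_type": "anonymous_proxy",
         "account_age_days": 2}
print(route(event))  # 400 + 700 = 1100 -> "deny"
```

An analyst tweaking the system changes only the entries in `RULES` and the two thresholds, which is exactly what makes this approach easy to operate at first and costly to maintain at scale.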

Rules engines can take blacklists (e.g. IP addresses) and other negative lists derived from consortium databases as input data. An analyst can add a new rule as soon as he or she encounters a new fraud/risk scenario, so the company benefits from the real-world insights of the analyst on the ground seeing the fraud every day. As a result, rules engines give businesses the control and capability to handle one-off brute-force attacks, seasonality, and short-term emerging trends.

Limitations

Rules engines have limitations when it comes to scale. Fraudsters don’t sit idle after you catch them; they change what they do after learning how you caught them. Thus, the shelf life of a rule can be a couple of weeks, or even as short as a few days, before its effectiveness begins to diminish. Imagine having to add, remove, and update rules and weights every few days when you have hundreds or thousands of rules to run and test. Maintaining this can require huge operational resources and costs.

If a fraud analyst wants to calculate the accept, reject, and review rates for 3 rules, and get the changes in those rates for adjusting each rule down or up by 100 points, that would require 8 test runs: 2^3 = 8 (values^rules). Testing 10 rules with 3 different values each would be 3^10 = 59,049 runs, over 59K! As the number of rules increases, the time to make adjustments grows exponentially.
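The arithmetic is easy to check: the number of configurations to test is the number of candidate values raised to the number of rules.

```python
# Rule-tuning cost grows as values ** rules.
def configurations(values, rules):
    return values ** rules

print(configurations(2, 3))   # 8: each of 3 rules adjusted down or up by 100
print(configurations(3, 10))  # 59049: 3 candidate values across 10 rules
```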

Rules engines don’t automatically learn from analyst observations or feedback. As fraudsters adapt their tactics, businesses can be temporarily exposed to new types of fraud attacks. And since rules engines treat information in a binary fashion and may not detect subtle nuances, this can lead to higher instances of false positives and negative customer experiences.

Learning Models


How they work

Supervised machine learning is the most widely used learning approach when it comes to fraud detection. A few of the learning techniques include decision trees, random forests, nearest neighbors, Support Vector Machines (SVM) and Naive Bayes. Machine learning models often solve complex computations with hundreds of variables (high-dimensional space) in order to accurately determine cases of fraud.

Having a good understanding of both what is and what is not fraud plays a central role in creating models, and the input data influences how effective they are. The models are trained on known cases of fraud and non-fraud (i.e. labeled training data), which enables them to classify new data and cases as either fraudulent or not. Because they can predict the label for new, unlabeled data, trained learning models fill in the gaps and bolster the areas where rules engines may not provide great coverage.

Below is a simplified example of how a supervised machine learning program would classify new data as non-fraud or fraud. Training data informs the model of the characteristics of two types of fraudsters: 1) credit card fraudsters and 2) spammers. Three features are indicative of the type of fraud attack (i.e. the response variable): 1) the email address structure, 2) the IP address type, and 3) the density of linked accounts. Note that in reality a model could have hundreds of features.

The trained model recognizes that a user with:

• an email address that has 5 letters followed by 3 numbers
• using an anonymous proxy
• with a medium density (e.g. 10) of connected accounts

is a credit card fraudster.

It also recognizes that a user with:

• an email address structure with a “dot” pattern
• using an IP address from a datacenter
• with a high density (e.g. 30+) of linked accounts

is a spammer.

Now suppose your model is evaluating new users from the batch of users below. It computes the email address structure, IP address type, and density of linked accounts for each user. If working properly, it will classify the users in Cases 2 and 3 as spammers and the users in Cases 1, 4 and 5 as credit card fraudsters.
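A toy 1-nearest-neighbor version of this classification can be sketched in a few lines of Python. The numeric encodings of the email pattern and IP type below are invented for illustration; a production model would learn from thousands of labeled examples and many more features:

```python
# Toy nearest-neighbor classifier over the three features from the example.
import math

# Feature vector: (email_pattern, ip_type, linked_account_density)
# email_pattern: 0 = "5 letters + 3 numbers", 1 = "dot" pattern
# ip_type:       0 = anonymous proxy,         1 = datacenter
TRAINING = [
    ((0, 0, 10), "credit_card_fraudster"),
    ((0, 0, 12), "credit_card_fraudster"),
    ((1, 1, 30), "spammer"),
    ((1, 1, 35), "spammer"),
]

def classify(x):
    """Label x with the class of its nearest training example (1-NN)."""
    features, label = min(TRAINING, key=lambda t: math.dist(t[0], x))
    return label

print(classify((0, 0, 11)))  # credit_card_fraudster
print(classify((1, 1, 32)))  # spammer
```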

Learning models can digest millions of rows of data at scale, learn from past behaviors, and continually improve their predictions based on new and different data. They can handle unstructured data (e.g. images, email text) and recognize sophisticated fraud patterns automatically, even when the input data set has thousands of features/variables. With learning models, you can also measure effectiveness and improve it by changing only the algorithms or their parameters.

Limitations

Trained learning models, while powerful, have their limitations. What happens if there are no labeled examples of a given type of fraud? Given how quickly fraud evolves, this is not uncommon. After all, fraudsters change schemes and launch new types of attacks around the clock. If we have not encountered an attack pattern before, and therefore lack sufficient training data, a trained learning model may not have the support it needs to return good, reliable results.

As seen in the diagram below, collecting and labeling data is a crucial part of building a learning model, and generating accurate training labels can take weeks to months. Labeling can involve teams of fraud analysts reviewing cases thoroughly, categorizing them with the right fraud tags, and verifying them before they are used as training data. When a new type of fraud emerges, a learning model may not be able to detect it until weeks later, after sufficient data has been acquired to properly train it.

Unsupervised Analytics – Going Beyond Rules Engines and Learning Models

While both of these approaches are critical pieces of a fraud detection architecture, here at DataVisor we take it one step further. DataVisor employs unsupervised analytics, which does not rely on prior knowledge of fraud patterns. In other words, no training data is needed. The core component of the algorithm is unsupervised attack campaign detection, which leverages correlation analysis and graph processing to discover the linkages between fraudulent user behaviors, form clusters, and assign new examples to them.

The unsupervised campaign detection provides the attack campaign group info as well as self-generated training data, both of which can be fed into our machine learning models to bootstrap them. With this data, the supervised machine learning picks up patterns and finds the fraudulent users that don’t fit into these large attack campaign groups. This framework enables DataVisor to uncover fraud attacks perpetrated by individual accounts, as well as organized, mass-scale attacks coordinated among many users such as fraud and crime rings – adding a valuable “full-stack” piece to your fraud detection architecture.

Our correlation analysis groups fraudsters “acting” similarly into the same cluster. In contrast, anomaly detection, another useful technique, finds the set of fraud objects that are considerably dissimilar from the remaining good users. It does this by assuming anomalies do not belong to any group, or belong to small/sparse clusters. See the graph below for anomaly detection, illustrating fraudsters F1 and F3, group F2, and good users G1 and G2. The benefits of unsupervised analytics are on display when comparing it to anomaly detection: while anomaly detection can find outlying fraudsters in a given data set, it struggles to identify large fraud groups.
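One minimal way to illustrate this kind of correlation grouping is to link users that share an attribute value (IP, device, email pattern) and take connected components of the resulting graph as candidate attack campaigns. The users and attributes below are invented, and this is only a sketch of the general idea, not DataVisor’s actual algorithm:

```python
# Group users into candidate campaigns via shared attribute values
# (connected components computed with a small union-find).
from collections import defaultdict

users = {
    "u1": {"ip": "1.2.3.4", "device": "dA"},
    "u2": {"ip": "1.2.3.4", "device": "dB"},   # shares IP with u1
    "u3": {"ip": "5.6.7.8", "device": "dB"},   # shares device with u2
    "u4": {"ip": "9.9.9.9", "device": "dC"},   # unlinked loner / anomaly
}

def campaigns(users):
    """Return user groups (largest first) linked by shared attributes."""
    by_value = defaultdict(list)
    for uid, attrs in users.items():
        for key, val in attrs.items():
            by_value[(key, val)].append(uid)

    parent = {u: u for u in users}
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    def union(a, b):
        parent[find(a)] = find(b)

    for members in by_value.values():
        for other in members[1:]:
            union(members[0], other)

    groups = defaultdict(set)
    for u in users:
        groups[find(u)].add(u)
    return sorted(groups.values(), key=len, reverse=True)

print(campaigns(users))
```

Here u1, u2, and u3 fall into one cluster (a mass-registered campaign would look like this, only much larger), while u4 is the kind of isolated case anomaly detection is better suited to.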

With unsupervised analytics, DataVisor complements rules engines and machine learning models. For customers, the analytics provides a list of the fraudsters and also gives their fraud analysts insights for creating new rules. When DataVisor finds fraud a customer has not encountered previously, the data from the unsupervised campaign detection can serve as early warning signals and/or training data for their learning models, adding new and valuable dimensions to their models’ accuracy.

By focusing on early detection and discovering unknown fraud, DataVisor has helped customers become better and more efficient at solving fraud in a diverse range of areas, such as:

• Identifying fake user registration and account takeovers (ATO)
• Detecting fraudulent financial transactions and activity
• Discovering user acquisition and promotion abuse
• Preventing social spam, fake posts, reviews and likes

Stay tuned for future blog posts where I will address topics such as new online fraud attacks, case review management tools, and a closer look at DataVisor’s fraud detection technology stack. If you want to learn more about how DataVisor can help you fight online fraud, please visit https://datavisor.com/ or schedule a trial.

# Value iteration networks

Value Iteration Networks Tamar et al., NIPS 2016

‘Value Iteration Networks’ won a best paper award at NIPS 2016. It tackles two of the hot issues in reinforcement learning at the moment: incorporating longer range planning into the learned strategies, and improving transfer learning from one problem to another. It’s two for the price of one, as both of these challenges are addressed by an architecture that learns to plan.

In the grid-world domain shown below, a standard reinforcement learning network, trained on several instances of the world, may still have trouble generalizing to a new unseen domain (right-hand image).

(This setup is very similar to the maze replanning challenge in ‘Strategic attentive writer for learning macro actions‘ from the Google DeepMind team that we looked at earlier this year. Both papers were published at the same time).

… as we show in our experiments, while standard CNN-based networks can be easily trained to solve a set of such maps, they do not generalize well to new tasks outside this set, because they do not understand the goal-directed nature of the behavior. This observation suggests that the computation learned by reactive policies is different from planning, which is required to solve a new task.

Planning is not a new problem – the value iteration algorithm based on Markov decision processes (MDP) has been known since 1957! What Tamar et al. do in this work though, is embed a value iteration (VI) planning component inside the overall neural network architecture. And the breakthrough insight is that the VI algorithm itself can be encoded by a specific type of CNN, which means it is differentiable.

By embedding such a VI network module inside a standard feed-forward classification network, we obtain an NN model that can learn the parameters of a planning computation that yields useful predictions. The VI block is differentiable, and the whole network can be trained using standard backpropagation.

It really is pretty cool – you give the network the machinery that can be used for planning, and it figures out all by itself the best way to use it.

Using the approach, Tamar et al. show that value iteration networks (VINs) generalize better to new grid-world scenarios than either CNNs following the DQN architecture, or fully convolutional networks (FCNs):

(Note there is no comparison to the contemporary STRAW architecture from the DeepMind team that also extends DQNs with planning).

Importantly, note that the prediction loss for the reactive policies is comparable to the VINs, although their success rate is significantly worse. This shows that this is not a standard case of overfitting/underfitting of the reactive policies. Rather, VIN policies, by their VI structure, focus prediction errors on less important parts of the trajectory, while reactive policies do not make this distinction, and learn the easily predictable parts of the trajectory yet fail on the complete task.

They also demonstrated planning success using Mars landscape images for Mars Rover navigation, planning in a physical simulation setting, and planning in the WebNav setting which requires navigating links of a web site towards a goal page.

What I’d love to see is how well the VIN architecture performs on the Frostbite Challenge.

Let’s take a closer look at how it all works, starting with the value iteration algorithm itself, then how to encode that in a NN, before finally putting it all together in a complete architecture.

### Standard value iteration

“A standard model for sequential decision making and planning is the Markov Decision Process (MDP).”

You have a set of states $s \in S$, a set of actions $a \in A$, a reward function $R(s,a)$ that gives the anticipated reward for taking action $a$ in state $s$, and a transition kernel $P(s'|s,a)$ that encodes the probability of the next state given the current state and action. A policy $\pi(a|s)$ prescribes the action distribution for each state.

(Note the similarity between this structure and the action matrix of STRAW).

The goal in an MDP is to find a policy that obtains high rewards in the long term.

You can consider the value of a state under some policy as the expected discounted sum of rewards when starting from that state and following the policy. An optimal policy achieves the maximal long-term return possible from a given state. Value iteration computes these values by repeatedly applying the following update ($\gamma \in (0,1)$ is a discount factor):

$$V_{n+1}(s) = \max_a Q_n(s,a), \qquad Q_n(s,a) = R(s,a) + \gamma \sum_{s'} P(s'|s,a)\, V_n(s') \qquad (1)$$

### Encoding value iteration in a neural network

Our starting point is the VI algorithm (1). Our main observation is that each iteration of VI may be seen as passing the previous value function Vn and reward function R through a convolution layer and max-pooling layer. In this analogy, each channel in the convolution layer corresponds to the Q-function for a specific action, and convolution kernel weights correspond to the discounted transition probabilities. Thus by recurrently applying a convolution layer K times, K iterations of VI are effectively performed.

This idea leads to the following network structure:

A reward ‘image’ $\bar{R}$ (to follow the more normal CNN formulation of working with images) is fed into convolutional layer $\bar{Q}$ with $\bar{A}$ channels. Each channel corresponds to $\bar{Q}(\bar{s},\bar{a})$ for an action $\bar{a}$. The layer is max-pooled along the actions channel to produce the next-iteration value function layer. This is stacked with the reward $\bar{R}$ and fed back in K times, to perform K iterations of value iteration.
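A minimal way to see the analogy is on a 1-D grid, where each action channel computes $R + \gamma V$ over a shifted copy of the value map (a shift being a degenerate convolution kernel encoding a deterministic transition), and max-pooling across channels produces the next value function. The grid, reward image, and number of iterations below are illustrative, not taken from the paper:

```python
# VI-module analogy on a 1-D grid: one "channel" per action, max-pool across
# channels. Actions are stay / left / right; walls clamp at the edges.

GAMMA = 0.9
REWARD = [0.0, 0.0, 0.0, 1.0]   # reward "image": goal at the right end

def shift(v, move):
    """Value of the state reached by moving `move` cells (clamped at walls)."""
    n = len(v)
    return [v[min(max(i + move, 0), n - 1)] for i in range(n)]

def vi_module(reward, k=20):
    v = [0.0] * len(reward)
    for _ in range(k):
        # one Q-channel per action: Q_a = R + gamma * (shifted V)
        q = [[r + GAMMA * nv for r, nv in zip(reward, shift(v, move))]
             for move in (0, -1, 1)]
        v = [max(col) for col in zip(*q)]  # max-pool across action channels
    return v

print([round(x, 2) for x in vi_module(REWARD)])
```

After a few iterations the value map decays smoothly with distance from the goal cell, which is exactly the “plan” the attention module can then read off.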

### The full Value Iteration Network model

The value-iteration module we just described can now be embedded into a full value iteration network as follows:

In many systems, if you’re in a given state and you take a given action, the set of possible states you end up in is much smaller than the overall universe of states. More precisely, the states for which $\bar{P}(\bar{s'}|\bar{s},\bar{a}) > 0$ form a small subset of $\bar{S}$.

In NN terminology, this is a form of attention, in the sense that for a given label prediction (action), only a subset of the input features (value function) is relevant. Attention is known to improve learning performance by reducing the effective number of network parameters during learning.

This is the purpose of the attention module added into the feedback loop in the diagram above. With the inclusion of the CNN-based value iteration module, everything in the value iteration network is differentiable:

This allows us to treat the planning module as just another NN, and by back-propagating through it, we can train the whole policy end-to-end.

To implement a VIN, you need to specify the state and action spaces for the planning module ($\bar{S}$ and $\bar{A}$), the reward and transition functions $f_R$ and $f_P$, and the attention function. The authors call this the process of VIN design.

Once a VIN design is chosen, implementing the VIN is straightforward, as it is simply a form of CNN. The networks in our experiments all required only several lines of Theano code.
