# How AI is helping detect fraud and fight criminals

http://venturebeat.com/2017/02/18/how-ai-is-helping-detect-fraud-and-fight-criminals/

AI is about to go mainstream. It will show up in the connected home, in your car, and everywhere else. While it’s not as glamorous as the sentient beings that turn on us in futuristic theme parks, the use of AI in fraud detection holds major promise. Keeping fraud at bay is an ever-evolving battle in which both sides, good and bad, are adapting as quickly as possible to determine how to best use AI to their advantage.

There are currently three major ways that AI is used to fight fraud, and they correspond to how AI has developed as a field. These are:

1. Rules and reputation lists
2. Supervised machine learning
3. Unsupervised machine learning

## Rules and reputation lists

Rules and reputation lists exist in many modern organizations today to help fight fraud and are akin to “expert systems,” which were first introduced to the AI field in the 1970s. Expert systems are computer programs that encode the rules of domain experts. They’re easy to get up and running and are human-understandable, but they’re also limited by their rigidity and high manual effort.

A “rule” is a human-encoded logical statement used to detect fraudulent accounts and behavior. For example, an institution may put in place a rule that states, “If the account is purchasing an item costing more than $1000, is located in Nigeria, and signed up less than 24 hours ago, block the transaction.” Reputation lists, similarly, are based on what you already know is bad. A reputation list is a list of specific IPs, device types, and other single characteristics, each with a corresponding reputation score. If an account comes from an IP on the bad reputation list, you block it.

While rules and reputation lists are a good first attempt at fraud detection and prevention, they are easily gamed by cybercriminals. Digital services abound these days, and these companies make the sign-up process frictionless, so it takes fraudsters very little time to create dozens, or even thousands, of accounts. They then use these accounts to probe the boundaries of the rules and reputation lists in place. Easy access to cloud hosting services, VPNs, anonymous email services, device emulators, and mobile device flashing makes it easy to come up with unsuspicious attributes that slip past reputation lists.

Since the 1990s, expert systems have fallen out of favor in many domains, losing out to more sophisticated techniques. Clearly, there are better tools at our disposal for fighting fraud. However, a significant number of fraud-fighting teams in modern companies still rely on this rudimentary approach for the majority of their fraud detection, leading to massive human-review overhead, false positives, and sub-optimal detection results.

## Supervised machine learning (SML)

Machine learning is a subfield of AI that attempts to address the rigidity of previous approaches. Rather than encoding what these computer programs should look for, as expert systems do, researchers wanted the machines to learn from data.
Machine learning began to make big strides in the 1990s, and by the 2000s it was being used effectively to fight fraud as well. Applied to fraud, supervised machine learning (SML) represents a big step forward. It’s vastly different from rules and reputation lists because instead of looking at just a few features with simple rules and gates in place, all features are considered together. There’s one downside to this approach: an SML model for fraud detection must be fed historical data to learn what fraudulent accounts and activity look like versus what good accounts and activity look like. The model can then weigh all of the features associated with an account to make a decision, but it can only find fraud that is similar to previous attacks. Many sophisticated modern-day fraudsters are still able to get around these SML models.

That said, SML applied to fraud detection is an active area of development, because there are many SML models and approaches. For instance, applying neural networks to fraud can be very helpful because it automates feature engineering, an otherwise costly step that requires human intervention. This approach can decrease the incidence of false positives and false negatives compared to other SML models, such as SVMs and random forests, since the hidden neurons can encode many more feature combinations than a human could.

## Unsupervised machine learning (UML)

Compared to SML, unsupervised machine learning (UML) has cracked fewer domain problems, and historically it hasn’t been able to help much with fraud detection. Common UML approaches (e.g., k-means and hierarchical clustering, unsupervised neural networks, and principal component analysis) have not been able to achieve good results for fraud detection.
An unsupervised approach to fraud can be difficult to build in-house, since it requires processing billions of events together and there are no effective out-of-the-box unsupervised models. However, some companies have made strides in this area. The reason it can be applied to fraud is the anatomy of most fraud attacks. Normal user behavior is chaotic, but fraudsters work in patterns, whether they realize it or not; they work quickly and at scale. A fraudster isn’t going to try to steal $100,000 in one go from an online service. Rather, they make dozens to thousands of accounts, each of which may yield a profit of a few cents to several dollars. But those activities will inevitably create patterns, and UML can detect them.

The main benefits of using UML are:

• You can catch new attack patterns earlier
• All of the accounts are caught, stopping the fraudster from making any money
• Chance of false positives is much lower, since you collect much more information before making a detection decision
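The pattern-detection idea is easy to illustrate: accounts created by the same fraudster tend to share infrastructure. The sketch below links accounts that share any attribute value and takes connected components. The account data and attribute names are hypothetical, and this is only a simplified flavor of unsupervised correlation analysis, not any vendor's actual algorithm.

```python
from collections import defaultdict

# Hypothetical sign-up records: account -> attributes observed at sign-up.
accounts = {
    "a1": {"ip": "5.5.5.5", "email_domain": "tmail.io", "device": "emu-01"},
    "a2": {"ip": "5.5.5.5", "email_domain": "tmail.io", "device": "emu-02"},
    "a3": {"ip": "5.5.5.6", "email_domain": "tmail.io", "device": "emu-01"},
    "u1": {"ip": "9.1.2.3", "email_domain": "gmail.com", "device": "iphone-7"},
}

# Index accounts by each (attribute, value) pair they share.
by_attr = defaultdict(list)
for acct, attrs in accounts.items():
    for key, value in attrs.items():
        by_attr[(key, value)].append(acct)

# Union-find: merge accounts that share any attribute value.
parent = {a: a for a in accounts}

def find(a):
    while parent[a] != a:
        a = parent[a]
    return a

for group in by_attr.values():
    for other in group[1:]:
        parent[find(other)] = find(group[0])

clusters = defaultdict(set)
for a in accounts:
    clusters[find(a)].add(a)

# Large clusters of near-identical accounts are the fraud-ring signature.
print(sorted(map(sorted, clusters.values())))  # → [['a1', 'a2', 'a3'], ['u1']]
```

Normal users mostly end up in singleton clusters; a ring of mass-registered accounts collapses into one large group that can be blocked together.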

## Putting it all together

Each approach has its own advantages and disadvantages, and you can benefit from each method. Rules and reputation lists can be implemented cheaply and quickly without AI expertise. However, they have to be constantly updated and will only block the most naive fraudsters. SML has become an out-of-the-box technology that can consider all the attributes of a single account or event, but it’s still limited in that it can’t find new attack patterns. UML is the next evolution, as it can find new attack patterns, identify all of the accounts associated with an attack, and provide a full global view. On the other hand, it’s not as effective at stopping individual fraudsters with low-volume attacks and is difficult to implement in-house. Still, it’s certainly promising for companies looking to block large-scale or constantly evolving attacks.

A healthy fraud detection system often employs all three major ways of using AI to fight fraud. When they’re used together properly, it’s possible to benefit from the advantages of each while mitigating the weaknesses of the others.

AI in fraud detection will continue to evolve, well beyond the technologies explored above, and it’s hard to even grasp what the next frontier will look like. One thing we know for sure, though, is that the bad guys will continue to evolve along with it, and the race is on to use AI to detect criminals faster than they can use it to hide.

Catherine Lu is a technical product manager at DataVisor, a full-stack online fraud analytics platform.

# DeepMind Could Bring The Best News Recommendation Engine

Reinforcement Learning, a key Google DeepMind algorithm, could overhaul news recommendation engines and greatly improve user stickiness. After beating a Go grandmaster, the algorithm could become the engine of choice for true personalization.

Demis Hassabis, DeepMind’s founder and CEO, is a great communicator who gives captivating lectures; this one at Oxford University, delivered on February 24th, is among his best. The 40-year-old PhD in Cognitive Neuroscience and Computer Science graduate from MIT and Harvard offers this explanation of his work:

“The core of what we do focuses around what we call Reinforcement Learning. And that’s how we think about intelligence at DeepMind.

[Hassabis then shows the following diagram]

We start with the agent system, the A.I. That agent finds itself in some kind of environment, trying to achieve a goal. In a real-world environment, the agent could be a robot or, in a virtual environment, an avatar.

The agent interacts with the environment in two ways. Firstly, through observations received via its sensory operators. We currently use vision, but we are starting to think about other modalities.

One of the jobs of the agent system is to build the best possible model of the environment out there, based only on these incomplete and noisy observations that [the agent] is receiving in real time. And it keeps updating its model in the face of new evidence.

Once it has built this model, the second job of the agent is to make predictions about what is going to happen next. If you can make predictions, you can start planning what to do. So if the agent is trying to achieve a goal, it will have a set of actions available. The decision-making problem is to pick the action that would be best to take toward the goal.

Once the agent has decided, based on its model and its planned trajectories, it executes actions that may or may not make changes in the environment, and that drives the observations…”

Reinforcement Learning is a highly complex process. First, the observed environment is very noisy, incomplete, and largely made up of unstructured data. When DeepMind decided to tackle basic Atari games like Breakout and Pong, the input was nothing but raw pixels, and the output was predictions — likely target position — and then actions — racket placement. All of the above aimed at maximizing the subsequent reward: survival and score. After a few hundred games, the machine was able to devise, on its own, creative strategies that surprised even its creators (read here, or view this video, time code 10:27).

Over time, the tests will migrate to more complex environments, such as 3D games, in which it becomes harder to distinguish the pattern of a wall from a useful piece of information.

A rather challenging signal-to-noise environment

DeepMind’s future goals involve dealing with very large and complex sets of data, in fields such as genomics, climate, energy, or macroeconomics.

Regardless of the nature of the input stream, the principle is roughly the same. The A.I. system relies on a deep neural network to filter raw sensory data and form meaningful patterns to be analyzed. It then builds an optimized statistical model, updates it in real time, and derives the best possible actions from the set of observations available at a given moment. Then the whole system loops back.
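That observe → model → predict → act loop can be sketched with tabular Q-learning on a toy problem. Everything here (the corridor environment, the learning rate, the exploration rate) is an illustrative stand-in, not DeepMind's deep reinforcement learning system:

```python
import random

# Toy environment: a 5-cell corridor; reward 1.0 for reaching the right end.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # step left or step right

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

# The agent's "model" is a Q-table: its prediction of future reward for
# each (state, action) pair, updated from noisy, incomplete experience.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

rng = random.Random(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        # Act: mostly exploit the current predictions, sometimes explore.
        if rng.random() < 0.2:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, reward, done = step(s, a)
        # Update the model in the face of new evidence (Q-learning rule).
        best_next = max(Q[(nxt, act)] for act in ACTIONS)
        Q[(s, a)] += 0.5 * (reward + 0.9 * best_next - Q[(s, a)])
        s = nxt

# The learned policy should walk straight toward the goal (+1 in every state).
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)
```

The loop is the whole point: observations update the model (the Q-table), the model makes predictions (expected future reward), and the predictions drive the next action.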

How does this connect to improving news production?

Before we get into this, let’s roll back a little bit.

For a news production system, recommending related stories or videos is the most efficient way to increase reader engagement. For media who rely on advertising, sold on CPM or on a per-click basis, raising the number of page views per reader has a direct impact on ad revenue. Paid-for media are less sensitive to page views, but reminding readers of the breadth and depth of an editorial production is a key contributor to a news brand’s status — and a way to underline its economic value.

But there is a problem: today’s news recommendation systems are often terrible.

To display related stories or videos, publishers unwilling to invest in smart, fine-tuned systems have settled for engines based on crude semantic analysis of content. Hence embarrassing situations arise: a piece about a major pedophile cover-up by the French clergy will trigger suggestions about child care, or a story on the state of an intellectual debate will suggest a piece on spelling, or another about waste management. The worst are stories automatically proposed by Outbrain or Taboola and picked up all over the web: not only are they the same everywhere, but they tap into the same endless pool of click-bait items. The only virtue of these two systems is the direct cash-back for the publisher.

Something needs to be done to improve recommendation systems. A.I. and Reinforcement Learning offer a promising path.

In Demis Hassabis’ demonstration, the important words are: Environment, Observations, Models, Predictions, and Actions. Let’s consider these keywords in the context of news production and consumption.

The Environment is dual. The external side is built on the general news cycle. At any given moment, automatically assessing a topic’s weight is reasonably easy, but detecting low-noise signals is much trickier, even though they are critical to predicting how the news cycle will evolve. As for the internal news environment, it is simply the output of the various content produced (or curated) by the newsroom.

Observations are multiple: they include the vast range of available analytics, again at two levels: how a piece of content is faring in general (against the background noise or the competition), and how each individual reader behaves. Here, things become interesting.

Models are then fed with a mix of statistical and behavioral data such as: “Stories [x] containing [semantic footprint] perform well against context [of this type of information].” Or: “Reader #453.09809 is currently interested in [topics], but she has on her radar this [low-noise topic] that is small but growing.”

Predictions detect both the content and the topics that have the best lift in the news cycle and pique the reader’s interest, dynamically and in real time.

Actions will then range from putting stories or videos in front of the broad audience to targeting them, more specifically, at the individual level. Personalization will shift from passive (the system serves stories based on a presumed and generally static reader profile) to dynamic, based on current and predicted interest.

Demis Hassabis makes clear that enhanced personalization is on DeepMind’s roadmap:

“Personalization doesn’t work very well. It currently amounts to averaging the crowd as opposed to adapting to the individual human.”

It would be unrealistic to expect a news outlet to develop such an A.I.-based recommendation engine on its own, but we could easily foresee companies already working on A.I. selling one as SaaS (Software as a Service).

A new generation of powerful recommendation engines could greatly benefit the news industry. It would ensure much higher reader loyalty, reinforce brand trust (it recommends stories that are both good and relevant), and help build deeper and more consistent news packages while giving a new life to archives.

Who will jump on the opportunity? Probably those who are the most prone to invest in tech. I would guess Buzzfeed and the Jeff Bezos-owned Washington Post.

— frederic.filloux@mondaynote.com

# Unsupervised Machine Learning: An Anti-Fraud Approach Beyond Rules Engines and Supervised Machine Learning

Rules Engines

How They Work

IF(user_email=type_free_email_service) AND (comment_character_count ≥ 150 per sec) {

flag user_account as spammer

mute comment

}

1. Above 1000 – Deny (e.g. reject the transaction, suspend the account)
2. Below 300 – Accept (e.g. confirm the order, approve the content)
3. Between 300 and 1000 – Flag for additional review and place into a manual review queue

➜ Advantages

➜ Limitations

➜ How It Works

➜ Advantages

➜ Limitations

DataVisor’s correlation-analysis approach groups together accounts whose fraudulent behavior is similar. Another detection technique, anomaly detection, flags as fraudulent any user who does not match the behavioral profile of good users. It works on the assumption that bad users are isolated individuals or small groups outside the normal population. The chart below shows fraudsters F1 and F3, fraud group F2, and good-user groups G1 and G2. An anomaly-detection model can only find such isolated fraud; it faces great challenges in identifying large-scale group fraud. In this respect, the advantage of unsupervised analysis over anomaly detection is obvious.

DataVisor uses its unsupervised analysis algorithms alongside rules engines and machine learning models. For customers, this comprehensive detection delivers lists of fraudulent accounts while also providing new fraud-detection models and helping customers create new detection rules. When DataVisor’s detection discovers that a customer is facing a new, previously unknown type of fraud, the unsupervised detection can serve as an effective early warning.

# Texas Hold’em Poker AI (translated by Wu Hao and Xiong Bingbing)

Author: Mick West

(The original English article can be found at http://rrurl.cn/lQlN3B)

(Pictured: a calculator that computes the odds of Texas Hold’em hands)

In my view, game AI has two goals:

(1) Give the player a fun and enjoyable experience.

(2) Subordinate to goal (1): on top of that fun and enjoyable experience, give the player a real “challenge,” unlike a game such as Plants vs. Zombies, where the player never feels the slightest frustration.

Poker Data Types

You will need data structures to represent this state information (I did this with bit-flag storage in “Wu Hao Teaches You Texas Hold’em,” Round 15 of Wu Hao’s series on core game algorithms). Below is how some of this information can be stored using bits and bytes (better storage schemes are left for the reader to devise):

card = suit*13 + rank.

Suit = card/13

Rank = card%13

Hand Value Analysis

We use a 32-bit integer to represent the value of a hand; it expresses the hand’s relative value and strength. By comparing two hand values, you can tell which hand is stronger.

What we now need is to take a hand and then compute its value.

(1) Set score = 0

(2) Remove the known cards (your hole cards and the face-up cards)

(3) Repeat 1000 times (or more, depending on CPU resources and the desired accuracy)

(4) Shuffle the remaining pack

(5) Deal your opponents’ hole cards and the remaining community cards

(6) Evaluate all the hands and see who has the best one!

(7) If you have the best hand, add 1/(the number of people whose hand value ties yours) (this is usually 1)

(8) End the if block

(9) End the 1000 simulations

(Pictured: the 2011 Baidu Star finals, where that year’s problem was a Texas Hold’em AI)

Pot Odds

Pot odds = the bet you make to call / (your call bet + the total money in the pot)

(A friend of mine from the Baidu club at Huazhong University of Science and Technology also attended the finals. I heard that the first-place winner used an “unreasonable” strategy: he let the other players keep betting while steadfastly holding back, only playing his own cards at the end once nobody had good hands left. This oddball AI tactic worked remarkably well!)

Testing Poker AI

On average, a quick single-table game of Texas Hold’em with typical players takes about 30 minutes. Ideally, you would test by having human players compete against the AI and look for the problems it contains. Unfortunately, because the hands dealt are random, it is easy for a player with sub-par, or even flawed, logic to get lucky cards and win the game. I have found it takes at least ten games to begin to get a clear picture of an AI player’s quality, and more like a hundred games to be really sure of it.

# Programming Poker AI


This article was originally published in the “Inner Product” column in Game Developer Magazine, November 2005

I recently programmed the AI for the World Series of Poker, developed by Left Field Productions and published by Activision. I started out thinking it would be an easy task. But it proved a lot more complex than I initially thought.

This article for the budding poker AI programmer provides a foundation for a simple implementation of No-Limit Texas Holdem Poker AI, covering the basics of hand strength evaluation and betting. By following the recipe set out here, you will quickly become able to implement a reasonably strong poker AI, and have a solid foundation on which to build. I assume you are familiar with the basic terminology of poker.

TEXAS HOLDEM

The goal of any game playing AI is twofold. The primary purpose is to allow the player to have a fun and enjoyable experience. The secondary purpose, subordinate to the first, is to play a strong enough game to provide sufficient challenge to the majority of players in your intended audience.

POKER DATA TYPES

You will need an implementation of the following data types. I’m going to describe them at the bit/byte implementation level, leaving the high level abstraction up to you.

A “suit” is an integer in the range 0..3, where 0=Clubs, 1=Diamonds, 2=Hearts, 3=Spades

A “rank” is an integer in the range 0..12, where 0 = 2 (deuce), 1 = 3, and so on up to 11 = King and 12 = Ace. These are the cards in a suit arranged in rank order.

A “card” is an integer in the range 0..51, hence
card = suit*13 + rank.
Suit = card/13
Rank = card%13

A “Hand” is a 52 bit data type, where each bit represents a single card. This can be stored as four 16 bit words for ease of use, where each 16 bit word represents the potential cards in one suit (using 13 of the 16 bits) (figure 1)

A “Hand Type” is an integer representing the type of poker hand you have, where 0= no pair, 1=pair, 2=two pair, 3=trips, 4=straight, 5=flush, 6=full house, 7=quads, 8=straight flush.
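These data types translate almost directly into code. Here is a minimal sketch (in Python, rather than the C-style implementation the article implies) of the card encoding and the four-words-of-13-bits hand representation; the helper names are mine:

```python
SUITS = ["Clubs", "Diamonds", "Hearts", "Spades"]
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "T", "J", "Q", "K", "A"]

def make_card(suit, rank):
    return suit * 13 + rank      # card = suit*13 + rank, an integer in 0..51

def card_suit(card):
    return card // 13            # Suit = card/13 (integer division)

def card_rank(card):
    return card % 13             # Rank = card%13

def make_hand(cards):
    # A hand as four 13-bit words, one per suit, one bit per rank (figure 1).
    words = [0, 0, 0, 0]
    for card in cards:
        words[card_suit(card)] |= 1 << card_rank(card)
    return words

kh = make_card(2, 11)  # King of Hearts: suit 2, rank 11
print(kh, SUITS[card_suit(kh)], RANKS[card_rank(kh)])  # → 37 Hearts K
```

Storing the hand per suit makes the later flush and straight checks simple mask operations on 13-bit words.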

ENCODING HAND VALUES

A “Hand Value” is a 32 bit integer representing the relative value or strength of any hand of cards. By comparing two hand values, you can see which hand is stronger in a game of poker.
The hand value can conveniently be represented as a series of six 4-bit nibbles, where the most significant nibble represents the Hand Type, and the next five nibbles represent the ranks of the cards in order of their significance to the hand value. (figure 2)

Example 1: AH QD 4S KH 8C is a “no pair” hand type (sometimes called a “high card”, or in this case “Ace high”). So, the hand type nibble is set to 0. The remaining nibbles in the Hand Value are filled out with the ranks of the five cards in descending order. (A, K, Q, 8, 4), which translated into rank indices: 12,11,10,6,2 (or C,B,A,6,2 in hexadecimal), and when combined with the hand type (0) in the high nibble, gives us a 32 bit integer: 0x000CBA62.

The individual suits of the cards are basically ignored in the final hand value. The only time suit is significant is when it contributes to a flush. Also, note the top two nibbles of the Hand Value are always zero.

Example 2: 4D JD 3D 4C AD is a pair of fours, with Ace, Jack, Three kickers. The hand type is a pair, (type 1), then the ranks follow, starting with the rank of the pair, then the ranks of the kickers, so 4,A,J,3, which gives us 0x0012C910.

Example 3: 7C, 6C, 5C, 4C, 3D is a straight (type 4). More specifically, it’s a seven-high straight. The only rank of import here is the seven (rank 5). So the hand value is encoded as 0x00450000. We save ourselves a bunch of instructions by ignoring the four low cards once we’ve determined it is a straight.

Looking at the resultant hand values of the above examples, you can clearly see how the better hands always have a higher hand value, making determining the winning hand a simple comparison.
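The nibble packing used in the three examples can be written out directly. A sketch (the function and constant names are mine):

```python
# Hand types, per the article: 0 = no pair .. 8 = straight flush.
NO_PAIR, PAIR, TWO_PAIR, TRIPS, STRAIGHT, FLUSH, FULL_HOUSE, QUADS, STRAIGHT_FLUSH = range(9)

def hand_value(hand_type, ranks):
    """Pack a hand type plus up to five significant ranks into a 32-bit value.

    ranks are rank indices (0 = deuce .. 12 = ace), most significant first.
    """
    value = hand_type
    ranks = list(ranks) + [0] * (5 - len(ranks))   # pad unused low nibbles
    for r in ranks:
        value = (value << 4) | r
    return value

# Example 1: AH QD 4S KH 8C -> no pair, ranks A K Q 8 4 -> 0x000CBA62
print(hex(hand_value(NO_PAIR, [12, 11, 10, 6, 2])))  # → 0xcba62
# Example 2: 4D JD 3D 4C AD -> pair of fours (rank 2), kickers A J 3 -> 0x0012C910
print(hex(hand_value(PAIR, [2, 12, 9, 1])))          # → 0x12c910
# Example 3: seven-high straight (rank 5) -> 0x00450000
print(hex(hand_value(STRAIGHT, [5])))                # → 0x450000
```

Because the type sits in the highest used nibble and the ranks follow in significance order, a plain integer comparison ranks any two hands correctly.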

CALCULATING HAND VALUES

What we now need is a function that takes a hand, and returns a hand value. This involves determining the hand type, then inserting the nibbles for the hand ranks, as above.

A hand is four words (clubs, diamonds, hearts, spades) of 13 bits each. 13 bits can be arranged in just 8192 combinations, which means we can accelerate the evaluation of a hand by pre-calculating 8K tables of things like the number of bits set in a (13-bit) word (if you have five or more of the same suit, then you’ve got a flush), or the highest card of any straight in the hand. You can also pre-calculate a table of the highest five cards from a particular bit combination, which you can then use to set the kicker cards.

If you calculate ranks = (hearts | diamonds | clubs | spades) then the value ranks is a bit-field with a bit set for every card rank that you have at least one of. The number of bits set here is the number of unique ranks you have. We calculate the number of bits in each of hearts, diamonds, clubs and spades, and subtract the number of bits in the unique ranks, giving the number of duplicated ranks, to be used as the basis of determining what type of hand you have.

Example: if you have 2D AS AH 2C 2H, you can very quickly determine that you have five cards, that there are just two unique ranks, and hence you must have either a full house or four of a kind. A few more simple tests will determine exactly what you have. The entire evaluation function will consist of tests like this, gradually whittling down the possible hand types.

Since the function consists mostly of bitwise operations, table lookups and simple comparisons, it is going to be very fast. It’s also very amenable to fine tuning optimization, and the exact implementation will depend on the target architecture. You may be able to take advantage of some processor specific instructions to greatly improve the efficiency.
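As a concrete sketch of the duplicate-counting trick described above (this is the narrowing step only, not the article's full evaluation function; the function names are mine):

```python
def popcount(x):
    return bin(x).count("1")

def classify_by_duplicates(clubs, diamonds, hearts, spades):
    """Narrow down a five-card hand's type from bit counts alone."""
    ranks = clubs | diamonds | hearts | spades       # one bit per unique rank
    # total cards minus unique ranks = number of duplicated ranks
    dups = sum(map(popcount, (clubs, diamonds, hearts, spades))) - popcount(ranks)
    return {0: "no pair, straight, or flush",
            1: "one pair",
            2: "two pair or trips",
            3: "full house or quads"}[dups]

# 2D AS AH 2C 2H — deuce is rank 0 (bit 0), ace is rank 12 (bit 12)
clubs, diamonds = 1 << 0, 1 << 0
hearts, spades = (1 << 0) | (1 << 12), 1 << 12
print(classify_by_duplicates(clubs, diamonds, hearts, spades))  # → full house or quads
```

A few further tests (does the pair rank appear three times? is there a straight or flush among the no-duplicate hands?) then pin down the exact type.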

CALCULATING HAND STRENGTH

Hand strength is the probability that you will win the hand, given your hole cards, the community cards, and the opponents who remain in the hand. Hand strength is a floating point number between 0.0 (certain loss) and 1.0 (certain win). For example, a HS of 0.33 means you have a 33% chance of winning.

The easiest and most flexible way of calculating the HS is to simulate the progress of the game a very large number of times, and count the number of those times you win. Say you simulate the game 1,000 times and win 423 of them; then you can be quite confident of an approximate HS of 423/1000, or 0.423.

The procedure for simulating a game is very simple:

Create a pack of cards
Set score = 0
Remove the known cards (your hole cards, and any community cards)
Repeat 1000 times (or more, depending on CPU resources and desired accuracy)
Shuffle the remaining pack
Deal your opponent’s hole cards, and the remaining community cards
Evaluate all hands, and see who has the best hands
If you have the best hand then
Add 1/(number of people with the same hand value) to your score (usually 1)
End if
end repeat
Hand Strength = score/number of loops (1000 in this case).
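The procedure translates almost line for line into code. In this sketch the hand evaluator is a deliberately crude high-card stub so the example stays self-contained; you would swap in the full 32-bit hand-value function described earlier. The function names are mine:

```python
import random

def stub_hand_value(hole, board):
    # Deliberately crude placeholder: high card only. Replace with the real
    # 32-bit hand evaluator for correct results.
    return max(card % 13 for card in hole + board)

def hand_strength(my_hole, community, n_opponents, trials=1000, seed=1):
    rng = random.Random(seed)
    known = set(my_hole) | set(community)
    deck = [c for c in range(52) if c not in known]   # remove the known cards
    score = 0.0
    for _ in range(trials):
        rng.shuffle(deck)                             # shuffle the remaining pack
        idx = 0
        opp_holes = []
        for _ in range(n_opponents):                  # deal opponents' hole cards
            opp_holes.append(deck[idx:idx + 2])
            idx += 2
        board = community + deck[idx:idx + 5 - len(community)]
        mine = stub_hand_value(my_hole, board)
        values = [mine] + [stub_hand_value(h, board) for h in opp_holes]
        best = max(values)
        if mine == best:
            score += 1.0 / values.count(best)         # split ties evenly
    return score / trials                             # HS = score / loops

# Pocket aces heads-up, pre-flop (cards 12 and 25 are the aces of clubs and diamonds):
hs = hand_strength([12, 25], [], n_opponents=1)
print(round(hs, 3))
```

Even with the stub evaluator, the structure is right: more trials buys accuracy at the cost of CPU, exactly the trade-off the procedure notes.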

To be more accurate, we have to run our simulation with players dropping out if they are dealt hole cards below a certain threshold. In practice, the determination of whether a player stays in during a simulation is a probabilistic function of the strength of their hole cards, their table position, their stack size, the blind size, and their previous behavior. For now we can just modify the simulation so that, after dealing the opponents’ hole cards, we remove any non-blind players with hole cards worse than, say, a pair of sixes. While not particularly elegant, it will still give you a useful number.

POT ODDS

The pot odds number is the ratio of your bet or call to the size of the pot after you bet (the amount you will win). For example, if the bet is $20 and there is $40 in the pot, then the pot odds are 20/(20+40) = 0.333.

RATE OF RETURN

Rate of return is the “on average” proportion of how much you will multiply your bet by, if you stay in the hand.

Rate of Return = Hand Strength / Pot Odds.

The base strategy we implement is to mostly stay in hands with a rate of return greater than 1.
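Both numbers are one-liners. A sketch using the article's $20-into-$40 example; the 0.5 hand strength in the last line is an arbitrary illustration:

```python
def pot_odds(bet, pot):
    # Your call as a fraction of the pot after you call: bet / (bet + pot).
    return bet / (bet + pot)

def rate_of_return(hand_strength, odds):
    # "On average" multiplier on your bet if you stay in the hand.
    return hand_strength / odds

odds = pot_odds(20, 40)                       # the $20 bet into a $40 pot
print(round(odds, 3))                         # → 0.333
print(round(rate_of_return(0.5, odds), 2))    # → 1.5
```

A rate of return above 1 means the hand is, on average, profitable to continue with, which is exactly the threshold the base strategy uses.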

THE FOLD/CALL/RAISE DECISION

For each round of betting, the computer needs to decide if it is going to fold, call, or raise (the FCR decision). Ignoring for the moment the question of how much to raise, it’s possible, given a Rate of Return (RR), to provide a very simple (yet useful) mapping between RR and FCR.

If RR < 0.8 then 95% fold, 0% call, 5% raise (bluff)
If RR < 1.0 then 80% fold, 5% call, 15% raise (bluff)
If RR < 1.3 then 0% fold, 60% call, 40% raise
Else (RR >= 1.3) 0% fold, 30% call, 70% raise
If fold and amount to call is zero, then call.
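A sketch of that mapping, using the percentages from the table above (in a real game you would vary them by betting round and by AI personality):

```python
import random

def fold_call_raise(rr, to_call, rng=random):
    # Map rate of return to an action via the percentage table in the text.
    roll = rng.random() * 100
    if rr < 0.8:
        action = "fold" if roll < 95 else "raise"                        # 5% bluff
    elif rr < 1.0:
        action = "fold" if roll < 80 else ("call" if roll < 85 else "raise")
    elif rr < 1.3:
        action = "call" if roll < 60 else "raise"
    else:
        action = "call" if roll < 30 else "raise"
    if action == "fold" and to_call == 0:
        return "call"   # never fold when checking is free
    return action

random.seed(7)
print(fold_call_raise(1.5, to_call=100))
```

Because every branch is probabilistic, an observer can never back out the AI's hand strength from a single action, which is the property the next paragraph relies on.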

Don’t pay too much attention to the precise percentages listed above; the numbers will depend on the way you calculate your hand strength, and you’ll want to vary them depending on which betting round you are in. You will also want to vary these numbers to create players with different personalities.

Using this very simple mapping between the RR and the FCR decision can give you a surprisingly reasonable and entertaining player. They will tend to play strong hands, they will occasionally bluff, they won’t scare easily if their hand is good, they will abandon weak hands when raised, and they will stick around given a reasonable chance of a flush or straight draw, making for entertaining gameplay.

The fact that none of the percentages is 100% is also important. That means you can never deduce the hand strength of your AI opponent based on their actions (unless they fold, where the information does not really help you). If they raise, then they could have any kind of hand strength – probably a good one, but it might be the 1 in 20 times when they are bluffing with a very weak hand.

STACK PROTECTION

The simple rules above work well when your stack of chips is large and the blinds are small. However as your stack shrinks and the blinds increase then the amount of money you need to commit to stay in a hand can become a very substantial proportion of your stack. Also, occasionally other players might go “all-in”, betting their entire stack of chips, so we need some logic to prevent the AI from making bad calls when short stacked.

Say you have AD, 2D and the flop is QC, KC, 2C. So you have a pair of twos, but there is a possible flush out there. There is $500 in the pot and the bet is $100 to stay in against two players, but it’s your last $100. The pot odds are 100/600 = 0.1666, your hand strength is 0.297, so your rate of return is about 1.8. So if you could play this situation over and over again, you would make on average an 80% profit each time. However, it’s your last $100, and you have about a 70% chance of losing everything. Don’t make that bet!

To handle this we can use a simple heuristic, along the lines of:

“If my proposed bet will substantially commit my stack, then don’t do it unless I have a strong chance of winning”

which might be implemented in part by:

“if (stack- bet) < (blind * 4) and (HS < 0.5) then fold”

Meaning if the call would leave you with less than four times the big blind, then don’t call unless you have a greater than 50% chance of winning.
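The heuristic is a one-line predicate; the function name is mine:

```python
def protect_stack(stack, bet, big_blind, hand_strength):
    # "If the call would leave me with less than four big blinds,
    #  don't call unless I have a greater than 50% chance of winning."
    return (stack - bet) < (big_blind * 4) and hand_strength < 0.5  # True = fold

# The example above: last $100 with HS 0.297 -> fold, despite the 1.8 rate of return.
print(protect_stack(stack=100, bet=100, big_blind=10, hand_strength=0.297))  # → True
```

With a deep stack the same hand and bet pass the check, and the normal rate-of-return logic takes over.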

Poker is a complex game, with a surprisingly large number of different situations like this that you have to handle somehow. I recommend you have as few special cases as possible, as that reduces the risk of an exploit being introduced into the game via some obscure special case. However, you should anticipate a number of heuristics (rules of thumb) being hard-coded into the AI logic.

TESTING POKER AI

Playing a quick single-table game of Texas Holdem takes around 30 minutes on average with human players. Ideally you would perform your testing by having human players play against the AI and trying to find problems with it. Unfortunately, due to the random hands being dealt, it’s very easy for one player to simply get lucky and win the game with sub-par logic, or even flawed logic. I’ve found it takes at least ten games to begin to get a clear picture of the qualities of an AI player, and more like a hundred games to be really sure. This often creates an unreasonable burden on the testing department, and introduces a very long delay in getting feedback on AI changes.

The solution is automated testing. The AI should be set up so that different variants of the AI can play against each other in a very high-speed set of games. You should also code a few simplistic poker AIs into the mix, such as one that always goes all in, or another that simply always raises with a hand better than a pair of fives. Then you set your AI loose against these opponents, and make sure that it wins the appropriate percentage of games. If you coded your evaluation and simulation appropriately, then you should be able to simulate an entire game in about a second. (You might want to reduce the iterations of the simulation a bit to speed up testing.)

The best use of your human testers is to try to get them to find an exploit of the AI, then you can codify this exploit into a temporary AI opponent to include in your test suite. You can then tweak your AI until it defeats the exploit, while still being able to defeat all the other (standard) opponents.

MORE WORK

What I’ve set out here is just a foundation for poker AI. By following the process laid out here you will get a reasonably strong and entertaining opponent. Here’s a quick list of topics you might want to look into:

• Pre-flop hand strength tables
• Opponent modeling
• Implied odds
• Personality modeling
• Positional play
• Probabilistic search space
• Game theory and Nash equilibrium

RESOURCES:

– Sklansky, David, The Theory of Poker, 1999, Two Plus Two Publishing. Provides various discussions of pot odds, implied odds, etc., with many heuristics that might be useful.
– The University of Alberta Computer Poker Research Group, http://www.cs.ualberta.ca/~games/poker/ – A number of research papers on implementing poker AI.
– Hold’em Killer, Evin Peretz, http://www.holdemkiller.blogspot.com/ – A blog on implementing poker AI.
– Poker-Eval, http://freshmeat.net/projects/poker-eval/ – A GPL-licensed poker hand evaluation library.

# Unsupervised Analytics: Moving Beyond Rules Engines and Learning Models


Rules engines, machine learning models, ID verification, reputation lookups (e.g. email and IP blacklists and whitelists), or unsupervised analytics? I’m often asked which one to use, and whether you should go with only one over the others. Each has a place where it provides value, and you should anticipate incorporating some combination of these fraud solutions, along with solid domain expertise, to build a fraud management system that best accounts for your business, products, and users. With that said, rules engines and learning models are two of the major foundational components of a company’s fraud detection architecture. I’ll explain how they work, discuss the benefits and limitations of each, and highlight the demand for unsupervised analytics that can go beyond rules engines and machine learning to catch new fraud that has yet to be seen.

Rules Engines


How they work

Rules engines separate the operational business logic from the application code, enabling non-engineering fraud domain experts (e.g. Trust & Safety or Risk analysts) with SQL/database knowledge to manage the rules themselves. So what types of rules are effective? A rule can be as straightforward as a few lines of logic: if A and B, then do C. For example:

```
IF (user_email = type_free_email_service)
   AND (comment_character_count ≥ 150 per sec) {
    flag user_account as spammer
    mute comment
}
```

Rules engines can also employ weighted scoring mechanisms. For example, in the table below each rule has a score value, positive or negative, assigned by an analyst. The points for all triggered rules are added together to compute an aggregate score. Rules engines can then drive business operation workflows based on score thresholds. In a typical workflow, there could be three actions to take based on the score range:

1. Above 1000 – Deny (e.g. reject a transaction, suspend the account)
2. Below 300 – Accept (e.g. order is ok, approve the content post)
3. Between 300 and 1000 – Flag for additional review and place into a manual review bin
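As a sketch, the scoring workflow above might look like the following in Python. The rules, scores, and event fields are hypothetical; only the Deny/Accept/Review thresholds of 1000 and 300 come from the workflow above:

```python
# Minimal weighted rules engine sketch (hypothetical rules and event fields).

RULES = [
    # (description, predicate, score)
    ("free email provider", lambda e: e["email_domain"] in {"mailinator.com"},  400),
    ("anonymous proxy IP",  lambda e: e["ip_type"] == "anonymous_proxy",        700),
    ("established account", lambda e: e["account_age_days"] > 365,             -200),
]

def score_event(event):
    """Sum the scores of all rules that trigger on this event."""
    return sum(score for _, pred, score in RULES if pred(event))

def route(event):
    """Map the aggregate score onto the three workflow actions."""
    s = score_event(event)
    if s > 1000:
        return "deny"
    if s < 300:
        return "accept"
    return "review"

event = {"email_domain": "mailinator.com",
         "ip_type": "anonymous_proxy",
         "account_age_days": 2}
print(route(event))  # 400 + 700 = 1100 -> "deny"
```

An analyst tweaking the system changes only the entries in `RULES` and the two thresholds, which is exactly what makes this approach easy to operate at first and costly to maintain at scale.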

Rules engines can take blacklists (e.g. IP addresses) and other negative lists derived from consortium databases as input data. An analyst can add a new rule as soon as he or she encounters a new fraud/risk scenario, so the company benefits from the real-world insights of the analyst on the ground seeing the fraud every day. As a result, rules engines give businesses the control and capability to handle one-off brute-force attacks, seasonality, and short-term emerging trends.

Limitations

Rules engines have limitations when it comes to scale. Fraudsters don’t sit idle after you catch them; they change what they do after learning how you caught them. Thus, the shelf life of a rule can be a couple of weeks, or even as short as a few days, before its effectiveness begins to diminish. Imagine having to add, remove, and update rules and weights every few days when you have hundreds or thousands of rules to run and test. Maintaining this can require huge operational resources and costs.

If a fraud analyst wants to calculate the accept, reject, and review rates for 3 rules, and get the changes in those rates for adjusting each rule down or up by 100 points, that would require 8 test runs: 2^3 = 8 (values^rules). Testing 10 rules with 3 different values each would be 3^10 = 59,049 runs, over 59K! As the number of rules increases, the time to make adjustments grows exponentially.
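The arithmetic is easy to check: the number of configurations to test is the number of candidate values raised to the number of rules.

```python
# Rule-tuning cost grows as values ** rules.
def configurations(values, rules):
    return values ** rules

print(configurations(2, 3))   # 8: each of 3 rules adjusted down or up by 100
print(configurations(3, 10))  # 59049: 3 candidate values across 10 rules
```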

Rules engines don’t automatically learn from analyst observations or feedback. As fraudsters adapt their tactics, businesses can be temporarily exposed to new types of fraud attacks. And since rules engines treat information in a binary fashion and may not detect subtle nuances, this can lead to higher instances of false positives and negative customer experiences.

Learning Models


How they work

Supervised machine learning is the most widely used learning approach when it comes to fraud detection. A few of the learning techniques include decision trees, random forests, nearest neighbors, Support Vector Machines (SVM) and Naive Bayes. Machine learning models often solve complex computations with hundreds of variables (high-dimensional space) in order to accurately determine cases of fraud.

Having a good understanding of both what is and what is not fraud plays a central role in creating models, and the input data influences how effective they are. The models are trained on known cases of fraud and non-fraud (i.e. labeled training data), which enables them to classify new data and cases as either fraudulent or not. Because they can predict the label for new, unlabeled data, trained learning models fill in the gaps and bolster the areas where rules engines may not provide great coverage.

Below is a simplified example of how a supervised machine learning program would classify new data as non-fraud or fraud. Training data informs the model of the characteristics of two types of fraudsters: 1) credit card fraudsters and 2) spammers. Three features are indicative of the type of fraud attack (i.e. the response variable): 1) the email address structure, 2) the IP address type, and 3) the density of linked accounts. Note that in reality a model could have hundreds of features.

The trained model recognizes that a user with:

• an email address that has 5 letters followed by 3 numbers
• using an anonymous proxy
• with a medium density (e.g. 10) of connected accounts

is a credit card fraudster.

It also recognizes that a user with:

• an email address structure with a “dot” pattern
• using an IP address from a datacenter
• with a high density (e.g. 30+) of linked accounts

is a spammer.

Now suppose your model is evaluating new users from the batch of users below. It computes the email address structure, IP address type, and density of linked accounts for each user. If working properly, it will classify the users in Cases 2 and 3 as spammers and the users in Cases 1, 4 and 5 as credit card fraudsters.
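A toy 1-nearest-neighbor version of this classification can be sketched in a few lines of Python. The numeric encodings of the email pattern and IP type below are invented for illustration; a production model would learn from thousands of labeled examples and many more features:

```python
# Toy nearest-neighbor classifier over the three features from the example.
import math

# Feature vector: (email_pattern, ip_type, linked_account_density)
# email_pattern: 0 = "5 letters + 3 numbers", 1 = "dot" pattern
# ip_type:       0 = anonymous proxy,         1 = datacenter
TRAINING = [
    ((0, 0, 10), "credit_card_fraudster"),
    ((0, 0, 12), "credit_card_fraudster"),
    ((1, 1, 30), "spammer"),
    ((1, 1, 35), "spammer"),
]

def classify(x):
    """Label x with the class of its nearest training example (1-NN)."""
    features, label = min(TRAINING, key=lambda t: math.dist(t[0], x))
    return label

print(classify((0, 0, 11)))  # credit_card_fraudster
print(classify((1, 1, 32)))  # spammer
```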

Learning models can digest millions of rows of data at scale, learn from past behaviors, and continually improve their predictions based on new and different data. They can handle unstructured data (e.g. images, email text) and recognize sophisticated fraud patterns automatically, even when the input data set has thousands of features/variables. With learning models, you can also measure effectiveness and improve it by changing only the algorithms or their parameters.

Limitations

Trained learning models, while powerful, have their limitations. What happens if there are no labeled examples of a given type of fraud? Given how quickly fraud evolves, this is not uncommon. After all, fraudsters change schemes and launch new types of attacks around the clock. If we have not encountered an attack pattern before, and therefore lack sufficient training data, a trained learning model may not have the support it needs to return good, reliable results.

As seen in the diagram below, collecting and labeling data is a crucial part of building a learning model, and generating accurate training labels can take weeks to months. Labeling can involve teams of fraud analysts reviewing cases thoroughly, categorizing them with the right fraud tags, and verifying them before they are used as training data. When a new type of fraud emerges, a learning model may not be able to detect it until weeks later, after sufficient data has been acquired to properly train it.

Unsupervised Analytics – Going Beyond Rules Engines and Learning Models

While both of these approaches are critical pieces of a fraud detection architecture, here at DataVisor we take it one step further. DataVisor employs unsupervised analytics, which does not rely on prior knowledge of fraud patterns. In other words, no training data is needed. The core component of the algorithm is unsupervised attack campaign detection, which leverages correlation analysis and graph processing to discover the linkages between fraudulent user behaviors, form clusters, and assign new examples to them.

The unsupervised campaign detection provides the attack campaign group info as well as self-generated training data, both of which can be fed into our machine learning models to bootstrap them. With this data, the supervised machine learning picks up patterns and finds the fraudulent users that don’t fit into these large attack campaign groups. This framework enables DataVisor to uncover fraud attacks perpetrated by individual accounts, as well as organized, mass-scale attacks coordinated among many users such as fraud and crime rings – adding a valuable “full-stack” piece to your fraud detection architecture.

Our correlation analysis groups fraudsters “acting” similarly into the same cluster. In contrast, anomaly detection, another useful technique, finds the set of fraud objects that are considerably dissimilar from the remaining good users. It does this by assuming anomalies do not belong to any group, or belong to small/sparse clusters. See the graph below for anomaly detection, illustrating fraudsters F1 and F3, group F2, and good users G1 and G2. The benefits of unsupervised analytics are on display when comparing it to anomaly detection: while anomaly detection can find outlying fraudsters in a given data set, it struggles to identify large fraud groups.
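One minimal way to illustrate this kind of correlation grouping is to link users that share an attribute value (IP, device, email pattern) and take connected components of the resulting graph as candidate attack campaigns. The users and attributes below are invented, and this is only a sketch of the general idea, not DataVisor’s actual algorithm:

```python
# Group users into candidate campaigns via shared attribute values
# (connected components computed with a small union-find).
from collections import defaultdict

users = {
    "u1": {"ip": "1.2.3.4", "device": "dA"},
    "u2": {"ip": "1.2.3.4", "device": "dB"},   # shares IP with u1
    "u3": {"ip": "5.6.7.8", "device": "dB"},   # shares device with u2
    "u4": {"ip": "9.9.9.9", "device": "dC"},   # unlinked loner / anomaly
}

def campaigns(users):
    """Return user groups (largest first) linked by shared attributes."""
    by_value = defaultdict(list)
    for uid, attrs in users.items():
        for key, val in attrs.items():
            by_value[(key, val)].append(uid)

    parent = {u: u for u in users}
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    def union(a, b):
        parent[find(a)] = find(b)

    for members in by_value.values():
        for other in members[1:]:
            union(members[0], other)

    groups = defaultdict(set)
    for u in users:
        groups[find(u)].add(u)
    return sorted(groups.values(), key=len, reverse=True)

print(campaigns(users))
```

Here u1, u2, and u3 fall into one cluster (a mass-registered campaign would look like this, only much larger), while u4 is the kind of isolated case anomaly detection is better suited to.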

With unsupervised analytics, DataVisor complements rules engines and machine learning models. For customers, the analytics provides a list of the fraudsters and also gives their fraud analysts insights for creating new rules. When DataVisor finds fraud a customer has not encountered previously, the data from the unsupervised campaign detection can serve as early warning signals and/or training data for their learning models, adding new and valuable dimensions to their models’ accuracy.

By focusing on early detection and discovering unknown fraud, DataVisor has helped customers become better and more efficient at solving fraud in a diverse range of areas, such as:

• Identifying fake user registration and account takeovers (ATO)
• Detecting fraudulent financial transactions and activity
• Discovering user acquisition and promotion abuse
• Preventing social spam, fake posts, reviews and likes

Stay tuned for future blog posts where I will address topics such as new online fraud attacks, case review management tools, and a closer look at DataVisor’s fraud detection technology stack. If you want to learn more about how DataVisor can help you fight online fraud, please visit https://datavisor.com/ or schedule a trial.

# Value iteration networks

Value Iteration Networks Tamar et al., NIPS 2016

‘Value Iteration Networks’ won a best paper award at NIPS 2016. It tackles two of the hot issues in reinforcement learning at the moment: incorporating longer range planning into the learned strategies, and improving transfer learning from one problem to another. It’s two for the price of one, as both of these challenges are addressed by an architecture that learns to plan.

In the grid-world domain shown below, a standard reinforcement learning network, trained on several instances of the world, may still have trouble generalizing to a new unseen domain (right-hand image).

(This setup is very similar to the maze replanning challenge in ‘Strategic attentive writer for learning macro actions‘ from the Google DeepMind team that we looked at earlier this year. Both papers were published at the same time).

… as we show in our experiments, while standard CNN-based networks can be easily trained to solve a set of such maps, they do not generalize well to new tasks outside this set, because they do not understand the goal-directed nature of the behavior. This observation suggests that the computation learned by reactive policies is different from planning, which is required to solve a new task.

Planning is not a new problem – the value iteration algorithm based on Markov decision processes (MDP) has been known since 1957! What Tamar et al. do in this work though, is embed a value iteration (VI) planning component inside the overall neural network architecture. And the breakthrough insight is that the VI algorithm itself can be encoded by a specific type of CNN, which means it is differentiable.

By embedding such a VI network module inside a standard feed-forward classification network, we obtain an NN model that can learn the parameters of a planning computation that yields useful predictions. The VI block is differentiable, and the whole network can be trained using standard backpropagation.

It really is pretty cool – you give the network the machinery that can be used for planning, and it figures out all by itself the best way to use it.

Using the approach, Tamar et al. show that value iteration networks (VINs) generalize better to new grid-world scenarios than either CNNs following the DQN architecture, or fully convolutional networks (FCNs):

(Note there is no comparison to the contemporary STRAW architecture from the DeepMind team that also extends DQNs with planning).

Importantly, note that the prediction loss for the reactive policies is comparable to the VINs, although their success rate is significantly worse. This shows that this is not a standard case of overfitting/underfitting of the reactive policies. Rather, VIN policies, by their VI structure, focus prediction errors on less important parts of the trajectory, while reactive policies do not make this distinction, and learn the easily predictable parts of the trajectory yet fail on the complete task.

They also demonstrated planning success using Mars landscape images for Mars Rover navigation, planning in a physical simulation setting, and planning in the WebNav setting which requires navigating links of a web site towards a goal page.

What I’d love to see is how well the VIN architecture performs on the Frostbite Challenge.

Let’s take a closer look at how it all works, starting with the value iteration algorithm itself, then how to encode that in a NN, before finally putting it all together in a complete architecture.

### Standard value iteration

“A standard model for sequential decision making and planning is the Markov Decision Process (MDP).”

You have a set of states $s \in S$, a set of actions $a \in A$, a reward function $R(s,a)$ that gives the anticipated reward for taking action $a$ in state $s$, and a transition kernel $P(s'|s,a)$ that encodes the probability of the next state given the current state and action. A policy $\pi(a|s)$ prescribes the action distribution for each state.

(Note the similarity between this structure and the action matrix of STRAW).

The goal in an MDP is to find a policy that obtains high rewards in the long term.

You can consider the value of a state under some policy as the expected discounted sum of rewards when starting from that state and following the policy. An optimal policy achieves the maximal long-term return possible from a given state. Value iteration computes these values by repeatedly applying the following update ($\gamma \in (0,1)$ is a discount factor):

$$V_{n+1}(s) = \max_a Q_n(s,a), \qquad Q_n(s,a) = R(s,a) + \gamma \sum_{s'} P(s'|s,a)\, V_n(s') \qquad (1)$$

### Encoding value iteration in a neural network

Our starting point is the VI algorithm (1). Our main observation is that each iteration of VI may be seen as passing the previous value function Vn and reward function R through a convolution layer and max-pooling layer. In this analogy, each channel in the convolution layer corresponds to the Q-function for a specific action, and convolution kernel weights correspond to the discounted transition probabilities. Thus by recurrently applying a convolution layer K times, K iterations of VI are effectively performed.

This idea leads to the following network structure:

A reward ‘image’ $\bar{R}$ (to follow the more normal CNN formulation of working with images) is fed into convolutional layer $\bar{Q}$ with $\bar{A}$ channels. Each channel corresponds to $\bar{Q}(\bar{s},\bar{a})$ for an action $\bar{a}$. The layer is max-pooled along the actions channel to produce the next-iteration value function layer. This is stacked with the reward $\bar{R}$ and fed back in K times, to perform K iterations of value iteration.
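A minimal way to see the analogy is on a 1-D grid, where each action channel computes $R + \gamma V$ over a shifted copy of the value map (a shift being a degenerate convolution kernel encoding a deterministic transition), and max-pooling across channels produces the next value function. The grid, reward image, and number of iterations below are illustrative, not taken from the paper:

```python
# VI-module analogy on a 1-D grid: one "channel" per action, max-pool across
# channels. Actions are stay / left / right; walls clamp at the edges.

GAMMA = 0.9
REWARD = [0.0, 0.0, 0.0, 1.0]   # reward "image": goal at the right end

def shift(v, move):
    """Value of the state reached by moving `move` cells (clamped at walls)."""
    n = len(v)
    return [v[min(max(i + move, 0), n - 1)] for i in range(n)]

def vi_module(reward, k=20):
    v = [0.0] * len(reward)
    for _ in range(k):
        # one Q-channel per action: Q_a = R + gamma * (shifted V)
        q = [[r + GAMMA * nv for r, nv in zip(reward, shift(v, move))]
             for move in (0, -1, 1)]
        v = [max(col) for col in zip(*q)]  # max-pool across action channels
    return v

print([round(x, 2) for x in vi_module(REWARD)])
```

After a few iterations the value map decays smoothly with distance from the goal cell, which is exactly the “plan” the attention module can then read off.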

### The full Value Iteration Network model

The value-iteration module we just described can now be embedded into a full value iteration network as follows:

In many systems, if you’re in a given state and you take a given action, the set of possible states you end up in is much smaller than the overall universe of states. More precisely, the states for which $\bar{P}(\bar{s'}|\bar{s},\bar{a}) > 0$ form a small subset of $\bar{S}$.

In NN terminology, this is a form of attention, in the sense that for a given label prediction (action), only a subset of the input features (value function) is relevant. Attention is known to improve learning performance by reducing the effective number of network parameters during learning.

This is the purpose of the attention module added into the feedback loop in the diagram above. With the inclusion of the CNN-based value iteration module, everything in the value iteration network is differentiable:

This allows us to treat the planning module as just another NN, and by back-propagating through it, we can train the whole policy end-to-end.

To implement a VIN, you need to specify the state and action spaces for the planning module ($\bar{S}$ and $\bar{A}$), the reward and transition functions $f_R$ and $f_P$, and the attention function. The authors call this the process of VIN design.

Once a VIN design is chosen, implementing the VIN is straightforward, as it is simply a form of CNN. The networks in our experiments all required only several lines of Theano code.
