What Does Machine Learning Mean for Fraud Prevention?

By now, everyone in fraud prevention has heard the term machine learning bandied about in discussions about how their work is changing. However, the recent hype suggests that some of the people promoting machine learning solutions are either overstating or misrepresenting what the technology currently does and what it will be capable of in the near future.

In this article, we will answer three questions: What is machine learning? What are the practical differences between supervised and unsupervised learning? And when people talk about machine learning as the future of fraud prevention, what do they mean?

What is Machine Learning?

For those with 10 minutes to spare, watch this great video where the CEO of e-commerce solution company Shoppimon explains machine learning in non-technical terms.

For those who prefer to read, machine learning is a subset of artificial intelligence, the field concerned with programming computers to learn like humans and make decisions based on an understanding of large amounts of information. Machine learning itself is the programming of software to create its own methods (i.e. software-writing software) for solving problems based on the available information.

Three major types of machine learning techniques are used to solve problems: classification, correlation and clustering. Classification involves teaching a machine to label elements as belonging to specific categories. The software then uses those categories to solve problems. For example, you could teach a computer that financial transactions with certain data characteristics are fraudulent and those with other characteristics are not. The software would then classify new transactions on its own as either fraudulent or legitimate, using what it learned. In doing so, it creates its own equations to minimize the number of fraudulent transactions it approves while maximizing the number of legitimate transactions it approves.
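
To make classification concrete, the Python sketch below trains a k-nearest-neighbour classifier, one of the simplest classification techniques, on a handful of labelled transactions and then labels new ones. The features (order amount, account age in days, whether the IP country differs from the billing country), the toy data and k=3 are hypothetical assumptions; production systems use many more features, far more data and usually different algorithms. Features are rescaled to a 0 to 1 range so the dollar amount does not swamp the others.

```python
import math
from collections import Counter

# Hypothetical labelled training data: (amount_usd, account_age_days, ip_country_mismatch)
# paired with the label a human analyst assigned.
TRAINING = [
    ((25.0, 900, 0), "legitimate"),
    ((60.0, 400, 0), "legitimate"),
    ((40.0, 1200, 0), "legitimate"),
    ((80.0, 700, 1), "legitimate"),
    ((950.0, 2, 1), "fraudulent"),
    ((700.0, 1, 1), "fraudulent"),
    ((820.0, 5, 0), "fraudulent"),
]

def fit_scaler(rows):
    """Record each feature's min and max so features can be rescaled to [0, 1]."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scale(row, bounds):
    """Min-max scale one feature vector; constant features map to 0."""
    return tuple(
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, (lo, hi) in zip(row, bounds)
    )

def classify(transaction, training, k=3):
    """Label a new transaction by majority vote of its k nearest labelled neighbours."""
    bounds = fit_scaler([features for features, _ in training])
    target = scale(transaction, bounds)
    distances = sorted(
        (math.dist(target, scale(features, bounds)), label)
        for features, label in training
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

print(classify((880.0, 3, 1), TRAINING))    # large order, brand-new account
print(classify((35.0, 1000, 0), TRAINING))  # small order, long-standing account
```

Running it prints `fraudulent` for the large order from a three-day-old account and `legitimate` for the small order from a long-standing account.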

Practically all anti-fraud software solutions based on machine learning use at least classification. When data scientists refer to “supervised learning” in fraud, they are usually talking about the process of training software to classify transactions as fraudulent or not.

Correlation involves using regression analysis to determine which combinations of elements are more or less likely to occur. When regression analysis is used to estimate how strongly different factors indicate a fraudulent transaction, it is also considered a form of supervised learning, because the model is fitted to transactions that have already been labelled as fraudulent or legitimate.
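
Logistic regression is one common form of regression analysis for a yes/no outcome such as fraud. The minimal sketch below, assuming three hypothetical binary risk signals and a tiny made-up dataset, fits the model by plain batch gradient descent and prints one coefficient per signal: a positive coefficient means the signal makes fraud more likely, a negative one less likely. The learning rate and number of epochs are arbitrary choices for this toy data.

```python
import math

# Hypothetical binary risk signals per transaction; label 1 = fraud, 0 = legitimate.
FEATURES = ["new_account", "billing_shipping_mismatch", "returning_customer"]
X = [
    (1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 1, 0),
    (0, 0, 1), (0, 0, 1), (0, 1, 1), (0, 0, 0),
    (1, 0, 1), (0, 0, 1),
]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            error = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

weights, bias = fit_logistic(X, y)
for name, weight in zip(FEATURES, weights):
    print(f"{name}: {weight:+.2f}")
```

On this data, `new_account` comes out positive and `returning_customer` negative. Because the toy data is perfectly separable, the coefficients keep growing the longer it trains, so only their signs and relative sizes are meaningful here; real implementations typically add regularization.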

Clustering involves programming a computer to sort elements into groups without using predefined categories. Because no human has to train the software by pairing inputs with the correct outputs, it is considered a form of unsupervised learning.
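
Clustering can be illustrated with k-means, one widely used clustering algorithm (not necessarily what any given fraud vendor uses). The sketch groups unlabelled accounts, described by two hypothetical features, without being told what the groups mean. Initialising the centroids from the first k points keeps the example deterministic but is a simplification of how k-means is usually seeded.

```python
import math

def kmeans(points, k, iterations=20):
    """Group points into k clusters; no labels are used at any step."""
    centroids = list(points[:k])  # simplistic deterministic initialisation
    clusters = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical, unlabelled accounts: (orders in last 24h, distinct cards used in last 24h).
POINTS = [(1, 1), (2, 1), (1, 1), (2, 2), (14, 6), (15, 7), (13, 6)]
centroids, clusters = kmeans(POINTS, k=2)
print([len(c) for c in clusters])  # [4, 3]
```

The algorithm separates a small group of accounts that behave very differently from the rest, but it cannot say whether that group is fraudulent; a human analyst still has to interpret the clusters.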

Supervised learning vs. unsupervised learning

To gain a more in-depth non-technical understanding of artificial intelligence, check out this 30-minute video where Andrew Ng, the former director of Stanford’s AI Lab, explains the current state of artificial intelligence technology.

For those who prefer to read, most of the value from artificial intelligence technology today, including machine learning solutions in fraud prevention, comes from teaching a computer to take inputs and output a simple answer. Data scientists call this supervised learning. In supervised learning, the computer uses input data paired with labelled outputs to build a function that takes new input data and assigns the correct outputs to it.

Supervised learning requires expert humans to classify the training data into the relevant categories before feeding it to the software. As such, to maintain its predictive power, the software requires talented fraud analysts on staff to spot new trends in fraud and label the transactions. This re-training must be frequent enough to catch fraudsters as they learn from their failures and change their tactics.
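
The retraining point can be sketched with a deliberately trivial stand-in for a real model: it flags any payment channel whose analyst-labelled fraud rate is at least 50%. The labelled records, channel names, dates and the 30-day retraining window are hypothetical. A model trained before fraudsters switched tactics keeps flagging the old channel, while one retrained on the most recent labels catches the new one.

```python
from datetime import date, timedelta

# Hypothetical analyst-labelled history: (date labelled, payment channel, is_fraud).
LABELLED = [
    (date(2024, 1, 10), "gift_card", True),
    (date(2024, 1, 12), "gift_card", True),
    (date(2024, 1, 15), "wallet", False),
    (date(2024, 1, 20), "card", False),
    (date(2024, 3, 2), "wallet", True),   # fraudsters switch tactics
    (date(2024, 3, 5), "wallet", True),
    (date(2024, 3, 8), "gift_card", False),
    (date(2024, 3, 9), "card", False),
]

def train(rows, min_fraud_rate=0.5):
    """Trivial 'model': the set of channels whose labelled fraud rate is high."""
    totals, frauds = {}, {}
    for _, channel, is_fraud in rows:
        totals[channel] = totals.get(channel, 0) + 1
        frauds[channel] = frauds.get(channel, 0) + int(is_fraud)
    return {c for c in totals if frauds[c] / totals[c] >= min_fraud_rate}

def recent(rows, today, days):
    """Keep only labels from the last `days` days."""
    cutoff = today - timedelta(days=days)
    return [r for r in rows if r[0] >= cutoff]

today = date(2024, 3, 10)
stale_model = train([r for r in LABELLED if r[0] < date(2024, 2, 1)])
fresh_model = train(recent(LABELLED, today, days=30))
print(sorted(stale_model))  # ['gift_card']
print(sorted(fresh_model))  # ['wallet']
```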

Unsupervised learning doesn’t rely on training data labelled by expert humans. Instead, the computer software examines a very large quantity of data and tries to find shared characteristics among different elements. In fraud, as in other areas, it can be used to detect anomalies across a large batch of new transactions and highlight them for further review.
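
As a minimal sketch of label-free anomaly detection, the function below flags transaction amounts that sit far from the rest of a batch using the modified z-score (median and median absolute deviation, with the commonly used 0.6745 scaling constant and 3.5 cut-off). This single-feature statistical test is a stand-in chosen for brevity; unsupervised systems in fraud typically look at many features at once, and the batch values here are invented.

```python
from statistics import median

def flag_anomalies(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds the threshold.

    Uses the median and median absolute deviation (MAD), which a single extreme
    value cannot drag around the way it drags a mean and standard deviation.
    Flagged items are candidates for review, not automatic denials.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, a in enumerate(amounts)
        if abs(0.6745 * (a - med) / mad) > threshold
    ]

batch = [42.0, 38.5, 55.0, 47.2, 39.9, 51.3, 44.0, 2400.0, 49.5, 41.1]
print(flag_anomalies(batch))  # [7]
```

Only the 2,400 order is flagged. No labels were needed, which is what makes this unsupervised.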

Benefits of machine learning for fraud prevention

Machine learning fraud prevention solutions enable automated review of massive numbers of transactions based on logic that can be continually updated to match recent fraud trends. This should reduce the amount of manual review required, as humans no longer need to sift through the many transactions that look obviously fraudulent based on their data characteristics. As long as they are regularly retrained, machine learning systems should also increase the number of legitimate transactions approved by reducing reliance on less flexible rules (e.g. no orders from Nigeria or Russia) that block many good orders.

Instead, machine learning enables highly skilled human analysts to focus on the harder cases that take more research or knowledge to determine whether a transaction is valid. It will also increase demand for fraud analysts who can either work with data science teams or handle the training of the software themselves.

Potential drawbacks of machine learning in fraud

The biggest issue with machine learning in fraud prevention is that it requires a large amount of transaction data just to train computers to accurately sift out fraudulent transactions. This is a problem for fraud prevention in all business verticals because fraudulent transactions are always a small fraction of total transactions processed.
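
A little arithmetic shows why volume matters. The sketch below, using hypothetical figures (a target of 1,000 labelled fraud cases and a fraud rate of 20 basis points, i.e. 0.2% of transactions), estimates how many months of traffic a merchant needs before it has seen that many fraud cases. Neither figure is a recommendation; they only illustrate the scale.

```python
def months_to_collect(fraud_cases_wanted, monthly_transactions, fraud_bps):
    """Months of traffic needed to observe a given number of fraud cases.

    fraud_bps is the fraud rate in basis points (1 bp = 0.01% of transactions).
    """
    fraud_per_month = monthly_transactions * fraud_bps / 10_000
    return fraud_cases_wanted / fraud_per_month

# Hypothetical: 1,000 fraud examples wanted, fraud rate of 20 bp (0.2%).
print(months_to_collect(1_000, 20_000, 20))     # small merchant: 25.0
print(months_to_collect(1_000, 5_000_000, 20))  # large merchant: 0.1
```

At 20,000 transactions a month, collecting those examples takes about two years; at 5 million a month, it takes roughly three days.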

Practically speaking, this means it’s difficult for small and medium-sized merchants to build and train their own machine learning solutions in-house. Such merchants would normally be better off using a fraud solution vendor that incorporates machine learning in its automated solution and has a lot of experience working with customers in the merchant’s vertical. This way, the merchant can profit from the vendor’s experience with similar customers. That means that, as a potential customer, you should ask machine learning solution vendors which customers they’ve already served.

The second biggest issue with machine learning in fraud solutions is that as fraudsters’ tactics change, the software must be retrained using fresh data classified by human experts. This means that you cannot get rid of your entire manual review staff. You will need creative, highly skilled fraud prevention experts to discover the latest fraud tactics, or your expensive automated solution will become less and less effective over time. That means you will need to either retain your best fraud analysts or ask your solution vendor whether it assigns outsourced fraud prevention analysts to work on client accounts.

A third issue is ensuring there is enough high-quality data statistically related to transaction validity to fine-tune machine learning solutions. This matters for training supervised learning solutions, but also for ensuring there is enough data to detect anomalies with solutions that use unsupervised learning.

There are several ways to address this. One is to work with a fraud solution provider that serves many large-scale merchants or payment service providers, so it has plenty of data to fine-tune its statistical models. This is why some vendors like to talk about the “network” of merchants they serve.

Another way to ensure your solution gets the data it needs is to use a solution platform that connects to many data feeds. Today, most platforms offer plug-in or application programming interface (API) feeds from multiple suppliers of behavioral, biometric, transaction, IP and people data. This is why, when you talk to platform providers, you must ask not only how much their basic service costs but also which data provider API feeds they offer and at what cost.

Machine learning solutions are data hungry, so you need to find a cost-effective way to feed them or you won’t be getting the optimal results from your system.

Machine learning is fraud prevention’s future

For the past two decades, the biggest, best-known fraud solutions on the market were automated systems that implemented rules devised by humans to maximize the acceptance of good orders while letting through as little fraud as possible. Originally, these systems had a relatively small number of hard-coded rules for accepting and denying orders and passed the rest to manual review. Already in the first decade of the millennium there was a shift to more flexible risk scoring engines, which assign weighted values to a larger number of data points on each transaction to decide whether to accept it, deny it or pass it to manual review. This raised the number of data points factoring into each yes/no decision to dozens: enough for a human to design, but too time-consuming and tiresome for anything but a machine to apply.
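
The difference between a hard-coded rule and a risk scoring engine can be sketched in a few lines. The signal names, weights and accept/deny thresholds below are hypothetical and set by hand, as they would be in a human-designed scoring engine. The hard rule denies every order whose IP country does not match, while the scoring engine lets other signals offset it and sends the uncertain middle band to manual review.

```python
# Hypothetical signals, weights and thresholds, chosen by hand for illustration.
WEIGHTS = {
    "ip_country_mismatch": 30,
    "new_account": 25,
    "high_amount": 20,
    "many_cards_on_device": 35,
    "returning_customer": -40,
}
ACCEPT_BELOW = 30
DENY_AT_OR_ABOVE = 70

def hard_rule(txn):
    """Old-style hard-coded rule: a single signal decides on its own."""
    return "deny" if txn.get("ip_country_mismatch") else "accept"

def risk_score(txn):
    """Add up the weights of every signal present on the transaction."""
    return sum(weight for signal, weight in WEIGHTS.items() if txn.get(signal))

def route(txn):
    """Accept low scores, deny high scores, send the middle band to manual review."""
    score = risk_score(txn)
    if score >= DENY_AT_OR_ABOVE:
        return "deny"
    if score < ACCEPT_BELOW:
        return "accept"
    return "manual_review"

traveller = {"ip_country_mismatch": True, "returning_customer": True}
newcomer = {"ip_country_mismatch": True, "new_account": True, "many_cards_on_device": True}
big_first_order = {"new_account": True, "high_amount": True}
for txn in (traveller, newcomer, big_first_order):
    print(hard_rule(txn), route(txn), risk_score(txn))
```

This prints `deny accept -10` for the travelling returning customer, `deny deny 90` for the new account using many cards, and `accept manual_review 45` for a large first order. A machine learning system would, in effect, learn the weights instead of having an analyst set them.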

Machine learning fraud systems take the next step by, in essence, creating risk scoring engines designed by machines. These systems can incorporate hundreds of relevant data points for a single transaction to determine whether it is fraudulent, something humans would find difficult even to design. Most automated solutions on the market already include some machine learning, although many medium-sized and smaller merchants still use systems that are not based on machine learning. The biggest merchants and most payment service providers have already adopted solutions with machine learning because their transaction volumes are large enough that the benefits far outweigh the drawbacks, and even a few basis points of improvement in transactions accepted can be financially significant. Over time, the advantages of mature machine learning systems will displace risk and rule engines that do not use this technology, even among smaller merchants.

The question of the next decade in fraud prevention is which type or combination of machine learning algorithms works best for different types of merchants, payment service providers and financial institutions. Most solutions on the market today are based partly or entirely on supervised learning and classification algorithms. This makes sense: in fraud prevention, as in the rest of the economy, most of the value in machine learning is being driven by supervised learning. However, a much smaller group of technologically advanced solutions already combine supervised and unsupervised learning to predict fraud and uncover anomalies for further examination.
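
One plausible way to combine the two approaches is to let a supervised model's fraud probability drive accept/deny decisions while an unsupervised anomaly flag routes unfamiliar transactions to an analyst. The function below is a hypothetical sketch under that assumption; the thresholds are arbitrary, and it does not describe how any particular vendor combines the two.

```python
def decide(fraud_probability, is_anomalous, deny_at=0.9, review_at=0.5):
    """Combine a supervised fraud probability with an unsupervised anomaly flag.

    fraud_probability: output of a classifier trained on labelled transactions.
    is_anomalous: output of an anomaly detector that needed no labels.
    """
    if fraud_probability >= deny_at:
        return "deny"  # matches fraud patterns the classifier has already learned
    if fraud_probability >= review_at or is_anomalous:
        return "manual_review"  # uncertain, or unlike anything seen before
    return "accept"

print(decide(0.95, False))  # deny
print(decide(0.10, True))   # manual_review
print(decide(0.10, False))  # accept
```

The anomaly flag matters most in the second case: a transaction the classifier considers low-risk, perhaps because it reflects a tactic nobody has labelled yet, still reaches a human.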

Expect vendors to talk even more about machine learning in the future and be prepared to ask them relevant questions about the pros and cons of the technology they are using.

Author: Ronen Shnidman

Ronen Shnidman is the Head of Content at the chargeback management solution provider Justt. He has years of experience covering fraud and eCommerce both as a marketer and as a journalist. He was also involved in the establishment of About Fraud.