Important Questions to Ask During a Machine Learning Sales Pitch
Excelling in fraud prevention requires approaching everything with skepticism. If you’re a top-notch fraud analyst, there’s a good chance you read contracts before signing. You might be a person who reviews restaurant receipts before paying. You take the descriptor “nitpicky” as a compliment.
A healthy dose of skepticism and attention to detail is necessary both to catch fraudsters and to evaluate fraud solution vendors’ claims. Machine learning will revolutionize fraud prevention (spotting fake insurance claims, blocking account takeover, and on and on), but getting the right machine learning solution for your company requires the same approach you take to the rest of your work.
If you endorse a provider’s solution internally, you want to be sure that it will perform correctly. Otherwise, it could be your job on the line. How can you contribute to the search for the best machine learning model for your organization’s needs?
“You will want to test prospective models, but not just to assess performance,” says Frank McKenna, Chief Fraud Strategist at PointPredictive, which provides a machine learning fraud prevention solution to auto loan lenders. “You need to build a business case for the investment. It’s hard to do that if you don’t know how it works.”
McKenna says each prospective client should evaluate how different machine learning models will integrate with their organization’s fraud stack in terms of timing, business processes and other tools. In addition, while machine learning models are really good at predicting risk, some are better than others at explaining how they do so. You have to dig into each model’s decision-making process. Many machine learning models operate as “black boxes.” They fail to explain how they reach their decisions, a potential problem at a time when expectations for transparency are rising around the world.
Let’s burrow further into these criteria for choosing and trusting a machine learning model for online fraud prevention.
Testing in retrospect or in tandem?
Before you concern yourself with a model’s ability to explain its decisions, or consider the effort involved in integration, you want assurance that it will work. For supervised learning models (the vast majority of your options), convention dictates that you test the model on historical data containing instances of confirmed fraud, held out from the data used to train it.
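For readers who want to see what a retrospective test involves, here is a minimal Python sketch. It assumes you have historical transactions labeled with confirmed fraud, along with the score a prospective model returned for each one. The `Transaction` record, `time_split`, and `precision_recall` are hypothetical names chosen for illustration and are not any vendor’s API. The key idea is the time-based split: the model is judged only on transactions after a cutoff, which it should not have seen during training.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    day: date
    is_fraud: bool      # confirmed label, e.g. from a chargeback
    model_score: float  # score the prospective model returned (hypothetical field)


def time_split(txns, cutoff):
    """Split labeled history at `cutoff`: earlier rows are what a model could be
    trained on, later rows form the holdout it is evaluated on."""
    train = [t for t in txns if t.day < cutoff]
    holdout = [t for t in txns if t.day >= cutoff]
    return train, holdout


def precision_recall(txns, threshold):
    """Treat a score at or above `threshold` as 'flag as fraud' and compare
    the flags against the confirmed labels."""
    tp = sum(1 for t in txns if t.model_score >= threshold and t.is_fraud)
    fp = sum(1 for t in txns if t.model_score >= threshold and not t.is_fraud)
    fn = sum(1 for t in txns if t.model_score < threshold and t.is_fraud)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Precision answers “of the orders we would block, how many were fraud?” Recall answers “of the fraud that occurred, how much did we catch?” Because fraud is rare, a held-out period may contain only a handful of confirmed cases, and that is where data volume becomes a concern.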
Providing a large enough data set may be a challenge for some online merchants, according to Ryan Knauber, a data scientist at Viztric, a business analytics consulting firm. “Let’s say as an example that there’s something special about a merchant’s fraud cases occurring in the winter in Belgium,” says Knauber. “Fraud is relatively rare. It’s going to be even more rare that you find fraud under those circumstances.”
Featurespace CTO and co-founder Dave Excell agrees with Knauber. “The performance of machine learning solutions depends on consistently high data quality, otherwise the results won’t be as successful as anticipated,” Excell says. “It’s important to ensure that the historic data used to construct the machine learning model is also representative of the data that is in production.”
Frequently, historical data isn’t properly labeled for a prospective model. That can be a problem. You have to ask yourself whether it’s worth the significant effort to label historical data just to test a model.
Ravelin CMO Gerry Carr offers a compromise: run the prospective solution in parallel with your current fraud prevention solution. “Let the model make its decisions, then compare those against the ones made using your current process,” says Carr. “It takes no more time than a retrospective integration, and it gives more useful results.”
Why are the results more useful? They’re more current. Fraud tactics change quickly. If you’re testing a solution with three-month-old data (the time it might take you to finish formatting historical data), then your fraud landscape might have already begun to change.
This approach will spare you from formatting a mountain of historical data. Save that energy for integration.
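Carr doesn’t describe the mechanics of the comparison, so the following Python sketch shows only one possible way to tally a parallel run. It assumes that each record pairs your current process’s decision with the prospective model’s decision on the same live transaction, plus the fraud label once it is known (`None` while, say, a chargeback is still pending). `compare_decisions` is a hypothetical helper.

```python
from collections import Counter


def compare_decisions(records):
    """Tally a parallel (side-by-side) run.

    records: iterable of (current_decision, candidate_decision, is_fraud), where
    decisions are "approve" or "decline" and is_fraud is True/False once the
    outcome is confirmed, or None if it is still unresolved.
    """
    tally = Counter()
    for current, candidate, is_fraud in records:
        if current == candidate:
            tally["agree"] += 1
        elif is_fraud is None:
            # the two disagree, but the outcome is not known yet
            tally["disagree_unresolved"] += 1
        elif (candidate == "decline") == is_fraud:
            # declining fraud or approving good traffic counts as correct;
            # with two binary decisions that differ, exactly one side is right
            tally["candidate_right"] += 1
        else:
            tally["current_right"] += 1
    return tally
```

Agreements reveal little on their own. The evaluation happens in the disagreements and in which side the eventual labels favor. The unresolved bucket is a reminder that fraud labels arrive late, so a parallel run needs to last long enough for chargebacks to come in.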
Integration complexity before touching the API
Machine learning is fundamentally different from a rules-based engine. Shouldn’t its integration differ, too?
Not really, according to Nethone Chief Product Officer Aleksander Kijek. “Basically, integrating a static rule-based solution is the same as integrating a machine learning engine,” says Kijek. “Of course the machine learning engine can leverage far more data in a quicker manner. Greater complexity will arise from the possibility of adding new data providers for better predictions.”
Once you get past the issue of providing accurately labeled historical data, the next issue is the solution’s application programming interface (API). Is it robust and well-documented? Does the vendor have much experience helping clients with the integration process?
What’s typically missing from a review of an API, says Ravelin’s Carr, is insight into how the vendor will handle your data. What does their data engineering setup look like? How do they extract features (e.g., order values, currencies, card types)? How much interaction is there between the provider and the merchant to make sure everything is correct?
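To make the feature-extraction question concrete, here is a hedged Python sketch of turning one raw order into model inputs. The exchange rates, the card-type list, the field names (`amount`, `currency`, `card_type`, `card_id`, `time`), and the `extract_features` function are all assumptions made for illustration. A vendor’s actual pipeline will differ, which is why it is worth asking how theirs works.

```python
from datetime import datetime, timedelta

# Hypothetical fixed conversion rates; a real pipeline would use dated FX data.
TO_USD = {"USD": 1.0, "EUR": 1.1, "GBP": 1.25}
CARD_TYPES = ("credit", "debit", "prepaid")


def extract_features(order, recent_orders, window=timedelta(hours=1)):
    """Turn one raw order dict into numeric features.

    recent_orders: earlier orders from the same customer account.
    """
    # order value normalized to a single currency so amounts are comparable
    features = {"order_value_usd": order["amount"] * TO_USD[order["currency"]]}
    # one-hot encode the card type so the model sees separate indicators
    for card_type in CARD_TYPES:
        features[f"card_type_{card_type}"] = 1.0 if order["card_type"] == card_type else 0.0
    # velocity: distinct cards this account used within the window, this order included
    cutoff = order["time"] - window
    cards = {o["card_id"] for o in recent_orders if o["time"] >= cutoff}
    cards.add(order["card_id"])
    features["distinct_cards_in_window"] = float(len(cards))
    return features
```

Each of these choices (which rates, which window, how to treat an unknown card type) is a design decision that can shift a model’s accuracy, and that is the kind of detail a review of the API alone won’t reveal.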
You should balance the solution’s ability to engage in wholesale data collection with a holistic view of the needs of your business, according to Nethone’s Kijek. “For example, usually the online transaction process is automated, but chargeback resolution is not,” says Kijek. “Making the system wait for batch updates of chargebacks can restrict timely reactions, degrade the level of feedback, and lower the model’s accuracy.”
Even in the same business, differences between products might require entirely different algorithms with their own uniquely tuned parameters. For example, credit card fraud would require completely different treatment compared with fraud in mortgage applications or auto loans.
“It’s critical that the provider work to deeply understand your business,” says Viztric’s Knauber. “Ask providers how they will tailor their models to your unique business. An on-site integration engineer with PhDs in Statistics and Computer Engineering will be useless without a deep understanding of the context around the problem to be solved.”
Merchants and financial institutions still rely on insights developed by humans to prevent fraud. However, there will come a time when we’ll have to trust automated machines to deliver insights both better and faster than humanly possible. How can we become comfortable with a solution’s decision-making process when it’s inherently opaque?
“Even if somebody can give you a reasonable-sounding explanation [for his or her actions], it probably is incomplete, and the same could very well be true for artificial intelligence,” Jeff Clune, an assistant professor at the University of Wyoming, said in an interview with MIT Technology Review in 2017. “It might just be part of the nature of intelligence that only part of it is exposed to rational explanation. Some of it is just instinctual, or subconscious, or inscrutable.”
Machine learning solutions in the fraud prevention space are no exception.
“There isn’t a simple way to explain a machine’s decision in a way that a human could understand,” says Ravelin’s Carr. “If you wanted complete explainability, you’d sacrifice many of the techniques and processes that make machine learning so valuable. You’d go back to using rules.”
To give humans a glimpse into the black box, some models are designed to surface the driving factors and sub-factors underlying their decisions. For example, a decision may be based upon the user’s network, identity, payment method and velocity. Each of those factors may be influenced by multiple sub-factors.
“We surface factors as a way to get an explanation of what contributed to a score,” says Carr. “If the customer’s network was the main contributing factor, then the analyst can drill in and see the data that led the machine to that decision. Likewise, if multiple payment instruments were used in a short period of time, then the model should surface that as part of its explanation.”
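Vendors don’t necessarily disclose how they compute these factors. As an illustration only, the Python sketch below uses a toy logistic-regression score. In that model, each sub-factor’s contribution to the log-odds is simply its weight times its value. The sketch then groups sub-factors under the top-level factors named above (network, identity, payment method, velocity). The weights, feature names, and `explain` function are invented for the example. Models that are not linear need other attribution techniques (SHAP values are one widely used family), so treat this as a picture of the idea rather than of any product.

```python
import math

# Hypothetical weights for a toy logistic-regression fraud model,
# keyed by (top-level factor, sub-factor).
WEIGHTS = {
    ("network", "shared_device_accounts"): 0.9,
    ("network", "linked_to_confirmed_fraud"): 1.5,
    ("identity", "email_age_days_lt_7"): 0.8,
    ("payment_method", "cards_used_last_hour"): 0.6,
    ("velocity", "orders_last_hour"): 0.4,
}
BIAS = -4.0


def explain(features):
    """Return the fraud probability and factors ranked by contribution.

    Each contribution is weight * value on the log-odds scale; sub-factors
    missing from `features` count as 0.
    """
    contributions = {key: w * features.get(key[1], 0.0) for key, w in WEIGHTS.items()}
    logit = BIAS + sum(contributions.values())
    probability = 1 / (1 + math.exp(-logit))
    # group sub-factor contributions under their top-level factor
    by_factor = {}
    for (factor, sub), value in contributions.items():
        by_factor.setdefault(factor, {})[sub] = value
    # largest total factor first, so an analyst knows where to drill in
    ranked = sorted(by_factor.items(), key=lambda kv: sum(kv[1].values()), reverse=True)
    return probability, ranked
```

For an order from a device shared with two other accounts, linked to confirmed fraud, with three cards used in an hour, network ranks first and payment method second. An analyst can then drill into those sub-factors, as Carr describes.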
The goal is to build trust in the machine’s decision. While a human analyst can’t hope to understand how the model weighs the hundreds of relationships between contributing factors, the model’s reasoning should correlate with the sum of the human’s experience, intuition and sense for probability.
However, in comparing two machine learning models’ performance, it may not be easy to decide which model is more deserving of trust.
“Beware the illusion of perfection in your comparison,” warns Carr. “There might not be a very clear delta. You can get comparability of models’ decisions, but that’s just one deciding factor.”
“Some types of predictions can be better than others,” adds Knauber. “A credit card fraud model might be more accurate in finding fraud online than in the analog world. Performance could be split along other types of similar categories.”
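One way to check for the kind of split Knauber describes is to break each model’s results out by segment instead of relying on one overall number. The Python sketch below uses a hypothetical `rates_by_segment` helper and made-up segment names. It reports two rates per segment at a given threshold: recall (the share of confirmed fraud that gets flagged) and false-positive rate (the share of good transactions that get flagged).

```python
from collections import defaultdict


def rates_by_segment(rows, threshold=0.5):
    """rows: iterable of (segment, is_fraud, score).

    Returns {segment: (recall, false_positive_rate)}; a rate is None when the
    segment has no rows of the kind it is computed over.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for segment, is_fraud, score in rows:
        flagged = score >= threshold
        if is_fraud:
            key = "tp" if flagged else "fn"
        else:
            key = "fp" if flagged else "tn"
        counts[segment][key] += 1
    result = {}
    for segment, c in counts.items():
        recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else None
        fpr = c["fp"] / (c["fp"] + c["tn"]) if c["fp"] + c["tn"] else None
        result[segment] = (recall, fpr)
    return result
```

Running the same function over two candidate models’ scores on the same transactions can show one model ahead online and the other ahead in another channel. That is why a single headline accuracy figure may not produce the clear delta you hope for.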
If there’s no clear winner, then how can you determine which model is better for your organization?
Machine learning choices require more than data
Nethone’s Kijek circles back to decidedly softer, human factors. “The client is mostly interested in performance [and] machine learning is just a tool,” he says. “You can hire a data scientist to check if the provider’s model performs well, but…it’s more important to be comfortable with your vendor and their ability to take you through their model’s decisioning.”
And here we are back at “wetware,” your forte. Be prepared to ask nitpicky questions, starting with the ones above. Get the answers you need until you are certain your choice is the best for your use case and vertical.