What data does a machine learning engine need to predict chargebacks and fraud?
There is more noise than signal when it comes to machine learning (ML) and its role in fraud detection, or more accurately, fraud prediction. Opinions vary from the deeply cynical to the almost magical, and the net result is a great deal of confusion over what ML is and is not capable of doing - you can read more about this here.
For an introduction to ML and how we use it at Ravelin, listen to my colleague Dr. Eddie Bell’s excellent podcast. In this blog post I’ll attempt to tackle one aspect of ML: what data does it need to be effective, why does it need that data, and what does ‘effective’ mean anyway?
To start at the end: effective means accurate, and accurate means getting most predictions right, most of the time, about which orders and customers are likely to result in chargebacks. To make those predictions accurately we need access to a merchant’s data.
I’ll say straight away that in this blog post we’re simplifying matters. Any merchant will have data specific to their own business, and that’s great - more data is always better than less. However, there are general truths that we can talk about.
One consequence of magical thinking around ML is the belief that the models will somehow come up with an accurate prediction from minimal inputs. Unfortunately, this is not the case. Ravelin requires a reasonable amount of data to make good predictions, and the better the data, the better the response. This is managed through the integration process at the start of an engagement, where the data is consumed through the API.
Ravelin uses a micro-model architecture, which is lots of little discrete models that, in aggregate, combine to give a prediction.
But for clarity we can bundle them into three categories.
The percentages in the diagram are purely indicative of how much each category of a merchant's data would contribute to a prediction and a determination. We can dig into a little more detail on each.
The identity model covers everything a merchant can tell us about the customer on their system: the initial sign-up, email, location, device, timestamps. This can be anything up to 100 attributes, but is usually much less. You can read the API documentation here.
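To make that concrete, here is a minimal sketch of the kind of identity attributes a merchant might send at sign-up. The field names below are illustrative assumptions for this post, not Ravelin's actual API schema.

```python
import json

# Illustrative only: these field names are assumptions for the sketch,
# not Ravelin's actual API schema.
customer_identity = {
    "customerId": "cust-0001",
    "registrationTime": "2019-03-14T09:26:53Z",
    "email": "jane@example.com",
    "country": "GB",
    "device": {"deviceId": "dev-42", "os": "iOS"},
}

# Serialise the attributes as they might appear in an API request body.
payload = json.dumps(customer_identity)
print(payload)
```

Even a handful of fields like these give the identity model something to work with; in practice the attribute count grows with whatever the merchant can supply.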
"Ensemble Machine Learning: It is tempting to see these models as discrete and atomic but important to realise that they are not. These models can predict individually but not as effectively as when they are combined into what is called ensemble models. Combined models are multiples more effective than models working in isolation."
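The ensemble idea above can be sketched in a few lines. Everything here is a toy: the rules, thresholds and weights are invented for illustration and bear no relation to Ravelin's actual models.

```python
# Toy micro-models: each scores a customer independently on a 0-1 scale,
# and a weighted combination yields the ensemble score. All rules and
# weights here are invented for illustration.

def identity_model(customer: dict) -> float:
    # Hypothetical rule: brand-new accounts on free email domains score higher.
    score = 0.0
    if customer.get("account_age_days", 0) < 1:
        score += 0.4
    if customer.get("email", "").endswith("@freemail.example"):
        score += 0.2
    return min(score, 1.0)

def behaviour_model(customer: dict) -> float:
    # Hypothetical rule: a sudden spike in order value raises the score.
    orders = customer.get("order_values", [])
    if len(orders) >= 2 and orders[-1] > 3 * (sum(orders[:-1]) / len(orders[:-1])):
        return 0.7
    return 0.1

def ensemble_score(customer: dict, weights=(0.5, 0.5)) -> float:
    # Weighted average of the individual micro-model scores.
    scores = (identity_model(customer), behaviour_model(customer))
    return sum(w * s for w, s in zip(weights, scores))

risky = {
    "account_age_days": 0,
    "email": "x@freemail.example",
    "order_values": [10.0, 12.0, 90.0],
}
print(round(ensemble_score(risky), 2))
```

The point of the sketch is the shape, not the numbers: each micro-model is weak on its own, but combining their signals gives a more robust score than any single one.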
The behaviour model is the big one. This is everything a customer does, orders, and pays with on the site. For the technically-minded, there is more API documentation to explore here.
This is a rich seam for machine learning fraud detection. It is where we find the most variety in the data types available, but equally where we find the most compelling contributions to fraud prediction accuracy. This can easily reach 200 or so attributes, and within those attributes the models can mine thousands of features.
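To illustrate how a few raw order attributes fan out into many derived features, here is a small sketch. The feature names are assumptions made up for this example, not Ravelin's actual feature set.

```python
from datetime import datetime

# A few raw order records, as a merchant might report them.
orders = [
    {"ts": "2024-01-01T10:00:00", "value": 20.0, "card": "card-a"},
    {"ts": "2024-01-01T10:05:00", "value": 25.0, "card": "card-a"},
    {"ts": "2024-01-01T10:07:00", "value": 180.0, "card": "card-b"},
]

def order_features(orders):
    # Derive several model features from three raw attributes
    # (timestamp, value, payment card). Feature names are illustrative.
    times = [datetime.fromisoformat(o["ts"]) for o in orders]
    values = [o["value"] for o in orders]
    return {
        "order_count": len(orders),
        "distinct_cards": len({o["card"] for o in orders}),
        "mean_value": sum(values) / len(values),
        "max_value": max(values),
        "minutes_span": (max(times) - min(times)).total_seconds() / 60,
    }

print(order_features(orders))
```

Three raw attributes already yield five features here; with 200 or so attributes per order, the combinatorics of velocities, ratios and aggregates quickly reach the thousands.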
“Orders versus Customers: An important point to make here is that while this blog post has focused on a customer-centric view of predictions, it is equally possible to take an order-centric view. That uses slightly different models, and a different dashboard, but we’ve found the prediction results to be very good. Customer history, however, definitely provides a richer resolution in fraud prediction.”
Finally, we have the network model. This is especially well-developed in Ravelin, as it is something we know adds significant marginal gain when combined with the other models. Here we pull in information such as device IDs and location data and quickly map out connections in the data that look highly suspicious. This model is less data-intense, as it pulls from other sources. There are also JS snippets available that pull data from your site and app, making it a very straightforward part of the integration process.
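A toy version of the network idea: customers who share a device ID get linked, and devices shared by unusually many customers look suspicious. The data, names and threshold below are invented for illustration.

```python
from collections import defaultdict

# (customer, device) sightings, as the JS snippet might report them.
events = [
    ("cust-1", "dev-a"),
    ("cust-2", "dev-a"),
    ("cust-3", "dev-a"),
    ("cust-4", "dev-b"),
]

def suspicious_devices(events, threshold=3):
    # Group customers by the device they were seen on, then flag any
    # device used by `threshold` or more distinct customers.
    by_device = defaultdict(set)
    for customer, device in events:
        by_device[device].add(customer)
    return {d: sorted(c) for d, c in by_device.items() if len(c) >= threshold}

print(suspicious_devices(events))
```

Real network models go much further, chaining links across devices, cards and locations, but the principle is the same: shared identifiers connect otherwise unrelated accounts.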
The algorithms that underpin Ravelin are built on the experiences of our existing clients. They evolve constantly and we continually update the models for our clients based on the chargeback data we receive, the review feedback from the client’s analysts, and what we are learning across our client base in general.
We also employ investigations analysts to look into specific anomalies or errors, and in aggregate their findings can result in model adaptations.
In short, Ravelin is a powerful chargeback prediction engine from the get-go. However, for a new client the engine is highly reliant on the quality and quantity of the data fed into it. Where there are gaps, performance simply is not as good.
Now, data quality is a hard thing to define. No-one wants to admit that their baby is ugly, and there are often hard conversations about how and why certain data is missing. This is something the integrations team is very used to. Equally, the detection team at Ravelin is creative about working around gaps and ensuring optimal performance (i.e. recommending the optimal block and acceptance rates for a business).
This is an open, productive and valuable process, worth investing the time and energy to get right. We believe the integration phase is the bedrock of trust in engaging with Ravelin. It’s essential for a successful relationship that the client trusts the predictions that Ravelin makes.
Learn more about machine learning here.
Gerry Carr, CMO