
ML model building: Here's how Ravelin builds machine learning models

A machine learning model makes predictions – and Ravelin builds models that predict fraud. So how do we do it?


Today, we're going to take a close look at how Ravelin custom-builds AI-native fraud detection for online companies that want to accept more payments with confidence and serve more customers safely.

Let’s imagine we’re building a machine learning model to detect fraud for a food delivery business. Our fictional business is called DeliverDinner.

When DeliverDinner joins Ravelin as a new client, they start to send live transaction traffic to our API.

Every time a customer registers, adds an item to their basket, or does anything on the DeliverDinner website, a JSON request is sent to the Ravelin API. This means we store a lot of data about DeliverDinner customers and everything they’ve ever done in their account. We bundle these events into customer profiles.
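To make that concrete, here's a rough sketch of the kind of JSON event a merchant might send when a customer places an order. The field names and structure below are invented for illustration – they are not Ravelin's actual API schema.

```python
import json

# Hypothetical example of the kind of JSON event a merchant might send
# when a customer places an order. Field names are invented for
# illustration only; they are not Ravelin's actual API schema.
order_event = {
    "customerId": "cust-1234",
    "eventType": "ORDER_PLACED",
    "timestamp": "2024-05-01T19:32:00Z",
    "order": {
        "items": [{"name": "Margherita pizza", "price": 1050, "currency": "GBP"}],
        "deliveryLocation": {"lat": 51.5072, "lon": -0.1276},
    },
    "paymentMethod": {"cardBin": "411111", "cardLast4": "1111"},
}

print(json.dumps(order_event, indent=2))
```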

To use this data for machine learning, we need to do three things:

  1. Label the customers as fraudulent/not fraudulent
  2. Describe the customers in computer language
  3. Train the model

Step 1: Assign labels


We look at any customer who has had a fraudulent chargeback or who has been manually reviewed as fraudulent by the merchant and label them as fraud. But we also label good customers as such. The goal is to have a wealth of examples of what a good customer looks like for this company, and also what a fraudulent customer looks like – be they an opportunistic first-party fraudster, a professional cybercriminal, or anywhere in between.
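As a rough illustration, labelling could look something like the sketch below, assuming each customer profile records chargebacks and manual review outcomes. The field and function names here are hypothetical, not Ravelin's internal code.

```python
# A minimal labelling sketch: a customer is labelled as fraud if they have a
# fraudulent chargeback or a manual review marked them as fraudulent.
def label_customer(customer: dict) -> int:
    """Return 1 for fraud, 0 for genuine."""
    if customer.get("has_fraud_chargeback"):
        return 1
    if customer.get("manual_review_outcome") == "fraud":
        return 1
    return 0

customers = [
    {"customer_id": "c1", "has_fraud_chargeback": True},
    {"customer_id": "c2", "manual_review_outcome": "genuine"},
]
labels = [label_customer(c) for c in customers]  # -> [1, 0]
```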

Step 2: Create features

Creating features means describing each customer in a way that the computer can understand. We are using maths and data for fraud prevention: in other words, we use them to describe the characteristics of a customer which indicate whether they are likely to be fraudulent or genuine.

This is based on the same aspects that a fraud analyst would look at to make the decision.

Some very simple examples of features which could be good indicators of fraud are:

  • Order rate: Fraudsters order at a much more rapid pace; we quantify this as the number of orders per week.
  • Email: Fraudsters might have a dodgy-looking email – for instance, we may quantify this as the percentage of digits in the email address.
  • Delivery location: It could be somewhere typically genuine and unlikely to be fraudulent, like a penthouse apartment, or somewhere that suggests a "drop location", such as a park. We quantify this as the fraud rate (%) for that location.
  • Card velocity: The number of different cards used or attempted to be used by a customer within a reasonable amount of time can also be a fraud signal.

There are, of course, many more elaborate features – yet the idea remains the same: each one supports the overall calculation of how likely a customer is to be a fraudster or abuser.
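As an illustration, here is a minimal sketch of how features like the ones above could be computed as numbers from a customer profile. The field names are hypothetical and the calculations are simplified.

```python
# Turning a customer profile into numeric features (hypothetical field names).
def extract_features(customer: dict) -> dict:
    orders = customer.get("orders", [])
    email = customer.get("email", "")
    weeks_active = max(customer.get("weeks_since_registration", 1), 1)
    return {
        "orders_per_week": len(orders) / weeks_active,
        "email_digit_fraction": sum(ch.isdigit() for ch in email) / max(len(email), 1),
        "delivery_location_fraud_rate": customer.get("location_fraud_rate", 0.0),
        "cards_used_last_7_days": customer.get("distinct_cards_week", 0),
    }

example = {
    "orders": [{}, {}, {}],          # three orders...
    "weeks_since_registration": 1,   # ...in the first week: a high order rate
    "email": "x9918273@example.com",
    "location_fraud_rate": 0.22,
    "distinct_cards_week": 4,
}
print(extract_features(example))
```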

All features are expressed as numbers, as the model can’t absorb raw text. We build up our features and categorize them into groups. We call these groups megafamilies – and we surface them on the Dashboard as well, to help our merchants see which aspects of a customer's presence are unusual and might indicate fraud.
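Purely as an illustration, a megafamily grouping could look something like this (the group names other than Email are made up for the example):

```python
# Illustrative only: grouping numeric features into "megafamilies" so that
# contributions can be summarised per group.
features = {
    "orders_per_week": 3.0,
    "email_digit_fraction": 0.35,
    "delivery_location_fraud_rate": 0.22,
    "cards_used_last_7_days": 4,
}

MEGAFAMILIES = {
    "Order behaviour": ["orders_per_week"],
    "Email": ["email_digit_fraction"],
    "Location": ["delivery_location_fraud_rate"],
    "Payment": ["cards_used_last_7_days"],
}

by_family = {
    family: {name: features[name] for name in names}
    for family, names in MEGAFAMILIES.items()
}
print(by_family)
```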

Step 3: Train the model


Next, we need to feed the algorithm data so that it can learn how to solve the problem. At this stage, we feed in the training data.

The training data is a set of DeliverDinner customers, each described in terms of their features and labelled so that the algorithm knows whether they are a fraudster or a genuine customer. This is how the model learns to tell the difference between genuine and fraudulent customers.

Within DeliverDinner’s dataset, this might show, for instance, that genuine customers tend to order around once a week, tend to use the same card each time, and that the billing and delivery address are often the same. Fraudsters, by contrast, might order several times a week, use lots of different cards, have cards that failed registration, and have billing and delivery addresses that rarely match.
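As a toy illustration of the training step – the post doesn't say which algorithm Ravelin uses, so a gradient-boosted tree classifier from scikit-learn stands in here – training on labelled feature rows might look like this:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Each row is one customer's features; each label is 1 (fraud) or 0 (genuine).
X = [
    # orders/week, email digit fraction, location fraud rate, cards last week
    [1.0, 0.00, 0.01, 1],   # genuine: weekly order, same card
    [0.8, 0.05, 0.02, 1],   # genuine
    [5.0, 0.40, 0.22, 4],   # fraud: rapid orders, many cards
    [7.0, 0.35, 0.30, 6],   # fraud
]
y = [0, 0, 1, 1]

model = GradientBoostingClassifier().fit(X, y)
```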

The algorithm will take this at face value, and learn the perfect way to check if a given customer's behaviour and characteristics look more like those in the genuine customer pile or the fraudulent customer pile.

When we show the model a new customer it hasn’t seen before, it compares it with the genuine and fraudulent customers it has seen before and produces a fraud score – a recommendation. Most of the features used to calculate this recommendation are unique to DeliverDinner. However, also taken into account are consortium features, which look at the characteristics of fraudsters across Ravelin's 340+ merchants.
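Continuing the toy example from the training step, scoring a new customer could look like this: the model's predicted probability of fraud is scaled up to a 0–100 score. The scaling shown here is an assumption for illustration, not Ravelin's actual scoring code.

```python
# Scoring a previously unseen customer with the toy model trained above.
new_customer = [[6.0, 0.45, 0.25, 5]]            # features for the new customer
fraud_probability = model.predict_proba(new_customer)[0][1]
fraud_score = round(fraud_probability * 100)     # probability scaled to 0-100
print(fraud_score)
```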

This score represents how likely the new customer is to be fraudulent. On the Ravelin Dashboard, DeliverDinner's fraud analysts can see in detail which families of features contributed to a recommendation, including subfamilies. Below, for instance, the Consortium megafamily contributed 36 of the 98 points, with 33 of those points coming from the Email consortium contributor.

[Image: Ravelin Dashboard score breakdown showing how megafamilies and subfamilies contribute to the fraud score]

Allow, review or prevent – and how do you decide this?

For the majority of DeliverDinner's customers, the fraud score will be quite low, as there are many more genuine customers than fraudsters.

  • When the score is low, Ravelin recommends allowing the customer and the transaction to go through.
  • When the score is medium, we recommend reviewing the transaction – for example, by sending the customer a 3D Secure challenge to authenticate.
  • When the score is very high, we recommend blocking the customer from making the transaction.
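In code, that decision logic amounts to a simple threshold check. The numbers below are illustrative only – every merchant tunes their own thresholds:

```python
# Map a 0-100 fraud score to a recommendation using illustrative thresholds.
ALLOW_BELOW = 50
PREVENT_AT_OR_ABOVE = 80

def recommend(score: int) -> str:
    if score < ALLOW_BELOW:
        return "ALLOW"
    if score >= PREVENT_AT_OR_ABOVE:
        return "PREVENT"
    return "REVIEW"   # e.g. step up to a 3D Secure challenge

print(recommend(12), recommend(65), recommend(97))  # ALLOW REVIEW PREVENT
```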

Setting the right Allow/Review/Prevent thresholds comes down to the trade-off between precision and recall, as is customary in machine learning.

Precision asks: Of all the prevented customers, what proportion were fraudsters?

Recall asks: Of all the fraudsters, what proportion did we prevent?
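Here's a small worked example of both, computed over a handful of hypothetical scored customers at a prevent threshold of 80:

```python
# Toy precision/recall calculation: scores are model outputs (0-100),
# labels are 1 for fraudster, 0 for genuine.
scores = [12, 40, 67, 88, 93, 97]
labels = [0,  0,  1,  0,  1,  1]
threshold = 80

prevented = [s >= threshold for s in scores]
true_positives = sum(p and l for p, l in zip(prevented, labels))
false_positives = sum(p and not l for p, l in zip(prevented, labels))
false_negatives = sum((not p) and l for p, l in zip(prevented, labels))

precision = true_positives / (true_positives + false_positives)   # 2/3 ~ 0.67
recall = true_positives / (true_positives + false_negatives)      # 2/3 ~ 0.67
```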

If your prevent threshold is at 95, you’re blocking a very small percentage of customers. You’d have very high precision – you’re only blocking a few customers that you’re fairly sure are fraudsters. This means you'll have a very low false-positive rate. However, recall is likely to be low as there are likely to be fraudsters with scores under 95 which you’re not blocking.

Let's look at the opposite situation. If you have a block threshold of 5, you’re preventing a huge proportion of your traffic, so you’re likely to have very poor precision – and will probably end up with lots of false positives. You will have high recall, because you’re going to block most, if not all, of the fraudsters.

Of course, those are exaggerated numbers – in reality, most fraud managers would not block everyone with a score over 5, nor would they allow through everyone with a score below 95.

But there's a balancing act between the two. Where you set your thresholds depends on your individual business priorities. It’s easy to tweak these depending on your risk appetite, your current goals, or whether you are more concerned about chargebacks or false positives.

Sometimes, fraud managers think about fraud detection in terms of "accuracy". Yet, because AI-native fraud protection such as Ravelin's is based on sophisticated machine learning algorithms, understanding precision and recall and setting risk thresholds is key to assessing the efficiency and success of ML models, and to making sure they are always improving.

Ravelin builds custom fraud prevention for each of our merchants – which involves several models per merchant, constantly improving to ensure we continue to provide the best possible results.


Let AI-native fraud detection power your growth

Find out how Ravelin leverages artificial intelligence to allow merchants to accept more payments and more customers with confidence.