What’s the difference between artificial intelligence and machine learning?
Artificial intelligence (AI) has been in sci-fi movies for nearly 100 years, in films like RoboCop, The Matrix, Star Wars and The Avengers. In reality, today’s AI is not quite the same as it’s portrayed in the movies… yet. But there is some truth in the idea that AI is about computers acting with human-like intelligence.
Machine learning is a subset of AI, and the key difference is the ‘learning’. With machine learning, we are able to give a computer a large amount of information and it can learn how to make decisions about the data, similar to the way a human does.
Machine learning has many uses in our everyday lives - for example email spam detection, image recognition and product recommendations eg. for Netflix subscribers.
Deep learning is a subset of machine learning. The key advantage deep learning gives is the ability to create flexible models for specific tasks (like fraud detection). With traditional machine learning, we couldn’t create bespoke models as easily - we’ll explain why this is so important later on.
Machine learning is a set of methods and techniques that let computers recognise patterns and trends and generate predictions based on those patterns.
Old school fraud detection
Traditionally, businesses relied on rules alone to block fraudulent payments. Rules are still an important part of the anti-fraud toolkit today, but using them on their own causes some issues.
Using lots of rules tends to result in a high number of false positives - meaning you’re likely to block a lot of genuine customers. For example, high-value orders and orders from high-risk locations are more likely to be fraudulent. But if you enable a rule which blocks all transactions over $500 or every payment from a risky region, you’ll lose out on lots of genuine customers’ business too.
The thresholds for fraudy behaviour can change over time - if your prices change, the average order value can go up, meaning that orders over $500 become the norm, and so rules can become invalid. Rules are also based on absolute yes/no answers, so they don’t allow you to adjust the outcome or judge where a payment sits on the risk scale.
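To make the false-positive problem concrete, here is a minimal sketch of a hard yes/no rule. The order data and the `blocks_order` helper are made up for illustration; only the $500 limit comes from the example above.

```python
# Hypothetical orders: (order_id, amount_usd, is_fraud). Illustrative data only.
ORDERS = [
    ("a1", 120.0, False),
    ("a2", 650.0, False),  # genuine high-value order
    ("a3", 900.0, True),
    ("a4", 510.0, False),  # genuine, just over the limit
    ("a5", 80.0, True),    # low-value fraud the rule cannot see
]

AMOUNT_LIMIT_USD = 500.0


def blocks_order(amount_usd, limit=AMOUNT_LIMIT_USD):
    """A yes/no rule: block everything over the limit."""
    return amount_usd > limit


blocked = [order for order in ORDERS if blocks_order(order[1])]
false_positives = [order[0] for order in blocked if not order[2]]
missed_fraud = [order[0] for order in ORDERS
                if order[2] and not blocks_order(order[1])]

print(false_positives)  # ['a2', 'a4']
print(missed_fraud)     # ['a5']
```

The rule has no middle ground: two genuine customers are turned away, and the one fraudulent order under the limit passes untouched.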
Inefficient and hard to scale
Using a rules-only approach means that your library must keep expanding as fraud evolves. This makes the system slower and puts a heavy maintenance burden on your fraud analyst team, demanding increasing numbers of manual reviews. Fraudsters are always working on smarter, faster and more stealthy ways to commit fraud online. Today, criminals use sophisticated methods to steal enhanced customer data and impersonate genuine customers, making it even more difficult for rules based on typical fraud accounts to detect this kind of behaviour.
Rules and machine learning are complementary tools for fraud detection
Although machine learning has delivered a huge upgrade to fraud detection systems, it doesn’t mean you should give up using rules completely. Your anti-fraud strategy should still include some rules where it makes sense, and also incorporate the benefits of machine learning technology.
Why is machine learning suited to fraud detection?
When it comes to fraud decisions, you need results FAST! Research shows that the longer a buyer’s journey takes the less likely they are to complete checkout.
Machine learning is like having several teams of analysts running hundreds of thousands of queries and comparing the outcomes to find the best result - this is all done in real-time and only takes milliseconds.
As well as making real-time decisions, machine learning is assessing individual customer behaviour as it happens. It’s constantly analyzing ‘normal’ customer activity, so when it spots an anomaly it can automatically block or flag a payment for analyst review.
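As a rough illustration of spotting an anomaly against a customer’s own ‘normal’, the sketch below uses a simple z-score on order values. This is a deliberately basic statistical stand-in rather than the technique a production system would use, and the function name, the threshold of 3 and the sample values are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, new_value, z_threshold=3.0):
    """Flag new_value if it sits far outside the customer's own history."""
    if len(history) < 2:
        return False  # not enough history to judge what is normal
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return new_value != mu
    return abs(new_value - mu) / sigma > z_threshold


# Made-up order values for one customer.
usual_order_values = [22.0, 25.0, 19.5, 24.0, 21.0]

print(is_anomalous(usual_order_values, 23.0))   # False
print(is_anomalous(usual_order_values, 400.0))  # True
```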
Every online business wants to increase its transaction volume. With a rules only system, increasing amounts of payment and customer data puts more pressure on the rules library to expand. But with machine learning it’s the opposite - the more data the better.
Machine learning systems improve with larger datasets because this gives the system more examples of good and bad eg. genuine and fraudulent customers. This means the model can pick out the differences and similarities between behaviors more quickly and use this to predict fraud in future transactions.
Efficient (and cheap!)
Remember that machine learning is like having several teams running analysis on hundreds of thousands of payments per second. The human cost of this would be immense - the cost of machine learning is just the cost of the servers running.
Machine learning does all the dirty work of data analysis in a fraction of the time it would take for even 100 fraud analysts. Unlike humans, machines can perform repetitive, tedious tasks 24/7 and only need to escalate decisions to a human when specific insight is needed.
In the same way, machine learning can often be more effective than humans at uncovering non-intuitive patterns or subtle trends which might only be obvious to a fraud analyst much later.
Machine learning models are able to learn from patterns of normal behavior. They are very fast to adapt to changes in that normal behaviour and can quickly identify patterns of fraud transactions.
This means that the model can identify suspicious customers even when there hasn’t been a chargeback yet. For example, a neural network can look at suspicious signals such as how many pages a customer browses before making an order, or whether they are copying and pasting information or resizing their windows, and flag the customer for review.
How does a machine learning system work?
We use a few different forms of machine learning at Ravelin - here’s a simple explanation of how a supervised machine learning system works. Listen to this podcast to hear more detail about the process.
How a machine learning system works:
When it comes to fraud detection, the more data the better.
For supervised machine learning, the data must be labelled as good (genuine customers who have never committed fraud) or bad (customers who have a chargeback associated with them or who have been manually labelled as fraudsters).
Features describe customer behaviour, and fraudulent behaviours are known as fraud signals.
At Ravelin, we group features into five main categories, each of which has hundreds or thousands of individual features:
- Number of digits in the customer’s email address, age of their account, number of devices customer was seen on, fraud rate of customer’s IP address.
- Number of orders they made in their first week, number of failed transactions, average order value, risky basket contents.
- Fraud rate of issuing bank, similarity between customer name and billing name, cards from different countries.
- Shipping address matches the billing address, shipping country matches country of customer’s IP address, fraud rate at customer’s location.
- Number of emails, phone numbers or payment methods shared within a network, age of the customer’s network.
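To show what turning raw customer data into features might look like, here is a small sketch that computes a handful of the features named above. The `Customer` record, its field names and the `extract_features` helper are hypothetical; a real system would compute hundreds or thousands of features.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Customer:
    # Hypothetical raw record; field names are assumptions for illustration.
    email: str
    account_created: date
    order_values: List[float]
    failed_transactions: int
    shipping_country: str
    billing_country: str


def extract_features(c, today):
    """Turn a raw customer record into a flat dictionary of features."""
    orders = c.order_values
    return {
        "email_digit_count": sum(ch.isdigit() for ch in c.email),
        "account_age_days": (today - c.account_created).days,
        "failed_transactions": c.failed_transactions,
        "average_order_value": sum(orders) / len(orders) if orders else 0.0,
        "shipping_matches_billing": c.shipping_country == c.billing_country,
    }


customer = Customer(
    email="jo83921@example.com",
    account_created=date(2024, 1, 1),
    order_values=[40.0, 60.0],
    failed_transactions=3,
    shipping_country="GB",
    billing_country="FR",
)
features = extract_features(customer, today=date(2024, 1, 11))
print(features["email_digit_count"], features["account_age_days"])  # 5 10
```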
An algorithm is a set of rules to be followed when solving complex problems, like a mathematical equation or even a recipe. The algorithm uses customer data described by our features to learn how to make predictions eg. fraud/not fraud.
In the beginning, we’ll train the algorithm on an online seller’s own historical data. We call this a training set. The more fraud in this training set the better, so that the machine has lots of examples to learn from.
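The idea of learning from a labelled training set can be sketched in a few lines. The example below trains a tiny logistic regression with gradient descent in plain Python. This is a much simpler algorithm than the neural networks described later, swapped in only because it fits in a short example, and the two features and all of the training rows are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=2000, learning_rate=0.1):
    """Fit weights so that sigmoid(bias + w.x) approximates P(fraud)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss with respect to z
            weights = [w - learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict_proba(model, x):
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Made-up training set: [failed_transactions, new_cards_last_hour]
TRAINING_ROWS = [[0, 0], [0, 1], [1, 0], [1, 1],   # genuine customers
                 [4, 3], [5, 4], [3, 5], [6, 2]]   # fraudsters
TRAINING_LABELS = [0, 0, 0, 0, 1, 1, 1, 1]         # 0 = good, 1 = bad

model = train_logistic(TRAINING_ROWS, TRAINING_LABELS)
print(predict_proba(model, [0, 0]) < 0.5)  # True: looks genuine
print(predict_proba(model, [5, 4]) > 0.5)  # True: looks fraudulent
```

The more labelled examples of each kind the training set contains, the better the fitted weights separate the two groups.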
Create a model
When training is complete you have a model specific to your business, which can detect fraud in milliseconds.
We constantly keep an eye on the model to make sure it is behaving as it should, and we’re always looking for ways to improve it. We regularly improve, update and upload a new model for every client so that the system will always detect the latest fraud techniques.
Deep neural networks and a micro-model architecture
At Ravelin, we use deep neural networks - a neural network is a machine learning model architecture loosely inspired by the structure of biological brains. A neural network mimics how the human brain observes patterns. The key benefit of a neural network is the ability to create a flexible, bespoke model for an individual business which is based on its own fraud, which gives greater accuracy.
Neural networks have deep layers of machine models
We use a mixed-data approach for our neural network architecture. This means we create separate neural networks that focus on different aspects of the customer: their behaviour (eg. the average number of orders per week), the natural language associated with them (eg. the order items, email address), the anomalousness of their activity (eg. whether they proceed through checkout 20x faster than the average customer) or any image data associated with them (eg. profile picture).
We combine the layers in these networks into a single model. This final neural network is trained to learn which aspects of the individual business’s customers are most important for detecting fraud.
How can you tell the model is working?
After the training, to check that the model is working correctly, we show the model some data which it has never seen before, but which we know the fraud outcomes for. If the model detects the fraud correctly, we can deploy it to be used against the online business’ transactions. We also do some automatic common-sense analysis on recent data for which we do not have fraud labels to ensure the model will behave correctly when it is deployed.
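Checking a model against data it has never seen is usually done by holding back part of the labelled data. The sketch below shows the mechanics only: `toy_model` is a hand-written stand-in for a trained model, and the examples, the 25% holdout fraction and the helper names are all assumptions.

```python
import random


def split_holdout(examples, holdout_fraction=0.25, seed=7):
    """Shuffle labelled examples and keep a slice the model never trains on."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


def fraud_caught_rate(model, holdout):
    """Share of known fraud in the holdout that the model flags."""
    fraud = [features for features, is_fraud in holdout if is_fraud]
    if not fraud:
        return None  # no fraud in this slice, so nothing to measure
    return sum(1 for f in fraud if model(f)) / len(fraud)


# Made-up labelled examples: (new_cards_last_hour, is_fraud)
EXAMPLES = [(0, False), (1, False), (0, False), (2, False),
            (9, True), (1, False), (12, True), (0, False)]


def toy_model(new_cards_last_hour):
    # Stand-in for a trained model; in practice it would be fitted
    # on training_set only.
    return new_cards_last_hour >= 5


training_set, holdout_set = split_holdout(EXAMPLES)
rate = fraud_caught_rate(toy_model, holdout_set)
print(len(training_set), len(holdout_set))  # 6 2
```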
There are certain fraudy situations which the model should always pick up on - some examples are:
- High velocity of new payment methods eg. a customer adds 10 new payment cards in an hour
- Suspicious email address eg. a mismatch with the account name or the name on the card, or rude/naughty words in the email
- A customer placing lots of orders of high-value goods eg. luxury alcohol
- Orders from a particularly fraudy location, shipping to a known fraud hotspot or a PO Box rather than a residential address
All of these examples should be flagged as fraudy - so what happens when the machine makes a prediction?
Using machine learning to generate a fraud risk score
At the point of the transaction, the model gives each customer a risk score on the scale of 1-100. The higher the score, the higher the probability of fraud.
You can choose what level of risk is right for your business, and set thresholds for what proportion of transactions you want to allow, block and manually review or challenge using 3D Secure.
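As a sketch of how a score and thresholds could turn into a decision: the mapping from a probability onto the 1-100 scale, the two threshold values and the function names below are assumptions chosen for illustration, not the actual scoring logic.

```python
def risk_score(fraud_probability):
    """Map a probability in [0, 1] onto the 1-100 risk scale."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return max(1, min(100, round(fraud_probability * 100)))


def decide(score, review_at=60, block_at=90):
    """Hypothetical thresholds; each business would choose its own."""
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"  # manual review, or a 3D Secure challenge
    return "allow"


for probability in (0.10, 0.75, 0.95):
    score = risk_score(probability)
    print(score, decide(score))
# 10 allow
# 75 review
# 95 block
```

Moving `review_at` and `block_at` up or down changes what proportion of transactions land in each bucket.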
Setting the right risk threshold for your business
The next step is to ask yourself, where on the scale is the right risk threshold for my business?
Threshold analysis - precision and recall
Determining the right risk threshold involves doing data analysis based on the principles of precision and recall. It’s a complicated balancing act between:
- True positives (how many fraudsters we block)
- False positives (how many good people we block)
- False negatives (how many fraudsters we allow)
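Precision and recall are computed directly from these three counts. Precision is the share of blocked customers who really were fraudsters, and recall is the share of all fraudsters who were blocked. The counts in this sketch are made up.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from the confusion counts."""
    blocked = true_positives + false_positives
    all_fraud = true_positives + false_negatives
    precision = true_positives / blocked if blocked else 0.0
    recall = true_positives / all_fraud if all_fraud else 0.0
    return precision, recall


# Hypothetical counts: 80 fraudsters blocked, 20 good customers blocked,
# 40 fraudsters allowed through.
precision, recall = precision_recall(80, 20, 40)
print(round(precision, 2), round(recall, 2))  # 0.8 0.67
```

Lowering the block threshold usually raises recall and lowers precision, which is why the two have to be balanced.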
Scale is very important here. For context, the typical acceptance rate for a Ravelin client is usually higher than 98 or 99%, so almost all transactions are approved. It’s within the small band of rejected transactions where the optimization occurs.
Risk analysis asks how close to 100% acceptance you can get without the cost of fraud becoming too high
The right level of risk is individual to each business. A business with a high volume of low-value transactions (eg. food delivery) may set the risk threshold very high, so that it blocks the fewest possible genuine transactions.
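One simple way to frame this trade-off is to put a cost on each kind of mistake and compare candidate thresholds. The sketch below does that with made-up transactions. The cost model (losing the full order value on allowed fraud and a 20% margin on blocked genuine orders) is an assumption, and a real analysis would use far more data.

```python
# Made-up scored transactions: (risk_score, is_fraud, order_value)
TRANSACTIONS = [
    (5, False, 20.0), (12, False, 35.0), (30, False, 15.0), (55, False, 40.0),
    (62, True, 50.0), (70, False, 25.0), (85, True, 120.0), (96, True, 200.0),
]


def cost_at_threshold(transactions, block_at, margin=0.2):
    """Total cost of mistakes if everything scoring >= block_at is blocked."""
    cost = 0.0
    for score, is_fraud, value in transactions:
        blocked = score >= block_at
        if is_fraud and not blocked:
            cost += value           # fraud allowed: lose the order value
        elif not is_fraud and blocked:
            cost += value * margin  # genuine order blocked: lose the margin
    return cost


def best_threshold(transactions, candidates):
    return min(candidates, key=lambda t: cost_at_threshold(transactions, t))


print(best_threshold(TRANSACTIONS, [50, 60, 80, 90]))  # 60
```

With these numbers, blocking at 60 is cheapest: a lower threshold turns away more genuine customers, and a higher one lets costly fraud through.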
Our investigation analysts are experts in these calculations and can help you find the right thresholds to suit your risk appetite.
Does your business need its own machine learning model?
It’s always best to use your own customer data for your business as it will be the most accurate at detecting fraud within your future customers. Different business models can have very different customer order cycles and amounts - for example it could be normal for someone to order from a food delivery business every day, whereas this would be very unusual for online clothes sales.
There are also huge variations in other aspects, for example it might take customers only a few minutes to order from a ticketing site, but a taxi app order can take as long as the ride lasts.
Unlike other fraud providers, we build a 100% personalised model for each of our clients, so predictions will be based on fraud signals in their customer base alone. This stops the model being swayed by patterns in unrelated industries, creating more specific predictions and better performance.
What if you don’t have enough data to train your own model?
There’s always the chance that an online business might not have enough data to train their own model right away. A business might have a very low sales volume, mainly sell through affiliates, or sometimes the logging simply hasn’t been set up to collect the data in the right format.
It’s no problem if you don’t have enough data to train your own model right away. To get your business up and running quickly, we’ll use a generic model based on historical fraud patterns we’ve seen before. We don’t share any of the customer data between businesses, but we can re-use the algorithms we’ve already trained. This makes it easy for us to pick the right components off the shelf and it means you can start using a model to detect fraud in just one week.
Because we use this micro-model approach, we can pick and choose the ones which are most suitable for your individual business to make up a semi-customised larger model. As soon as the model starts working on your data it will begin to adapt and tailor to your customer base, and therefore become more effective. The model improves as we give it more data, chargebacks and manual reviews.
Why it’s important to use historical data and not just recent data
We’ve found that within a month we have chargebacks for around 30% of fraud - that means up to 70% of fraud hasn't been recorded yet. This means that if we used only the most recent data, the model wouldn't be able to distinguish the hidden fraudsters (who haven’t made a chargeback yet) from the rest of recent genuine customers. Listen to Eddie Bell, our Head of Machine Learning, talking about this more in this webinar.
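In practice this means giving labels time to ‘mature’ before relying on them. A minimal sketch, assuming a 90-day maturity window (that number, the record layout and the function name are illustrative assumptions, not figures from this article):

```python
from datetime import date, timedelta


def mature_orders(orders, today, maturity_days=90):
    """Keep only orders old enough for most chargebacks to have arrived."""
    cutoff = today - timedelta(days=maturity_days)
    return [order for order in orders if order["order_date"] <= cutoff]


# Made-up orders.
ORDERS = [
    {"id": "o1", "order_date": date(2024, 1, 5), "chargeback": True},
    {"id": "o2", "order_date": date(2024, 2, 20), "chargeback": False},
    # Too recent: "no chargeback" may just mean "no chargeback yet".
    {"id": "o3", "order_date": date(2024, 5, 28), "chargeback": False},
]

training_orders = mature_orders(ORDERS, today=date(2024, 6, 1))
print([order["id"] for order in training_orders])  # ['o1', 'o2']
```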
Understanding the results - looking inside the black box
Machine learning is often called a black box as you can’t really inspect how it’s doing what it’s doing.
Although it’s difficult to inspect everything the model does, we can get an understanding of how it works through testing cause and effect. We make subtle, controlled changes to the data we feed into the model and measure the output so we can tell what data the model favoured when it made the prediction, eg. did it prioritise suspicious email addresses or high transaction values. Every single customer’s prediction is instantly explained in full on the dashboard.
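The cause-and-effect testing described above can be sketched as a simple perturbation check: reset one input at a time to a neutral baseline and measure how much the score moves. The toy scoring function, the feature names and the baseline values are all hypothetical stand-ins for a real model.

```python
def explain(model, features, baseline):
    """Score change when each feature is individually reset to its baseline."""
    original = model(features)
    impact = {}
    for name in features:
        perturbed = dict(features)
        perturbed[name] = baseline[name]
        impact[name] = original - model(perturbed)
    return impact


def toy_model(f):
    # Stand-in for a trained model: a hand-written scoring function.
    return 0.5 * f["suspicious_email"] + 0.002 * f["order_value"]


customer = {"suspicious_email": 1, "order_value": 100.0}
typical = {"suspicious_email": 0, "order_value": 50.0}

impact = explain(toy_model, customer, typical)
print(max(impact, key=impact.get))  # suspicious_email
```

Here the suspicious email address moves the score far more than the order value does, so it would be reported as the main reason for the prediction.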
How human insight complements machine learning
When used successfully, machine learning removes the heavy burden of data analysis from your fraud detection team. The results help the team with investigation, insights and reporting. Machine learning doesn’t replace the fraud analyst team, but gives them the ability to reduce the time spent on manual reviews and data analysis. This means analysts can focus on the most urgent cases and assess alerts faster with more accuracy, and also reduce the number of genuine customers declined.
Machine learning makes the role of a fraud analyst more efficient, as their time is freed up to do more strategic work. Analysts improve and optimise machine learning fraud detection systems through reviewing and labelling customers and tuning the rules. Machines are exceptionally good at doing the heavy lifting in data analysis, number crunching and output. They work tirelessly through the night and never complain about working weekends.
Machines are less good at dealing with uncertainty. There are cases that are new, or that are difficult, or somehow different. Edge cases are those that require more attention and may be difficult to determine - this is where the human insight comes in and provides massive value.
The expert human intervention here is not just at the point of approving a transaction. It’s more a case of analysis after the event and labelling the data in a way that gives rapid feedback to a machine. Remember, labelled data is the ultimate training set for a machine. So the more confirmed behaviour labels it can receive, the more accurate the result is likely to be.
During live fraud events, fraud analysts can use the manual reviews to let us know when an attack is happening. Human insight is key to stop fraud attacks and limit the negative impact.
Tackling other fraud issues with machine learning
Machine learning adapts very quickly to any changes in normal behavior patterns. This means it is excellent at detecting situations like account takeover - when a genuine customer’s account is hacked and used to commit fraud. A customer might have a good payment and transaction history and then suddenly start acting out of character. Machine learning would pick this up and flag the customer for a fraud analyst to review quickly.
Unpaid cash payments
Some online businesses give customers the choice to pay cash on delivery - eg. fast food delivery. Sometimes these types of orders are made and left unpaid for a number of reasons - it could be a prank, a mistake or someone could have fallen asleep. Even though there are no payment methods involved, the machine can learn from the customers labelled as unpaid cash payments to detect and prevent these orders in the future.
Using machine learning with graph networks
Machine learning is great at looking at patterns in individual customer behavior and quickly alerting customers when things change. It becomes even more powerful when you pair it with a graph network and link analysis techniques.
Graph networks allow you to join the dots between customers and uncover connections by device, address, payment methods etc. Learn more about using link analysis and graph databases for fraud detection here.
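Joining the dots can be illustrated with a small union-find sketch that groups customers who share any identifier. The customer names, the identifier strings and the `linked_groups` helper are made up, and a real graph database would do far more than this.

```python
from collections import defaultdict


def linked_groups(customers):
    """Group customers that share any identifier (device, card, address)."""
    parent = {customer: customer for customer in customers}

    def find(customer):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[customer] != customer:
            parent[customer] = parent[parent[customer]]
            customer = parent[customer]
        return customer

    first_seen = {}
    for customer, identifiers in customers.items():
        for identifier in identifiers:
            if identifier in first_seen:
                # Shared identifier: merge the two customers' groups.
                parent[find(customer)] = find(first_seen[identifier])
            else:
                first_seen[identifier] = customer

    groups = defaultdict(set)
    for customer in customers:
        groups[find(customer)].add(customer)
    return sorted(sorted(group) for group in groups.values())


# Made-up customers and the identifiers seen on their accounts.
CUSTOMERS = {
    "alice": {"device:1", "card:9"},
    "bob": {"device:1"},
    "carol": {"card:9", "address:5"},
    "dave": {"device:7"},
}

print(linked_groups(CUSTOMERS))  # [['alice', 'bob', 'carol'], ['dave']]
```

Bob and Carol never share anything directly, but both connect to Alice, so all three end up in one network. If one of them is a confirmed fraudster, the others are worth a closer look.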