Machine learning for fraud detection and prevention is accurate, efficient and fast. But most models can only handle numeric inputs. Ravelin Machine Learning Engineer Shayan Sadeghieh and Data Scientist Antons Tocilins-Ruberts explain the importance of NLP and text signals in catching fraud.
Machine learning systems are perfect for the dynamic and fast-paced nature of fraud. They can make real-time decisions and assess customer behavior as it happens. All you have to do is feed the model as much customer and transaction data as possible.
The only snag is that most machine learning models can only understand numbers, such as order value or size. But, as we know, fraud signals don't always appear in numeric form. Text information, like delivery notes or item descriptions, can be a key indicator of fraud.
So if your model is only trained on numeric signals, you’re missing out on valuable information. How can you ensure that your machine learning model catches all fraudulent behavior?
A machine learning model is able to spot strange customer activity, which it then automatically blocks or flags for analyst review. But how does it know what ‘normal’ customer behavior is to begin with?
Machine learning models go through training cycles. During these cycles, the model is fed examples of what genuine and fraudulent behavior looks like for your customers. The more examples it receives, the better it becomes at telling the difference and, ultimately, at making accurate predictions.
When your customer is completing their checkout, we calculate thousands of features about that customer. These features can be broken down into: identity, orders, payment information, location and network. This information is fed into your model to produce a risk score on a scale of 1-100. The higher the score, the higher the probability of fraud. This is extremely effective but only to a certain point.
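To make the training and scoring steps concrete, here is a minimal sketch in plain Python. It is not Ravelin's model: the two features, the toy labelled examples, the logistic regression and the way a probability is mapped onto the 1-100 scale are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(rows, labels, epochs=2000, learning_rate=0.5):
    """Fit a tiny logistic regression with stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            prediction = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = prediction - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

def risk_score(features, weights, bias):
    """Map the predicted fraud probability onto a 1-100 scale (assumed mapping)."""
    probability = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
    return max(1, min(100, round(probability * 100)))

# Toy labelled examples: (order value, account age), both scaled to 0-1.
# Label 1 means fraud, 0 means genuine. The values are made up.
training_rows = [(0.9, 0.1), (0.8, 0.2), (0.2, 0.9), (0.1, 0.8)]
training_labels = [1, 1, 0, 0]

weights, bias = train_logistic_regression(training_rows, training_labels)
risky_score = risk_score((0.85, 0.1), weights, bias)   # high value, new account
safe_score = risk_score((0.15, 0.9), weights, bias)    # low value, old account
```

A production model would use thousands of features and far more examples, but the shape is the same: labelled examples in, a score out.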
Let’s use the example of a large online marketplace. Marketplaces allow customers and sellers to create custom names and descriptions, so they have a lot of text data. And these free-text fields carry a lot of unique fraud signals.
First of all, some items are just more likely to be fraudulent because they're popular and high value. So the item name is a valuable indicator.
Secondly, specific features listed in the item’s name or description can raise flags. For instance, if the description for an iPhone says that it’s jailbroken.
Finally, the overall quality of text can often point to fraudulent suppliers. Typos, short sentences, suspicious links - all of these could suggest fraud.
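As a rough illustration of how text quality could be turned into numbers, the sketch below counts links, measures sentence length and estimates a typo rate. The function name, the regular expressions and the tiny vocabulary are hypothetical, and a real system would need a proper dictionary or language model to spot typos.

```python
import re

URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
WORD_PATTERN = re.compile(r"[a-z']+")

def text_quality_features(text, vocabulary):
    """Turn a free-text description into a few numeric quality signals."""
    links = URL_PATTERN.findall(text)
    # Remove links before tokenising so URL fragments are not counted as words.
    stripped = URL_PATTERN.sub(" ", text)
    sentences = [s for s in re.split(r"[.!?]+", stripped) if s.strip()]
    words = WORD_PATTERN.findall(stripped.lower())
    # Words missing from the vocabulary are treated as possible typos.
    unknown = [w for w in words if w not in vocabulary]
    return {
        "link_count": len(links),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "unknown_word_ratio": len(unknown) / len(words) if words else 0.0,
    }

# Tiny illustrative vocabulary; a real one would be far larger.
VOCAB = {"brand", "new", "phone", "in", "box", "fast", "shipping"}
features = text_quality_features(
    "Brand new fone. Fast shiping! www.example.test/deal", VOCAB
)
```

Hand-written features like these are easy to explain, but they only capture the signals you thought of in advance.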
We want to ensure that fraudulent behavior in all forms is picked up by your machine learning model. And overlooking text data limits its performance and capacity to do so. We need to be able to feed all of this data into the machine learning model during training and in a production environment. But how?
The solution, and the challenge, is converting these text signals into numeric form. This is where Natural Language Processing (NLP) comes into play.
NLP is a branch of artificial intelligence that works to give computers the ability to understand written text or spoken language. In our case, we send text fields to an NLP model during the feature extraction process. The NLP model returns numbers that represent those text fields.
Those numerically encoded text features can then be fed into your card-not-present (CNP) fraud model along with other features to get a recommendation. The process is illustrated below.
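In code, that flow might look roughly like the sketch below. The `encode_text` function is only a stand-in for the NLP model: it hashes tokens into a fixed-size count vector, which is far simpler than the embeddings described in this article. The function names, the vector size and the sample values are all assumptions.

```python
import zlib

TEXT_DIMS = 8  # assumed size of each text encoding; real embeddings are larger

def encode_text(text, dims=TEXT_DIMS):
    """Stand-in for the NLP model: hash each token into a fixed-size count vector."""
    vector = [0.0] * dims
    for token in text.lower().split():
        # crc32 is used because it is stable between runs, unlike built-in hash().
        index = zlib.crc32(token.encode("utf-8")) % dims
        vector[index] += 1.0
    return vector

def build_feature_vector(numeric_features, text_fields):
    """Append the encoded text fields to the existing numeric features."""
    vector = list(numeric_features)
    for text in text_fields:
        vector.extend(encode_text(text))
    return vector

order_features = [129.99, 2.0]  # e.g. order value and item count (illustrative)
text_fields = ["iphone 13 jailbroken", "leave at back door"]

# This combined vector is what would be passed on to the fraud model.
model_input = build_feature_vector(order_features, text_fields)
```

The key point is that the fraud model only ever sees numbers: the text has been converted before it arrives.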
Under the hood, our NLP models use state-of-the-art embedding techniques. Word embeddings are numeric representations of text that encode the meaning of the words.
They do this by placing similar words closer together and dissimilar words further apart. This allows us to take things like context and word ordering into consideration, factors a simple model might miss.
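One common way to measure "closer together" is cosine similarity. The short sketch below uses hand-written three-dimensional vectors, not output from a trained model, purely to show the idea.

```python
import math

# Hand-written toy vectors for illustration, not real embeddings.
EMBEDDINGS = {
    "iphone": [0.9, 0.8, 0.1],
    "smartphone": [0.85, 0.75, 0.2],
    "banana": [0.1, 0.05, 0.9],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

phone_vs_phone = cosine_similarity(EMBEDDINGS["iphone"], EMBEDDINGS["smartphone"])
phone_vs_fruit = cosine_similarity(EMBEDDINGS["iphone"], EMBEDDINGS["banana"])
```

With these toy values, the two phone-related words score much higher against each other than either does against "banana".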
Let’s imagine that our NLP model has learned that the two most important features of an item are its price and the popularity of the item category. The item embeddings might look something like this:
Using this two-dimensional embedding method, we’re able to easily separate the most fraudulent items. In this case, iPhones and sneakers. Of course, real-life embeddings are more complicated and have higher dimensionality. But the motivation is the same. Using these embeddings, we’re able to meaningfully encode text and then use it in our models.
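Here is a small sketch of that two-dimensional picture. The coordinates, the extra items and the 0.5 threshold are invented for illustration; a real model learns both the embedding and the decision boundary from data.

```python
# Illustrative coordinates only: (price, category popularity), each scaled to 0-1.
ITEM_EMBEDDINGS = {
    "iPhone": (0.95, 0.90),
    "Sneakers": (0.70, 0.85),
    "Phone case": (0.10, 0.60),
    "Garden hose": (0.20, 0.15),
    "Grand piano": (0.98, 0.05),
}

def in_high_risk_region(point, threshold=0.5):
    """Flag items that are both expensive and in a popular category."""
    price, popularity = point
    return price > threshold and popularity > threshold

high_risk_items = {
    name for name, point in ITEM_EMBEDDINGS.items() if in_high_risk_region(point)
}
```

Neither dimension is enough on its own in this toy data: the grand piano is expensive but unpopular, and the phone case is popular but cheap.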
For our gaming merchants, we can now distinguish between harder-to-sell items and those that are easier to shift. For example, prepaid cards raise a bigger red flag than game activation codes because they’re easier to sell or cash out. So fraudsters are big fans.
For our retail merchants, we can factor in the popularity of an item. We’ve found this to be quite an important signal for fraud.
For food delivery merchants, fraudsters love to order expensive alcohol and junk food (who knew!). So item names are incredibly useful signals.
Across industries, discount and shipping type information is frequently provided to us in the text fields. Now we can efficiently use this information to catch bad actors.
There are, of course, challenges when introducing an extra model into your fraud detection solution. But Ravelin has the necessary infrastructure in place to handle multiple parallel calls to different models. So latency isn’t an issue and we’re still able to support fast and frictionless predictions.
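To show why parallel calls keep latency down, the sketch below runs two simulated model calls at the same time using Python's standard thread pool. The two model functions, their sleep-based latency and their outputs are placeholders, and this says nothing about how Ravelin's infrastructure is actually built.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_item_name_model(text):
    """Placeholder for a remote model call; the sleep simulates latency."""
    time.sleep(0.1)
    return [float(len(text))]

def call_delivery_note_model(text):
    """Placeholder for a second, independent model call."""
    time.sleep(0.1)
    return [float(text.count(" ") + 1)]

def encode_fields_in_parallel(item_name, delivery_note):
    """Run both calls concurrently and join their outputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        name_future = pool.submit(call_item_name_model, item_name)
        note_future = pool.submit(call_delivery_note_model, delivery_note)
        return name_future.result() + note_future.result()

start = time.perf_counter()
encoded = encode_fields_in_parallel("prepaid card", "leave with neighbour")
elapsed = time.perf_counter() - start
```

Because the two calls overlap, the total wait is close to the slower call rather than the sum of both.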
Fraud detection is an ever-evolving field and we’re constantly improving our models. NLP massively expands the capabilities and effectiveness of machine learning. But the work doesn’t stop there. Adding new languages and increasing the number of text fields our models can process are just a couple of developments on the horizon. Fraudsters are smart and adaptable, so staying one step ahead is not enough!
Antons Tocilins-Ruberts, Data Scientist
Shayan Sadeghieh, Machine Learning Engineer