Ravelin Connect is the graph network element of Ravelin that we use to identify significant connections in our clients’ customer bases. The images are often beautiful, so much so that we frame them for new clients to hang on their walls.
But these graph networks are not art; they are science. As we have pushed at the applications and use cases for this approach to fraud detection, we have realised that we have built something not just unique but uniquely powerful. It is powerful on its own, which it certainly can be, and also powerful in combination with Ravelin’s core Machine Learning product.
Ravelin Connect: a deeper look
Ravelin is a prediction engine that uses machine learning at its core to score transactions on their likelihood to be fraudulent. This works extremely well for scanning transactions that are actively progressing through a payment flow, comparing signals to previous experiences of fraud to produce a score.
But early on in our product development we realised that this only provides a partial picture of the fraud risk for a merchant. The network that a fraudster creates is just as relevant to making a fraud assessment as who they are or what they do or how they try to pay.
From the outset we determined that we would need a way that allowed us both to consume the network information provided by our clients and present it back to them in a way that made sense.
So we invested in a graph network tool to test this out. Initially we anticipated using an open source tool or buying something off the shelf, but we quickly realised that latency, flexibility and other issues meant it would be more effective for us to build it ourselves. Doing this required some brilliant engineering, design and coding, something that the Ravelin Connect project has continuously benefited from over its lifecycle. So Ravelin Connect was born, and we trialled it with some early customers.
The result? They loved it. Passionately. All the suspected ill behaviour, fake accounts, and fraud rings were suddenly visible in rich technicolour. Whole networks of suspicious accounts could be closed in an instant. With our clients we located and ended accounts responsible for millions of dollars of fraud.
But this was only the beginning.
Off-the-shelf graph networks will impressively but passively map connections in a database. But without any underlying data on the importance of these connections there is an a priori problem: an analyst needs to know what they are looking for before they can find it.
This is actually still useful. Imagine an analyst reviewing suspicious transactions that were approved but that Ravelin had flagged for review. Take David Riley in this example.
We can take a look at David’s profile and see a lot of contributors to his relatively high score of 64. In this case the Network contribution is 18. We’ll come back to what a network contribution to an ML score means a little later.
For now though let’s take a closer look at David’s network.
Oh dear. To use the technical terminology we use in Ravelin, David is well at it. If we zoom closer we can see that David is sharing a phone number with “AR”. This is highly suspicious as AR already has a chargeback.
So in short, the analyst through a review prompt based on score has uncovered a small network of fraudsters (this might very likely all be one person creating multiple accounts from a single device). However, what if the analyst had not been prompted by the suspicious score created by David Riley’s activity? How is fraud then discovered?
So here’s where we start to move from Ravelin’s ML models suggesting fraudsters and highlighting their networks, to promoting networks that are likely to contain bad actors.
A quick diversion before we go further. For simplicity we are considering fraudsters here to be accounts created to use stolen third party credentials. As we will see, Ravelin Connect can be used for a number of purposes better described as account security.
Let’s consider the NETWORKS features first. We can look at either the fastest growing networks or the largest networks. These are of interest because of what these networks are: they grow by virtue of some shared attribute, be it a card, a device, a phone number or an email. There has to be a reason for those connections. Once any data issues are eliminated (these are usually caught during the integration phase), what remains is usually a suspicious reason for the connections.
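To make the idea of networks growing from shared attributes concrete, here is a minimal sketch. It is not how Ravelin Connect is implemented; the account data, the attribute names and the build_networks helper are all assumptions made for illustration. It groups accounts that share any attribute value using a union-find structure and lists the largest networks first.

```python
from collections import defaultdict

# Hypothetical sample data: each account lists the attributes it has used.
accounts = {
    "cust_1": {"device": "dev_A", "card": "card_1"},
    "cust_2": {"device": "dev_A", "card": "card_2"},
    "cust_3": {"device": "dev_B", "card": "card_2"},
    "cust_4": {"device": "dev_C", "card": "card_4"},
}


def build_networks(accounts):
    """Group accounts into networks linked by any shared attribute value."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for customer, attrs in accounts.items():
        find(customer)  # make sure accounts with no attributes still appear
        for kind, value in attrs.items():
            # Attribute nodes are (kind, value) tuples, so they never
            # collide with customer ids, which are plain strings.
            union(customer, (kind, value))

    networks = defaultdict(set)
    for customer in accounts:
        networks[find(customer)].add(customer)

    # Largest networks first, as in a "largest networks" view.
    return sorted(networks.values(), key=len, reverse=True)


networks = build_networks(accounts)
for members in networks:
    print(len(members), sorted(members))
```

In this toy data cust_1 and cust_2 share a device, and cust_2 and cust_3 share a card, so all three land in one network even though cust_1 and cust_3 have nothing directly in common. Ranking by growth rather than size would additionally need timestamps on the links, which this sketch leaves out.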
Let’s look at the network with 155 nodes for example.
Drilling in here we can see a suspicious network grow. Initially we see in this case a device with a single user. But as we expand out we see more connections as we stretch out from the core user. We soon realise that there are 59 people connected within this network. Note that no-one in this network has been reviewed as a fraudster, nor has any chargeback been associated with this network. So it is unlikely to have been prompted for investigation through an ML score. But it is very large, and highly suspicious. Why are so many people connected to a single device? We’ll return later to a scenario that might be indicated by a network like that.
Searching on suspicious entities
So we searched on a network that was promoted either by its sheer size or the velocity with which it was growing. But we are also likely to be interested in the nodes or entities themselves. In this case we are referring to the phone numbers, email addresses, devices, customers, or cards that might have unusually high counts in the network.
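As a rough illustration of what "unusually high counts" could mean, the sketch below counts how many distinct customers are attached to each entity of a given type. The link records, the busiest_entities helper and the min_customers threshold are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical (customer, entity_type, entity_value) links.
links = [
    ("cust_1", "card", "card_9"),
    ("cust_2", "card", "card_9"),
    ("cust_3", "card", "card_9"),
    ("cust_1", "phone", "555-0100"),
    ("cust_4", "phone", "555-0101"),
]


def busiest_entities(links, entity_type, min_customers=2):
    """Return (entity, customer_count) pairs for entities shared by
    at least min_customers distinct customers, busiest first."""
    customers_by_entity = defaultdict(set)
    for customer, kind, value in links:
        if kind == entity_type:
            customers_by_entity[value].add(customer)

    ranked = [
        (value, len(customers))
        for value, customers in customers_by_entity.items()
        if len(customers) >= min_customers
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


shared_cards = busiest_entities(links, "card")
shared_phones = busiest_entities(links, "phone")
print(shared_cards)
print(shared_phones)
```

Here one card is used by three customers and so is surfaced, while each phone number belongs to a single customer and is not.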
Let’s take a look at Cards. I click into the network where I see two cards. I again uncover a suspicious network. Why would two cards be significant? Card details that are compromised in a data hack are sold multiple times on the darknet. Therefore they can be used unwittingly by different fraudsters who find themselves connected in a network, despite having no other connections. This might be what this network is showing us.
Confidence levels: assessing the power of connections
So how sure can we be that members of a network are indeed at it?
The number of hops between nodes can be an indicator of the likelihood of an account being fraudulent. For instance, if AC shares a device with BD, and BD is a confirmed fraudster, then you can be fairly certain that AC is too. But what about the person who shares a card with AC but has no direct connection to BD? In our experience just being in this network is not a good sign, but clients clearly get nervous at the idea of blocking large groups of users who have not, technically, done anything wrong yet.
However, investigations into these networks are very valuable. The outcome of those investigations is usually that even a seemingly large number of ‘hops’ still means the account is nefarious.
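The hop count described above can be sketched as a breadth-first search outward from confirmed fraudsters. This is an illustrative sketch only: the edge list is invented, "CE" is a hypothetical name for the person who shares a card with AC, and a hop is taken to mean one shared attribute between two customers.

```python
from collections import deque

# Hypothetical customer-to-customer links; each edge means the two
# customers share an attribute (device, card, phone number or email).
edges = [
    ("AC", "BD"),  # AC shares a device with BD
    ("CE", "AC"),  # CE shares a card with AC but has no link to BD
    ("FG", "HI"),  # an unrelated pair in a separate network
]
confirmed_fraudsters = {"BD"}


def hops_to_fraudster(edges, fraudsters):
    """Return {customer: hops to the nearest confirmed fraudster}.
    Customers with no path to a fraudster are left out."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Multi-source BFS: every confirmed fraudster starts at distance 0.
    hops = {fraudster: 0 for fraudster in fraudsters}
    queue = deque(fraudsters)
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in hops:
                hops[nxt] = hops[current] + 1
                queue.append(nxt)
    return hops


hops = hops_to_fraudster(edges, confirmed_fraudsters)
print(hops)
```

AC comes out one hop from BD and CE two hops away, while FG and HI do not appear at all because nothing connects them to a confirmed fraudster. How many hops should prompt action is a policy decision for the analyst, not something this sketch decides.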
Networks as a feature: contributing to the Machine Learning score
So far we have been discussing largely the deterministic use of networks. That is to say, we determine by membership of a network that the user is a fraudster. But how do we do this probabilistically? That is, how do we determine that some network properties probably indicate a level of fraud risk? The great advantage of doing it this way is that the probabilistic attributes can be calculated numerically, i.e. they can be fed into an algorithm. Perhaps an example will help.
David has a score of 64. If you look closely 18 of that score was contributed by ‘Network’. Logically you might think that this is because of David’s specific network which we looked at earlier. But investigating a network as we did is difficult for a machine to do. What’s easy for a machine is to look at what a network looks like and how that compares to historical networks.
So David’s network has a bunch of nodes, some chargebacks, and some other properties that look, to some degree, like previously fraudulent networks. Not enormously: it only scores 18, so it is unlikely to be enough on its own to prevent a transaction. But as a contribution to an overall score it is significant, accounting for 28% of the score (18 of 64) in this particular example. This ability to extract network ML features from a graph network is unique to Ravelin and, in many scenarios, uniquely powerful.
What’s more, our Connect API offers the ability to extract those network features to feed into your own tools, rule systems or models if you wish. Passing in a customer’s details, you can retrieve features such as the number of hops to a chargeback or reviewed fraudster, the number of each type of node, and the count of connections each node type has.
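To give a feel for what features of this kind look like, the sketch below computes similar values from a small in-memory graph. It does not call or reproduce the Connect API; the node names, the network_features helper and the shape of the returned dictionary are assumptions for illustration. Hops here count every edge, so two customers joined by one shared phone number are two hops apart.

```python
from collections import Counter, deque

# Hypothetical network: each node has a type, and edges are undirected.
node_types = {
    "cust_david": "customer",
    "cust_ar": "customer",
    "phone_1": "phone",
    "dev_1": "device",
}
edges = [
    ("cust_david", "phone_1"),
    ("cust_ar", "phone_1"),  # AR shares David's phone number
    ("cust_david", "dev_1"),
]
chargeback_customers = {"cust_ar"}


def network_features(start, node_types, edges, chargeback_customers):
    """Compute simple numeric features for the network around `start`."""
    neighbours = {node: set() for node in node_types}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Breadth-first search gives the edge distance to every reachable node.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    # Number of nodes of each type in the network.
    node_counts = Counter(node_types[n] for n in distance)

    # Total number of connections held by nodes of each type.
    connection_counts = Counter()
    for n in distance:
        connection_counts[node_types[n]] += len(neighbours[n])

    chargeback_hops = [d for n, d in distance.items() if n in chargeback_customers]
    return {
        "node_counts": dict(node_counts),
        "connection_counts": dict(connection_counts),
        "hops_to_chargeback": min(chargeback_hops) if chargeback_hops else None,
    }


features = network_features("cust_david", node_types, edges, chargeback_customers)
print(features)
```

A flat dictionary of numbers like this is the kind of input a rule system or model can consume directly, which is the point of treating the network as a feature rather than as a picture.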
In future posts we will explore the use of Ravelin Connect in some of the more specific use cases with which we help clients. It really is a powerful means to visualise, and so bring to life, the stories that your data is there to tell you. Perhaps the most exciting part is that our clients get to explore connections in data that are often relevant but hard to see without a tool like Ravelin Connect. It generates more user-generated feature requests than anything else in the Ravelin product suite, which tells its own tale of how analysts are defining for themselves the boundaries of this product. Dive in.