How we select the right business data for 'feature engineering'
Every business has a lot of data, but not all of it is relevant for fraud. Here's how we select specific data 'features' to analyze and get an indication of fraud.
First, what is a feature and how is it engineered?
At a basic level, a feature is an individual measurable property or characteristic, such as the cost of a transaction. Feature engineering is the process of extracting these meaningful characteristics to use as learning material for the algorithm.
We look for features that capture the aspects of an order that help us predict fraud. We group these features into the categories below.
These are the typical aspects that predict fraud, for example orders, transactions, cards, location, email. These features generally cover the data you would expect to find on your receipt and are customer-centric.
We derive behavioral features from the customer session. These features describe the customer's actions, e.g. the velocity of orders, the time spent on a page, or the length of time between adding a new card and making an order. One purpose of extracting these features is to capture the use of subversive technology, e.g. a fraudster using a script to scrape a webpage rather than browsing normally.
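As a rough illustration of the behavioral features described above, the sketch below computes an order velocity and the time between adding a card and ordering from a list of session events. The event schema, the `behavioral_features` function and the one-hour window are assumptions made for this example, not a description of the production system.

```python
from datetime import datetime, timedelta


def behavioral_features(events, now, window=timedelta(hours=1)):
    """Derive simple behavioral features from session events.

    `events` is a list of (timestamp, event_type) tuples. This schema is
    hypothetical and only serves the example.
    """
    order_times = sorted(t for t, kind in events if kind == "order")
    card_times = sorted(t for t, kind in events if kind == "card_added")

    # Order velocity: number of orders placed within the trailing window.
    orders_in_window = sum(1 for t in order_times if now - window <= t <= now)

    # Seconds between the most recent card addition and the first order
    # placed at or after it. None if either event is missing.
    seconds_card_to_order = None
    if card_times:
        last_card = card_times[-1]
        later_orders = [t for t in order_times if t >= last_card]
        if later_orders:
            seconds_card_to_order = (later_orders[0] - last_card).total_seconds()

    return {
        "orders_in_window": orders_in_window,
        "seconds_card_to_order": seconds_card_to_order,
    }
```

A very short gap between adding a card and ordering, combined with a high order velocity, is the kind of pattern these features are meant to surface. Any thresholds would have to come from the data.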
Real-time features are based on up-to-date, real-world incidences of fraud. These features are all based on categorical data: they give the real-time rate of fraud by category, e.g. country, ASN, card digits or email domain. An example feature could be the fraud rate in certain regions or countries.
One purpose of these features is to help merchants expand into new markets where they have no existing data. We monitor real-time traffic so that our merchants can move into new markets without adverse effects from the machine learning models, e.g. bias.
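A minimal sketch of a per-category fraud rate follows. The `FraudRateTracker` class is hypothetical, and the smoothing towards a prior rate is an assumption added here so that a category value with little or no history (a new market, for instance) still gets a sensible estimate. The source does not say how this case is actually handled.

```python
from collections import defaultdict


class FraudRateTracker:
    """Running fraud rate per category value (e.g. per country).

    The prior_rate and prior_weight parameters are illustrative
    assumptions: they pull the estimate towards a default rate until
    enough orders have been observed for that value.
    """

    def __init__(self, prior_rate=0.01, prior_weight=10.0):
        self.prior_rate = prior_rate
        self.prior_weight = prior_weight
        self.totals = defaultdict(int)
        self.frauds = defaultdict(int)

    def update(self, value, is_fraud):
        # Record one labelled order for this category value.
        self.totals[value] += 1
        if is_fraud:
            self.frauds[value] += 1

    def rate(self, value):
        # Smoothed rate: equals prior_rate for unseen values and
        # approaches the observed rate as more orders arrive.
        n = self.totals.get(value, 0)
        k = self.frauds.get(value, 0)
        return (k + self.prior_rate * self.prior_weight) / (n + self.prior_weight)
```

One tracker per category (country, ASN, email domain and so on) would yield one feature each. A production version would also need to decay or window old observations to stay real-time, which this sketch omits.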
Individual customer features
These features tell us how closely an order matches the specific customer’s typical past behavior. This could be their typical spend, their regular billing address, their home IP address etc.
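To make this concrete, the sketch below compares a new order with a customer's own history. The field names (`amount`, `billing_address`, `ip`) and the choice of the median and the most frequent value as "typical" are assumptions for illustration only.

```python
from collections import Counter
from statistics import median


def customer_similarity_features(history, order):
    """Compare an order with the customer's past orders.

    `history` is a list of dicts with hypothetical keys 'amount',
    'billing_address' and 'ip'. `order` is a dict with the same keys.
    """
    if not history:
        # No past behavior to compare against.
        return {"spend_ratio": None, "billing_address_matches": None, "ip_matches": None}

    def most_common(field):
        # Most frequently seen value for this field in the history.
        return Counter(h[field] for h in history).most_common(1)[0][0]

    typical_spend = median(h["amount"] for h in history)
    return {
        # How many times larger this order is than the typical spend.
        "spend_ratio": order["amount"] / typical_spend if typical_spend else None,
        "billing_address_matches": order["billing_address"] == most_common("billing_address"),
        "ip_matches": order["ip"] == most_common("ip"),
    }
```

An order ten times the usual spend, sent from an unfamiliar address, looks very different from one that matches the customer's history on every field.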
Session tracking features
We divide features into customer-centric and entity-centric. Entities are things like devices, addresses, locations, domains and emails. An example feature is the number of orders shipped to a certain address. One purpose of these features is to alert us to a drop-off point for fraudulently obtained goods.
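The address example above can be sketched as a simple aggregation. The `address_features` function and the order fields are hypothetical. Counting distinct customers per address is an extra signal added for this example, on the assumption that many unrelated customers shipping to one address is more suspicious than one customer ordering often.

```python
from collections import defaultdict


def address_features(orders):
    """Aggregate orders by shipping address (an entity-centric view).

    `orders` is a list of dicts with hypothetical keys
    'shipping_address' and 'customer_id'.
    """
    order_counts = defaultdict(int)
    customers = defaultdict(set)
    for o in orders:
        order_counts[o["shipping_address"]] += 1
        customers[o["shipping_address"]].add(o["customer_id"])

    return {
        addr: {
            "orders": order_counts[addr],
            "distinct_customers": len(customers[addr]),
        }
        for addr in order_counts
    }
```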
Network derived features
As well as customer-centric and entity-centric features, we also look for network-level features. These features focus on network topology (the shape of the network) as a means of enhancing our customer data. An example is telling apart account sharing within a family in the same house from account takeover, where networks of hundreds of accounts use the same few devices.
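One simple way to look at network shape is to link accounts to the devices they use and measure the size of each connected group. The sketch below does this with a union-find structure. The `account_clusters` function and the (account, device) input format are assumptions for illustration, and connected components are only one of many possible topology features.

```python
from collections import defaultdict


def account_clusters(logins):
    """Group accounts and devices into connected clusters.

    `logins` is an iterable of (account_id, device_id) pairs, a
    hypothetical input format. Returns one summary per cluster,
    sorted by size.
    """
    parent = {}

    def find(x):
        # Find the cluster representative, shortening paths on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Tag node ids so an account and a device with the same id stay distinct.
    for account, device in logins:
        union(("account", account), ("device", device))

    clusters = defaultdict(lambda: {"account": set(), "device": set()})
    for node in list(parent):
        kind, ident = node
        clusters[find(node)][kind].add(ident)

    summaries = [
        {"n_accounts": len(c["account"]), "n_devices": len(c["device"])}
        for c in clusters.values()
    ]
    return sorted(summaries, key=lambda s: (s["n_accounts"], s["n_devices"]))
```

A household might appear as a handful of accounts on a handful of devices, whereas the takeover pattern described above would show up as a cluster with hundreds of accounts and only a few devices.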
To learn more about how we perform feature engineering and structure our machine learning models at Ravelin, download the guide here.