How we select the right business data for feature engineering

Every business has a lot of data, but not all of it is relevant to fraud. Here's how we select specific data "features" to analyze for signs of fraud.

First, what is a feature and how is it engineered?

At a basic level, a feature is an individual measurable property or characteristic, such as the cost of a transaction. Feature engineering is the process of extracting these meaningful characteristics from raw data so that a machine learning model can learn from them.

Building features

We look for features that capture signals that help us predict fraud, and we group them into the categories below.

Traditional features

These are the typical signals used to predict fraud, drawn from orders, transactions, cards, location and email. They generally cover the data you would expect to find on a receipt, and they are customer-centric.
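
To make this concrete, here is a minimal sketch of turning a receipt-style order record into traditional features. The `Order` fields and the `traditional_features` function are hypothetical names chosen for illustration, not part of any real schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical order record holding receipt-style fields."""
    amount: float
    currency: str
    card_number: str
    email: str
    billing_country: str
    shipping_country: str


def traditional_features(order: Order) -> dict[str, object]:
    """Extract simple, customer-centric features straight from the order."""
    digits = "".join(ch for ch in order.card_number if ch.isdigit())
    return {
        "amount": order.amount,
        "currency": order.currency,
        # The leading six digits are the issuer's BIN (some schemes now use eight).
        "card_bin": digits[:6],
        "email_domain": order.email.rsplit("@", 1)[-1].lower(),
        "billing_shipping_mismatch": order.billing_country != order.shipping_country,
    }


order = Order(49.99, "GBP", "4111 1111 1111 1111", "Jane@Example.com", "GB", "FR")
print(traditional_features(order))
```

Each value is read directly off the order; the model then learns which combinations (for example, a mismatch between billing and shipping country) correlate with fraud.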

Behavioral features

We derive behavioral features from the customer session. These features describe the customer's actions, e.g., the velocity of orders, the time spent on a page, or the length of time between adding a new card and placing an order. One purpose of extracting these features is to detect subversive use of technology, e.g., a fraudster using a script to scrape a webpage rather than browsing normally.
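
A minimal sketch of two of the behavioral features above: order velocity, and the time from adding a new card to placing an order. The `Event` record and its `kind` values (`"add_card"`, `"order"`, `"page_view"`) are an assumed session schema for illustration, not a real API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """A single customer action in a session (hypothetical schema)."""
    kind: str  # e.g. "add_card", "order", "page_view"
    at: datetime


def order_velocity(events: list[Event], now: datetime, window: timedelta) -> int:
    """Count orders placed in the `window` leading up to `now`."""
    return sum(1 for e in events if e.kind == "order" and now - window <= e.at <= now)


def seconds_from_new_card_to_order(events: list[Event]) -> float | None:
    """Seconds between the most recent card addition and the first order after it.

    Returns None if no card was added or no order followed it.
    """
    card_times = [e.at for e in events if e.kind == "add_card"]
    if not card_times:
        return None
    last_card = max(card_times)
    later_orders = [e.at for e in events if e.kind == "order" and e.at >= last_card]
    if not later_orders:
        return None
    return (min(later_orders) - last_card).total_seconds()


t0 = datetime(2024, 1, 1, 12, 0, 0)
session = [
    Event("page_view", t0),
    Event("add_card", t0 + timedelta(seconds=5)),
    Event("order", t0 + timedelta(seconds=8)),
    Event("order", t0 + timedelta(seconds=20)),
]
print(order_velocity(session, now=t0 + timedelta(minutes=1), window=timedelta(minutes=5)))  # 2
print(seconds_from_new_card_to_order(session))  # 3.0
```

A gap of a few seconds between adding a card and ordering is much faster than most people type, which is the kind of scripted behavior these features aim to surface.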

Real-time features

Real-time features are based on up-to-date, real-world incidences of fraud. They are all built on categorical data: each gives the real-time fraud rate for a category such as country, ASN, card digits or email domain. An example feature is the fraud rate in a particular region or country.

One purpose of these features is to help merchants expand into new markets where they have no existing data. We monitor real-time traffic so that our merchants can move into new markets seamlessly, without adverse effects from the machine learning models such as bias.
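
The sketch below keeps a sliding-window fraud rate per category value, such as a country. `RollingFraudRate` is a hypothetical helper, and it assumes labelled outcomes arrive in timestamp order. The optional `prior_rate`/`prior_weight` blend toward a global rate is our own assumption: a common smoothing choice so that a category with little traffic, such as a brand-new market, does not swing to 0% or 100% on a handful of orders.

```python
from __future__ import annotations

from collections import defaultdict, deque


class RollingFraudRate:
    """Fraud rate per category value (e.g. country) over a sliding time window.

    Hypothetical sketch: timestamps are plain seconds, and outcomes are
    assumed to be recorded in time order for each category value.
    """

    def __init__(self, window_seconds: float, prior_rate: float = 0.0, prior_weight: float = 0.0):
        self.window = window_seconds
        self.prior_rate = prior_rate      # fallback rate for unseen or sparse categories
        self.prior_weight = prior_weight  # pseudo-count of observations given to the prior
        self.events: dict[str, deque[tuple[float, bool]]] = defaultdict(deque)

    def record(self, value: str, is_fraud: bool, ts: float) -> None:
        self.events[value].append((ts, is_fraud))

    def rate(self, value: str, now: float) -> float:
        q = self.events[value]
        # Evict outcomes that have fallen out of the window (relies on time order).
        while q and q[0][0] < now - self.window:
            q.popleft()
        frauds = sum(1 for _, is_fraud in q if is_fraud)
        denom = len(q) + self.prior_weight
        if denom == 0:
            return self.prior_rate
        # Smoothed rate: observed frauds plus prior pseudo-counts.
        return (frauds + self.prior_weight * self.prior_rate) / denom


rates = RollingFraudRate(window_seconds=3600)
for ts, is_fraud in [(0, False), (10, True), (20, False), (30, False)]:
    rates.record("GB", is_fraud, ts)
print(rates.rate("GB", now=60))  # 0.25
```

With `prior_weight=0` this is a plain windowed rate; raising it trades responsiveness for stability on categories with little data.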

Individual customer features

These features measure how similar the current activity is to the specific customer’s typical past behavior, such as their typical spend, their regular billing address or their home IP address.
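
One simple way to express similarity with a customer's own history, sketched under assumed inputs: a z-score of the current spend against past order amounts, plus flags for whether the billing address and IP address have been seen before. `CustomerHistory` and `customer_similarity_features` are hypothetical names.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass, field


@dataclass
class CustomerHistory:
    """Hypothetical per-customer profile built from past orders."""
    amounts: list[float] = field(default_factory=list)
    billing_addresses: set[str] = field(default_factory=set)
    ip_addresses: set[str] = field(default_factory=set)


def customer_similarity_features(history: CustomerHistory, amount: float,
                                 billing_address: str, ip: str) -> dict[str, float]:
    """Compare the current order with the customer's own past behavior."""
    if len(history.amounts) >= 2:
        mean = statistics.mean(history.amounts)
        stdev = statistics.stdev(history.amounts)
        # Number of standard deviations this spend sits from the customer's norm.
        spend_z = (amount - mean) / stdev if stdev > 0 else 0.0
    else:
        spend_z = 0.0  # not enough history to judge
    return {
        "spend_zscore": spend_z,
        "known_billing_address": float(billing_address in history.billing_addresses),
        "known_ip": float(ip in history.ip_addresses),
    }


history = CustomerHistory([20.0, 30.0, 40.0], {"1 Main St"}, {"203.0.113.7"})
print(customer_similarity_features(history, 60.0, "1 Main St", "198.51.100.1"))
```

Here the spend is three standard deviations above this customer's norm and the IP address is new, while the billing address is familiar.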

Session tracking features

These features are a little more involved than the behavioral features. They cover the data we collect with JavaScript, e.g., whether the customer pastes a card number into the checkout, their cookies, and whether they use a password vault. One purpose of these features is to capture genuine customer behavior, e.g., taking time to change the size of a piece of clothing.
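
The collection itself happens in the browser, but the features are derived from the resulting event stream. This sketch assumes a hypothetical event format (dicts with `"type"`, optional `"field"` and `"ts"` keys) and covers only a few of the signals above: a pasted card number, product option changes such as picking a size, and time spent in checkout.

```python
from __future__ import annotations


def session_tracking_features(events: list[dict]) -> dict[str, float]:
    """Derive features from client-side session events.

    `events` uses a hypothetical schema: each event is a dict with a "type"
    ("paste", "option_change", "checkout_start", "checkout_submit", ...),
    an optional "field" name, and a "ts" timestamp in seconds.
    """
    pasted_card = any(
        e["type"] == "paste" and e.get("field") == "card_number" for e in events
    )
    # Genuine shoppers often adjust options (e.g. size) before buying.
    option_changes = sum(1 for e in events if e["type"] == "option_change")
    starts = [e["ts"] for e in events if e["type"] == "checkout_start"]
    submits = [e["ts"] for e in events if e["type"] == "checkout_submit"]
    # -1.0 marks "checkout not completed in this session".
    checkout_seconds = (max(submits) - min(starts)) if starts and submits else -1.0
    return {
        "pasted_card_number": float(pasted_card),
        "option_changes": float(option_changes),
        "checkout_seconds": float(checkout_seconds),
    }


events = [
    {"type": "option_change", "field": "size", "ts": 0.0},
    {"type": "checkout_start", "ts": 30.0},
    {"type": "paste", "field": "card_number", "ts": 35.0},
    {"type": "checkout_submit", "ts": 50.0},
]
print(session_tracking_features(events))
```

No single flag here is decisive; a pasted card number is common among legitimate users too, so these values are inputs for the model rather than rules.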

Entity features

We divide features into customer-centric and entity-centric. Entities are things such as devices, addresses, locations, domains and emails. An example feature is the number of orders shipped to a certain address. One purpose of these features is to alert us to drop-off points for fraudulently obtained goods.
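
A sketch of the address example: count orders and distinct customers per shipping address. `address_entity_features` and its `(customer_id, shipping_address)` input are hypothetical, and the address normalization is deliberately crude; a real system would need more robust address matching.

```python
from __future__ import annotations

from collections import defaultdict


def normalize_address(address: str) -> str:
    """Crude normalization so trivially different spellings map to one entity."""
    return " ".join(address.lower().replace(",", " ").split())


def address_entity_features(orders: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """Per shipping address: order count and number of distinct customers.

    `orders` is a hypothetical list of (customer_id, shipping_address) pairs.
    """
    order_counts: dict[str, int] = defaultdict(int)
    customers: dict[str, set[str]] = defaultdict(set)
    for customer_id, address in orders:
        key = normalize_address(address)
        order_counts[key] += 1
        customers[key].add(customer_id)
    return {
        key: {"orders": order_counts[key], "distinct_customers": len(customers[key])}
        for key in order_counts
    }


orders = [
    ("c1", "12 High St, London"),
    ("c2", "12 high st london"),
    ("c3", "12 High St,London"),
    ("c1", "5 Low Rd"),
]
print(address_entity_features(orders))
```

Many unrelated customers shipping to one address is the pattern a drop-off point tends to leave, which is why the distinct-customer count sits alongside the raw order count.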

Network-derived features

As well as customer-centric and entity-centric features, we also look for network-level features. These features focus on network topology (the shape of the network) as a means of enhancing our customer data. An example is distinguishing account sharing within a family in the same house from account takeover, where hundreds of accounts in a network use the same few devices.
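
To illustrate network shape, the sketch below links accounts and devices into a graph using a union-find structure and reports, for each account, how many accounts and devices share its connected component. A family typically yields a small component with a few accounts and devices, while a takeover ring yields one with many accounts and very few devices. The `(account_id, device_id)` input and all names here are hypothetical; this is an illustrative sketch, not a description of a production graph system.

```python
from __future__ import annotations


class UnionFind:
    """Disjoint-set structure over string node ids."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def network_features(links: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """For each account: number of accounts and devices in its connected component.

    `links` is a hypothetical list of (account_id, device_id) observations;
    accounts and devices become nodes and each observation is an edge.
    """
    uf = UnionFind()
    for account, device in links:
        uf.union("acct:" + account, "dev:" + device)
    accounts_in: dict[str, set[str]] = {}
    devices_in: dict[str, set[str]] = {}
    for account, device in links:
        root = uf.find("acct:" + account)
        accounts_in.setdefault(root, set()).add(account)
        devices_in.setdefault(root, set()).add(device)
    return {
        account: {
            "network_accounts": len(accounts_in[uf.find("acct:" + account)]),
            "network_devices": len(devices_in[uf.find("acct:" + account)]),
        }
        for account, _ in links
    }


family = [("mum", "tablet"), ("dad", "tablet"), ("mum", "laptop")]
ring = [(f"acct{i}", f"dev{i % 2}") for i in range(200)] + [("acct0", "dev1")]
print(network_features(family)["mum"])   # {'network_accounts': 2, 'network_devices': 2}
print(network_features(ring)["acct5"])   # {'network_accounts': 200, 'network_devices': 2}
```

The ratio of accounts to devices in a component is one simple way to turn this topology into a single model input.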

To learn more about how we perform feature engineering and structure our machine learning models at Ravelin, download the guide here.