Personalize your customer journey with the RFM segmentation, Part II

A managerial approach

By Housni Hassani in Marketing Business Analytics Customer Insights

April 21, 2022

This is the Part II of a two-part series named: “Personalise your customer Journey With RFM segmentation”.

An online retail wants to better understand their customers in order to perform targeted marketing actions. In others words, the eRetail business wants to know which customers are active and which ones are moving from active to inactive. Knowing that will help the online retail to create targeted marketing to keep customers active. To solve this business challenge, we will use a grouping technique called customer segmentation, and group customers by their purchase behaviors.

Customer segmentation is the process of dividing a group customers into subgroups using specific elements e.g. demographics, geographic, psychographic, etc. There are multitude of way to perform a customer segmentation. In this two-part series, we will focus on a popular, easy to use and effective segmentation method called RFM segmentation

The first article was focused on how to implement an RFM segmentation in R using a statistical method (hierarchical clustering). In this part II, we will focus on how to perform an RFM segmentation using a managerial approach.

How should we split the customers?

There are infinite criteria that can be used to segment customers . However, there is a popular, an effective, and an easy-to-use segmentation method named RFM segmentation that use customers purchase behavior to group customer. In the RFM segmentation:

  • R stands for Recency. When was the last time the customer - purchased something?
  • F stand for Frequency. What is the number of total purchases made by the customer?
  • M stands for Monetary. What is the average amount spent when purchasing something?

It’s an excellent tool for marketers to target specific customers with communications and actions that are relevant for their behavior. Most specifically, it can help companies to manage wisely their budget,to increase revenue and profit by increasing sales, to better understand their customers, to gain competitive advantage against competitors, etc.

How can we perform the RFM segmentation in R ?

About the data

In this article, we will use the online store data set from the UCL repository available. Here. The data set contains all the transactions occurring for a UK-based and registered, non-store online retail between 01/12/2009 and 09/12/2011.The company mainly sells unique all-occasion gift-ware. Many customers of the company are wholesalers."

We have done all the data exploration in the Part I. Please refer to it for further informations.

Data loading

# install and Load the relevant packages
if(!require("pacman")) install.packages("pacman")
p_load(readxl,tidyverse,kableExtra)
# Data loading 
o.data <- read.csv("https://raw.githubusercontent.com/hhousni/rfm_analysis/main/Online%20Retail.csv")
# Data prep
data <- o.data %>% 
  select(CustomerID, Quantity, UnitPrice, InvoiceDate) %>%
  mutate(InvoiceDate=as.Date(InvoiceDate, "%d/%m/%Y")) %>%
  drop_na(CustomerID) %>% 
  filter(Quantity > 0, UnitPrice > 0 )

# create the RFM indicators 
customersQ42011 <- data%>% 
  mutate (purchase_amount = (Quantity * UnitPrice)) %>% 
  arrange(CustomerID) %>%
  group_by(CustomerID,InvoiceDate) %>%
  summarise(Sales = sum(purchase_amount)) %>%
  mutate(datediff =as.numeric(as.Date("2011-12-10") - InvoiceDate)) %>% 
  group_by(CustomerID) %>%
  summarise(monetary  = mean(Sales),
            recency   = min(datediff),
            frequency = n(),
            fisrt_purchase = max(datediff))

head(customersQ42011)
## # A tibble: 6 x 5
##   CustomerID monetary recency frequency fisrt_purchase
##        <int>    <dbl>   <dbl>     <int>          <dbl>
## 1      12346   77184.     326         1            326
## 2      12347     616.       3         7            368
## 3      12348     449.      76         4            359
## 4      12349    1758.      19         1             19
## 5      12350     334.     311         1            311
## 6      12352     358.      37         7            297

We have created and stored the RFM indicators and stored them in a new data frame called customerQ42011 which going to be used to proceed the segmentation.

Customer Segmentation

The managerial method for segmentation requires marketers to set rules in order to split their customers. In this article, we choose firstly to split the online store customers by the recency indicator. In other words, we choose to slice customers by when was the last time the customer made a purchase. Then, w e choose secondly to create subgroups by adding the monetary value. We choose finally to add the first purchase date to create the new active subgroup.

We assume that this is a quarterly analysis, taking place one day after the last purchasing date i.e 2011-12-10. We consider that a customer is:

  • New active, if he has made his first purchase in the last 30 days.
  • Active high value, if he has made at least one purchase in the last 70 days and his average order value is equal or more than £458
  • Active low value, if he has made at least one purchase in the last 70 days and his average order value is less than £458
  • Warm high value, if he made at least one purchase between 70 days and 162 days and his average order value was equal or more than £458.
  • Warm low value, if he made at least one purchase between 70 days and 162 days and his average order value was less than £458.
  • Cold, if he made no order in the last 162 days with at least one order in the in the last 253 days.
  • Inactive, if his last order was in more than 253 days.
# Segment customers according to the rules define above. 
customersQ42011$segment <- "NA"
customersQ42011$segment[which(customersQ42011$recency <= 70)] = "Active"
customersQ42011$segment[which(customersQ42011$recency <= 70 & customersQ42011$fisrt_purchase <=30)] = "New active"
customersQ42011$segment[which(customersQ42011$segment == "Active" & customersQ42011$monetary >=458)] = "Active high value"
customersQ42011$segment[which(customersQ42011$segment == "Active" & customersQ42011$monetary < 458)] = "Active low value"
customersQ42011$segment[which(customersQ42011$recency <= 162 & customersQ42011$recency > 70)] = "Warm"
customersQ42011$segment[which(customersQ42011$segment == "Warm" & customersQ42011$monetary > 458)] = "Warm high value"
customersQ42011$segment[which(customersQ42011$segment == "Warm" & customersQ42011$monetary < 458)] = "Warm low value"
customersQ42011$segment[which(customersQ42011$recency <= 253 & customersQ42011$recency > 162)] = "Cold"
customersQ42011$segment[which(customersQ42011$recency > 253)] = "Inactive"

#create a factor to order the segment from new to inactive customers 
customersQ42011$segment = factor(customersQ42011$segment, levels = c("New active","Active high value","Active low value","Warm high value","Warm low value","Cold","Inactive"))

#create the average profile by segment.
customersQ42011 %>%
  group_by(segment) %>%
  summarise(NbCustomer = n(),
            Monetary   = paste0("£",round(mean(monetary))),
            Recency    = round(mean(recency)),
            Frequency  = round(mean(frequency))) %>%
  mutate("Percentage"  = NbCustomer/sum(NbCustomer)) %>%
  mutate(Percentage=paste0(round(Percentage*100),"%"))%>%
  select(segment,NbCustomer,Percentage,Monetary,Recency,Frequency)%>%
  kbl(caption = "Average profile by segment") %>%
  kable_styling()
Table 1: Average profile by segment
segment NbCustomer Percentage Monetary Recency Frequency
New active 252 6% £385 16 1
Active high value 629 14% £1038 26 7
Active low value 1679 39% £263 28 5
Warm high value 218 5% £880 103 2
Warm low value 588 14% £246 107 2
Cold 483 11% £461 206 2
Inactive 489 11% £484 310 1

The table above show results and the average profile by segment. For example, on average, the active high-value customers have spent £1038 and have made 7 purchases. While, on average the inactive customer spent £489 and have made 1 purchase.

From this, we can determine a certain amount of analysis to understand each segment behaviour. We can for instance interested to determine how much revenue is generated by cluster in the Q42011

# How much revenue is generated by each cluster in Q42011
data %>%
  filter(InvoiceDate >= "2011-10-01") %>%
  group_by(CustomerID) %>%
  summarise(revQ42011=sum(Quantity*UnitPrice)) %>%
  right_join(customersQ42011, by="CustomerID") %>%
  group_by(segment) %>%
  summarise("revQ42011"=paste0("£ ",sum(revQ42011))) %>% replace(is.na(.), 0) %>%
  filter(segment %in% c("New active","Active high value","Active low value")) %>% kbl(caption = "Total Q42011 revenue generated by segment") %>%
  kable_styling()
Table 2: Total Q42011 revenue generated by segment
segment revQ42011
New active £ 115023.66
Active high value £ 1727384.12
Active low value £ 876921.18

The table above shows the total revenue generated by segment in the Q42011. We see that the active high value, the active low value and the new active have generated respectively £1,727,384, £876,921 and £115,023.

We can go further by trying to know how much Q42011 revenue was generated by inactive, cold or warm Q32011 customers. Knowing that can help us to know how much revenue can be expected from inactive, cold and warm customers today going tomorrow.

Compare past and future periods

From Here, we consider that Q42011 didn’t happen. We are at the end of Q32011. And, we are doing a segmentation using the same conditions as we did in Q42011.

Data prep

# Split the data, keep only data before 2011-10-01
customersQ32011 <- data %>% 
  filter(InvoiceDate < "2011-10-01") %>%
  mutate (purchase_amount = (Quantity * UnitPrice)) %>% 
  arrange(CustomerID) %>%
  group_by(CustomerID,InvoiceDate) %>%
  summarise(Sales = sum(purchase_amount)) %>%
  mutate(datediff =as.numeric(as.Date("2011-10-02") - InvoiceDate)) %>% 
  group_by(CustomerID) %>%
  summarise(monetary  = mean(Sales),
            recency   = min(datediff),
            frequency = n(),
            fisrt_purchase = max(datediff))

# Perform segmentation in Q32011
customersQ32011$segment <- "NA"
customersQ32011$segment[which(customersQ32011$recency <= 70)] = "active"
customersQ32011$segment[which(customersQ32011$recency <= 70 & customersQ32011$fisrt_purchase <=30)] = "new active"
customersQ32011$segment[which(customersQ32011$segment == "active" & customersQ32011$monetary >=458)] = "active high value"
customersQ32011$segment[which(customersQ32011$segment == "active" & customersQ32011$monetary < 458)] = "active low value"
customersQ32011$segment[which(customersQ32011$recency <= 162 & customersQ32011$recency > 70)] = "warm"
customersQ32011$segment[which(customersQ32011$segment == "warm" & customersQ32011$monetary > 458)] = "warm high value"
customersQ32011$segment[which(customersQ32011$segment == "warm" & customersQ32011$monetary < 458)] = "warm low value"
customersQ32011$segment[which(customersQ32011$recency <= 253 & customersQ32011$recency > 162)] = "cold"
customersQ32011$segment[which(customersQ32011$recency > 251)] = "inactive"

customersQ32011$segment = factor(customersQ32011$segment, levels = c("new active","active high value","active low value","warm high value","warm low value","cold","inactive"))

customersQ32011 %>%
  group_by(segment) %>%
  summarise(NbCustomer = n(),
            Monetary   = paste0("£",round(mean(monetary))),
            Recency    = round(mean(recency)),
            Frequency  = round(mean(frequency))) %>%
  mutate("Percentage"  = NbCustomer/sum(NbCustomer)) %>%
  mutate(Percentage=paste0(round(Percentage*100),"%"))%>%
  select(segment,NbCustomer,Percentage,Monetary,Recency,Frequency)%>%
  kbl(caption = "Average profile by segment") %>%
  kable_styling()
Table 3: Average profile by segment
segment NbCustomer Percentage Monetary Recency Frequency
new active 292 8% £492 13 1
active high value 470 13% £915 27 6
active low value 1143 32% £266 30 5
warm high value 175 5% £1147 109 2
warm low value 695 19% £231 112 2
cold 587 16% £374 204 1
inactive 254 7% £596 287 1
data %>% 
  filter(InvoiceDate > "2011-10-01") %>% 
  mutate(revenueQ42011=Quantity * UnitPrice) %>%
  select(CustomerID,revenueQ42011) %>%
  right_join(customersQ32011, by="CustomerID") %>%
  group_by(segment) %>%
  summarise(rev=paste0("£",sum(revenueQ42011, na.rm = TRUE)))%>%
  kbl(caption = "Q42011 revenue generated by Q32011 segment") %>%
  kable_styling()
Table 4: Q42011 revenue generated by Q32011 segment
segment rev
new active £77145.37
active high value £1116832.21
active low value £558571.19
warm high value £122281.84
warm low value £313854.37
cold £92894.71
inactive £31443.54

The table 4 shows the contribution of Q32011 segments in the Q42011 revenue. The Q32011 active high-value customers have contributed the most in the Q42011 revenue while the cold customers have contributed the less. The most important insight from this table is that Q32011 warm customers have contributed about twice more than Q32011 new customers. This insight can be used to prioritise and to target each segment.

Conclusion

In this post, we wanted to show how to perform an RFM segmentation by using a managerial approach. We used recency to slice the online store database. Then we added the monetary value to understand the customer’s behaviours in each segment. Then we compare the past and the actual revenue to know how much revenue can be expected from each cluster in the future periods. And we have found that the eRetail can expect about twice more revenue from warm customers compared to new customers.