An Overview of RFM Analysis: Customer Segmentation
Sangamesh K S
February 23, 2018
Introduction
In this article we will perform an RFM analysis on transaction data and use it to cluster customers into segments.
For the RFM analysis we will use a transaction dataset from the UCI Machine Learning Repository.
Why RFM Analysis?
RFM stands for Recency, Frequency and Monetary value. In an RFM analysis we classify customers based on how recently they bought, how frequently they purchase and how much they have spent with us. Using this information we can design better loyalty programs, reduce customer churn and run reactivation campaigns.
This is useful because it tells us which customers are active (Class 1), which are at risk (Class 2) and which have churned (Class 3).
Previously, RFM analysis was usually done in Excel; as R became popular, people started doing it in R. You can still do it in Excel, but R has a package developed by an Indian developer, Satish Hariharan, that makes the job far less tedious than in Excel.
Now let me demonstrate how it works, starting with a look at the data:
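To make the three measures concrete, here is a minimal base R sketch that computes them per customer. The `tx` data frame and its values are made up for illustration, and counting recency from the day after the last transaction is an assumption, not something taken from the dataset or package used in this article.

```r
# Toy transactions (hypothetical data, for illustration only)
tx <- data.frame(
  CustomerID  = c("A", "A", "B", "C", "C", "C"),
  InvoiceNo   = c("1", "2", "3", "4", "5", "6"),
  InvoiceDate = as.Date(c("2011-01-05", "2011-03-10", "2010-12-01",
                          "2011-02-01", "2011-02-15", "2011-03-01")),
  amount      = c(20, 35, 15, 10, 40, 25),
  stringsAsFactors = FALSE
)

# Reference date: the day after the last transaction (an assumption)
ref_date <- max(tx$InvoiceDate) + 1

# Recency: days between the reference date and the last purchase
days    <- as.numeric(tx$InvoiceDate)
recency <- as.numeric(ref_date) - tapply(days, tx$CustomerID, max)

# Frequency: number of distinct invoices per customer
frequency <- tapply(tx$InvoiceNo, tx$CustomerID,
                    function(x) length(unique(x)))

# Monetary: total amount spent per customer
monetary <- tapply(tx$amount, tx$CustomerID, sum)

rfm_raw <- data.frame(
  CustomerID = names(recency),
  Recency    = as.numeric(recency),
  Frequency  = as.integer(frequency),
  Monetary   = as.numeric(monetary)
)
print(rfm_raw)
```

In this toy example customer A bought most recently, C buys most often and spends the most, and B has been inactive the longest.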
## InvoiceNo StockCode Description Quantity
## 1 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6
## 2 536365 71053 WHITE METAL LANTERN 6
## 3 536365 84406B CREAM CUPID HEARTS COAT HANGER 8
## 4 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6
## 5 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6
## 6 536365 22752 SET 7 BABUSHKA NESTING BOXES 2
## InvoiceDate UnitPrice CustomerID Country
## 1 40513.35 2.55 17850 United Kingdom
## 2 40513.35 3.39 17850 United Kingdom
## 3 40513.35 2.75 17850 United Kingdom
## 4 40513.35 3.39 17850 United Kingdom
## 5 40513.35 3.39 17850 United Kingdom
## 6 40513.35 7.65 17850 United Kingdom
Out of the whole dataset we need only four columns, in this order and without NA values: the invoice number (transaction ID), the customer ID, the invoice date (in date format) and the amount, which is the quantity multiplied by the unit price. We do this because the function expects its input in this format and order.
# Keep InvoiceNo, InvoiceDate, UnitPrice and CustomerID
df <- df1[c(1, 5, 6, 7)]
# Amount of each line item = quantity x unit price
df$amount <- df1$Quantity * df1$UnitPrice
# Reorder to InvoiceNo, CustomerID, InvoiceDate, amount
df <- df[c(1, 4, 2, 5)]
df <- na.omit(df)
# InvoiceDate holds Excel serial day numbers (e.g. 40513.35), so count
# whole days from Excel's origin, 1899-12-30
df$InvoiceDate <- as.Date(floor(df$InvoiceDate), origin = "1899-12-30")
head(df)
## InvoiceNo CustomerID InvoiceDate amount
## 1 536365 17850 2010-12-01 15.30
## 2 536365 17850 2010-12-01 20.34
## 3 536365 17850 2010-12-01 22.00
## 4 536365 17850 2010-12-01 20.34
## 5 536365 17850 2010-12-01 20.34
## 6 536365 17850 2010-12-01 15.30
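The InvoiceDate values printed earlier (40513.35) look like Excel serial date numbers, that is, days counted from 1899-12-30 with the fraction giving the time of day. As a quick check under that assumption, this sketch converts one such value by hand; the variable names are only illustrative.

```r
# Serial value as it appears in the InvoiceDate column of the raw data
serial <- 40513.35

# Whole part: days since 1899-12-30 (the origin assumed for Excel serials)
invoice_day <- as.Date(floor(serial), origin = "1899-12-30")

# Whole and fractional part together: seconds since the same origin.
# Rounding to whole seconds avoids floating point noise.
invoice_time <- as.POSIXct(round(serial * 86400),
                           origin = "1899-12-30", tz = "UTC")

format(invoice_day)                      # "2010-12-01"
format(invoice_time, "%Y-%m-%d %H:%M")   # "2010-12-01 08:24"
```

A date in late 2010 is plausible for this data, whereas a year such as 0012 in the converted column would indicate that the conversion went wrong.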
Next we apply the RFM function. The histogram it plots shows the distribution of the final score, divided into classes.
library(didrooRFM)  # package that provides findRFM()
rfm <- findRFM(df)
Now let's see the number of customers:
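To give an intuition for how a final score and a class can be derived from the three measures, here is a hedged sketch. It is not the package's implementation: the percentile-rank scoring, the equal weights, the cut points and the toy data are all assumptions made for illustration, and `findRFM` may well compute its score differently.

```r
# Hypothetical per-customer RFM values (not from the article's dataset)
rfm_raw <- data.frame(
  CustomerID = c("A", "B", "C", "D", "E", "F"),
  Recency    = c(1, 100, 10, 45, 200, 5),     # days since last purchase
  Frequency  = c(12, 1, 6, 3, 1, 9),          # number of invoices
  Monetary   = c(900, 15, 400, 120, 30, 650)  # total spend
)

# Percentile rank in (0, 1]; a higher value means a better customer
pct_rank <- function(x) rank(x, ties.method = "average") / length(x)

r_score <- pct_rank(-rfm_raw$Recency)   # fewer days since purchase is better
f_score <- pct_rank(rfm_raw$Frequency)
m_score <- pct_rank(rfm_raw$Monetary)

# Equal weights for the three measures (an assumption)
w <- c(r = 1, f = 1, m = 1)
rfm_raw$Score <- (w[["r"]] * r_score + w[["f"]] * f_score +
                  w[["m"]] * m_score) / sum(w)

# Illustrative cut points: a high score maps to Class-1 (active),
# a low score to Class-3 (churned)
rfm_raw$Class <- cut(rfm_raw$Score,
                     breaks = c(0, 0.4, 0.7, 1),
                     labels = c("Class-3", "Class-2", "Class-1"))
print(rfm_raw[c("CustomerID", "Score", "Class")])
```

With these toy values, customers A and F land in Class-1, C and D in Class-2, and B and E in Class-3.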
nrow(rfm)
## [1] 4372
Now let's see how the customers are divided into segments and how many fall into each:
table(rfm$FinalCustomerClass)
##
## Class-1 Class-2 Class-3
## 826 2730 816
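Counts are easier to compare as shares. The sketch below simply re-enters the three counts from the table above and converts them to percentages; on the real object, `prop.table(table(rfm$FinalCustomerClass))` should give the same proportions.

```r
# Segment counts copied from the table above
counts <- c("Class-1" = 826, "Class-2" = 2730, "Class-3" = 816)

# Share of customers in each segment, in percent
shares <- round(100 * counts / sum(counts), 1)
print(shares)
# Class-1 Class-2 Class-3
#    18.9    62.4    18.7
```

Roughly one customer in five is active, a similar share has churned, and the majority sits in the at-risk middle class.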
If we want to know which customers are in Class 1, i.e. the active customers, we simply filter on the class:
library(dplyr)  # provides filter()
Class1 <- filter(rfm, FinalCustomerClass == "Class-1")
head(Class1[c(1, 16)])
## # A tibble: 6 x 2
## CustomerID FinalCustomerClass
## <chr> <chr>
## 1 12346 Class-1
## 2 12365 Class-1
## 3 12367 Class-1
## 4 12401 Class-1
## 5 12441 Class-1
## 6 12442 Class-1
Now let's look at the churned customers, who have not transacted for a long time:
Class3 <- filter(rfm, FinalCustomerClass == "Class-3")
head(Class3[1])
## # A tibble: 6 x 1
## CustomerID
## <chr>
## 1 12347
## 2 12348
## 3 12349
## 4 12356
## 5 12357
## 6 12359
Conclusion
RFM is a powerful tool that is easy to apply and works well for retail stores, online stores and similar businesses. Going further, we can also use it to estimate the churn rate and to target customers.