Friday, 23 February 2018

An Overview of RFM Analysis: Customer Segmentation


Introduction

In this article we walk through RFM analysis and apply it to transaction data in order to group customers into segments.

For the RFM analysis we will use a transaction dataset from the UCI Machine Learning Repository.

Why RFM Analysis?

RFM stands for Recency, Frequency and Monetary value. In RFM analysis we classify customers by how recently they bought, how frequently they purchase, and how much they have spent with us. Using this information we can design better loyalty programs, reduce customer churn, and run reactivation campaigns.

This matters because it tells us which customers are active (Class 1), at risk (Class 2) and churned (Class 3).
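
As a rough illustration of the three measures, here is a minimal base R sketch that computes recency, frequency and monetary value per customer. The `tx` table and its values are made up for this example, and using the day after the last transaction as the reference date is an assumption, not something the package prescribes.

```r
# Toy transactions (made-up values, only to illustrate the calculation)
tx <- data.frame(
  InvoiceNo   = c("1001", "1002", "1003", "1004", "1005"),
  CustomerID  = c("A", "A", "B", "C", "C"),
  InvoiceDate = as.Date(c("2011-01-10", "2011-03-05", "2010-12-20",
                          "2011-02-01", "2011-02-15")),
  amount      = c(20, 35, 10, 50, 5),
  stringsAsFactors = FALSE
)

# Assumed reference date: the day after the last transaction in the data
ref_date <- max(tx$InvoiceDate) + 1

# Recency: days since each customer's most recent purchase
recency <- tapply(as.numeric(ref_date - tx$InvoiceDate), tx$CustomerID, min)

# Frequency: number of distinct invoices per customer
frequency <- tapply(tx$InvoiceNo, tx$CustomerID, function(x) length(unique(x)))

# Monetary: total amount spent per customer
monetary <- tapply(tx$amount, tx$CustomerID, sum)

rfm_toy <- data.frame(
  CustomerID   = names(recency),
  recency_days = as.vector(recency),
  frequency    = as.vector(frequency),
  monetary     = as.vector(monetary)
)
print(rfm_toy)
```

In this toy table customer A bought most recently and most often, while customer B has a single, older purchase.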

RFM analysis used to be done in Excel, and as R became popular people started doing it in R. You can still do it in Excel, but R has a package, developed by an Indian developer, Satish Hariharan, that makes the job far less tedious than it is in Excel.

Now let me demonstrate how it works, starting with a look at the data:

##   InvoiceNo StockCode                         Description Quantity
## 1    536365    85123A  WHITE HANGING HEART T-LIGHT HOLDER        6
## 2    536365     71053                 WHITE METAL LANTERN        6
## 3    536365    84406B      CREAM CUPID HEARTS COAT HANGER        8
## 4    536365    84029G KNITTED UNION FLAG HOT WATER BOTTLE        6
## 5    536365    84029E      RED WOOLLY HOTTIE WHITE HEART.        6
## 6    536365     22752        SET 7 BABUSHKA NESTING BOXES        2
##   InvoiceDate UnitPrice CustomerID        Country
## 1    40513.35      2.55      17850 United Kingdom
## 2    40513.35      3.39      17850 United Kingdom
## 3    40513.35      2.75      17850 United Kingdom
## 4    40513.35      3.39      17850 United Kingdom
## 5    40513.35      3.39      17850 United Kingdom
## 6    40513.35      7.65      17850 United Kingdom

Out of the whole dataset we need only the invoice number (transaction id), the customer id, the invoice date (in date format) and the order amount (quantity multiplied by unit price), with no NA values. The function expects its input in exactly this format and column order.

# Keep invoice number, customer id and invoice date, in the order the function expects
df <- df1[c("InvoiceNo", "CustomerID", "InvoiceDate")]
# Order amount = quantity x unit price
df$amount <- df1$Quantity * df1$UnitPrice
# Drop rows with missing values (for example, a missing CustomerID)
df <- na.omit(df)
# InvoiceDate is an Excel serial day number (40513.35 falls on 1 December 2010),
# so drop the time-of-day fraction and convert using Excel's origin date
df$InvoiceDate <- as.Date(floor(df$InvoiceDate), origin = "1899-12-30")
head(df)
##   InvoiceNo CustomerID InvoiceDate amount
## 1    536365      17850  2010-12-01  15.30
## 2    536365      17850  2010-12-01  20.34
## 3    536365      17850  2010-12-01  22.00
## 4    536365      17850  2010-12-01  20.34
## 5    536365      17850  2010-12-01  20.34
## 6    536365      17850  2010-12-01  15.30

Next we apply the findRFM function. The histogram it produces shows the distribution of the final score across the classes.

rfm<-findRFM(df)
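
The scoring that happens inside findRFM is not shown in this article. As a rough illustration of how RFM values are commonly turned into scores and segments, here is a minimal sketch that assumes tercile scores (1 to 3) on each measure, summed into a total. The helper `score3`, the toy table and the labels "low", "mid" and "high" are hypothetical, and this is not claimed to be the package's own algorithm.

```r
# Toy per-customer RFM table (made-up values)
toy <- data.frame(
  CustomerID   = c("A", "B", "C", "D", "E", "F"),
  recency_days = c(5, 40, 200, 10, 90, 300),
  frequency    = c(12, 4, 1, 8, 3, 1),
  monetary     = c(900, 250, 40, 600, 120, 30),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank-based terciles, 1 = lowest third, 3 = highest third
score3 <- function(x) {
  as.integer(cut(rank(x, ties.method = "first"), breaks = 3, labels = FALSE))
}

# A recent purchase is better, so recency is scored on the negated value
toy$r_score <- score3(-toy$recency_days)
toy$f_score <- score3(toy$frequency)
toy$m_score <- score3(toy$monetary)

# Total score ranges from 3 (worst) to 9 (best)
toy$total <- toy$r_score + toy$f_score + toy$m_score

# Assumed cut-offs for three segments: 3-4 low, 5-7 mid, 8-9 high
toy$segment <- as.character(
  cut(toy$total, breaks = c(2, 4, 7, 9), labels = c("low", "mid", "high"))
)
print(toy[c("CustomerID", "total", "segment")])
```

Weighting the three scores equally is a design choice of this sketch; a business that cares more about recency, for instance, could weight that score more heavily before summing.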

Now let's see the number of customers:

nrow(rfm)
## [1] 4372

Now let's see how the customers are divided into segments and how many fall into each:

table(rfm$FinalCustomerClass)
## 
## Class-1 Class-2 Class-3 
##     826    2730     816
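
To compare segment sizes it can help to express these counts as percentages. A minimal sketch, re-entering the counts printed above by hand (the name `class_counts` is just for illustration); on the real object, `prop.table(table(rfm$FinalCustomerClass))` should give the same proportions.

```r
# Class counts copied from the table output above
class_counts <- c("Class-1" = 826, "Class-2" = 2730, "Class-3" = 816)

# The counts should add up to the number of customers (4372)
total_customers <- sum(class_counts)

# Share of each class, in percent, rounded to one decimal place
share <- round(100 * class_counts / total_customers, 1)
print(share)
```

Under the class labels used in this article, the Class-3 share (about 18.7%) gives a rough first estimate of the churned portion of the customer base.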

To see which customers are in Class 1, i.e. the active customers, we simply run a filter command:

library(dplyr)  # provides filter()
Class1<-filter(rfm,FinalCustomerClass=="Class-1")
head(Class1[c(1,16)])
## # A tibble: 6 x 2
##   CustomerID FinalCustomerClass
##        <chr>              <chr>
## 1      12346            Class-1
## 2      12365            Class-1
## 3      12367            Class-1
## 4      12401            Class-1
## 5      12441            Class-1
## 6      12442            Class-1

Now let's look at the churned customers, who have not transacted for a long time:

Class3<-filter(rfm,FinalCustomerClass=="Class-3")
head(Class3[1])
## # A tibble: 6 x 1
##   CustomerID
##        <chr>
## 1      12347
## 2      12348
## 3      12349
## 4      12356
## 5      12357
## 6      12359

Conclusion

RFM is a powerful tool that is easy to apply and works well for retail stores, online stores and similar businesses. Going forward, we can also use it to estimate the churn rate and to help target customers.