Segmentation using Cluster Analysis
Sangamesh K S
November 13, 2017
Introduction
Today I will throw some light on an important concept in marketing called STP, which stands for Segmenting, Targeting and Positioning. The whole process of selling a product to a customer depends on the company's STP: the population is divided into subgroups called segments, the company identifies one of the subgroups as its potential customers, and it positions the product for them, i.e. makes the product appealing to the target segment. STP starts with segmentation, which is what this article covers. Segmentation is a quantitative process of grouping a population into subgroups based on similarity and dissimilarity. We will work through several segmentation approaches using cluster analysis, covering:
Hierarchical Clustering
Mean based Clustering
Model Based Clustering
Cophenetic correlation coefficient (CPCC)
Latent Class Analysis
Comparing Cluster Solutions
Evaluating groups
Segmentation by clustering comes in four main types here: hierarchical, mean-based, model-based and latent class analysis (LCA). The other items in the list are there to give a broader perspective.
In this article we will look at hierarchical clustering for segmentation. For any clustering technique, what affects the model most is the input we give it. Here we have 6 variables:
##        age gender   income kids ownHome subscribe
## 1 47.31613   Male 49482.81    2   ownNo     subNo
## 2 31.38684   Male 35546.29    1  ownYes     subNo
## 3 43.20034   Male 44169.19    0  ownYes     subNo
## 4 37.31700 Female 81041.99    1   ownNo     subNo
## 5 40.95439 Female 79353.01    3  ownYes     subNo
## 6 43.03387   Male 58143.36    4  ownYes     subNo
To cluster the data we first need a distance function. Here I start from R's default distance metric, Euclidean distance (the straight-line distance in n dimensions). Other options include Manhattan distance (the sum of horizontal and vertical differences) and cosine distance (based on the angle between two vectors). Because Euclidean distance is defined only for numeric variables, factor columns such as gender need a mixed-type measure; daisy() in the cluster package switches to Gower's coefficient automatically when non-numeric columns are present. After converting the data into a dissimilarity matrix, we can plot the clusters as a dendrogram, which is a tree-shaped graphical representation of the segments.
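To make the difference between these distance measures concrete, here is a small worked example. The points and vectors are hypothetical and chosen only so the arithmetic is easy to check by hand; they are not taken from the customer data.

```r
# Two hypothetical points in 2-D space
pts <- rbind(a = c(0, 0), b = c(3, 4))

# Straight-line distance: sqrt(3^2 + 4^2) = 5
euclid <- dist(pts, method = "euclidean")

# Sum of absolute differences along each axis: |3| + |4| = 7
manhat <- dist(pts, method = "manhattan")

# Cosine similarity is not a method of dist(), so it is computed by hand
# for two hypothetical vectors that are 45 degrees apart
u <- c(1, 0)
v <- c(1, 1)
cos.sim <- sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
```

The same pair of points is 5 units apart under Euclidean distance and 7 under Manhattan distance, so the choice of metric can change which customers end up close together.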
Our data is now grouped by hierarchical clustering using a bottom-up approach. Each customer starts as its own group, the closest groups are merged into subgroups, and those subgroups are in turn merged into larger ones. This technique is called agglomerative (bottom-up) clustering.
Now we will cut the dendrogram at different values of k (the number of segments).
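The code that builds clus.daisy and model.hclust is not shown in this article, so the following is only a minimal sketch of how such objects could be produced. It assumes daisy() from the cluster package and complete linkage (the default of hclust()). The data frame toy.df and the objects toy.daisy and toy.hclust are hypothetical stand-ins with the same column layout as the table above, not the article's data.

```r
library(cluster)  # provides daisy()

# Hypothetical toy data with the same six columns as the customer table
set.seed(42)
n <- 30
toy.df <- data.frame(
  age       = rnorm(n, mean = 40, sd = 8),
  gender    = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  income    = rnorm(n, mean = 55000, sd = 15000),
  kids      = rpois(n, lambda = 1.5),
  ownHome   = factor(sample(c("ownNo", "ownYes"), n, replace = TRUE)),
  subscribe = factor(sample(c("subNo", "subYes"), n, replace = TRUE))
)

# Pairwise dissimilarities; because some columns are factors,
# daisy() uses Gower's coefficient rather than Euclidean distance
toy.daisy <- daisy(toy.df)

# Agglomerative (bottom-up) clustering; complete linkage is an assumption here
toy.hclust <- hclust(toy.daisy, method = "complete")
```

Calling plot(toy.hclust) would then draw the dendrogram for the toy data.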
plot(model.hclust,main = "Segmentation of Customers",xlab = "customers")
rect.hclust(model.hclust,k=2, border = "green")
rect.hclust(model.hclust,k=4, border = "red")
rect.hclust(model.hclust,k=7, border = "blue")
Let's assume we have 4 clusters. To find out how many customers each one contains:
table(cutree(model.hclust,k=4))
##
##   1   2   3   4
## 124 136  18  22
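The 1st cluster consists of 124 customers, the 2nd of 136, and so on.
Now we will look at the cophenetic fit. The cophenetic correlation coefficient (CPCC) measures how well the merge heights in the dendrogram preserve the original pairwise distances.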
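Counting members is only the first step in evaluating groups; it usually also helps to see how the segments differ on each variable. The sketch below shows one way to do that. Since the customer data is not available here, it uses R's built-in USArrests data set as a stand-in, and the object names (demo.hclust, demo.segment, demo.means) are hypothetical.

```r
# Stand-in data: the built-in USArrests data set, not the article's customer data
demo.dist <- dist(scale(USArrests))
demo.hclust <- hclust(demo.dist, method = "complete")

# Assign every row to one of 4 segments
demo.segment <- cutree(demo.hclust, k = 4)

# Segment sizes
demo.sizes <- table(demo.segment)
print(demo.sizes)

# Mean of each variable within each segment
demo.means <- aggregate(USArrests, by = list(segment = demo.segment), FUN = mean)
print(demo.means)
```

A segment whose means stand out clearly from the others is easier to describe and target than one that looks like the overall average.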
cor(cophenetic(model.hclust),clus.daisy)
## [1] 0.7682436
A value above 0.70 is commonly read as a strong fit, so the dendrogram reflects the original distances reasonably well.
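The CPCC can also be used to compare linkage methods on the same distance matrix. The following is a minimal sketch using the built-in USArrests data as a stand-in for the customer data; the object names and the choice of linkage methods to compare are assumptions for illustration.

```r
# Stand-in data: the built-in USArrests data set, not the article's customer data
demo.dist <- dist(scale(USArrests))

# CPCC for several linkage methods: correlation between the original
# distances and the heights at which pairs are merged in the dendrogram
linkages <- c("complete", "average", "single")
cpcc <- sapply(linkages, function(m) {
  cor(cophenetic(hclust(demo.dist, method = m)), demo.dist)
})
print(round(cpcc, 3))
```

The linkage with the highest CPCC distorts the original distances the least, though a good CPCC alone does not guarantee that the resulting segments are useful for the business.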