Tuesday, 14 November 2017

Segmentation using Cluster Analysis: Hierarchical Clustering


Segmentation using Cluster Analysis

Introduction

Today I will shed some light on an important marketing concept called STP, which stands for Segmenting, Targeting and Positioning. The whole process of selling a product to a customer depends on the company's STP: the population is divided into subgroups called segments, the company identifies one of those subgroups as its potential customers, and it positions the product for them, i.e. makes the product appealing to the target segment. STP starts with segmentation, and that is what this article covers. Segmentation is a quantitative process of grouping a population into subgroups based on similarities and dissimilarities. We will work through various types of segmentation using cluster analysis, covering:

  1. Hierarchical Clustering

  2. Mean based Clustering

  3. Model Based Clustering

  4. Cophenetic correlation coefficient (CPCC)

  5. Latent Class Analysis

  6. Comparing Clusters Solutions

  7. Evaluating groups


Of the topics above, four are clustering methods: hierarchical, mean-based, latent class analysis (LCA) and model-based clustering. The rest are there to give a broader perspective.

In this article we will look at hierarchical clustering for segmentation. For any clustering technique, the main thing that affects the model is the set of variables we feed into it. Here we have six variables:

##        age gender   income kids ownHome subscribe
## 1 47.31613   Male 49482.81    2   ownNo     subNo
## 2 31.38684   Male 35546.29    1  ownYes     subNo
## 3 43.20034   Male 44169.19    0  ownYes     subNo
## 4 37.31700 Female 81041.99    1   ownNo     subNo
## 5 40.95439 Female 79353.01    3  ownYes     subNo
## 6 43.03387   Male 58143.36    4  ownYes     subNo

Looking at the data, we first need a measure of how far apart two customers are, so we pick a distance function. Here I am using Euclidean distance (straight-line distance in n dimensions), which is R's default distance metric. Other options include Manhattan distance (the sum of the horizontal and vertical differences) and cosine distance (based on the angle between two vectors). Note that Euclidean distance applies only to numeric columns; for mixed data like ours, which includes factors such as gender, daisy() from the cluster package automatically uses Gower's coefficient instead. After converting the data into a dissimilarity matrix we can plot the clustering as a dendrogram, a tree diagram that shows how the segments are nested.
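As a rough illustration of the distance step, here is a minimal sketch on a few made-up rows. The data frame `toy` and the object names are hypothetical and are not the article's data; only dist() from base R and daisy() from the cluster package are real functions.

```r
library(cluster)  # provides daisy()

# Hypothetical toy data with numeric and factor columns (not the article's data)
toy <- data.frame(
  age    = c(47, 31, 43, 37),
  income = c(49000, 35000, 44000, 81000),
  gender = factor(c("Male", "Male", "Male", "Female"))
)

# Numeric columns only: scale first so income does not dominate age
num <- scale(toy[, c("age", "income")])
d.euclid    <- dist(num, method = "euclidean")
d.manhattan <- dist(num, method = "manhattan")

# Mixed numeric and factor columns: daisy() falls back to Gower's coefficient,
# which rescales every variable so that dissimilarities lie between 0 and 1
d.gower <- daisy(toy)

print(round(as.matrix(d.gower), 3))
```

Scaling before dist() is a choice made for this sketch; without it, a variable measured in large units (such as income) would dominate the distances.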

Our data is now clustered hierarchically using a bottom-up approach. Each customer starts as its own subgroup; the closest subgroups are then merged into larger subgroups, step by step, until everything is joined in a single tree. This technique is called agglomerative (bottom-up) clustering.

Now we will cut the dendrogram at different values of k (the number of segments).
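The code that creates `clus.daisy` and `model.hclust` is not shown in this article, so the following is only a sketch of how such objects could be built. The data frame `customers` is randomly generated stand-in data, and the choice of complete linkage is an assumption, not something stated above.

```r
library(cluster)  # provides daisy()

# Hypothetical stand-in for the customer data (same six column names)
set.seed(42)
n <- 30
customers <- data.frame(
  age       = rnorm(n, mean = 40, sd = 10),
  gender    = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  income    = rnorm(n, mean = 50000, sd = 15000),
  kids      = rpois(n, lambda = 1.5),
  ownHome   = factor(sample(c("ownNo", "ownYes"), n, replace = TRUE)),
  subscribe = factor(sample(c("subNo", "subYes"), n, replace = TRUE))
)

# Dissimilarity matrix for mixed data, then agglomerative clustering
clus.daisy   <- daisy(customers)
model.hclust <- hclust(clus.daisy, method = "complete")  # linkage is an assumption

# Each row of $merge is one agglomeration step: negative entries are single
# customers, positive entries refer to clusters formed in earlier rows
print(head(model.hclust$merge))
```

With n observations there are always n - 1 merge steps, which is why the dendrogram ends in a single root.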

# Draw the dendrogram, then outline the 2-, 4- and 7-segment solutions
plot(model.hclust, main = "Segmentation of Customers", xlab = "customers")
rect.hclust(model.hclust, k = 2, border = "green")
rect.hclust(model.hclust, k = 4, border = "red")
rect.hclust(model.hclust, k = 7, border = "blue")

Let's assume we settle on 4 clusters. To see how many customers fall into each one:

table(cutree(model.hclust,k=4))
## 
##   1   2   3   4 
## 124 136  18  22

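The 1st cluster contains 124 customers, the 2nd contains 136, and so on.

Now we will look at the cophenetic correlation coefficient (CPCC), which measures how well the dendrogram preserves the original pairwise dissimilarities.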
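Counting members is usually followed by a look at what each segment looks like. Here is a minimal sketch of that step on hypothetical numeric data with two well-separated groups; the data, the object names and the choice of k = 2 are all illustrative assumptions.

```r
set.seed(1)
# Hypothetical customers: a younger, lower-income group and an older,
# higher-income group (not the article's data)
customers <- data.frame(
  age    = c(rnorm(10, 30, 3), rnorm(10, 55, 3)),
  income = c(rnorm(10, 40000, 3000), rnorm(10, 80000, 3000))
)

fit <- hclust(dist(scale(customers)), method = "complete")

# Attach the segment label from cutree() to every customer
customers$segment <- cutree(fit, k = 2)

# Segment sizes, then the mean of each variable per segment
seg.sizes <- table(customers$segment)
seg.means <- aggregate(cbind(age, income) ~ segment, data = customers, FUN = mean)

print(seg.sizes)
print(seg.means)
```

The same pattern applies to the four-cluster solution above: store cutree(model.hclust, k = 4) as a column and summarise by it.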

cor(cophenetic(model.hclust),clus.daisy)
## [1] 0.7682436

As the value is above 0.70, the dendrogram is a relatively strong fit to the original dissimilarities.
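The same check can be used to compare linkage methods on one dissimilarity matrix. The sketch below does this for random hypothetical points; the data and the three linkage methods are illustrative choices, not part of the analysis above.

```r
set.seed(7)
pts <- matrix(rnorm(40), ncol = 2)  # 20 hypothetical points in 2 dimensions
d   <- dist(pts)

# cophenetic() returns, for every pair of points, the dendrogram height at
# which the two are first merged; its correlation with d is the CPCC
cpcc <- sapply(c("single", "complete", "average"), function(m) {
  cor(cophenetic(hclust(d, method = m)), d)
})

print(round(cpcc, 3))
```

A higher value means that linkage method distorts the original dissimilarities less, which is one way to choose between otherwise plausible trees.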