Tuesday 14 November 2017

Customer segmentation using cluster analysis: Kmeans


Customer Segmentation using Cluster Analysis: Mean Based Model (k-means)

Introduction

In my previous article I showed how to segment customer data into subgroups, called clusters, using hierarchical clustering, a bottom-up agglomerative approach that works from a similarity or dissimilarity matrix.

In this article I will talk about another type of segmentation, the mean based model, i.e. k-means clustering. K-means partitions the data around means: k random mean points are picked as starting centres, where k is the number of clusters we want, and the algorithm then partitions the data according to k and the location of those mean points, assigning each observation to its nearest mean.
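To make the partitioning concrete, here is a minimal sketch of k-means in plain Python. It is an illustration under assumptions, not the code used for this series: the function names, the fixed seed and the tiny `customers` data set (two made-up columns per customer) are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Partition points into k clusters; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k randomly chosen observations as the initial mean points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the means are stable
        labels = new_labels
        # Update step: move each mean to the centre of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                centroids[j] = mean_point(members)
    return centroids, labels


# Hypothetical customers: (spend score, visit score), two obvious groups.
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
             (8.0, 8.0), (8.5, 7.5), (9.0, 8.2)]
centroids, labels = k_means(customers, k=2)
print(labels)
print(centroids)
```

Because the starting means are random, a different seed can number the clusters differently, and on less tidy data it can also change the partition itself.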

Why different approach for clustering?

Partitioning data can sometimes be tricky. When the bottom-up approach or a model based approach fails to give a satisfactory partitioning, we can opt for a mean based approach.

Why k means?

K-means is a pretty simple technique and involves only two steps: pick a number k, then compute the partition that separates the data based on distance to the means. The same technique is used a lot in the image processing industry. The multi-dimensional image data is fed to the k-means algorithm, which picks, say, the 3 dominant colours (i.e. the number k) and so reduces the noise in the data (noise here means colour disturbance). They run the algorithm with a low k first and then keep increasing k. At low k the image looks something like a cartoon; as k increases, the quality of the image keeps improving up to a certain extent.
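The image example can be sketched in a few lines as well. This is a toy illustration, assuming an "image" that is just a flat list of RGB tuples; the pixel values and the helper names `nearest`, `dominant_colours` and `quantise` are made up for the example and do not come from any imaging library.

```python
import random


def nearest(pixel, palette):
    """Index of the palette colour closest to pixel (squared RGB distance)."""
    return min(
        range(len(palette)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(pixel, palette[j])),
    )


def dominant_colours(pixels, k, iterations=20, seed=0):
    """Run k-means on the pixels and return k mean colours."""
    rng = random.Random(seed)
    # Start from k distinct colours that actually occur in the image.
    palette = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in pixels:
            groups[nearest(p, palette)].append(p)
        for j, group in enumerate(groups):
            if group:  # an empty group keeps its previous colour
                palette[j] = tuple(sum(c) / len(group) for c in zip(*group))
    return palette


def quantise(pixels, palette):
    """Replace every pixel by its nearest palette colour (rounded to ints)."""
    return [tuple(round(c) for c in palette[nearest(p, palette)])
            for p in pixels]


# Hypothetical noisy image: shades of red, green and blue.
pixels = [(250, 10, 5), (245, 0, 12), (255, 8, 0),
          (10, 240, 15), (0, 250, 5), (5, 245, 20),
          (12, 5, 250), (0, 15, 255), (8, 0, 245)]
palette = dominant_colours(pixels, k=3)
cartoon = quantise(pixels, palette)
print(sorted(set(cartoon)))
```

With k = 3 the nine noisy shades collapse to at most three flat colours, which is the cartoon effect described above. Which three colours come out depends on the random starting colours, so an unlucky start can merge two of the groups.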

Similarly, we start with our data, see how it looks, and keep trying different approaches until there is a meaningful output, or choose another model that can produce better clusters.

In this type of clustering we don’t have the CPCC (cophenetic correlation coefficient) to measure the strength of fit, as we did in our previous type of clustering. Instead we need to judge visually, or use a package called NbClust, to determine the number of clusters to use.
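One common way to judge the number of clusters visually is to compute the within-cluster sum of squares for several values of k and look for the point where it stops dropping sharply (the "elbow"). The sketch below is only that simple heuristic, written by hand in Python; it is not NbClust, which is an R package, and does not reproduce its output. The `customers` data and the function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wcss_for_k(points, k, iterations=50, seed=0):
    """Run a basic k-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random observations as start means
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous mean
                centroids[j] = tuple(sum(c) / len(g) for c in zip(*g))
    # Total squared distance from each point to its nearest mean.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Hypothetical customers: (spend score, visit score), two obvious groups.
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
             (8.0, 8.0), (8.5, 7.5), (9.0, 8.2)]
wcss = {k: wcss_for_k(customers, k) for k in range(1, 5)}
for k, value in wcss.items():
    print(k, round(value, 3))
```

On this toy data the value falls steeply from k = 1 to k = 2 and has little room left to fall after that, which suggests two clusters. On real customer data the elbow is often much less clear, which is why the final choice still needs judgement.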