Monday, 6 November 2017

Exploratory Data Analysis on Marketing Research Data


Exploratory Data Analysis (EDA): Marketing Data

Introduction

Today we will carry out an Exploratory Data Analysis using R. The data set we are working with is market research data consisting of 300 observations, which have been classified into segments. We will use this segmented data to explore further. Before we get into the analysis, it is worth asking:

What is Exploratory Data Analysis?

In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model may or may not be used, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.

EDA is an effective first step: it is usually carried out to understand the data before applying machine learning algorithms.

It is often taught to beginner data analysts, as it does not require deep statistical knowledge, and its outputs are simple enough that even a non-statistician can understand them.

The main objective of this analysis is to explore and identify facts and figures that cannot normally be spotted just by looking at the raw data.

Interestingly, no sophisticated tools are needed for this kind of analysis; even Excel can be used for similar tasks.

Before going further, we will look at the structure of the data:

str(mydata)
## 'data.frame':    300 obs. of  7 variables:
##  $ age      : num  47.3 31.4 43.2 37.3 41 ...
##  $ gender   : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 2 2 2 1 1 ...
##  $ income   : num  49483 35546 44169 81042 79353 ...
##  $ kids     : int  2 1 0 1 3 4 3 0 1 0 ...
##  $ ownHome  : Factor w/ 2 levels "ownNo","ownYes": 1 2 2 1 2 2 1 1 1 2 ...
##  $ subscribe: Factor w/ 2 levels "subNo","subYes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Segment  : Factor w/ 4 levels "Moving up","Suburb mix",..: 2 2 2 2 2 2 2 2 2 2 ...
head(mydata);tail(mydata)
##        age gender   income kids ownHome subscribe    Segment
## 1 47.31613   Male 49482.81    2   ownNo     subNo Suburb mix
## 2 31.38684   Male 35546.29    1  ownYes     subNo Suburb mix
## 3 43.20034   Male 44169.19    0  ownYes     subNo Suburb mix
## 4 37.31700 Female 81041.99    1   ownNo     subNo Suburb mix
## 5 40.95439 Female 79353.01    3  ownYes     subNo Suburb mix
## 6 43.03387   Male 58143.36    4  ownYes     subNo Suburb mix
##          age gender   income kids ownHome subscribe   Segment
## 295 36.14964   Male 40522.39    0  ownYes     subNo Moving up
## 296 32.95227 Female 43882.43    0  ownYes     subNo Moving up
## 297 40.96255 Female 64197.09    2   ownNo     subNo Moving up
## 298 38.22980   Male 47580.93    0   ownNo    subYes Moving up
## 299 33.17036   Male 60747.34    1   ownNo     subNo Moving up
## 300 34.38388   Male 53674.93    5  ownYes     subNo Moving up

The data frame consists of 300 observations of 7 variables: one output and six features (6 + 1). Segment is the output, and age, gender, income, kids, ownHome and subscribe are the features. In this analysis we will look at how Segment relates to each of the features, along with a few other comparisons.

Let us start with the first feature, age, and use a histogram to see how the 300 observations are distributed.
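The histogram itself did not survive in this post. With the full data loaded as `mydata`, `hist(mydata$age)` draws it. The small base-R sketch below runs on its own by using only the 12 ages printed by `head()` and `tail()` above, so its shape is illustrative rather than the true 300-row distribution; the name `age_sample` and the 5-year bin width are choices made for this example.

```r
# Ages of the 12 rows printed by head(mydata) and tail(mydata) above.
# With the full data, use mydata$age instead.
age_sample <- c(47.31613, 31.38684, 43.20034, 37.31700, 40.95439, 43.03387,
                36.14964, 32.95227, 40.96255, 38.22980, 33.17036, 34.38388)

# Compute 5-year bins; plot = FALSE returns the counts without drawing
h <- hist(age_sample, breaks = seq(30, 50, by = 5), plot = FALSE)

# Show each bin and how many observations fall into it
data.frame(from = head(h$breaks, -1), to = h$breaks[-1], count = h$counts)

# Draw the histogram
plot(h, main = "Histogram of age", xlab = "Age")
```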


Now let's find the relationship between gender, segment and frequency using the dplyr package:

## Source: local data frame [8 x 4]
## Groups: gender [2]
## 
## # A tibble: 8 x 4
##   gender    Segment count    rfreq
##   <fctr>     <fctr> <int>    <dbl>
## 1 Female  Moving up    49 31.21019
## 2 Female Suburb mix    48 30.57325
## 3 Female  Travelers    40 25.47771
## 4 Female  Urban hip    20 12.73885
## 5   Male  Moving up    21 14.68531
## 6   Male Suburb mix    52 36.36364
## 7   Male  Travelers    40 27.97203
## 8   Male  Urban hip    30 20.97902

This gives the segment distribution within each gender. The sample contains slightly more females (157) than males (143). Moving up is the most common segment among females (31%), while Suburb mix is the most common among males (36%).
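The dplyr code that produced this table isn't shown. As a hedged base-R equivalent, the sketch below rebuilds the counts from the output and computes `rfreq` as the percentage of each segment within each gender. With the raw data, `table(mydata$gender, mydata$Segment)` should give the same counts; the object names here are illustrative.

```r
# Counts of gender by segment, copied from the dplyr output above
counts <- matrix(c(49, 48, 40, 20,
                   21, 52, 40, 30),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender  = c("Female", "Male"),
                                 Segment = c("Moving up", "Suburb mix",
                                             "Travelers", "Urban hip")))

# Relative frequency (%) of each segment within each gender (rows sum to 100)
rfreq <- prop.table(counts, margin = 1) * 100
round(rfreq, 5)

# Group sizes: 157 females and 143 males
rowSums(counts)
```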

Next, let's see which gender subscribes to the product/service:

##         
##          Female Male
##   subNo     136  124
##   subYes     21   19

According to the table, slightly more subscribers are female (21) than male (19), so the male share of subscribers is far from negligible. The subscription rate is also about the same in both groups (roughly 13%), which suggests the product/service is effectively unisex.
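To put numbers on the "unisex" conclusion, the sketch below rebuilds the table from the counts shown and computes two proportions: the share of subscribers that are female or male, and the subscription rate within each gender. It is a base-R sketch; with the raw data, `table(mydata$subscribe, mydata$gender)` would produce the starting table.

```r
# Subscription counts by gender, copied from the table above
sub_tab <- matrix(c(136, 124,
                    21,  19),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(subscribe = c("subNo", "subYes"),
                                  gender    = c("Female", "Male")))

# Share of all subscribers that are female / male (%)
sub_share <- prop.table(sub_tab["subYes", ]) * 100
sub_share

# Subscription rate within each gender (%): column proportions
sub_rate <- prop.table(sub_tab, margin = 2)["subYes", ] * 100
round(sub_rate, 1)
```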

Having looked at the relationship between subscription and gender, we now turn to income, starting with a histogram of income.

The histogram shows no noticeable skew; most incomes fall between 20,000 and 80,000, with very few outliers.

With an understanding of income, we will now look at the relationship between income and gender.

The bar plot shows that female income is on the higher end. Now let's look at the totals separately:

##         Male  Female    Total
## [1,] 6981973 8298988 15280961

We can see that total female income is higher than total male income. As percentages of the overall total:

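# Income totals by gender (the table above); assumed computed like this
m <- sum(mydata$income[mydata$gender == "Male"])
f <- sum(mydata$income[mydata$gender == "Female"])
Total <- m + f
# Express each total as a percentage of overall income
m <- m/Total*100
f <- f/Total*100
s <- cbind(m, f)
colnames(s) <- c("Male", "Female")
s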
##          Male   Female
## [1,] 45.69066 54.30934
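Totals also depend on how many people are in each group, and the sample has 157 females and 143 males (from the gender-by-segment counts). Dividing the totals by these counts gives average income per person. The sketch uses only figures printed earlier in this post; with the raw data, `tapply(mydata$income, mydata$gender, mean)` would compute the averages directly.

```r
# Total income by gender, from the output above
total_income <- c(Male = 6981973, Female = 8298988)

# Number of people of each gender, from the gender-by-segment counts
n_people <- c(Male = 21 + 52 + 40 + 30, Female = 49 + 48 + 40 + 20)

# Average income per person
mean_income <- total_income / n_people
round(mean_income)
```

On this per-person basis females still earn more on average (about 52,860 versus 48,825), so the gap is not only a result of there being more females in the sample.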

Next, we look at the relationship between segment and income:

##      Segment  Income
## 1  Moving up 3716368
## 2 Suburb mix 5503382
## 3  Travelers 4977115
## 4  Urban hip 1084096
## [1] 15280961
##      Segment  Income   Percent
## 1  Moving up 3716368 24.320251
## 2 Suburb mix 5503382 36.014633
## 3  Travelers 4977115 32.570694
## 4  Urban hip 1084096  7.094423

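The income analysis shows that Suburb mix and Travelers together account for the majority of total income (about 68.6%), with Suburb mix the highest and Urban hip the lowest. Because these are totals, they also reflect how many people are in each segment.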
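Segment totals are driven partly by segment size. The row totals of the home-ownership table later in this post (70, 100, 80 and 50, which also match the subscription table) give the number of people per segment, so a rough per-person average can be sketched from the printed figures alone:

```r
# Total income per segment, from the output above
seg_income <- c("Moving up" = 3716368, "Suburb mix" = 5503382,
                "Travelers" = 4977115, "Urban hip" = 1084096)

# Segment sizes: row totals of the home-ownership table
seg_size <- c("Moving up" = 70, "Suburb mix" = 100,
              "Travelers" = 80, "Urban hip" = 50)

# Share of total income (%), as in the Percent column above
round(seg_income / sum(seg_income) * 100, 2)

# Average income per person in each segment, highest first
avg_income <- seg_income / seg_size
round(sort(avg_income, decreasing = TRUE))
```

Per person, Travelers (about 62,214) rather than Suburb mix have the highest average income, while Urban hip remains the lowest.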

Now let's look at how many people in each segment own a home:

##             
##              ownNo ownYes
##   Moving up     47     23
##   Suburb mix    52     48
##   Travelers     20     60
##   Urban hip     40     10

Interestingly, most Travelers own a home (60 of 80), while Urban hip has the lowest home ownership (10 of 50).
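Because the segments differ in size, row percentages make the comparison easier. A base-R sketch using the counts from the table above (with the raw data, `table(mydata$Segment, mydata$ownHome)` would be the starting point):

```r
# Home ownership counts by segment, copied from the table above
own_tab <- matrix(c(47, 23,
                    52, 48,
                    20, 60,
                    40, 10),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(Segment = c("Moving up", "Suburb mix",
                                              "Travelers", "Urban hip"),
                                  ownHome = c("ownNo", "ownYes")))

# Percentage of each segment that owns a home (row proportions)
own_rate <- prop.table(own_tab, margin = 1)[, "ownYes"] * 100
round(own_rate, 1)
```

This gives ownership rates of about 32.9% (Moving up), 48% (Suburb mix), 75% (Travelers) and 20% (Urban hip).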

Next, let's see how many kids each segment has:

##      Segment Kids Percent
## 1  Moving up  134 35.1706
## 2 Suburb mix  192 50.3937
## 3  Travelers    0  0.0000
## 4  Urban hip   55 14.4357

None of the Travelers have kids, and Suburb mix accounts for the largest number of kids (about 50% of all kids in the sample).
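The `Percent` column is each segment's share of all 381 kids. Since segments differ in size, kids per person is another useful view. The sketch below computes both from the printed totals, again taking segment sizes (70, 100, 80, 50) from the ownership and subscription tables.

```r
# Total kids per segment, from the output above
kids <- c("Moving up" = 134, "Suburb mix" = 192,
          "Travelers" = 0,   "Urban hip" = 55)

# Segment sizes: row totals of the home-ownership table
seg_size <- c("Moving up" = 70, "Suburb mix" = 100,
              "Travelers" = 80, "Urban hip" = 50)

# Share of all kids (%), matching the Percent column above
round(kids / sum(kids) * 100, 4)

# Average number of kids per person in each segment
kids_per_person <- kids / seg_size
round(kids_per_person, 2)
```

Per person, Moving up (about 1.91) and Suburb mix (1.92) are almost identical, so Suburb mix's larger share of kids comes mainly from its larger size.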

Now we will look at the market captured in each segment, based on subscribers.

We can also see the same data in tabular format:

##         
##          Moving up Suburb mix Travelers Urban hip
##   subNo         56         94        70        40
##   subYes        14          6        10        10
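One way to read this table is as a subscription rate per segment, taken here as subscribers divided by everyone in that segment (this definition is an assumption of the sketch, not something stated in the post). A base-R sketch from the printed counts:

```r
# Subscription counts by segment, copied from the table above
seg_sub <- matrix(c(56, 94, 70, 40,
                    14,  6, 10, 10),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(subscribe = c("subNo", "subYes"),
                                  Segment   = c("Moving up", "Suburb mix",
                                                "Travelers", "Urban hip")))

# Subscription rate (%) within each segment: column proportions
pen <- prop.table(seg_sub, margin = 2)["subYes", ] * 100
sort(pen, decreasing = TRUE)
```

Moving up and Urban hip both reach 20%, Travelers 12.5% and Suburb mix only 6%.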

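We have tapped into the Moving up, Urban hip and Travelers segments: Moving up and Urban hip each have a 20% subscription rate and Travelers 12.5%, while Suburb mix lags at 6%. With this understanding of the data we can carry out proper STP (Segmentation, Targeting and Positioning) for the product. Two of the segments show potential, and we can go ahead and target them.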