Thursday 9 November 2017

Factor Analysis, a Dimension Reduction Technique using R: on Marketing Survey Data


Factor Analysis on Brand Ratings

Introduction

In the previous article I showed you how to do dimension reduction with PCA. In this one we will do EFA.

What is EFA?

In simple words, EFA does the same thing PCA does, i.e. dimension reduction, but the technique assesses the relationships among higher-level dimensions. Here we analyse variables that are unseen in the data, such as satisfaction, which shows up through observable behaviour like customers revisiting the store, repurchasing, or price sensitivity. That is why they are called latent variables; the observable measures, such as the customer revisits, are called manifest variables.

Please note: the two may look similar when you see the output, but they are actually different techniques.

**Before applying it, let me explain the differences between PCA and EFA:**

1) Where PCA looks at the variance in the data, EFA looks for the correlations in the data.

2) PCA involves linear composites of the observed variables, while EFA is based on theoretical latent factors.

3) EFA does not analyse the variable-specific (intrinsic) variance, but PCA does.
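
With the differences covered, let us load the brand rating survey data and inspect its structure. A minimal sketch; the file name "brand_ratings.csv" and the object name brand.ratings are placeholders of mine for wherever your copy of the data lives:

```r
# Load the survey data (hypothetical file name) and inspect its structure.
brand.ratings <- read.csv("brand_ratings.csv", stringsAsFactors = TRUE)
str(brand.ratings)
```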

## 'data.frame':    1000 obs. of  10 variables:
##  $ perform: int  2 1 2 1 1 2 1 2 2 3 ...
##  $ leader : int  4 1 3 6 1 8 1 1 1 1 ...
##  $ latest : int  8 4 5 10 5 9 5 7 8 9 ...
##  $ fun    : int  8 7 9 8 8 5 7 5 10 8 ...
##  $ serious: int  2 1 2 3 1 3 1 2 1 1 ...
##  $ bargain: int  9 1 9 4 9 8 5 8 7 3 ...
##  $ value  : int  7 1 5 5 9 7 1 7 7 3 ...
##  $ trendy : int  4 2 1 2 1 1 1 7 5 4 ...
##  $ rebuy  : int  6 2 6 1 1 2 1 1 1 1 ...
##  $ brand  : Factor w/ 10 levels "a","b","c","d",..: 1 1 1 1 1 1 1 1 1 1 ...

Now we will use “nScree”, which gives us the number of components to retain using the Kaiser rule and parallel analysis. **For additional information on the Kaiser rule and parallel analysis, visit the links below:**

https://stats.stackexchange.com/questions/253535/the-advantages-and-disadvantages-of-using-kaiser-rule-to-select-the-number-of-pr

https://in.mathworks.com/matlabcentral/fileexchange/44996-parallel-analysis--pa--to-for-determining-the-number-of-components-to-retain-from-pca?requestedDomain=www.mathworks.com
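
Here is a sketch of the call. nScree() comes from the nFactors package, and the nine rating columns are standardised first; the object name brand.scale matches the Call lines in the factanal output further down:

```r
# Standardise the nine numeric rating columns, then apply the usual
# stopping rules (Kaiser, parallel analysis, optimal coordinates,
# acceleration factor) via the nFactors package.
library(nFactors)
brand.scale <- data.frame(scale(brand.ratings[, 1:9]))
nScree(brand.scale)
```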

##   noc naf nparallel nkaiser
## 1   3   2         3       3

After applying nScree you can see we have received four different estimates: noc, naf, nparallel and nkaiser. Here noc is the number of components suggested by the optimal coordinates criterion, naf the number suggested by the acceleration factor, nkaiser the number to retain as per the Kaiser rule, and nparallel the number to retain as per parallel analysis.

From these criteria we can say that only 2-3 factors are appropriate.

Now we will look at the eigenvalues.
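
The decomposition below is produced by eigen() on the correlation matrix of the scaled ratings:

```r
# Eigen decomposition of the 9 x 9 correlation matrix.
eigen(cor(brand.scale))
```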

## eigen() decomposition
## $values
## [1] 2.9792956 2.0965517 1.0792549 0.7272110 0.6375459 0.5348432 0.3901044
## [8] 0.3120464 0.2431469
## 
## $vectors
##             [,1]        [,2]        [,3]        [,4]        [,5]
##  [1,] -0.2374679 -0.41991179  0.03854006  0.52630873  0.46793435
##  [2,] -0.2058257 -0.52381901 -0.09512739  0.08923461 -0.29452974
##  [3,]  0.3703806 -0.20145317 -0.53273054 -0.21410754  0.10586676
##  [4,]  0.2510601  0.25037973 -0.41781346  0.75063952 -0.33149429
##  [5,] -0.1597402 -0.51047254 -0.04067075 -0.09893394 -0.55515540
##  [6,] -0.3991731  0.21849698 -0.48989756 -0.16734345 -0.01257429
##  [7,] -0.4474562  0.18980822 -0.36924507 -0.15118500 -0.06327757
##  [8,]  0.3510292 -0.31849032 -0.37090530 -0.16764432  0.36649697
##  [9,] -0.4390184 -0.01509832 -0.12461593  0.13031231  0.35568769
##             [,6]         [,7]        [,8]        [,9]
##  [1,]  0.3370676  0.364179109 -0.14444718 -0.05223384
##  [2,]  0.2968860 -0.613674301  0.28766118  0.17889453
##  [3,]  0.1742059 -0.185480310 -0.64290436 -0.05750244
##  [4,] -0.1405367 -0.007114761  0.07461259 -0.03153306
##  [5,] -0.3924874  0.445302862 -0.18354764 -0.09072231
##  [6,]  0.1393966  0.288264900  0.05789194  0.64720849
##  [7,]  0.2195327  0.017163011  0.14829295 -0.72806108
##  [8,] -0.2658186  0.153572108  0.61450289 -0.05907022
##  [9,] -0.6751400 -0.388656160 -0.20210688  0.01720236

The eigenvalues tell the same story as nScree. Look at $values: the first three are high, and after that there is a clear fall in the proportion of variance explained.

If you remember, I spoke about the kink in the PCA scree curve. This is almost the same idea: we are going to identify the factors that contribute the most in the data set.

Where PCA looks into each and every dimension, factor analysis looks at the few that actually drive the model.

The two criteria give different readings, 2 and 3, so we will try both.
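
The two-factor fit uses factanal() from base R's stats package, as the Call line in the output below confirms (the object name brand.fa2 is my own):

```r
# Fit and print a two-factor maximum-likelihood solution.
brand.fa2 <- factanal(brand.scale, factors = 2)
brand.fa2
```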

## 
## Call:
## factanal(x = brand.scale, factors = 2)
## 
## Uniquenesses:
## perform  leader  latest     fun serious bargain   value  trendy   rebuy 
##   0.635   0.332   0.796   0.835   0.527   0.354   0.225   0.708   0.585 
## 
## Loadings:
##         Factor1 Factor2
## perform          0.600 
## leader           0.818 
## latest  -0.451         
## fun     -0.137  -0.382 
## serious          0.686 
## bargain  0.803         
## value    0.873   0.117 
## trendy  -0.534         
## rebuy    0.569   0.303 
## 
##                Factor1 Factor2
## SS loadings      2.245   1.759
## Proportion Var   0.249   0.195
## Cumulative Var   0.249   0.445
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 556.19 on 19 degrees of freedom.
## The p-value is 8.66e-106

The uniquenesses in the table tell us how much of each variable's variance is left unexplained by the factors; a high uniqueness means the variable stands apart from the common factors.
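
To connect this to the loadings: uniqueness is one minus the communality, i.e. one minus the row-wise sum of squared loadings. A quick check, reusing the brand.fa2 object fitted above:

```r
# Both lines reproduce the printed uniquenesses.
round(brand.fa2$uniquenesses, 3)
round(1 - rowSums(brand.fa2$loadings^2), 3)
```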

Looking at the loadings of the two-factor solution, a few entries are blank. These loadings are not eliminated by the model; they are simply small (below the default print cutoff of 0.1) and are suppressed when the table is printed.
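
The three-factor fit is the same call with factors = 3, again confirmed by the Call line in the output below:

```r
# Fit and print a three-factor solution on the same scaled data.
brand.fa3 <- factanal(brand.scale, factors = 3)
brand.fa3
```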

## 
## Call:
## factanal(x = brand.scale, factors = 3)
## 
## Uniquenesses:
## perform  leader  latest     fun serious bargain   value  trendy   rebuy 
##   0.624   0.327   0.005   0.794   0.530   0.302   0.202   0.524   0.575 
## 
## Loadings:
##         Factor1 Factor2 Factor3
## perform          0.607         
## leader           0.810   0.106 
## latest  -0.163           0.981 
## fun             -0.398   0.205 
## serious          0.682         
## bargain  0.826          -0.122 
## value    0.867          -0.198 
## trendy  -0.356           0.586 
## rebuy    0.499   0.296  -0.298 
## 
##                Factor1 Factor2 Factor3
## SS loadings      1.853   1.752   1.510
## Proportion Var   0.206   0.195   0.168
## Cumulative Var   0.206   0.401   0.568
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 64.57 on 12 degrees of freedom.
## The p-value is 3.28e-09

If I keep adding factors, the model gets diluted and becomes difficult to interpret, so we look for the smallest set of factors that captures the maximum information.

Now we will look at rotation. Rotation is the process of looking at the data from a different angle. Why do we rotate? To reveal structure that our first view of the loadings may have missed.

In the PCA article I gave the example of taking a photo; this is the same kind of process, where you choose to see the data from a different view.

Please note: an orthogonal rotation does not change the total proportion of variance explained. If it does change, that is a hint that something is wrong.
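
As a quick sanity check (my own addition, not in the original post), the total proportion of variance explained is identical with and without the default orthogonal varimax rotation:

```r
# The total sum of squared loadings, and hence the cumulative variance
# explained, is invariant under orthogonal rotation.
f.none <- factanal(brand.scale, factors = 3, rotation = "none")
f.vmax <- factanal(brand.scale, factors = 3, rotation = "varimax")
sum(f.none$loadings^2) / ncol(brand.scale)  # about 0.568
sum(f.vmax$loadings^2) / ncol(brand.scale)  # same value
```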

Now we will compute an oblique rotation.

What is Oblique Rotation?

In an oblique rotation the factor axes are allowed to move independently, for example the first factor axis anticlockwise and the second clockwise, so the rotated factors can become correlated.

The other family is orthogonal rotation, where all the factor axes move together while staying at right angles, like a rigid wheel.

The default rotation in factanal is varimax, which is an orthogonal rotation.
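
The oblimin rotation is not built into stats::factanal; factanal looks the rotation function up by name, so the GPArotation package (which provides oblimin) must be installed and loaded first:

```r
# Oblique (oblimin) rotation; requires the GPArotation package.
library(GPArotation)
brand.fa3.ob <- factanal(brand.scale, factors = 3, rotation = "oblimin")
brand.fa3.ob
```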

## 
## Call:
## factanal(x = brand.scale, factors = 3, rotation = "oblimin")
## 
## Uniquenesses:
## perform  leader  latest     fun serious bargain   value  trendy   rebuy 
##   0.624   0.327   0.005   0.794   0.530   0.302   0.202   0.524   0.575 
## 
## Loadings:
##         Factor1 Factor2 Factor3
## perform          0.601         
## leader           0.816         
## latest                   1.009 
## fun             -0.381   0.229 
## serious          0.689         
## bargain  0.859                 
## value    0.880                 
## trendy  -0.267   0.128   0.538 
## rebuy    0.448   0.255  -0.226 
## 
##                Factor1 Factor2 Factor3
## SS loadings      1.789   1.733   1.430
## Proportion Var   0.199   0.193   0.159
## Cumulative Var   0.199   0.391   0.550
## 
## Factor Correlations:
##         Factor1 Factor2 Factor3
## Factor1  1.0000  -0.388  0.0368
## Factor2 -0.3884   1.000 -0.1091
## Factor3  0.0368  -0.109  1.0000
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 64.57 on 12 degrees of freedom.
## The p-value is 3.28e-09

From this analysis I was able to find insights into the data. The factor correlations show that the factors are related (most notably Factor1 and Factor2, at about -0.39), something that was not visible with the orthogonal method.

It is my personal suggestion to try all the rotation techniques, because each helps in understanding the data and its insights.

Now let me apply data visualisation to the model, where I will show you all the paths of a standard model.
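
The original post does not show the plotting code, so here is a hedged sketch using the semPlot package, which can draw a path diagram directly from a factanal fit; the package choice and parameter values are my assumptions:

```r
# Path diagram of the oblique three-factor model: latent factors on top,
# manifest rating items below, edges weighted by the estimated loadings.
library(semPlot)
semPaths(brand.fa3.ob, what = "est", residuals = FALSE, cut = 0.3,
         posCol = c("white", "darkgreen"), negCol = c("white", "red"),
         edge.label.cex = 0.75)
```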

Now let me show you what difference we have made with the help of factor analysis.

Now you can see how we have reduced the noise using factor analysis.
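
One way to see the noise reduction, assuming this is the kind of comparison the original figures made, is to put a heatmap of the 9 x 9 raw item correlations next to a heatmap of the compact 9 x 3 loadings matrix (gplots is my package choice here):

```r
# Compare the raw item correlation structure with the much sparser
# factor loading matrix produced by the three-factor solution.
library(gplots)
heatmap.2(cor(brand.scale), trace = "none", main = "Item correlations")
heatmap.2(unclass(brand.fa3.ob$loadings), trace = "none", Rowv = FALSE,
          Colv = FALSE, dendrogram = "none", main = "Factor loadings")
```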
