Marketing Analysis: Overview on Marketing Mix Modeling, Sales Driver Analysis, Marketing ROI and Marketing Optimization
Sangamesh K S
December 17, 2017
In this article I will be covering Marketing Mix Modeling, Sales Driver Analysis, Marketing Optimization and Marketing ROI. It is essential for an organization to know its combination factors for sales. Now here we will apply linear regression and find which advertisement mode is significant and what is the combination we have to use for the best advertisement campaign and it’s ROI.
Normally in organizations SPSS and SAS is preferred to analyze the same and we can also same on R. Here we will explore the limits of R for this analysis.
Marketing Mix Modeling, Sales Driver Analysis and Marketing Optimization eventually use Regression for analysis Now let’s load the data and have a look of the data
mydata<-read.xlsx("C:/Users/Sangmesh/Google Drive/Big Data using R/Kaggle/MMM.xlsx",sheetName="Sheet1")
head(mydata)
## Date Sales Email Facebook Road.Show
## 1 2017-01-01 10 5 100 1
## 2 2017-01-02 20 5 200 2
## 3 2017-01-03 40 10 400 4
## 4 2017-01-04 10 20 100 1
## 5 2017-01-05 50 10 500 10
## 6 2017-01-06 60 5 600 20
After seeing the sample of the data we can see the data is of income and expenditure data. Where in Sales is the income generated and the advertisement and promotional activities are the Marketing Expenditure wherein Email and Facebook are the Advertisement and Roadshow is the promotional activity
Now we will look at the discribe function in psych package and look for outliers and skweness
describe(mydata[-1])
## vars n mean sd median trimmed mad min max range skew
## Sales 1 31 42.29 22.61 40 41.24 29.65 10 90 80 0.38
## Email 2 31 6.35 6.45 5 5.00 4.45 0 30 30 2.08
## Facebook 3 31 216.45 229.01 100 172.40 103.78 30 900 870 1.47
## Road.Show 4 31 11.74 13.97 4 9.24 4.45 1 50 49 1.39
## kurtosis se
## Sales -0.90 4.06
## Email 4.25 1.16
## Facebook 1.43 41.13
## Road.Show 0.65 2.51
Here I had removed date as it does not have any significance with the data
By looking at the data we can see that there are outliers and as per the 5 assumptions of the linear regression. The data need to follow empirical rule and need to be in a standard distribution format.
#How to find the data is not following normal distribution?
The describe function give summary output I look for 4 columns i.e. min, max, skew and Kurtosis. As per the rule of thumb the mean and median must not be far apart in the normal distribution or there must be no too much of difference between the mean and median. The skew columns give us the indication of skewness in the data and Kurtosis show the presence of outlier in the data.
Now lets plot the graph of the
plot(mydata$Sales,type = "l",col="red",ylim = c(10,1000),ylab = "Sales",xlab = "Days")
lines(mydata$Email,col="green")
lines(mydata$Road.Show,col="yellow")
lines(mydata$Facebook,col="blue")
Where Red is Sales, green is Email spending, Yellow is expenditure on Roadshow.
Deriving Marketing ROI
Before we treat the data we will have a look at the ROI each which is also known as MROI for which we require the total of sales, advertisement and promotional activities
sales<-sum(mydata$Sales)
ad.email<-sum(mydata$Email)
ad.fb<-sum(mydata$Facebook)
ad.road<-sum(mydata$Road.Show)
ROI of Email Campaign
((sales-ad.email)/ad.email)*100
## [1] 565.4822
We had got a return of 565% on the Marketing Expenditure
Now we will look at the sales income
print(sales)
## [1] 1311
Now we will look at the Marketing Expenditure. which is good
print(ad.email)
## [1] 197
We had invested only 197 and received a profit of 1311 i.e. 565%
ROI of Facebook Campaign
((sales-ad.fb)/ad.fb)*100
## [1] -80.462
We had -80% ROI on Facebook Advertisement. which is bad and expensive
ROI of Roadshow
((sales-ad.road)/ad.road)*100
## [1] 260.1648
Roadshow is giving 260% Profit.
We actually can’t conclude one campaign to be best based on the ROI. ROI can only give us an overview which is expensive and which is cheap.
Thus we use MMM(Marketing Mix Modeling) to find the effectiveness and influence of the campaigns using Linear Regression Model.For which we have to transform the data back to normal and do scaling for Standardization.
mydata$log_Email<-log(mydata$Email)
mydata$log_Facebook<-log(mydata$Email)
mydata$log_Road.Show<-log(mydata$Road.Show)
mydata$log_Email<-as.numeric(noquote(gsub(-Inf,0,mydata$log_Email)))
mydata$log_Facebook<-as.numeric(noquote(gsub(-Inf,0,mydata$log_Facebook)))
sc_mydata<-scale(mydata[-1])
sc_mydata<-as.data.frame(sc_mydata)
head(sc_mydata)
## Sales Email Facebook Road.Show log_Email log_Facebook
## 1 -1.4284217 -0.2101349 -0.50850945 -0.7687248 0.1154308 0.1154308
## 2 -0.9860534 -0.2101349 -0.07183929 -0.6971618 0.1154308 0.1154308
## 3 -0.1013166 0.5653629 0.80150104 -0.5540359 0.9621709 0.9621709
## 4 -1.4284217 2.1163584 -0.50850945 -0.7687248 1.8089111 1.8089111
## 5 0.3410517 0.5653629 1.23817121 -0.1246581 0.9621709 0.9621709
## 6 0.7834201 -0.2101349 1.67484138 0.5909716 0.1154308 0.1154308
## log_Road.Show
## 1 -1.5088285
## 2 -0.9266223
## 3 -0.3444162
## 4 -1.5088285
## 5 0.4252185
## 6 1.0074246
Now we will apply Linear Regression over it. I want to select only 2 feature and build a model
Here I want to take Facebook and Roadshow
lm_fit<-lm(Sales~log_Facebook+log_Road.Show,data = sc_mydata)
summary(lm_fit)
##
## Call:
## lm(formula = Sales ~ log_Facebook + log_Road.Show, data = sc_mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.21321 -0.08095 -0.06862 0.08039 0.35303
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.178e-16 2.805e-02 0.000 1.000
## log_Facebook 1.078e-03 3.034e-02 0.036 0.972
## log_Road.Show 9.889e-01 3.034e-02 32.595 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1562 on 28 degrees of freedom
## Multiple R-squared: 0.9772, Adjusted R-squared: 0.9756
## F-statistic: 600.9 on 2 and 28 DF, p-value: < 2.2e-16
As per the model, Roadshow has a better turnover than Facebook denoted by “***" in coefficients
call: give the dependent and independent variables (x and y) where our sales is dependent and Log Facebook and Log Raod.show is the independent
Estimate: column explain the slop of each variable. we will discuss this in SDA and Marketing Optimization. in simple words we can say estimates are change respect to per unit change in the sales.
Std Error: Gives the expected difference of the model or explain the variation of the model.
t value: It give me how std is far away from the mean O in the normal distribution table. We want it to be far away from zero as this would indicate we could reject the null hypothesis - that is, we could declare a relationship between sales and Roadshow exists.
Pr(>|t|): Typically, a p-value of 5% or less is a good cut-off point. In our model example, the p-values are very close to zero. Note the ‘signif. Codes’ associated to each estimate. Three stars (or asterisks) represent a highly significant p-value. Consequently, a small p-value for the intercept and the slope indicates that we can reject the null hypothesis which allows us to conclude that there is a relationship between sales and Roadshow.
Residual standard error: Residual Standard Error is measure of the quality of a linear regression fit. Theoretically, every linear model is assumed to contain an error term E. Due to the presence of this error term, we are not capable of perfectly predicting our response variable (sales) from the predictor (Facebook and Roadshow).
The R-squared (R2) statistic provides a measure of how well the model is fitting the actual data. It takes the form of a proportion of variance. R2 is a measure of the linear relationship between our predictor variable (Facebook and roadshow) and our response / target variable (sales). It always lies between 0 and 1 (i.e.: a number near 0 represents a regression that does not explain the variance in the response variable well and a number close to 1 does explain the observed variance in the response variable)
F- stat: F-statistic is a good indicator of whether there is a relationship between our predictor and the response variables. The further the F-statistic is from 1 the better it is. However, how much larger the F-statistic needs to be depends on both the number of data points and the number of predictors. Generally, when the number of data points is large, an F-statistic that is only a little bit larger than 1 is already sufficient to reject the null hypothesis (H0 : There is no relationship between speed and distance). The reverse is true as if the number of data points is small, a large F-statistic is required to be able to ascertain that there may be a relationship between predictor and response variables.
Sales Driver Analysis
By using Linear Model we can do SDA wherein we will look at the significance and Hypothesis testing of the linear model and try to understand which is significant and which is not. Here by the model we can see the Roadshow is showing good significance and Facebook is not showing any significance.
Marketing Mix Optimization
By the above Linear Regression we can derive a linear formula which will act as a proportion formula for the investment distribution for marketing.
Sales=3.17+ 1.07(Facebook)+9.889(Roadshow)+Error
Any further distribution/ allocation of money need to be done based on the formula
I had only attempted to show how linear regression models are been applied on day to day activities and it is just a overview of the whole and can be explored further into various areas and can try with various predectors.