Saturday 20 January 2018

Time Series Analysis using ARIMA Model(PART-2)



Introduction

In this article I will cover the following areas:

  1. Based on the order counts received, which is the best day of the week?
  2. Based on the order counts received, which is the worst day of the week?
  3. Using historical data, predict the order counts for each day of the upcoming week.

I applied the Holt-Winters and exponential smoothing techniques to the same dataset in my previous article. Please visit: https://experimentswithdatascience.blogspot.in/2017/12/demand-forecastingtime-series-smoothing.html

Now we will load the data and packages

data2<-read.csv("C:/Users/Sangmesh/Downloads/DA-Assignment/DA-Assignment/question2_data.csv",header = TRUE)
library(lubridate)
library(tseries)
library(forecast)
library(dplyr)

Now we will answer the 1st and 2nd questions: finding the best and worst days of the week.

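# Parse the date column (stored as month/day/year)
data2$date<-as.Date(data2$date,"%m/%d/%Y")
# Weekday label for each date; the column is already a Date, so no second format is needed
data.days<-wday(data2$date, label=TRUE)
data5<-cbind(data2,data.days)
# Total orders per weekday
sub_group<-group_by(data5,data.days)
summarise(sub_group,total=sum(orders_count))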
## # A tibble: 7 x 2
##   data.days total
##       <ord> <int>
## 1       Sun    80
## 2       Mon    60
## 3       Tue    50
## 4       Wed    20
## 5       Thu    40
## 6       Fri    75
## 7       Sat   100
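# Random 21/7 split of the 28 rows (the models below are fit on the full series, not on this split)
train<-sample(1:28,21,replace = FALSE)
training<-data5[train,]
testing<-data5[-train,]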

From the totals we can see that Saturday (100 orders), followed by Sunday (80), has the highest order count, so Saturday is the best day of the week. Wednesday (20), followed by Thursday (40), has the lowest order count, so Wednesday is the worst day of the week.
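
The same ranking can be read off programmatically. This is a minimal sketch in base R that re-enters the weekday totals from the summarise() output above; the names totals, ranked, best_day and worst_day are my own and are not part of the original script.

```r
# Weekday totals copied from the summarise() output above
totals <- data.frame(
  day   = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
  total = c(80, 60, 50, 20, 40, 75, 100),
  stringsAsFactors = FALSE
)

# Sort from highest to lowest total
ranked <- totals[order(-totals$total), ]

best_day  <- ranked$day[1]             # day with the highest total
worst_day <- ranked$day[nrow(ranked)]  # day with the lowest total

print(ranked)
```

With the real data, the same ordering could be applied directly to the tibble returned by summarise().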

Now we will start our time series analysis. First we will check whether the data is stationary using the Augmented Dickey-Fuller test.

adf.test(data5$orders_count,alternative = "stationary")
## Warning in adf.test(data5$orders_count, alternative = "stationary"): p-
## value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  data5$orders_count
## Dickey-Fuller = -5.1701, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

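The p-value is at most 0.01 (the warning says the true value is smaller than the printed one), so we reject the null hypothesis of a unit root and accept the alternative hypothesis that the data is stationary.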

acf(data5$orders_count,main="ACF Plot")

Looking at this plot, the series appears to follow an AR(1) process. We cannot conclude that until we also look at the PACF, which shows the conditional (partial) correlation at each lag.

pacf(data5$orders_count, main="PACF Plot")

Only the 1st and 2nd lags are significant; the rest are not. The PACF also helps in choosing the order of the model, but instead we will use the auto.arima function to do the ARIMA modeling.
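
To show how a PACF points to an AR order, here is a small sketch on simulated data rather than on the order counts. The AR coefficient 0.7, the series length and the seed are arbitrary assumptions chosen for illustration; the bound 1.96/sqrt(n) is the usual approximate 95% significance limit.

```r
set.seed(42)                    # arbitrary seed, only for reproducibility
n <- 500
x <- arima.sim(model = list(ar = 0.7), n = n)  # simulated AR(1) series

# Compute the correlations without drawing the plots
acf_vals  <- acf(x,  lag.max = 10, plot = FALSE)
pacf_vals <- pacf(x, lag.max = 10, plot = FALSE)

bound <- 1.96 / sqrt(n)         # approximate 95% significance bound

# Lags whose partial autocorrelation lies outside the bound
significant_pacf_lags <- which(abs(as.vector(pacf_vals$acf)) > bound)
print(significant_pacf_lags)
```

For an AR(1) process the ACF decays gradually while the PACF is expected to be large at lag 1 only; a few other lags can still cross the bound by chance.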

arima1<-auto.arima(data5$orders_count,trace = TRUE,test = "kpss",ic="bic",seasonal=TRUE)
## 
##  ARIMA(2,1,2) with drift         : Inf
##  ARIMA(0,1,0) with drift         : 188.3335
##  ARIMA(1,1,0) with drift         : 189.4493
##  ARIMA(0,1,1) with drift         : 189.0873
##  ARIMA(0,1,0)                    : 185.0385
##  ARIMA(1,1,1) with drift         : 192.354
## 
##  Best model: ARIMA(0,1,0)
summary(arima1)
## Series: data5$orders_count 
## ARIMA(0,1,0)                    
## 
## sigma^2 estimated as 49.07:  log likelihood=-90.87
## AIC=183.74   AICc=183.9   BIC=185.04
## 
## Training set error measures:
##                      ME     RMSE      MAE     MPE     MAPE      MASE
## Training set 0.03592857 6.879057 6.178786 -22.328 60.42257 0.9643192
##                   ACF1
## Training set 0.2676281
Box.test(arima1$residuals^2,type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  arima1$residuals^2
## X-squared = 1.6221, df = 1, p-value = 0.2028
jarque.bera.test(arima1$residuals)
## 
##  Jarque Bera Test
## 
## data:  arima1$residuals
## X-squared = 1.7425, df = 2, p-value = 0.4184
arima_forecast1<-forecast(arima1,h=8)
plot(arima_forecast1)

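The selected model, ARIMA(0,1,0), has no drift term. I also ran the Ljung-Box test on the squared residuals: with a p-value of 0.2028 we cannot reject the null hypothesis of no autocorrelation, so there is no evidence of an ARCH effect in the data.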
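
To make the reading of the Ljung-Box test on squared values concrete, the sketch below applies it to two simulated series: plain white noise and an ARCH(1) process. Both series, the parameters alpha0 and alpha1, and the seed are assumptions for illustration only and have nothing to do with the order data.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 2000

# Series 1: white noise, constant variance
white <- rnorm(n)

# Series 2: ARCH(1), where today's variance depends on yesterday's squared value
alpha0 <- 0.5
alpha1 <- 0.5
arch <- numeric(n)
arch[1] <- rnorm(1, sd = sqrt(alpha0 / (1 - alpha1)))
for (t in 2:n) {
  sigma2  <- alpha0 + alpha1 * arch[t - 1]^2
  arch[t] <- rnorm(1, sd = sqrt(sigma2))
}

# Ljung-Box test on the squared series: a small p-value suggests ARCH effects
p_white <- Box.test(white^2, type = "Ljung-Box")$p.value
p_arch  <- Box.test(arch^2,  type = "Ljung-Box")$p.value
print(c(white_noise = p_white, arch1 = p_arch))
```

A small p-value (below 0.05, say) points to autocorrelation in the squared values, which is the signature of an ARCH effect; a large p-value gives no such evidence.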

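ARIMA(0,1,0) basically says that the moving-average and autoregressive terms are not significant. The model is a random walk, so the forecast for every future day is simply the last observed value.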
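
An ARIMA(0,1,0) model without drift is a random walk, and its point forecasts all equal the last observation. The sketch below shows this with stats::arima on a short made-up series; the orders vector is hypothetical and is not the data used in this article.

```r
# Hypothetical daily order counts, for illustration only
orders <- c(12, 15, 9, 20, 18, 25, 22, 14)

# Random walk without drift: one difference, no AR or MA terms
fit <- arima(orders, order = c(0, 1, 0))

# Forecast the next 7 days
fc   <- predict(fit, n.ahead = 7)
pred <- as.numeric(fc$pred)   # point forecasts
se   <- as.numeric(fc$se)     # standard errors, which widen with the horizon

print(pred)
print(tail(orders, 1))
```

Only the uncertainty changes with the horizon; the point forecast stays flat, which is why a random walk says little about day-of-week differences.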

One reason the auto.arima model selects a random walk might be that we are minimising BIC; we could repeat the search using AIC instead. Another approach is to use the sarima function, fit several candidate models by hand, and re-predict the values.
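
A hand-rolled comparison of candidate models under both criteria could look like the sketch below. It uses stats::arima instead of auto.arima or sarima, and a simulated random walk instead of the order data; the candidate orders, the seed and the variable names are my own assumptions.

```r
set.seed(7)                       # arbitrary seed, only for reproducibility
y <- cumsum(rnorm(200))           # simulated random walk, illustration only

# Candidate (p, d, q) orders to compare
candidates <- list(c(0, 1, 0), c(1, 1, 0), c(0, 1, 1))

# Fit each candidate and collect its AIC and BIC
comparison <- do.call(rbind, lapply(candidates, function(ord) {
  fit <- arima(y, order = ord, method = "ML")
  data.frame(
    model = sprintf("ARIMA(%d,%d,%d)", ord[1], ord[2], ord[3]),
    AIC   = AIC(fit),
    BIC   = BIC(fit),
    stringsAsFactors = FALSE
  )
}))

best_by_aic <- comparison$model[which.min(comparison$AIC)]
best_by_bic <- comparison$model[which.min(comparison$BIC)]
print(comparison)
```

BIC penalises each extra parameter more heavily than AIC once the series has more than a handful of observations, so it tends to pick the simpler model. In auto.arima the same switch is made through the ic argument already used above.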

Meanwhile, the Jarque-Bera test (p-value = 0.4184) does not reject the null hypothesis that the residuals are normally distributed.
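
For readers who want to see what the Jarque-Bera statistic measures, here is a sketch of the standard formula in base R. The function name jarque_bera and the toy input vector are assumptions for illustration; in practice jarque.bera.test from tseries, as used above, does this for you.

```r
# Jarque-Bera statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4),
# where S is the sample skewness and K the sample kurtosis.
jarque_bera <- function(x) {
  n    <- length(x)
  d    <- x - mean(x)
  m2   <- mean(d^2)                 # second central moment
  skew <- mean(d^3) / m2^1.5        # sample skewness
  kurt <- mean(d^4) / m2^2          # sample kurtosis
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  # Under normality the statistic is approximately chi-squared with 2 df
  p <- pchisq(stat, df = 2, lower.tail = FALSE)
  c(statistic = stat, p.value = p)
}

# Toy residual vector, hypothetical values only
res <- jarque_bera(c(-2, -1, 0, 1, 2))
print(res)
```

A large p-value means the skewness and kurtosis of the residuals are compatible with a normal distribution; it says nothing about whether the model's coefficients are significant.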
