Time Series Analysis using ARIMA Model(PART-2)
Sangamesh K S
January 21, 2018
Introduction
In this article, the areas I will be covering are as follows:
- Based on the order counts received, which is the best day of the week.
- Based on the order counts received, which is the worst day of the week.
- Using historical data, predict the order counts for each day in the upcoming week.
On the same dataset I applied the Holt-Winters and exponential smoothing techniques in my previous article. Please visit: https://experimentswithdatascience.blogspot.in/2017/12/demand-forecastingtime-series-smoothing.html
Now we will load the data and the required packages.
# Load the daily orders data
data2 <- read.csv("C:/Users/Sangmesh/Downloads/DA-Assignment/DA-Assignment/question2_data.csv", header = TRUE)
# Packages for date handling, stationarity tests, forecasting and data manipulation
library(lubridate)
library(tseries)
library(forecast)
library(dplyr)
Now we will answer the first two questions: finding the best and the worst day of the week.
# Convert the date column from character to Date
data2$date <- as.Date(data2$date, "%m/%d/%Y")
# Day-of-week label (Sun-Sat) for each date
data.days <- wday(data2$date, label = TRUE)
# Attach the day labels, then total the orders for each day of the week
data5 <- cbind(data2, data.days)
sub_group <- group_by(data5, data.days)
summarise(sub_group, total = sum(orders_count))
## # A tibble: 7 x 2
## data.days total
## <ord> <int>
## 1 Sun 80
## 2 Mon 60
## 3 Tue 50
## 4 Wed 20
## 5 Thu 40
## 6 Fri 75
## 7 Sat 100
# Randomly split the 28 days into a 21-day training set and a 7-day test set
train <- sample(1:28, 21, replace = FALSE)
training <- data5[train, ]
testing <- data5[-train, ]
As per the data, Saturday, followed by Sunday, has the highest order count (i.e. the best day of the week), and Wednesday, followed by Thursday, has the lowest order count (i.e. the worst day of the week).
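To confirm this programmatically, here is a minimal sketch that sorts the grouped totals; the object name day_totals is my own and is not part of the original workflow.
# Store the per-day totals and pick out the largest and smallest
day_totals <- summarise(sub_group, total = sum(orders_count))
day_totals[which.max(day_totals$total), ]   # best day: Sat with 100 orders
day_totals[which.min(day_totals$total), ]   # worst day: Wed with 20 orders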
Now we will start our time series analysis. First, we will check whether the data is stationary using the Augmented Dickey-Fuller test.
adf.test(data5$orders_count,alternative = "stationary")
## Warning in adf.test(data5$orders_count, alternative = "stationary"): p-
## value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: data5$orders_count
## Dickey-Fuller = -5.1701, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
The p-value (0.01) is below 0.05, so we reject the null hypothesis of a unit root and accept the alternative hypothesis: the data is stationary.
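As an optional cross-check (a small sketch, not part of the original workflow), the forecast and tseries packages can also tell us whether any differencing would be needed:
ndiffs(data5$orders_count)     # number of differences suggested by a unit-root test
kpss.test(data5$orders_count)  # KPSS test: here the null hypothesis is stationarity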
acf(data5$orders_count,main="ACF Plot")
Looking at the ACF plot, the series appears to follow an AR(1) process, but we cannot conclude that until we also look at the PACF, which measures the conditional (partial) correlation at each lag.
pacf(data5$orders_count, main="PACF Plot")
Only the 1st and 2nd lags are significant; the rest are not. The PACF also helps suggest the order of the model, but instead we will use the convenient auto.arima() function to do the ARIMA modelling.
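For comparison, here is a sketch of the manual route; the AR(2) order is just one possible reading of the ACF/PACF plots, and the object name manual_fit is my own.
# Fit an AR(2) model directly and compare its information criteria with auto.arima's choice
manual_fit <- Arima(data5$orders_count, order = c(2, 0, 0))
summary(manual_fit)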
arima1<-auto.arima(data5$orders_count,trace = TRUE,test = "kpss",ic="bic",seasonal=TRUE)
##
## ARIMA(2,1,2) with drift : Inf
## ARIMA(0,1,0) with drift : 188.3335
## ARIMA(1,1,0) with drift : 189.4493
## ARIMA(0,1,1) with drift : 189.0873
## ARIMA(0,1,0) : 185.0385
## ARIMA(1,1,1) with drift : 192.354
##
## Best model: ARIMA(0,1,0)
summary(arima1)
## Series: data5$orders_count
## ARIMA(0,1,0)
##
## sigma^2 estimated as 49.07: log likelihood=-90.87
## AIC=183.74 AICc=183.9 BIC=185.04
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.03592857 6.879057 6.178786 -22.328 60.42257 0.9643192
## ACF1
## Training set 0.2676281
Box.test(arima1$residuals^2,type = "Ljung-Box")
##
## Box-Ljung test
##
## data: arima1$residuals^2
## X-squared = 1.6221, df = 1, p-value = 0.2028
jarque.bera.test(arima1$residuals)
##
## Jarque Bera Test
##
## data: arima1$residuals
## X-squared = 1.7425, df = 2, p-value = 0.4184
arima_forecast1<-forecast(arima1,h=8)
plot(arima_forecast1)
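Since the third goal is to predict the order counts for each day of the upcoming week, the point forecasts and prediction intervals can be read off the forecast object directly (a minimal sketch):
arima_forecast1$mean                                   # point forecasts for the next 8 days
cbind(arima_forecast1$lower, arima_forecast1$upper)    # 80% and 95% prediction interval bounds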
The data does not show any drift (the selected model is ARIMA(0,1,0) without drift). I also ran a Ljung-Box test on the squared residuals; with a p-value of 0.2028 there is no significant evidence of an ARCH effect in the data.
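To probe the ARCH question a little further, a sketch using more than the default single lag (the choice of 7 lags here is my own) would be:
Box.test(arima1$residuals^2, lag = 7, type = "Ljung-Box")  # a small p-value would suggest an ARCH effect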
ARIMA(0,1,0) basically says that neither the moving-average nor the autoregressive terms are significant; the model is a random walk on the differenced series.
One reason auto.arima() is returning a random-walk model might be that we are selecting on minimum BIC; we could similarly select on AIC (as shown below), or fit candidate models manually (e.g. with the sarima() function from the astsa package) and re-run the predictions.
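A quick sketch of the AIC-based run (the object name arima_aic is my own; the selected order may or may not differ):
arima_aic <- auto.arima(data5$orders_count, trace = TRUE, test = "kpss", ic = "aic", seasonal = TRUE)
summary(arima_aic)   # compare the selected order with the BIC-based model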
Meanwhile, the Jarque-Bera test (p-value 0.4184) indicates that the residuals are consistent with a normal distribution.