Predictive analytics

Various models can be used for prediction. A time series is a sequence of data points ordered in time. A time series can be expressed as

$$Y_t = f(t) \tag{2.1}$$

where Y_t is the value of the variable under study at time t. The components of time series analysis are

Trend
Seasonality
Cyclic
Randomness

Trend: A long-term increase or decrease in the data values. A trend may be classified as linear or nonlinear.

Seasonality: A periodic fluctuation of the data that repeats over a fixed period, such as a week, a month or a year. Observing the same pattern at this fixed period is what is called the seasonality of the data.

Cyclic: This is also a periodic fluctuation, but unlike the seasonal component its period is not fixed.

Randomness or uneven movements: Irregular variation of the observed values that does not follow a cycle, for example variation caused by floods or wars.
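To make the components concrete, the following minimal sketch builds a synthetic series by adding a linear trend, a fixed-period seasonal term and random noise. The additive combination, the function name `synthetic_series` and all parameter values are assumptions made for illustration; the cyclic component is left out for brevity.

```python
import math
import random


def synthetic_series(n, period=12, slope=0.5, amplitude=3.0, noise_sd=1.0, seed=0):
    """Return (series, trend, seasonal, noise) for an additive toy model."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    # Trend: values rise steadily over the whole series.
    trend = [slope * t for t in range(n)]
    # Seasonality: the same pattern repeats every `period` observations.
    seasonal = [amplitude * math.sin(2 * math.pi * t / period) for t in range(n)]
    # Randomness: irregular variation with no repeating pattern.
    noise = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    series = [tr + se + no for tr, se, no in zip(trend, seasonal, noise)]
    return series, trend, seasonal, noise


series, trend, seasonal, noise = synthetic_series(48)
print(len(series), round(trend[-1], 2))
```

Plotting `series` next to `trend` and `seasonal` is a simple way to see how the components combine.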

The approach to time series analysis developed by George Box and Gwilym Jenkins is called the Box–Jenkins methodology. The major steps of this method are

Selecting a model
Finding optimal parameters
Building ARIMA model
Making predictions

Selecting a model: Select the data and check whether they show a trend, seasonality or randomness.
Finding optimal parameters: The data should be stationary, which is an important property in time series analysis. A series is called stationary if its mean and variance are constant over time. Several techniques are available to make a series stationary. They are
- Detrending: This technique removes the trend.
$$X(t) = (\text{mean} + \text{trend} \times t) + \text{error} \tag{2.2}$$
- Differencing: This removes non-stationarity and corresponds to the integration (I) part of ARIMA.
$$X(t) - X(t-1) = \mathrm{ARMA}(p, q) \tag{2.3}$$
where p is the AR (autoregressive) order and q is the MA (moving average) order.
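The two techniques above can be sketched in a few lines. This is a minimal illustration, assuming a linear trend fitted by ordinary least squares for Eq. (2.2) and plain first differences for Eq. (2.3); the function names `fit_linear_trend`, `detrend` and `difference` are hypothetical and not taken from the source.

```python
def fit_linear_trend(x):
    """Least-squares fit of x[t] = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    stt = sum((t - t_mean) ** 2 for t in range(n))
    stx = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    slope = stx / stt
    intercept = x_mean - slope * t_mean
    return intercept, slope


def detrend(x):
    """Subtract the fitted line; what remains is the 'error' term of Eq. (2.2)."""
    intercept, slope = fit_linear_trend(x)
    return [v - (intercept + slope * t) for t, v in enumerate(x)]


def difference(x, d=1):
    """Apply X(t) - X(t-1) d times; each pass shortens the series by one value."""
    x = list(x)
    for _ in range(d):
        x = [b - a for a, b in zip(x, x[1:])]
    return x


line = [2 + 3 * t for t in range(10)]  # a pure linear trend
print(difference(line))                # every first difference equals the slope
print(max(abs(r) for r in detrend(line)))  # residuals are numerically zero
```

For a series with a purely linear trend, either technique leaves a series with constant mean, which is why both are used as steps towards stationarity.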
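Building ARIMA model: ARIMA (p,d,q) × (P,D,Q) is a model that works on stationary data, where p is the AR order, d is the degree of differencing and q is the MA order; P, D and Q are the corresponding orders of the seasonal part. If d = 0, no differencing is needed because the data are already stationary. The ACF and PACF should then be calculated.

The ACF is analogous to the correlation function of two variables, and its values lie between −1 and 1:

$$\mathrm{ACF}(h) = \frac{\operatorname{cov}(y_t, y_{t+h})}{\sqrt{\operatorname{cov}(y_t, y_t)\,\operatorname{cov}(y_{t+h}, y_{t+h})}} = \frac{\operatorname{cov}(h)}{\operatorname{cov}(0)} \tag{2.4}$$

where t is time and h = 0, 1, 2, 3, …

The PACF is the correlation between y_t and y_{t+h} that remains after the effect of the values between them has been removed. It is expressed as

$$\mathrm{PACF}(h) = \begin{cases} \operatorname{corr}(y_t - y_t^*,\; y_{t+h} - y_{t+h}^*) & \text{for } h \ge 2 \\ \operatorname{corr}(y_t,\; y_{t+1}) & \text{for } h = 1 \end{cases} \tag{2.5}$$

where

$$y_t^* = \beta_1 y_{t+1} + \beta_2 y_{t+2} + \dots, \qquad y_{t+h}^* = \beta_1 y_{t+h-1} + \beta_2 y_{t+h-2} + \dots$$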

Linear regression is used to remove the effect of the variables lying between y_t and y_{t+h}; the h − 1 β values are estimated by linear regression (EMC 2015).
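As a rough illustration of Eqs. (2.4) and (2.5), the sketch below computes the sample ACF directly from its definition. For the PACF it does not run the regressions described above; it substitutes the Durbin-Levinson recursion on the sample ACF, a standard shortcut that targets the same quantity, although in small samples its values can differ slightly from regression-based estimates. The function names `acf` and `pacf` and the sample data are assumptions made for this example.

```python
def acf(y, max_lag):
    """Sample autocorrelation cov(h) / cov(0) for h = 0..max_lag (Eq. 2.4)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev) / n
    result = []
    for h in range(max_lag + 1):
        ch = sum(dev[t] * dev[t + h] for t in range(n - h)) / n
        result.append(ch / c0)
    return result


def pacf(y, max_lag):
    """Partial autocorrelations for lags 0..max_lag via Durbin-Levinson."""
    rho = acf(y, max_lag)
    result = [1.0]   # lag 0 is 1 by convention
    phi_prev = []    # coefficients phi_{k-1,1..k-1} from the previous order
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]  # at lag 1 the PACF equals the ACF, as in Eq. (2.5)
            phi_curr = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                        for j in range(k - 1)]
            phi_curr.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi_curr
    return result


y = [1, 3, 2, 5, 4, 6, 5, 8, 7, 9]  # made-up data for illustration
print([round(v, 3) for v in acf(y, 3)])
print([round(v, 3) for v in pacf(y, 3)])
```

In the Box–Jenkins workflow these two sequences are usually inspected as plots to help choose candidate values for p and q.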

Making predictions: Once the ACF and PACF of the residuals have been examined, future points can be predicted with the forecast() function.
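The forecast() function itself is not reproduced here. As a hedged illustration of what forecasting from a fitted model involves, the sketch below differences the series once, fits an AR(1) model with an intercept to the differences by least squares, and then iterates the fitted equation forward before undoing the differencing. This corresponds to a simple ARIMA(1,1,0)-style model; the function names `fit_ar1` and `forecast_arima_110`, the estimation method and the data are all assumptions made for this example.

```python
def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_arima_110(y, steps):
    """Forecast `steps` future levels of y from an AR(1) fit on first differences."""
    diffs = [b - a for a, b in zip(y, y[1:])]  # d = 1: difference once
    c, phi = fit_ar1(diffs)                    # p = 1: one autoregressive term
    level, diff = y[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        diff = c + phi * diff   # predict the next difference
        level = level + diff    # integrate back to the original scale
        forecasts.append(level)
    return forecasts


y = [0, 4, 7, 9.5, 11.75, 13.875]  # made-up data for illustration
print(forecast_arima_110(y, 2))
```

A full implementation would also estimate moving-average terms, handle seasonal orders and report prediction intervals, none of which this sketch attempts.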
