In this section we consider the case of pure seasonality. Forecasts are based on the demand model of Eq. (11.13), in which the trend parameter is set to zero:

$$Y_t = B S_t + \epsilon_t, \qquad S_t = S_{t-s},$$
where s is the length of the seasonal cycle, i.e., a whole cycle consists of s time buckets. To get a grip on this model, imagine a yearly cycle consisting of 12 monthly time buckets, and say that we are at the end of December of year X. Then t = 0 is December of year X, and time bucket t = 1 corresponds to January of year X + 1. On the basis of the estimate $\hat{B}_0$, how can we forecast demand in January? Of course, we cannot take $\hat{B}_0 \hat{S}_0$, since this involves the seasonal factor of December. We should multiply $\hat{B}_0$ by the seasonal factor of January, but this is not $\hat{S}_1$, because $\hat{S}_1$ is the estimate of this seasonal factor after observing $Y_1$. The general rule is $F_{t,h} = \hat{B}_t \hat{S}_{t+h-s}$, for horizons $h = 1, \ldots, s$. Since s = 12 and h = 1, the correct answer is

$$F_{0,1} = \hat{B}_0 \hat{S}_{1-12} = \hat{B}_0 \hat{S}_{-11},$$
where $\hat{S}_{-11}$ is the estimate of the seasonal factor of January made at the end of January of year X. We should use seasonal factors estimated one year ago! After observing $Y_1$, i.e., January demand in year X + 1, we can update $\hat{S}_1$, which will be used to forecast demand for January of year X + 2. It is easy to devise exponential smoothing formulas to update the estimates of both the level and the seasonal factors:

$$\hat{B}_t = \alpha \frac{Y_t}{\hat{S}_{t-s}} + (1 - \alpha) \hat{B}_{t-1}, \tag{11.23}$$

$$\hat{S}_t = \gamma \frac{Y_t}{\hat{B}_t} + (1 - \gamma) \hat{S}_{t-s}. \tag{11.24}$$
Here α and γ are the familiar smoothing coefficients in the range [0, 1]. Equation (11.23) is an adaptation of simple exponential smoothing, in which demand $Y_t$ is deseasonalized by dividing it by the current estimate of the correct seasonal factor, $\hat{S}_{t-s}$. To see why this is needed, think of ice cream demand in summer; we should not increase the estimate of the level after observing high sales, as this is just a seasonal effect. The need for deseasonalization is quite common in many application domains. Equation (11.24) takes care of updating the seasonal factor of time bucket t, based on the previous estimate $\hat{S}_{t-s}$ and the new information on the seasonal factor, which is obtained by dividing the last observation by the revised level estimate $\hat{B}_t$.
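To make the recursion concrete, here is a minimal Python sketch of Eqs. (11.23) and (11.24), together with the forecasting rule $F_{t,h} = \hat{B}_t \hat{S}_{t+h-s}$. The class name `SeasonalSmoother` and its interface are illustrative assumptions, not notation from the text.

```python
class SeasonalSmoother:
    """Exponential smoothing with multiplicative seasonality (no trend).

    Illustrative sketch: keeps one factor per season, i.e., the most
    recent estimate of S for each of the s seasons in the cycle.
    """

    def __init__(self, B0, S0, alpha=0.1, gamma=0.2):
        # B0: initial level; S0: the s initial factors [S_{1-s}, ..., S_0]
        self.B = B0
        self.S = list(S0)            # self.S[(t - 1) % s] is the factor
        self.s = len(S0)             # for the season of time bucket t
        self.alpha, self.gamma = alpha, gamma

    def forecast(self, t, h):
        """F_{t,h} = B_t * S_{t+h-s}, valid for horizons h = 1, ..., s."""
        return self.B * self.S[(t + h - 1) % self.s]

    def update(self, t, y):
        """Observe demand y = Y_t and apply Eqs. (11.23)-(11.24)."""
        j = (t - 1) % self.s         # season of time bucket t
        # Eq. (11.23): deseasonalize Y_t with the old factor S_{t-s}
        self.B = self.alpha * y / self.S[j] + (1 - self.alpha) * self.B
        # Eq. (11.24): revise the factor using the *new* level estimate B_t
        self.S[j] = self.gamma * y / self.B + (1 - self.gamma) * self.S[j]
```

Keeping one factor per season and addressing it by $(t-1) \bmod s$ is equivalent to storing $\hat{S}_{t-s}$, since the smoother never looks back more than one full cycle.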
Initialization requires fitting the parameters $\hat{B}_0$ and $\hat{S}_{j-s}$, for j = 1, 2, …, s. We need s + 1 parameters, but actually the minimal fit sample consists of just s observations, i.e., a whole cycle. In fact, we lose one degree of freedom because the average value of multiplicative seasonality factors must be 1. Of course, the more cycles we use, the better the fit will be. Assuming that the fit sample consists of k full cycles, we have l = k·s observations. A reasonable way to fit the parameters in this case is the following:
- The initial estimate of the level is set to average demand over the fit sample:

  $$\hat{B}_0 = \frac{1}{l} \sum_{t=1}^{l} Y_t.$$

  Note that we cannot take this plain average if the fit sample does not consist of full cycles; doing so in such a case would overweight some seasons within the cycle.
- Seasonal factors are estimated by dividing the average demand of the time buckets corresponding to each season within the fit sample by the level estimate:

  $$\hat{S}_{j-s} = \frac{1}{k} \sum_{m=0}^{k-1} \frac{Y_{j+ms}}{\hat{B}_0}, \qquad j = 1, 2, \ldots, s,$$

  where k = l/s is the number of full cycles in the fit sample. To understand the sum above, say that we have k = 3 full cycles, each one consisting of s = 4 time buckets. Then the seasonal factor for the first season within the cycle is estimated as

  $$\hat{S}_{1-4} = \hat{S}_{-3} = \frac{1}{3} \left( \frac{Y_1}{\hat{B}_0} + \frac{Y_5}{\hat{B}_0} + \frac{Y_9}{\hat{B}_0} \right).$$

  This initialization is implemented in the sketch after this list.
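As a companion to the smoother above, here is a sketch of the two initialization formulas; the function name `init_seasonal` is made up for illustration, and the fit sample is assumed to contain k full cycles.

```python
import numpy as np

def init_seasonal(y, s):
    """Fit B_0 and the s seasonal factors [S_{1-s}, ..., S_0] from a
    fit sample y consisting of k full cycles of length s."""
    y = np.asarray(y, dtype=float)
    k, rem = divmod(len(y), s)
    if rem:
        raise ValueError("fit sample must consist of full cycles")
    B0 = y.mean()                     # plain average is fine: full cycles only
    # per-season average across the k cycles, divided by the level estimate;
    # by construction S0.mean() == 1, the "useful check" of Example 11.7
    S0 = y.reshape(k, s).mean(axis=0) / B0
    return B0, S0
```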
Example 11.7 Table 11.5 shows demand data for 6 time buckets. We assume s = 3, so what we see is a history consisting of two whole cycles. We want to initialize the smoother with a fit sample consisting of the first cycle, and evaluate MAPE on a test sample consisting of the second cycle, applying exponential smoothing with coefficients α = 0.1 and γ = 0.2. Since the fit sample is a single cycle (k = 1), initialization yields the following parameters:

$$\hat{B}_0 = \frac{Y_1 + Y_2 + Y_3}{3}, \qquad \hat{S}_{-2} = \frac{Y_1}{\hat{B}_0}, \quad \hat{S}_{-1} = \frac{Y_2}{\hat{B}_0}, \quad \hat{S}_0 = \frac{Y_3}{\hat{B}_0}.$$
Table 11.5 Applying exponential smoothing with multiplicative seasonality.
A useful check is

$$\frac{1}{s} \sum_{j=1}^{s} \hat{S}_{j-s} = \frac{\hat{S}_{-2} + \hat{S}_{-1} + \hat{S}_0}{3} = 1,$$

since the average of the multiplicative seasonal factors must be 1.
If we apply Eqs. (11.23) and (11.24) after the first observation, we obtain

$$\hat{B}_1 = \alpha \frac{Y_1}{\hat{S}_{-2}} + (1 - \alpha) \hat{B}_0 = \alpha \hat{B}_0 + (1 - \alpha) \hat{B}_0 = \hat{B}_0,$$

$$\hat{S}_1 = \gamma \frac{Y_1}{\hat{B}_1} + (1 - \gamma) \hat{S}_{-2} = \gamma \hat{S}_{-2} + (1 - \gamma) \hat{S}_{-2} = \hat{S}_{-2},$$

where we use the fact that $Y_1 / \hat{S}_{-2} = \hat{B}_0$ and $Y_1 / \hat{B}_0 = \hat{S}_{-2}$ by construction of the initial estimates.
We note that the estimates do not change! A closer look reveals that the first forecast would have been

$$F_{0,1} = \hat{B}_0 \hat{S}_{1-3} = \hat{B}_0 \hat{S}_{-2} = \hat{B}_0 \cdot \frac{Y_1}{\hat{B}_0} = Y_1.$$
We do not update the estimates, because the forecast was perfect. By a similar token, errors are zero and parameters are not updated for time buckets t = 2 and t = 3. On second thought, this is no surprise: we have used a model with four parameters, and actually three degrees of freedom, to match three observations. Of course, we obtain a perfect match, and errors are zero throughout the fit sample. All the more reason not to calculate errors there! Things get interesting after observing $Y_4$, and the calculations in Table 11.5 yield the updated estimates

$$\hat{B}_4 = \alpha \frac{Y_4}{\hat{S}_1} + (1 - \alpha) \hat{B}_3, \qquad \hat{S}_4 = \gamma \frac{Y_4}{\hat{B}_4} + (1 - \gamma) \hat{S}_1,$$

whose numerical values are reported in the table.
As a further exercise, let us compute the forecasts $F_{6,2}$ and $F_{6,3}$:

$$F_{6,2} = \hat{B}_6 \hat{S}_{6+2-3} = \hat{B}_6 \hat{S}_5, \qquad F_{6,3} = \hat{B}_6 \hat{S}_{6+3-3} = \hat{B}_6 \hat{S}_6.$$
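Putting the two sketches together, the workflow of Example 11.7 can be replayed as follows. The demand figures below are hypothetical placeholders, since the actual data live in Table 11.5.

```python
import numpy as np

y = [100.0, 120.0, 80.0, 104.0, 126.0, 82.0]  # hypothetical data; s = 3, two cycles
s = 3
B0, S0 = init_seasonal(y[:s], s)              # fit sample: first cycle only (k = 1)
sm = SeasonalSmoother(B0, S0, alpha=0.1, gamma=0.2)

ape = []
for t in range(1, len(y) + 1):
    F = sm.forecast(t - 1, 1)                 # F_{t-1,1}, made before observing Y_t
    if t > s:                                 # measure errors on the test sample only
        ape.append(abs(y[t - 1] - F) / y[t - 1])
    sm.update(t, y[t - 1])                    # then apply Eqs. (11.23)-(11.24)

print("MAPE on the test sample:", 100 * np.mean(ape), "%")
print("F_{6,2} =", sm.forecast(6, 2), "  F_{6,3} =", sm.forecast(6, 3))
```

With a one-cycle fit sample the forecasts for t = 1, 2, 3 reproduce the observations exactly, so the MAPE above is computed on the second cycle only, in line with the warning about in-sample errors.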
Once again, the example shows the danger of calculating errors within the fit sample; performance evaluation must always be carried out out-of-sample to be fair. Of course, we would not make such a blatant mistake with a larger fit sample. In the literature, a backward initialization is often suggested: the idea is to run the smoother backward in time, starting from the last time bucket, in order to obtain the initialized parameters. Doing so with a large data set will probably avoid gross errors, but a clean separation between fit and test samples is arguably the wisest idea.