In this section, we deal with the case of stationary demand, as represented by Eq. (11.14). In simple exponential smoothing we estimate the level parameter Bt by a mix of new and old information:

B̂t = αYt + (1 − α)B̂t−1,    (11.16)
where α is a coefficient in the interval [0, 1]. In (11.16), the new information consists of the last observation of demand Yt, and the old information consists of the estimate B̂t−1, which was computed at the end of time bucket t − 1, after observing Yt−1. The updated estimate B̂t is a weighted average of new and old information, whose balance depends on the smoothing coefficient α. To understand its effect, it is useful to examine the two extreme cases:
- If we set α = 1, we forget all of the past immediately, as the new estimate is just the last observation. If α is large, we are very fast to catch new trends, but we are also quite sensitive to noise and spikes in demand.
- If we set α = 0, new information is disregarded altogether; if α is very small, there is a lot of inertia and the learning speed is quite low.
We see that α plays a role similar to the time window k in a moving average. By increasing α we make the forecaster more responsive, but also more nervous and sensitive to noise; by decreasing α, we increase inertia, and noise is better smoothed. The tradeoff is illustrated in Fig. 11.5; a relatively small smoothing coefficient (α = 0.05) makes the adaptation to new market conditions very slow in case (a); by increasing α, the method is more responsive, as shown in case (b). In Fig. 11.6 we see the case of a sudden spike in demand. When α is small, on one hand the spike has a smaller effect; on the other hand the effect is more persistent, making the forecast a bit biased for a longer time period, as shown in plot (a). When α is large, there is an immediate effect on the forecast, which has a larger error and a larger bias, but this fades away quickly as the spike is rapidly forgotten, as shown in plot (b). In fact, the smoothing coefficient is also known as the forgetting factor.
Fig. 11.5 Effect of the smoothing coefficient α, when demand jumps to a new level.
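The tradeoff between responsiveness and inertia can be sketched in a short simulation of the update rule (11.16). The demand series, the jump to a new level, the two α values, and the initial level below are illustrative assumptions, not taken from the text:

```python
# Simple exponential smoothing: B_hat_t = alpha*Y_t + (1 - alpha)*B_hat_{t-1}.
# Illustrative demand: stationary around 100, then jumping to 140 at t = 20.
demand = [100.0] * 20 + [140.0] * 20

def exponential_smoothing(series, alpha, initial_level):
    """Return the sequence of smoothed level estimates B_hat_t, per Eq. (11.16)."""
    level = initial_level
    levels = []
    for y in series:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

slow = exponential_smoothing(demand, alpha=0.05, initial_level=100.0)
fast = exponential_smoothing(demand, alpha=0.5, initial_level=100.0)

# A large alpha adapts to the new level much faster than a small one,
# mirroring cases (a) and (b) in Fig. 11.5.
print(f"level estimate at t = 40: alpha=0.05 -> {slow[-1]:.1f}, alpha=0.5 -> {fast[-1]:.1f}")
```

After 20 buckets at the new level, the large-α forecaster has essentially converged to 140, while the small-α forecaster is still lagging well below it.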
Since we assume a stationary demand, the horizon h does not play any role and, just as with a moving average, we have

Ft,h = B̂t,    for every horizon h.
Equation (11.16) illustrates the way simple exponential smoothing should be implemented on a computer, but it does not shed much light on why this method earned such a name. Let us rewrite exponential smoothing in terms of forecasts for a time bucket, assuming that h = 1, so that Ft,1 = B̂t and Eq. (11.16) may be rewritten as

Ft,1 = αYt + (1 − α)Ft−1,1.
Fig. 11.6 Effect of smoothing coefficient α, when demand has a spike.
If we collect terms involving α, we get

Ft,1 = Ft−1,1 + α(Yt − Ft−1,1).
But Yt − Ft−1,1 = et is the forecast error for time bucket t, and we get the second view of exponential smoothing:

Ft,1 = Ft−1,1 + αet.
This shows that the new forecast is the old one, corrected by the last forecast error, smoothed by the coefficient α ≤ 1. The larger α is, the stronger the correction. This view also explains why the algorithm is a smoother: it dampens the correction induced by the error, which could be just the result of a transient spike. Note that the forecast is increased when the error is positive, i.e., when we underforecasted demand, and decreased otherwise.
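The weighted-average form (11.16) and the error-correction form above are algebraically equivalent, which can be checked numerically. The demand observations, α, and the common initial forecast below are made-up illustrative values:

```python
# Two equivalent views of exponential smoothing (with h = 1, so the
# one-step forecast F_{t,1} equals the level estimate B_hat_t):
#   weighted average:  F_t = alpha*Y_t + (1 - alpha)*F_{t-1}
#   error correction:  F_t = F_{t-1} + alpha*(Y_t - F_{t-1})
alpha = 0.3
demand = [102.0, 97.0, 110.0, 95.0, 101.0]  # illustrative observations

f_avg = f_err = 100.0  # common initial forecast (an assumption)
for y in demand:
    f_avg = alpha * y + (1 - alpha) * f_avg
    error = y - f_err               # positive error -> we underforecasted
    f_err = f_err + alpha * error   # correct the old forecast by alpha * error

print(f"weighted-average form: {f_avg:.6f}, error-correction form: {f_err:.6f}")
```

Both recursions track the same sequence of forecasts, differing at most by floating-point rounding.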
Fig. 11.7 Exponentially decaying weights in exponential smoothing.
To understand where the term “exponential” comes from, we still need a third view, which is obtained by unfolding Eq. (11.16) recursively. Applying the equation at time t − 1, we obtain

B̂t−1 = αYt−1 + (1 − α)B̂t−2.
Plugging this equation into (11.16) yields

B̂t = αYt + α(1 − α)Yt−1 + (1 − α)²B̂t−2.
If we apply the same reasoning to B̂t−2, B̂t−3, etc., we find (assuming an arbitrarily long demand history)

B̂t = αYt + α(1 − α)Yt−1 + α(1 − α)²Yt−2 + ⋯ = ∑k≥0 α(1 − α)^k Yt−k.    (11.18)
This third view clearly shows that exponential smoothing is just another average. We leave it as an exercise for the reader to prove that weights add up to one, but they are clearly an exponential function of k, with base (1 − α) < 1. The older the observation, the lower its weight. Figure 11.7 is a qualitative display of exponentially decaying weights, and it should be compared with the time window in Fig. 11.3. The exponential decay is faster when α is increased, as the base (1 − α) of the exponential function in Eq. (11.18) is smaller.
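As a numerical check of this third view, the weights α(1 − α)^k decay geometrically, their sum over a long history is almost exactly one, and the resulting weighted average of past demand reproduces the recursive estimate. The history length, α, demand-generating numbers, and the choice of initializing the recursion with the first observation are all illustrative assumptions:

```python
import random

alpha = 0.3
t = 50  # length of the demand history (illustrative)
weights = [alpha * (1 - alpha) ** k for k in range(t)]

# The weights sum to 1 - (1 - alpha)**t, i.e. almost exactly one for large t.
print(f"sum of weights: {sum(weights):.8f}")

# Synthetic stationary demand around level 100 (made-up data).
random.seed(42)
demand = [100 + random.gauss(0, 5) for _ in range(t)]

# Recursive form (11.16), initialized with the first observation (an assumption).
level = demand[0]
for y in demand[1:]:
    level = alpha * y + (1 - alpha) * level

# Weighted-average form (11.18): demand[-1] is the newest observation
# (weight alpha), demand[0] the oldest (smallest weight).
weighted = sum(w * y for w, y in zip(weights, reversed(demand)))
print(f"recursive: {level:.4f}, weighted sum: {weighted:.4f}")
```

The two numbers agree up to the vanishing contribution of the initial level, whose weight (1 − α)^(t−1) is negligible after 50 buckets.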