Stationary demand: three views of a smoother

In this section, we deal with the case of stationary demand, as represented by Eq. (11.14). In simple exponential smoothing we estimate the level parameter $B_t$ by a mix of new and old information:

$$\hat{B}_t = \alpha Y_t + (1 - \alpha)\,\hat{B}_{t-1} \tag{11.16}$$

where α is a coefficient in the interval [0, 1]. In (11.16), the new information consists of the last observation of demand $Y_t$, and the old information consists of the estimate $\hat{B}_{t-1}$, which was computed at the end of time bucket t − 1, after observing $Y_{t-1}$. The updated estimate $\hat{B}_t$ is a weighted average of new and old information, depending on the smoothing coefficient α. To understand its effect, it is useful to examine the two extreme cases:

  • If we set α = 1, we forget all of the past immediately, as the new estimate is just the last observation. If α is large, we are very fast to catch new trends, but we are also quite sensitive to noise and spikes in demand.
  • If we set α = 0, new information is disregarded altogether; if α is very small, there is a lot of inertia and the learning speed is quite low.

We see that α plays a role similar to the time window k in a moving average. By increasing α we make the forecaster more responsive, but also more nervous and sensitive to noise; by decreasing α, we increase inertia, and noise is better smoothed. The tradeoff is illustrated in Fig. 11.5: a relatively small smoothing coefficient (α = 0.05) makes the adaptation to new market conditions very slow, as in case (a); increasing α makes the method more responsive, as shown in case (b). In Fig. 11.6 we see the case of a sudden spike in demand. When α is small, the spike has a smaller effect on the one hand; on the other hand, the effect is more persistent, biasing the forecast for a longer time period, as shown in plot (a). When α is large, there is an immediate effect on the forecast, which suffers a larger error and a larger bias, but this fades away quickly as the spike is rapidly forgotten, as shown in plot (b). In fact, the smoothing coefficient is also known as the forgetting factor.
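This tradeoff can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: the demand series (a jump from a level of about 100 to about 150, as in Fig. 11.5), the function name, and the choice to initialize the level with the first observation are all assumptions for the example.

```python
def exponential_smoothing(demand, alpha):
    """Return the sequence of level estimates B_hat_t from Eq. (11.16)."""
    b_hat = demand[0]  # initialization choice (an assumption): first observation
    levels = [b_hat]
    for y in demand[1:]:
        b_hat = alpha * y + (1 - alpha) * b_hat  # Eq. (11.16)
        levels.append(b_hat)
    return levels

# Illustrative demand that jumps to a new level, as in Fig. 11.5
demand = [100, 102, 98, 101, 99, 150, 151, 149, 150, 152]

slow = exponential_smoothing(demand, alpha=0.05)  # sluggish but smooth
fast = exponential_smoothing(demand, alpha=0.5)   # responsive but noisier

# Since demand is assumed stationary, the forecast for any horizon h
# is flat and equal to the last level estimate:
print(slow[-1])  # still lags far below the new level of about 150
print(fast[-1])  # has almost caught up with the new level
```

Running the sketch on this series makes the tradeoff concrete: the small-α forecaster is still far below the new demand level after five buckets, while the large-α forecaster has essentially caught up.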


Fig. 11.5 Effect of the smoothing coefficient α, when demand jumps to a new level.

Since we assume stationary demand, the horizon h does not play any role and, just as with the moving average, we have

$$F_{t,h} = \hat{B}_t \qquad \text{for any horizon } h \ge 1$$

Equation (11.16) illustrates the way simple exponential smoothing should be implemented on a computer, but it does not shed much light on why this method earned such a name. Let us rewrite exponential smoothing in terms of forecasts for a time bucket, assuming that h = 1, so that

$$F_{t+1} = F_{t,1} = \hat{B}_t$$

Fig. 11.6 Effect of smoothing coefficient α, when demand has a spike.

Then Eq. (11.16) may be rewritten as

$$F_{t+1} = \alpha Y_t + (1 - \alpha) F_t$$

If we collect terms involving α, we get

$$F_{t+1} = F_t + \alpha (Y_t - F_t)$$

But $Y_t - F_t = e_t$, the forecast error observed in time bucket t, and we get the second view of exponential smoothing:

$$F_{t+1} = F_t + \alpha e_t$$

This shows that the new forecast is the old one, corrected by the last forecast error, which is smoothed by the coefficient α ≤ 1. The larger α is, the stronger the correction. This also explains why the algorithm is a smoother: it dampens the correction induced by the error, which could be just the result of a transient spike. Note that the forecast is increased when the error is positive, i.e., when we underforecast demand, and decreased otherwise.
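The equivalence between the weighted-average view and this error-correction view is easy to check numerically. The following sketch runs both updates side by side; the data and the initialization are illustrative assumptions, not taken from the text.

```python
alpha = 0.3
demand = [100, 102, 98, 101, 99, 103]  # illustrative data

f_avg = demand[0]  # weighted-average form, initialized at the first observation
f_err = demand[0]  # error-correction form, same initialization
for y in demand:
    f_avg = alpha * y + (1 - alpha) * f_avg  # F_{t+1} = alpha*Y_t + (1-alpha)*F_t
    e = y - f_err                            # forecast error e_t = Y_t - F_t
    f_err = f_err + alpha * e                # F_{t+1} = F_t + alpha*e_t
    assert abs(f_avg - f_err) < 1e-9         # the two views coincide
print(f_avg)
```

The two updates are the same algebraic expression rearranged, so they track each other to within floating-point rounding at every step.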


Fig. 11.7 Exponentially decaying weights in exponential smoothing.

To understand where the term “exponential” comes from, we still need a third view, which is obtained by unfolding Eq. (11.16) recursively. Applying the equation at time t − 1, we obtain

$$\hat{B}_{t-1} = \alpha Y_{t-1} + (1 - \alpha)\,\hat{B}_{t-2}$$

Plugging this equation into (11.16) yields

$$\hat{B}_t = \alpha Y_t + \alpha(1 - \alpha) Y_{t-1} + (1 - \alpha)^2 \hat{B}_{t-2}$$

If we apply the same reasoning to $\hat{B}_{t-2}$, $\hat{B}_{t-3}$, etc., we find

$$\hat{B}_t = \sum_{k=0}^{+\infty} \alpha (1 - \alpha)^k \, Y_{t-k} \tag{11.18}$$

This third view clearly shows that exponential smoothing is just another average. We leave it as an exercise for the reader to prove that weights add up to one, but they are clearly an exponential function of k, with base (1 − α) < 1. The older the observation, the lower its weight. Figure 11.7 is a qualitative display of exponentially decaying weights, and it should be compared with the time window in Fig. 11.3. The exponential decay is faster when α is increased, as the base (1 − α) of the exponential function in Eq. (11.18) is smaller.
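The decaying weights of Eq. (11.18) can also be inspected directly. In this sketch the truncation at 50 terms is an illustrative choice; the discarded tail, of size (1 − α)^50, is negligible for the α used here.

```python
alpha = 0.3
weights = [alpha * (1 - alpha) ** k for k in range(50)]  # weight of Y_{t-k}

# Weights decay geometrically with k, base (1 - alpha) < 1 ...
assert all(w1 > w2 for w1, w2 in zip(weights, weights[1:]))
# ... and, up to the negligible truncated tail, they add up to one:
print(sum(weights))  # approximately 1
```

The sum is a geometric series: α · Σ (1 − α)^k = α · (1/α) = 1, which is the exercise left to the reader above.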

