MOVING AVERAGE

The moving average is a very simple algorithm that serves well to illustrate some tradeoffs that we will face later. As a forecasting tool, it can be used when we assume that the underlying data-generating process is simply

$$Y_t = B_t + \epsilon_t$$

This is the model we obtain from (11.13) if we do not consider trend and seasonality.⁸ In plain words, the idea is that demand is stationary, with average B_t. In principle, the average should be constant over time. If so, we should just forecast demand by a plain average of all available observations. Averaging filters out noise and reveals the underlying “signal.” In practice, there are slow variations in the level B_t. Therefore, if we take the sample mean of all available data

$$\bar{Y} = \frac{1}{T} \sum_{t=1}^{T} Y_t$$

we may suffer from two drawbacks:

We might be considering data that do not carry any useful information, as they pertain to market conditions that no longer apply.
We assign the same weight 1/T to all demand observations, whereas more recent data should have larger weights; note that, in any case, weights must add up to 1.

A moving average includes only the most recent k observations:

$$\hat{B}_t = \frac{1}{k} \sum_{\tau = t-k+1}^{t} Y_\tau$$

The parameter k is the length of the time window and characterizes the moving average. To get a grip on the sum, in particular on the +1 term in the lower limit, imagine that k = 2; then, at time t, after observing Y_t, we would take the average

$$\hat{B}_t = \frac{Y_{t-1} + Y_t}{2}$$

We see that the sum should start with time bucket t − 1, not t − 2. In a moving average with time window k, each of the last k observations has weight 1/k in the average. This is illustrated in Fig. 11.3. The estimate of the level is used to build a forecast. Since demand is assumed stationary, the horizon h plays no role at all, and we set

$$F_{t,h} = \hat{B}_t$$

Fig. 11.3 Time window in a moving-average scheme.
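The moving-average estimate of the level, and the fact that the forecast does not depend on the horizon, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `moving_average_level` and `forecast` and the short demand series are hypothetical and do not come from the text.

```python
from typing import Sequence


def moving_average_level(history: Sequence[float], k: int) -> float:
    """Estimate the level as the plain average of the last k observations."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if len(history) < k:
        raise ValueError("need at least k observations")
    window = history[-k:]      # observations Y_{t-k+1}, ..., Y_t
    return sum(window) / k     # each observation gets weight 1/k


def forecast(history: Sequence[float], k: int, h: int = 1) -> float:
    """Forecast demand h buckets ahead.

    Under the stationary-demand assumption the horizon h is accepted
    but does not change the result.
    """
    if h < 1:
        raise ValueError("h must be at least 1")
    return moving_average_level(history, k)


# Hypothetical demand history, made up for illustration only.
demand = [10, 13, 16, 12]

# With k = 2 the window covers the last two buckets (t - 1 and t).
print(moving_average_level(demand, 2))

# The same forecast is returned for any horizon.
print(forecast(demand, 3, h=1) == forecast(demand, 3, h=5))
```

Slicing with `history[-k:]` mirrors the lower limit t − k + 1 of the sum: exactly k observations are kept, ending with the most recent one.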

Example 11.5 Let us apply a moving average with time window k = 3 for the dataset

and compute MAD, assuming a forecast horizon of h = 1. We can make a first forecast only at the end of time bucket t = 3, after observing Y₃ = 14:

$$F_{3,1} = \hat{B}_3 = \frac{Y_1 + Y_2 + Y_3}{3}$$

Here, $\hat{B}_3$ is the estimate of the level parameter B_t at the end of time bucket t = 3. Then, stepping forward, we drop Y₁ = 12 from the information set and include Y₄ = 15. Proceeding this way, we obtain the following sequence of estimates and forecasts:

As we noted, forecasts do not depend on the horizon; since demand is stationary, any forecast F_t,h based on the information set up to and including time bucket t will be the same for h = 1, 2, …. For instance, say that at the end of t = 5 we want to forecast demand during time bucket t = 10; the horizon is then h = 5, and the forecast would be simply

$$F_{5,5} = \hat{B}_5$$

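To compute MAD, we must match forecasts and observations properly. The first forecast error that we may compute compares the observation Y₄ with the forecast made at the end of time bucket t = 3:

$$e_4 = Y_4 - F_{3,1}$$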

By averaging absolute errors over the sample, we obtain the following MAD:

$$\mathrm{MAD} = \frac{1}{5} \sum_{t=4}^{8} \left| Y_t - F_{t-1,1} \right|$$

Note that we have a history of 8 time buckets, but errors should be averaged only over the 5 periods on which we may calculate an error. The last forecast F_8,1 is not used to evaluate MAD, as the observation Y₉ is not available.
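The matching of forecasts and observations is easy to get wrong, so a small Python sketch may help. It assumes a horizon of h = 1 and a time window k = 3, as in the example, but the demand series below is hypothetical (it is not the dataset of Example 11.5), and the function names are made up for illustration.

```python
from typing import List, Sequence


def one_step_forecasts(y: Sequence[float], k: int) -> List[float]:
    """Return F_{t,1} for t = k, ..., T, each using only data up to bucket t."""
    return [sum(y[t - k:t]) / k for t in range(k, len(y) + 1)]


def mad(y: Sequence[float], k: int) -> float:
    """Mean absolute deviation of one-step-ahead moving-average forecasts."""
    forecasts = one_step_forecasts(y, k)
    # forecasts[i] is F_{k+i,1}; it is compared with Y_{k+i+1}, i.e. y[k + i]
    # in 0-based indexing. The last forecast has no matching observation,
    # so it is dropped.
    errors = [y[k + i] - f for i, f in enumerate(forecasts[:-1])]
    if not errors:
        raise ValueError("need more than k observations to compute an error")
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical history of 8 time buckets, made up for illustration only.
demand = [10, 13, 16, 12, 15, 18, 14, 11]

forecasts = one_step_forecasts(demand, k=3)
print(len(forecasts))       # 6 forecasts: F_{3,1}, ..., F_{8,1}
print(len(forecasts) - 1)   # 5 errors can be computed
print(mad(demand, k=3))
```

With 8 observations and k = 3, six forecasts are produced but only five errors enter the average, which is the bookkeeping described above.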

A standard question asked by students after seeing an example like this is:

Should we round forecasts to integer values?

Since we are observing a demand process taking integer values, it is tempting to say that indeed we should round demand forecasts. Actually, there would be two mistakes in doing so:

The point forecast is an estimate of the expected value of demand. The expected value of a discrete random variable may well be noninteger.
We are confusing forecasts and decisions. True, we cannot purchase 17.33 items to meet demand; it must be either 17 or 18. However, what if items are purchased in boxes containing 5 items? What about making a robust decision hedging against demand uncertainty? What about existing inventory on hand? The final decision will depend on a lot of factors, and the forecast is just one of the many inputs needed.
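To make the distinction between forecast and decision concrete, here is a minimal Python sketch of one possible ordering rule. The rule itself, the function name `boxes_to_order`, and the `safety_stock` parameter are assumptions introduced only for illustration; the text does not prescribe any specific decision rule.

```python
import math


def boxes_to_order(forecast: float, on_hand: float, box_size: int,
                   safety_stock: float = 0.0) -> int:
    """Hypothetical rule: cover forecast plus safety stock, net of inventory.

    The noninteger forecast is used as is; rounding happens only at the
    decision stage, where the box size is taken into account.
    """
    if box_size <= 0:
        raise ValueError("box_size must be positive")
    net_requirement = max(0.0, forecast + safety_stock - on_hand)
    return math.ceil(net_requirement / box_size)


# The same forecast of 17.33 items leads to different decisions.
print(boxes_to_order(17.33, on_hand=0, box_size=5))    # nothing in stock
print(boxes_to_order(17.33, on_hand=4, box_size=5))    # some stock available
print(boxes_to_order(17.33, on_hand=20, box_size=5))   # stock exceeds forecast
```

Rounding the forecast to 17 or 18 beforehand would add nothing here: the integer constraint belongs to the order quantity, not to the estimate of expected demand.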
