Stationary demand: initialization and choice of α

One obviously weird feature of Eq. (11.18) is that it involves an infinite sequence of observations. However, in real life we do not have an infinite number of observations; the sum must be truncated somewhere in the past, right before we started collecting information. The oldest term in the average, in practice, corresponds to the initialization of the algorithm. To see this, let us assume that we have observations Y1, …, YT. If we apply Eq. (11.16) at time bucket t = 1, we have

\[
\hat{B}_1 = \alpha Y_1 + (1-\alpha)\hat{B}_0
\]

The term \( \hat{B}_0 \) is an initial estimate or, better, a guess, since it must be set before the very first observation is collected. By applying the same idea that led us to Eq. (11.18), we find

\[
\hat{B}_T = \alpha \sum_{t=0}^{T-1} (1-\alpha)^t \, Y_{T-t} + (1-\alpha)^T \hat{B}_0
\]

Hence, after collecting T observations, the weight of the initial estimate \( \hat{B}_0 \) in the average is \( (1-\alpha)^T \). If α is small, the impact of this initial term decays slowly for increasing T, and it may play a critical role, as a bad initialization will be forgotten too slowly.
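A two-line check makes the decay concrete (a sketch; the choices α = 0.1 and T = 20 are ours, for illustration only):

```python
# Weights implied by simple exponential smoothing after T observations:
# observation Y_{T-t} gets weight alpha*(1 - alpha)**t, while the initial
# estimate keeps weight (1 - alpha)**T; together they sum to one.
alpha, T = 0.1, 20
obs_weights = [alpha * (1 - alpha) ** t for t in range(T)]
init_weight = (1 - alpha) ** T
print(round(init_weight, 4))                     # 0.1216: the guess still weighs ~12%
print(round(sum(obs_weights) + init_weight, 6))  # 1.0: a convex combination
```

Even after 20 time buckets, a small α leaves the initial guess with roughly 12% of the weight, which is why a bad initialization lingers.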

To deal with initialization, we should clearly frame the problem:

  1. If we are applying exponential smoothing online, to a brand-new product with no past sales history, we must make some educated guess, possibly based on past sales of a similar product. Arguably, α should be larger at the beginning, since we must learn rapidly; then, it could be reduced as more and more information is collected.
  2. If we are working online, but we are forecasting demand for an item with a significant past sales history, we may use old observations to initialize the smoother, but we should do it properly.
  3. By the same token, initialization must be carried out properly when working offline and evaluating performance by historical simulation.

To see what “properly” means in case 2 above, imagine that we are at the end of time bucket t = T, and we have an information set consisting of T past observations:

\[
\{ Y_1, Y_2, \ldots, Y_T \}
\]

We are working online and want to forecast demand for t = T + 1, and one possible way to initialize the smoother is to compute the sample mean of these observations, setting

\[
\hat{B}_T = \bar{Y} = \frac{1}{T} \sum_{t=1}^{T} Y_t
\]

Then, our forecast for \( Y_{T+1} \) would be \( F_{T,1} = \hat{B}_T = \bar{Y} \). However, in doing so, we are forgetting the very nature of exponential smoothing, since the sample mean assigns the same weight to all of the observations. We should try to set up a situation as similar as possible to that of a forecasting algorithm that has been running forever. The way to do so is to apply exponential smoothing on the past data, from \( Y_1 \) to \( Y_T \), using the sample mean \( \bar{Y} \) as an initialization of \( \hat{B}_0 \). It may seem that we are playing dirty here, since we use past data to come up with an initialization that is shifted back to the end of time bucket t = 0. However, there is nothing wrong with this, as long as we do not compute forecast errors to evaluate performance.
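The initialization just described can be sketched in a few lines (the helper name and the demand figures are ours, not the book's):

```python
# Sketch: initialize "as if the smoother had run forever".
# The sample mean of the past history serves as B0_hat, shifted back to the
# end of bucket t = 0; the smoother then replays Y_1, ..., Y_T without
# computing any forecast error along the way.
def init_and_warm_up(history, alpha):
    B_hat = sum(history) / len(history)   # B0_hat: sample mean of the past data
    for y in history:                     # replay the past; no errors collected
        B_hat = alpha * y + (1 - alpha) * B_hat
    return B_hat                          # this is F_{T,1}, the forecast for T+1

past = [102, 97, 105, 99, 101]            # illustrative demand history
print(round(init_and_warm_up(past, alpha=0.2), 4))   # 100.7988
```

Note that the returned value differs from the plain sample mean (100.8 here): the replay has already shifted the weights toward the most recent observations.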

The last remark is also essential when carrying out a historical simulation to evaluate performance. Assume again that we have a sample of T observations \( Y_t \), t = 1, …, T, and that the size of the fit sample is τ < T. We partition the sample as follows:

\[
\underbrace{Y_1, \ldots, Y_\tau}_{\text{fit sample}}, \qquad \underbrace{Y_{\tau+1}, \ldots, Y_T}_{\text{test sample}}
\]

Then, the correct procedure is:

  1. Use observations in the fit sample to initialize \( \hat{B}_0 \)
  2. Apply exponential smoothing on the fit sample, updating parameter estimates as prescribed, without calculating errors
  3. Proceed on the test sample, collecting errors
  4. Evaluate the selected forecast error measures

The reader could wonder why we should carry out step 2; a tempting shortcut is to apply the smoother directly on the test sample. A good way to grasp the important message is to refer back to the Monte Carlo simulation of a queueing system (Section 9.2.1). In that case, if the queueing system starts empty, we should discard the initial part of the simulation, which is just a transient phase, and gather statistics only in steady state. The same consideration applies here. By running the smoother on the fit sample, we warm the system up and forget the initialization. We illustrate the idea in the example below.
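The four steps can be sketched in a few lines of Python (function and variable names are ours; the smoothing update is the standard one):

```python
import math

# Sketch of the four-step procedure: initialize on the fit sample, warm up
# without collecting errors, then collect errors on the test sample and
# compute a selected error measure (RMSE here).
def historical_simulation(Y, tau, alpha):
    B_hat = sum(Y[:tau]) / tau               # step 1: B0_hat from the fit sample
    for y in Y[:tau]:                        # step 2: warm-up, no errors
        B_hat = alpha * y + (1 - alpha) * B_hat
    errors = []
    for y in Y[tau:]:                        # step 3: errors on the test sample
        errors.append(y - B_hat)             # forecast F_{t-1,1} = current B_hat
        B_hat = alpha * y + (1 - alpha) * B_hat
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))   # step 4
    return B_hat, rmse                       # last forecast and error measure
```

With perfectly constant demand the warm-up leaves the estimate unchanged and every test-sample error is zero, which is a useful sanity check for the implementation.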

Example 11.6 Consider a product with a unit purchase cost of $10, which is sold for $13 and, if unsold within a shelf life of one week, is scrapped with a salvage value of $5. We want to decide how many items to buy, based on the demand history reported in Table 11.4. If we were convinced that demand is stationary, i.e., that its expected value and standard deviation do not change, we could just fit a probability distribution like the normal, based on the sample mean and sample standard deviation. Then, the newsvendor model would provide us with the answer we need. However, if expected demand may shift over time, exponential smoothing can be applied. To see why, imagine taking the standard sample statistics when demand shows an abrupt jump, like the demand history depicted in Fig. 11.5. We recall that the underlying demand model is

\[
Y_t = B_t + \epsilon_t
\]

Table 11.4 Application of exponential smoothing to a newsvendor problem.

[Table 11.4 is not reproduced here; it reports, for each time bucket, the demand observation, the updated level estimate/forecast, and, on the test sample, the error and the squared error.]

Imagine that, at some time bucket t = t*, the level jumps from value B′ to a larger value B″. If we knew the “true” value of the level \( B_t \) at each time bucket, demand uncertainty would be related only to the standard deviation \( \sigma_\epsilon \) of the unpredictable component \( \epsilon_t \). If we calculated the sample mean and standard deviation mixing the two components, the jump in \( B_t \) would result in an estimate of \( \sigma_\epsilon \) that is much larger than the true value; the sample mean would be halfway between the two values B′ and B″. In practice, we need to dynamically update the estimates of both the expected value and the standard deviation; hence, let us apply exponential smoothing, with α = 0.1 and a fit sample of 5 time buckets.
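The pitfall is easy to see numerically. In the sketch below, the values B′ = 100, B″ = 140, and σ_ε = 5 are our illustrative choices, not the example's figures:

```python
import random
import statistics

# Sketch: a level jump inflates the plain sample standard deviation.
# Demand is Y_t = B_t + eps_t with sigma_eps = 5; the level jumps from
# B' = 100 to B'' = 140 halfway through the sample.
random.seed(42)
sigma_eps = 5.0
Y = [100 + random.gauss(0, sigma_eps) for _ in range(50)] + \
    [140 + random.gauss(0, sigma_eps) for _ in range(50)]
print(statistics.stdev(Y))   # far above sigma_eps = 5: the jump is mixed in
print(statistics.mean(Y))    # near 120, halfway between B' and B''
```

The sample standard deviation comes out around 20 rather than 5, since it confounds the deterministic jump with genuine noise; this is exactly why static sample statistics mislead when the level shifts.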

The calculations are displayed in Table 11.4. On the basis of the fit sample, the initial estimate of the level \( \hat{B}_0 \) is

\[
\hat{B}_0 = \frac{1}{5} \sum_{t=1}^{5} Y_t
\]

The first demand observation is used to update the estimate of level:

\[
\hat{B}_1 = \alpha Y_1 + (1-\alpha)\hat{B}_0 = 0.1 \times 99 + 0.9 \times \hat{B}_0
\]

Note that we should not compute an error for the first time bucket, comparing Y1 = 99 against the forecast \( F_{0,1} = \hat{B}_0 \), since the demand observation Y1 itself has been used to initialize the estimate; incidentally, using a fit sample of size 1, the first error would be zero. It would also be tempting to run the smoother starting from time bucket t = 6, the first time bucket in the test sample. However, it is better to “forget” the initialization and the consequent transient phase by running the smoother through the full fit sample, without collecting errors. We proceed using the same mechanism and, after observing Y5 = 139, we update the estimate again:

\[
\hat{B}_5 = \alpha Y_5 + (1-\alpha)\hat{B}_4 = 0.1 \times 139 + 0.9 \times \hat{B}_4 = 119.4640
\]

Now we may start forecasting and calculating errors:

\[
F_{5,1} = \hat{B}_5 = 119.4640, \qquad e_6 = Y_6 - F_{5,1} = 70 - 119.4640 = -49.4640
\]
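As a quick sanity check, the first test-sample error can be reproduced from the figures of the example (a two-line sketch, not the book's code):

```python
# The error for bucket t = 6 compares Y_6 with the forecast F_{5,1} made at
# the end of bucket 5; figures taken from the example.
Y6, F5_1 = 70, 119.4640
e6 = Y6 - F5_1
print(round(e6, 4))   # -49.464
```

Getting this off-by-one pairing right (observation at t against the forecast issued at t − 1) is the most common implementation slip when coding the smoother.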

Please note that we should compare Y6 = 70 with the forecast on the line above in the table, \( F_{5,1} = 119.4640 \). Of course, you could arrange the table in such a way that each demand observation and the related forecast are on the same line. At the end of the test sample, the last forecast is

\[
F_{15,1} = \hat{B}_{15}
\]

but this is not used in assessing performance, since observation Y16 is not available; however, we use it as an estimate \( \hat{\mu}_{16} \) of the expected value of demand, valid for time bucket t = 16. In so doing, we implicitly assume that our estimates are unbiased.

Now we also need an estimate \( \hat{\sigma}_{16} \) of the standard deviation. Since we cannot use the sample standard deviation, we resort to RMSE as an estimator. In order to evaluate RMSE, we square the errors, as shown in the last column of Table 11.4. We obtain

\[
\text{RMSE} = \sqrt{\frac{1}{10} \sum_{t=6}^{15} e_t^2}
\]

Hence, we assume that demand in time bucket t = 16 is normally distributed with the following expected value and standard deviation, respectively:

\[
\hat{\mu}_{16} = F_{15,1}, \qquad \hat{\sigma}_{16} = \text{RMSE}
\]

Now we apply the standard newsvendor model. The service level is calculated from the economics of the problem:

\[
\beta = \frac{\text{margin}}{\text{margin} + \text{cost of unsold item}} = \frac{13 - 10}{(13 - 10) + (10 - 5)} = \frac{3}{8} = 0.375
\]

corresponding to the standard normal quantile \( z_{0.375} = -0.3186 \). The order quantity should be

\[
Q = \hat{\mu}_{16} + z_{0.375} \, \hat{\sigma}_{16} = \hat{\mu}_{16} - 0.3186 \times \hat{\sigma}_{16}
\]

After collecting more observations, the above estimates would be revised; hence, the order quantity is not constant over time.
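The newsvendor step can be reproduced with the standard library alone; in the sketch below, the economics come from the example, while the μ and σ values are placeholders, not the table's figures:

```python
from statistics import NormalDist

# Newsvendor service level from the example's economics:
# unit cost $10, selling price $13, salvage value $5.
c_under = 13 - 10                   # underage cost: margin lost per missed sale
c_over = 10 - 5                     # overage cost: cost minus salvage per leftover
beta = c_under / (c_under + c_over) # critical ratio = 3/8 = 0.375
z = NormalDist().inv_cdf(beta)      # standard normal quantile, about -0.3186
mu, sigma = 120.0, 45.0             # placeholder estimates of mu_16 and sigma_16
Q = mu + z * sigma                  # order quantity: below mu, since beta < 0.5
print(round(beta, 3), round(z, 4))
```

Since β < 0.5, the quantile is negative and the order quantity sits below expected demand: the margin lost on a stockout is smaller than the cost of a leftover unit.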

Apart from initialization, another issue in exponential smoothing is the choice of the smoothing coefficient α. The example suggests one possible approach, based on historical simulation: choose the coefficient by minimizing a selected error measure. However, a possibly better strategy is dynamic adaptation. When dealing with a new product, we said that we could start with a larger α in order to rapidly forget a possibly bad initial guess; then, we could reduce α in order to shift emphasis toward noise filtering. Another element that we should take into consideration is bias. In Fig. 11.5 we have seen that the smoother can be slow to adapt to new market conditions when an abrupt change occurs. The effect is that forecasts become systematically biased, which is detected by a mean error significantly different from zero. A strategy that has been proposed in the literature is to increase α when the mean error is significantly different from zero. However, bias can also be the effect of a wrong demand model. Simple exponential smoothing assumes stationary demand, but a systematic error will result if, for instance, a strong upward or downward trend is present.
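The first strategy, choosing α by minimizing an error measure in a historical simulation, can be sketched as a simple grid search (data, names, and the grid are ours, for illustration):

```python
import math

# Sketch: pick alpha by grid search, minimizing RMSE over a historical
# simulation with fit-sample warm-up (no errors collected on the fit sample).
def rmse_for_alpha(Y, tau, alpha):
    B_hat = sum(Y[:tau]) / tau              # initialize on the fit sample
    for y in Y[:tau]:                       # warm-up: forget the initialization
        B_hat = alpha * y + (1 - alpha) * B_hat
    sq = 0.0
    for y in Y[tau:]:                       # collect squared errors on the test sample
        sq += (y - B_hat) ** 2
        B_hat = alpha * y + (1 - alpha) * B_hat
    return math.sqrt(sq / (len(Y) - tau))

Y = [100, 98, 103, 97, 102, 140, 138, 142, 139, 141]   # level jump at t = 6
best = min((a / 10 for a in range(1, 10)),
           key=lambda a: rmse_for_alpha(Y, 5, a))
print(best)   # 0.9: the largest alpha tracks the jump fastest
```

On this toy history, which contains an abrupt level jump, the grid search picks the largest α, consistent with the remark above that bias calls for faster adaptation; on stationary noisy data, a small α would win instead.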

