Optimal Design and Analysis of the Exponentially Weighted Moving Average Chart for Exponential Data

We study optimal design of the Exponentially Weighted Moving Average (EWMA) chart by a proper choice of the smoothing factor and the initial value (headstart) of the decision statistic. The particular problem addressed is that of quickest detection of an abrupt change in the parameter of a discrete-time exponential model. Both pre- and post-change parameter values are assumed known, but the change-point is not known. For this change-point detection scenario, we examine the performance of the conventional one-sided EWMA chart with respect to two optimality criteria: Pollak's minimax criterion associated with the maximal conditional expected delay to detection and Shiryaev's multi-cyclic setup associated with the stationary expected delay to detection. Using the integral-equations approach, we derive the exact closed-form formulae for all of the required performance measures. Based on these formulae we find the optimal smoothing factor and headstart by solving the corresponding two bivariate constraint optimization problems. Finally, the performance of the optimized EWMA chart is compared against that of the Shiryaev--Roberts--$r$ procedure in the minimax setting, and against that of the original Shiryaev--Roberts procedure in the multi-cyclic setting. The main conclusion is that the EWMA chart, when fully optimized, turns out to be a very competitive procedure, with performance nearly indistinguishable from that of the known-to-be-best Shiryaev--Roberts--$r$ and Shiryaev--Roberts procedures.


Introduction
Quickest change-point detection is concerned with the design and analysis of procedures for on-line detection of possible changes in the characteristics of an observed random process. Specifically, the process is assumed to be monitored through sequentially made observations (e.g., measurements), and should their behavior suggest the process may have statistically changed, the aim is to conclude so within the fewest observations possible, subject to a tolerable level of the risk of false detection. The area finds applications across many branches of science and engineering: industrial quality and process control (see, e.g., Ryan, 2011;Montgomery, 2012;Kenett and Zacks, 1998), biostatistics, economics, seismology, forensics, navigation, cybersecurity (see, e.g., Tartakovsky et al., 2005Tartakovsky et al., , 2006; Tartakovsky et al., 2013), and communication systems-to name a few. A sequential change-point detection procedure is defined as a stopping time adapted to the observed data, at which one stops and declares that apparently a change is in effect.
The design of a detection procedure comes down to constructing an appropriate detection statistic sensitive to a change, which is compared with a threshold. Once the first exceedance of the threshold occurs, the procedure declares a change. To this end, much of statistical inference in change-point detection uses likelihood ratio (LR), and embraces both the maximum LR argument as well as the Bayesian or quasi-Bayesian theory. For example, Page's (1954) Cumulative Sum (CUSUM) "inspection scheme," a popular detection procedure, which is based on the maximum LR argument. On the other hand, the Bayesian theory is manifested, e.g., in the Shiryaev (1961;1963) procedure, its several limiting cases including the Shiryaev-Roberts (SR) procedure (Shiryaev 1961, 1963and Roberts 1966) and the Shiryaev-Roberts-r (SR-r) procedure proposed by Moustakides et al. (2011).
The focus of this paper is on the Exponentially Weighted Moving Average (EWMA) chart, due to Roberts (1959), who called it the Geometric Moving Average chart. This chart is usually applied to "raw" data, i.e., it is not LRbased, and is motivated by considerations of forecasting; see, e.g., Hunter (1986). This not with standing, the EWMA chart can be also successfully used as a change-point detection procedure. The EWMA statistic is defined as Z n = (1 − λ)Z n−1 + λX n , n 1 (i.e., linear in observations), where Z 0 and λ ∈ (0, 1] are design parameters and {X n } n 1 are observations. The initial value Z 0 is often referred to as the headstart (see, e.g., Lucas and Crosier 1982), and λ is the smoothing factor. Note that {Z n } n 0 is a first-order autoregressive process. It has been shown by Lucas and Saccucci (1990) that EWMA can be as powerful as CUSUM for the purposes of detecting a shift in the mean of Gaussian noise. As suggested by Hunter (1986), the EWMA chart can be viewed as a compromise between theX-chart and the CUSUM inspection scheme. Specifically, Hunter observed that, on the one hand, if λ = 1, then the EWMA chart is "memoryless" and uses only the most recent observation, X n , ignoring all prior data. This is no different from theX-chart. On the other hand, if λ is close to 0, then the distribution of "importance weight" across the past observations is roughly uniform: each observation's influence is about λ. This "elephant memory" is akin to the CUSUM chart, since it is equivalent to repetitively applying the SPRT, which in turn is based on accumulating data assigning equal (namely, unit) weight to each observation.
While it is apparent that the performance of the EWMA chart is dependent upon the choice of the headstart and the smoothing factor, there seems to be no clear answer as to how does one set each of these characteristics to "squeeze" as much as possible out of the chart. The main objective of this work is to begin bridging this gap. The current "rule of thumb" is to use small values (such as 0.05 λ 0.25) for small shifts and larger values (between 0.2 and 0.4) for larger shifts. This was demonstrated empirically, e.g., by Roberts (1959), Lucas and Saccucci (1990), Crowder (1987bCrowder ( , 1989Crowder ( , 1987a, and Knoth (2005a). Also, Srivastava and Wu (1997) considered EWMA to detect a change in the drift of Brownian motion and obtained asymptotically optimal value for the smoothing factor, but their EWMA procedure is slightly different from the conventional one considered in our paper. A thorough study of the problem of the optimal design of the EWMA chart has also been carried out by Gan (1998).
More specifically, the bulk of the present paper is centered around the basic minimax change-point detection problem. It assumes the observations are independent with the pre-and post-change distributions fully specified. The particular change-point scenario we focus on is based on exponential data with a surge in the parameter. We employ the conventional one-sided EWMA chart to detect the change. It turns out that for this change-point scenario one can IASSL ISSN -1391-4987 obtain exact closed-form expressions for nearly every performance characteristic of the EWMA chart. This was first accomplished by Novikov (1990); see also Areepong and Sukparungsee (2010); Mititelu et al. (2009Mititelu et al. ( , 2010; Sukparungsee and Novikov (2006; Areepong and Novikov (2008). We extend Novikov's results using a different approach, exploiting renewal integral equations. Specifically, we derive exact formulas for the conditional Average Detection Delay (ADD) for an arbitrary value of the change-point and for the Stationary Average Detection Delay (STADD). To the best of our knowledge, this is the first time these formulae are obtained. Our goal is to use these formulae to perform an optimization of EWMA with respect to both the headstart and the smoothing factor. A similar analysis was previously carried out, e.g., byČisar andČisar (2011); Sukparungsee (2012); Knoth (2005b). However, they did not consider optimizing simultaneously with respect to the headstart and the smoothing factor. Nor did they consider optimizing with respect to the STADD.
The rest of the paper is organized as follows. In Section 2 we formally state the problem and introduce all of the performance measures and procedures of interest. The exact formulae for EWMA's operating characteristics are derived in Section 3. Using these formulae, in Section 4 we perform a case study with two objectives. First, we apply the obtained formulae to optimize the EWMA chart simultaneously with respect to the smoothing factor and the headstart. This is done for the minimax problem and for the multi-cyclic problem separately. Secondly, once the EWMA chart is fully optimized, its performance is compared against that of the Shiryaev-Roberts procedure in the multi-cyclic setting and against that of the Shiryaev-Roberts-r procedure in the minimax setting. These two procedures are selected to serve as benchmarks for a reason: the Shiryaev-Roberts procedure is exactly optimal in the multi-cyclic sense, and the Shiryaev-Roberts-r is nearly optimal in the minimax sense. The popular CUSUM scheme lacks either of these properties and for this reason is not considered.

Optimality Criteria and Detection Procedures
As indicated above, the focus of this work is on two formulations of the changepoint detection problem: the minimax formulation and that related to multicyclic disorder detection in a stationary regime. In this section, we formulate both problems and introduce the relevant detection procedures.
Suppose one can observe a series of independent random observations {X n } n 1 . Statistically, the series is such that for some time index ν, which is referred to as the change-point, X 1 , X 2 , . . . , X ν each posses a known pdf f (x), and X ν+1 , X ν+2 , . . . are each drawn from a population with a pdf g(x) ≡ f (x), also known. The change-point is the unknown serial number of the last f (x)distributed observation; therefore, if ν = ∞, then the entire series {X n } n 1 is sampled from distribution with the pdf f (x), and if ν = 0, then all observations are g(x)-distributed. Our objective is to decide that the change is in effect, raise an alarm and respond. The challenge is to make this decision "as soon as possible" past and "no earlier" than a certain prescribed limit prior to the true change-point.
Statistically, the problem is to sequentially differentiate between the hypotheses H k : ν = k 0, i.e., that the change occurs at time moment ν = k, 0 k < ∞, and H ∞ : ν = ∞, i.e., that the change never takes place. Once the k-th observation is made, our decision options are either to accept H k , and thus declare that the change has occurred, or to reject H k and continue observing data.
To test H k against H ∞ one first constructs the corresponding likelihood ratio (LR). Let X 1:n = (X 1 , X 2 , . . . , X n ) be the vector of the first n 1 observations. The joint densities of X 1:n under H k and H ∞ are f (X j ) and where it is understood that p(X 1:n |H ∞ ) = p(X 1:n |H k ) if k n. For the LR, Λ k:n = p(X 1:n |H k )/p(X 1:n |H ∞ ), one then obtains and we note that Λ k:n = 1 if k n. We also assume that Λ 0 = 1.
Next, to decide which of the hypotheses H k or H ∞ is true, the sequence {Λ k:n } 1 k n is turned into a detection statistic. To this end, one can either use the maximum likelihood principle and the detection statistic W n = max 1 k n Λ k:n (maximal LR) or the generalized Bayesian approach (limit of the Bayesian approach) and the quasi-Bayesian detection statistic R n = n k=1 Λ k:n (average LR with respect to an improper uniform prior distribution of the change-point). See Polunchenko and Tartakovsky  Once the detection statistic is chosen, it is supplied to an appropriate sequential detection procedure. To this end, the above two detection statistics -{W n } n 1 and {R n } n 1 -give raise to a myriad of detection procedures. The first detection procedure we will consider is the the Shiryaev-Roberts (SR) procedure, which is defined as the stopping time where A > 0 is a preselected detection threshold, and R n = n k=1 Λ k:n is the SR statistic, which can also be computed recursively as Note that the SR statistic starts from zero. Hereafter in the definitions of stopping times we shall assume inf{∅} = ∞.
Another procedure we will also be interested in is the Shiryaev-Roberts-r (SRr) procedure. It is a derivative of the SR procedure proposed by Moustakides et al. (2011) which starts at a fixed but specially designed point R 0 = r, 0 r < A. The stopping time with this new deterministic initialization is defined as where A > 0 is again a preset detection threshold and the SR-r detection statistic R r n is given by the recursion Note that for r = 0 this is nothing but the conventional SR procedure.
Unlike the SR procedure or the SR-r procedure, the EWMA chart proposed by Roberts (1959) usually does not use the LR, and is applied to raw data. Certainly this statistic can also be built upon the log-likelihood ratio, replacing X n in (2.1) with log[g(X n )/f (X n )], as it is done in certain publications. Specifically, the EWMA detection statistic is defined as where the smoothing factor λ ∈ (0, 1] and the initialization point z are given. The starting point Z λ,z 0 = z is usually set to the observations' pre-change mean. The corresponding stopping time can be defined in a number of ways. The most popular way is the two-sided EWMA, which can be effectively used for detecting changes in both directions. We will be interested in the one-sided EWMA defined as where A > 0 is a preset detection threshold. We now proceed to review the optimality criteria whereby one decides which procedure to use. We first set down some additional notation. Let P ν (·) be the probability measure generated by the observations {X n } n 1 when the changepoint is ν 0, and E ν [ · ] be the corresponding expectation. Likewise, let P ∞ (·) and E ∞ [ · ] denote the same under the no-change scenario, i.e., when ν = ∞.
Consider first the minimax formulation proposed by Pollak (1985). The risk of raising a false alarm is measured by the average run length (ARL) to false alarm ARL(T ) = E ∞ [T ] and the delay to detection by the maximal average delay to detection where o(1) → 0 as γ → ∞. Note also that the randomized version of the SR procedure initialized from the quasi-stationary distribution of R n , introduced by Pollak (1985), is also third-order asymptotically optimal. CUSUM chart, on the other hand, is only second-order asymptotically (as γ → ∞) optimal. Furthermore, Polunchenko and Tartakovsky (2010) and Tartakovsky and Polunchenko (2010) showed the SR-r procedure to be exactly minimax in two particular scenarios. Hence, it is the SR-r procedure (and not the CUSUM chart) that one is to use as a benchmark to compare the EWMA chart against in the minimax setting (2.2).
Note that Pollak's version of the minimax formulation assumes the detection procedure is applied only once; the result is either a false alarm, or a correct (though delayed) detection, and no data sampling is done past the stopping point. This is known as the single-run paradigm. Yet another formulation emerges if one considers applying the same procedure in cycles, e.g., starting anew after every false alarm. This is the multi-cyclic formulation.
Specifically, the idea is to assume that in exchange for the assurance that the change will be detected with maximal speed, the experimenter agrees to go through a "storm" of false alarms along the way. The false alarms are ensued from repeatedly applying the same detection rule, starting from scratch after each false alarm. Put otherwise, suppose the change-point, ν, is substantially larger than the desired level of the ARL to false alarm, γ > 1. That is, the change occurs in a distant future and it is preceded by a stationary flow of false alarms; the ARL to false alarm in this context is the mean time between consecutive false alarms. As argued by Pollak and Tartakovsky  Formally, let T 1 , T 2 , . . . denote sequential independent applications of the same stopping time T , and let T j = T 1 + T 2 + · · · + T j be the time of the j-th alarm, j 1. Let I ν = min{j 1 : T j > ν} so that T Iν is the point of detection of the true change, which occurs at time instant ν after I ν − 1 false alarms have been raised. Consider STADD(T ) = lim ν→∞ E ν [T Iν − ν], i.e., the limiting value of the ADD that we will refer to as the stationary ADD (STADD); then the multi-cyclic optimization problem consists in finding T opt ∈ ∆(γ) such that

2.3)
For the continuous-time Brownian motion model, this formulation was first proposed by Shiryaev (1961Shiryaev ( , 1963 who showed that the SR procedure is strictly optimal. For the discrete-time iid model, optimality of the SR procedure in this setting was established by Pollak and Tartakovsky (2009). Specifically, they showed that if the threshold A = A γ in the SR procedure S Aγ is so selected that ARL(S Aγ ) = γ, then STADD(T ) for every γ > 1.
We also note that the CUSUM chart does not possess this optimality property. It therefore makes sense to compare the EWMA chart against the SR procedure with respect to STADD(T ).

EWMA Performance Evaluation and Optimization
Consider a model where the observations are independent and exponentially distributed throughout the entire period of surveillance, and assume that the pre-and post-change mean values are 1 and 1 + θ, respectively, where θ > 0 (known). This is the exponential iid model we intend to focus on in this work. Recall that our objective is to optimize and compare the performance of the EWMA chart against the performance of the SR procedure in the multi-cyclic setting and against the performance of the SR-r procedure in the minimax setting. To achieve this goal, we need to be able to compute the required performance measures for each of the procedures of interest for the change-point scenario in question. To this end, we will rely on the previous work of Moustakides et al. (2011) and its recent extension due to Polunchenko et al. (2013a, 2014a,b), who proposed a generic numerical framework allowing one to compute a large range of performance indices for a broad class of detection procedures with Markovian detection statistics. This condition is clearly fulfilled by all of the procedures of interest. We will therefore proceed to employing the numerical framework by setting up the corresponding integral equations and relations.
We first set down some notation. Following Moustakides et al. (2011), let T x A = inf{n 1 : V x n A} be a generic detection procedure, whose detection statistic, {V x n } n 0 , admits the recursion with V x 0 = x, a fixed value (headstart), and Ψ(x) a sufficiently smooth function such that Ψ(x) 0 for x 0. Moustakides et al. (2011), we have the following integral equations and relations: and from the above equations we obtain that We will now narrow down the above equations to the EWMA chart assuming the exponential iid model introduced at the beginning of this section. To this end, we first remark that the integral equations presented above are Fredholm equations of the second kind. Usually, such equations do not allow for an analytical solution and should be solved numerically. However, as we will show shortly, in the case of EWMA and exponential iid model, one can compute explicitly every single performance measure we are after.
Our goal is twofold. First, we would like to optimize EWMA over the smoothing parameter λ and the headstart z with respect to supremum average detection delay (2.2) and see how its performance compares to that of the SR-r procedure. Second, we would like to optimize the EWMA procedure with respect to stationary average detection delay (2.3) and compare it to the SR procedure. As we will see, sloppily chosen λ and z may result in considerable performance degradation.
To optimize performance in the minimax sense, choose subject to ARL = γ for a given γ > 1. To optimize performance in the stationary sense, choose subject to ARL = γ for a given γ > 1.
Throughout the rest of the paper we will write α for 1−λ. We will also be using the q-analogues notation, namely a special case of the q-Pochhammer symbol, q-brackets and q-factorials:

Recovering ARL
Let (z) denote the average run length to false alarm, ARL, of the EWMA procedure initialized from point z. Then (3.1) becomes Looking for a solution in the form Finally,

Recovering SADD
Since SADD = sup 0 k<∞ ADD k , we first start with ADD 0 , and (3.2) becomes Looking for a solution in the form one gets (3.10) In order to obtain ADD k for k 1, recall that ADD k = δ k /ρ k (3.6), so we have to compute δ k and ρ k . From Then for k 1, where the constants {c k } k 1 can be found from solving the following linear system for c = (c 1 , c 2 , · · · , c k ) : where M is a lower-triangular k-by-k Toeplitz matrix with coefficients M i,i = 0, and To get {δ k } k 1 we will look for solutions in the form

A Case Study
We now employ the formulae obtained in the preceding section to compute the performance of the EWMA chart as a function of the smoothing factor λ and the headstart z. The aim is to find the optimal pair (λ, z), i.e., the values of λ and z for which the performance of the chart is optimal. Specifically, suppose first that the ARL to false alarm is fixed at a given level, and let us see how EWMA's performance depends on both the smoothing parameter λ and the headstart z. As can be seen in Figure 4.1, a proper choice of both λ and z may significantly improve the performance of the EWMA procedure. We now make   two observations based on the numerical results presented in Figure 4.1. First, regardless of the detection delay measure to be minimized, the impact of the smoothing factor on EWMA's performance is substantial. However, the impact of the headstart, z, on EWMA's performance varies depending on the detection delay measure to be minimized. Specifically, one would expect SADD to be more sensitive to the choice of z, and STADD to be less sensitive. This is in fact confirmed by Comparison results of EWMA optimized in λ, but not in z, to optimal procedures are summarized in Tables 4.1 and 4.2. Recall that SR-r is asymptotically third-order minimax and SR is exactly optimal in the stationary setting. Tables show that optimized EWMA has almost the same performance as SR-r in the minimax setting and as SR w.r.t. STADD. Moreover, evidence suggests that for the exponential scenario fixing the headstart at z = 1 does not lead to significant drop in performance. On the other hand, optimizing over λ is crucial. How the performance is affected by λ w.r.t. SADD when the headstart is fixed at z = 1 is shown in Figure 4  Next, we consider the performance as a function of ARL. Since we have two other parameters to optimize over, we will present the case where the headstart is fixed at z = 1 and optimize STADD w.r.t. λ. Figure 4.3 suggests that λ opt → 0 as γ → ∞. Lastly, we consider a case where the post-change parameter is unknown, but lies within a certain range, say θ 0.5. Choosing an optimal λ for the worstcase scenario, namely for the smallest change, one might be interested in how it affects performance should the post-change parameter be different. For calculations we take θ = 0.5, 1.0, 2.0. As a benchmark, we take the SR procedure    tuned to the correct θ, which is exactly optimal in the stationary sense and inspect the loss in performance for a) EWMA with λ chosen optimally for θ = 0.5, and b) SR procedure designed with θ = 0.5 in mind. The results are summarized in Figure 4.4. It can be seen that in such a setup EWMA has the upper hand. Indeed, it is less sensitive to the parameter misspecification. γ STADD(λ*) / STADD opt λ* = optimal for θ = 0.5, actual θ = 0.5 λ* = optimal for θ = 0.5, actual θ = 1 λ* = optimal for θ = 0.5, actual θ = 2 (a) Performance loss ratio for EWMA.