Data-efficient quickest change detection in distributed and multi-channel systems

A distributed or multi-channel system consisting of multiple sensors is considered. At each sensor a sequence of observations is taken, and at each time step, a summary of the available information is sent to a central decision maker, called the fusion center. At some point in time, the distribution of observations at an unknown subset of the sensor nodes changes. The objective is to detect this change as quickly as possible, subject to constraints on the false alarm rate, the cost of observations taken at the sensors, and the cost of communication between the sensors and the fusion center. Minimax formulations are proposed for this problem. An algorithm called DE-Censor-Sum is proposed, and is shown to be asymptotically optimal for the proposed formulations, for each possible post-change scenario, as the false alarm rate goes to zero. It is also shown, via numerical studies, that the DE-Censor-Sum algorithm performs significantly better than the approach of fractional sampling, where the cost constraints are met based on the outcome of a sequence of biased coin tosses, independent of the observation process.


Introduction
The problem of detecting an abrupt change in the statistical properties of a phenomenon under observation is encountered in many applications. Examples include quality control, infrastructure, environment or habitat monitoring, and spectrum sensing in cognitive radios. A mathematical formulation in statistics that most accurately models the above detection problem is the problem of quickest change detection (QCD).
In the classical QCD problem, a decision maker observes a sequence of random variables {X_n}. At some point in time γ, called the change point, the distribution of {X_n} changes. The objective is to find a stopping time τ on the sequence {X_n} so as to detect this change in distribution as quickly as possible. Mathematically, the goal is to minimize some metric on the detection delay subject to a constraint on a suitable metric on the false alarms.
The general formulation of the QCD problem allows for arbitrary dependence between the observations, and even allows for observations to be observed in continuous time. However, in this paper we focus on the i.i.d. model. In the i.i.d. model, the observations {X n } are independent, conditioned on the change point γ, and identically distributed before and after γ. See Poor and Hadjiliadis (2009), Veeravalli and Banerjee (2013) and Tartakovsky et al. (2014) for a general background on QCD, including references and discussions of applications in engineering and sciences. In Section 2, we provide a brief overview of this classical problem that is relevant to our work.
In many applications, for example surveillance using sensor networks and statistical process control, changes are infrequent, and there is a cost associated with taking observations or acquiring data. However, in the classical problem formulations, although there is a penalty on acquiring data after the change through the metric on delay, there is no penalty on the cost of observations taken before the change point. This motivates the study of a data-efficient version of the classical QCD problem. In Section 3, we discuss our recent results on data-efficient quickest change detection (DE-QCD). In DE-QCD, the objective of the decision maker is to minimize a metric on the detection delay, subject to constraints on a metric on the false alarm and a metric on the cost of observations used before the change point.
The classical QCD problem has been extended in many ways, of which the most practically relevant extension has been to QCD in distributed sensor networks. In Section 4, we discuss ways in which the DE-QCD results of Section 3 can be extended to some sensor network models.

Classical Quickest Change Detection: An Overview
In this section we provide a brief overview of the results from the classical QCD literature. A more detailed introduction can be found for example in Veeravalli and Banerjee (2013). We will focus on the i.i.d. model, in which the random variables {X n } are i.i.d. with probability density function (p.d.f.) f 0 before the change point γ, and i.i.d. with p.d.f. f 1 after γ. If the distribution of the change point γ is known, then γ is modelled as a random variable Γ, and the problem is studied in a Bayesian setting. If the distribution of γ is not known, then the change point is modelled as an unknown constant and the QCD problem is studied in minimax settings. Let P n (correspondingly E n ) be the probability measure (correspondingly expectation) when the change occurs at time n. Then, P ∞ and E ∞ stand for the probability measure and expectation when the change does not occur.

Bayesian Quickest Change Detection
In the Bayesian setting studied by Shiryaev (1963), the change point γ is modeled as a geometric random variable Γ with parameter ρ, i.e., for 0 < ρ < 1,

P(Γ = n) = ρ (1 − ρ)^{n−1} I_{{n ≥ 1}},   (2.1)

where I_F represents the indicator of the event F.
Let I_n = (X_1, …, X_n) represent the information at time n. For time n ≥ 1, based on the information vector I_n, a decision is made whether to stop and declare a change or to continue taking observations. Let τ be a stopping time on the information sequence {I_n}, that is, I_{{τ = n}} is a measurable function of I_n. Define the average detection delay (ADD) and the probability of false alarm (PFA) as

ADD(τ) = E[(τ − Γ)^+]  and  PFA(τ) = P(τ < Γ).

Then the Bayesian quickest change detection problem as formulated by Shiryaev is as follows.

Problem 2.1: minimize ADD(τ) over all stopping times τ, subject to PFA(τ) ≤ α.

Here, α ≤ 1 is the given constraint on the PFA.
An algorithm that is optimal for the above formulation is provided below. Define

p_n = P(Γ ≤ n | I_n),

the a posteriori probability that the change has already happened given the observations. The probability p_n can be computed recursively:

p_{n+1} = p̃_n L(X_{n+1}) / (p̃_n L(X_{n+1}) + 1 − p̃_n),  where p̃_n = p_n + (1 − p_n)ρ and L(X) = f_1(X)/f_0(X).   (2.4)

The following theorem is proved in Shiryaev (1963).

Theorem 2.1 (Shiryaev (1963)). For a given PFA constraint of α, it is optimal to start with p_0 = 0, for n ≥ 0 update p_n using (2.4), and stop the first time p_n is above the threshold A_α:

τ_S = inf{n ≥ 1 : p_n > A_α},   (2.5)

provided A_α ∈ (0, 1) can be chosen such that PFA(τ_S) = α.

Thus, the a posteriori probability, or belief, is computed over time, and a change is declared the first time this belief is larger than a threshold. In the following we refer to the algorithm defined in (2.5) as the Shiryaev algorithm.
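As an illustration (not from the original papers), the recursion (2.4) and the stopping rule (2.5) can be sketched in a few lines of Python. The unit-variance Gaussian densities, the prior ρ = 0.01, and the threshold A = 0.99 below are hypothetical choices for the example.

```python
import math
import random

def gauss_pdf(mu):
    """Unit-variance Gaussian density with mean mu (illustrative choice)."""
    return lambda x: math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def shiryaev_update(p, x, rho, f0, f1):
    """One step of recursion (2.4): fold in the geometric prior, then apply
    Bayes' rule with the likelihood ratio L(x) = f1(x)/f0(x)."""
    pt = p + (1 - p) * rho
    L = f1(x) / f0(x)
    return pt * L / (pt * L + (1 - pt))

def shiryaev_stop(xs, rho, f0, f1, A):
    """Shiryaev algorithm (2.5): start at p_0 = 0 and declare a change the
    first time the posterior p_n exceeds the threshold A."""
    p = 0.0
    for n, x in enumerate(xs, start=1):
        p = shiryaev_update(p, x, rho, f0, f1)
        if p > A:
            return n          # stopping time tau_S
    return None               # no alarm within the sample
```

With f_0 = N(0, 1) and f_1 = N(1, 1), a change injected at time 50 in a simulated stream is typically detected shortly after, since the posterior odds grow geometrically after the change.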
In Tartakovsky and Veeravalli (2005) an asymptotic analysis of the Shiryaev algorithm is provided. A simple upper bound on the PFA of the Shiryaev algorithm with threshold A is

PFA(τ_S) ≤ 1 − A.   (2.6)

Thus, setting A = 1 − α ensures that PFA(τ_S) ≤ α. However, this bound can be quite loose, and hence a more accurate estimate of the PFA is desirable. Also, since the Shiryaev algorithm is optimal, an estimate of its delay provides a useful lower bound on the ADD of any stopping time.
Let R(x) be the asymptotic distribution of the overshoot when the random walk

Σ_{k=1}^{n} log L(X_k) + n |log(1 − ρ)|

crosses a large positive boundary; see Siegmund (1985). Define

ζ = ∫_0^∞ e^{−x} dR(x).   (2.7)

Also, let d = D(f_1‖f_0) + |log(1 − ρ)|, where D(f_1‖f_0) is the Kullback-Leibler divergence between f_1 and f_0. Note that ζ ≤ 1. The following theorem is proved in Tartakovsky and Veeravalli (2005).
Theorem 2.2 (Tartakovsky and Veeravalli (2005)). If 0 < D(f_1‖f_0) < ∞, and further if log L(X) is non-arithmetic (see Siegmund (1985)), then as A → 1,

PFA(τ_S) = ζ (1 − A)(1 + o(1)),

and, with A chosen to meet the PFA constraint α, as α → 0,

ADD(τ_S) = (|log α| / d)(1 + o(1)).

The above theorem implicitly includes an asymptotic lower bound on the performance of any stopping time τ satisfying a PFA constraint of α:

ADD(τ) ≥ (|log α| / d)(1 + o(1)) as α → 0.   (2.11)

We note that, due to (2.6), the non-arithmetic assumption is not needed for the above lower bound to be valid.

Minimax Quickest Change Detection
In this section we review the classical minimax formulations for quickest change detection. Again, we focus on the i.i.d. model, where the random variables {X n } are i.i.d. with probability density function (p.d.f.) f 0 before the change point γ, and i.i.d. with p.d.f. f 1 after γ.
In the absence of knowledge of the distribution of the change point, ADD and PFA are not well defined. Thus, new performance metrics are needed for the non-Bayesian setting. We note that the Shiryaev algorithm can still be used to detect a change even in a non-Bayesian setting, by selecting an arbitrary value of ρ and using the Shiryaev recursion for p_n. However, p_n will no longer be a probability, and the optimality and performance analysis results of the previous section will no longer be valid. See Veeravalli and Banerjee (2013) for a discussion of the Shiryaev-Roberts family of algorithms, which have strong asymptotic optimality properties for the minimax formulation in Pollak (1985), and are obtained from the Shiryaev algorithm by setting the geometric parameter ρ to zero.
Without a prior on the change point, a reasonable measure of false alarms is the mean time to false alarm, or its reciprocal, the false alarm rate (FAR):

FAR(τ) = 1 / E_∞[τ].   (2.12)

Finding a uniformly powerful test that minimizes the delay over all possible values of γ subject to a FAR constraint is generally not possible. Therefore it is more appropriate to study the quickest change detection problem in a minimax setting in this case. There are two popular minimax problem formulations in the literature, one due to Lorden (1971) and the other due to Pollak (1985).
In Lorden's formulation, the delay metric used is the supremum of the average delay conditioned on the worst possible realization of the past observations. The metric used here differs from the one proposed in Lorden (1971) in that we take the delay to be zero when τ = γ:

WADD(τ) = sup_n ess sup E_n[(τ − n)^+ | X_1, …, X_{n−1}].   (2.13)

The minimax formulation proposed by Lorden (1971) is:

Problem 2.2: minimize WADD(τ) over all stopping times τ, subject to FAR(τ) ≤ α.

Here, 0 ≤ α ≤ 1 is the given constraint.
A less pessimistic way to measure the delay was suggested by Pollak (1985):

CADD(τ) = sup_n E_n[τ − n | τ ≥ n],   (2.14)

for all stopping times τ for which the expectation is well defined. Note that

CADD(τ) ≤ WADD(τ).   (2.15)

The minimax formulation proposed by Pollak is:

Problem 2.3: minimize CADD(τ) over all stopping times τ, subject to FAR(τ) ≤ α.

Here, 0 ≤ α ≤ 1 is the given constraint.
Theorem 2.3 (Lorden (1971)). As α → 0, for any stopping time τ satisfying FAR(τ) ≤ α,

WADD(τ) ≥ (|log α| / D(f_1‖f_0))(1 + o(1)).   (2.17)

That is, |log α| / D(f_1‖f_0) is an asymptotic lower bound for the WADD, as shown by Lorden (1971).
It is shown in Lorden (1971) that the CuSum algorithm, proposed by Page (1954), achieves this lower bound. A slightly different version of the CuSum algorithm is in fact exactly optimal for Lorden's formulation; see Moustakides (1986). The CuSum algorithm is described as follows. The CuSum statistic starts at C_0 = 0 and is updated using the recursion

C_n = (C_{n−1} + log L(X_n))^+,   (2.18)

where (x)^+ = max{x, 0}, and a change is declared at

τ_C = inf{n ≥ 1 : C_n ≥ A}.

As observations are taken, the CuSum statistic is incremented by the log-likelihood ratio of the observations; if the statistic goes below zero, it is reset to zero. After the change, due to the positive mean of log L(X), the statistic C_n grows to ∞. A change is declared the first time the CuSum statistic is above the threshold A.
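The reflected random walk just described can be sketched as follows (an illustrative sketch, not code from the original papers). For f_0 = N(0, 1) and f_1 = N(θ, 1), the log-likelihood ratio is log L(x) = θx − θ²/2; the function below takes log_L as an argument so any model can be plugged in.

```python
def cusum(xs, log_L, A):
    """CuSum recursion (2.18): C_0 = 0, C_n = max(C_{n-1} + log L(x_n), 0);
    declare a change the first time C_n >= A."""
    C = 0.0
    for n, x in enumerate(xs, start=1):
        C = max(C + log_L(x), 0.0)   # increment by the LLR, reset at zero
        if C >= A:
            return n                 # stopping time tau_C
    return None
```

For example, with θ = 1 (so log L(x) = x − 0.5) and a constant stream of ones, each step adds 0.5 to the statistic, so `cusum([1.0] * 10, lambda x: x - 0.5, 2.0)` stops at time 4.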
It can be shown that (see Veeravalli and Banerjee (2013))

CADD(τ_C) = WADD(τ_C).

The proof of the following theorem can be found in Lorden (1971) and Lai (1998).
Theorem 2.4 (Lorden (1971), Lai (1998)). If 0 < D(f_1‖f_0) < ∞, then FAR(τ_C) ≤ e^{−A}, and with A = |log α|, as α → 0,

CADD(τ_C) = WADD(τ_C) = (|log α| / D(f_1‖f_0))(1 + o(1)).

Thus, the CuSum algorithm is asymptotically optimal for both Problem 2.2 and Problem 2.3, because it achieves the lower bound of Theorem 2.3.

Data-Efficient Quickest Change Detection
As discussed in the introduction, in many practical applications it is of interest to control the cost of observations used before the change point. In Banerjee and Veeravalli (2012a) and Banerjee and Veeravalli (2013a), we studied Bayesian and minimax versions of the DE-QCD problem, respectively. Recall that in a DE-QCD problem, the objective is to minimize a metric on the delay, subject to constraints on metrics on the false alarms and the cost of observations used before the change point.
In Section 3.1, we consider the DE-QCD problem in the Bayesian setting. We first propose a metric for data-efficiency in the Bayesian setting that captures the average number of observations used before the change point. We then consider the Bayesian problem of Shiryaev, discussed in Section 2.1, with an additional constraint on the average number of observations used before the change point. The search is restricted to those policies that employ on-off observation control to limit the observations used before the change. We then discuss a two-threshold extension of the Shiryaev algorithm, which we call the DE-Shiryaev algorithm. In Banerjee and Veeravalli (2012a), we showed that the DE-Shiryaev algorithm is asymptotically optimal for the proposed data-efficient Bayesian formulation.
In Section 3.2, we consider the DE-QCD problem in minimax settings. We first propose a metric for data-efficiency in minimax settings, which captures the fraction of time observations are taken before the change point. We then extend the formulations of Lorden and Pollak by putting an additional constraint on the fraction of time observations are used before the change point. We then discuss an extension of the CuSum algorithm for DE-QCD, which we call the DE-CuSum algorithm. In Banerjee and Veeravalli (2013a), we showed that the DE-CuSum algorithm is asymptotically optimal for the proposed data-efficient minimax formulations.
The asymptotic optimality of the DE-Shiryaev and the DE-CuSum algorithms is established by showing that their performance achieves the lower bound of (2.11) and (2.17), respectively. Asymptotic optimality follows because the lower bound is valid for any stopping time, whether or not observation control is employed.
In both of the classical algorithms, the Shiryaev algorithm and the CuSum algorithm, a sequence of statistics is computed using the likelihood ratio of the observations over time, and a change is declared the first time the sequence of statistics crosses a threshold. A large value of the statistic signifies strong evidence that the change has already happened.
In both of the DE-QCD algorithms, the DE-Shiryaev algorithm and the DE-CuSum algorithm, there are two thresholds A and B, with B < A. A sequence of statistics is computed over time, and a change is declared if the sequence of computed statistics crosses the threshold A. However, if the computed statistic at time n is below A, then an observation is taken at time n + 1 only if the statistic is above the threshold B. Essentially, a statistic below the threshold B can be viewed as strong evidence against the change, making it an appropriate time to skip an observation. When an observation is skipped, the statistic has to be updated without using the observation; how this is done is discussed in the descriptions of the algorithms in the subsections below.

Data-Efficient Bayesian Quickest Change Detection
We consider the i.i.d. model: {X_n} is a sequence of random variables that are i.i.d. with p.d.f. f_0 before the change point Γ, and i.i.d. with p.d.f. f_1 after Γ. We assume that Γ is geometrically distributed with parameter ρ; see (2.1).
For data-efficient quickest change detection we consider the following class of control policies. At each time n, n ≥ 0, a decision is made as to whether to take or skip the observation at time n + 1. Let M n be the indicator random variable such that M n = 1 if X n is used for decision making, and M n = 0 otherwise.
Thus, M_{n+1} is a function of the information available at time n, i.e.,

M_{n+1} = φ_n(I_n),

where φ_n is the control law at time n, and

I_n = [M_1, …, M_n, X_1^{(M_1)}, …, X_n^{(M_n)}]

represents the information at time n. Here, X_i^{(M_i)} = X_i if M_i = 1; otherwise X_i is absent from the information vector I_n. Also, I_0 is the empty set.
For time n ≥ 1, based on the information vector I n , a decision is made whether to stop and declare change or to continue taking observations. Let τ be a stopping time on the information sequence {I n }, that is I {τ =n} is a measurable function of I n . A policy for data-efficient quickest change detection is Ψ = {τ, φ 0 , . . . , φ τ −1 }.
To capture the cost of observations used before the change point, we consider the average number of observations (ANO) used before the change point:

ANO(Ψ) = E[ Σ_{n=1}^{min(τ, Γ−1)} M_n ].

The data-efficient extension of the classical Bayesian problem of Shiryaev (Section 2.1) is:

Problem 3.1: minimize ADD(Ψ) over all policies Ψ, subject to PFA(Ψ) ≤ α and ANO(Ψ) ≤ β.

Here, α and β, with 0 ≤ α ≤ 1 and 0 ≤ β, are given constraints.
When β ≥ E[Γ] − 1, Problem 3.1 reduces to the classical Bayesian quickest change detection problem, Problem 2.1. Also note that when α is small, for the Shiryaev algorithm ANO ≈ E[Γ] − 1. Hence, the Shiryaev algorithm cannot be a solution to this problem if β is small. We now describe the DE-Shiryaev algorithm, which we studied in Banerjee and Veeravalli (2012a), and which is asymptotically optimal for the above formulation, for each fixed β, as α → 0.
The DE-Shiryaev algorithm starts with p_0 = 0 and uses two thresholds B and A with 0 ≤ B < A < 1. The probability p_n is updated using the following recursions: if p_n < B, the observation at time n + 1 is skipped (M_{n+1} = 0) and

p_{n+1} = p_n + (1 − p_n)ρ;

if p_n ≥ B, the observation at time n + 1 is taken (M_{n+1} = 1) and p_{n+1} is computed from p_n and X_{n+1} using the Shiryaev recursion (2.4). A change is declared when p_n > A.

With B = 0 the DE-Shiryaev algorithm reduces to the Shiryaev algorithm. When B > 0, the statistic evolves in a similar manner as the Shiryaev statistic as long as it is above B. Thus, a change is declared when p_n > A. However, when p_n goes below B, p_n is updated using only the prior ρ on the change point (p_n increases monotonically as a result), and observations are skipped as long as p_n is below B. Since p_0 = 0 < B, a few initial observations are skipped even before the observation process begins; let t(B) denote the first time p_n crosses B from below. Except for these initial skipped observations, the number of consecutive observations skipped at any time is a function of the undershoot of the statistic p_n (which is a function of the likelihood ratio of the observations) when it goes below B.

At and beyond t(B), whenever p_n crosses B from below, it does so with an overshoot that is bounded by ρ. This is because each prior-only update increases p_n by (1 − p_n)ρ ≤ ρ. For small values of ρ, this overshoot is essentially zero, and the evolution of p_n beyond the crossing is roughly statistically independent of its past evolution. Thus, beyond t(B), the evolution of p_n can be seen as a sequence of statistically independent two-sided tests, each two-sided test being a test for sequential hypothesis testing between "H_0 = pre-change" and "H_1 = post-change". If the decision in the two-sided test is H_0, then observations are skipped depending on the likelihood ratio of the observations (the undershoot), and the two-sided test is repeated on the observations taken beyond the skipped observations. The change is declared the first time the decision in a two-sided test is H_1.
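The on-off observation control of the DE-Shiryaev algorithm can be sketched in Python (an illustrative sketch, not code from the original papers; the unit-variance Gaussian densities and the parameter values in the example are hypothetical choices).

```python
import math
import random

def gauss_pdf(mu):
    """Unit-variance Gaussian density with mean mu (illustrative choice)."""
    return lambda x: math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def de_shiryaev(xs, rho, f0, f1, A, B):
    """DE-Shiryaev sketch: below B the next observation is skipped and p_n is
    updated with the prior alone; at or above B the full Shiryaev update (2.4)
    is applied. Declares a change when p_n > A.
    Returns (stopping time, number of observations actually used)."""
    p, used = 0.0, 0
    for n, x in enumerate(xs, start=1):
        if p < B:
            p = p + (1 - p) * rho            # skip: deterministic prior-only update
        else:
            used += 1                        # take the observation
            pt = p + (1 - p) * rho
            L = f1(x) / f0(x)
            p = pt * L / (pt * L + (1 - pt))
        if p > A:
            return n, used
    return None, used
```

Since p_0 = 0 < B, the first observations are skipped while p_n climbs deterministically toward B, so the number of observations used is strictly smaller than the stopping time.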
The following theorem is proved in Banerjee and Veeravalli (2012a).

Theorem 3.1 (Banerjee and Veeravalli (2012a)). For any fixed B, setting A = 1 − α ensures PFA ≤ α. If further log L(X) is non-arithmetic, then for a fixed β, as α → 0,

PFA = ζ (1 − A)(1 + o(1))  and  ADD = (|log α| / (D(f_1‖f_0) + |log(1 − ρ)|))(1 + o(1)),

where ζ is as defined in (2.7). Thus, the DE-Shiryaev algorithm achieves the lower bound (2.11) and is asymptotically optimal.

Data-Efficient Minimax Quickest Change Detection
In this section we discuss the extension of the minimax formulations of Lorden and Pollak for DE-QCD that we studied in Banerjee and Veeravalli (2013a). We first describe the new metric introduced by us in Banerjee and Veeravalli (2013a) to capture the cost of observations used before the change point in a non-Bayesian setting. With M_n, I_n, τ, and Ψ as defined earlier in Section 3.1, the Pre-change Duty Cycle (PDC) metric is defined as

PDC(Ψ) = limsup_n (1/n) E_n[ Σ_{k=1}^{n−1} M_k ].

Clearly, PDC ≤ 1. Thus, the metric PDC captures the average fraction of time observations are taken before the change. If all the observations are used (M_k = 1 for all k), the PDC achieved is equal to 1. For a policy in which every second observation is skipped, the PDC is exactly equal to 0.5.
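As a small illustration (not from the original papers), the finite-horizon fraction below is a proxy for the PDC of a fixed observation-control sequence; the horizon of 1000 steps is an arbitrary choice.

```python
def empirical_pdc(M):
    """Fraction of time steps at which an observation is taken: a
    finite-horizon proxy for the PDC metric."""
    return sum(M) / len(M)

# Policy that uses every second observation over a horizon of n = 1000 steps:
M = [1 if k % 2 == 0 else 0 for k in range(1000)]
```

Here `empirical_pdc(M)` evaluates to 0.5, matching the every-second-observation example in the text, while a policy that uses every observation gives 1.0.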
For false alarm, we consider the metric used in Lorden (1971) and Pollak (1985), the mean time to false alarm or its reciprocal, the false alarm rate:

FAR(Ψ) = 1 / E_∞[τ].   (3.6)
For delay we consider two possibilities: the minimax setting of Pollak (1985), where the delay metric is the supremum over time of the conditional average delay,

CADD(Ψ) = sup_n E_n[τ − n | τ ≥ n]

(we are only interested in those policies for which the CADD is well defined), or the minimax setting of Lorden (1971), where the delay metric is the supremum over time of the essential supremum of the conditional delay,

WADD(Ψ) = sup_n ess sup E_n[(τ − n)^+ | I_{n−1}].

Here the conditioning is on the information vector I_{n−1}, which may not contain the entire set of observations. As in (2.15), CADD(Ψ) ≤ WADD(Ψ).

The data-efficient extension of Lorden (1971) is:

Problem 3.2: minimize WADD(Ψ) over all policies Ψ, subject to FAR(Ψ) ≤ α and PDC(Ψ) ≤ β.

We are also interested in the data-efficient extension of Pollak (1985):

Problem 3.3: minimize CADD(Ψ) over all policies Ψ, subject to FAR(Ψ) ≤ α and PDC(Ψ) ≤ β.

Here, 0 ≤ α, β ≤ 1 are given constraints.
With β = 1, Problem 3.3 reduces to the minimax formulation of Pollak (1985), and Problem 3.2 reduces to the minimax formulation of Lorden (1971). Note that although the CuSum algorithm trivially achieves the lower bound (2.17), it is not asymptotically optimal for Problem 3.2 and Problem 3.3 for every β. This is because PDC(τ_C) = 1, so the CuSum algorithm cannot satisfy any constraint β < 1. Another option is to run the CuSum algorithm on only every m-th observation, with m chosen large enough to meet the PDC constraint of β. However, the delay of such a policy will be approximately m · CADD(τ_C). Does there exist a policy that achieves any given constraint β on the PDC, and still achieves the lower bound in (2.17) asymptotically? In Banerjee and Veeravalli (2013a), we showed that the answer to this question is in the affirmative. Specifically, the DE-CuSum algorithm, a two-threshold generalization of the CuSum algorithm, can be designed to meet any given constraints α and β, and for each fixed β it achieves the lower bound (2.17).

The DE-CuSum statistic W_n starts at W_0 = 0 and is updated using the following recursions: if W_n < 0, the observation at time n + 1 is skipped and

W_{n+1} = min{W_n + µ, 0};

if W_n ≥ 0, the observation at time n + 1 is taken and

W_{n+1} = (W_n + log L(X_{n+1}))^{h+},

where (x)^{h+} = max{x, −h} and L(X) = f_1(X)/f_0(X). A change is declared at τ_W = inf{n ≥ 1 : W_n ≥ A}.
The evolution of the DE-CuSum algorithm is plotted in Fig. 3.3. If h = ∞, the evolution of the DE-CuSum algorithm can be described as follows. As seen in Fig. 3.3, initially the DE-CuSum statistic evolves according to the CuSum statistic until W_n goes below 0. Once the statistic goes below 0, observations are skipped depending on the undershoot of W_n (which is also the sum of the log-likelihood ratios of the observations) and a pre-designed parameter µ. Specifically, the statistic is incremented by µ at each time step, and observations are skipped until W_n reaches zero, at which time it is reset to zero. At this point, fresh observations are taken and the process is repeated until the statistic crosses the threshold A, at which time a change is declared. If h < ∞, the maximum number of consecutive observations skipped is bounded by h/µ + 1. This bound is crucial to the WADD analysis of the DE-CuSum algorithm, and may also be desirable in some applications.
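The skip-and-ramp behavior just described can be sketched as follows (an illustrative sketch, not code from the original papers; the identity log-likelihood-ratio map and the parameter values in the example are hypothetical choices).

```python
def de_cusum(xs, log_L, A, mu, h=float("inf")):
    """DE-CuSum sketch. While W_n >= 0 an observation is taken and
    W_{n+1} = max(W_n + log L(x), -h), so the truncation at -h limits the
    undershoot. While W_n < 0 the observation is skipped and W grows by mu,
    capped at 0 (i.e., reset to 0 on crossing). Declares a change when
    W_n >= A. Returns (stopping time, number of observations used)."""
    W, used = 0.0, 0
    for n, x in enumerate(xs, start=1):
        if W < 0:
            W = min(W + mu, 0.0)          # skip: deterministic ramp toward 0
        else:
            used += 1
            W = max(W + log_L(x), -h)     # take: CuSum-style update, truncated at -h
        if W >= A:
            return n, used
    return None, used
```

For example, with log L(x) = x, µ = 1 and A = 2.5, the stream [−2, 0, 0, 3] produces an undershoot of −2 that forces two skipped time steps before detection, so `de_cusum([-2, 0, 0, 3], lambda x: x, 2.5, 1.0)` returns (4, 2); with h = 0.5 the undershoot is truncated at −0.5 and only one step is skipped.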
In the DE-Shiryaev algorithm, when an observation is skipped the statistic is updated using the prior on the change point. The parameter µ in the DE-CuSum algorithm is a substitute for the Bayesian prior ρ of the DE-Shiryaev algorithm, and is treated as a design parameter. The parameter µ (and also h if it is chosen to be finite) is chosen to meet the constraint on the PDC, and the threshold A is chosen to meet the constraint on the FAR. The DE-CuSum algorithm is essentially a two-threshold algorithm with an upper threshold of A and a lower threshold of B = 0.
In analogy with the evolution of the DE-Shiryaev algorithm, the DE-CuSum algorithm can also be seen as a sequence of independent two-sided tests. In each two-sided test, a Sequential Probability Ratio Test (SPRT) (see Wald and Wolfowitz (1948)), with boundaries A and 0, is used to distinguish between the two hypotheses "H_0 = pre-change" and "H_1 = post-change". The DE-CuSum algorithm is thus equivalent to a sequence of SPRTs (also see Siegmund (1985)).
Unless a bound on the maximum number of observations skipped is required, the DE-CuSum algorithm can be controlled by just two parameters: A and µ. In the following theorem it is shown that these two parameters can be selected independently of each other, directly from the constraints. That is, the threshold A can be selected so that FAR ≤ α, independent of the value of µ. Also, it is possible to select a value of µ such that PDC ≤ β, independent of the choice of A. Moreover, with the parameters selected in this manner, the WADD, and hence the CADD, achieves the lower bound in (2.17).
Theorem 3.2 (Banerjee and Veeravalli (2013a)).
1. For any µ, h and A,

C_n ≥ W_n for all n ≥ 0,   (3.11)

where C_n is the CuSum statistic (2.18) and W_n is the DE-CuSum statistic, applied to the same sequence of observations. As a result,

FAR(Ψ_W) ≤ FAR(τ_C) ≤ e^{−A}.

Thus, setting A = |log α| ensures that FAR ≤ α, independent of the choice of µ and h.
2. There exist values µ* and h* for which PDC ≤ β, i.e., the PDC constraint is satisfied independent of the choice of A.
In Fig. 3.4, we show that the DE-CuSum algorithm provides a significant gain in performance as compared to the approach of fractional sampling, where the CuSum algorithm is used and the PDC constraint is met by skipping observations randomly. Fig. 3.4 shows the comparative CADD vs. FAR performances of the CuSum algorithm, the DE-CuSum algorithm, and the fractional sampling scheme, for f_0 ~ N(0, 1), f_1 ~ N(0.75, 1) and PDC = 0.5. As seen in the figure, even for a PDC of 0.5, i.e., even after dropping 50% of the observations before the change, there is only a small difference between the delays of the CuSum and DE-CuSum algorithms. On the other hand, the difference in performance as compared to the fractional sampling scheme is significant.
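For reference, the fractional sampling baseline used in the comparison can be sketched as follows (an illustrative sketch under the stated setup; the coin-toss implementation details are our own).

```python
import random

def fractional_sampling_cusum(xs, log_L, A, q, rng):
    """Fractional-sampling baseline: each observation is used with probability
    q, decided by a biased coin toss independent of the observations, and a
    CuSum is run on the used observations only; q is chosen to meet the PDC
    constraint."""
    C = 0.0
    for n, x in enumerate(xs, start=1):
        if rng.random() < q:              # coin toss: take this observation?
            C = max(C + log_L(x), 0.0)
        if C >= A:
            return n
    return None
```

With q = 1 every observation is used and the scheme coincides with the plain CuSum; with q = 0.5 roughly half the log-likelihood-ratio increments are discarded regardless of their value, which is why the delay degrades relative to the DE-CuSum algorithm, whose skipping decisions depend on the observed likelihood ratios.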

Data-Efficient Quickest Change Detection in Sensor Networks
In many engineering applications, a sensor network is often deployed to detect a change in the statistical properties of a process or phenomenon under observation.
In the literature, the QCD problem has also been studied in the sensor network framework. In the sensor network model we are interested in, there are L sensors and a central decision maker called the fusion center. The sensors are indexed by ℓ ∈ {1, …, L}; in the following we say "sensor ℓ" to refer to the sensor with index ℓ. At sensor ℓ the sequence {X_{n,ℓ}}_{n≥1} is observed, where n is the time index. At some unknown time γ, the distribution of {X_{n,ℓ}} changes from f_{0,ℓ} to, say, f_{1,ℓ}, in a subset κ = {k_1, k_2, …, k_m} ⊂ {1, 2, …, L} of the streams. The observations {X_{n,ℓ}} are independent across indices n and ℓ, conditioned on γ. The distributions f_{0,ℓ} and f_{1,ℓ} are known to the decision maker, but the affected subset κ may or may not be known. A processed version of the observations is transmitted from each sensor to the fusion center. At each time step, the information transmitted from the sensors to the fusion center could be the raw observations themselves, a quantized version of the observations, or a carefully designed summary message representing all the information available until that time at the sensors. The objective here is to find an efficient way to process and transmit information to the fusion center, and to find a stopping time on the information received at the fusion center so as to detect the change in distribution as quickly as possible.
The sensor network problem has been studied in both Bayesian and minimax settings in the literature; see Veeravalli (2001) and Banerjee et al. (2011). However, in this section, we restrict our discussion to the more practically relevant minimax settings. We now describe three well-known algorithms from the literature.
1. CuSum-All: This algorithm can be used when the affected subset κ is known. In this case the problem is the same as the case when the change affects all the sensors. For this setup, the CuSum-All algorithm is proposed in Mei (2005), and is shown to achieve the lower bound in (2.17). In this algorithm, the CuSum algorithm (2.18) is used at each sensor. A '1' is transmitted each time the CuSum statistic is above a threshold, and a change is declared at the fusion center the first time a '1' is received from all the sensors at the same time.
2. CuSum-Censor-Max: This algorithm can be used when the affected subset κ is not known, but it is known that |κ| = 1, i.e., only one of the sensors is affected post-change. In the CuSum-Censor-Max algorithm, the CuSum algorithm is used at each sensor. The CuSum statistic is transmitted from a sensor to the fusion center only if it is larger than a threshold, and a change is declared when the maximum of the transmitted statistics from all the sensors crosses another threshold. This algorithm is proposed in Tartakovsky and Veeravalli (2002) (without the censoring part). It is shown in Tartakovsky and Veeravalli (2002) that this algorithm is uniformly asymptotically optimal, i.e., it achieves the lower bound in (2.17) for each possible post-change scenario.
3. CuSum-Censor-Sum: For the case when the affected subset κ, or even its size, is completely unknown, the CuSum-Censor-Sum algorithm is proposed in Mei (2011) (also see Mei (2010)). In this algorithm too, the CuSum algorithm is used at each sensor, and the CuSum statistic is transmitted from a sensor to the fusion center only if it is larger than a threshold. A change is declared when the sum of the transmitted statistics from all the sensors crosses another threshold. It is claimed in Mei (2011) that the CuSum-Censor-Sum algorithm is uniformly asymptotically optimal.

Data-efficient counterparts of these algorithms (the DE-All and DE-Censor-Sum algorithms, among others) are obtained by replacing the CuSum algorithm at each sensor by the DE-CuSum algorithm. Based on the properties of the DE-CuSum algorithm, it can be shown that these data-efficient algorithms are also asymptotically optimal, under appropriate conditions, i.e., their performance is asymptotically equal to the lower bound of (2.17) (with D(f_1‖f_0) replaced by the appropriate Kullback-Leibler divergence between the pre- and post-change distributions). See Banerjee and Veeravalli (2012b), Banerjee and Veeravalli (2013b), and Banerjee and Veeravalli (2013c) for the detailed problem formulations and numerical results. In summary, the DE-All algorithm can be used when there is a severe constraint on the amount of information that can be acquired and transmitted and the affected subset is known. If the affected subset is not known but its size is known to be small compared to L, then the max scheme performs better than the sum scheme, and vice versa.
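One fusion step of the censor-and-sum idea can be sketched as follows (an illustrative sketch, not code from the cited papers; the sensor models, thresholds, and the identity log-likelihood-ratio maps in the example are hypothetical choices).

```python
def censor_sum_step(stats, log_Ls, xs, D, A):
    """One time step of a CuSum-Censor-Sum sketch: each sensor updates its
    local CuSum statistic; only statistics exceeding the censoring threshold D
    are transmitted; the fusion center raises an alarm when the sum of the
    transmitted statistics reaches A. Returns (updated stats, alarm flag)."""
    new = [max(c + ll(x), 0.0) for c, ll, x in zip(stats, log_Ls, xs)]
    transmitted = sum(c for c in new if c > D)     # censoring at the sensors
    return new, transmitted >= A
```

For instance, with two sensors, censoring threshold D = 0.5, and fusion threshold A = 2.0, a first step with local increments 1.0 and 0.25 transmits only the first statistic and raises no alarm; a second step with increments 1.0 at both sensors pushes the transmitted sum to 3.25 and triggers the alarm.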
For the case when the change affects all the sensors, more efficient algorithms have been proposed in Banerjee and Veeravalli (2012b).
1. DE-Dist: When the communication constraint is not so severe, multiple bits approximating the entire statistic can be transmitted from the sensors to the fusion center, instead of transmitting only one bit. In the DE-Dist algorithm, the CuSum algorithm is used at each sensor. Whenever an observation is taken at a sensor, it is transmitted to the fusion center. At the fusion center, a CuSum is applied to the received observations to detect the change. It is shown in Banerjee and Veeravalli (2013c) that the DE-Dist algorithm performs considerably better than the DE-All algorithm.
2. Serialized DE-CuSum: This is a centralized control based algorithm. In this algorithm, the observations from the sensors are serialized, and then the DE-CuSum algorithm is applied to the serialized observation sequence. This algorithm can be used as a benchmark for data-efficient algorithms in sensor networks.

Conclusions
We provided a survey of data-efficient quickest change detection. The DE-QCD problem is the classical QCD problem with an additional constraint on the cost of observations used before the change point. The results surveyed here show that the likelihood ratio of the observations can be utilized to modify the classical QCD algorithms to introduce on-off observation control. A large likelihood ratio provides strong evidence that the change has already happened, and a change can be declared. On the other hand, a small likelihood ratio indicates that the change has most likely not happened, and a few observations can be dropped. We showed how this intuition is used to modify the single-threshold Shiryaev and CuSum algorithms to obtain the two-threshold DE-Shiryaev and DE-CuSum algorithms, respectively. We also discussed the optimality properties of these data-efficient algorithms. We then provided a short discussion on how this technique can be used to obtain data-efficient algorithms for sensor networks. Insights obtained from these results can be used to obtain data-efficient algorithms for more general QCD models.