Multiple Crossing Sequential Fixed-Size Confidence Regions for Regression Parameters Under Normality

The purely sequential sampling procedure proposed by Mukhopadhyay and Abid (1986) is customarily used to construct a fixed-size confidence region for regression parameters. This methodology enjoys the asymptotic efficiency and asymptotic consistency properties, but it lacks the exact consistency property. We propose continuing sequential sampling so that the sample size is allowed to cross the corresponding boundary multiple times. The asymptotic efficiency and asymptotic consistency properties are ascertained for multiple crossing stopping rules (Theorem 2.1). A truncation technique as well as a fine-tuning adjustment are developed. The simulated data are generated by realistic models arising from a study investigating the association between prostate-specific antigen (PSA) and a number of appropriate prognostic clinical covariates. We highlight via large-scale simulations the remarkable gain in nearly achieving the target coverage without significant over-sampling.


Introduction
Multi-stage sampling designs date back to Mahalanobis (1940), whose primary motivation was to gauge and control the sampling error in the large-scale surveys he spearheaded. Stein's (1945) and Wald's (1947) methodological breakthroughs highlighted the importance of multi-stage and sequential sampling designs. Subsequently, multi-stage and sequential sampling methodologies were developed for point estimation, interval estimation, hypothesis testing, selection and ranking, and other problems in inference.
We implement a new sequential sampling methodology to construct fixed-size confidence regions for regression parameters with a prespecified confidence coefficient. The sequential methodology is governed by multiple crossing stopping rules, originally developed by Mukhopadhyay and Muthu Poruthotage (2013, 2014) and Muthu Poruthotage (2013) in the context of fixed-size confidence regions for a normal mean. We broaden the notion of multiple crossing by addressing fixed-size regression parameter estimation problems.

Fixed-Accuracy Estimation of Regression Parameters: Preliminaries
Chatterjee (1962) developed a two-stage procedure to estimate the regression coefficients of a general linear model with a fixed-size confidence region under normal errors. Gleser (1965, 1966), Srivastava (1967, 1971) and others developed purely sequential procedures for fixed-size confidence region estimation of regression coefficients under non-normal errors. These methodologies were primarily inspired by Stein (1945) and Chow and Robbins (1965). Mukhopadhyay (1974) first introduced minimum risk point estimation of regression parameters and Finster (1983) gave associated second-order properties. Limited shrinkage versions in two-stage and sequential regression parameter estimation are available in the literature (Ghosh et al. 1997; Mukhopadhyay and de Silva 2009).
We begin with the general linear model:
Y_n = X_n β + ε_n, (1.1)
where Y_n is an n × 1 observation vector, X_n is a known n × p matrix with r(X_n) = p (< n), β is a p × 1 unknown regression parameter vector, ε_n ∼ N_n(0, σ²I_n) is the n × 1 error vector, and 0 < σ² < ∞ is unknown. The least square estimator of β, namely β̂_n = (X_n'X_n)^{-1}X_n'Y_n, is also the best linear unbiased estimator of β. Also, β̂_n is distributed as:
β̂_n ∼ N_p(β, σ²(X_n'X_n)^{-1}). (1.2)
Thus, we define a fixed-size confidence region for β as:
R_n = {β ∈ ℝ^p : n^{-1}(β̂_n − β)'X_n'X_n(β̂_n − β) ≤ d²}, (1.3)
where d > 0 and the associated confidence coefficient 1 − α are preassigned, 0 < α < 1. We treat (1.3) as a fixed-size ellipsoidal confidence region since its maximum diameter is determined by d, which is fixed in advance. The confidence coefficient associated with R_n is:
P_{β,σ}(β ∈ R_n) = F(nd²σ^{-2}), (1.4)
where F(u) = P{U ≤ u} with U ∼ χ²_p, u ≥ 0. Had σ² been known, we would ensure that n is the smallest integer satisfying
n ≥ aσ²d^{-2} = C, say, with a ≡ a_{p,α} such that F(a) = 1 − α, (1.5)
so that the confidence coefficient P_{β,σ}(β ∈ R_n) associated with R_n would be ≥ 1 − α.
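For concreteness, the quantities a and C in (1.5) can be computed directly. The sketch below is ours, not the paper's; to stay dependency-free it hardcodes the closed-form χ² quantile available when p = 2, and for general p one would instead supply a chi-square quantile from a routine such as scipy.stats.chi2.ppf.

```python
import math

def chi2_quantile_2df(prob):
    """chi^2 with 2 df has F(u) = 1 - exp(-u/2), so its quantile is closed-form."""
    return -2.0 * math.log(1.0 - prob)

def optimal_C(sigma2, d, p, alpha, a=None):
    """Optimal fixed sample size C = a*sigma^2/d^2 from (1.5).
    For p != 2, pass a = chi-square quantile at 1 - alpha with p df."""
    if a is None:
        if p != 2:
            raise ValueError("pass a explicitly for p != 2, "
                             "e.g. scipy.stats.chi2.ppf(1 - alpha, p)")
        a = chi2_quantile_2df(1.0 - alpha)
    return a * sigma2 / d**2

# With sigma^2 = 1, d = 0.5, p = 2, alpha = 0.05: a is about 5.9915 and
# C is about 23.97, comparable to the smaller design values of C in Table 1.
```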
We refer to C in (1.5) as the optimal fixed sample size, had σ² been known. But C remains unknown. Mukhopadhyay and Abid (1986) developed the following purely sequential procedure: having initially observed (Y_i, X_{i,1}, ..., X_{i,p}), i = 1, ..., n_0, we record one additional observation at a time and stop according to:
Q_1 = inf{n ≥ n_0 : n ≥ aS_n²d^{-2}}, (1.6)
where S_n² = (n − p)^{-1}(Y_n − X_nβ̂_n)'(Y_n − X_nβ̂_n) is the customary mean square error. At termination based on (1.6), the corresponding fixed-size confidence region, R_{Q_1}, is:
R_{Q_1} = {β ∈ ℝ^p : Q_1^{-1}(β̂_{Q_1} − β)'X_{Q_1}'X_{Q_1}(β̂_{Q_1} − β) ≤ d²}. (1.7)
In order to address some of the properties of the purely sequential procedure (1.6)-(1.7), let us formally define the notions of consistency, asymptotic consistency and asymptotic efficiency. Let Q be a generic stopping rule with an associated fixed-size confidence region R_Q.
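To make the stopping rule concrete, here is a small self-contained simulation sketch of (1.6) for p = 2 (intercept plus one covariate). The data-generating model, design points, and parameter values below are illustrative choices of ours, not the paper's model (2.16).

```python
import math
import random

def mse_simple_ols(x, y):
    """Mean square error S_n^2 = SSE/(n - p) for y = b0 + b1*x + error (p = 2)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return sse / (n - 2)

def stopping_time_Q1(d, alpha, n0=10, sigma=1.0, seed=1):
    """Sample one at a time until n >= a*S_n^2/d^2 for the 1st time, as in (1.6)."""
    rng = random.Random(seed)
    a = -2.0 * math.log(alpha)          # chi^2_2 quantile at 1 - alpha (p = 2)
    x, y = [], []
    n = 0
    while True:
        n += 1
        xi = rng.uniform(0.0, 10.0)     # hypothetical design points
        x.append(xi)
        y.append(2.0 + 0.5 * xi + rng.gauss(0.0, sigma))
        if n >= n0 and n >= a * mse_simple_ols(x, y) / d ** 2:
            return n
```

With σ = 1 and d = 0.5 the boundary fluctuates around C ≈ 24, so the returned Q_1 typically lands near that value.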
Then, we define:
Consistency: P_{β,σ}(β ∈ R_Q) ≥ 1 − α for all fixed d, α, β and σ;
Asymptotic consistency: lim_{d→0} P_{β,σ}(β ∈ R_Q) ≥ 1 − α;
Asymptotic (first-order) efficiency: lim_{d→0} E_{β,σ}[Q]/C = 1.
The purely sequential procedure (1.6)-(1.7) has the asymptotic consistency and asymptotic efficiency properties, but it does not have the consistency property. Mukhopadhyay and Abid (1986) proposed two-stage and modified two-stage procedures in the spirit of Chatterjee (1962) and Mukhopadhyay (1980) respectively. Both procedures had the consistency property, but only the modified two-stage procedure was first-order efficient. However, neither two-stage procedure was asymptotically second-order efficient, a notion formulated by Ghosh and Mukhopadhyay (1981), but the purely sequential procedure was (Mukhopadhyay and Solanky 1994, chapter 3). In order to establish a property such as second-order efficiency, one relies upon nonlinear renewal theory; refer to Siegmund (1985) and Mukhopadhyay (1988). In principle, recording additional data of some fixed size r upon termination could bring the achieved coverage up to the target. However, such a universal r remains unknown. The multiple crossing stopping rules aim at estimating r adaptively and delivering an achieved coverage probability not exceeding the target coverage probability by a large margin.

Layout of the Paper
In Section 2, we formally define the multiple crossing stopping rules and discuss some of their properties. A truncation technique for the multiple crossing stopping rule is then proposed. Then, a fine-tuned adjustment to the stopping rule (1.6) is proposed in Section 3. These methodologies nearly provide consistency and alleviate some logistical concerns that may arise in field experimentation. The methodologies are supplemented by extensive data analyses and implemented via simulations. Section 4 contains concluding thoughts. For ease of locating our tables and figures, they are all laid out after Section 4 but before the list of references begins.

Multiple Crossing Methodology
The general idea of multiple crossing goes like this: the stopping rule (1.6) dictates that sampling must be terminated once the sample size satisfies q_1 ≥ aS²_{q_1}/d², a random boundary, for the 1st time. Under a multiple crossing stopping rule, sampling continues even after the 1st crossing of the boundary. In fact, the sample size is allowed to cross the corresponding boundary multiple times.

Multiple Crossing Stopping Rules
In order to define multiple crossing stopping rules, we mildly modify the notation from Section 1. Observations are generated by the following general linear model:
Y_n = X_n β + ε_n, (2.1)
where X_n is a known n × p matrix with r(X_n) = p (< n), β is an unknown p × 1 regression parameter vector, ε_n ∼ N_n(0, σ²I_n) is the n × 1 error vector, and 0 < σ² < ∞ is unknown.
IASSL ISSN 1391-4987
Let i = 1, ..., k. We denote:
Full observation vector: Y_{q,k} = (Y'_{1,q_1}, Y'_{2,q_2}, ..., Y'_{k,q_k})';
Full design matrix: X_{q,k} = (X'_{1,q_1}, X'_{2,q_2}, ..., X'_{k,q_k})';
Total sample size: q_(k) = Σ_{i=1}^k q_i;
Least square estimator of β: β̂_{q,k} = (X_{q,k}'X_{q,k})^{-1}X_{q,k}'Y_{q,k}; (2.2a)
Mean square error: S²_{q,k} = (q_(k) − p)^{-1}(Y_{q,k} − X_{q,k}β̂_{q,k})'(Y_{q,k} − X_{q,k}β̂_{q,k});
Confidence region: let ξ ≡ β̂_{q,k} − ω with ω ∈ ℝ^p and
R_{q,k} = {ω ∈ ℝ^p : q_(k)^{-1} ξ'X_{q,k}'X_{q,k} ξ ≤ d²}. (2.2b)
Now, we assume that sampling is first carried out according to (1.6) and we have observed Q_1 = q_1, Y_{1,q_1} with the design matrix X_{1,q_1}. However, instead of terminating the sampling process at this point, we take one additional observation and follow through with observations one at a time until the sample size crosses the corresponding boundary a 2nd time. The number of additional observations needed beyond Q_1 up until the 2nd crossing is denoted by Q_2, defined as:
Q_2 = inf{m ≥ 1 : Q_1 + m ≥ aS²_{Q_1+m}d^{-2}}. (2.3)
At this juncture, we have recorded Y_{Q,2}, X_{Q,2}, and Q_(2) = Σ_{i=1}^2 Q_i. The corresponding fixed-size confidence region for β is:
R_{Q,2} = {ω ∈ ℝ^p : Q_(2)^{-1}(β̂_{Q,2} − ω)'X_{Q,2}'X_{Q,2}(β̂_{Q,2} − ω) ≤ d²}. (2.4)
In general, the number of crossings is denoted by k. If we decide to terminate sampling according to (2.4), then our results would correspond to k = 2 with Q = (Q_1, Q_2).

Beyond Second Crossing
It is possible to continue sampling, one observation at a time, beyond the second crossing. Suppose that we have crossed the intended boundary k − 1 times; that is, by this time we have already gathered the data Y_{Q,k−1}, X_{Q,k−1} with Q_(k−1) = Σ_{i=1}^{k−1} Q_i. We define a stopping variable associated with crossing the boundary for the kth time as follows:
Q_k = inf{m ≥ 1 : Q_(k−1) + m ≥ aS²_{Q_(k−1)+m}d^{-2}}. (2.5)
Finally, based on the combined set of gathered data Y_{Q,k}, X_{Q,k}, and Q_(k) = Σ_{i=1}^k Q_i, the corresponding fixed-size confidence region for β is constructed as:
R_{Q,k} = {ω ∈ ℝ^p : Q_(k)^{-1}(β̂_{Q,k} − ω)'X_{Q,k}'X_{Q,k}(β̂_{Q,k} − ω) ≤ d²}, (2.6)
with Q = (Q_1, ..., Q_k), k ≥ 2. Now, Theorem 2.1 states some of the key properties of the multiple crossing stopping rule (2.5) with the associated confidence region (2.6): as d → 0, (i) Q_(k)/C → 1 in probability; (ii) E_{β,σ}[Q_(k)]/C → 1, the asymptotic first-order efficiency property; and (iii) P_{β,σ}(β ∈ R_{Q,k}) → 1 − α, the asymptotic consistency property.
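The recursive construction above can be sketched in code. As before, the simple linear model (p = 2) and its parameter values are our illustrative stand-ins for the paper's models; the function returns the vector (Q_1, ..., Q_k) of observations added up to each successive crossing.

```python
import math
import random

def crossings(d, alpha, k, n0=10, sigma=1.0, seed=7):
    """Multiple crossing sketch: after each crossing, keep sampling one
    observation at a time until n >= a*S_n^2/d^2 holds again."""
    rng = random.Random(seed)
    a = -2.0 * math.log(alpha)              # chi^2_2 quantile (p = 2)
    x, y = [], []

    def mse():
        n = len(y)
        xbar, ybar = sum(x) / n, sum(y) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
        b0 = ybar - b1 * xbar
        return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

    def draw():
        xi = rng.uniform(0.0, 10.0)         # hypothetical design points
        x.append(xi)
        y.append(2.0 + 0.5 * xi + rng.gauss(0.0, sigma))

    q = []
    for _ in range(k):
        added = 0
        while True:                         # at least one new observation
            draw()
            added += 1
            n = len(y)
            if n >= n0 and n >= a * mse() / d ** 2:
                break                       # boundary crossed again
        q.append(added)
    return q
```

Because the boundary aS²_n/d² stays near C while n keeps growing, later crossings typically need only one or two extra observations, mirroring the pattern reported in Tables 3-4.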
In order to prove Theorem 2.1, we begin with two lemmas. Lemma 2.1 is needed in the proof of Lemma 2.2. For convenience, we temporarily revert to the notation used in Section 1.
Let Q_(k) = Σ_{i=1}^k Q_i stand for the total sample size associated with the stopping variable (2.5).
Now, M + 1 ≥ Q_1 a.s. and, in view of Lemma 2.1, we claim: From Lemma 2.2, we have: Next, we may utilize the following basic inequality relevant for Q_(k), the total sample size associated with (2.5): Then, Lemma 2.2 and (2.10) imply: which is part (i). Now, we turn to part (ii). Combining Fatou's lemma and (2.11), we claim: the projection matrix, n > p. We may appeal to Wiener's (1939) dominated ergodic theorem. Hence, for sufficiently small d > 0, the right-hand side of (2.10) may be rewritten as: which shows that Q_(k)/C is uniformly integrable. Combining (2.13) with (2.11), we immediately obtain the following result: Combining (2.12) and (2.14), we have part (ii). Next, we turn to part (iii). We observe: Hence, a simple application of the Lebesgue dominated convergence theorem combined with part (i) leads to part (iii).

Assessments of Coverage Probabilities
Now that we have established the asymptotic first-order efficiency and asymptotic consistency properties of the multiple crossing stopping rule (2.5), we address the coverage probabilities associated with it in more detail, with the help of a large simulation exercise. The premise of the multiple crossing stopping rule is its ability to nearly achieve the target coverage probability 1 − α with an optimal number of additional observations beyond the 1st crossing. We try to strike a balance between oversampling and overshooting the target coverage probability. An appropriate choice of k is critical in order to achieve this balance.

We ran 10000 replications along the following basic steps:
1. We first generated n_0 = 10 initial observations from the model (2.16).
2. With each additional observation generated one by one, as needed, we sequentially checked against the stopping rule (1.6). This data generating process terminated at the point where the sample satisfied (1.6), that is, when k = 1. At this instance, we recorded the observed sample size Q_1 = q_11 and calculated the least square estimate β̂^(1)_{q,1}.
3. After the 1st crossing, we generated one new observation and checked against the stopping rule (2.5). This was supplemented by generating new observations, recorded one at a time as needed according to the termination rule (2.5). At each subsequent crossing, namely k = 2, 3, 4 and 5, we recorded the observed sample size Q_k = q_k1, the least square estimate, the total observed sample size, and the corresponding design matrix X^(1)_{q,k}.
4. Then, we constructed the fixed-size confidence region for the parameter vector β successively with k = 1, ..., 5. Finally, we recorded the observed value p_k1 = 1 (or 0) if β ∈ (or ∉) the region.
We express the pertinent simulated estimates as follows: q̄_(k) and p̄_k denote the averages over the replications with estimated standard errors s_{q_(k)} and s_{p̄_k}, k = 1, ..., 5; and 95% CL Cov: L (or U) = p̄_k − (or +) 1.96 s_{p̄_k}.
(2.17)
Table 1 presents selected outcomes to compare and contrast performances of the multiple crossing stopping rule (2.5) with the original sequential stopping rule (1.6) when p = 2 and α = 0.05. We note that p̄_k is an estimate of the coverage probability at the kth crossing. When k = 1, that is, when sampling is terminated according to the stopping rule (1.6), p̄_1 fell under 0.95. This is clearly visible for smaller values of C such as 25, 50, 100. However, this is expected since the stopping rule (1.6) does not provide exact consistency. When sampling is extended until the 2nd crossing (k = 2) according to the stopping rule (2.5), we see an increment in the coverage probabilities, estimated by p̄_2. This increment, however, appears not large enough to nearly claim the coverage 0.95. Since the multiple crossing procedure (2.5) easily extends to subsequent crossings, we may compare p̄_3, p̄_4, and p̄_5, the estimated coverages corresponding to the 3rd, 4th, and 5th crossings, with the target coverage 0.95. Columns 7 and 8 in Table 1 provide approximate 95% confidence intervals for the coverage probability associated with (2.5)-(2.6). We observe that the target coverage 0.95 can be safely claimed by letting k = 5 in (2.5). But it is also important to notice that none of the p̄_k values exceed the target coverage probability of 0.95 by a large margin. This is a much desired characteristic of multiple crossing procedures; after all, one does not want to over-achieve the coverage probability via oversampling. The efficiency with which the multiple crossing procedure (2.5)-(2.6) nearly achieves the target coverage is further illustrated by q̄_(k) in Table 1. Note that q̄_(k) is an estimate of E_{β,σ}[Q_(k)]. Even after five crossings, q̄_(5) remains in close proximity to C, the optimal fixed sample size.
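The summaries in (2.17) reduce to simple binomial computations. The following sketch (our own, with an illustrative hit count) shows how p̄_k, its standard error, and the approximate 95% confidence limits reported in Tables 1-2 would be computed from the replication counts.

```python
import math

def coverage_summary(hits, replications=10000):
    """hits = number of replications in which beta fell inside the region.
    Returns (p_bar, s, L, U) as in (2.17)."""
    p_bar = hits / replications
    s = math.sqrt(p_bar * (1.0 - p_bar) / replications)   # s_{p_bar}
    return p_bar, s, p_bar - 1.96 * s, p_bar + 1.96 * s

p, s, L, U = coverage_summary(9500)
# With 9500 hits out of 10000: p = 0.95 and the 95% CL is roughly (0.9457, 0.9543)
```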
Figure 1a illustrates the performances of the stopping rule (2.5)-(2.6) corresponding to C running through 25(25)100(50)200(100)1000. The vertical lines in Figure 1a correspond to 95% confidence intervals for the coverage probabilities associated with the stopping rule (2.5)-(2.6) when k = 5 and p = 2. The square markers represent the target coverage probability 0.99. Nearly all such confidence intervals contain 0.99. Figures 1b-1c correspond to α = 0.05 and α = 0.10 respectively.

Illustration with p = 6
We add that when p = 6, we fitted a linear regression model with an intercept to the original data from Kutner et al. (2005), yielding the model (2.18) with ε ∼ N(0, 1012). Table 2 is similar to Table 1 except that Table 2 corresponds to p = 6. We provide results up to the 8th crossing (k = 8). It is clearly seen how the estimated coverage probabilities p̄_k move closer to 0.95 as k increases; however, in this case we need to go beyond the 5th crossing to nearly claim the target coverage 0.95. This is evident especially in the case of smaller values of C such as 25, 50, 100. Results in Table 2 suggest that the choice k = 8 is appropriate when p = 6.

Additional Observations Beyond First Crossing
The gain in the achieved coverage probability noted in Section 2.3 was clearly due to the multiple crossing methodology adaptively determining the required additional observations beyond the 1st crossing per (1.6). But if the number of additional observations beyond the 1st crossing is considerably large, the proposed stopping rule (2.5) may not be attractive. Thus, we investigate the empirical distribution of Q_k, the number of additional observations needed for the kth crossing beyond the (k − 1)th crossing. The q̄_(k) values in Tables 1 and 2 indicate that the average number of additional observations beyond the 1st crossing is not excessive. Table 3 provides the empirical distribution of Q_k for k = 2, 3, 4, 5 when p = 2 and α = 0.05. In Table 3, for instance, when C = 25, Q_2 was 1 with frequency 8720 out of 10000 simulations; that is, the 2nd crossing occurred after recording merely 1 additional observation beyond the 1st crossing in 8720 of the 10000 simulations. Similarly, Q_2 was 2 with frequency 505 out of 10000 simulations. The last column in Table 3, for example, when C = 25 and k = 2, shows that the maximum number of additional observations needed from the 1st crossing to the 2nd crossing was 31, again out of 10000 simulations. We note an interesting feature: when k is larger, the additional number of observations for a subsequent crossing becomes successively smaller. For example, when C = 25, Q_5 was 1 with frequency 9646 out of the 10000 simulations. An important implication is that as the sampling process moves ahead to higher crossings, the likelihood of faster termination increases drastically, a highly desirable feature because it contributes to lowering the right-skewness in the distribution of Q_5. Table 4 provides similar information when p = 6.
The characteristics that we highlighted in Table 3 clearly prevail in Table 4 for p = 6. From Tables 3-4, one may notice that even though the number of additional observations needed for higher levels of crossings remains fairly reasonable, in very rare instances it could be in the range of 100s. This phenomenon was discussed in Mukhopadhyay and Muthu Poruthotage (2013, 2014) and Muthu Poruthotage (2013) too. In order to curb possible severe right-skewness of Q_5 when p = 2 and Q_8 when p = 6, we propose a truncation method.

Multiple Crossing with Truncation
The objective of truncation is simply to curb the right-hand tail of the distribution of Q_i (i = 2, ..., k) while preserving the gain in the increased coverage probability. The main characteristic of the truncation rule is that we do not let Σ_{i=2}^k Q_i go beyond Q_1^γ, with Q_1 from (1.6) and 0 < γ < 1 a predefined fixed constant. That is, we force Σ_{i=2}^k Q_i to stay rather "small" compared with Q_1, which is already very close to C when n_0 ≥ p + 3. See the first row within each block in Tables 1 and 2. Indeed, Mukhopadhyay and Abid (1986) showed that Q_1 satisfied the second-order efficiency property in the sense of Ghosh and Mukhopadhyay (1981). We may explain the role of 0 < γ < 1 as follows: in the worst possible scenario, when Q_1^γ observations are sequentially added to the previously recorded Q_1 observations, the total number of observations Q_1 + Q_1^γ would obviously exceed Q_1. But the maximum total number of observations Q_1 + Q_1^γ would not exceed Q_1 substantially, in the sense that Q_1^γ/C → 0 in probability as d → 0. That is, Q_1^γ stays "small" relative to C when 0 < γ < 1. Recall Q_1 defined by (1.6) and Q_i, i = 2, ..., k, defined by (2.5). Let ⌊u⌋ denote the largest integer < u. Now, we formally define the truncated stopping rules:
Q_2^T = min{Q_2, ⌊Q_1^γ⌋}, (2.19)
Q_i^T = min{Q_i, ⌊Q_1^γ⌋ − Σ_{j=2}^{i−1} Q_j^T}, (2.20)
with i = 3, ..., k as needed. It should be observed from (2.19)-(2.20) that there is no guarantee that the sampling process will reach a predefined kth, k ≥ 2, crossing as it did in (2.5). If sampling is carried out according to the truncated stopping rules with a predetermined k, the only guarantee is that the number of crossings before termination is at least 1 and at most k. This is due to the fact that termination under (2.19)-(2.20) is triggered by one of the following two events, whichever occurs first:
• attaining the predefined maximum number of boundary crossings (= k);
• or attaining the maximum number of additional observations allowable beyond Q_1 (namely, ⌊Q_1^γ⌋).
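The budget logic behind the truncated rules can be sketched as follows. This is our reading of the construction: each increment beyond the 1st crossing is capped so that the extras never exceed the budget determined by Q_1 and γ. Note that math.floor is a slight simplification of the strict "largest integer < u" in the text.

```python
import math

def truncated_total(q_list, gamma):
    """q_list = [Q_1, Q_2, ..., Q_k] from the untruncated rule (2.5);
    returns the truncated total sample size Q_1 + capped extras."""
    q1 = q_list[0]
    budget = math.floor(q1 ** gamma)      # max extra observations beyond Q_1
    extra = 0
    for qi in q_list[1:]:
        take = min(qi, budget - extra)    # cap each increment by what remains
        extra += take
        if extra >= budget:
            break                         # budget exhausted before kth crossing
    return q1 + extra

# e.g. Q = (25, 3, 2, 1, 1) with gamma = 0.5: budget = 5, so the
# truncated total sample size is 25 + 5 = 30
```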

Simulations on Truncated Methodology
The results in this section pertain to simulations of the stopping rule (2.19)-(2.20) when p = 2 and α = 0.05. We again fixed k = 5 since the open-ended multiple crossing methodology (2.5) comfortably achieved the near-target coverage 0.95 when p = 2. The basic steps in these simulations (with 10000 replications) closely follow the steps laid out in Section 2.3 with appropriate and obvious modifications, using a superscript T throughout in order to remind ourselves that the truncation is now in place, with fixed γ = 0.5, 0.8 and n_0 = 10. For brevity, we omit the similar analysis when p = 6. We recall that q̄^T_(1) and q̄^T_(5) will estimate E_{β,σ}[Q_1] and E_{β,σ}[Q^T_(5)] respectively, with estimated standard errors s^T_{q_(1)}, s^T_{q_(5)}. Analogously, the coverage probability P_{β,σ}(β ∈ R^T_{Q,5}) will be estimated by p̄^T_5 with estimated standard error s^T_{p̄_5}. As before, L (or U) is the lower (or upper) approximate 95% confidence limit for the target coverage probability. Table 5 illustrates performances of the truncated stopping rule (2.19)-(2.20) when p = 2, α = 0.05 with k = 5 based on the model (2.16). It can be seen that when γ = 0.8, our approximate 95% confidence intervals for the coverage probability tend to cover the target 0.95. Hence, the main objective of the multiple crossing methodology is preserved despite truncation. However, the prime motivation for truncation was to curb the right-hand tail of the distribution of Q_i (i = 2, ..., k). The empirical frequency distribution of Q^T_(5) (= Σ_{i=2}^5 Q^T_i), the total number of additional observations beyond the 1st crossing up to termination, is shown in Table 6 for either choice of γ. We note from Table 6 that the total number of additional observations beyond the 1st crossing rarely exceeded 20. Hence, this modification provides some practical assurance against excessive oversampling beyond the 1st crossing while nearly achieving the consistency property.

Fine-Tuned Multiple Crossing Methodology
In this section, we pursue a modification of the multiple crossing stopping rule (2.5) that further enhances its practical appeal by reducing the number of crossings required until termination. Our objective is to terminate the sampling process by curtailing additional crossings beyond the 1st one.

First Crossing
The motivation and the methodology for fine-tuning a purely sequential stopping rule were first introduced by Mukhopadhyay and Datta (1995). Instead of (1.6), they proposed a fine-tuned version (3.1), in which the fine-tuning factor ξ (∈ ℝ), explicitly provided by (3.2), adjusts the stopping rule; the associated fixed-size ellipsoidal confidence region is R_{Q_1}. This Q_1 corresponds to the 1st crossing, as shown by Mukhopadhyay and Datta (1995).
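Since the display carrying (3.1) did not survive in our copy, the following sketch only illustrates our assumption about where a real-valued fine-tuning factor ξ could enter the stopping condition, namely as an additive adjustment to the boundary aS²_n/d² of (1.6); the exact form and sign convention should be taken from (3.1)-(3.2) in Mukhopadhyay and Datta (1995).

```python
def crossed(n, s2, d, a, xi=0.0):
    """Assumed form only: boundary a*s2/d^2 adjusted by the factor xi.
    Setting xi = 0.0 recovers the unadjusted sequential rule (1.6)."""
    return n >= a * s2 / d ** 2 - xi

# Illustration with a = 5.9915, d = 0.5, s2 = 1.0: the plain boundary is
# about 23.97; the reported xi = -4.1785 (p = 2, alpha = 0.05) would, under
# this assumed form, raise the effective boundary to about 28.14.
```

Under this reading, a negative ξ enlarges the sample at each crossing, which is consistent with the fine-tuned procedure reaching the target coverage in fewer crossings (Tables 7-8).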

Beyond First Crossing
Now, we incorporate the fine-tuning factor ξ from (3.2) within the multiple crossing methodology. The fine-tuned version of (1.6) is given by (3.1). After the 1st crossing, the implementation of the multiple crossing methodology described in Section 2 remains unchanged. However, for a subsequent crossing after k − 1 crossings, with Q_{i,ξ} = q_i, i = 1, ..., k − 1, already observed, we define the fine-tuned stopping rule (3.3). Finally, the combined set of gathered data Q_{i,ξ}, i = 1, ..., k, provides Q_{(k),ξ} = Σ_{i=1}^k Q_{i,ξ}, X_{Q_ξ,k} and β̂_{Q_ξ,k} from all k crossings, and our proposed fixed-size confidence region for β is constructed as: (3.4) In Section 3.3, we briefly summarize some findings from our simulation exercises carried out to evaluate performances of the fine-tuned methodology (3.3)-(3.4).

Coverage Probabilities and Suggestions for k
We fixed p = 2, 6 and n_0 = 10 to run simulations under models (2.16) and (2.18) respectively. The confidence coefficient 1 − α was set at 0.90, 0.95, 0.99. The basic steps in these simulations are exactly the same as those explained in Section 2.3. We let k = 1, ..., 5; that is, we again check performances by including up to the 5th crossing. Table 7 summarizes results when p = 2 and α = 0.05. Equation (3.2) gave ξ = −4.1785. The 95% confidence interval for the coverage probability of the fine-tuned multiple crossing procedure is given in the last two columns. We can safely claim that k = 2 is enough since all such confidence intervals include 0.95 by the 2nd crossing. From Section 2.3, we recall that we had to go up to the 5th crossing without fine-tuning in place to make a similar claim. Table 8 is similar to Table 7 except that it corresponds to p = 6. We have ξ = −5.4785 when α = 0.05. From Section 2.3, we again recall that under the non-fine-tuned multiple crossing stopping rule (2.5), we had to go up to 8 crossings to nearly claim the required coverage probability.
From Table 8, we see that under the fine-tuned modification, the required coverage probability can be nearly achieved by the 3rd crossing. Hence, fine-tuning is highly recommended. Finally, it should be noted that the foregoing observations that we laid out in the context of fine-tuned multiple crossing methodologies remain valid for a range of values of α including 0.01, 0.05 and 0.10. We have summarized comparable findings in Figures 3 and 4.
Remark 3.1. We carried out simulations across a much wider range of values of C, α, n_0 and p than those shown in Tables 1-8. We have kept much of that larger body of data analyses out for brevity. One may refer to Muthu Poruthotage (2013) for details.

Concluding Remarks
Multi-stage and sequential sampling methodologies are commonly used to construct confidence regions of fixed size. The purely sequential methodology of Mukhopadhyay and Abid (1986) is not consistent even though it is asymptotically efficient and asymptotically consistent. The lack of consistency was clearly demonstrated in our simulations, where the achieved coverage probability frequently fell below the target, especially when C was small. The proposed methodology nearly achieves the target coverage probability with minimal oversampling when k, the number of crossings, is appropriately chosen. Our proposed truncation eliminates the possibility of prolonged sampling beyond the 1st crossing as explained in Section 2. The fine-tuned adjustment implemented in Section 3 makes an invaluable impact by reducing the number of crossings significantly: for example, it reduces k from 5 to 2 when p = 2 and from 8 to 3 when p = 6. Finally, it should be noted that the multiple crossing methodology is a general sampling strategy that can be utilized in other types of statistical problems, including some arising from multiple comparisons and from selection and ranking. We are in the process of exploring such avenues.

Table 1: Purely sequential methodology under the model (2.16) with k = 1 and multiple crossing methodology (2.5) with k = 2, 3, 4, 5: n_0 = 10, α = 0.05, and p = 2.
Table 2: Purely sequential methodology under the model (2.18) with k = 1 and multiple crossing methodology (2.5) with k = 2, 3, 4, 5, 6, 7, 8: n_0 = 10, α = 0.05, and p = 6.