Statistical Inference for Weibull Distribution Based on a Modified Progressive Type-II Censoring Scheme

In this paper, a modified progressive Type-II censoring scheme is introduced. Relationships between the modified progressive Type-II censoring scheme and a randomized Type-II censoring scheme are discussed. The maximum likelihood estimators (MLEs) of the parameters of the Weibull distribution are derived, and the EM algorithm is used to obtain the estimates as well as the asymptotic variance-covariance matrix. Monte Carlo simulation is used to evaluate the performance of the MLEs in terms of biases, mean square errors and some commonly used optimality criteria in experimental design. Finally, a numerical example is provided to illustrate the methodology presented here.


Introduction
Because of the increase in global competition and customer expectations for reliable products, manufacturers need to improve product quality and reliability and to provide longer warranty periods in order to stay competitive in the market. In reliability engineering, knowledge about the failure time distribution of a product is valuable in determining the optimal warranty policy (Wu and Huang, 2012) and in improving product design. To obtain knowledge about the product lifetime distribution, life-testing experiments are run at different product development and testing stages before the product is put on the market.
When the products are extremely reliable or the cost of failing a product is high, a censoring scheme is usually imposed in the life-testing experiment in order to save time and cost. One of the commonly used censoring schemes is the Type-II right censoring scheme, in which the life-testing experiment is terminated as soon as the m-th failure is observed (where m is pre-fixed). Then, only the first m failures out of the n units under test are observed. The data obtained from such a restrained life test are referred to as a Type-II censored sample.
A generalization of the Type-II censoring scheme, called the progressive Type-II censoring scheme, has been proposed in the literature. Under the progressive censoring scheme, n independent units are placed on a life test and m failures are to be observed. Immediately following the first failure time (say, $X^{R}_{1:m:n}$), $R_1$ of the surviving units are randomly selected and removed from the experiment. Then, immediately following the second failure time (say, $X^{R}_{2:m:n}$), $R_2$ of the surviving units are removed, and so on. The experiment terminates when the m-th failure is observed, and the remaining $R_m = n - m - R_1 - \cdots - R_{m-1}$ surviving units are censored. Here, $X^{R}_{1:m:n} < X^{R}_{2:m:n} < \cdots < X^{R}_{m:m:n}$ denote the progressively censored failure times, where $R = (R_1, \ldots, R_m)$ is the progressive censoring scheme. Two books dedicated to progressive censoring were written by Balakrishnan and Aggarwala (2000) and Balakrishnan and Cramer (2014), and a comprehensive review of the subject was provided in a discussion paper by Balakrishnan (2007). Inference and generalizations based on the progressive censoring scheme were studied by Cohen (1963) and Mann (1971), among others.
In this paper, a simple modification of progressive censoring, which can be viewed as a randomized Type-II censoring scheme, is developed.
The main objective of this paper is to compare the proposed modification with the progressive Type-II censoring scheme. For this purpose, the performance of the estimators of the Weibull parameters based on data obtained from the modified censoring scheme and from the progressive censoring scheme is examined. The rest of this paper is organized as follows. In Section 2, we describe the formulation of the modified censoring scheme (MCS) and the relationship between the MCS and a randomized Type-II censoring scheme. In Section 3, we consider Weibull lifetimes and obtain the maximum likelihood estimates (MLEs) of the parameters based on log-lifetimes under the MCS by the EM algorithm. The Fisher information matrix of the MLEs is also obtained by using the missing information principle. Section 4 presents the results of a Monte Carlo simulation study for assessing the performance of the MLEs under the MCS. A numerical example is presented in Section 5 to illustrate the methodology. In Section 6, some concluding remarks are provided.

Modified Censoring Scheme
In a progressively Type-II censored experiment, n independent units are placed on a life test simultaneously. Suppose that the experimenter decides to observe m (≤ n) failures and the progressive censoring scheme $R = (R_1, R_2, \ldots, R_m)$ is prefixed. Then $R_i$ units are randomly removed from the test as soon as the i-th failure is observed (i = 1, 2, ..., m). Denote the i-th observed failure time from this scheme by $X^{R}_{i:m:n}$; the exact failure times of the $R_i$ censored units cannot be observed beyond $X^{R}_{i:m:n}$, i = 1, 2, ..., m. In the modified censoring scheme proposed here, we allow the $R_i$ (i = 1, 2, ..., m − 1) removed units to be monitored until the end of the progressively censored experiment, i.e., until $X^{R}_{m:m:n}$ is observed. Hence, the exact failure times of some of the $\sum_{i=1}^{m-1} R_i$ censored units in the progressively censored experiment can be recorded in addition to the progressively censored order statistics. The total number of failures observed up to and including $X^{R}_{m:m:n}$, say m + K, is therefore a random variable, where K is a discrete random variable with support {0, 1, 2, ..., n − m}.
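The mechanics described above can be sketched in a few lines of Python. This is only an illustrative simulation, not the authors' code: the function name `simulate_mcs`, the argument layout and the Weibull shape used in the usage lines are assumptions made for the example. The latent lifetimes of all n units are taken as known to the simulator, so that the withdrawn units can be "monitored" until the termination time.

```python
import numpy as np

def simulate_mcs(lifetimes, scheme, rng):
    """Run one progressively Type-II censored test and its modified version.

    lifetimes : latent failure times of the n units (assumed distinct)
    scheme    : (R_1, ..., R_m) with sum(R) = n - m
    Returns (progressive failure times, MCS failure times, K).
    """
    lifetimes = np.asarray(lifetimes, dtype=float)
    n, m = len(lifetimes), len(scheme)
    if sum(scheme) != n - m:
        raise ValueError("scheme must satisfy sum(R) = n - m")
    on_test = list(np.argsort(lifetimes))   # unit indices ordered by lifetime
    prog = []
    for r in scheme:
        failed = on_test.pop(0)             # next failure among units still on test
        prog.append(lifetimes[failed])
        if r:
            # withdraw r randomly chosen survivors from the progressive experiment
            drop = set(rng.choice(len(on_test), size=r, replace=False).tolist())
            on_test = [u for i, u in enumerate(on_test) if i not in drop]
    end = prog[-1]                          # termination time X^R_{m:m:n}
    # under the MCS every unit that fails by the termination time is recorded
    mcs = np.sort(lifetimes[lifetimes <= end])
    return np.array(prog), mcs, len(mcs) - m

# Usage with hypothetical inputs: n = 15 units, m = 5, all removals at the first failure
rng = np.random.default_rng(0)
x = rng.weibull(1.5, size=15)
prog, mcs, k = simulate_mcs(x, (10, 0, 0, 0, 0), rng)
```

The last MCS failure time always coincides with the last progressively censored failure time, which is the relation between the two schemes used in the rest of this section.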
This modified censoring scheme (MCS) can be viewed as a randomized Type-II censoring scheme in which the first m + K failures are observed among the n units, with K being a discrete random variable with support {0, 1, 2, ..., n − m} and probability mass function (p.m.f.) $\Pr(K = k) = p_k$, $k = 0, 1, \ldots, n-m$, where $0 \le p_k \le 1$ and $\sum_{k=0}^{n-m} p_k = 1$. Therefore, the observed ordered failures from the MCS can be denoted as $X_{1:n} < X_{2:n} < \cdots < X_{m+K:n}$, where $X_{i:n}$ is the i-th ordinary order statistic from a sample of size n. Note that $X_{m+K:n} \overset{d}{=} X^{R}_{m:m:n}$. Here, the p.m.f. of K is determined by the progressive censoring scheme R.
This can be seen, for example, by observing that when $R_{m-1} = 1$, the event {K = 1} is equivalent to the item removed at the time of the (m − 1)-th failure being the item with the shortest lifetime among the (n − m + 1) surviving items. In general, the p.m.f. of K can be expressed as (Ng et al., 2014) $\Pr(K = k) = \Pr(X^{R}_{m:m:n} = X_{m+k:n})$ for $k = 0, 1, \ldots, n-m$, where $a_1 = 1$, $a_m = m + k$, $b_i = \min\{a_{i-1} + \sum_{j=1}^{i} R_j + 1,\ n\}$. Since the number of observed failures under the MCS must be greater than or equal to m, one should expect to obtain the same or more information about the lifetime distribution compared to a progressive Type-II censoring scheme. The MCS guarantees that at least m observed failures are available for statistical inference, and the total time required for the experiment is determined by a chance process.
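As a rough numerical check of how the scheme R determines the p.m.f. of K, the distribution can be approximated by Monte Carlo. The sketch below is an assumption-laden illustration rather than the closed-form expression of Ng et al. (2014): it relies on the fact that, for i.i.d. continuous lifetimes and purely random removals, only the ranks of the latent lifetimes matter, so no lifetime distribution has to be specified. The names `simulate_k` and `pmf_of_k` are hypothetical.

```python
import numpy as np

def simulate_k(n, scheme, rng):
    """Draw one value of K for the scheme R = (R_1, ..., R_m)."""
    m = len(scheme)
    on_test = list(range(1, n + 1))    # ranks of latent lifetimes, 1 = shortest
    last = None
    for r in scheme:
        last = on_test.pop(0)          # smallest rank still on test fails next
        if r:
            drop = set(rng.choice(len(on_test), size=r, replace=False).tolist())
            on_test = [u for i, u in enumerate(on_test) if i not in drop]
    return last - m                    # X^R_{m:m:n} = X_{m+K:n}, so K = rank - m

def pmf_of_k(n, scheme, n_rep=20000, seed=1):
    """Monte Carlo estimate of Pr(K = k), k = 0, ..., n - m."""
    rng = np.random.default_rng(seed)
    m = len(scheme)
    draws = [simulate_k(n, scheme, rng) for _ in range(n_rep)]
    return np.bincount(draws, minlength=n - m + 1) / n_rep

# Special case from the text: one unit removed at the (m - 1)-th failure
pmf_single = pmf_of_k(6, (0, 0, 0, 1, 0))
# Conventional Type-II censoring: all removals at the m-th failure
pmf_type2 = pmf_of_k(9, (0, 0, 0, 5))
```

In the first usage line there are n − m + 1 = 2 survivors when the single unit is removed, so the estimate of Pr(K = 1) should be close to 1/2; in the second, K is degenerate at 0.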
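Suppose that the lifetimes of the units are independent and identically distributed (i.i.d.) with probability density function (p.d.f.) $f(x; \theta)$ and cumulative distribution function (c.d.f.) $F(x; \theta)$, where θ is the parameter vector. Then, given K = k, the conditional likelihood function based on a sample obtained from the MCS can be written as
$$L(\theta \mid K = k) = \frac{n!}{(n-m-k)!} \left[\prod_{i=1}^{m+k} f(x_{i:n}; \theta)\right] \left[1 - F(x_{m+k:n}; \theta)\right]^{n-m-k}, \quad x_{1:n} < x_{2:n} < \cdots < x_{m+k:n},\ \theta \in \Theta,$$
where $x_{i:n}$ is the observed value of $X_{i:n}$ and Θ is the parameter space.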

Statistical Estimation with Weibull Lifetimes
Suppose n independent units are placed on a life test with the corresponding lifetimes $X_1, X_2, \ldots, X_n$ being identically distributed. We assume that $X_i$, i = 1, 2, ..., n, are i.i.d. Weibull distributed with p.d.f. and c.d.f.
$$f_X(x; \lambda, \beta) = \frac{\beta}{\lambda}\left(\frac{x}{\lambda}\right)^{\beta-1} \exp\left[-\left(\frac{x}{\lambda}\right)^{\beta}\right] \quad \text{and} \quad F_X(x; \lambda, \beta) = 1 - \exp\left[-\left(\frac{x}{\lambda}\right)^{\beta}\right], \quad x > 0,$$
respectively, where λ > 0 is the scale parameter and β > 0 is the shape parameter. For 0 < β < 1, the Weibull distribution has a decreasing hazard function. For β > 1, the Weibull distribution has an increasing hazard function, and for β = 1, the Weibull distribution reduces to an exponential distribution, which has a constant hazard rate. Here, we consider the log-transformed lifetime Y = ln X, which has an extreme-value distribution with location parameter µ = ln λ and scale parameter $\sigma = \beta^{-1}$. Specifically, the p.d.f. and c.d.f. of Y are
$$f_Y(y; \theta) = \frac{1}{\sigma} \exp\left[\frac{y-\mu}{\sigma} - \exp\left(\frac{y-\mu}{\sigma}\right)\right] \quad \text{and} \quad F_Y(y; \theta) = 1 - \exp\left[-\exp\left(\frac{y-\mu}{\sigma}\right)\right]$$
for $y \in \mathbb{R}$, $\mu \in \mathbb{R}$, σ > 0, respectively, where $\theta = (\mu, \sigma)'$. Since the extreme-value distribution is a member of the location-scale family of distributions and is easier to work with, the log-lifetimes are used in the statistical analysis (see, for example, Ng et al., 2002). Under the progressive Type-II censoring scheme, the MLEs of the Weibull parameters λ and β (or equivalently µ and σ) and the Fisher information matrix were obtained by Ng et al. (2002) using the EM algorithm and the missing information principle, respectively. Since the MCS is a randomized Type-II censoring scheme, given the value K = k, conditional statistical inference can be carried out by using the results for conventional Type-II censoring, which are well developed in the literature. Therefore, we only briefly present the computational formulae for the MLEs and the Fisher information matrix, as well as the steps required for the EM algorithm if it is chosen to obtain the MLEs.
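The correspondence between the Weibull model for X and the extreme-value model for Y = ln X can be checked numerically. The following sketch assumes the standard two-parameter Weibull c.d.f. 1 − exp[−(x/λ)^β] and the smallest extreme-value c.d.f. 1 − exp[−exp((y − µ)/σ)]; the function names and the evaluation points are illustrative choices, and the parameter values are simply the complete-sample estimates quoted later in the example.

```python
import numpy as np

def weibull_cdf(x, lam, beta):
    """Weibull c.d.f. with scale lam and shape beta."""
    return 1.0 - np.exp(-(x / lam) ** beta)

def sev_cdf(y, mu, sigma):
    """Smallest extreme-value c.d.f. with location mu and scale sigma."""
    return 1.0 - np.exp(-np.exp((y - mu) / sigma))

def sev_pdf(y, mu, sigma):
    """Smallest extreme-value p.d.f. with location mu and scale sigma."""
    z = (y - mu) / sigma
    return np.exp(z - np.exp(z)) / sigma

lam, beta = 25.936, 0.561                 # illustrative Weibull parameters
mu, sigma = np.log(lam), 1.0 / beta       # mu = ln(lambda), sigma = 1 / beta

x = np.array([0.27, 1.0, 10.0, 100.0])    # arbitrary evaluation points
same_cdf = np.allclose(weibull_cdf(x, lam, beta), sev_cdf(np.log(x), mu, sigma))
```

Because (x/λ)^β = exp[(ln x − ln λ)β], the two c.d.f.s agree at every x > 0, which is why inference on (µ, σ) is equivalent to inference on (λ, β).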

Maximum Likelihood Estimates
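Given K = k, based on the ordered log-lifetimes $Y_{1:n} < Y_{2:n} < \cdots < Y_{m+k:n}$, the MLE of the parameter σ can be obtained by solving the following equation with respect to σ:
$$\sigma = \frac{\sum_{i=1}^{m+k} y_{i:n}\, e^{y_{i:n}/\sigma} + (n-m-k)\, y_{m+k:n}\, e^{y_{m+k:n}/\sigma}}{\sum_{i=1}^{m+k} e^{y_{i:n}/\sigma} + (n-m-k)\, e^{y_{m+k:n}/\sigma}} - \frac{1}{m+k} \sum_{i=1}^{m+k} y_{i:n}.$$
Numerical methods, such as the Newton-Raphson method, can be used to solve this non-linear equation. After obtaining the MLE of σ, say $\hat{\sigma}$, the MLE of µ can be obtained as $\hat{\mu} = g(\hat{\sigma})$, where
$$g(\sigma) = \sigma \ln\left\{\frac{1}{m+k}\left[\sum_{i=1}^{m+k} e^{y_{i:n}/\sigma} + (n-m-k)\, e^{y_{m+k:n}/\sigma}\right]\right\}.$$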
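A minimal sketch of the direct numerical solution is given below. It assumes the usual Type-II censored likelihood equations for the extreme-value distribution, with r = m + k observed log-lifetimes out of n, and uses Brent's root-finding method from SciPy in place of Newton-Raphson, so it should be read as one possible implementation rather than the procedure used in the paper. The function name `type2_sev_mle` and the bracketing rule are assumptions of this example.

```python
import numpy as np
from scipy.optimize import brentq

def type2_sev_mle(y, n):
    """MLEs (mu, sigma) from the r smallest log-lifetimes of n units."""
    y = np.sort(np.asarray(y, dtype=float))
    r, top = len(y), y[-1]

    def sums(sigma):
        # exponentials are rescaled by exp(top / sigma) for numerical stability
        w = np.exp((y - top) / sigma)
        s0 = w.sum() + (n - r)              # the n - r censored units sit at y = top
        s1 = (y * w).sum() + (n - r) * top
        return s0, s1

    def score(sigma):
        s0, s1 = sums(sigma)
        return s1 / s0 - y.mean() - sigma   # zero at the MLE of sigma

    # expand a bracket around the root: score > 0 near 0 and < 0 for large sigma
    lo = hi = top - y[0]
    while score(lo) <= 0:
        lo /= 2
    while score(hi) >= 0:
        hi *= 2
    sigma_hat = brentq(score, lo, hi)
    mu_hat = top + sigma_hat * np.log(sums(sigma_hat)[0] / r)
    return mu_hat, sigma_hat

# Usage on simulated data with mu = 0 and sigma = 1 (hypothetical sample sizes)
rng = np.random.default_rng(7)
n, r = 2000, 1500
y_all = np.sort(np.log(rng.exponential(size=n)))   # standard extreme-value sample
mu_hat, sigma_hat = type2_sev_mle(y_all[:r], n)
```

Root bracketing is used here only because it does not need derivatives of the score; Newton-Raphson, as mentioned in the text, would typically converge in fewer iterations from a good starting value.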
Besides direct numerical optimization, the Expectation-Maximization (EM) algorithm can be used as an alternative tool to obtain the MLEs of µ and σ. One advantage of the EM algorithm is that the second derivatives of the log-likelihood are not required to implement the EM iteration. It is especially useful if the complete data set is easy to analyze (see, for example, Adamidis). The analysis of the data from the MCS can be viewed as an incomplete data problem. We first denote the observed and censored (missing) data by $Y = (Y_{1:n}, Y_{2:n}, \ldots, Y_{m+k:n})$ and $Z = (Z_1, Z_2, \ldots, Z_{n-m-k})$, respectively, and combine Y and Z to form the complete data W. The E-step of the (h + 1)-th EM iteration requires the conditional expectations of the functions of the censored log-lifetimes $Z_j$ that appear in the complete-data log-likelihood, given $Z_j > y_{m+k:n}$ and evaluated at the current estimate $\theta^{(h)}$. The M-step of an EM iteration maximizes the likelihood based on the complete sample over θ, with the missing values replaced by their conditional expectations. Thus, the (h + 1)-th EM iteration yields the updated estimate $\theta^{(h+1)} = (\mu^{(h+1)}, \sigma^{(h+1)})'$ as the solution of the resulting complete-sample likelihood equations.

Asymptotic Variance-Covariance Matrix
Louis (1982) developed a procedure for extracting the observed information matrix when the EM algorithm is used to obtain the MLEs based on incomplete data. The idea of the missing information principle can be expressed as (Louis, 1982; Tanner, 1993)
Observed information = Complete information − Missing information.
In this subsection, we describe the use of the missing information principle for computing the conditional variance-covariance matrix of the MLEs under the MCS, given K = k. The observed information, complete information and missing information are denoted by $I_Y(\theta)$, $I_W(\theta)$ and $I_{Z|Y}(\theta)$, respectively.
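The complete information $I_W(\theta)$ is given by
$$I_W(\theta) = \frac{n}{\sigma^2} \begin{pmatrix} 1 & 1-\gamma \\ 1-\gamma & c_2 \end{pmatrix},$$
where γ = 0.577215665... is Euler's constant and $c_2 = \pi^2/6 + (1-\gamma)^2$. The Fisher information matrix in one observation that is censored at the time of the m-th failure, say $I^{(1)}_{Z|Y}(\theta)$, is computed from the conditional distribution of a censored log-lifetime given that it exceeds the censoring time. Since, given K = k, all n − m − k censored units share this censoring time, the missing information matrix is
$$I_{Z|Y}(\theta) = (n-m-k)\, I^{(1)}_{Z|Y}(\theta).$$
Therefore, by the missing information principle, the observed information matrix is
$$I_Y(\theta) = I_W(\theta) - I_{Z|Y}(\theta).$$
Hence, the asymptotic variance-covariance matrix of the MLEs of µ and σ can be obtained by inverting the observed information matrix $I_Y(\theta)$. Under some mild regularity conditions, the MLE $\hat{\theta} = (\hat{\mu}, \hat{\sigma})'$ is approximately bivariate normal with mean $\theta = (\mu, \sigma)'$ and variance-covariance matrix $I_Y^{-1}(\theta)$.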
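To make the complete-information term concrete, the sketch below evaluates it and inverts it, assuming the standard Fisher information of the extreme-value distribution, (n/σ²) [[1, 1 − γ], [1 − γ, π²/6 + (1 − γ)²]]. The missing-information term is not coded here because its closed form is not reproduced in this section; the function names are illustrative.

```python
import numpy as np

def complete_information(n, sigma):
    """Complete-sample Fisher information for theta = (mu, sigma)."""
    a = 1.0 - np.euler_gamma               # 1 - Euler's constant
    c2 = np.pi ** 2 / 6.0 + a ** 2
    return (n / sigma ** 2) * np.array([[1.0, a], [a, c2]])

def asymptotic_cov(info):
    """Asymptotic variance-covariance matrix as the inverse information."""
    return np.linalg.inv(info)

# Usage: per-observation matrix (n = 1, sigma = 1) and its inverse
info_unit = complete_information(1, 1.0)
cov_unit = asymptotic_cov(info_unit)
```

For a censored sample the same inversion would be applied to the observed information, i.e., to the complete information minus the missing information, so the variances obtained from the complete matrix serve as lower bounds for those under any censoring scheme.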

Monte Carlo Simulation Study
To evaluate the performance of the MLEs and the effect of the choice of progressive censoring scheme, a Monte Carlo simulation study is conducted. Without loss of generality, we consider the parameter setting µ = 0 and σ = 1 with different values of n and m and different censoring schemes. Based on 1,000 simulations, the estimated biases, mean square errors (MSEs) and the average number of iterations of the EM algorithm required to obtain the MLEs under the MCS and the conventional progressive censoring scheme are presented in Table 1. Here, convergence is assumed when the absolute differences between successive estimates are less than $10^{-5}$. We also present the expected number of observed failures for the MCS (i.e., E(K) + m) in Table 1. As expected, the biases and MSEs of the MLEs based on the MCS are smaller than or equal to those based on the progressive censoring scheme with the same values of n and m. It is also observed that the average number of EM iterations required to obtain the MLEs based on the MCS is less than or equal to that based on the progressive censoring scheme in all cases considered here. This is due to the fact that the number of observed failures is always m for the progressive censoring scheme, while the number of observed failures in the MCS is a random variable that is always greater than or equal to m. The MCS and the progressive censoring scheme are equivalent when R = (0, ..., 0, n − m), which is also the conventional Type-II censoring scheme.
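A scaled-down version of such a study can be sketched as follows. It is not the simulation behind Table 1: it uses a direct root-finding solver instead of the EM algorithm, 500 replications instead of 1,000, and it compares an MCS with R = (n − m, 0, ..., 0) only against the scheme R = (0, ..., 0, n − m), for which the MCS and progressive censoring coincide. All function names, the seed and the chosen n and m are assumptions of this example.

```python
import numpy as np
from scipy.optimize import brentq

def sev_mle(y, n):
    """MLEs (mu, sigma) from the r smallest of n extreme-value log-lifetimes."""
    r, top = len(y), y[-1]
    def sums(s):
        w = np.exp((y - top) / s)            # rescaled for numerical stability
        return w.sum() + (n - r), (y * w).sum() + (n - r) * top
    def score(s):
        s0, s1 = sums(s)
        return s1 / s0 - y.mean() - s
    lo = hi = top - y[0]
    while score(lo) <= 0:
        lo /= 2
    while score(hi) >= 0:
        hi *= 2
    sigma = brentq(score, lo, hi)
    return top + sigma * np.log(sums(sigma)[0] / r), sigma

def draw_k(n, scheme, rng):
    """Draw K using only the ranks of the latent lifetimes."""
    on_test, last = list(range(1, n + 1)), None
    for r in scheme:
        last = on_test.pop(0)
        if r:
            drop = set(rng.choice(len(on_test), size=r, replace=False).tolist())
            on_test = [u for i, u in enumerate(on_test) if i not in drop]
    return last - len(scheme)

def bias_mse(n, scheme, n_rep=500, seed=2024):
    """Simulated bias and MSE of (mu_hat, sigma_hat) under the MCS; truth (0, 1)."""
    rng_y = np.random.default_rng(seed)      # same samples for every scheme
    rng_k = np.random.default_rng(seed + 1)  # separate stream for the removals
    m, truth = len(scheme), np.array([0.0, 1.0])
    est = np.empty((n_rep, 2))
    for i in range(n_rep):
        y = np.sort(np.log(rng_y.exponential(size=n)))
        k = draw_k(n, scheme, rng_k)
        est[i] = sev_mle(y[: m + k], n)      # first m + K order statistics
    return est.mean(axis=0) - truth, ((est - truth) ** 2).mean(axis=0)

bias_t2, mse_t2 = bias_mse(20, (0,) * 7 + (12,))    # Type-II: K = 0 always
bias_mcs, mse_mcs = bias_mse(20, (12,) + (0,) * 7)  # early removals: K is random
```

Drawing K independently of the ordered sample is legitimate here because, for i.i.d. continuous lifetimes, the ranks that determine K are independent of the order statistics themselves.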
In order to assess the accuracy of the approximation of the variances and covariance of the MLEs determined from the information matrix, the aforementioned simulation study is used. For the MCS, the simulated values of $Var(\hat{\mu})$, $Var(\hat{\sigma})$ and $Cov(\hat{\mu}, \hat{\sigma})$, as well as the estimated values obtained by averaging the corresponding entries of $I_Y^{-1}$, are presented in Table 2. For the conventional progressive censoring schemes, the results are presented in Table 3.
Besides the biases and MSEs of the MLEs, we also compare the MCS and the progressive censoring scheme based on some commonly used optimality criteria in the design of experiments. Since the parameter vector θ = (µ, σ) is two-dimensional, optimality can be defined in terms of the following criteria (see, for example, Wu et al., 2008):
(1) D-optimality: Maximizing the determinant of the Fisher information matrix. The volume of the asymptotic joint confidence region for θ is proportional to $|I(\theta)|^{-1/2}$, so that maximizing this determinant is equivalent to minimizing the volume of the confidence region. Consequently, a larger value of the determinant of the Fisher information matrix corresponds to higher joint precision of the estimators of θ.
(2) A-optimality: Maximizing the trace of the Fisher information matrix. This optimality criterion is also known as the trace criterion. It maximizes the sum of the diagonal entries of the Fisher information matrix. This means that the A-optimality criterion does not use all the available information about the parameters, since the off-diagonal entries are ignored.
(3) E-optimality: Maximizing the smallest eigenvalue of the Fisher information matrix. This criterion maximizes the smallest non-zero eigenvalue of the Fisher information matrix. This also means that not all the available information is used.
The average values of the objective functions for the D-, A- and E-optimality criteria based on different MCSs and progressive censoring schemes are presented in Table 4. Since the number of observed failures for the MCS is always greater than or equal to m, the values of the objective functions based on the MCS are at least as good as those based on the progressive censoring scheme with the same R.
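The three objective functions are simple functions of a 2 × 2 information matrix and can be computed as in the sketch below. The function name `optimality_criteria` and the numerical matrix in the usage line are hypothetical, and the E-criterion is taken here to be the smallest eigenvalue, which is an assumption about the convention in use.

```python
import numpy as np

def optimality_criteria(info):
    """D-, A- and E-criterion values of a symmetric information matrix."""
    info = np.asarray(info, dtype=float)
    eigenvalues = np.linalg.eigvalsh(info)   # ascending order for symmetric input
    return {
        "D": float(np.linalg.det(info)),     # determinant
        "A": float(np.trace(info)),          # sum of the diagonal entries
        "E": float(eigenvalues[0]),          # smallest eigenvalue (assumed convention)
    }

# Usage with a made-up information matrix
crit = optimality_criteria([[2.0, 0.0], [0.0, 3.0]])
```

Larger values of each entry indicate a more informative scheme under the corresponding criterion, so two censoring schemes can be compared by evaluating this function at their respective observed information matrices.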

Illustrative Example
A real dataset from Lawless (1982) is used to illustrate the methodology developed in this paper. The dataset contains the failure times of n = 15 electrical insulating fluids that were subjected to a 32 kV voltage stress. The dataset is presented in Table 5. If the lifetimes are assumed to be Weibull distributed, the MLEs of the Weibull parameters based on the complete dataset are $\hat{\lambda} = 25.936$ and $\hat{\beta} = 0.561$. The Kolmogorov-Smirnov goodness-of-fit test statistic based on the complete dataset is 0.135 with p-value = 0.913, which indicates that the Weibull distribution is a reasonable model for this dataset.
Suppose a progressive censoring plan with m = 5 and censoring scheme R = (10, 0, 0, 0, 0) is imposed in this life-testing experiment. Under this scheme, the progressively censored sample is obtained by removing 10 randomly selected units from the test when the first failure occurs at $x^{R}_{1:5:15} = 0.27$ (i.e., at log-lifetime $y^{R}_{1:5:15} = -1.3093$). The corresponding log-lifetimes and the 10 units randomly selected at the first observed failure (denoted by asterisks) are presented in Table 6. Hence, the progressively censored ordered log-lifetimes based on the progressive censoring scheme are (-1.3093, -0. Based on the random generation described above, we obtained k = 8, so the first 13 failures are observed. The MLEs, the observed Fisher information matrices, the variance-covariance matrices and the values of the objective functions for the D-, A- and E-optimality criteria based on the complete sample (n = 15), the progressively censored sample (m = 5) and the sample obtained from the MCS (m + k = 13) are presented in Table 8.

Conclusions
The progressive censoring scheme and the modified censoring scheme with the same censoring scheme R are terminated at the same time, but the MCS collects more data on average. The relationship between the MCS and a randomized Type-II censoring scheme is discussed. Since more failures are observed under the MCS, the simulation results show that the estimation of the parameters based on the MCS is more efficient than that based on the progressive censoring scheme. In brief, if the experimenter wants to estimate the distribution parameters efficiently and to randomize the number of observed failures so that it is at least m, then the MCS is a suitable censoring scheme for the life test.
Table 7. Probability mass function of K in the illustrative example (the number of observed failures is m + K) based on the MCS