Some Estimation Procedures in Presence of Non-Response in Two-Phase Sampling

The present investigation deals with the problem of estimating the population mean in the presence of non-response under two-phase (double) sampling. Following the technique of sub-sampling of the non-responding group adopted by Hansen and Hurwitz (1946) and using information on two auxiliary variables, four general classes of estimators are suggested for four different situations of non-response, and their properties are studied. It is shown that several estimators can be generated from the proposed classes of estimators. The proposed strategies are compared with some contemporary estimators of the population mean in the presence of non-response. The results obtained are illustrated numerically through empirical studies, which demonstrate the effectiveness of the suggested classes of estimators.


Introduction
In surveys covering human populations, information is in most cases not obtained from all the units in the survey at the first attempt, even after some call-backs. An estimate obtained from such incomplete data may be misleading, especially when the respondents differ from the non-respondents. In order to reduce the effect of non-response in such situations, Hansen and Hurwitz (1946) gave a technique of sub-sampling of the non-responding group. It is a well-known fact that in sample surveys the precision in estimating the population mean may be increased by using information on single or multiple auxiliary variables. Following the Hansen and Hurwitz (1946) technique, several authors, including Cochran (1977), Rao (1986), Khare and Srivastava (1993, 1995), Okafor and Lee (2000), Khan (2004, 2006) and Singh and Kumar (2010 a), have contributed towards the improvement of the estimation procedure of the population mean in the presence of non-response using information on an auxiliary variable. Olkin (1958), Mohanty (1967), Srivastava (1971), Singh and Kumar (2010 b) and others have extended the ratio method of estimation to the case where multiple auxiliary variables are used to increase the precision of estimates.
In many situations, information on the auxiliary variable may be readily available for all the units of the population; for example, the tonnage (or seat capacity) of each vehicle or ship is known in transportation surveys, and the number of beds in different hospitals may be known in hospital surveys. When such information is lacking, it is sometimes relatively cheap to take a large preliminary sample in which the auxiliary variable alone is measured. This technique is known as double sampling or two-phase sampling. Two-phase sampling is a powerful and cost-effective technique for obtaining, from the first-phase (preliminary) sample, reliable estimates of the unknown population parameters of the auxiliary variables.
For example, Tabasum and Khan (2004) have mentioned that the procedure of double sampling can be applied in a household survey where the household size is used as an auxiliary variable for the estimation of family expenditure. Information can be obtained completely on the family size, while there may be non-response on the household expenditure. Motivated by the above arguments and following the technique of sub-sampling of the non-responding group, we propose four general classes of estimators for four different situations of non-response in two-phase sampling using information on two auxiliary variables, and study their properties. It is shown that several estimators can be generated from the proposed classes of estimators. The superiority of the proposed classes of estimators over some existing estimators is established through theoretical and empirical comparisons.

Proposed Classes of Estimators
Consider a finite population $U = (U_1, U_2, \ldots, U_N)$ of $N$ units, where $y$, $x$ and $z$ are the study variable, the first auxiliary variable and the second auxiliary variable respectively, with population means $\bar{Y}$, $\bar{X}$ and $\bar{Z}$. Let $y_k$, $x_k$ and $z_k$ be the values of $y$, $x$ and $z$ for the $k$-th ($k = 1, 2, \ldots, N$) unit in the population. If information on an auxiliary variable $x$, whose population mean is known and which is highly correlated with $y$, is readily available for all the units of the population, it is well known that regression and ratio type estimators of the population mean $\bar{Y}$ perform well. However, in certain practical situations the population mean $\bar{X}$ is not known a priori, and in such cases the technique of two-phase sampling is useful. Thus, to estimate the population mean $\bar{X}$, a first phase sample of size $n'$ is drawn from the entire population $U$ by simple random sampling without replacement (SRSWOR), and a second phase sample of size $n$ (i.e., $n' > n$) is selected from the first phase sample by SRSWOR, and the variable $y$ under investigation is measured on it. If there is non-response in the second phase sample, one may form an estimator by utilizing the information only from the respondents, or take a sub-sample of the non-respondents and re-contact them. We assume that in the first phase sample of size $n'$ all the units supplied information on the auxiliary variables $x$ and $z$, and that in the second phase sample of size $n$, $n_1$ units supply information on $y$ and $n_2$ refuse to respond. Following Hansen and Hurwitz (1946), a sub-sample is drawn from the non-responding group $u_2$ of the second phase sample and re-contacted. The sub-sample of $u_2$ will be denoted by $u_{2m}$.
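The sampling scheme described above can be sketched in code. The following is a minimal illustration, assuming the standard Hansen and Hurwitz (1946) weighting $(n_1 \bar{y}_1 + n_2 \bar{y}_{2m})/n$ for combining respondents with the re-contacted sub-sample; the function names `draw_two_phase_sample` and `hansen_hurwitz_mean` and the example values are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean


def draw_two_phase_sample(population_ids, n_first, n_second, rng):
    """Draw a first phase SRSWOR sample, then a second phase SRSWOR sample from it.

    Hypothetical helper: population_ids is a list of unit labels.
    """
    if not (0 < n_second < n_first <= len(population_ids)):
        raise ValueError("need 0 < n_second < n_first <= N")
    first_phase = rng.sample(population_ids, n_first)   # without replacement
    second_phase = rng.sample(first_phase, n_second)    # subset of first phase
    return first_phase, second_phase


def hansen_hurwitz_mean(respondent_values, subsample_values, n2):
    """Hansen-Hurwitz type estimate: (n1 * mean_1 + n2 * mean_2m) / n.

    respondent_values: study-variable values from the n1 respondents.
    subsample_values:  values from the re-contacted sub-sample of non-respondents.
    n2:                number of non-respondents in the second phase sample.
    """
    n1 = len(respondent_values)
    n = n1 + n2
    if n == 0:
        raise ValueError("second phase sample is empty")
    if n2 > 0 and not subsample_values:
        raise ValueError("non-respondents exist but no sub-sample was observed")
    part_respondents = n1 * mean(respondent_values) if n1 else 0.0
    part_nonrespondents = n2 * mean(subsample_values) if n2 else 0.0
    return (part_respondents + part_nonrespondents) / n


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the illustration is repeatable
    first, second = draw_two_phase_sample(list(range(200)), 60, 20, rng)
    print(len(first), len(second))
    # Three respondents, two non-respondents, one of whom is re-contacted.
    print(hansen_hurwitz_mean([10, 12, 14], [20], n2=2))
```

The sub-sample mean stands in for all $n_2$ non-respondents, which is why it is weighted by $n_2$ and not by the sub-sample size.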
If non-response occurs on the study variable $y$ as well as on the auxiliary variable $x$ in the second phase sample, the conventional two-phase ratio, product and regression estimators for the population mean $\bar{Y}$ are considered as $t_1$, $t_2$ and $t_3$ respectively (equations (1)-(3)). But when non-response is observed only on the study variable $y$, while complete information on the auxiliary variable $x$ is available in the second phase sample, the conventional double sampling ratio, product and regression estimators are suggested as

$$t_4 = \frac{\bar{y}^*}{\bar{x}}\,\bar{x}', \qquad (4)$$

$$t_5 = \frac{\bar{y}^*}{\bar{x}'}\,\bar{x}, \qquad (5)$$

and $t_6$ (equation (6)) (Tabasum and Khan, 2006). Here $\bar{y}^*$ is the Hansen and Hurwitz (1946) estimator of $\bar{Y}$, $\bar{x}$ is the second phase sample mean of $x$ and $\bar{x}'$ is the first phase sample mean of $x$. The estimator $t_6$ was first envisaged by Khare and Srivastava (1995), and the estimator $t_3$ was revisited by Okafor and Lee (2000).
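As an illustration of the conventional double sampling estimators for the case where only $y$ is subject to non-response, the sketch below computes ratio, product and regression type estimates from summary statistics. It assumes the textbook double-sampling forms (ratio: $\bar{y}^* \bar{x}'/\bar{x}$; product: $\bar{y}^* \bar{x}/\bar{x}'$; regression: $\bar{y}^* + b(\bar{x}' - \bar{x})$); the function names, argument names and numbers are hypothetical and do not come from the paper.

```python
def ratio_estimator(y_star, x_bar, x_bar_first):
    """Ratio type estimate: y* scaled by (first phase mean of x) / (second phase mean of x)."""
    if x_bar == 0:
        raise ValueError("second phase mean of x must be non-zero")
    return y_star * x_bar_first / x_bar


def product_estimator(y_star, x_bar, x_bar_first):
    """Product type estimate: y* scaled by (second phase mean of x) / (first phase mean of x)."""
    if x_bar_first == 0:
        raise ValueError("first phase mean of x must be non-zero")
    return y_star * x_bar / x_bar_first


def regression_estimator(y_star, x_bar, x_bar_first, b):
    """Regression type estimate: y* adjusted by b times the difference of the two x means."""
    return y_star + b * (x_bar_first - x_bar)


if __name__ == "__main__":
    # Hypothetical summary values: y* is a Hansen-Hurwitz type estimate of the mean of y.
    y_star, x_bar, x_bar_first, b = 15.2, 4.0, 5.0, 2.0
    print(ratio_estimator(y_star, x_bar, x_bar_first))
    print(product_estimator(y_star, x_bar, x_bar_first))
    print(regression_estimator(y_star, x_bar, x_bar_first, b))
```

The ratio form is the natural choice when $y$ and $x$ are positively correlated and the product form when they are negatively correlated, which is the usual motivation for listing both.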
Motivated by the above suggestions and following the two-phase sampling structure defined above, with the assumption that the population mean $\bar{X}$ of the auxiliary variable $x$ is unknown, we propose the following four general classes of estimators of the population mean $\bar{Y}$ of the study variable $y$, applicable to four different situations of non-response.

Situation I
In this case, we assume that non-response occurs on the study variable $y$ as well as on the auxiliary variable $x$ in the second phase sample of size $n$, and that the population mean $\bar{Z}$ of the second auxiliary variable $z$ is known. Accordingly, we suggest a general class of estimators $T_1$ of the population mean $\bar{Y}$ in two-phase sampling, in which $\bar{z}'$ is the sample mean of the variable $z$ based on the first phase sample of size $n'$ and $h_1(\bar{x}', \bar{z}')$ is a class of estimators of $\bar{X}$ using information on $\bar{x}'$ and $\bar{z}'$, such that $h_1(\bar{X}, \bar{Z}) = \bar{X}$.
Ratio, product, regression and exponential type estimators ($i = 1, 2, \ldots, 4$) can be identified as members of the class $T_1$. These members use the estimate of the population (i.e., $U$) regression coefficient $\beta_{xz}$ of $x$ on $z$ based on the first phase sample of size $n'$.

Situation II
In this situation, we assume that non-response occurs on the study variable $y$ as well as on the auxiliary variables $x$ and $z$ in the second phase sample of size $n$, and that the population mean of the auxiliary variable $z$ is unknown. Considering these aspects, we form the general class of estimators of $\bar{Y}$ in two-phase sampling as

$$T_2 = g(\bar{y}^*, \bar{x}^*, \bar{z}^*, \bar{x}', \bar{z}'),$$

where $g(\bar{y}^*, \bar{x}^*, \bar{z}^*, \bar{x}', \bar{z}')$ is a function of $\bar{y}^*$, $\bar{x}^*$, $\bar{z}^*$, $\bar{x}'$ and $\bar{z}'$ satisfying the required regularity conditions, and $\bar{z}^*$ is the Hansen-Hurwitz estimator for the population mean $\bar{Z}$, defined by $\bar{z}^* = (n_1 \bar{z}_1 + n_2 \bar{z}_{2m})/n$, with $\bar{z}_1$ and $\bar{z}_{2m}$ the means of $z$ over the $n_1$ respondents and over the sub-sample of non-respondents respectively.
A few ratio, product, regression and exponential type estimators ($i = 1, 2, \ldots, 4$) are members of the class of estimators $T_2$. These members use the estimate of the population (i.e., $U$) regression coefficient $\beta_{yz}$ of $y$ on $z$.

Situation III
In this case, we assume that non-response occurs only on the study variable $y$, while complete information on the auxiliary variable $x$ is available in the second phase sample of size $n$, and that the population mean $\bar{Z}$ of the second auxiliary variable $z$ is known. Considering this situation, we propose a general class of estimators $T_3$ of the population mean $\bar{Y}$ in two-phase sampling, where $h_2(\bar{x}', \bar{z}')$ is a class of estimators of $\bar{X}$ using information on $\bar{x}'$ and $\bar{z}'$, such that $h_2(\bar{X}, \bar{Z}) = \bar{X}$, and the function $G(\bar{y}^*, \bar{x}, \bar{x}', \bar{z}')$ at the point $(\bar{Y}, \bar{X}, \bar{X}, \bar{Z})$ satisfies regularity conditions similar to those given for $F(\bar{y}^*, \bar{x}^*, \bar{x}', \bar{z}')$ at the point $(\bar{Y}, \bar{X}, \bar{X}, \bar{Z})$ in equation (9).

Situation IV
In this case, we assume that in the second phase sample non-response is found on the study variable $y$ and on the auxiliary variable $z$, with $\bar{Z}$ unknown, while complete information on the auxiliary variable $x$ is available. Considering this situation, we suggest a general class of estimators $T_4$ of the population mean $\bar{Y}$ in two-phase sampling, defined through a function $\psi(\bar{y}^*, \bar{x}, \bar{z}^*, \bar{x}', \bar{z}')$ which, at the point $(\bar{Y}, \bar{X}, \bar{Z}, \bar{X}, \bar{Z})$, satisfies regularity conditions similar to those presented for the class of estimators $T_1$ above.
Proceeding as above, it can also be found that the classes of estimators $T_3$ and $T_4$ are very wide, and estimators belonging to each of the classes $T_3$ and $T_4$ can be identified as their members.

Bias and Mean Square Errors of the Proposed Classes of Estimators
where

$\rho_{yx}, \rho_{yz}, \rho_{xz}$: correlation coefficients between the variables shown in the suffixes, based on the whole population (i.e., $U$),
$C_x, C_y, C_z$: coefficients of variation of the variables $x$, $y$ and $z$ respectively, based on the whole population,
$\rho_{yx(2)}, \rho_{yz(2)}, \rho_{xz(2)}$: correlation coefficients between the variables shown in the suffixes in the non-response group of the population (i.e., $U_2$),

together with the population means and the population mean squares of the variables $x$, $y$ and $z$ respectively in the non-response group of the population.

In the light of the conditions mentioned for $F(\bar{y}^*, \bar{x}^*, \bar{x}', \bar{z}')$ in equation (9), corresponding relations are noted for $\psi(\bar{y}^*, \bar{x}, \bar{z}^*, \bar{x}', \bar{z}')$ at the point $(\bar{Y}, \bar{X}, \bar{Z}, \bar{X}, \bar{Z})$, as $\bar{X}$ is unknown. Taking expectations on both sides of equations (21)-(24) and using the results from equation (17), the biases and mean square errors of the proposed classes of estimators are obtained up to the first order of approximation.

The resulting estimators depend on unknown population parameters such as $C_y, C_x, C_z$, $\rho_{yx}, \rho_{xz}, \rho_{yz}$, $\rho_{yx(2)}, \rho_{xz(2)}, \rho_{yz(2)}$, $C_{x(2)}, C_{z(2)}$ and $C_{y(2)}$. Thus, to use such estimators one has to use guessed or estimated values of these parameters. Guessed values of these population parameters can be obtained either from past data or from experience gathered over time; see Murthy (1967), Reddy (1978) and Tracy et al. (1996). If such guessed values are not available, it is advisable to use sample data to estimate these parameters, as suggested by Gupta and Shabbir (2008). If non-response occurs in the sample data, it is advised to utilize the technique of sub-sampling of the non-responding group to estimate these parameters, as suggested in this paper. After substituting the respective estimated values for the above population parameters, it can be observed that the mean square errors of the resulting estimators are the same (up to the first order of approximation) as those derived.
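Where guessed values of the population parameters are unavailable, sample-based estimates of the coefficients of variation and correlation coefficients are needed. The sketch below shows one plain way to compute such estimates from fully observed sample values; the function names and the data are hypothetical, and the handling of non-response (pooling respondents with the re-contacted sub-sample) is deliberately left out.

```python
from math import sqrt
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample coefficient of variation: sample standard deviation divided by the mean."""
    centre = mean(values)
    if centre == 0:
        raise ValueError("mean is zero, coefficient of variation is undefined")
    return stdev(values) / centre


def correlation_coefficient(xs, ys):
    """Sample (Pearson) correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = mean(xs), mean(ys)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("a variable with zero variance has no correlation")
    return s_xy / sqrt(s_xx * s_yy)


if __name__ == "__main__":
    # Hypothetical sample values of the auxiliary variable x and the study variable y.
    x_sample = [1.0, 2.0, 3.0, 4.0]
    y_sample = [2.1, 3.9, 6.2, 7.8]
    print(coefficient_of_variation(x_sample))
    print(coefficient_of_variation(y_sample))
    print(correlation_coefficient(x_sample, y_sample))
```

The same two functions can be applied separately to the non-response group once values from the sub-sample are available, giving the group-level counterparts of these parameters.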

Efficiency Comparisons of the Proposed Classes of Estimators
It may be noted from equation (48) that $T_1$ is always more precise than $t_2$ when $C_x^2 + 2\rho_{yx} C_y C_x > 0$ and $C_{x(2)}^2 + 2\rho_{yx(2)} C_{y(2)} C_{x(2)} > 0$. Similarly, a corresponding condition can be stated under which $T_2$ is more precise than $t_1$.

Efficiency Comparisons of the Estimators $T_3$ and $T_4$
Now, we compare the efficiencies of the classes of estimators $T_3$ and $T_4$ under their respective optimality conditions with the estimators $\bar{y}^*$, $t_4$, $t_5$ and $t_6$ when there is non-response only on the study variable $y$ but complete information on the auxiliary variable $x$ is available from the second phase sample of size $n$.
(i) It may be noted that the class of estimators $T_3$ is always more efficient than $\bar{y}^*$.
(ii) From equations (36) and (42), it may be seen that the class of estimators $T_3$ is more precise than $t_4$ under a condition on the population parameters. Similarly, from equations (37) and (42)-(44), conditions can be noted under which $T_4$ is more precise than $t_4$. It may be noted from equations (60) and (74) that the dominance of the class of estimators $T_2$ over $t_3$ and of $T_4$ over $t_6$ is difficult to establish theoretically.
However, their performances are examined below through empirical studies carried out over different populations, which establish their superiority over the traditional estimators.

Numerical Illustration
We have chosen three natural population data sets to illustrate the efficacious performances of our proposed classes of estimators. The source of the populations, the nature of the variables y, x, z and the values of the various parameters are given as follows.

Population I-Source: (Khare and Sinha, 2007)
The present data belong to the physical growth of upper [...]

The findings are displayed in Table 2 and Table 3, where we have designated the percent relative efficiency (PRE) of an estimator $T$ with respect to the sample mean estimator $\bar{y}^*$ as

$$\mathrm{PRE} = \frac{V(\bar{y}^*)}{\mathrm{MSE}(T)} \times 100.$$

Table 3: PREs of the different estimators with respect to $\bar{y}^*$ when non-response is observed only on the study variable $y$ in the second phase sample while complete response is available on the auxiliary variable $x$.
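The percent relative efficiency used in the tables can be computed with a one-line helper. This is a minimal sketch assuming the usual definition, the variance of the reference estimator divided by the mean square error of the competing estimator, times 100; the function name and the numbers are hypothetical and are not results from the paper.

```python
def percent_relative_efficiency(variance_reference, mse_estimator):
    """PRE of an estimator relative to a reference: 100 * V(reference) / MSE(estimator)."""
    if mse_estimator <= 0:
        raise ValueError("mean square error must be positive")
    return 100.0 * variance_reference / mse_estimator


if __name__ == "__main__":
    # Hypothetical values: a PRE above 100 means the estimator beats the reference.
    print(percent_relative_efficiency(variance_reference=2.0, mse_estimator=1.6))
```

By construction the reference estimator itself has a PRE of 100, so values in such tables are read as percentage gains or losses against that baseline.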

Conclusions
The following conclusions can be drawn from the present study. (a) [...] conditions they are always preferable to the Hansen and Hurwitz (1946) sample mean estimator $\bar{y}^*$. The dominance conditions of the proposed classes of estimators over the existing estimators $t_i$ ($i = 1, 2, \ldots, 6$) are also shown in Section 5. (b) From Table 2 and Table 3