Improvement by Multi-Stage Selection in Non-Normal Large Populations

Multistage selection based on covariates of the breeding value is studied in large non-normal populations. Expressions for the expectation and variance of the target variable in the retained subpopulation are given for the lognormal, skew-normal and Pareto distributions. A numerical illustration of the optimum levels of culling for a fixed overall intensity of selection is discussed.


Introduction
In plant and animal breeding, the breeder saves a fraction of the population for reproduction through truncation selection. The aim of selection is to improve the breeding value (Falconer, 1989). Since breeding value is not observable, selection is based on traits correlated with the target variable. Further, selection is practised in two or more stages because the covariates of the target variable may not be available simultaneously (Jain and Amble, 1962; Norrell, Arnason and Hugason, 1991). The distribution of the target variable and the other traits is usually assumed to be multivariate normal (Bhat, 1990; Malhotra, 1973; Young, 1974). When the traits are not normally distributed, the expression for genetic gain derived under normality is not strictly applicable. Hence this paper gives expressions for the response to selection and the variance of the target variable after selection in large populations when the traits follow non-normal distributions. Quantitative traits like foreheads of crabs (Kapteyn, 1903), aflatoxin in peanut (Quesenberry, Whitaker and Dickens, 1976) and wool yield have been represented by the lognormal distribution. Moreover, the lognormal distribution is a more realistic representation of characters like weight, height and density than the normal distribution, since these traits take only non-negative values (Johnson, Kotz and Balakrishnan, 1994). Another distribution for non-normal data with moderate skewness is the skew-normal distribution (Azzalini and Dalla Valle, 1996; Arnold and Beaver, 2000). The Pareto distribution (Mardia, Kent and Bibby, 1979) has positive skewness and, like the normal, is mathematically tractable. In all three distributions the correlation between variates has a reasonably wide range, unlike many other multivariate distributions. Hence these three distributions have been considered.

ISSN 2424-6271 IASSL

General theory
Let $X_1$ and $X_2$ be two observable variables available for selection to improve an unobservable variable $Y$ (also denoted by $X_3$). Assume that $X_1$, $X_2$ and $Y$ follow a trivariate distribution with joint density function $f(x_1, x_2, y)$, and let $\mu$ and $\sigma^2$ denote the mean and variance of $Y$ in the unselected population. Consider a two-stage selection programme in a population in which $X_1$ and $X_2$ are observed: the first selection is made on the basis of $X_1$ and the second on $X_2$. The truncation points $c_i$ ($i = 1, 2$) depend on the proportions $p_i$ ($i = 1, 2$) retained in the successive stages and satisfy

$$p_1 = \int_{c_1}^{\infty} f_1(x_1)\,dx_1, \qquad p = p_1 p_2 = \int_{c_1}^{\infty}\int_{c_2}^{\infty} f_2(x_1, x_2)\,dx_2\,dx_1 \qquad (1)$$

where $f_1(x_1)$ is the marginal density of $X_1$, $f_2(x_1, x_2)$ is the joint density of $X_1$ and $X_2$, and $p$ is the proportion retained over the whole selection. The selection is aimed at improving the genetic variable $Y$, which is not directly measurable (Norrell, Arnason and Hugason, 1991; Smith and Quaas, 1982). The improvement is measured by the genetic gain, which is the difference between the mean breeding value in the selected group and that in the population as a whole. The mean of $Y$ after two-stage selection is

$$E(Y \mid X_1 > c_1, X_2 > c_2) = \frac{1}{p}\int_{c_1}^{\infty}\int_{c_2}^{\infty}\int_{-\infty}^{\infty} y\, f(x_1, x_2, y)\,dy\,dx_2\,dx_1 \qquad (2)$$

The genetic gain after the two-stage selection programme is

$$G = E(Y \mid X_1 > c_1, X_2 > c_2) - \mu \qquad (3)$$

The variance of $Y$ after selection is

$$V(Y \mid X_1 > c_1, X_2 > c_2) = E(Y^2 \mid X_1 > c_1, X_2 > c_2) - \left[E(Y \mid X_1 > c_1, X_2 > c_2)\right]^2 \qquad (4)$$

This theory of genetic gain follows from the results of Cochran (1951), who showed that for multistage selection in an infinite population the genetic gain is optimized if the successive conditional means of the target variable, given the available measurements on $X_1$ and $X_2$, are used in selection.
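The two-stage scheme above can also be mimicked on a finite sample, which is sometimes a convenient check on analytical expressions. The following is only a minimal sketch: the function name `two_stage_selection` and the record layout `(x1, x2, y)` are illustrative assumptions, and empirical quantiles stand in for the truncation points $c_1$ and $c_2$ of the large-population theory.

```python
from statistics import fmean, pvariance

def two_stage_selection(records, p1, p2):
    """Empirical two-stage truncation selection.

    records: list of (x1, x2, y) tuples (hypothetical layout).
    p1, p2:  proportions retained at stage 1 (on x1) and stage 2 (on x2).
    Returns (mean of y after selection, genetic gain, variance of y after selection).
    """
    mu = fmean(y for _, _, y in records)  # mean of y before selection

    # Stage 1: keep the top p1 fraction ranked on x1.
    stage1 = sorted(records, key=lambda r: r[0], reverse=True)
    stage1 = stage1[:max(1, round(p1 * len(stage1)))]

    # Stage 2: among the survivors, keep the top p2 fraction ranked on x2.
    stage2 = sorted(stage1, key=lambda r: r[1], reverse=True)
    stage2 = stage2[:max(1, round(p2 * len(stage2)))]

    ys = [y for _, _, y in stage2]
    mean_after = fmean(ys)
    return mean_after, mean_after - mu, pvariance(ys)
```

Because the second cut is applied only to the survivors of the first, the overall proportion retained is approximately $p_1 p_2$, matching the definition of $p$ in the text.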

Results in Non-normal populations
The non-normal distributions considered are lognormal, skew-normal and Pareto.

Lognormal distribution
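The density of the trivariate lognormal distribution is

$$f(x_1, x_2, x_3) = \frac{1}{(2\pi)^{3/2}\,|R|^{1/2}\,x_1 x_2 x_3}\exp\left\{-\tfrac{1}{2}(\ln x)' R^{-1} (\ln x)\right\}, \qquad x_i > 0,$$

where $\ln x = (\ln x_1, \ln x_2, \ln x_3)'$ and $R = (r_{ij})$ is the correlation matrix of $(\ln X_1, \ln X_2, \ln X_3)$. The truncation points $c_1$ and $c_2$ are determined for given values of $p_1$ and $p_2$, the proportions retained in the two stages of selection. It can be shown that

$$p_1 = \Psi(\ln c_1), \qquad p = \Psi(\ln c_1, \ln c_2; r_{12}) \qquad (7)$$

where $\Psi(\ln c_1)$ and $\Psi(\ln c_1, \ln c_2; r_{12})$ are the upper-tail probabilities of the univariate normal $N(0, 1)$ and the bivariate normal $N(0, 0, 1, 1, r_{12})$ respectively. It can further be shown that the genetic gain and the variance of the genotypic variable after two-stage selection can be worked out using (3) and (4).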
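As a hedged illustration of how (7) might be used numerically, suppose that $\ln X_1, \ln X_2, \ln X_3$ are standard normal with correlations $r_{12}, r_{13}, r_{23}$ (an assumption suggested by (7), not a statement of the source's omitted expressions). Under that assumption a standard exponential-tilting argument gives $E(Y^k \mid X_1 > c_1, X_2 > c_2) = e^{k^2/2}\,\Psi(\ln c_1 - k r_{13}, \ln c_2 - k r_{23}; r_{12})/p$ for $k = 1, 2$, with $\mu = e^{1/2}$. The sketch below evaluates the bivariate tail by Simpson's rule and finds $\ln c_2$ by bisection; the function names are illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)

def upper_tail_bvn(a, b, r, n=1000):
    """P(Z1 > a, Z2 > b) for a standard bivariate normal with correlation r, |r| < 1.

    Simpson's rule on the integral over x in [a, 10] of phi(x) * P(Z2 > b | Z1 = x);
    the range is truncated to [-10, 10], outside which the mass is negligible.
    """
    lo, hi = max(a, -10.0), 10.0
    if lo >= hi:
        return 0.0
    s = math.sqrt(1.0 - r * r)

    def g(x):
        return STD.pdf(x) * (1.0 - STD.cdf((b - r * x) / s))

    h = (hi - lo) / n  # n must be even for Simpson's rule
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

def lognormal_selection(p1, p2, r12, r13, r23):
    """Genetic gain and variance of Y after two-stage selection (0 < p1 < 1).

    Assumes standard-normal logs; this is a sketch, not the source's own formula.
    """
    p = p1 * p2
    a1 = STD.inv_cdf(1.0 - p1)   # ln c1 from p1 = Psi(ln c1)
    lo, hi = -10.0, 10.0         # bisection for ln c2 from p = Psi(ln c1, ln c2; r12)
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if upper_tail_bvn(a1, mid, r12) > p:
            lo = mid             # tail still too large: raise the cut-off
        else:
            hi = mid
    a2 = 0.5 * (lo + hi)
    m1 = math.exp(0.5) * upper_tail_bvn(a1 - r13, a2 - r23, r12) / p
    m2 = math.exp(2.0) * upper_tail_bvn(a1 - 2 * r13, a2 - 2 * r23, r12) / p
    return m1 - math.exp(0.5), m2 - m1 ** 2

gain, variance = lognormal_selection(0.75, 0.80, r12=0.5, r13=0.4, r23=0.6)
```

A quick sanity check on such a routine is that the gain vanishes when $r_{13} = r_{23} = 0$, since selection on traits uncorrelated with $Y$ cannot shift its mean.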

Skew-normal distribution
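The density of a skew-normal random vector $X$ (Azzalini and Dalla Valle, 1996) is

$$f(x) = 2\,\phi_3(x; \Omega)\,\Phi(\alpha' x)$$

where $\phi_3(x; \Omega)$ is the three-dimensional normal density with zero mean vector and correlation matrix $\Omega$, $\Phi$ is the standard normal $N(0, 1)$ distribution function and $\alpha$ is the vector of shape parameters. The mean and variance of $X_3$ (Azzalini and Dalla Valle, 1996) are

$$E(X_3) = \sqrt{\frac{2}{\pi}}\,\delta_3, \qquad V(X_3) = 1 - \frac{2}{\pi}\,\delta_3^2$$

where $\delta = (\delta_1, \delta_2, \delta_3)' = \Omega\alpha / (1 + \alpha'\Omega\alpha)^{1/2}$. The cut-off points $c_1$ and $c_2$ are found, for given values of $p_1$ and $p$ (the proportions retained in the two-stage selection programme), by equating the upper-tail probabilities of the marginal distributions of $X_1$ and of $(X_1, X_2)$ to $p_1$ and $p$ respectively. The method of moment generating functions (m.g.f.) (Tallis, 1961; Gopinath Rao, Singh and Nagamani, 2014) is used to find the mean and second raw moment of the genotypic variable $Y$. The m.g.f. of the trivariate skew-normal density is

$$M(t) = 2\exp\left(\tfrac{1}{2}\,t'\Omega t\right)\Phi(\delta' t)$$

where $t$ is the column vector of $t_s$ ($s = 1, 2, 3$). Taking $X_3$ as $Y$, expressions (12) and (13) for the mean and second raw moment of $Y$ after selection can be obtained. The single and double integrals in these expressions can be evaluated numerically (Yakowitz and Szidarovszky, 1990) for known $\alpha_i$'s and $\omega_{ij}$'s. Expressions (12) and (13) can then be used to find the genetic gain and variance of $Y$ for the skew-normal distribution using (3) and (4).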
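The unselected moments of $X_3$ are easy to check numerically. The sketch below assumes the Azzalini and Dalla Valle (1996) parametrisation, in which $\delta = \Omega\alpha/(1 + \alpha'\Omega\alpha)^{1/2}$, $E(X_3) = \sqrt{2/\pi}\,\delta_3$, $V(X_3) = 1 - 2\delta_3^2/\pi$ and $M(t) = 2\exp(t'\Omega t/2)\Phi(\delta' t)$. The function names and the example values of $\alpha$ and $\Omega$ are hypothetical and are not taken from the source.

```python
import math
from statistics import NormalDist

def sn_delta(alpha, omega):
    """delta = Omega alpha / sqrt(1 + alpha' Omega alpha) for a 3-dimensional skew-normal."""
    oa = [sum(omega[i][j] * alpha[j] for j in range(3)) for i in range(3)]
    quad = sum(alpha[i] * oa[i] for i in range(3))
    return [v / math.sqrt(1.0 + quad) for v in oa]

def sn_mean_var(delta_i):
    """Mean and variance of one standardised skew-normal component."""
    return math.sqrt(2.0 / math.pi) * delta_i, 1.0 - 2.0 * delta_i ** 2 / math.pi

def sn_mgf(t, omega, delta):
    """M(t) = 2 exp(t' Omega t / 2) Phi(delta' t)."""
    quad = sum(t[i] * omega[i][j] * t[j] for i in range(3) for j in range(3))
    lin = sum(d * ti for d, ti in zip(delta, t))
    return 2.0 * math.exp(0.5 * quad) * NormalDist().cdf(lin)

# Hypothetical inputs, chosen only for illustration.
alpha = [1.0, 0.5, 2.0]
omega = [[1.0, 0.5, 0.4],
         [0.5, 1.0, 0.6],
         [0.4, 0.6, 1.0]]
delta = sn_delta(alpha, omega)
mean3, var3 = sn_mean_var(delta[2])

# The first derivative of the m.g.f. at t = 0 should reproduce E(X3).
h = 1e-5
mean3_numeric = (sn_mgf([0, 0, h], omega, delta) - sn_mgf([0, 0, -h], omega, delta)) / (2 * h)
```

Differentiating the m.g.f. numerically, as in the last line, is one way to confirm that a coded m.g.f. and the closed-form mean agree before moving on to the truncated moments.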

Pareto distribution
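The density function of the trivariate Pareto distribution is

$$f(x_1, x_2, x_3) = \frac{\theta(\theta + 1)(\theta + 2)}{a_1 a_2 a_3}\left(\sum_{i=1}^{3}\frac{x_i}{a_i} - 2\right)^{-(\theta + 3)}, \qquad x_i > a_i > 0,\ \theta > 0.$$

The mean and variance of $X_3$ are

$$E(X_3) = \frac{a_3\theta}{\theta - 1}\ (\theta > 1), \qquad V(X_3) = \frac{a_3^2\,\theta}{(\theta - 1)^2(\theta - 2)}\ (\theta > 2).$$

The truncation points $c_1$ and $c_2$ are related to the proportions retained, $p_1$ and $p$, as

$$p_1 = \left(\frac{c_1}{a_1}\right)^{-\theta}, \qquad p = \left(\frac{c_1}{a_1} + \frac{c_2}{a_2} - 1\right)^{-\theta}.$$

Writing $s = c_1/a_1 + c_2/a_2 - 1 = p^{-1/\theta}$, the mean and second raw moment after selection can be shown to be

$$E(Y \mid X_1 > c_1, X_2 > c_2) = a_3\left[1 + \frac{s}{\theta - 1}\right]$$

$$E(Y^2 \mid X_1 > c_1, X_2 > c_2) = a_3^2\left[1 + \frac{2s^2}{\theta - 2} - \frac{2s(s - 1)}{\theta - 1}\right] \qquad (18)$$

The response to selection and the variance of the genotypic variable $Y$ can be found from the above using (3) and (4).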
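For the Pareto case the whole calculation is available in closed form. The sketch below assumes the multivariate Pareto of the first kind, whose joint survival function is $(\sum_i x_i/a_i - 2)^{-\theta}$ for $x_i > a_i$; this choice is an assumption consistent with the parameters $a_i$ and $\theta$ used in the paper, and the conditional moments coded here follow from integrating that survival function rather than being copied from the source. The function name `pareto_two_stage` is illustrative.

```python
def pareto_two_stage(p1, p2, a=(1.0, 1.0, 1.0), theta=3.0):
    """Cut-offs, genetic gain and variance of Y = X3 after two-stage selection.

    Assumes a trivariate Pareto of the first kind with scale parameters a and
    shape theta > 2 (so that the variance exists).
    """
    a1, a2, a3 = a
    p = p1 * p2                              # overall proportion retained
    c1 = a1 * p1 ** (-1.0 / theta)           # from p1 = (c1/a1)^(-theta)
    s = p ** (-1.0 / theta)                  # s = c1/a1 + c2/a2 - 1
    c2 = a2 * (s - c1 / a1 + 1.0)
    mu = a3 * theta / (theta - 1.0)          # mean of Y before selection
    m1 = a3 * (1.0 + s / (theta - 1.0))      # mean of Y after selection
    m2 = a3 ** 2 * (1.0 + 2.0 * s * s / (theta - 2.0)
                    - 2.0 * s * (s - 1.0) / (theta - 1.0))
    return {"c1": c1, "c2": c2, "gain": m1 - mu, "variance": m2 - m1 ** 2}

result = pareto_two_stage(0.75, 0.80)
```

Under these assumptions the gain depends on $p_1$ and $p_2$ only through their product $p$, so any split of a fixed overall proportion gives the same gain.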

Numerical illustration
The most important question in the context of sequential selection is the optimum level of culling in each of the two stages. One strategy is to achieve maximum genetic gain for a fixed overall proportion selected. This strategy is relevant in plant breeding work (Finney, 1984). One method of arriving at an approximate solution is the empirical approach of considering a range of sets of values of $p_1$ and $p_2$ and choosing the set which maximises the genetic gain. Let $r_{12}$, $r_{13}$ and $r_{23}$ be the correlation coefficients between $X_1$ and $X_2$, $X_1$ and $X_3$, and $X_2$ and $X_3$ respectively, with $r_{12} = 0.5$, $r_{13} = 0.4$ and $r_{23} = 0.6$. Suppose an overall proportion of 60% is to be retained. The proportions $p_1$ and $p_2$ to be retained in the first and second stage respectively have been considered for sets of values covering the entire range. Genetic gain for the lognormal distribution is presented in Table 1.

In the case of the skew-normal distribution, the parameter values considered are $\delta_i = 0.5$ ($i = 1, 2, 3$), together with a specified correlation matrix of $Z$ (see Azzalini and Dalla Valle, 1996). The conditional expectation of the genetic variable $Y$ is found from (12). The expression has three terms which involve incomplete integrals. These integrals are evaluated by numerical methods (Yakowitz and Szidarovszky, 1990); the double integral of the first term is evaluated by Simpson's 1/3 rule. Genetic gains have been determined for the same sets of proportions as in the lognormal case and are presented in Table 2. The combination $p_1 = 0.75$ and $p_2 = 0.80$ is optimum for the skew-normal distribution.
In the case of the Pareto distribution, the parameter values $a_1 = a_2 = a_3 = 1$ and $\theta = 3$ are considered. It is found that the genetic gain under this strategy is the same for all the sets of proportions considered, and hence there is no particular combination for which the gain is optimum.
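The empirical search over splits of a fixed overall proportion can be sketched as a simple grid search. In the sketch below, `best_split` and `pareto_gain` are illustrative names; the Pareto gain uses the closed form $a_3(p^{-1/\theta} - 1)/(\theta - 1)$, which holds for the multivariate Pareto of the first kind and is an assumption here rather than a formula quoted from the source. The grid of $p_1$ values is likewise hypothetical.

```python
def best_split(gain_fn, p, grid):
    """Search p1 over grid with p2 = p / p1, returning (best, all candidates).

    gain_fn(p1, p2) is any function returning the genetic gain for a split.
    Only splits with p <= p1 <= 1 are feasible, since p2 cannot exceed 1.
    """
    candidates = [(p1, p / p1, gain_fn(p1, p / p1)) for p1 in grid if p <= p1 <= 1.0]
    return max(candidates, key=lambda c: c[2]), candidates

def pareto_gain(p1, p2, a3=1.0, theta=3.0):
    """Gain for a first-kind trivariate Pareto; depends only on p = p1 * p2."""
    s = (p1 * p2) ** (-1.0 / theta)
    return a3 * (s - 1.0) / (theta - 1.0)

grid = [0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00]
best, candidates = best_split(pareto_gain, 0.60, grid)
gains = [g for _, _, g in candidates]
spread = max(gains) - min(gains)  # effectively zero: every split gives the same gain
```

For the Pareto case the spread of gains across the grid is zero up to rounding, in line with the finding that no split is preferred; for the lognormal or skew-normal case one would pass the corresponding gain function as `gain_fn` instead.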