Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

A common feature of high-dimensional data is that the data dimension is high while the sample size is relatively low. We call such data HDLSS data. In this paper, we study HDLSS asymptotics for Gaussian-type HDLSS data. We find a surprising geometric representation of the HDLSS data in a dual space. We give an estimator of the eigenvalue by using the noise-reduction (NR) methodology. We show that the estimator enjoys consistency properties under mild conditions when the dimension is high. We provide an asymptotic distribution for the largest eigenvalue when the dimension is high while the sample size is fixed. We show that the estimator given by the NR methodology attains the asymptotic distribution under a condition milder than that for the conventional estimator.


Introduction
A common feature of high-dimensional data is that the data dimension is high while the sample size is relatively low. This is the so-called "HDLSS" or "large p, small n" data situation where p/n → ∞; here p is the data dimension and n is the sample size. Asymptotic studies of this type of data are becoming increasingly relevant. In recent years, substantial work has been done on the asymptotic behavior of eigenvalues of the sample covariance matrix in the limit as p → ∞; see Johnstone (2001) and Paul (2007) for Gaussian data and Baik and Silverstein (2006) for non-Gaussian, i.i.d. data. Those works handled the cases when p and n increase at the same rate, i.e. p/n → c > 0. The asymptotic behaviors of high-dimensional, low-sample-size (HDLSS) data were studied by Hall et al. (2005), Ahn et al. (2007), and Yata and Aoshima (2012) when p → ∞ while n is fixed. They explored conditions to give a geometric representation of HDLSS data. The HDLSS asymptotic study usually assumes either normality of the population distribution or a ρ-mixing condition on the dependency of random variables in a sphered data matrix. For instance, see Jung and Marron (2009). Yata and Aoshima (2009, 2013b) succeeded in investigating the consistency properties of both eigenvalues and eigenvectors in a more general framework. Yata and Aoshima (2012) gave consistent estimators of both the eigenvalues and eigenvectors together with the principal component (PC) scores by a method called the noise-reduction (NR) methodology. Yata and Aoshima (2010, 2013a) created the cross-data-matrix (CDM) methodology that provides a nonparametric method for non-Gaussian HDLSS data. On the other hand, Aoshima and Yata (2011a,b, 2013a) developed a variety of inference procedures for high-dimensional data, such as given-bandwidth confidence regions, two-sample tests, tests of equality of two covariance matrices, classification, variable selection, regression, and pathway analysis, along with sample size determination to ensure prespecified accuracy for each inference. See Aoshima and Yata (2013b,c) for a review covering this field of research.
In this paper, suppose we have a $p \times n$ data matrix, $X_{(p)} = [x_{1(p)}, \ldots, x_{n(p)}]$, where $x_{j(p)} = (x_{1j(p)}, \ldots, x_{pj(p)})^T$, $j = 1, \ldots, n$, are independent and identically distributed (i.i.d.) as a $p$-dimensional distribution with mean vector $\mu_p$ and covariance matrix $\Sigma_p$ ($\ge O$). We assume $n \ge 3$. The eigen-decomposition of $\Sigma_p$ is given by $\Sigma_p = H_p \Lambda_p H_p^T$, where $\Lambda_p$ is a diagonal matrix of eigenvalues $\lambda_{1(p)} \ge \cdots \ge \lambda_{p(p)} \ge 0$ and $H_p$ is an orthogonal matrix of the corresponding eigenvectors. Let $X_{(p)} - [\mu_p, \ldots, \mu_p] = H_p \Lambda_p^{1/2} Z_{(p)}$. Then, $Z_{(p)}$ is a $p \times n$ sphered data matrix from a distribution with the zero mean vector and the identity covariance matrix. Here, we write $Z_{(p)} = [z_{1(p)}, \ldots, z_{p(p)}]^T$ and $z_{j(p)} = (z_{j1(p)}, \ldots, z_{jn(p)})^T$, $j = 1, \ldots, p$. Note that $E(z_{ji(p)} z_{j'i(p)}) = 0$ $(j \ne j')$ and $\mathrm{Var}(z_{j(p)}) = I_n$, where $I_n$ is the $n$-dimensional identity matrix. Hereafter, the subscript $p$ will be omitted for the sake of simplicity when it does not cause any confusion. We assume that the fourth moments of each variable in $Z$ are uniformly bounded. Note that if $X$ is Gaussian, the $z_{ij}$'s are i.i.d. as $N(0, 1)$, where $N(0, 1)$ denotes the standard normal distribution.
In this paper, we study HDLSS asymptotics for Gaussian-type HDLSS data when p → ∞ while n is fixed. In Section 2, we find a surprising geometric representation of the HDLSS data in a dual space. In Section 3, we give an estimator of the eigenvalue by using the NR methodology. We show that the estimator enjoys consistency properties under mild conditions when the dimension is high. We provide an asymptotic distribution for the largest eigenvalue when the dimension is high while the sample size is fixed. We show that the estimator given by the NR methodology attains the asymptotic distribution under a condition milder than that for the conventional estimator. In Section 4, we summarize simulation studies of the findings.

Geometric Representations in a Dual Space
2.1 When µ is Known
We assume $\mu = 0$ without loss of generality. Let us write the sample covariance matrix as $S_o = n^{-1} X X^T$. Then, we define the $n \times n$ dual sample covariance matrix by $S_{oD} = n^{-1} X^T X$. Let $\hat\lambda_{o1} \ge \cdots \ge \hat\lambda_{on} \ge 0$ be the eigenvalues of $S_{oD}$. Then, we define the eigen-decomposition of $S_{oD}$ by $S_{oD} = \sum_{j=1}^{n} \hat\lambda_{oj} \hat u_{oj} \hat u_{oj}^T$. Note that $S_o$ and $S_{oD}$ share non-zero eigenvalues. We consider the following condition.
in probability as $p \to \infty$. On the other hand, when $X$ is non-Gaussian and $Z$ is non-$\rho$-mixing, Yata and Aoshima (2012) showed another geometric representation as follows:
where $D_n$ is a diagonal matrix whose diagonal elements are of $O_P(1)$. Yata and Aoshima (2012) considered a boundary condition between (2.1) and (2.3) as follows:
Then, they gave the following result.
Corollary 2.1. Let $w_j = \{(n-1)/\mathrm{tr}(\Sigma)\} \hat\lambda_j \hat u_j$. Assume (A-i) and (A-ii). Then, we have as $p \to \infty$ that
From Corollary 2.1, the eigenspace spanned by $\hat u_j$, $j = 1, \ldots, n-1$, is close to the orthogonal complement of $1_n$ in $R^n$ as $p \to \infty$, and the direction of the eigenvectors is not uniquely determined. On the other hand, the eigenvalues become deterministic, but they become indistinguishable from one another. For these reasons, it is difficult to estimate the eigenvalues and the eigenvectors by using $S_D$ (or $S$) in conventional PCA.
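The dual matrices used in this section are only $n \times n$, yet they carry the same non-zero eigenvalues as the $p \times p$ sample covariance matrices. The following is a minimal numerical sketch of that fact for $S_o$ and $S_{oD}$ in the case $\mu = 0$. It assumes NumPy; the sizes `p` and `n` and the Gaussian data are illustrative choices, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 50, 5                       # illustrative sizes only (p > n)
X = rng.standard_normal((p, n))    # p x n data matrix with mu = 0

S_o = X @ X.T / n                  # p x p sample covariance matrix
S_oD = X.T @ X / n                 # n x n dual sample covariance matrix

# Eigenvalues in descending order; S_o has at most n non-zero eigenvalues.
eig_o = np.sort(np.linalg.eigvalsh(S_o))[::-1][:n]
eig_oD = np.sort(np.linalg.eigvalsh(S_oD))[::-1]

print(np.allclose(eig_o, eig_oD))
```

Working with the $n \times n$ matrix keeps the eigenproblem small when $p$ is much larger than $n$, which is the HDLSS setting.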
Let us observe the geometric representation given by Corollary 2.1. Now, we consider an easy example such as $\lambda_1 = \cdots = \lambda_p = 1$ and $n = 3$.

Largest Eigenvalue and Its Asymptotic Distribution
In this section, we consider eigenvalue estimation and give an asymptotic distribution for the largest eigenvalue. Yata and Aoshima (2012) proposed a method for eigenvalue estimation called the noise-reduction (NR) methodology, which was derived from the geometric representation in (2.2). When we apply the NR methodology to the case when $\mu$ is unknown, the NR estimator of $\lambda_i$ is given by
Note that $\tilde\lambda_i \ge 0$ for $i = 1, \ldots, n-2$. Yata and Aoshima (2012, 2013b) showed that $\tilde\lambda_j$ has several consistency properties when $p \to \infty$ and $n \to \infty$. In this paper, we focus on the largest eigenvalue, $\lambda_1$, which carries the most important information in data analysis, and study its asymptotic properties when $p \to \infty$ while $n$ is fixed. We assume the following conditions for the population largest eigenvalue:
Note that (A-iv) is naturally satisfied for the case when $X$ is Gaussian and (A-iii) is met. Let $z_{oj} = z_j - (\bar z_j, \ldots, \bar z_j)^T$, $j = 1, \ldots, p$, where $\bar z_j = n^{-1} \sum_{k=1}^{n} z_{jk}$. We write that
Then, from Corollary 2.1, under (A-iii) and (A-iv), we have as $p \to \infty$ that
Therefore, we have that $\hat\lambda_1$
Note that $z_{o1}^T 1_n = 0$. If $P(\lim_{p \to \infty} \|z_{o1}\| \ne 0) = 1$, we have as $p \to \infty$ that $\hat\lambda_1$
under (A-iii) and (A-iv).
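The following is a minimal sketch of NR estimation, not the authors' implementation. It assumes the form commonly stated for the NR methodology in Yata and Aoshima (2012): the $i$-th eigenvalue of $S_D$ minus the average of the remaining $n-1-i$ non-zero eigenvalues, which is consistent with the non-negativity for $i = 1, \ldots, n-2$ noted above. It also assumes $S_D = (n-1)^{-1}(X - \bar X)^T (X - \bar X)$ for the case when $\mu$ is unknown. The function name `nr_eigenvalues` and the variable names are hypothetical.

```python
import numpy as np

def nr_eigenvalues(X):
    """Return (lam_hat, lam_tilde) from a p x n data matrix X, mean unknown.

    Assumed NR form (an assumption, see the text above):
        lam_tilde_i = lam_hat_i - (tr(S_D) - sum_{j<=i} lam_hat_j) / (n - 1 - i)
    for i = 1, ..., n-2.
    """
    p, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)      # subtract the sample mean vector
    S_D = Xc.T @ Xc / (n - 1)                   # assumed n x n dual covariance
    # Conventional estimates: the n-1 largest eigenvalues, descending.
    lam_hat = np.sort(np.linalg.eigvalsh(S_D))[::-1][: n - 1]
    i = np.arange(1, n - 1)                     # i = 1, ..., n-2
    # Average of the eigenvalues that remain after the first i.
    noise = (np.trace(S_D) - np.cumsum(lam_hat)[: n - 2]) / (n - 1 - i)
    lam_tilde = lam_hat[: n - 2] - noise
    return lam_hat, lam_tilde

# Illustrative use on pure-noise Gaussian data (sizes are arbitrary).
rng = np.random.default_rng(1)
lam_hat, lam_tilde = nr_eigenvalues(rng.standard_normal((500, 6)))
print(lam_hat[0], lam_tilde[0])
```

Because the subtracted term is an average of eigenvalues no larger than the $i$-th one, the corrected value never exceeds the conventional one and never drops below zero.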
For the second term in (3.1) with i = 1, we have the following result.
Next, we consider asymptotic properties of the conventional estimator, $\hat\lambda_1$, for the sake of comparison when $p \to \infty$ while $n$ is fixed. We assume the following condition for the population largest eigenvalue:
Hence, (A-v) is stronger than the conditions (A-iii) and (A-iv). From (3.2), for the conventional estimator $\hat\lambda_1$, we have the following result.
Corollary 3.2. Assume $P(\lim_{p \to \infty} \|z_{o1}\| \ne 0) = 1$. Under (A-v), it holds as $p \to \infty$ that $\hat\lambda_1$
In addition, if $z_{1j}$, $j = 1, \ldots, n$, are i.i.d. as $N(0, 1)$, it holds that
By comparing Theorem 3.1 and Corollary 3.1 with Corollary 3.2, we can conclude that the NR estimator $\tilde\lambda_1$ has the asymptotic properties under milder conditions than the conventional estimator $\hat\lambda_1$ when $p \to \infty$ while $n$ is fixed. In fact, (A-v) is too strict a condition in real high-dimensional data analyses. It should be noted that (A-v) is equivalent to the condition that $\lambda_1/\mathrm{tr}(\Sigma) \to 1$ as $p \to \infty$; that is, (A-v) means that the contribution ratio of the first principal component is asymptotically 1 as $p \to \infty$.
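In the Gaussian case mentioned above, the centered vector $z_{o1}$ is the projection of $z_1$ onto the orthogonal complement of $1_n$, so $\|z_{o1}\|^2$ follows a chi-square distribution with $n-1$ degrees of freedom. This is a standard fact stated here for orientation, not a restatement of the paper's results. The following minimal sketch checks the first two moments by simulation, assuming NumPy; `n` and `reps` are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 200_000               # illustrative values, not from the paper

z1 = rng.standard_normal((reps, n))            # each row: z_11, ..., z_1n
z_o1 = z1 - z1.mean(axis=1, keepdims=True)     # subtract the row mean
sq_norm = (z_o1 ** 2).sum(axis=1)              # ||z_o1||^2 for each replicate

# A chi-square variable with n-1 degrees of freedom has mean n-1
# and variance 2(n-1).
print(sq_norm.mean(), n - 1)
print(sq_norm.var(), 2 * (n - 1))
```

Each row of `z_o1` sums to zero, which is the relation $z_{o1}^T 1_n = 0$ used in the text.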
Ahn et al. (2007) and Jung and Marron (2009) showed a geometric representation as follows:
$$\frac{n}{\mathrm{tr}(\Sigma)} S_{oD} \xrightarrow{P} I_n, \quad p \to \infty. \tag{2.1}$$
Let $w_{oj} = \{n/\mathrm{tr}(\Sigma)\} \hat\lambda_{oj} \hat u_{oj}$ and $R_{on} = \{e_n \in R^n \mid \|e_n\| = 1\}$. Yata and Aoshima (2012) showed that
the eigenvalues of $S_D$. Let us write the eigen-decomposition of $S_D$ as $S_D = \sum_{j=1}^{n-1} \hat\lambda_j \hat u_j \hat u_j^T$. Note that $S$ and $S_D$ share non-zero eigenvalues. Then, we have the following geometric representation for $S_D$.

In Fig. 2.1, we displayed scatter plots of 20 independent pairs of $\pm w_j$ ($j = 1, 2$) that were generated from $N_p(\mu, I_p)$ for (a) $p = 4$, (b) $p = 40$, (c) $p = 400$ and (d) $p = 4000$. We denoted $w_1$ and $w_2$ by distinct plotting symbols. We also denoted $1_n = (1, 1, 1)^T$ by the dotted line. We observed that all the plots of $w_1$ and $w_2$ gather on the surface of the orthogonal complement of $1_n = (1, 1, 1)^T$ in $R^3$ when $p$ is large. Moreover, they appeared around the unit circle on the orthogonal complement of $1_n = (1, 1, 1)^T$ in $R^3$, as expected from Corollary 2.1.
Fig. 2.1. Panels: (a) p = 4, (b) p = 40, (c) p = 400, (d) p = 4000.
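The behavior described for Fig. 2.1 can be reproduced numerically. The following is a minimal sketch, not the code used for the figure. It assumes $S_D = (n-1)^{-1}(X - \bar X)^T (X - \bar X)$, takes $\mu = 0$ (the centering removes the mean in any case), uses $\mathrm{tr}(\Sigma) = p$ for $\Sigma = I_p$, and draws one sample per value of $p$ instead of 20. The function name `dual_vectors` is hypothetical.

```python
import numpy as np

def dual_vectors(p, n=3, seed=0):
    """Return w_1, ..., w_{n-1} as columns for one sample from N_p(0, I_p)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((p, n))
    Xc = X - X.mean(axis=1, keepdims=True)     # subtract the sample mean vector
    S_D = Xc.T @ Xc / (n - 1)                  # assumed dual covariance matrix
    lam, U = np.linalg.eigh(S_D)               # eigenvalues in ascending order
    lam = lam[::-1][: n - 1]                   # n-1 largest, descending
    U = U[:, ::-1][:, : n - 1]                 # matching eigenvectors
    return (n - 1) / p * lam * U               # w_j with tr(Sigma) = p

for p in (4, 40, 400, 4000):
    W = dual_vectors(p)
    norms = np.linalg.norm(W, axis=0)          # distance of each w_j from 0
    inner = np.abs(W.sum(axis=0)).max()        # largest |1_n^T w_j|
    print(p, norms.round(3), inner)
```

As $p$ grows, the printed norms should settle near 1, while the inner products with $1_n$ stay at rounding-error level for every $p$, because the centered data make $1_n$ an eigenvector of $S_D$ with eigenvalue zero.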