Estimation of Parameters in a Finite Mixture of Multivariate Gamma Distributions using Gaussian Approximation V

Finite mixture of multivariate gamma distributions is extensively used in the domains of stochastic modelling, reliability, hydrology and life testing. In this paper, we consider a multivariate gamma mixture model (MGMM) with independent marginals. A novel approach is proposed for estimating the parameters of this model. The approach makes use of Wilson-Hilferty approximation, MCLUST algorithm and the principle of maximum likelihood. Numerical illustrations based on simulated as well as real datasets have been implemented to assess the performance of the proposed approach. The results indicate that the proposed methodology provides reliable estimates for the model parameters.


Introduction
Multivariate mixture models have received wide recognition in recent years owing to availability of high-dimensional data.Over the last few decades, several multivariate mixture models have been introducedto analyse multivariate events of interest in various spheres of research.Examples includemultivariate skew-normal mixtures (Lin, 2009), multivariate skewt mixtures (Lin, 2010) andmultivariate Poisson mixtures (Karlis and Meligkotsidou, 2007).In particular, multivariate Gaussian mixtures have been extensively used to model data pertaining to genetics, astronomy and medical imaging.Kowalczyk and Tyrcha (1989) point out that "families of multivariate distributions with positively skewed marginals and positively dependent components are often needed as models for real data."In this context, ISSN-2424-6271 IASSL multivariate gamma distribution deserves considerable attention.Several forms of multivariate gamma distributions with gamma marginals are available in literature.A comprehensive study on various forms of multivariate gamma distributions can be found in Chapter 48 of Kotz et al. (2000).
It is pertinent to note that even though several forms of multivariate gamma distributions exist in literature, studies pertaining to mixture of multivariate gamma distributions have been scarce due to model complexities.Jones et al. (2000) which deals with a bivariate gamma mixture distribution where the scale parameters have a generalised Bernoulli distribution is an exception.This serves as a motivation to introduce a form of MGMM and propose a methodology to estimate its parameters.
We consider a finite mixture of k -variate gamma distributions with independent marginals.
Here, α i and β i represent the shape and scale parameters associated with Z i .The joint distribution of  = (Z 1 , Z 2 , … , Z k )′is a multivariate gamma distribution with probability density function (pdf) given by ) whereα i > 0 and β i > 0. The pdf of the resulting MGMM is defined as where f i (; , )denotes the pdf of i-th component which is a k-variate gamma distributionof the form given in equation (1) and g denotes the number of components.
Here, π 1 , π 2 , … , π g represent mixture proportions or weights that satisfy the It can be seen that the total number of parameters involved in equation ( 2) is (2gk + g − 1) .In order to estimate the parameters, a novel methodology is introduced which makes use of The present article attempts to estimate the parameters of the model defined in equation (2) by combining the functionalities of WH approximation as well as MCLUST algorithm along with the principles of maximum likelihood.This is a multivariate extension to the methodology to estimate the parameters of a finite mixture of two-parameter gamma distributions proposed by Vani Lakshmi and Vaidyanathan (2016).Its potential uses involve scenarios where the marginals of the model are independent.Often, this is encountered in hydrology and reliability studies.
Rest of thearticle is divided into four sections.Section 2 describes the proposed methodology for parameter estimation in MGMM.Section 3 presents numerical illustration based on simulated data.In Section 4, an application pertaining to flood frequency analysis is presented andSection 5 concludes the paper with discussion.ISSN-2424-6271 IASSL

Parameter estimation in MGMM
Consider a random sample of nobservations from a g-component MGMM where component densities have the form defined in equation (1) .The likelihood function corresponding to the mixture model is defined as Following the definitionin McLachlan and Peel ( 2004), the log-likelihood function for the complete data under the EM frameworkis written as where π ij = { 1, if observation j belongs to component i 0, otherwise and f i (z 1 , z 2 , … , z k )denotes the k-variate gamma distribution corresponding to the i-th component with parameters  i = (α i1 , … , α ik ) ′ and  i = (β i1 , … , β ik ) ′ .Substituting equation (1) in (4) and differentiating with respect to α ip , β ip ; p = 1,2, … , k, we get the following log-likelihood equations.
where Ψ(. ) denotes the digamma function.It can be observed that equations( 5)and( 6)do not provide closed-form solutions for α ip andβ ip ; p = 1,2, … , k.Therefore, one has to resort to iterative procedures or gradient-based search algorithms in order to estimate the parameters of the model.However, these approaches may result in estimates with large bias either because of sensitivity to starting values or complexity associated with the presence of digamma function coupled with interdependence of parameters.In the sequel, we introduce a novel approach to estimate the parameters of the model defined in equation (2).
MCLUST is then implemented on the transformed model resulting in component memberships, estimates for (, ) as well as mixture proportions.However, our interest lies in estimating the parameters (α ij , β ij ), i = 1,2, … , g; j = 1,2, … , k of the gamma marginal densities.
Towards this, the following steps are implemented.1.Based on the estimates provided by MCLUST, compute 100(1 − δ)%Confidence Interval (CI) for μ ij where μ ij denotes the mean corresponding to j-th marginal of the i-th component andδ ∈ (0,1) determines the coverage probability.

Using the relationship,μ
based on WH approximation, obtain all values of (α ij , β ij )that lies within theCI by an extensive two-dimensional search over the parameter space.Let this set be denoted by S. 3. Maximise the log-likelihood of the normal density corresponding toj-th marginal of thei-th component within S to obtain estimates for (α ij , β ij ), say (α ̂ij , β ̂ij ).The above steps are repeated for each component of the model.
In the following section, the performance of the proposed methodology is assessed in terms of the average value of the estimates (AVG), Bias, Standard Deviation (SD) and Mean Square Error (MSE) obtained through simulation studies.

Numerical Illustration
To assess the performance of the proposed approach, we consider a twocomponent MGMM of the form defined in equation (2) withk = 2. Datasets of sizes n 1 andn 2 are simulated from each component for the parameter choices given in Table The above parameter choices ensure that the simulation study takes into account various levels of overlap among components, namely: low, medium and high.Figure 1plots the mixture densities corresponding to the various cases considered in Table 1.
Figure1: Mixture densities corresponding to the three different cases From Figure1, it can be observed that Case 1 is characterised by a high overlap among the components.In this case, the two components have same mean vector.In Case 2, there is a relatively low overlap between the components.The component-wise scale parameters are same in this case.In Case 3, there is a medium overlap among the components.In this case, the components have same variance-covariance matrix.
In order to assess the performance of the proposed approach for higher-order components, a simulation study involving three-component bivariate gamma mixture model (Case 4) has also been carried out with parameter choice as given in Table 2.The proposed methodology is implemented for 500simulation runs using a code developed in R version3.1.1.The value of δ is chosen as 0.001 so as to obtain a large confidence interval for means.The simulation results corresponding to the four cases along with misclassification rates are presented in Tables3,4 and 5 respectively.
From the simulation results, it is observed that the 1) proposed methodology provides estimates that are 'close' to the true parameter values for all the cases considered in the study as can be seen from the 'small' Bias values.2) misclassification rate is low in cases where the components have low or medium overlap.3) proposed methodology performs well for higher-order components.

Application
As a real-life application of the proposed methodology, we consider a dataset available in Yue (2001) that deals with stream flow measurements collected for a period of 77 years  from Madawaska basin in the Quebec province of Canada.Yue (2001) points out that the variables, flood duration (expressed in number of days) and flood peak (expressed in m 3 s -1 ) are independent ( = −0.0118,p value = 0.9189) , each having gamma distribution.A scatter plot of the variables given in Figure 2 indicates the presence of a single cluster.Since the variables are independent with gamma marginals, the proposed approach is implemented using the joint density of flood duration and flood peak.The results indicate the presence of a single cluster with corresponding parameter estimates as provided in Table6.

Discussion
In this paper, we propose a novel approach to estimate the (2gk + g − 1) parameters of a k-variate gamma mixture model with g components assuming the variates to be independent.The proposed methodology incorporates the concepts of WH approximation, MCLUST algorithm and the principle of maximum likelihood.
The complexities associated with the estimation of parameters and identification of mixture proportions is considerably reduced due to the incorporation of WH approximation that transforms the MGMM to ISSN-2424-6271 IASSL multivariate Gaussian mixture setup.This enables the use of MCLUST which gives a framework to estimate the parameters of multivariate Gaussian mixture model with different covariance structures.MCLUST provides cluster memberships, estimates for component-wise (, ) and mixture proportions.The parameters of MGMM are then estimated by using the relationship between (, ) and (α ij , β ij ), i = 1,2, … , g; j = 1,2, … , k as given by WH approximation.Due to the non-linear nature of the relationship, closed form expression for the estimators do not exist.The proposed methodology overcomes this limitation by estimating the parameters through a confidence interval based search approach that involves .One may also use the estimates of    , i = 1,2, … , k to construct confidence interval in a similar manner and thereby obtain estimates for the parameters.However, this may be computationally more intensive that the present method.
The proposed method does not involve the use of calculus, complex iterative optimisation methods and re-parameterisation procedures.A key advantage of the method is its capability to determine the number of components present in the mixture model.However, computing time taken by the proposed method to obtain estimates is more owing to the fact that it searches the parameter space separately for each component.
From simulation results, it is clear that the proposed methodology results in estimates with 'small' Bias.Also, application of the proposed approach to reallife data indicates that the estimates obtained are good.This can be seen from large p-values associated with KS test.To conclude, the proposed methodology is able to identify the group memberships and estimate the parameters of MGMM for low, medium and high overlap between the components.
Further research is being carried out to extend the proposed methodology to estimate the parameters of MGMM in the presence of location parameters and dependent marginals.

Figure 2 :Figure 3 :Figure 4 :
Figure 2: Scatter plot of flood duration and flood peak

Table 2 :
Choice of parameter values for Case 4

Table 3 :
Simulation Results for Cases ,  and

Table 4 :
Simulation Results for Case