On Mutual Information for Elliptical Distributions: A Case of Nonlinear Dependence of ‘n’ Vectors

In this paper, we model dependent categorical data via the concept of mutual information to obtain a measure of statistical dependence. We first derive the entropy and mutual information index for the exponential power distribution. These concepts are important and were developed by Shannon in the context of information theory, and several works have already been published for the case of the multivariate normal distribution. We then extend these tools to the special case of the fully symmetric multivariate elliptical distributions. The upper bound for the entropy, which is attained by the normal density, is established. We further derive a nonlinear joint model for dependent random vectors spanning an elliptical vector space, enhancing multivariate relationships among non-empty subsets of vectors via multivariate mutual information, under the assumption that the subsets of each vector and their interactions can be represented in discrete form. To illustrate the application, the multivariate dependency among various sites based on the dominance of some attributes is investigated.


Introduction
Information theory is a branch of the mathematical theory of probability and statistics introduced by Shannon (1948), who used mathematical communication theory to describe logarithmic measures of information, and it has stimulated a tremendous amount of study on the subject in the engineering fields. It is also the branch of applied probability and statistics that is relevant to statistical inference and therefore should be of basic interest to statisticians (Kullback, 1978). Information theory seeks the quantification of information. One goal of information theory is the development of coding schemes that provide good performance in comparison with the optimal performance given by the theory. It works under the assumption of a strongly stationary random process to define an information quantity contained in a multivariate probability density function, for example the multivariate normal distribution (Kullback, 1978; Cover & Thomas, 1991). This quantity allows one to measure the cumulative information of a multivariate data set or, more specifically, to quantify the mutual information between two random variables or vectors. On the other hand, entropy is a notion of the information provided by a random process about itself, and it is sufficient to study the reproduction of a marginal process through a noiseless environment. For a systematic and comprehensive account of these and related concepts see, for example, Cover & Thomas (1991). Comprehensive reviews of mathematical expectation and of the properties of single entropy, joint entropy, conditional entropy and maximum entropy H in the discrete and continuous cases are also available in various texts; see Shannon (1948), Tho Pharm et al. (2012), Reginald (2015), etc.

It is worth noting that there are various measures of dependence, all with the aim of obtaining an estimate of the degree of dependence between two or more random vectors. However, characterizing the dependence relationship among random vectors, which could be linear or nonlinear, has been a frequent but often incompletely achieved goal because of its complexity. Despite this, the frequency with which it is needed in cluster analysis, cryptography, data mining, networking, imaging and so on has made its further development inevitable. Though linear correlation as a measure of the degree of dependence has a clear interpretation for linear dependence, its underestimation of nonlinear dependence is a setback that has restricted its usage. Among the various measures of dependence, mutual information (MI) is the most comprehensive of all due to its non-parametric nature, but its inability to give a clear-cut interpretation has made its usage in data analysis relatively rare. Note that most real-life problems have varying vectors (variables) whose individual outcomes have a collective conditional joint effect on each other due to complicated interactions (e.g. molecular interactions in biological networks); a clear multivariate representation of vector interactions must therefore be used to define in clear terms the total and conditional multivariate mutual information (MMI).

Mutual Information for Exponential Power Distribution
Following the concepts of entropy and the mutual information index, and starting with the unions of two or more sets, the MMI that captures the amount of information shared between two sets (one input and one output) and between multiple sets (multiple inputs and a single output) can be expressed mathematically.

Definition 2.1: The mutual information between two random vectors X ∈ R^n and Y ∈ R^m with joint and marginal probability density functions f_{X,Y}(x,y), f_X(x) and f_Y(y) respectively is

I(X,Y) = ∫_{R^n} ∫_{R^m} f_{X,Y}(x,y) log [ f_{X,Y}(x,y) / ( f_X(x) f_Y(y) ) ] dy dx,

and the mutual information index between X and Y is then defined in terms of I(X,Y). This implies that for the one-input, one-output MI structure we have

I(X|Y) = I(X) − I(X,Y) = H(X) − I(X,Y) = H(X,Y) − H(Y),

while the two-input, one-output MI structure is

I(X,Y|Z) = I(X,Y) − I(X,Y,Z) = H(X,Z) + H(Y,Z) − H(Z) − H(X,Y,Z).
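Because the application in the later sections treats the interacting vectors in discrete form, these structures can be illustrated numerically. The following is a minimal sketch rather than the paper's own computation: it estimates the entropies and mutual information of two discrete variables from a hypothetical joint probability table and checks the one-input, one-output identity H(X) − I(X,Y) = H(X,Y) − H(Y).

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability array, ignoring zero cells."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Hypothetical 2x3 joint probability table for discrete X (rows) and Y (columns).
p_xy = np.array([[0.20, 0.10, 0.05],
                 [0.05, 0.25, 0.35]])

p_x = p_xy.sum(axis=1)   # marginal distribution of X
p_y = p_xy.sum(axis=0)   # marginal distribution of Y

H_x, H_y, H_xy = entropy(p_x), entropy(p_y), entropy(p_xy)
I_xy = H_x + H_y - H_xy  # mutual information I(X, Y)

# One-input, one-output structure: H(X) - I(X, Y) equals H(X, Y) - H(Y).
print(I_xy, H_x - I_xy, H_xy - H_y)
```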

The Exponential Power Distribution
Definition 2.4: The random variable X is said to have a univariate exponential power distribution with shape parameter β, and location and scale parameters µ and σ respectively, such that when β = 2 we have the usual normal distribution and when β = 1 the Laplace distribution. Therefore, we can say that the exponential power distribution has the Laplace and normal distributions as special cases. The shape parameter β regulates the tail of the distribution, which makes it a more flexible symmetric model. Subsequently we have the following proposition: evaluating E|X − µ|^β and the expected log-density yields the entropy H(X), and the result (6) follows.
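For reference, one standard parameterization that is consistent with the special cases named above (β = 2 giving the normal and β = 1 the Laplace density) is the generalized normal form sketched below, together with the entropy it implies; this is a standard reference form and may differ from the exact parameterization behind (6).

```latex
% A standard exponential power (generalized normal) density: beta = 2 gives the
% normal and beta = 1 the Laplace distribution.
f(x;\mu,\sigma,\beta) \;=\; \frac{\beta}{2\sigma\,\Gamma(1/\beta)}
  \exp\!\left\{-\left(\frac{|x-\mu|}{\sigma}\right)^{\beta}\right\},
  \qquad x \in \mathbb{R},\ \sigma > 0,\ \beta > 0.

% Since E\left(\frac{|X-\mu|}{\sigma}\right)^{\beta} = \frac{1}{\beta} under this
% form, the differential entropy is
H(X) \;=\; \frac{1}{\beta} \;+\; \log\!\left(\frac{2\sigma\,\Gamma(1/\beta)}{\beta}\right).
```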

The Multivariate Extension to Exponential Power Distribution
The univariate normal density is well known to have the maximum entropy subject to the condition that the mean and variance are fixed. A similar result is true for its multivariate extension. Now, let f(x) be the density function of an n-dimensional random vector X from the EPD. We maximize the entropy −∫ f(x) log f(x) dv, where dv stands for the volume element, subject to the conditions that the mean and the dispersion matrix have given values, E(X) = µ and D(X) = Σ, where the shape parameter β determines the kurtosis (Gómez et al., 1998). Thus the correlation structure can be obtained directly from Σ in the usual way. Let g(x) be an alternative density, taken as the n-dimensional normal density. From the information theory inequality we have, for any two alternative densities f and g, −∫ f(x) log f(x) dv ≤ −∫ f(x) log g(x) dv, so the entropy of the EPD is bounded above by that of the normal density with the same mean and dispersion matrix. Note that from (13) we can obtain the MI for two vectors, and likewise for three vectors; hence the MMI index for n exponential power distributed vectors follows, given that the squared radial function in (7) depends on the in-built shape parameter β within the density generator function.
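For orientation, in the multivariate normal special case, the case already covered in the literature cited in the introduction, the mutual information for two and for three jointly normal vectors reduces to log-determinant ratios of the partitioned dispersion matrix. The expressions below are the standard Gaussian results, stated only as a point of comparison for the exponential power generalization and not as reproductions of (13)–(15).

```latex
% Two jointly normal vectors X and Y, with joint dispersion matrix Sigma and
% diagonal blocks Sigma_XX and Sigma_YY:
I(X,Y) \;=\; \tfrac{1}{2}\,\log\frac{|\Sigma_{XX}|\,|\Sigma_{YY}|}{|\Sigma|}.

% Three jointly normal vectors X, Y and Z (total-correlation form):
I(X,Y,Z) \;=\; \tfrac{1}{2}\,\log\frac{|\Sigma_{XX}|\,|\Sigma_{YY}|\,|\Sigma_{ZZ}|}{|\Sigma|}.
```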

Relationship between Covariance and MMI matrix structures
So, re-writing (15) and substituting into (17), we obtain the absolute nonlinear squared radial function for multivariate normally distributed discrete interacting vectors. Hence, supposing X and Y are random vectors with j and k discrete variates respectively and total degrees of freedom df = jk − 1 in a j × k contingency table, the nonlinear chi-square partitioning follows. We can extend the partitioning to accommodate many vectors interacting simultaneously. This approach can be used to analyze the significance of entropy and mutual information among discrete experimental data when the outcomes are frequencies and the usual analysis of variance is not applicable.
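As a concrete illustration of frequency-based significance assessment for a single j × k table, the sketch below uses the standard large-sample link between the plug-in mutual information estimate and the likelihood-ratio (G) statistic, G = 2n·Î(X,Y). The counts are hypothetical, and the chi-square reference here uses the usual (j − 1)(k − 1) degrees of freedom of the independence test rather than the df = jk − 1 of the partitioning above.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical 3x4 contingency table of site-by-attribute frequencies.
counts = np.array([[12,  7,  9,  5],
                   [ 4, 15,  6, 10],
                   [ 8,  3, 14,  7]])

n = counts.sum()
p_xy = counts / n
p_x = p_xy.sum(axis=1, keepdims=True)   # row (X) marginals
p_y = p_xy.sum(axis=0, keepdims=True)   # column (Y) marginals

# Plug-in estimate of the mutual information (in nats), skipping empty cells.
mask = p_xy > 0
mi_hat = np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask]))

# Likelihood-ratio statistic G = 2 n I_hat, referred to a chi-square distribution.
G = 2 * n * mi_hat
df = (counts.shape[0] - 1) * (counts.shape[1] - 1)
print(G, chi2.sf(G, df))
```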

Measure of Mutual information significance
The significance measure is defined for i ≠ j, together with its MMI estimate.

Conclusion
The measure of nonlinear dependence obtained by partitioning the elliptical mutual information index into linear and nonlinear components resolves the long-standing difficulty of giving a clear-cut interpretation of the proportion of nonlinear dependence.