Recovering Fisher-Information from the MGF Alone without Requiring Explicit PMF or PDF from a One-Parameter Exponential Family

It is well-known that a finite moment generating function (m.g.f.) corresponds to a unique probability distribution. So, an important question arises: Is it possible to obtain an expression of Fisher-information, I_X(θ), using the m.g.f. alone, that is, without explicitly requiring a probability mass function (p.m.f.) or probability density function (p.d.f.), given that the p.m.f. or p.d.f. came from a one-parameter exponential family? We revisit the core of statistical inference by developing a clear link (Theorem 1.1) between the m.g.f. and I_X(θ). Illustrations are included.


Introduction
A moment generating function (m.g.f.) is widely used in both probability theory and statistics. Suppose that an observation X has its probability mass function (p.m.f.) or probability density function (p.d.f.) given by f(x; θ), where x ∈ X, a subset of the real line. Here, θ is an unknown parameter with a parameter space Θ, a sub-interval of the real line. The m.g.f. of X is:

M_X(t; θ) = E_θ[exp(tX)]. (1.1)

To be specific, Fisher-information in X about the unknown θ is given by:

I_X(θ) = E_θ[(∂/∂θ log f(X; θ))²], or equivalently, E_θ[−∂²/∂θ² log f(X; θ)], for all θ ∈ Θ, (1.2)

assuming that the expectations are finite and non-zero. Now, we mention a very powerful and well-known result that emphasizes the importance of an m.g.f., briefly stated as follows: a finite m.g.f. is associated with a unique probability distribution. One may refer to Casella and Berger (2002, p. 66) and Mukhopadhyay (2000, pp. 190-192). A probability distribution, once accurately identified from an m.g.f., will immediately lead to an expression of Fisher-information. But that route nearly demands that we know an exact expression of the p.m.f. or p.d.f. which gives rise to the particular m.g.f. on hand, assuming the m.g.f. is finite.
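As a small illustration of the two equivalent forms in (1.2), one may evaluate both expectations symbolically; the snippet below (ours, using Python's sympy rather than the paper's MAPLE) does so for the familiar N(θ, 1) density:

```python
import sympy as sp

x, theta = sp.symbols('x theta', real=True)

# N(theta, 1) density as a concrete f(x; theta) for illustrating (1.2)
f = sp.exp(-(x - theta)**2 / 2) / sp.sqrt(2 * sp.pi)

# first form of (1.2): E_theta[(d/dtheta log f(X; theta))^2]
I1 = sp.integrate(sp.diff(sp.log(f), theta)**2 * f, (x, -sp.oo, sp.oo))

# second form of (1.2): E_theta[-d^2/dtheta^2 log f(X; theta)]
I2 = sp.integrate(-sp.diff(sp.log(f), theta, 2) * f, (x, -sp.oo, sp.oo))

print(sp.simplify(I1), sp.simplify(I2))  # 1 1
```

Both routes return I_X(θ) = 1, the well-known Fisher-information for a unit-variance normal mean.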

Main Result
Suppose that we are given an expression of M_X(t; θ) rather than f(x; θ), but we cannot immediately identify an exact nature of the p.m.f. or p.d.f. f(x; θ). In such a situation, it will be useful if there is a way to obtain I_X(θ) using the expression of M_X(t; θ) alone.
A natural query comes up: Can we recover an expression for I_X(θ) from M_X(t; θ) without first identifying the corresponding unique distribution associated with M_X(t; θ)? Theorem 1.1 answers in the affirmative when the associated distribution comes from a one-parameter exponential family.
An important question may be raised: How would one know whether or not a given m.g.f. came from a one-parameter exponential family? This is often an important assumption behind numerous probabilistic models under consideration. An experimenter may not know an exact statistical or probabilistic model that will uniquely give rise to a dataset on hand. But a one-parameter exponential family has a rich structure with flexible analytical and geometrical features, lending useful stochastic models as possible choices to consider in data analyses.
We quote how Wikipedia (https://en.wikipedia.org/wiki/Exponential_family) explains in its opening paragraph: "In probability and statistics, an exponential family is a set of probability distributions of a certain form ... This special form is chosen for mathematical convenience, on account of some useful algebraic properties, as well as for generality, as exponential families are in a sense very natural sets of distributions to consider. The concept of exponential families is credited to ... Anderson (1970), Pitman and Wishart (1936), Darmois (1935), Koopman (1936). The term exponential class is sometimes used in place of 'exponential family' (Kupperman (1958))".

Theorem 1.1. Suppose that a p.m.f. or p.d.f. f(x; θ) belongs to a one-parameter exponential family. More specifically, we suppose that:

f(x; θ) = c(θ) h(x) exp{w(θ) U(x)}, for x ∈ X, θ ∈ Θ, (1.3)

where X, h(x), U(x) do not involve θ, and Θ, c(θ), w(θ) do not involve x. Let us denote:

M_U(t; θ) = E_θ[exp{tU(X)}], (1.4)

for t ∈ T, a sub-interval of the real line. Then, I_X(θ), given by (1.2), can be recovered from M_U(t; θ), the m.g.f. of the (complete and) minimal sufficient statistic U ≡ U(X) for θ, as follows:

I_X(θ) = lim_{t→0} (∂/∂θ M_U(t; θ))² / [M_U(2t; θ) − (M_U(t; θ))²]. (1.5)

This tool will be especially useful in many situations for obtaining I_X(θ). Illustrations will follow in Sections 2 and 3.
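The limiting ratio in (1.5) can be evaluated mechanically by a computer algebra system. The following sketch (ours; the helper name and the Exponential(θ) example are not from the text) uses Python's sympy:

```python
import sympy as sp

t, theta = sp.symbols('t theta', positive=True)

def fisher_info_from_mgf(M):
    """Recover I_X(theta) from M_U(t; theta) via the limit in (1.5):
    (d/dtheta M_U)^2 divided by M_U(2t; theta) - M_U(t; theta)^2, as t -> 0."""
    num = sp.diff(M, theta)**2
    den = M.subs(t, 2*t) - M**2
    return sp.simplify(sp.limit(num / den, t, 0))

# Exponential(rate theta): U = X, M_U(t; theta) = theta/(theta - t);
# the known Fisher-information is 1/theta^2
M_exp = theta / (theta - t)
print(fisher_info_from_mgf(M_exp))  # prints theta**(-2), i.e., 1/theta^2
```

The helper needs only the m.g.f. itself, never the underlying p.m.f. or p.d.f., which is exactly the point of Theorem 1.1.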

Layout
In Section 2, we illustrate the usefulness of the limiting expression (1.5) with three initial examples. These examples highlight the importance of Theorem 1.1. In Section 3, two additional illustrations with more complexities exhibit the true potential of Theorem 1.1 by finding the expressions of I_X(θ) from the given m.g.f.s. One may note that the probability distributions associated with the m.g.f.s from Section 3 may not be readily identifiable from a cursory reading.
In Section 4, we begin with some motivations behind Theorem 1.1. Those motivations and a formal proof of Theorem 1.1 are closely tied with each other, and hence we keep them in one place. In Section 5, we briefly discuss an ad hoc technique applicable when one can easily identify an expression of w(θ) in (1.3) by glancing at M_U(t; θ), using some trial and error under the natural parametrization. Some concluding thoughts are included in Section 6.

Illustration 1
Suppose that U has its m.g.f. given by:

M_U(t; θ) = exp(tθ + t²/2), for t ∈ (−∞, ∞). (2.1)

Now, using L'Hôpital's rule repeatedly, the expression on the right-hand side of (1.5) simplifies to:

I_X(θ) = lim_{t→0} t² exp(2tθ + t²) / [exp(2tθ + t²){exp(t²) − 1}] = lim_{t→0} t²/{exp(t²) − 1} = 1. (2.2)

The m.g.f. M_U(t; θ) from (2.1) clearly corresponds to U = X where X has the N(θ, 1) distribution with an unknown mean θ. The p.d.f. of X obviously has the same form as in (1.3) with appropriate c(θ), h(x), w(θ), U(x) = x, and X = Θ = (−∞, ∞).
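The same limiting computation can be delegated to a computer algebra system; here is a sympy sketch (ours) of this illustration:

```python
import sympy as sp

t, theta = sp.symbols('t theta', real=True)

M = sp.exp(t*theta + t**2/2)   # the m.g.f. of U in Illustration 1

num = sp.diff(M, theta)**2     # t^2 * exp(2*t*theta + t^2)
den = M.subs(t, 2*t) - M**2    # exp(2*t*theta + t^2) * (exp(t^2) - 1)
print(sp.limit(num / den, t, 0))  # 1
```

The limit equals 1, matching the Fisher-information of the N(θ, 1) mean.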
Illustration 2

Suppose that U has its m.g.f. given by:

M_U(t; θ) = exp{θ(exp(t) − 1)}, for t ∈ (−∞, ∞). (2.3)

The m.g.f. M_U(t; θ) from (2.3) corresponds to U ≡ X where X has the Poisson(θ) distribution with an unknown mean θ. Also, the p.m.f. of X obviously has the same form as in (1.3) with appropriate c(θ), h(x), w(θ), U(x) = x, and X = {0, 1, 2, ...}, Θ = (0, ∞).
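Again, a sympy sketch (ours) of the limit in (1.5) recovers the familiar value I_X(θ) = 1/θ for the Poisson(θ) case:

```python
import sympy as sp

t, theta = sp.symbols('t theta', positive=True)

M = sp.exp(theta*(sp.exp(t) - 1))   # the Poisson(theta) m.g.f.

num = sp.diff(M, theta)**2
den = M.subs(t, 2*t) - M**2
I = sp.simplify(sp.limit(num / den, t, 0))
print(I)  # 1/theta
```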

Illustrations with More Complexities
In this section, two complex illustrations are discussed. In either case, it may be difficult to readily identify an associated population distribution f(x; θ) simply by looking at the m.g.f., and such a possibility bolsters the importance of Theorem 1.1. A main point is that (1.5) does not require f explicitly to come up with I_X(θ).
In complicated cases, one may find it bothersome to write down correct expressions of the successive partial derivatives (with respect to t) for both the numerator and denominator while implementing Theorem 1.1. But, in this day and age of computers, that should not cause any hindrance. Indeed, the lengthy expressions in (3.5) and (3.7)-(3.11) were obtained using MAPLE.
Here is one other complicated illustration. Suppose that the distribution of X belongs to a one-parameter exponential family, and U ≡ U(X) has its m.g.f. given by: What is I_X(θ)? One may exploit Theorem 1.1 and check: All other details are left out.
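Since the m.g.f. from (3.4) is not reproduced here, we illustrate the same mechanical route on a stand-in of our own with a similarly non-obvious form: U = log X with X ~ Gamma(shape θ, scale 1), so that M_U(t; θ) = Γ(θ + t)/Γ(θ) and the known Fisher-information is the trigamma function ψ′(θ). A small numerical sketch with Python's mpmath evaluates the ratio in (1.5) at a small t:

```python
import mpmath as mp

mp.mp.dps = 30   # high working precision keeps the small-t ratio clean

# stand-in m.g.f. with a non-obvious form: U = log X, X ~ Gamma(theta, 1),
# so that M_U(t; theta) = Gamma(theta + t)/Gamma(theta)
def M(t, theta):
    return mp.gamma(theta + t) / mp.gamma(theta)

def fisher_info_rhs(t, theta):
    """Right-hand side of (1.5), evaluated at a fixed small t."""
    dM = mp.diff(lambda th: M(t, th), theta)   # d/dtheta of M_U(t; theta)
    return dM**2 / (M(2*t, theta) - M(t, theta)**2)

theta0 = mp.mpf('1.5')
approx = fisher_info_rhs(mp.mpf('1e-8'), theta0)
exact = mp.polygamma(1, theta0)   # trigamma: the known I_X(theta) here
print(approx, exact)
```

The ratio at t = 10⁻⁸ agrees with ψ′(1.5) to many digits, even though the m.g.f. Γ(θ + t)/Γ(θ) hardly advertises its parent distribution.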

Some Comments
A natural temptation would be to check the expressions of I_X(θ) by utilizing the associated p.d.f.s of U which, by the way, ought to be proportional to f(x; θ). In Illustration 4, the form of the m.g.f. from (3.1) and our subsequent derivation of I_X(θ) may supply hints to guess the corresponding f(x; θ). In that case, one will be able to check easily that the I_X(θ) so found was indeed correct.
Illustration 5 is quite different from the other four illustrations. The m.g.f. from (3.4) is not something that one handles every day! It is difficult to come up with the corresponding f(x; θ); indeed, it may remain out of sight. Exactly there lies the usefulness of Theorem 1.1.

Motivation and Proof of Theorem 1.1
We suppose that f(x; θ), X, and Θ satisfy the regularity conditions that are customarily assumed for the Cramér-Rao inequality to hold. A crucial point is that the partial derivative ∂/∂θ of an integral over X may be moved inside the integral sign. In the case when f(x; θ) belongs to a one-parameter exponential family, such requirements would be necessarily satisfied. See, for example, Lehmann (1983, p. 125), Lehmann and Casella (1998, pp. 115-118), Rao (1973, p. 329), and Mukhopadhyay (2000, p. 366).
We immediately have:

∂/∂θ log f(x; θ) = c′(θ)/c(θ) + w′(θ) U(x). (4.1)

Now, since f(x; θ) is a p.m.f. or p.d.f., we have:

∫_X f(x; θ) dx = 1,

which gives:

E_θ[U(X)] = −c′(θ)/{c(θ) w′(θ)} ≡ d(θ), say, (4.2)

by interchanging the integral with respect to x and the partial derivative with respect to θ.
In other words, U ≡ U(X) is an unbiased estimator of the parametric function d(θ) defined by (4.2). Then, from (4.1), since the score function is linear in U(x), we can conclude that the variance of U(X) must attain the Cramér-Rao lower bound, that is,

V_θ[U(X)] = {d′(θ)}²/I_X(θ). (4.3)

See Rao (1973, p. 325) and Lehmann and Casella (1998, p. 121).

Proof of Theorem 1.1
At this point, we turn around to note that exp{tU(X)} is an unbiased estimator of M_U(t; θ) for all fixed t ∈ T. Now, the Cramér-Rao inequality will imply:

V_θ[exp{tU(X)}] ≥ (∂/∂θ M_U(t; θ))²/I_X(θ), (4.4)

for all fixed t ∈ T, θ ∈ Θ. Clearly, V_θ[exp{tU(X)}] is equivalently expressed as:

E_θ[exp{2tU(X)}] − (E_θ[exp{tU(X)}])² = M_U(2t; θ) − (M_U(t; θ))²,

which happens to be the denominator of the expression on the right-hand side of (1.5).
Next, when can one expect the equality to hold in (4.4)? The clue is hidden inside (4.3). If we pick t very small, then the associated unbiased estimator exp{tU(X)} of M_U(t; θ), being approximately 1 + tU(X), behaves "nearly" as an estimator that is linear in U. This may be validated as follows: we note that both (∂/∂θ M_U(t; θ))² and M_U(2t; θ) − (M_U(t; θ))² tend to zero as t → 0.
Let us denote the numerator and the denominator of the expression on the right-hand side of (1.5) by:

N(t, θ) = (∂/∂θ M_U(t; θ))² and D(t, θ) = M_U(2t; θ) − (M_U(t; θ))². (4.5)

Next, we apply L'Hôpital's rule repeatedly and interchange the partial derivatives with respect to t and θ to write:

lim_{t→0} N(t, θ)/D(t, θ) = (∂/∂θ E_θ[U])²/V_θ[U], using (4.5). (4.6)

On the other hand, in view of (4.1) and (4.3), for such a linear unbiased estimator of M_U(t; θ), that is, for small t, its variance must attain the Cramér-Rao lower bound. Hence, we have the following result:

lim_{t→0} N(t, θ)/D(t, θ) = I_X(θ),

which coincides with the limiting expression on the right-hand side of (1.5). The proof is complete.
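The repeated applications of L'Hôpital's rule amount to comparing the t² coefficients of the numerator and denominator of (1.5). A sympy sketch (ours, with m1 and m2 standing in for E_θ[U] and E_θ[U²]) makes the leading orders visible:

```python
import sympy as sp

t, theta = sp.symbols('t theta')
m1 = sp.Function('m1')(theta)   # stands in for mu(theta) = E_theta[U]
m2 = sp.Function('m2')(theta)   # stands in for E_theta[U^2]

# expansion of M_U(t; theta) about t = 0, up to the order that matters
M = 1 + t*m1 + t**2*m2/2

num = sp.expand(sp.diff(M, theta)**2 / t**2)      # numerator of (1.5), over t^2
den = sp.expand((M.subs(t, 2*t) - M**2) / t**2)   # denominator of (1.5), over t^2

# letting t -> 0 after dividing by t^2 mirrors the repeated L'Hopital steps
ratio0 = sp.simplify(num.subs(t, 0) / den.subs(t, 0))
print(ratio0)   # (d m1/d theta)^2 / (m2 - m1^2), i.e., {mu'(theta)}^2/sigma^2(theta)
```

Both terms vanish like t², and their ratio tends to (∂/∂θ E_θ[U])²/V_θ[U], exactly the right-hand side of (4.6).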

Natural Parametrization?
Now, we briefly discuss a parallel idea of an ad hoc nature. Given how Section 4 was laid out, one may feel tempted to cast Theorem 1.1 in the light of Theorem 7.2 of Lehmann (1983, p. 127) or equivalently Theorem 5.4 of Lehmann and Casella (1998, p. 116) or the formula from (2.46) in Barndorff-Nielsen and Cox (1994, p. 27). One may consider other appropriate sources too. But, then, one should first identify w(θ) used in (1.3) readily from a given expression of M U (t; θ) alone.
If that is indeed the case, then one may consider obtaining I*_X(w(θ)), the Fisher-information under the mean-value parametrization or the so-called natural parametrization, where I*_X(w(θ)) can be rewritten as:

I*_X(w(θ)) = V_θ[U(X)]. (5.1)

From (5.1), we can express I_X(θ) as:

I_X(θ) = {w′(θ)}² I*_X(w(θ)). (5.2)

Clearly, w(θ) must satisfy the following relationship:

M_U(t; θ) = g(t + w(θ))/g(w(θ)) for some g(·), and for all t ∈ T and θ ∈ Θ. (5.3)
We note that even if one is not readily able to fully identify f by looking at M_U(t; θ) alone, through trial and error on a case-by-case basis, one may occasionally come up with an explicit expression of w(θ) satisfying (5.3).
Illustration 1 (Continued): M_U(t; θ) = exp(tθ + t²/2), and (5.3) leads to g(x) = exp(x²/2) and w(θ) = θ. Then, formula (5.2) leads to I_X(θ) = 1. The formula from (5.2) will work well as long as one can easily identify w(θ) by proceeding on a case-by-case basis. We are not aware of a precise analytical method for determining w(θ) satisfying (5.3) in a general situation.
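A sympy sketch (our code, with the variance read off the m.g.f. at t = 0) verifies both the factorization (5.3) and formula (5.2) for this continued illustration:

```python
import sympy as sp

t, theta = sp.symbols('t theta')

# candidate pair for (5.3): g(x) = exp(x^2/2), w(theta) = theta
w = theta
expo = sp.expand((t + w)**2/2 - w**2/2)   # exponent of g(t + w)/g(w)
M = sp.exp(expo)
assert M == sp.exp(t*theta + t**2/2)      # recovers the m.g.f. in (2.1)

# (5.2): I_X(theta) = {w'(theta)}^2 * I*_X(w(theta)), with I*_X(w(theta))
# taken as V_theta[U], read off the m.g.f. as M''(0) - M'(0)^2
var_U = sp.simplify((sp.diff(M, t, 2) - sp.diff(M, t)**2).subs(t, 0))
I = sp.diff(w, theta)**2 * var_U
print(I)   # 1
```

With w(θ) = θ and V_θ[U] = 1, formula (5.2) returns I_X(θ) = 1, matching Illustration 1.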

Concluding Thoughts
In view of (4.6), the Fisher-information from (1.5) is easily rewritten as:

I_X(θ) = {μ′(θ)}²/σ²(θ), (6.1)

where we denote E_θ[U] ≡ μ(θ) and V_θ[U] ≡ σ²(θ). The expression in (6.1) may appear simpler than that on the right-hand side of (1.5). But if one cannot immediately guess f(x; θ) from the expression of M_U(t; θ) alone, then the chance is probably very slim of one's quoting μ(θ) and σ²(θ) right away from thin air. Even though equation (6.1) is correct, this particular form may not be readily useful. There is no additional advantage in reaching for (6.1) beyond (1.5) unless one just happens to identify μ(θ) and σ²(θ) readily.
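As a check of (6.1) against (1.5) on one more standard case (the Bernoulli(θ) choice is ours, not from the text), a sympy sketch gives matching answers by both routes:

```python
import sympy as sp

t, theta = sp.symbols('t theta', positive=True)

# Bernoulli(theta): U = X, with m.g.f. 1 - theta + theta*exp(t)
M = 1 - theta + theta*sp.exp(t)

# route 1: the limiting ratio in (1.5)
I_limit = sp.simplify(sp.limit(sp.diff(M, theta)**2 / (M.subs(t, 2*t) - M**2), t, 0))

# route 2: (6.1), with mu and sigma^2 read off the m.g.f. at t = 0
mu = sp.diff(M, t).subs(t, 0)                                            # theta
sigma2 = sp.simplify((sp.diff(M, t, 2) - sp.diff(M, t)**2).subs(t, 0))   # theta*(1 - theta)
I_moment = sp.simplify(sp.diff(mu, theta)**2 / sigma2)

print(I_limit, I_moment)   # both equal 1/(theta*(1 - theta))
```

Both routes return the familiar 1/{θ(1 − θ)}, but note that route 2 presupposes that μ(θ) and σ²(θ) are already in hand.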
A direct extension of Theorem 1.1 in the case of a multiparameter exponential family appears straightforward and hence it is omitted for brevity.