Lorenz Curves and Treatment-Covariate Interactions in Clinical Trials

A common objective in comparative two-treatment randomized clinical trials is the study of the possible heterogeneity of the treatment effect across subgroups of patients, with the aim of identifying patients who benefit the most (or the least) from a new treatment. Here we describe the connection between an exploratory approach to this problem (STEPP, the Subpopulation Treatment Effect Pattern Plot approach) and the Lorenz curve, in particular the generalized Lorenz curve. We exploit this connection to construct a test for the absence of interaction between a continuous covariate and the difference in the mean of a continuous outcome between the two treatment groups. We also review some recent developments in the study of concentration for right-censored survival data, which are also closely related to the Lorenz curve.


Introduction
Consider the general setting of comparing a new treatment against standard therapy in a two-arm randomized clinical trial. It is often of interest to study the heterogeneity of the treatment effect across groups of patients, to try to identify subgroups of patients who may benefit the most (or the least) from the new therapy, so that treatment can be tailored to the individual patient. Here we focus on the case in which subgroups are defined with respect to a one-dimensional covariate $X$, say a biomarker or a baseline risk index.
The presence of an interaction effect is defined as the situation in which the effect of the new treatment on the outcome measure $Y$, as compared to the standard treatment, varies as $X$ varies. One may test for the presence of an interaction effect within a regression model. For example, one could use a fractional polynomial (FP) model both overall and within each treatment group (see Royston and Sauerbrei, 2008). In that case, the first step is to construct a multivariable adjustment model which may contain binary covariates and fractional polynomial transformations of any continuous covariates. The second step involves the fractional polynomial modelling within the adjustment model. This approach is more flexible than standard polynomial regression, and it is useful when one wishes to preserve the continuous nature of the covariate in the regression model but suspects that some or all of the relationships may be nonlinear. Potential shortcomings of the fractional polynomial approach are its limited power to detect nonlinear interactions and its possible sensitivity to extreme values at either end of the distribution of $X$.
Owing to insufficient sample size, variables with a modest or weak effect may not be selected, or linear effects may be chosen by default instead of more realistic nonlinear functions. In clinical applications, on the other hand, the patient population is often categorized into two or more disjoint groups according to the value of $X$, based on cut-points placed over its support, and the interaction is studied within a model that includes main effects as well as treatment-covariate interaction terms (Altman et al., 1994). Such an approach suffers from its dependence on the choice of the cut-points and from the loss of power due to categorization.

An alternative approach to the investigation of interactions, which sits between these two extremes, was first introduced in Bonetti and Gelber (2000) within the survival analysis setting, and it is called the Subpopulation Treatment Effect Pattern Plot (STEPP) approach. The approach is reviewed in Section 2. It is based on dividing the observations into subpopulations defined with respect to a collection of cutoff values of the covariate $X$, and on estimating the treatment effect within each subpopulation. To increase the number of patients that contribute to each point estimate (and hence the precision of the individual estimates), subpopulations are allowed to overlap. Plots showing the estimated treatment effects in the subpopulations can be used to investigate the possible interaction between $X$ and treatment. STEPP is essentially a smoothing-by-binning approach, and it is appealing because it clearly defines the groups of patients on which customary treatment effect measures are calculated using standard methods. Different implementations of STEPP exist that depend on the nature of the outcome of interest. For example, in clinical trials with survival data, interest may be in the estimated difference between the survival probabilities of the two treatment groups at some fixed time point $t^*$, across the subpopulations. Other possibilities include the Cox proportional hazards model, or the cumulative incidence function within the competing risks setting (Bonetti and Gelber, 2004; Lazar et al., 2010).
Here we note the connections that exist between the STEPP construction and ideas related to the Lorenz curve, and in particular to its generalized version based on the regression function. The Lorenz curve was introduced to measure the level of inequality in a population, typically with respect to income or wealth. In Section 3 we recall this concept, and the notion of generalized Lorenz curve (Lorenz, 1905; Kakwani, 1977). In particular, we focus on the generalized Lorenz curve of the regression function as a tool to measure the dependence between two variables $X$ (real) and $Y$ (positive). By comparing the generalized Lorenz curves of the regression function $E(Y | X = x)$ between two groups in a population, one can study the presence of an interaction, i.e. whether the effect of the continuous predictor $X$ on the outcome variable $Y$ differs between the two groups. We relate this concept to the STEPP construction, show some results, and use them to construct a test for treatment-covariate interactions. In addition, we highlight other recent developments in subgroup identification in clinical studies based directly on concentration ideas, also closely related to the Lorenz curve. We close with some discussion in Section 4.

The Subpopulation Treatment Effect Pattern Plot (STEPP)
STEPP extends the idea of looking at treatment effect within subgroups of patients. In particular, the approach studies patterns of treatment effect across overlapping subpopulations of patients. Consider $n$ patients in a two-arm (randomized) clinical trial, with a continuous baseline covariate $X \in [x_{\min}, x_{\max}] \subset \mathbb{R}$ observed on all patients, and define $K$ overlapping subpopulations of patients with respect to their values of $X$. Subpopulations ($P_j$, $j = 1, \ldots, K$) can be constructed according to a sliding window pattern: assign patient $i$ to subpopulation $P_j$ when $x_i \in [l_j, u_j]$, where the two nondecreasing sets of numbers $\{l_j\}$ and $\{u_j\}$ are such that $l_j \in [x_{\min}, x_{\max}]$ and $u_j = \inf\{u \ge l_j : P_n(l_j \le X \le u) \ge p\}$ for some fixed $p \in (0, 1)$, with $P_n$ the empirical distribution of $X$ in the data. Each subpopulation in the sliding window construction has a part that overlaps with the neighboring subpopulations and a part that differs from them.
As another possibility, the tail-oriented construction produces subpopulations by removing patients from the whole patient population, starting with the ones with the highest (or lowest) values of $X$. With increasing distance from the center, more and more patients with high covariate values (or low covariate values) are dropped. The cutoffs are defined as $\{l_j \in [x_{\min}, x_{\max}],\ u_j = x_{\max}\}$ or $\{l_j = x_{\min},\ u_j \in [x_{\min}, x_{\max}]\}$. Note that, unlike the sliding window approach, the tail-oriented construction also includes among the subpopulations the one that coincides with the set of all patients in the study. For the $j$th subpopulation, an estimate $\hat\theta_j$ of treatment effect is then computed.
The plot of the treatment effect estimates $\hat\theta_j$ within the subpopulations vs. the median value of $X$ within each subpopulation is called a STEPP plot, and its examination may suggest the presence of patterns in the treatment effects as the covariate of interest varies.
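The two constructions can be made concrete with a short sketch. The Python code below is only an illustration under stated assumptions: the function names `sliding_window`, `tail_oriented` and `subpopulation_medians`, the choice of supplying the cutoffs by hand, and the rule of stopping once fewer than $\lceil pn \rceil$ observations remain to the right of a lower cutoff are ours, and are not taken from the STEPP software.

```python
import bisect
import math
import statistics


def sliding_window(x, lowers, p):
    """Sliding-window subpopulations as (l_j, u_j, member indices).

    For each lower cutoff l_j, u_j is the smallest observed value such that
    at least a fraction p of all observations falls in [l_j, u_j].
    """
    xs = sorted(x)
    n = len(xs)
    need = math.ceil(p * n)                 # observations required per window
    windows = []
    for l in lowers:
        start = bisect.bisect_left(xs, l)   # first sorted position with x >= l
        end = start + need - 1              # position of the need-th observation
        if end >= n:                        # assumption: stop when too few remain
            break
        u = xs[end]
        members = [i for i, xi in enumerate(x) if l <= xi <= u]
        windows.append((l, u, members))
    return windows


def tail_oriented(x, cutoffs, drop="high"):
    """Tail-oriented subpopulations: [x_min, c] if drop == "high", else [c, x_max]."""
    x_min, x_max = min(x), max(x)
    subpops = []
    for c in cutoffs:
        l, u = (x_min, c) if drop == "high" else (c, x_max)
        members = [i for i, xi in enumerate(x) if l <= xi <= u]
        subpops.append((l, u, members))
    return subpops


def subpopulation_medians(x, subpops):
    """Median covariate value in each subpopulation (x-axis of a STEPP plot)."""
    return [statistics.median(x[i] for i in members) for _, _, members in subpops]


# Hypothetical covariate values, for illustration only.
x = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
windows = sliding_window(x, lowers=[1, 3, 5, 7], p=0.4)
tails = tail_oriented(x, cutoffs=[4, 7, 10], drop="high")
```

In this toy example each sliding window holds four of the ten patients and shares two of them with its neighbor, while the last tail-oriented subpopulation coincides with the whole sample.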
The possible implementations of STEPP include the case in which treatment effect is defined as $\hat\theta_j = \hat S_{1,j}(t^*) - \hat S_{2,j}(t^*)$, with $\hat S_{G,j}(t)$ the Kaplan-Meier estimator of survival at time $t$ within treatment group $G$ inside subpopulation $P_j$.

Another implementation of STEPP is based on the hazard ratio obtained from fitting a Cox PH model (Bonetti and Gelber, 2000). This implementation presents some difficulties in its interpretation, because the same patient typically belongs to more than one subpopulation and it is therefore not clear what probability model its survival time follows. Nevertheless, the desire to capture the difference between two survival distributions beyond the probability of surviving up to a specific time point makes this hazard ratio implementation useful.
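As an illustration of the survival-probability version of the treatment effect, the sketch below computes a Kaplan-Meier estimate at a fixed time point and takes the difference between two groups within one subpopulation. It is a minimal pure-Python sketch, not the code of the R stepp package: the function names, the coding of events (1 for an observed event, 0 for censoring) and the sign convention (group 1 minus group 2) are assumptions made here for illustration.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of S(t).

    times  : follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data) and data[i][0] <= t:
        t_i = data[i][0]
        deaths = removed = 0
        # Group all observations tied at the same time point.
        while i < len(data) and data[i][0] == t_i:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
        at_risk -= removed
    return surv


def survival_difference(group1, group2, t_star):
    """Difference in KM survival at t_star; each group is a (times, events) pair
    already restricted to the patients of one subpopulation."""
    return kaplan_meier(*group1, t_star) - kaplan_meier(*group2, t_star)


# Hypothetical data for one subpopulation, for illustration only.
arm1 = ([1, 2, 3, 4], [1, 0, 1, 1])
arm2 = ([1, 2, 3, 4], [1, 1, 1, 0])
theta_hat = survival_difference(arm1, arm2, t_star=3)
```

Applying `survival_difference` to each subpopulation in turn gives the sequence of estimates that is plotted against the subpopulation medians.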
Other implementations include the case in which treatment effect is defined to be the difference in a quantile of the survival function (say, the median survival time), as described in Bonetti et al. (2009b). The survival medians in treatment group $G$ ($G \in \{1, 2\}$) within the $K$ subpopulations are obtained by inversion of $(\hat S_{G,1}(\cdot), \ldots, \hat S_{G,K}(\cdot))$, the vector of the Kaplan-Meier estimators of the survival functions $S_{G,j}(\cdot)$, $j = 1, \ldots, K$.
Lastly, treatment effect can be defined within the context of competing risks as the difference in the estimated cumulative incidence function (of, say, the cause of death of interest $c$) at some fixed time point $t^*$ within each subpopulation, or as the treatment effect parameter in a Fine and Gray semiparametric proportional hazards model for the competing risk of interest (Fine and Gray, 1999; Lazar et al., 2010). Recently, the graphical part of STEPP has been extended to the case of two continuous covariates (Pogue-Geile et al., 2013).

Apart from the graphical examination of the STEPP plot, statistical tests of significance can be performed to test for the homogeneity of the treatment effects across the subpopulations. To test the null hypothesis of no interaction between the covariate of interest (i.e. across subpopulations) and treatment effect, one can use one of several possible test statistics. For example, a test statistic $T$ that compares the subpopulation estimates $\hat\theta_j$ with $\hat\theta_{ALL}$ can be used, where $\hat\theta_{ALL}$ is the measure of treatment effect computed on all patients in the study. The distribution of $T$ can be estimated by sampling repeatedly from the estimated asymptotic distribution of the treatment effect estimates, and a p-value can thus be produced.

Lastly, a simultaneous confidence band around the plot of the estimated treatment effects can be obtained by solving numerically, over a sample of random variables generated from the estimated asymptotic distribution of the estimates, for the parameter that yields the desired coverage. This parameter represents the widening of the marginal confidence intervals that is necessary to produce the desired simultaneous coverage of the band. Note that a Bonferroni adjustment can also be used, as well as marginal confidence intervals for each subpopulation. As an alternative to large-sample theory, one can in general also perform more accurate testing by using permutation-based inference (Bonetti et al., 2009b). Several examples of clinical applications of the various implementations of STEPP can be found in Lazar et al. (2010). The R stepp package for both the asymptotic and the permutation-based procedures is available on CRAN.
3. Lorenz Curve, Generalized Lorenz Curve of the Regression Function, and Concentration

The Lorenz Curve
The Lorenz curve (LC) has been used since 1905 to describe concentration and inequality in distributions of resources. Its main importance is in economics, when dealing with income and wealth: it is still considered a simple method for visualizing distributions of income or wealth with respect to their inherent inequality or concentration, through the graph of the cumulative proportion of total income or wealth owned against the cumulative proportion of the population owning it. For a nonnegative random variable $Y \sim f(y)$ with finite and strictly positive expected value $\mu = E(Y)$, the Lorenz curve is defined as the function $(F, C)(y)$ given by

$$F(y) = \int_0^y f(t)\,dt, \qquad C(y) = \frac{1}{\mu}\int_0^y t\,f(t)\,dt$$

(Lorenz, 1905). Note that this can also be written as

$$L_Y(p) = \frac{1}{\mu}\int_0^p F_Y^{-1}(u)\,du, \qquad p \in [0, 1],$$

where the quantile function $F_Y^{-1}$ is defined as the pseudoinverse of the cumulative distribution function (cdf) $F_Y$, as in Pietra (1915).
Note that any distribution supported on the nonnegative half line with a finite and positive first moment admits a Lorenz curve. It is a direct consequence of the definition that the Lorenz curve $L(p)$ is continuous on $[0, 1]$, with $L(0) = 0$ and $L(1) = 1$, increasing, and convex. The Lorenz curve is often represented through its dual. The dual Lorenz curve (DLC) of a nonnegative random variable $Y$ with finite expectation $\mu$ is the graph of

$$L^*_Y(p) = \frac{1}{\mu}\int_{1-p}^{1} F_Y^{-1}(u)\,du,$$

where $p \in [0, 1]$. The dual Lorenz curve can be written as $L^*_Y(p) = 1 - L_Y(1 - p)$, so it is the centrally symmetric curve to the LC, and it is therefore a concave, increasing and continuous function on $[0, 1]$, with $L^*(0) = 0$ and $L^*(1) = 1$. The empirical counterparts of these curves can be expressed in terms of order statistics. In particular, considering $n$ points, let $y_{(i)n}$ denote the $i$-th smallest observed value of $Y$, with $i = 1, \ldots, n$. Then the empirical Lorenz curve is defined as

$$\hat L\left(\frac{k}{n}\right) = \frac{\sum_{i=1}^{k} y_{(i)n}}{\sum_{i=1}^{n} y_{(i)n}},$$

with $k = 1, \ldots, n$ and with $\hat L(0) = 0$. (The empirical dual Lorenz curve is defined similarly.) Essentially, the Lorenz curve describes the amount of total wealth (say) owned by the $p \times 100\%$ poorest individuals, for $p \in [0, 1]$ (see also Gastwirth, 1971).
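The empirical curves are simple to compute from the order statistics. The following Python sketch shows one way to do it; the function names and the toy data are hypothetical, and the dual curve is obtained by cumulating from the largest observation downward, which is the empirical analogue of $L^*_Y(p) = 1 - L_Y(1 - p)$.

```python
def empirical_lorenz(y):
    """Points (k/n, L(k/n)) of the empirical Lorenz curve, k = 0, ..., n."""
    ys = sorted(y)                      # order statistics, smallest first
    n, total = len(ys), sum(ys)
    if total <= 0:
        raise ValueError("the sample must have a strictly positive total")
    points, cum = [(0.0, 0.0)], 0.0
    for k, value in enumerate(ys, start=1):
        cum += value
        points.append((k / n, cum / total))
    return points


def empirical_dual_lorenz(y):
    """Points (k/n, L*(k/n)): share of the total held by the k largest values."""
    ys = sorted(y, reverse=True)        # largest first
    n, total = len(ys), sum(ys)
    if total <= 0:
        raise ValueError("the sample must have a strictly positive total")
    points, cum = [(0.0, 0.0)], 0.0
    for k, value in enumerate(ys, start=1):
        cum += value
        points.append((k / n, cum / total))
    return points


# Hypothetical incomes, for illustration only.
incomes = [4, 1, 3, 2]
lc = empirical_lorenz(incomes)
dlc = empirical_dual_lorenz(incomes)
```

For these four values the poorest half of the sample holds 30% of the total, while the richest half holds 70%.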

The Generalized Lorenz Curve of the Regression Function
A generalization of the Lorenz curve has been proposed in Kakwani (1977). Let $g(x)$ be a continuous function of $x$ such that its first derivative exists and $g(x) \ge 0$ for all $x$. If $0 < E(g(X)) < +\infty$, then the generalized Lorenz curve (GLC) is defined as

$$L_g(p) = \frac{1}{E(g(X))}\int_{-\infty}^{x_p} g(x)\,dF_X(x),$$

where $p \in [0, 1]$ and $x_p = F_X^{-1}(p)$. It turns out that the GLC can be used to study orderings of the dependence between two variables: Blitz and Brittain (1964) introduced a monotone dependence structure based on the generalized Lorenz curve of the regression function

$$L_{E(Y|X)}(p) = \frac{1}{E(Y)}\int_{-\infty}^{x_p} E(Y | X = x)\,dF_X(x),$$

also with $p \in [0, 1]$ and $x_p = F_X^{-1}(p)$. As a consequence, bivariate distributions can be ordered according to monotone dependence. The symbol $=$ is used below to denote the equivalence of random variables $Z_1$ and $Z_2$: $Z_1 = Z_2$ iff the probability of the event $\{Z_1 = Z_2\}$ is equal to one. In particular, the following properties hold:

1. $L_{E(Y|X)}(p)$ passes through the points $(0, 0)$ and $(1, 1)$.

2. $L_{E(Y|X)}(p)$ is increasing if and only if $g(x) > 0$ for all $x$, where $g(x) = E(Y | X = x)$.

3. [...] $\mathcal{L}$ is the class of bivariate nonnegative random variables $(X, Y)$ with continuous marginal distribution functions $F_X$ and $F_Y$. Then $L_{E(Y|X)}$ will be above (below) [...]

For the proofs of these properties we refer to Muliere and Petrone (1992). By assuming the absolute continuity of $(X, Y)$, one can prove the following equivalent expression of the generalized Lorenz curve of the regression function:

$$L_{E(Y|X)}(p) = \frac{1}{E(Y)}\int\!\!\int y\,1(x \le x_p)\,f_{X,Y}(x, y)\,dy\,dx.$$

The large-sample behavior of the corresponding empirical curve $\hat L_{E(Y|X)}(p)$ is described by Theorem 3.1, whose limit involves $G_{F(u,v)}$, a zero-mean Gaussian process with known covariance function.
We refer to Appendix A for a sketch of the proof of Theorem 3.1. While this result describes the large-sample behavior of $\hat L_{E(Y|X)}(p)$, the fact that its expression involves an integration with respect to a process makes it hard to use in practice. Indeed, we recommend the use of permutation-based inference instead.

Note that above we have defined the empirical generalized Lorenz curve of the regression function as

$$\hat L_{E(Y|X)}(p) = \frac{\sum_{i=1}^{n} y_i\,1(x_i \le \hat x_p)}{\sum_{i=1}^{n} y_i},$$

with $\hat x_p$ the empirical $p$-th quantile of $X$. The GLC of the regression function is therefore related to an increasing sequence of subpopulations defined by the values $x_p = F_X^{-1}(p)$ of the predictor $X$, across all values of $p \in [0, 1]$. In particular, for a fixed value $p$ the GLC of the regression function is, up to the factor $p$, the ratio between the conditional expected value of $Y$ given that $X \le F_X^{-1}(p)$ and the overall expected value of $Y$. This clearly resembles the tail-oriented version of STEPP described in Section 2 when one defines as treatment effect the (estimated) differences in conditional means of exactly the form $E(Y | X \le F_X^{-1}(p))$. For the subpopulations containing all patients that have values of $X$ smaller than the empirical percentiles $x_p$, these are therefore estimated nonparametrically.

This parallel suggests alternative tests for interaction, motivated by the comparison of the level of dependence between $X$ and the outcome $Y$ in the two treatment groups, when such dependence is expressed by the GLC of the regression function. In particular, one may study the presence of an interaction effect through the difference between the estimated GLCs of the regression function for the two treatment groups across all values of $p \in [0, 1]$, and one may consider two test statistics built from this difference. Inference for the proposed test statistics could be based on Theorem 3.1. However, as noted above, it is more practical to make use of permutation testing.

Permutation tests are conditional statistical procedures, where the conditioning is with respect to the permutation set of the observed data, which plays the role of reference set for the inference. Under the null hypothesis and assuming exchangeability, the conditional probability distribution of a generic element that belongs to the set of the permutations of the observed data is independent of the distribution $P$ of the data. This allows permutation inference to be invariant with respect to $P$ within $H_0$. Owing to this invariance property, permutation tests are distribution free and nonparametric. To perform a permutation test one follows the customary four steps: (i) define the null hypothesis, the alternative hypothesis, and the assumptions; (ii) choose the appropriate test statistic and calculate its value using the observed data; (iii) calculate the value of the test statistic for all permutations of the data; and (iv) reject or fail to reject the null hypothesis using the estimated distribution of the test statistic under permutation. When the number of all permutations is very large, one typically uses the conditional Monte Carlo approach, which draws a sample of permutations and thus produces an estimate of the exact p-value (Good, 1994). In our problem one may develop a permutation test under the null hypothesis that $F_{X,Y}$ is equal in the two treatment-defined populations by permuting the pairs of values $(X_i, Y_i)$ across the two patient groups.
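The permutation procedure described above can be sketched in a few lines of Python. The code is a hedged illustration, not the authors' implementation. In particular, the test statistic used here, the largest absolute difference between the two empirical GLCs of the regression function over a grid of values $p = k/m$, is an assumption chosen for simplicity, and the function names, the grid size `m`, the number of permutations and the seed are all hypothetical.

```python
import random


def glc_regression(x, y, m=20):
    """Empirical GLC of the regression function on the grid p = k/m, k = 0..m.

    At p = k/m the curve is the share of the total of y held by the
    ceil(n*k/m) observations with the smallest covariate values.
    """
    pairs = sorted(zip(x, y))           # order the pairs by covariate value
    n, total = len(pairs), sum(y)
    prefix = [0.0]
    for _, y_i in pairs:
        prefix.append(prefix[-1] + y_i)
    # -(-a // b) is integer ceiling division, which avoids rounding issues
    return [prefix[-(-n * k // m)] / total for k in range(m + 1)]


def sup_statistic(x1, y1, x2, y2, m=20):
    """Assumed statistic: max over the grid of |L_1(p) - L_2(p)|."""
    curve1 = glc_regression(x1, y1, m)
    curve2 = glc_regression(x2, y2, m)
    return max(abs(a - b) for a, b in zip(curve1, curve2))


def permutation_test(x1, y1, x2, y2, n_perm=999, m=20, seed=0):
    """Monte Carlo permutation p-value, permuting (x, y) pairs across groups."""
    rng = random.Random(seed)
    observed = sup_statistic(x1, y1, x2, y2, m)
    pooled = list(zip(x1, y1)) + list(zip(x2, y2))
    n1 = len(x1)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        g1, g2 = pooled[:n1], pooled[n1:]
        stat = sup_statistic([a for a, _ in g1], [b for _, b in g1],
                             [a for a, _ in g2], [b for _, b in g2], m)
        if stat >= observed:
            exceed += 1
    # Count the observed arrangement among the permutations.
    return observed, (exceed + 1) / (n_perm + 1)
```

The pairs are shuffled jointly, so the dependence between the covariate and the outcome is preserved within each patient while the group labels are exchanged, which matches the null hypothesis of equal joint distributions in the two groups. The outcome values are assumed nonnegative with a positive total in every permuted group.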

Concentration
An alternative way of studying the differential effect of a new treatment on a subgroup of patients exists that is based directly on the Lorenz curves constructed on the two treatment groups, summarized through the Gini coefficient of concentration $G$ (Gini, 1912). Asymptotic normality of the empirical Gini index is well known (Hoeffding, 1948). Importantly, the Gini index $G$ is equal to twice the concentration area, that is, the area between the 45-degree line and the Lorenz curve. Thus $G$ is consistent with orderings of distributions that are induced by the Lorenz curves. It is therefore reasonable to compare distributions with respect to their concentration. In particular, testing for the equality of two population Gini indices may capture differences between the two distributions that may not be revealed by other measures. Rejection of the null hypothesis of equality of the outcome distributions of the two patient groups from the point of view of their concentration may suggest that subgroups of patients exist for whom treatment has a strong positive (or detrimental) effect. Equivalently, differences in concentration between groups may suggest the presence of a differential treatment effect on some patient groups. Following these ideas, the use of the Gini index has recently been extended to right-censored survival data arising from clinical studies, to detect differences in concentration between the survival time distributions of two treatment groups of patients (Bonetti et al., 2009a). An alternative expression for $G$ is $G = 1 - \int_0^\infty S_Y(y)^2\,dy \big/ \int_0^\infty S_Y(y)\,dy$, where $S_Y$ is the survival function of $Y$.

One may consider a restricted version of $G$. Under some regularity conditions, the restricted Gini statistic is asymptotically normal, and a plug-in estimator of its asymptotic variance is available (Bonetti et al., 2009a). A test is then constructed that compares the estimated (restricted) Gini concentration coefficient between the two treatment groups. The Gini test complements other tests in the sense that it captures differences in survival distributions not typically detected by other tests (e.g. Wilcoxon, log-rank, or Gray-Tsiatis), and a large simulation study suggests that the Gini index should be considered together with other existing tests to detect differences in survival distributions. Here, too, permutation-based inference is recommended, but only for the case of small and unbalanced groups (Gigliarano and Bonetti, 2013). Additional results in that reference apply specifically to cure rate models. An R function for the implementation of the restricted Gini test for right-censored survival data is available from the authors.

Conclusions
As we have seen above, STEPP is an exploratory tool which is easy to interpret, and it provides an opportunity to detect interactions beyond those that may be apparent based on regression models.STEPP may suggest the presence of a pattern in treatment effects, and it falls somewhere between the simple subgroup analyses performed on disjoint patient subgroups and full modeling.When implementing the methods, it is important to assess the robustness of the results of the analysis to the choice of the parameters that define the subpopulations.
As we have discussed here, interesting connections exist between STEPP and Lorenz curves. In particular, starting from the definition of the generalized Lorenz curve of the regression function, we have suggested another way to explore the presence of an interaction effect between treatment and a continuous predictor $X$. Using test statistics based on the difference of two generalized Lorenz curves of the regression function, one may construct increasing overlapping subpopulations of patients (as in the tail-oriented version of STEPP) and [...]

Appendix A

The class of lower rectangles is $P$-Donsker for any law of $(X_1, Y_1), \ldots, (X_n, Y_n)$, and the indicator function is square integrable (van der Vaart and Wellner, 1996; Das Gupta, 2008). By the central limit theorem for empirical measures, the sequence $\sqrt{n}\,(\hat F(x, y) - F(x, y))$ converges weakly to a zero-mean Gaussian process $G_{F(x,y)}$ with covariance function given by $\mathrm{Cov}(G_{F(x_i,y_i)}, G_{F(x_j,y_j)}) = F(x_i \wedge x_j, y_i \wedge y_j) - F(x_i, y_i)\,F(x_j, y_j)$, where $\wedge$ denotes the minimum and $i, j = 1, \ldots, n$ (van der Vaart and Wellner, 1996). Lastly, by the functional delta method, one concludes that $\sqrt{n}\,(\phi(\hat F(x, y), p) - \phi(F(x, y), p))$, with $\phi$ the functional of $F$ under study, is asymptotically equivalent to $\int y\,1(x \le x_p)\,dG_{F(x,y)}$.

Figure 2.1 illustrates the two constructions. The horizontal axis indicates the various subpopulations within which treatment effects are estimated, and the vertical axis shows the range of covariate values used to define the cohort of patients included in each subpopulation.

Figure 2.1: Sliding window (left) and Tail Oriented (right) construction of the subpopulations in STEPP.

In the survival-probability implementation of STEPP, $t^*$ is a suitably chosen time point (Bonetti and Gelber, 2004). Figure 2.2 shows the plot of the four-year disease-free survival (DFS) estimates in two of the four arms of a breast cancer clinical trial across the sliding window subpopulations (panel A), and the associated STEPP plot of their difference (panel B). The covariate of interest here is a well-known predictor of breast cancer prognosis, the tumor proliferation fraction (Ki-67), which is associated with the degree of effectiveness of chemotherapy. The prognostic and predictive value of the Ki-67 labeling index (LI) was evaluated in the BIG 1-98 study, an international, double-blind phase III clinical trial of 8,010 postmenopausal women with early stage invasive breast cancer, who were randomly assigned to one of four adjuvant endocrine therapy arms: letrozole, tamoxifen, or sequences of these agents (letrozole to tamoxifen, tamoxifen to letrozole). We refer to Lazar et al. (2010) for a detailed description of that STEPP analysis.

Figure 2.2: STEPP plots for 4-year disease-free survival (DFS) vs. Ki-67 labeling index (LI) in letrozole vs. tamoxifen. Each subpopulation contains approximately 150 patients, with approximately 50 overlapping patients. The left panel (A) shows the estimated 4-year DFS within each subpopulation for the two treatment groups; the right panel (B) shows the difference in the 4-year DFS estimates, with the 95% marginal confidence intervals (reprinted with permission from Figure 1 in Lazar et al., 2010).

Disregarding covariates, let us go back to the Lorenz curve of a positive random variable $Y$. The area between the 45-degree line and the Lorenz curve is a measure of the concentration of the distribution of $Y$. Let the random variable of interest be $Y \ge 0$ with cdf $F_Y$, survival function $S_Y(y) = 1 - F_Y(y)$, density function $f_Y(y)$, finite expected value $\mu > 0$, and variance $Var(Y)$. The coefficient of mean difference is defined as $\Delta = \int\!\!\int |y_1 - y_2|\,dF_Y(y_1)\,dF_Y(y_2)$ $(< +\infty)$. The Gini coefficient of concentration for $F_Y$ is $G = \Delta/(2\mu)$, and it can be estimated from an i.i.d. sample $Y_1, \ldots, Y_n$ drawn from the population by $\hat G_H = D/(2\bar Y_n)$, with $D = (n(n-1))^{-1}\sum_{j \ne k} |Y_j - Y_k|$ and $\bar Y_n$ the sample mean.
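The sample version of the Gini coefficient given above, $D/(2\bar Y_n)$ with $D$ the average absolute difference over all pairs $j \ne k$, translates directly into code. The sketch below is a minimal Python illustration for complete (uncensored) data only; the function names and the toy data are hypothetical, and the quadratic-time double sum is used for transparency rather than efficiency.

```python
def gini_mean_difference(y):
    """D = (n(n-1))^{-1} * sum over j != k of |y_j - y_k|."""
    n = len(y)
    if n < 2:
        raise ValueError("at least two observations are needed")
    # The terms with j == k are zero, so summing over all pairs is harmless.
    total = sum(abs(a - b) for a in y for b in y)
    return total / (n * (n - 1))


def gini_index(y):
    """Sample Gini coefficient D / (2 * sample mean) for uncensored data."""
    mean = sum(y) / len(y)
    if mean <= 0:
        raise ValueError("the sample mean must be strictly positive")
    return gini_mean_difference(y) / (2 * mean)


# Hypothetical outcomes in two treatment groups, for illustration only.
g_equal = gini_index([3, 3, 3, 3])      # no concentration
g_spread = gini_index([1, 2, 3, 4])
```

A sample in which all values coincide has index zero, and the index grows as the values become more unequal. With right-censored survival times this estimator is not appropriate as it stands, which is why the survival-function expression and the Kaplan-Meier estimator are used instead.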

The restricted version of $G$ is estimated by replacing $S_Y$ by its Kaplan-Meier estimator $\hat S_Y$.