Joint Multilevel Model for Analyzing Length of Stay through Competing Endpoints in Dengue Epidemiology

Dengue is a common mosquito-borne tropical disease caused by a virus. It is a life threatening disease since it sometimes leads to death within a short period of time. Multilevel modeling is a form of statistical modeling when data is at different levels. Due to dengue seriousness and risk being more similar for patients within districts than between districts, there is correlation between patients within districts. Thus district has to be taken as a cluster variable. A frequently encountered response in epidemiological studies is the length of Stay (LOS) of a patient, which measures the time until the event of interest occurs. Complexity arises with the different states/destinations of the time event and competing risk modelling is a more appropriate method for handling such states. The association of platelet count and length of stay of a dengue patient leads to the joint modelling approach for analyzing the dengue patients. Formulation criteria for the joint model with clustered data is to link these models through two sub models that is by using the multilevel multinomial logistic model for the LOS of dengue patients with different destinations and multilevel continuous model for the log platelet count. The linkage between two responses was derived by sharing a common random effect. Factors that have an effect on different destinations of LOS are, time indicators, year, age, classification, rainfall, temperature and humidity, while age, sex, classification, year place treated, rainfall, temp and humidity are associated factors for the log platelet count of dengue patients. Moreover, supremacy of joint modelling was proved by the AIC and BIC values over two separate univariate models.


Introduction
Dengue has become endemic in the World Health Organization (WHO) regions of Africa, the Americas, the Eastern Mediterranean, South-East Asia and Western Pacific in recent decades (WHO, 2016). Also dengue is a life threatening disease, since it can leads to death of a patient within a short period of time. Dengue Fever (DF) and Dengue Hemorrhagic Fever (DHF) has become a prominent cause of hospitalization and death. Lately, the occurrence of DF and DHF patients' have been rapidly spreading in Sri Lanka. In 2017, Colombo district showed rapid increment of dengue cases nearly with 34,000 in Sri Lanka (Disease Survelliance, 2018).
Length of stay is an intractive tool for assessing dengue patients as it is a result of patient characteristics, hospital characteristics and social characteristics (Sa, Dismuke, and Guimaares, 2007). Also, dengue patients are influenced by the characteristcs of the patients as well as their living surounding with the spatial variation of the rainfall, temperature, humidity and degree of urbanization. Therefore, patients/individuals are hierachchicaly structured within their living surrounding such as districts. This means that the patients within the same distrct are more alike than patients between districts. This leads to multilevel modeling. Clustering effects are arising due to the correlation between the same levels nested within the higher level in such situation. Hence, there are different random effects between levels as a result of this clustering. There is a lot of literature on selecting district as a clustering variable for dengue. Wickremasuriya and Sooriyarachchi (2014) have shown that the seriousness of dengue is more correlated within districts than between districts. Jayanetthi and Sooriyarachchi (2015) has indicated that dengue survival is more correlated within districts than between districts. Fernando and Sooriyarachchi(2018) have shown that dengue counts in Sri Lanka are related to the climate and so are correlated within districts as the weather is similar within districts.
Length of stay (LOS) of a patient is influenced by various endpoints/failure event like discharge, dead and transferred. This leads to competing risk modeling. It is used when an individual is under risk of multiple events/different destinations Haller, 2014). The shared parameter model is called a joint multilevel/hierachchical or random effect model, which is widely used for joining two dependent outcomes under a clustering nature. Here the random effect is introduced and shared with dependent outcomes to account for the association between these dependent outomes. On the other hand, complexity arises when one outomes is a competing risk event with hiearchchical nature. This is because it contains multiple events as well as two correlations that is correlation between responses for each cluster and correlarrtion between indididuals within the same cluster (Gueorguieva and Agresti, 2001).
The relation between LOS and platelet count of a dengue patient is motivatated by the study in past literature in order to evaluate the dengue patients (Pooransingh, Teeluksingh, and Dialsingh, 2016;Karunarathna and Sooriyarachchi, 2017). The joint association was built through the competing risk event response for the LOS of dengue patients and continuous response for platelet count after taking the logorithm of that variable. Although the failure events more often are in countinuous form , here, LOS of a dengue patient is recorded in dicrete form. As a result, the approach undertaken here is to analyze the LOS via dicrete time competing event model. The data was obtained from the Epidemilogical unit which had records of dengue patients reported in 2006-2008. In this study districts with high dengue cases (10 districts) were used to aid model convergence. The climate data was obtained from Meterological Department of Sri Lanka.
Logistic modeling for ordered categories and multinomial logistic modeling for nominal data are the most prominent method for modeling multilevel univaraite discrete competing risk models (Agresti, 1984;McCullagh, 1980;Prentice and Gloeckler, 1978;Gerhard, 1995). However joint modeling approaches for competing risk are well known with the continous time competing risk by considering the cause specific hazard or subdistribution hazard models. Research on joint modeling of longitudunal and continuous competing risk is a field of interest in recent decades for formulating the joint association of competing risk events by sharing the random effect among responses through two marginal sub models of mixed effect models for longitudinal measurement and cause specific hazard for the competing event ( (Steele, Diamond, & Wang, 1996). The novel method is formulated in here because there was no work found for the joint modeling of the discrete competing risk event by multinomial logit with any other response under the clustering nature in the literature.
The rest of the paper is organised as follows. The methodologocial aspects which is used to analyze this situation is discussed under the "Methodology" section. The next section discusses the results of the analysis of the dengue application. The last section gives some concluding remarks.

Methodology (in 12pt, bold;Below paragraph 6pt, Contents in 11pt)
The methodology to formulate the joint model to clustered data is carried out via two linked sub models of multilevel multinomial logistic model for the LOS of a dengue patient with a different destination and multilevel continuous model for the platelet count. This situation arising in different families of distributions such as multinomial distribution and continuous (normal) distribution. The linkage between the two dependent variables are modeled through the association between random effects.

Separate Sub Models
Assume that the durations are measured in discrete time intervals indexed by t (t=1….,T). For each discrete time interval t for the j th individual in cluster i is symbolized as Y1tij (r) that is the multinomial outcome which denotes whether an event r has occur during the interval t or not for the j th individual in i th cluster. The second response is symbolized as Y2tij (log of platelet count variable) which is a normally distributed outcome in the time interval t for the j th individual in the i th cluster.

Multilevel Multinomial Sub Model
The structural formulation of the model for the discrete competing risk event of Y1tij (r) is given by, (1) ; ℎ , = 1,2, … . , Let htij (r) be the hazard of an event of type r in interval t with corresponding vector containing the probabilities of falling into any particular category. Then, the log odds of experiencing an event of type r relative to an event of types s at time t for the j th individual in the i th cluster is given by, The equation (2) represents the probabilities of being in a particular category at time interval t in the random effects hazards model; (3) ; ℎ , = 1,2, … . , − 1 And

Multilevel Continuous Sub Model
Multilevel continuous sub model can be represented as, Here, 1 is the random intercept, 2 ′ is the fixed regression coefficient, is the random error for the individual which is normally distributed with mean 0 and variance 2 . Here 1 is a normally distributed random effect for the cluster with zero meanJoint Models For simplicity, assume that the random effects ( 0 ( ) ) are associated with the i th cluster for the event type r, is the same for each event type. Therefore equation (2) is rewritten as, ; ℎ , 0 ~(0, 2 ) A joint model is obtained by assuming a distribution for both sets of random effects 0 and 1 jointly. However, in here the authors are interested in considering the shared random intercept model with a scale parameter λ. Therefore, equation (5) has become as, Also, it assumes that, conditional on 0 , the outcome vectors 1 ( ) and 2 are independent, that is the association between two outcome vectors are assumed to be completely captured by the association between the random effects (Ivanova, Molenberghs, and Verbeke, 2015; Elashoff, Li, and Li, 2007).

Estimation and Inference
Assuming the independence of 1 ( ) and 2 , conditional on 0 , the corresponding likelihood function to the joint model is given by, Where, θ is the vector containing all parameters in the conditional distribution and normally distributed random effect 0 . where, ; where ∆ = censoring indicator and ℎ ( ) can be derived from equation (3) to be, Then, the likelihood function can be derived from equation (8) by substituting the values from equation (9) -(11). In this paper, standard numerical integration method of adaptive Gaussian quadrature, has been implemented in the SAS procedure Proc NLMIXED (Pinherio and Bates, 1995) due to the complexity of the integral part of the equation (8).

Analysis, Results and Discussion
The data is on dengue from 2006 -2008. For this period monthly climate data has been used. District is used as a clustering factor. Only data from 10 high incidence districts have been considered. The districts studied consists of Colombo, Galle, Gampaha, Kalutara, Kandy, Kegalle, Kurunegala, Matara, Puttlam and Ratnapura. In this study, discrete time approach is appropriate as the duration is recorded as whole days. Therefore, it can be assumed that duration reflects a discrete time measurement. Thus, initially, the time event; length of stay is categorized into three categories; 0-4 days, 4-6 days and 6-10 days which are known as febrile phase, critical phase and recovery phase respectively according to the clinical manifestation of dengue (Yacoub, et.al., 2014). Hence, each individual studied within 10 days and the recorded destination such as discharge, transfer or dead for each individual within the time period was used. If any individuals do not show any event up to 10 days, they are considered as censored observations.
As the next step, data need to be transformed to carry out discrete time competing risk modelling from the standard one-person, one-record data set (the person-data set) in to a one person, multiple period data set (the person-period data set) (Singer and Willett, 1993). That is indicator variables need to be introduced to the time variable and other categorical variables. Table 1 gives a list of variables used for this analysis. Also, reference category for the categorical variables is highlighted in the Table 1. Initial values for formulating the joint association between LOS of a dengue patients and log of platelet count were obtained by fitting the univariate models without considering the random effects in the model. The different destination such as discharge, transfer and dead, is a competing risk event and a nominal category and multinomial logistic regression model was fitted by using "PROC LOGISTIC" procedure under the "glogit" function in SAS. Three different models were formulated from the multinomial logit model as discharge vs censored, transfer vs censored and dead vs censored. Out of the 19 explanatory variables, only 8 variables were significant under the chi squared test at 5% level of significance. These were time indicators for the LOS, year, age, classification, rainfall, humid, humidlag1 and temp. The other response is log platelet and it is a continuous variable. Univariate model for the log platelet was fitted through "PROC Genmod" procedure in SAS. Eight variables which are year, month, age, sex, place treated, classification,, humid and temp were significant for log platelet response.
Then, the joint model was initiated through "PROC NLMIXED" procedure, with the significant variables from the univariate models. When formulating the joint model, the shared random intercept between responses with an inflation factor (λ) is used to account for the different scale at which these are measured (Ivanova, Molenberghs, and Verbeke, 2015) Table 2 presents parameter estimates of the joint model. This table shows that standard deviation of the random effect (district clustering effect (RI.sd)), residual (Res.Sd) and scale parameter (λ).
The equation (12)-(14) with equation (15), emphasizes that month, sex and place treated is only associated with the log of platelet count. Age is negatively associated with the discharge patients while positively associated with transfer patients. Temperature is related with all three destination of discharge, transfer and dead. The risk of death among the patients who were in febrile phase is 0.451 times lowerr than the patients who were in recovery phase, while 2.69 times higher in critical phase with respect to the recovery phase. For Discharge a patient at the febrile phase is 0.47 times lower than the recovery phase and discharging a patients at the critical phase is 0.418 lower than those at the recovery phase. The transferring patient at the febrile phase higher by 0.35 times lower compare to the recovery phase, while transferring patients who are in critical phase is higher by amount of 2.23 times with respect to the recovery phase. Temperature, rainfall and humidity were associated as geographical factors for the discharge patients. However rainfall and temperature were related to the death of dengue patients, temperature is only geographical factor among the transferred patients. Rainfall is highly associated to the dead patients when compared with discharged patients. In year 2007, the risk of death of a dengue patients are 8.8 times higher than year 2008.
Platelet count is related with October and December only among the months. Also, it indicates that male patients are having higher platelet count than the female patients. DHF1 and DHF2 patients show lesser platelet counts than the DF patients, while lowest values is having for the DHF2. So, it implies that the risk is higher for the patients who are classified as DHF2 for the dengue patients.
Another important result is that the random effect related to the district and the parameter λ are both highly significant, indicating that taking the district as a cluster variable is justified.

Checking for Supremacy of the Joint Model
It is important to compare the performance of the joint model with the two separate univariate models to express the necessity and performance of the joint model. Summation of the model fit statistics of the two univariate models was used to compare with the model fit statistics of the joint model to check the advantage of using the joint model over the univariate models. This is given in table3.

Conclusion
The main objective of this study was to analyze the dengue patients through the length of stay of dengue patients under the clustering nature. Since LOS of dengue patients is highly associated with the platelet count, joint model was formulated to analyze the dengue patients through the different destination of LOS and platelet count. Here platelet count was considered by converting to the logarithm to achieve normality. The proposed shared parameter model was fitted through two sub models that is multilevel discrete time competing risk model and multilevel normal model by using PROC NLMIXED procedure in SAS. Here multilevel discrete time competing risk model was fitted by the multinomial logistic model after converting the person-data set into the person-period data set. As the multinomial distribution was not supported by PROC NLMIXED, the likelihood of two univariate models were programmed to fit the joint model.
Factors that have an effect on different destination of LOS are, time indicators, year, age, classification, rainfall, temperature and humidity, while age, sex, classification, year place treated, rainfall, temp and humidity are associated factors for the log platelet count of a dengue patients. This study implies that the risk of death is higher for the patients who are at the critical phase rather than recovery phase. Also, there was a higher relation with the rainfall for the dead patients. Moreover, the patients who are from DHF2 are having higher risk.
The results of the univariate and joint model for checking the best model, proved that the joint model is better than two univariate models.