Factors Influencing Traffic Accidents in Jaffna

Traffic accidents have become a serious public health problem around the world and are among the leading causes of death and injury in Sri Lanka. Each year over 40,887 road accidents occur in Sri Lanka, causing on average six fatalities every day; several hundred people are left seriously injured, some with lifelong consequences. Jaffna district, the northernmost region of the island of Sri Lanka, is now experiencing rapid economic growth after the civil war. This has led to a sharp increase in the motorization rate, together with rapidly expanding road construction. As a result, the rate of traffic accidents in Jaffna has increased drastically in recent years, which motivated us to study the factors influencing the severity of accidents in Jaffna. Based on Jaffna police records, 692 accident cases from the period 2010-2013 were considered. The variable “Accident severity” (Fatal/Non-fatal) is treated as a dichotomous response variable, and Time, Location, Type of vehicle, Gender, License status, Cause of accident, and Type of accident are treated as factors influencing accident severity. The main focus of the study is to identify the most influential of these factors. Because the response variable is binary, we used the logistic regression approach for the analysis. After a series of statistical analyses, the independent variables “Type of vehicle” and “Age” were identified as the most influential for accident severity. The results suggest that the fitted logistic regression model can be used to support safety improvements against traffic accidents in Jaffna.
S. Renuraj, N. Varathan, and N. Satkunananthan. ISSN 2424-6271, IASSL


Introduction
Traffic accidents are a serious public health problem and one of the leading causes of death and injury around the globe, with an ever-rising trend. The magnitude of the problem of road traffic injuries in Sri Lanka has increased significantly in the last decade, and road accidents cause a significant number of deaths annually. Each year over 40,887 road accidents occur in Sri Lanka, causing on average six fatalities every day. Nearly 9000 car and van accidents occur annually, with over 5400 accidents caused by speeding. Over 740 pedestrians die every year, or two pedestrian deaths per day. Of the total number of road accidents that occur each year, over 2471 result in deaths. Several hundred people are left seriously injured, some with lifelong consequences. The rate of traffic accidents in the Jaffna peninsula is now increasing drastically. Contrary to popular belief, it is light vehicles that most frequently cause traffic accidents. The main goal of this study is to identify the factors that most influence accident severity in Jaffna. By analyzing the factors affecting road accidents, one can devise good strategies to address this problem.
Our study does not examine all factors, but only those believed to have a higher potential for serious injury or death: gender, accident time, location, age, vehicle type, accident type, license status, and accident cause. Other factors were not examined because of substantial limitations in the data obtained from accident reports. Logistic regression and related categorical-data regression methods have often been used to assess risk factors for various diseases, and logistic regression has also been used in transportation studies (Maiti and Bhattacharjee, 2001). In this study, because the response variable is binary, the logistic regression approach is used to identify the factors that significantly affect the severity of accidents.

Many scholars have investigated the factors influencing traffic accidents worldwide (for example, Baguley, 2001; Clarke et al., 2005). Recently, Baruah and Chaliha (2015) analyzed the incidence of alcohol consumption among the victims of road traffic incidents brought for autopsy to the Department of Forensic Medicine, GMCH, Guwahati. They found that males were almost exclusively involved in accidents following alcohol consumption, with the 20-29 age group most affected. The majority of the victims had little education and were pedestrians or riders of two-wheelers.

Singh et al. (2014) studied road traffic fatalities in adults of North West India, based on the autopsy records of unnatural deaths at a leading tertiary health care center of North West India. Adult road traffic fatalities constituted 41% of all unnatural deaths, with a male preponderance (89.6%) throughout the study period. People in the age group 21-30 years (32%), particularly from rural areas (57%), were most affected. Pedestrians and two-wheeler users formed the majority of fatalities (78%). Collision between a two-wheeler and a light motor vehicle was the most common crash pattern, and injury to the head and neck region was the most common cause of death. The maximum number of accidents occurred between 4 pm and 8 pm (28%) and in the month of November (11%). Unskilled workers, agricultural workers and government employees constituted a larger proportion of fatalities (45%).

Shruthi et al. (2013) conducted a retrospective observational study in the Department of Forensic Medicine and Toxicology, Kempegowda Institute of Medical Sciences, Bangalore, between January 2010 and December 2012. Out of 225 autopsied Road Traffic Accident (RTA) victims, 55.11% were between 21 and 30 years of age, males constituted 78.22% of the total, and four-wheeler vehicles were involved in 68.44% of RTAs. Most RTAs occurred during the daytime, between 6 AM and 12 PM. Head injuries constituted 30.22% of the total injuries, followed by injuries involving the abdomen, thorax and limbs. Haemorrhagic shock caused 63.11% of deaths, while head injury caused death in 30.22% of cases.

Singh et al. (2013) carried out a study on the elucidation of risk factors in survivors of road traffic accidents in North India. This study was conducted from 1 March 2012 to 30 May 2012 at the Trauma Centre of King George's Medical University, Lucknow, India, where survivors of road traffic accidents were interviewed using a pretested questionnaire after they had received pre-medical care. The study found that severe injuries are more likely to be due to over-speeding of vehicles and to not using helmets and seat belts. Another study in India, by Dileep Kumar et al. (2013), discusses death due to fatal road traffic accidents.

Dovom et al. (2012) used binary logistic regression analysis to analyze the severity of fatal pedestrian accidents in Iran. In their study, a univariate analysis was first performed to identify the potential risk factors that significantly influence the probability of pedestrian death at the scene. A multivariate model was then fitted by entering all the significant independent variables identified in the univariate analysis. Further, the logistic regression technique was used to estimate odds ratios with 95 percent confidence intervals as a determinant of which variables should be included. The study reveals that the probability of pedestrian fatality at the scene was significantly related to age, the vehicle involved and the place of the accident. The probability of death at the scene increased when a heavy vehicle struck a pedestrian, when the accident occurred on a rural road, or when the pedestrian's age was between 7 and 18.
Singh and Aggarwal (2010) analyzed fatal road traffic accidents among young children in Muzaffarnagar. Using descriptive statistical analysis, they found that fatal road accidents are a major cause of childhood mortality up to sixteen years of age, involving mainly males. Pedestrians and cyclists were the groups most commonly injured, and the majority of the accidents occurred during the winter season. Jabbar et al. (2009) carried out a cross-sectional descriptive study to explore the risk factors related to road traffic accidents in the context of Bangladesh. Their findings gave no concrete conclusion regarding the risk factors, but they stated that a large part of Road Traffic Accidents (RTA) can be prevented by specific preventive measures and by taking personal precautions.

Komba (2007) carried out a case study on risk factors and road traffic accidents in Tanzania. This study revealed the pattern and trends of motor traffic accidents in Kibaha district from 2001 to 2004. It shows that accident occurrence was increasing every year, that passengers and pedestrians are always at the highest risk of being injured or killed on the road, and that young males are highly prone to motor traffic accidents. Males are more involved in road accidents than females, and the risk of dying in an accident during the night was significantly higher than during the day, especially when it was raining. Further, age, sex, over-speeding, reckless driving, and being a pedestrian or a motorcyclist were identified as risk factors for motor vehicle crashes. The study also identified qualitatively (by interviews) the technical elements of highway construction, corruption, irresponsibility, poor management, driving while using a cell phone, driving without training, failure to respect and obey traffic regulations, bad condition of vehicles, age of the vehicles and poor condition of services as important risk factors associated with the cause of traffic accidents in Kibaha district.

Somasundaraswaran (2006) analyzed the accident statistics of Sri Lanka during 1989-2005. He found that the main reason for the rapid increase in traffic accidents is the alarming rate of growth in vehicle ownership, together with inadequate road network development to support the demand. Kumarage et al. (2000) studied the relationship among the various causes of accidents in Sri Lanka using the accident statistics of the year 1997. They found that speed-related accidents are the largest contributor to fatal accidents. Vehicle defects, driving on the wrong side and aggressive driving were also identified as significant causes of fatal accidents. Other local studies in this respect include Dharmaratne and Ameratunge (2004) and Jeepura and Pirasath (2012). Al-Ghamdi (2002) also used the logistic regression approach to examine the contribution of several variables to accident severity in Riyadh. That study reveals that location and cause of accident are the most significant factors associated with accident severity. Its findings also showed that logistic regression is a promising tool for providing meaningful interpretations that can be used for future safety improvements.

In this paper, by considering possible factors of accident severity, we identify the most influential factors using the logistic regression technique. Further, we test the interaction effect among the levels of the identified factors. Finally, we perform a goodness-of-fit test for the reduced model. The paper is organized as follows: basic theoretical results pertaining to logistic regression, the proportion test, likelihood estimation and the likelihood ratio test, and the goodness-of-fit test are briefly reviewed in Section 2. Section 3 includes the main results and the discussion of the analysis. Section 4 presents our conclusions.

Large-Sample Confidence Interval for a Population Proportion p
Let p be the probability of an event of interest. If x is the number of successes in n independent trials, then $\hat{p} = x/n$ is an unbiased estimate of p. Usually p is unknown, and a (1-α)100% confidence interval is calculated from a random sample. A (1-α)100% large-sample confidence interval for a population proportion p is given by
\[ \hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \]
where $Z_{\alpha/2}$ is the upper α/2 percentage point of the standard normal distribution, that is, $P(Z > Z_{\alpha/2}) = \alpha/2$.
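As a minimal illustration, a large-sample interval of the form $\hat{p} \pm Z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ can be computed as in the following Python sketch. The function name and the counts used here are assumptions made for illustration only and are not taken from the Jaffna data.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(x, n, alpha=0.05):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = x / n                                   # point estimate x/n
    z = NormalDist().inv_cdf(1 - alpha / 2)         # upper alpha/2 point of N(0, 1)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Illustrative counts only (hypothetical, not from the study)
low, high = proportion_confidence_interval(x=30, n=200)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

The approximation relies on n being large; for small samples or proportions close to 0 or 1, other intervals may be preferable.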

Theoretical background of logistic regression
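The logistic regression model is the most popular model for binary data. It is generally used to study the relationship between a binary response variable and a group of predictors, which can be either continuous or categorical. The response takes the value 1 or 0. Consider a binary response variable Y = 0 or 1 and a single predictor variable x, and write $\pi(x) = P(Y = 1 \mid x)$. The logistic regression model is
\[ \ln\left[\frac{\pi(x)}{1-\pi(x)}\right] = \beta_0 + \beta_1 x, \]
where $\beta_0$ and $\beta_1$ are the regression coefficients.
For multiple predictor variables $x_1, x_2, \ldots, x_p$, the logistic regression model can be written as
\[ \ln\left[\frac{\pi(\mathbf{x})}{1-\pi(\mathbf{x})}\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p. \]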
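The link between the linear predictor and the probability P(Y = 1 | x) can be sketched in Python as follows. The coefficient and predictor values are hypothetical and serve only to show that the logit of the fitted probability returns the linear predictor.

```python
import math


def logit(p):
    """Logistic transform: log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def logistic_probability(intercept, coefficients, predictors):
    """P(Y = 1 | x) for a logistic model with the given coefficients."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1 / (1 + math.exp(-eta))


# Hypothetical values: linear predictor = -1.0 + 0.5*2.0 + 0.25*4.0 = 1.0
p = logistic_probability(-1.0, [0.5, 0.25], [2.0, 4.0])
print(round(p, 4), round(logit(p), 4))
```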

The likelihood ratio test
The likelihood ratio test assesses the significance of the difference between the -2 log-likelihood of a reduced model and that of the researcher's model. When the reduced model contains only the constant, a finding of significance (p <= 0.05 is the usual cutoff) leads to rejection of the null hypothesis that all of the predictor effects are zero. When this likelihood ratio test is significant, at least one of the predictors is significantly related to the dependent variable.
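In detail: if Y is coded as zero or one (a binary variable), the expression $\pi(x)$ provides the conditional probability $P(Y = 1 \mid x)$. A formal way to express the contribution to the likelihood function of the pair $(x_i, y_i)$ is through the term
\[ \varphi(x_i) = \pi(x_i)^{y_i}\,[1-\pi(x_i)]^{1-y_i}. \]
Since the observations are assumed to be independent, the product of these terms gives the likelihood function
\[ l(\beta) = \prod_{i=1}^{n} \varphi(x_i). \]
So the log-likelihood is
\[ L(\beta) = \ln[l(\beta)] = \sum_{i=1}^{n} \left\{ y_i \ln[\pi(x_i)] + (1-y_i)\ln[1-\pi(x_i)] \right\}. \qquad (1) \]
Differentiating $L(\beta)$ with respect to β and setting the resulting expressions equal to zero gives the likelihood equations; for a single predictor these are
\[ \sum_{i=1}^{n} [y_i - \pi(x_i)] = 0, \qquad \sum_{i=1}^{n} x_i\,[y_i - \pi(x_i)] = 0, \]
and their solution is the maximum likelihood estimate $\hat{\beta}$.
The deviance is defined as
\[ D = -2 \ln\left[\frac{\text{likelihood of the fitted model}}{\text{likelihood of the saturated model}}\right]. \qquad (2) \]
For a binary response the likelihood of the saturated model equals one, so D = -2 ln(likelihood of the fitted model).
From equations (1) and (2), the goodness-of-fit statistic for assessing a variable is
G = D(model without the variable) - D(model with the variable) = -2 ln(likelihood of the model without the variable) - [-2 ln(likelihood of the model with the variable)].
Under the null hypothesis that the coefficient of the variable is zero, $G \sim \chi^2(1)$. More generally, the degrees of freedom equal the number of parameters dropped from the model.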
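The statistic G and its chi-square p-value can be illustrated with a short Python sketch that uses only the standard library. The log-likelihood values and the degrees of freedom below are hypothetical, and the helper `chi2_sf` is an assumption of this sketch (a closed-form upper-tail probability valid for positive integer degrees of freedom), not something taken from the study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1, x >= 0."""
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)               # Q(1, t) = exp(-t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))    # Q(1/2, t) = erfc(sqrt(t))
    # Recurrence Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1)
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_without, loglik_with, df=1):
    """G = D(without) - D(with), where D = -2 * log-likelihood, and its p-value."""
    g = -2.0 * loglik_without - (-2.0 * loglik_with)
    return g, chi2_sf(g, df)


# Hypothetical log-likelihoods of two nested models differing by 3 parameters
g, p_value = likelihood_ratio_test(-120.0, -115.5, df=3)
print(f"G = {g:.3f}, p-value = {p_value:.4f}")
```

A small p-value would suggest that the dropped terms contribute to the fit; the sketch assumes the two models are nested, so that G is non-negative.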

Hosmer-Lemeshow Test
The Hosmer and Lemeshow (H-L) goodness-of-fit test divides subjects into deciles based on predicted probabilities and then computes a chi-square statistic from the observed and expected frequencies. A p-value is then computed from the chi-square distribution to test the fit of the logistic model. If the p-value of the H-L goodness-of-fit test is greater than 0.05, as we want for well-fitting models, we fail to reject the null hypothesis that there is no difference between observed and model-predicted values, implying that the model's estimates fit the data at an acceptable level.
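The grouping idea behind the H-L test can be sketched as follows. This is a simplified illustration under stated assumptions: groups are formed by equal-sized slices of the sorted predicted probabilities, the statistic is referred to a chi-square distribution with (groups - 2) degrees of freedom, and the helper `chi2_sf_even_df` only handles even degrees of freedom (such as the 8 obtained from deciles). Function names and the toy data are hypothetical.

```python
import math


def hosmer_lemeshow(y, p, groups=10):
    """H-L statistic from binary outcomes y and predicted probabilities p."""
    pairs = sorted(zip(p, y))                  # order cases by predicted probability
    n = len(pairs)
    stat = 0.0
    for k in range(groups):
        chunk = pairs[k * n // groups:(k + 1) * n // groups]
        n_k = len(chunk)
        if n_k == 0:
            continue
        observed = sum(yi for _, yi in chunk)  # observed number of events
        expected = sum(pi for pi, _ in chunk)  # expected number of events
        mean_p = expected / n_k
        if mean_p <= 0.0 or mean_p >= 1.0:
            continue                           # group carries no information
        stat += (observed - expected) ** 2 / (n_k * mean_p * (1 - mean_p))
    return stat, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid only for even df."""
    t = x / 2.0
    return math.exp(-t) * sum(t ** k / math.factorial(k) for k in range(df // 2))


# Toy data (hypothetical) with two groups, for illustration only
p = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
stat_good, _ = hosmer_lemeshow([0, 0, 0, 1, 1, 1, 1, 0], p, groups=2)
stat_poor, _ = hosmer_lemeshow([0, 0, 1, 1, 1, 1, 1, 0], p, groups=2)
print(stat_good, stat_poor)
```

With ten groups, the p-value would be obtained as `chi2_sf_even_df(stat, 8)`, and a value above 0.05 would be read as no evidence of lack of fit.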

Data Description
The data were collected from the records of the Jaffna police station in Sri Lanka and consist of 692 accident records. The collected data represent accidents that occurred in the Jaffna peninsula during the period 2009-2013. Each record is categorized according to whether or not a death occurred. Records in which the vehicle that caused the accident is unknown were omitted, as were records lacking the fields required for our study. In this study we consider nine variables, which are shown in Table 1.
The nature of the accident is taken as the response variable; the others are independent (explanatory) variables. The response is a binary variable with two levels: 0 represents accidents that resulted in no fatality but at least one injury, and 1 represents accidents that resulted in at least one fatality.

Preliminary Study
To get an idea of the factors influencing accident severity, we begin by representing the data graphically. Since the factors influencing severity are mostly categorical, we use bar charts to represent the data. It is not easy to draw any conclusion from Figure 1 alone.

Reduction of levels of factors
For ease of interpretation, it is better to have as few levels per factor as possible. Using the one-sample proportion test, we therefore reduce the levels of the factors influencing the severity of accidents. The summary statistics of the proportion test are listed in Table 2. Based on the results of Table 2, the level "Bend" of the Location factor and the levels "Drink driving" and "Other" of the Reason factor are non-significant in the proportion test. Therefore, we remove those levels and continue our analysis with the remaining ones.
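A one-sample proportion test of the general kind mentioned above can be sketched in Python as below. The text does not state the null value used for each level, so the hypothesized proportion `p0`, the counts, and the function name are all assumptions made purely for illustration.

```python
import math


def one_sample_proportion_test(x, n, p0):
    """Large-sample z test of H0: p = p0 against a two-sided alternative."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts and null value, not taken from Table 2
z, p_value = one_sample_proportion_test(x=12, n=100, p0=0.10)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

A p-value above the chosen cutoff (for example 0.05) would correspond to the "non-significant" status used to drop a level.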

Defining Design Variables
In this study all the factors are categorical except Age. After the reduction of levels, we introduce design (dummy) variables for every categorical factor. For example, the factor Accident cause has four levels, so we introduce three design variables, as shown in Table 3.
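The coding of a four-level factor into three design variables can be illustrated with the following Python sketch. It assumes reference-cell coding with the first level as the reference; the level names are hypothetical placeholders, since the actual coding is the one given in Table 3.

```python
def design_variables(value, levels):
    """Reference-cell coding: the first level is the reference (all zeros)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    # One 0/1 indicator for every level except the reference level
    return [1 if value == level else 0 for level in levels[1:]]


# Hypothetical level names for a four-level factor
cause_levels = ["cause_A", "cause_B", "cause_C", "cause_D"]
for level in cause_levels:
    print(level, design_variables(level, cause_levels))
```

A factor with k levels therefore contributes k - 1 columns to the model, and each coefficient is interpreted relative to the reference level.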

Model Fitting
Since the response variable is dichotomous (fatal/non-fatal), we apply the logistic regression model to these data and estimate its parameters by the maximum likelihood procedure. We first fit the model with all eight explanatory factors and the response variable Nature of accident; this model is called the full model. Based on the summary statistics in Table 4, the factor Age (P-value < 0.05) and the design variable V3FW1 (P-value < 0.05) are statistically significant. Since the factor Vehicle type has four levels with three associated design variables (including V3FW1), the significance of a single design variable does not allow us to conclude that the factor Vehicle type as a whole is significant. Therefore, we perform the likelihood ratio test, which allows us to select the influential factors by considering the change in deviance between the full model and a reduced model (with a particular factor dropped from the full model).
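Maximum likelihood estimation for a logistic model is usually carried out iteratively. The following Python sketch shows one common scheme, Newton-Raphson, for a model with an intercept and a single predictor. It is an illustration under stated assumptions rather than the procedure used in the study: the data are hypothetical, and the function name and the fixed iteration count are choices of this sketch.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson maximum likelihood fit of logit(pi) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        pi = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector (first derivatives of the log-likelihood)
        g0 = sum(yi - p for yi, p in zip(y, pi))
        g1 = sum(xi * (yi - p) for xi, yi, p in zip(x, y, pi))
        # Information matrix (negative second derivatives)
        w = [p * (1 - p) for p in pi]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + inverse(information) * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: a continuous predictor and a binary outcome
x = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

At convergence the score vector is approximately zero. The sketch assumes the data are not perfectly separable; otherwise the estimates do not exist and the iteration diverges.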

Checking significance of the factor
In this section we test the significance of the factors by using the likelihood ratio test.
To perform this test we consider only the main factors, without any interaction terms. The corresponding summary statistics are listed in Table 5.

Reduced logistic regression model
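After identifying the most influential factors, Age and Vehicle type, we fit the reduced logistic regression model with these factors. The reduced model is of the form

\[ P(Y=1) = \frac{\exp\left(\beta_0 + \beta_1\,\text{Age} + \sum_{j=1}^{3} \gamma_j V_j\right)}{1 + \exp\left(\beta_0 + \beta_1\,\text{Age} + \sum_{j=1}^{3} \gamma_j V_j\right)}, \]
where $V_1, V_2, V_3$ denote the three design variables of Vehicle type and the coefficients are displayed in Table 6. Moreover, the overall significance of the reduced model (P-value = 0.013 < 0.05) is confirmed by the likelihood ratio test.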
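Coefficients of a logistic model are often reported as odds ratios, obtained by exponentiating the coefficient. The Python sketch below shows this conversion together with a large-sample confidence interval. The coefficient and standard error are hypothetical values chosen for illustration; they are not the estimates reported in Table 6.

```python
import math
from statistics import NormalDist


def odds_ratio_with_ci(coefficient, standard_error, alpha=0.05):
    """Odds ratio exp(b) with a (1 - alpha) large-sample confidence interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (
        math.exp(coefficient),
        math.exp(coefficient - z * standard_error),
        math.exp(coefficient + z * standard_error),
    )


# Hypothetical coefficient and standard error of one design variable
odds_ratio, low, high = odds_ratio_with_ci(0.7, 0.3)
print(f"OR = {odds_ratio:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

For a design variable, the odds ratio compares the odds of a fatal outcome at that level with the odds at the reference level, holding the other variables fixed.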

Checking interaction effect
Based on the above results, the factors Age and Vehicle type were identified as the most influential factors. In this section we check whether there is any interaction effect among the levels of the identified factors by adding interaction terms to the reduced model. The corresponding P-value (0.548) indicates that there is no significant interaction effect between Age and the levels of the factor Vehicle type.

Goodness of fit Test
A goodness-of-fit test is the most common way to check whether the fitted model is appropriate. After obtaining the final model, we check its goodness of fit by applying the Hosmer-Lemeshow test. Since the p-value = 0.919 > 0.05, there is no evidence of lack of fit, and the identified model fits the data well.

Conclusion
In this paper we have performed a statistical analysis to identify the most influential factors for accident severity in Jaffna. Through the analysis we have identified the factors "Vehicle type" and "Age" as the most influential for accident severity. We have also fitted a logistic regression model using the identified factors, with the parameters of the reduced model estimated by the maximum likelihood technique. Further, it was observed that there is no interaction between the identified influential factors "Vehicle type" and "Age". Moreover, the overall significance of the fitted reduced model was tested using the likelihood ratio test. Finally, we have described the impact of the factor "Vehicle type" by means of odds ratios. In future, these findings can be used to devise good strategies to reduce the severity of accidents.
Some recommendations to reduce road traffic accidents:
• Educate young drivers on traffic safety, rules and regulations.
• Introduce special driving programs for school leavers.
• Renew driving licenses frequently (once a year), and recheck knowledge of rules and regulations, health, and driving skills at each renewal.
• Check the fitness of vehicles annually.
One wants to model $E(Y \mid x) = P(Y = 1 \mid x)$ as a function of x. The logistic regression model expresses the logistic transform of $P(Y = 1 \mid x)$ as a linear function of the predictor. This model can be re-written as
\[ P(Y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}, \]
where $\beta_0$ and $\beta_1$ are the regression coefficients.
Two models enter the likelihood ratio test:
(I) The "Intercept only" model (null model), which reflects the net effect of all variables not in the model plus error.
(II) The "Final" model (fitted model), which is the researcher's model comprising the predictor variables.
When the reduced model is the baseline model with the constant only (initial model), the likelihood ratio test tests the significance of the researcher's model as a whole. A well-fitting model is significant at the .05 level or better, meaning that the researcher's model is significantly different from the one with the constant only.

Figure 1: Comparison of several factors with number of accidents

Then we check the deviance value by dropping the explanatory factors one by one from the full model. By testing the p-values of those deviance differences, we identify the most influential factors. Based on the identified factors, the final reduced model is obtained.

After the reduction of the levels of factors we have 22 levels of factors and 15 design variables, including Age. We then fit the logistic regression model by using the maximum likelihood estimation technique. The general model for the logistic regression is
\[ P(Y=1) = \frac{\exp\left(\beta_0 + \sum_{i=1}^{15} \beta_i x_i\right)}{1 + \exp\left(\beta_0 + \sum_{i=1}^{15} \beta_i x_i\right)}, \]
where $x_i$ is the i-th design variable, $\beta_i$ (i = 1, 2, ..., 15) is the effect of the i-th design variable, and $\beta_0$ is the constant.

Table 2: Summary statistics for the proportion test (* indicates non-significance)

Table 3: Defining design variables

Table 4: Summary statistics of design variables

Table 5: Summary statistics for testing factors

From Table 5, the factors Time, Location, License status, Gender, Accident cause and Accident type are identified as non-significant (P-value > 0.05). Therefore we remove those factors from the model, and Age and Vehicle type are selected as the most influential factors for accident severity.

Table 6: Summary statistics of coefficients of the reduced model