The Statistical Distribution of Annual Maximum Rainfall in Colombo District

The modeling of extreme rainfall events is a fundamental part of flood hazard estimation. Establishing a probability distribution to represent precipitation depth at various durations has long been a topic of interest in hydrology, meteorology and related fields. Daily rainfall data for 110 years (1900-2009) were collected from the meteorological station in Colombo, Sri Lanka. The data were analyzed to identify the maximum rainfall received on any one day (24-hour duration) in each monsoon season (4 seasons) and in each year (365-day period). The objective of this paper is to identify the best-fitting probability distribution of annual maximum rainfall in Colombo district for each period of study. Distribution parameters were estimated by the maximum likelihood method. Three statistical goodness-of-fit tests were carried out to find the best-fitting probability distribution among 45 candidate distributions, separately for the annual maximum rainfall and for the maximum rainfall of each of the 4 seasons. After the three best-fitting distributions were found from the respective tests, their parameters were used to generate random numbers, which serve as estimated values of the maximum daily rainfall for each period of study. The best-fit probability distribution was then identified based on the minimum absolute deviation between actual and estimated values. Based on this distribution, rainfall magnitudes for different return periods were calculated. The log-Pearson 3 and Burr (4P) distributions were found to be the best-fit probability models for the annual and first inter-monsoon season periods, respectively. The generalized extreme value distribution was the best fit for the remaining monsoon seasons. Further, the fitted distribution indicates that an annual maximum daily rainfall of 216 mm or more has a return period of ten years. Similarly, the estimated return levels are listed against the return periods for extreme rainfall events during the four seasons of the year.
T. Mayooran and A. Laheetharan 108 ISBN-1391-4987 IASSL


Introduction
With the advent of global warming, concerns about extreme weather events have increased. As elsewhere across the globe, South Asian countries have observed an increase in the occurrence of extreme climate events in recent decades. Many researchers have found evidence of increasing extreme weather events such as heat waves, cold waves, floods, droughts and severe cyclones over the past few decades. Extreme rainfall events can have severe impacts on people's lives. An investigation of extreme rainfalls serves several purposes, such as the estimation of extreme rainfalls for design purposes, the assessment of the rarity of observed rainfalls, and the comparison of methods to estimate design rainfalls. This study focuses on the last purpose, and a detailed regionalized study is of practical use to planners and other users.
Moreover, information on the spatial and temporal variability of extreme rainfall events is very useful for the design and construction of projects such as dams and urban drainage systems, the management of water resources, and the prevention of flood damage, since these require adequate knowledge of extreme events with high return periods. In most cases, the return periods of interest exceed the periods of available records and cannot be extracted directly from the recorded data. Therefore, in current engineering practice, the estimation of extreme rainfalls or flood peak discharges is based on statistical frequency analysis of maximum precipitation or maximum stream flow records, where the available sample data are used to calculate the parameters of a selected frequency distribution. The fitted distribution is then used to estimate event magnitudes corresponding to return periods greater than or less than those of the recorded events. Accurate estimation of extreme rainfall could help alleviate the damage caused by storms and floods and help achieve more efficient design of hydraulic structures. Several probability models have been developed to describe the distribution of annual extreme rainfalls at a single site. However, the choice of a suitable model is still one of the major problems in engineering practice, since there is no general agreement as to which distribution, or distributions, should be used for the frequency analysis of extreme rainfalls. The selection of an appropriate model depends mainly on such evaluations, which can yield conclusions very different from those of previous research on this subject. Applications of probability distributions to rainfall data have been investigated by several researchers from different regions of the world. Hirose (1994) found that the Weibull distribution is the best fit for the annual maximum of daily rainfall in Japan.
Nadarajah and Withers (2001) and Nadarajah (2005) applied extreme value distributions to rainfall data from sixteen locations spread throughout New Zealand and fourteen locations in West Central Florida, respectively. Further, Nadarajah and Choi (2007) studied annual maxima of daily rainfall for the years 1961-2001 at five locations in South Korea, fitting the generalized extreme value distribution to the data from each location to describe the extremes of rainfall and to predict their future behaviour. They suggested that the Gumbel distribution provides the most reasonable model for four of the five locations considered. Chu et al. (2009) applied the generalized extreme value distribution to extreme rainfall events in the Hawaiian Islands using three different methods, based on the mean annual number of days on which the 24-h accumulation exceeds a given daily rainfall amount, the value associated with a specific daily rainfall percentile, and the annual maximum daily rainfall values associated with a specific return period. For estimating the statistics of return periods, the three-parameter generalized extreme value distribution was fitted using the method of L-moments, and the spatial patterns of heavy and very heavy rainfall events across the islands were mapped separately, based on the above three methods, for annual maximum daily rainfall data. Hanson and Vogel (2008) studied the probability distribution of daily rainfall in the United States to represent the precipitation depth at various durations, which has long been a focus of interest in hydrology. Sharma and Singh (2010) analyzed the daily maximum rainfall data of Pantnagar, India, for a period of 37 years at annual, seasonal, monthly and weekly scales, and identified the best fitted probability distribution among 16 compared distributions using the least square method.
Deka and Borah (2009) derived the best fitted distribution to describe the annual series of maximum rainfall data for the period 1966 to 2007 at nine distantly located stations in north-east India, considering only five extreme value distributions. In Sri Lanka, Baheerathan and Shaw (1978) analysed 8 to 24 years of data from different stations by fitting the Gumbel distribution. Dharmasena and Premasiri (1990) used 25 years of data from five regions in Sri Lanka and fitted Gumbel distributions. Varathan et al. (2010) used 110 years of data from Colombo district to analyze the annual maxima of rainfall, and found that the Gumbel distribution is the best fitting distribution. A regionalized study on the probability modeling of extreme rainfall is therefore essential, as the probability model may vary with the geographical location of the area considered. In this study an attempt has been made to study the annual maximum daily rainfall data in Colombo, Sri Lanka, and the findings, along with the methodology adopted, are presented. Further, the best fitted probability distribution model is determined by considering a broader set of forty-five probability models. When selecting a probability distribution model to describe maximum rainfall data, a number of goodness-of-fit tests are available. Rather than selecting one of these tests, the present paper looks at how three tests can be combined to make the final selection.
The paper is organized as follows. Section 2 describes the data. The methodology for fitting the forty-five probability models and the procedure for identifying the best fitted model are described in section 3. Finally, the results for the best fitted probability distribution models and their implications are discussed in section 4.

Study Area
Sri Lanka is an island situated in the Indian Ocean, north of the equator and off the southern tip of India. It lies between latitudes 5° 55′ and 9° 50′ N and longitudes 79° 42′ and 81° 53′ E. The surface area of the island is 65,635 sq. km and its greatest length from north to south is 430 km. Sri Lanka consists of a central highland sloping on all sides from the Piduruthalagala peak (2528 meters) to the sea. Based on altitude, the island is divided into Low, Mid and Up country. The Low country is demarcated as land below 300 meters elevation, the Mid country as land between 300 meters and 1000 meters elevation, and the Up country as land at 1000 meters and above. The climate of Sri Lanka is strongly affected by the topographical features of the country, such as ridges, peaks, plateaus, basins, valleys and escarpments, and it is classified as tropical monsoon, a wet and dry climate with only a brief dry season, according to Koppen's classification of climates (Boucher, 1975). The climate experienced during the 12-month period in Sri Lanka can be divided into 4 climatic seasons: the First Inter-monsoon season (March-April), the Southwest monsoon season (May-September), the Second Inter-monsoon season (October-November) and the Northeast monsoon season (December-February). The data consist of daily rainfall for the years 1900 to 2009 for the Colombo meteorological station. Fortunately, there are no missing values during the period. The data were obtained from the Department of Meteorology, Colombo, Sri Lanka, which lists the daily rainfalls in millimeters. The extreme values were selected from the tabulated daily data. Heavy rainfall and the associated floods and landslides affect many areas. In particular, Colombo, the capital of the country, faces serious flooding problems in low-lying areas due to extreme rainfalls. Due to these extreme rainfall events, water levels rise, which in turn affects the coastal economy and leaves more land covered by flood water. The quality of the water is also degraded for these reasons.

Methodology
The present study is based on time series data of maximum daily rainfall, annually and seasonally. The randomness of the data set was checked by using autocorrelation plots. The general advantage of the regional approach is that more data are available for the probability modeling, and parameter estimates become more reliable and spatially more coherent. In the frequency analysis problem, standard statistical techniques are used to model the extreme rainfall events from daily rainfall data. Several standard probability distributions are then used to identify the best fitted model.
Annual one-day extreme rainfall is usually defined as the maximum daily rainfall within each year, so there are as many extreme values as the total number of years. The annual rainfall of this area is over 2000 mm and is subject to large variation. The data were processed to identify the maximum rainfall received on any one day (24-hour duration) in a monsoon season and in a year (365-day period).
If $X_1, X_2, \ldots, X_{365}$ are the daily rainfall values of a particular year, then the selected data point (extreme value) is $\max\{X_1, X_2, \ldots, X_{365}\}$, where $X_i$ is the daily rainfall in mm on day $i$, for $i = 1, 2, \ldots, 365$. The best fitted probability distribution was evaluated by using the following systematic procedures.
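The selection of one extreme value per period can be sketched in a few lines of Python. This is only an illustration: the record layout `(year, month, rain_mm)`, the function names and the toy values are assumptions, not the Colombo data or the authors' procedure. The season boundaries follow the four climatic seasons listed in the Study Area section, and each month is grouped under its own calendar year, which is a simplifying assumption for December.

```python
# Illustrative sketch: block maxima from daily rainfall records.
SEASONS = {
    "first inter-monsoon": (3, 4),
    "southwest monsoon": (5, 6, 7, 8, 9),
    "second inter-monsoon": (10, 11),
    "northeast monsoon": (12, 1, 2),
}
SEASON_OF_MONTH = {m: name for name, months in SEASONS.items() for m in months}


def block_maxima(records, block_of):
    """Return {block: maximum daily rainfall in mm} for (year, month, rain_mm) records."""
    maxima = {}
    for year, month, rain_mm in records:
        block = block_of(year, month)
        # Keep the largest daily value seen so far in this block.
        maxima[block] = max(rain_mm, maxima.get(block, rain_mm))
    return maxima


def annual_block(year, month):
    return year


def seasonal_block(year, month):
    # Assumption: December is grouped with its own calendar year,
    # not with the following January-February.
    return (year, SEASON_OF_MONTH[month])


# Hypothetical toy records (year, month, rainfall in mm).
records = [
    (1900, 1, 12.0), (1900, 4, 85.5), (1900, 6, 140.0), (1900, 10, 97.25),
    (1901, 5, 60.0), (1901, 11, 132.5),
]
annual_max = block_maxima(records, annual_block)
seasonal_max = block_maxima(records, seasonal_block)
```

With 110 years of records, `annual_max` would hold 110 values and `seasonal_max` one value per year and season, giving the five data sets analyzed in the paper.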

Checking data homogeneity based on autocorrelation function (ACF)
Autocorrelation plots (Box and Jenkins, pp. 28-32) are a commonly used tool for checking randomness in a data set. Randomness is ascertained by computing autocorrelations for data values at varying time lags. If the data are random, such autocorrelations should be near zero for all time-lag separations. If they are non-random, one or more of the autocorrelations will be significantly non-zero. Autocorrelation plots show the autocorrelation coefficient $R_h$ on the vertical axis against the time lag $h$ ($h = 1, 2, \ldots$) on the horizontal axis, where
$$R_h = \frac{C_h}{C_0},$$
$C_h$ is the autocovariance function
$$C_h = \frac{1}{n}\sum_{t=1}^{n-h}(Y_t - \bar{Y})(Y_{t+h} - \bar{Y}),$$
and $C_0$ is the variance function
$$C_0 = \frac{1}{n}\sum_{t=1}^{n}(Y_t - \bar{Y})^2.$$
Here $Y_t$ denotes the observed values and $\bar{Y}$ their mean. If the autocorrelation plot is being used to test for randomness (i.e., there is no time dependence in the data), the following formula is recommended for the confidence bands:
$$\pm \frac{z_{1-\alpha/2}}{\sqrt{n}},$$
where $n$ is the sample size, $z$ is the percent point function of the standard normal distribution and $\alpha$ is the significance level. In this case, the confidence bands have a fixed width that depends on the sample size. This formula is used to generate the confidence bands in the ACF plot.
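As a minimal sketch of this randomness check, the following Python code computes the lag-h autocorrelation coefficients as the ratio of the autocovariance to the variance function, and the fixed-width confidence band from the standard normal percent point function divided by the square root of the sample size. The function names and the default of ten lags are illustrative assumptions; only the Python standard library is used.

```python
from statistics import NormalDist, fmean


def autocorrelation(series, max_lag):
    """Return [R_1, ..., R_max_lag], with R_h = C_h / C_0."""
    n = len(series)
    mean = fmean(series)
    dev = [y - mean for y in series]
    c0 = sum(d * d for d in dev) / n  # variance function C_0
    coeffs = []
    for h in range(1, max_lag + 1):
        ch = sum(dev[t] * dev[t + h] for t in range(n - h)) / n  # autocovariance C_h
        coeffs.append(ch / c0)
    return coeffs


def confidence_band(n, alpha=0.05):
    """Half-width of the band: z_{1 - alpha/2} / sqrt(n)."""
    return NormalDist().inv_cdf(1 - alpha / 2) / n ** 0.5


def looks_random(series, max_lag=10, alpha=0.05):
    """True if every autocorrelation up to max_lag lies inside the band."""
    band = confidence_band(len(series), alpha)
    return all(abs(r) <= band for r in autocorrelation(series, max_lag))
```

For the 110 annual maxima the band at the 5% level would be roughly ±0.19, so a series of annual maxima whose autocorrelations all stay inside that band shows no evidence of time dependence.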

Fitting the probability distributions
The generalized extreme value, gamma, Gumbel max, inverse Gaussian, log-normal, normal, Pearson and Weibull distributions were used as probability models for evaluating the best fitted probability distribution for rainfall. In addition, different forms of these distributions and various other standard statistical distributions were also employed, and the following 45 probability distributions were applied: beta, Burr (

Testing the goodness of fit and identifying the best fitted probability distribution
The goodness-of-fit test measures the compatibility of a random sample with a theoretical probability distribution function. The Anderson-Darling test (Stephens, 1974) is used to test whether a sample of data came from a population with a specific distribution. It is a modification of the Kolmogorov-Smirnov (K-S) test and gives more weight to the tails than the K-S test does. The K-S test is distribution free in the sense that the critical values do not depend on the specific distribution being tested. The Anderson-Darling test makes use of the specific distribution in calculating critical values. This has the advantage of allowing a more sensitive test and the disadvantage that critical values must be calculated for each distribution. The Anderson-Darling test is an alternative to the chi-square and Kolmogorov-Smirnov goodness-of-fit tests.
The goodness-of-fit tests are used to test the following hypotheses. $H_0$: the maximum daily rainfall data follow the specified distribution. $H_A$: the maximum daily rainfall data do not follow the specified distribution.
The Kolmogorov-Smirnov and Anderson-Darling tests are used along with the chi-square test at the 0.01 level of significance to select the best fitted probability distribution model. The candidate probability distributions were ranked from 1 to 45 based on the minimum test statistic value. The probability distribution with the first rank was selected for each of the three tests independently. The procedure for obtaining the best fitted probability distribution model from the three identified distributions is explained in the next section.
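The ranking step can be illustrated with a short Python sketch that computes the Kolmogorov-Smirnov and Anderson-Darling statistics for a sample against each candidate CDF and orders the candidates by the smallest statistic. It is a sketch under stated assumptions: the two Gumbel candidates, their parameters and the synthetic sample are hypothetical, the chi-square test and the critical values are omitted, and the software actually used by the authors is not stated in the text.

```python
import math


def gumbel_cdf(x, mu, beta):
    return math.exp(-math.exp(-(x - mu) / beta))


def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF and the candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def ad_statistic(sample, cdf):
    """Anderson-Darling A^2, which weights the tails more heavily than K-S."""
    xs = sorted(sample)
    n = len(xs)
    s = sum(
        (2 * i + 1) * (math.log(cdf(xs[i])) + math.log(1.0 - cdf(xs[n - 1 - i])))
        for i in range(n)
    )
    return -n - s / n


def rank_by(statistic, sample, candidates):
    """Candidate names ordered from smallest to largest test statistic."""
    scores = {name: statistic(sample, cdf) for name, cdf in candidates.items()}
    return sorted(scores, key=scores.get)


# Hypothetical sample: evenly spaced quantiles of a Gumbel(120, 40) distribution.
n = 20
sample = [120.0 - 40.0 * math.log(-math.log((i + 0.5) / n)) for i in range(n)]
candidates = {
    "Gumbel(120, 40)": lambda x: gumbel_cdf(x, 120.0, 40.0),
    "Gumbel(200, 40)": lambda x: gumbel_cdf(x, 200.0, 40.0),
}
ks_ranking = rank_by(ks_statistic, sample, candidates)
ad_ranking = rank_by(ad_statistic, sample, candidates)
```

In the study the same ranking would run over all 45 fitted distributions, and the first-ranked distribution of each test is carried forward, so up to three distinct candidates remain per data set.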

Generating random numbers
First, the parameters of the three selected probability distributions were used to generate random numbers. A commonly used technique is the inverse transform technique. Let $U$ be a uniform random number in the range (0, 1). If $X = F^{-1}(U)$, then $X$ is a random variable with CDF $F_X(x) = F(x)$, where $x \in \mathbb{R}$.
Thus, given a generator of uniform random numbers, this method can be used to generate any random variable with a known probability distribution.
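A minimal Python sketch of the inverse transform technique is given below. The Gumbel distribution is chosen only because its inverse CDF has a simple closed form; the parameter values, the seed and the function names are illustrative assumptions rather than fitted values from the study.

```python
import math
import random


def gumbel_cdf(x, mu, beta):
    return math.exp(-math.exp(-(x - mu) / beta))


def gumbel_quantile(u, mu, beta):
    """Inverse CDF: the x for which gumbel_cdf(x, mu, beta) equals u, 0 < u < 1."""
    return mu - beta * math.log(-math.log(u))


def sample_gumbel(n, mu, beta, seed=0):
    """Draw n values by passing uniform random numbers through the inverse CDF."""
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        u = rng.random()  # uniform on [0, 1)
        if u > 0.0:  # skip the endpoint 0, where the inverse CDF is undefined
            values.append(gumbel_quantile(u, mu, beta))
    return values


# Hypothetical parameters; 110 values to mirror the 110 years of record.
generated = sample_gumbel(110, mu=120.0, beta=40.0)
```

The same pattern applies to any candidate distribution whose inverse CDF can be evaluated, either in closed form or numerically.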

Minimum absolute deviation method
The generated random numbers for the selected distributions are treated as estimated values for the actual rainfall values. These estimated values are then compared with the actual rainfall values to compute the absolute deviation (AD) for each period:
$$\mathrm{AD} = \sum_{i=1}^{n} |x_i - \hat{x}_i|,$$
where $x_i$ is the actual rainfall value and $\hat{x}_i$ is the estimated value ($i = 1, 2, \ldots, n$).
Among the selected three distributions, the best fitted probability distribution was then identified based on the minimum absolute deviation between actual and estimated values.
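The selection by minimum absolute deviation can be sketched as follows. The text does not state how actual and generated values are paired, so this sketch assumes that both series are sorted and compared rank by rank; that pairing, the function names and the toy numbers are all assumptions made for illustration.

```python
def sum_abs_deviation(actual, estimated):
    """Sum of |actual - estimated| after pairing the values by rank (an assumption)."""
    if len(actual) != len(estimated):
        raise ValueError("actual and estimated must have the same length")
    return sum(abs(a - e) for a, e in zip(sorted(actual), sorted(estimated)))


def best_by_min_deviation(actual, estimates_by_model):
    """Return (name of the model with the smallest deviation, all deviations)."""
    scores = {
        name: sum_abs_deviation(actual, est) for name, est in estimates_by_model.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical toy values in mm, not results from the study.
actual = [100.0, 150.0, 200.0]
estimates = {
    "model A": [110.0, 140.0, 205.0],
    "model B": [90.0, 180.0, 260.0],
}
best, scores = best_by_min_deviation(actual, estimates)
```

Because the estimated values come from random numbers, the deviation depends on the particular draw; fixing the seed of the generator makes the comparison reproducible.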

Return period
Return period (T): Once the best probability model for the data has been determined, the interest is in deriving the return levels of rainfall. The $T$-year return level, say $x_T$, is the level exceeded on average only once in $T$ years. For example, the 2-year return level is the median of the distribution of the annual maximum daily rainfall.
Probability of occurrence (p) is expressed as the probability that an event of the specified magnitude will be equaled or exceeded during a one-year period. If $n$ is the total number of values and $m$ is the rank of a value in a list ordered by descending magnitude ($x_1 > x_2 > x_3 > \ldots > x_m$), the exceedance probability of the $m$th largest value, $x_m$, is $P(X \ge x_m) = m/n$ (see Ramachandra Rao and Hamed, pages 6-7). A given return level $x_T$ with a return period $T$ may be exceeded once in $T$ years. Therefore, $P(X > x_T) = 1/T$. If a probability model with CDF $F$ is assumed, then $F(x_T) = 1 - 1/T$, and inverting gives the general expression $x_T = F^{-1}(1 - 1/T)$.
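The return level calculation amounts to evaluating the inverse CDF of the fitted model at the non-exceedance probability 1 - 1/T. The Python sketch below does this for the generalized extreme value distribution, using its standard location, scale and shape parameterization. The parameter values are hypothetical and are not the fitted values of the study, and the annual series in the paper is best described by log-Pearson 3 rather than GEV.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """CDF of the generalized extreme value distribution."""
    if abs(xi) < 1e-12:  # Gumbel limit as the shape parameter tends to zero
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:  # x lies outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(T, mu, sigma, xi):
    """Level x_T with F(x_T) = 1 - 1/T, i.e. exceeded on average once in T years."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi


# Hypothetical parameters in mm (location, scale) and a dimensionless shape.
MU, SIGMA, XI = 100.0, 30.0, 0.1
return_levels = {T: gev_return_level(T, MU, SIGMA, XI) for T in (2, 10, 100)}
```

For T = 2 the function returns the median of the fitted distribution, which matches the statement that the 2-year return level is the median of the annual maximum daily rainfall.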

Results and Discussion
The methodology presented above was applied to the 110 years of observational data, which list the maximum rainfall in millimeters (mm) and were taken from the Department of Meteorology, Colombo. The data were classified into five data sets according to the study periods given in Table 1. These five data sets comprise 1 annual and 4 seasonal series, used to study the probability distribution pattern at different levels. The identified distributions are listed in Table 2 with the estimated parameters for each data set. It is also observed that some of the probability distributions have the first rank in both the Kolmogorov-Smirnov and Anderson-Darling tests. The estimated parameters were used to generate random numbers for each data set, and the least square method was used for the rainfall analysis. Random numbers were generated as estimated observations for all 110 years, and the residuals between actual and estimated values were computed for each data set. The sum of these deviations was obtained for all identified distributions. The probability distribution with the minimum deviation was treated as the best selected probability distribution for the individual data set. The best selected probability distribution for each data set is presented in Table 3.
The return levels for different return periods are given in Table 4. The predicted return level for the 2-year return period is approximately 129 mm, which means that rainfall of 129 mm or more should occur at that location on average only once every two years. Similarly, a daily extreme rainfall event of 216 mm or more occurs on average once every ten years, that is, with an occurrence probability of 0.1 in any year. Among the monsoon seasons considered, the south-west monsoon appears to be associated with the highest return levels, while the north-east monsoon has the lowest return levels.

Conclusion
This paper has demonstrated probability modeling of maximum daily rainfall in Colombo, Sri Lanka, using a wider variety of distributions. The results of the rainfall study revealed that the best fitted probability distribution for the maximum daily rainfall differs between data sets. The log-Pearson 3 (3P) and Burr (4P) distributions were found to be the best fitted probability distribution models for the annual and first inter-monsoon season periods of study, respectively. The generalized extreme value distribution was the best fitted probability distribution model for the remaining three monsoon seasons. The results indicate that the analytical procedure devised and tested in this study may be suitably applied to identify the best fitted probability distribution of weather parameters.
Average values of the maximum daily rainfall amounts corresponding to return periods of 2 to 200 years are derived together with their uncertainties. Despite the encouraging results of our analysis, the estimates of the extreme return values may have limited validity. It is shown that the regional approach leads to superior results, and a similar framework may be useful for fitting probability models to maximum rainfall in other parts of the region.