Estimating Extreme Losses for the Florida Public Hurricane Model

While the world thinks of coastal Florida as a paradise and retirement haven, residents of these areas do not always agree with that depiction. Living under the threat of hurricanes for six months of the year and paying enormous sums for windstorm insurance is not exactly paradise. However, this has not deterred people from wanting a piece of paradise, and migration to Florida has continued unabated. Exposure has increased significantly along coastal regions, causing insurance companies to reevaluate their risks. They still focus on estimating annual insured loss, but increasingly they want to be prepared for extreme losses. This paper examines various methods of estimating extreme quantiles of the loss distribution in the Public Hurricane Loss Model. Both nonparametric and parametric models are used to estimate the catastrophic quantiles and are then compared for accuracy. We found that the Weibull distribution fitted the data very well compared to the simple exponential and GPD distributions.


Introduction
Are residents of Florida living in paradise or insurance hell? It is always sunny, the sky is always blue, there is no winter to speak of, and residents can play tennis and swim all year round. So why do residents wonder about the wisdom of living on the coast of Florida? It is the insurance costs and the threat of hurricanes six months of the year! Anyone buying a new home in coastal Florida will attest to the difficulty of getting windstorm insurance for their home, and if they are lucky enough to get insurance, it is extremely expensive. A number of residents opt to go without windstorm insurance because they realize that the cost of obtaining insurance is far higher than any payouts they can expect to receive. This was not always the case. Insurance was easy to obtain and not prohibitively expensive until Hurricane Andrew struck Florida in 1992. According to the Insurance Institute of Florida (McChristian, 2012), when it struck, Andrew was the costliest natural disaster in U.S. history in terms of insurance payouts. Insurance claims totaled $15.5 billion at the time ($25 billion in 2012 dollars). By the time the report was written, Hurricane Andrew had become the second costliest natural disaster in the US, behind the damage caused by Hurricane Katrina in 2005. Prior to Andrew, most insurance companies used past losses to compute premiums. However, there had not been any major hurricanes in the state of Florida for a number of years, leading to an underestimate of insured losses. Several companies had to declare bankruptcy after Andrew. It was Andrew and its catastrophic damage that led to a reevaluation of how insured losses from catastrophic events are calculated and underscored the need to estimate losses caused by the "one in a hundred year event," or Probable Maximum Loss. Hurricane losses are now evaluated through the use of computer simulation models.
These models use historical data to simulate thousands of years of hurricane activity to estimate insured losses. To be used for rate making purposes in Florida, these models must be certified by an independent body called the Florida Commission on Hurricane Loss Projection Methodology (FCHLPM). In order to be certified by the FCHLPM, modelers have to demonstrate compliance with a set of rigorous standards. There are currently five models certified for rate making purposes in Florida. Four are owned by private commercial companies, and one is publicly funded and was developed by a team of scientists in the State University System (SUS) of Florida: the Florida Public Hurricane Loss Model (FPHLM). The FPHLM simulates 50,000+ years of hurricane activity. The simulated hurricanes are tracked until they make landfall in the state of Florida, which allows the model to estimate the wind risk in any region of Florida. This information is then used by engineers and actuaries to estimate structural damage and insured losses, respectively. The primary output of the model is the annual expected insured loss at any given address. For details on the FPHLM, we refer the reader to Hamid et al. (2008, 2010). However, as noted earlier, from an insurer's perspective, an estimate of the annual average insured loss will not help in an event like Hurricane Andrew. Insurers want to be able to hedge against catastrophic events like Andrew, which have the potential to bankrupt them. Therefore, in recent years, much attention has focused on the estimation of these extreme events, which occur rarely but can have disastrous effects. In other words, attention has shifted from average losses to high quantiles of the loss distribution, also referred to as Value at Risk (VaR) or Probable Maximum Loss (PML) in the insurance industry. This paper discusses different methods used to estimate the PML for the Florida Public Hurricane Loss Model.
The paper is organized as follows. We discuss the theory and the various estimation methods in Section 2. Section 3 discusses the application of these methods to the FPHLM. The paper ends with concluding remarks in Section 4.

Notations and Preliminaries
As mentioned earlier, the estimation of the Probable Maximum Loss (PML) is extremely important in catastrophic event analysis. The PML is defined as a loss that is exceeded with a very small probability p (close to 0). In other words, PML estimation concerns itself with finding x*_p such that P(X > x*_p) = p, where X is the random variable representing losses. Simply put, if F represents the c.d.f. of losses, then F(x*_p) = 1 - p, where p is such that 1 - p is close to one (see Matthys and Beirlant (2003)). A PML is often accompanied by a return period (the average number of years that must pass before such a loss is observed); the return period associated with x*_p is 1/p. The PML can be estimated by nonparametric and parametric methods. Note that while the PML is often computed on the basis of annual losses, it can also apply to annual maximum losses. Perhaps the simplest way to estimate the PML is nonparametrically, where the extreme quantiles are simply estimated by their empirical counterparts. This is the method the Public model has used to estimate extreme quantiles; it is detailed in the next section.

Non-Parametric Methods
Nonparametric procedures for computing the PML assume that the empirical loss distribution is a close substitute for the population loss distribution, free from any parametric constraints. The PML can be produced nonparametrically through order statistics. To estimate the PML corresponding to the 100p-th percentile, the k-th order statistic X_(k) is used, where k is determined by the sample size N multiplied by p. If the result is not an integer, the smoothed empirical estimate interpolates between two adjacent order statistics:

PML_p = (1 - h) X_(j) + h X_(j+1), where j = [(N+1)p] and h = (N+1)p - j;

here [.] denotes the greatest integer function (Wilkinson (1982), Hogg and Klugman (1984)). This method, however, is not applicable for PML_p when p > N/(N+1).
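The smoothed empirical estimator can be sketched in a few lines of code. This is a minimal illustration, not the FPHLM implementation; the function name and the guards at the boundaries are our own:

```python
import numpy as np

def empirical_pml(losses, p):
    """Smoothed empirical estimate of PML_p, the 100p-th percentile:
    PML_p = (1-h)*X_(j) + h*X_(j+1), with j = [(N+1)p], h = (N+1)p - j."""
    x = np.sort(np.asarray(losses, dtype=float))
    n = x.size
    if p > n / (n + 1.0):
        raise ValueError("p exceeds N/(N+1); the estimator is not applicable")
    g = (n + 1.0) * p
    j = int(np.floor(g))
    h = g - j
    if j < 1:
        return x[0]          # p too small to interpolate; return the minimum
    if j >= n:               # here h == 0, so X_(n) is the estimate
        return x[-1]
    return (1.0 - h) * x[j - 1] + h * x[j]
```

For example, with losses 1, 2, ..., 100 and p = 0.95, we get j = 95, h = 0.95, and the estimate 0.05(95) + 0.95(96) = 95.95.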
To obtain confidence intervals for PML_p, we use the well-known result (see Section 3.2 of Conover's Practical Nonparametric Statistics) that, since the number of observations not exceeding PML_p follows a Binomial(N, 1-p) distribution, for any pair of order statistics X_(r) < X_(s) with 1 ≤ r < s ≤ N,

P( X_(r) ≤ PML_p ≤ X_(s) ) = Σ_{i=r}^{s-1} C(N,i) (1-p)^i p^(N-i),

where C(N,i) is the binomial coefficient. Hence, to construct a (1-α)100% confidence interval for PML_p, we need to find r and s with r < s (done through a numerical search) such that the sum above is at least 1-α. If the solution from the computer search is not unique, the pair of r and s that minimizes s-r is selected to give the narrowest interval; the resulting interval (X_(r), X_(s)) is an approximate (1-α)100% (e.g., 95%) confidence interval for PML_p. For large samples, a normal approximation to the binomial can be used to obtain r and s directly as

r ≈ N(1-p) - z_{1-α/2} √(Np(1-p)),  s ≈ N(1-p) + z_{1-α/2} √(Np(1-p)).

If any value of r or s is not an integer, the smoothed empirical estimate is used. While extremely simple to use, nonparametric methods for estimating extreme quantiles pose some risk: data can be scarce in the upper tail of the distribution, leading to biased estimates, especially in the case of heavy-tailed distributions. The presumption of heavy tails in the case of catastrophic events has given rise to a plethora of techniques for modeling extreme quantiles, chief amongst them the "annual maxima methods" and the "peaks-over-threshold (POT) methods."
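The numerical search for the narrowest pair of order statistics can be illustrated as follows. This is a sketch with our own naming; the quadratic search is fine for illustration but would be replaced by something faster at N = 56,000:

```python
import math

def np_quantile_ci(losses, p, alpha=0.05):
    """Distribution-free confidence interval (X_(r), X_(s)) for PML_p.
    Finds the narrowest r < s with
      sum_{i=r}^{s-1} C(N,i) (1-p)^i p^(N-i) >= 1 - alpha."""
    x = sorted(losses)
    n = len(x)
    q = 1.0 - p                       # PML_p is the (1-p)-quantile
    pmf = [math.comb(n, i) * q**i * (1.0 - q)**(n - i) for i in range(n + 1)]
    best = None
    for r in range(1, n):
        cover = 0.0
        for s in range(r + 1, n + 1):
            cover += pmf[s - 1]       # adds the term i = s - 1
            if cover >= 1.0 - alpha:
                if best is None or s - r < best[1] - best[0]:
                    best = (r, s)
                break
    if best is None:
        raise ValueError("no pair of order statistics achieves the coverage")
    r, s = best
    return x[r - 1], x[s - 1]         # order statistics are 1-indexed
```

For a sample of size 100 and p = 0.10, the interval brackets order statistics on either side of the 90th, reflecting the Binomial(100, 0.9) spread.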

Annual Maxima Methods (or Block Maxima Methods)
The annual maxima method belongs to a general class of methods called block maxima methods. Here, instead of trying to find a distribution that fits the entire data set (or the tail, for heavy-tailed distributions), the investigator models the maximum values over certain periods (called blocks). The fitted model is then used to estimate extreme quantiles. In this paper, we consider the annual maxima method with the Gumbel distribution as proposed by Gumbel (1958); see also Harris (1996), who considers a weighted least squares method for estimating the parameters.
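As a rough illustration of the block maxima idea, the sketch below fits a Gumbel distribution to block maxima by the method of moments (a simpler estimator than the weighted least squares fit cited above; for Gumbel(μ, β), mean = μ + γ_E·β and variance = π²β²/6, with γ_E the Euler-Mascheroni constant) and reads off a high quantile:

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def fit_gumbel_moments(maxima):
    """Method-of-moments fit of the Gumbel distribution to block maxima."""
    m = np.asarray(maxima, dtype=float)
    beta = np.sqrt(6.0) * m.std(ddof=1) / np.pi   # from var = pi^2 beta^2 / 6
    mu = m.mean() - EULER_GAMMA * beta            # from mean = mu + gamma_E beta
    return mu, beta

def gumbel_pml(mu, beta, p):
    """Quantile exceeded with probability p (return period 1/p):
    x_p = mu - beta * ln(-ln(1 - p))."""
    return mu - beta * np.log(-np.log(1.0 - p))
```

With p = 0.01, gumbel_pml returns the fitted 100-year event, roughly mu + 4.6*beta.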

Peaks-Over-Threshold (POT) Method
The peaks-over-threshold method for modeling extreme events has steadily gained popularity in recent years and is probably used by the majority of practitioners of extreme value statistics. In this scenario, the modeler is mainly interested in estimating high percentiles of losses over a threshold u. In other words, the modeler is interested in the distribution of Y = X - u, given that X exceeds u. Mathematically, we let F_u denote this conditional distribution, described as follows:

F_u(y) = P(X - u ≤ y | X > u) = (F(u + y) - F(u)) / (1 - F(u)),  where u > 0 and y ≥ 0.
It is well known (Pickands (1975)) that under certain conditions, and for a large enough threshold, F_u is in the limit a Generalized Pareto Distribution (GPD) with c.d.f.

G_{γ,σ}(y) = 1 - (1 + γy/σ)^(-1/γ) for γ ≠ 0, and G_{0,σ}(y) = 1 - exp(-y/σ),

where σ > 0, with y ≥ 0 when γ ≥ 0 and 0 ≤ y ≤ -σ/γ when γ < 0. So for large thresholds u, F_u can be approximated by a GPD.

We use this method as detailed in Matthys and Beirlant (2003). Assume that our original loss data are given by X_1, X_2, . . . , X_N. We select a sufficiently high threshold u (typically chosen as the order statistic corresponding to a high percentile of the losses, say the 75th percentile or higher) and let N_u be the number of exceedances above u. So if u is the (k+1)st largest observation X_(N-k), then N_u = k and the exceedances are given by the k largest observations minus u. We then estimate the parameters of the GPD using the exceedances; let these estimates be σ̂ and γ̂. The conditional tail F_u of F is then estimated by G_{γ̂,σ̂}, and the unconditional tail is therefore estimated by

1 - F̂(u + y) = (N_u/N) (1 + γ̂ y/σ̂)^(-1/γ̂).

Once again following Matthys and Beirlant (2003), we can invert the above equation to get an estimate of high quantiles above the threshold u:

x̂_p = u + (σ̂/γ̂) [ (Np/N_u)^(-γ̂) - 1 ].

Note that this method can only be used if p < N_u/N. Besides using the GPD to fit the conditional tail, we also investigated the exponential and Weibull distributions as possible fits for the tail. Past research by Yang et al. (2011) has shown that the losses from the PHLM do not tend to be heavy tailed, and therefore it seemed prudent to investigate some skewed, light-tailed distributions as possible fits. Recall that the c.d.f. of the exponential distribution is

F(x) = 1 - exp(-x/θ), θ > 0, x > 0;

note that the exponential distribution is a special case of the GPD (γ = 0). The c.d.f. of the Weibull distribution is

F(x) = 1 - exp(-(x/θ)^τ), θ > 0, τ > 0, x > 0.

The methodology for estimating the extreme quantile is the same as that for the GPD, so we do not describe it again. A big component of quantile estimation using the POT method is the selection of the threshold value u.
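The POT quantile estimator is easy to mischaracterize, so here is a direct transcription (a sketch with our own function name; the GPD parameters are assumed to have already been fitted to the exceedances, and γ̂ = 0 is handled via the exponential limit):

```python
import numpy as np

def pot_quantile(losses, p, u, gamma_hat, sigma_hat):
    """POT estimate of the quantile exceeded with probability p:
        x_p = u + (sigma/gamma) * ((N*p / N_u)**(-gamma) - 1),
    valid only when p < N_u / N; gamma -> 0 gives the exponential tail
        x_p = u - sigma * ln(N*p / N_u)."""
    x = np.asarray(losses, dtype=float)
    n = x.size
    n_u = int((x > u).sum())           # number of exceedances above u
    if p >= n_u / n:
        raise ValueError("p must be smaller than N_u / N")
    ratio = n * p / n_u
    if abs(gamma_hat) < 1e-12:         # exponential special case of the GPD
        return u - sigma_hat * np.log(ratio)
    return u + (sigma_hat / gamma_hat) * (ratio ** (-gamma_hat) - 1.0)
```

As a sanity check: for Exp(1) losses the memoryless property makes the exceedances over any threshold Exp(1) again, so with u at the 75th percentile (u = ln 4), γ = 0 and σ = 1 the estimator recovers the true 99th percentile, ln 100.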
As suggested earlier, the threshold value u is often unknown but is chosen as one of the order statistics, say the (k+1)st largest observation X_(N-k). For our paper, we chose three values of k, corresponding to the 75th, 80th and 85th percentiles of the data distribution. While a number of methods are available to estimate the parameters, we used the maximum likelihood method. For more on the POT method, readers are referred to Leadbetter (1991) and McNeil and Saladin (1997), among others.
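Threshold sensitivity can be checked by refitting at each candidate threshold. The sketch below uses a simple method-of-moments GPD fit (the paper uses maximum likelihood; moments keep the example dependency-free) on an exact Exp(1) quantile grid as stand-in data, whose tail has γ = 0 at every threshold:

```python
import numpy as np

n = 20000
# Exact Exp(1) quantile grid as hypothetical stand-in loss data
losses = -np.log(1.0 - (np.arange(1, n + 1) - 0.5) / n)

fits = {}
for pct in (75, 80, 85):
    u = np.percentile(losses, pct)
    y = losses[losses > u] - u          # exceedances over the threshold
    m, v = y.mean(), y.var(ddof=1)
    # Method of moments for GPD: E[Y] = sigma/(1-gamma),
    # Var[Y] = sigma^2 / ((1-gamma)^2 (1-2*gamma)), valid for gamma < 1/2
    gamma = 0.5 * (1.0 - m * m / v)
    sigma = m * (1.0 - gamma)
    fits[pct] = (gamma, sigma)
```

A fitted shape γ̂ that is stable across the candidate thresholds is the usual diagnostic that the GPD approximation of the tail has set in.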

Data Analysis
Data analysis was conducted on losses generated from the latest certified version of the PHLM, PHLM 5.0. The model generates 56,000 years of hurricane activity in the state of Florida and thus 56,000 years of losses. In keeping with historical hurricane frequencies, where in the majority of years Florida has no landfalling hurricanes, 50.9% of the losses were zero. As a result, the annual maxima method for estimating high quantiles worked very poorly for our data set. As seen in Figures 3.1 and 3.2, the annual maximum losses are very skewed due to the high proportion of zeros, and the Gumbel distribution does not fit the data at all. Hence we decided to use the POT method to estimate the PML using both the annual maximum values and the total losses. The estimated PMLs were compared to the ones obtained via the nonparametric method. To find the PML using the POT method, as in Matthys and Beirlant (2003), we truncate the data at the threshold value and use the conditional tail to compute estimates of the parameters. For example, to model the tail of the annual total loss using the 75th percentile as the threshold, we considered only the highest 25% of the data, i.e., the observations above the 75th percentile. We then subtract the value of the cut-off point from these data and use the proposed distributions (GPD, exponential or Weibull) to model the tail. The extreme values are then estimated using equations (2.2-2.4). The estimated extreme percentiles, or PML values, for the maximum annual loss have been estimated for the 80th through 99th percentiles using the proposed distributions and are presented in Table 3.1; the corresponding goodness-of-fit tests are presented in Table 3.
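To make the comparison concrete, the pipeline just described (zero-inflated annual losses, truncation at the 75th percentile, a tail fit, and estimation of high percentiles both ways) can be mimicked on synthetic data. Everything here is a hypothetical stand-in, not FPHLM output: the lognormal severity, the 50.9% zero mass, and the seed are our own choices, and only the exponential tail fit is shown:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 56000
# ~50.9% of simulated years have zero loss; the rest draw a lognormal loss
hit = rng.random(n) > 0.509
losses = np.where(hit, rng.lognormal(mean=2.0, sigma=1.0, size=n), 0.0)

u = np.percentile(losses, 75)            # truncation threshold
y = losses[losses > u] - u               # conditional tail data
n_u = y.size
theta_hat = y.mean()                     # exponential MLE for the exceedances

emp, pot = [], []
for q in (0.90, 0.95, 0.99):
    p = 1.0 - q
    emp.append(np.percentile(losses, 100 * q))        # nonparametric PML
    pot.append(u - theta_hat * np.log(n * p / n_u))   # POT, exponential tail
```

Comparing emp and pot across a range of percentiles, together with goodness-of-fit tests on the tail data y, is the kind of comparison summarized in the tables.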

Summary and Concluding Remarks
This paper considers various distributions, namely the GPD, Weibull and exponential, for estimating the extreme quantiles of the loss distribution of the Public Hurricane Loss Model (PHLM). We consider both total annual losses and maximum annual losses. Both nonparametric and parametric models are used to estimate the catastrophic quantiles and are then compared for accuracy. For the parametric case, we used the POT method with maximum likelihood estimation of the model parameters. From the empirical analyses, it is evident that the Weibull distribution fitted our data very well compared to the simple exponential and GPD distributions. The conclusions of this paper are limited to the FPHLM data. For any definitive conclusion to be drawn from these results, we may need to use additional quantile-based methods as well as semiparametric methods. However, this paper may motivate the FPHLM to use parametric methods to estimate the PML in place of the nonparametric methods used so far.