Can We Ex Post Facto Justify Duckworth-Lewis Rule?

In a one-day international cricket match, disturbances such as rain or storm sometimes prevent at least one of the teams from batting for the stipulated fifty overs. The Duckworth-Lewis rule is then applied so that the match can still end with a decision. We explore whether the rule can be justified by a statistical analysis of the outcomes of one-day matches in which all the stipulated overs were bowled, as well as matches in which the rule had to be applied. Our analysis shows that the rule is quite fair.


Introduction
In One Day International (ODI) cricket matches, each team gets 50 overs to bat and 50 overs to bowl. However, due to rain, storm or other reasons, the teams sometimes do not have enough time to bat the stipulated 50 overs. Sometimes, during the first innings itself, anticipating that 100 overs may not be completed, the umpires curtail the number of overs of the team which bowls first. More frequently, the team batting second does not get all 50 overs. In both these cases, the Duckworth-Lewis (DL) rule is applied: it assigns fewer overs to the team which bowls first, sets a target (in terms of runs to be scored) for the team which bats second, or both. The target for the team which bats second depends on the actual score of the team batting first and on the resources available to the team batting second, namely the number of overs yet to be bowled and the number of batsmen either batting or yet to bat. We refer to Duckworth and Lewis (1998) and Duckworth (2001) for introductory discussions of the DL rule. For discussions of the actual rule and its history, we refer to Duckworth-Lewis-Stern method (2020).
The DL rule was introduced in 1999. It is sometimes criticized on the grounds that the team batting second cannot optimize its strategy. For example, if a team is sure to get 50 overs to bat, an optimal strategy may be to avoid losing wickets in the early overs and to increase the run rate in the later overs. According to cricket fans, the DL rule sometimes leads to ridiculous targets.
Based on the experience gained over the years, some modifications of the rule have been made. In 2014, Steven Stern replaced Duckworth and Lewis as the 'custodian' of the rule (Duckworth-Lewis-Stern method, 2020). Though the rule is now known as the DLS rule, we continue to refer to it as the DL rule throughout our discussion.
The rule has been in use for about two decades and has been applied to a large number of matches. For discussions regarding the appropriateness of the rule, we refer to Devooght (2007) and Schall and Weatherall (2013).
It is natural to investigate whether the rule can now be justified on the basis of the records available. In this paper, we attempt to defend the DL rule. Our strategy is to fit a statistical model which predicts the outcome of an ODI match based on the past performances of the two teams and other covariates. We then predict the outcomes of matches where the DL rule was in force and compare the success of these predictions with that for matches where all 100 overs were bowled. If the rule is fair, the model should predict DL matches about as well as it predicts regular matches.
The rest of the paper is organized as follows. In Section 2, we describe the data set and the strategies of data analysis which lead to the prediction of a match. A comparison of predictions for DL matches and regular matches is made in Section 3. Section 4 offers some concluding remarks.

Methodology and Data Description
Records of ODI matches played between January 11, 1998 and February 14, 2016 are available on the website Espncricinfo (2020). The record of a match consists of the date, the names of the two teams, the scores made by the two teams, the ground where the match was played and details regarding the DL rule (if applied). The names of the two teams (I and II) are given in alphabetical order on the above website.
We consider the following regressors.
1. x1 = the proportion of matches won by team I within the N days preceding the present match.
2. x2 = the proportion of matches won by team II within the N days preceding the present match.
3. x3 = 1 if team I is playing in its own country, 2 if team II is playing in its own country, and 0 if the ground is neutral for both teams.
4. x4 = 1 if team I wins the toss and 0 otherwise.
5. x5 = the proportion of matches won by team I on the same ground within the N days preceding the present match.
6. x6 = the proportion of matches won by team II on the same ground within the N days preceding the present match.
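The regressors x1 and x2 can be computed from a chronological list of results. The following is a minimal sketch, not the procedure actually used for this paper: the tuple layout of `history`, the function name `win_proportion`, and the choice to return `None` when a team has played no match in the window are all assumptions made for illustration, and the sample results are invented.

```python
from datetime import date, timedelta

def win_proportion(history, team, match_date, n_days):
    """Proportion of matches won by `team` in the n_days before match_date.

    `history` is a list of (date, team_a, team_b, winner) tuples. This layout
    is a hypothetical one chosen for the sketch. Returns None when the team
    played no match in the window (an assumption; the paper does not say how
    this case is handled).
    """
    start = match_date - timedelta(days=n_days)
    # Keep only earlier matches inside the window that involve the team.
    played = [m for m in history
              if start <= m[0] < match_date and team in (m[1], m[2])]
    if not played:
        return None
    wins = sum(1 for m in played if m[3] == team)
    return wins / len(played)

# Invented results, for illustration only.
history = [
    (date(2014, 1, 1), "India", "England", "England"),   # outside the window
    (date(2015, 1, 10), "Australia", "India", "Australia"),
    (date(2015, 2, 1), "India", "Sri Lanka", "India"),
    (date(2015, 3, 5), "India", "Pakistan", "India"),
]
x1 = win_proportion(history, "India", date(2015, 4, 1), 175)
```

The regressors x5 and x6 would follow from the same function after first restricting `history` to matches played on the ground in question.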
Let the binary response Y be given by Y = 1 if team I wins and Y = 0 otherwise. The probability of Y = 1 is then modeled by the logistic regression

$$P(Y = 1) = \frac{\exp\left(\beta_0 + \sum_{i=1}^{6} \beta_i x_i\right)}{1 + \exp\left(\beta_0 + \sum_{i=1}^{6} \beta_i x_i\right)}.$$

In the above, β0 is the intercept parameter, while βi is the regression parameter corresponding to the regressor xi, i = 1, 2, …, 6. For a good account of logistic regression models, we refer to Hosmer and Lemeshow (2000). For the regressor x3, the value 0 (neutral ground) has been used as the reference level. This leads to two regression parameters: β3(1) for level 1 vs 0 and β3(2) for level 2 vs 0.
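To make the role of the reference level concrete, the sketch below evaluates the model probability for a single match, with x3 entering through two level-specific parameters. The function name, the dictionary keys and all coefficient values are hypothetical and chosen only for illustration; they are not the estimates obtained in this paper.

```python
import math

def win_probability(x, beta):
    """P(Y = 1) under the logistic model, with x3 coded against level 0."""
    # Linear predictor: intercept plus the two recent-form terms.
    eta = beta["b0"] + beta["b1"] * x["x1"] + beta["b2"] * x["x2"]
    # x3 is categorical; level 0 (neutral ground) is the reference level,
    # so it contributes nothing to the linear predictor.
    if x["x3"] == 1:
        eta += beta["b3_1"]
    elif x["x3"] == 2:
        eta += beta["b3_2"]
    eta += beta["b4"] * x["x4"] + beta["b5"] * x["x5"] + beta["b6"] * x["x6"]
    # Equivalent to exp(eta) / (1 + exp(eta)).
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients, NOT the fitted values.
beta = {"b0": 0.0, "b1": 1.0, "b2": -1.0, "b3_1": 0.5, "b3_2": -0.5,
        "b4": 0.0, "b5": 0.0, "b6": 0.0}
match = {"x1": 0.6, "x2": 0.6, "x3": 1, "x4": 1, "x5": 0.5, "x6": 0.5}
p_home = win_probability(match, beta)
p_neutral = win_probability(dict(match, x3=0), beta)
```

With these illustrative values the two teams have equal recent form, so the probability on a neutral ground is exactly one half, and moving the match to team I's own country raises it.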
We first divide the data into two parts. The first part consists of matches played between January 11, 1998 and February 27, 2013 (the training data), while the second part consists of matches played between January 1, 2014 and February 14, 2016 (the test data). The numbers of observations in the training dataset and the test dataset are 1929 and 252 respectively. The matches where the DL rule was applied are excluded from both the training and the test datasets. Also, matches without any result (including matches with the two scores tied) are excluded from both datasets.
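The partition described above can be sketched as follows. The dictionary keys (`date`, `dl_applied`, `winner`), the function name and the inclusive treatment of the boundary dates are assumptions made for this illustration, as is the decision to collect decided DL matches in a separate list for later prediction.

```python
from datetime import date

TRAIN_START, TRAIN_END = date(1998, 1, 11), date(2013, 2, 27)
TEST_START, TEST_END = date(2014, 1, 1), date(2016, 2, 14)

def split_matches(matches):
    """Split match records into training, test and DL sets.

    Each record is a dict with hypothetical keys 'date', 'dl_applied' and
    'winner' ('winner' is None for a no-result or tied match).
    """
    train, test, dl = [], [], []
    for m in matches:
        if m["winner"] is None:
            continue              # no result or tie: excluded everywhere
        if m["dl_applied"]:
            dl.append(m)          # kept apart from training and test data
        elif TRAIN_START <= m["date"] <= TRAIN_END:
            train.append(m)
        elif TEST_START <= m["date"] <= TEST_END:
            test.append(m)
    return train, test, dl

# Invented records, for illustration only.
sample = [
    {"date": date(2005, 6, 1), "dl_applied": False, "winner": "I"},
    {"date": date(2015, 3, 1), "dl_applied": False, "winner": "II"},
    {"date": date(2010, 8, 9), "dl_applied": True, "winner": "I"},
    {"date": date(2012, 2, 2), "dl_applied": False, "winner": None},
]
train, test, dl = split_matches(sample)
```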

Results and Discussion
To choose N, we fit stepwise logistic regression models with N taken as 100, 125, 150, 175, 200, 250, 300, 350 and 400. For each value of N, the regression coefficients β0, β4, β5, β6, corresponding to the intercept and the regressors x4, x5, x6 respectively, were eliminated in the stepwise procedure. Table 1 gives the proportion of successful predictions of matches within the training data, the proportion of successful predictions of matches in the test data, and the corresponding proportion for DL matches. If the estimated probability of Y = 1 exceeds 0.5, Y is predicted to be 1; otherwise it is predicted to be 0. This rule has been applied throughout our discussion. Obviously, if we select a very large N, the regressors get linked with the performance of teams in the remote past, which reduces the prediction success. On the other hand, if we choose a smaller N, we tend to get fewer matches played by one or both teams within N days, which reduces the prediction success and sometimes leads to instability of the parameter estimators. We thus choose N = 175, as it optimizes the overall success rate. This corresponds to choosing roughly the last 6 months.
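The prediction rule and the success proportion can be written down in a few lines. This is only a sketch: the function names are ours, and the probabilities and outcomes below are invented to exercise the rule, not taken from the fitted model.

```python
def predict(prob, threshold=0.5):
    """Predict Y = 1 only when the estimated probability exceeds the threshold."""
    return 1 if prob > threshold else 0

def success_rate(probs, outcomes):
    """Proportion of matches whose predicted outcome equals the actual one."""
    hits = sum(1 for p, y in zip(probs, outcomes) if predict(p) == y)
    return hits / len(outcomes)

# Invented estimated probabilities and actual outcomes, for illustration only.
probs = [0.70, 0.40, 0.55, 0.50]
outcomes = [1, 0, 0, 0]
rate = success_rate(probs, outcomes)
```

Note that a probability of exactly 0.5 is predicted as 0, since the rule requires the probability to exceed 0.5. Repeating this computation for each candidate N on the training, test and DL sets gives the kind of comparison reported in Table 1.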
In Table 2, we give the maximum likelihood estimates of βi, i = 1, 2, 3, for the training data with N = 175. Their standard errors and Wald's test statistics for testing βi = 0, i = 1, 2, 3, are also given therein. These regression coefficients are highly significant. These statistics remain almost the same for the entire data and hence are not reported here. In Table 3, we report the actual versus predicted outcomes of ODIs in the training data, while Table 4 corresponds to the test data and Table 5 corresponds to the DL matches. It is seen from these tables that the model predicts match outcomes with similar success in all three sets.
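For readers who wish to reproduce statistics of the kind shown in Tables 2 to 5, the sketch below computes a Wald statistic from an estimate and its standard error, and cross-tabulates actual against predicted outcomes. It assumes the usual single-parameter form of the Wald test, with the squared ratio referred to a chi-square distribution with one degree of freedom; the function names and all numerical inputs are hypothetical and are not values from our tables.

```python
import math
from collections import Counter

def wald_test(estimate, std_error):
    """Wald chi-square statistic and p-value for H0: beta = 0 (1 d.f.)."""
    z = estimate / std_error
    chi_square = z * z
    # Two-sided normal tail area, equal to the chi-square(1) upper tail.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return chi_square, p_value

def confusion_table(actual, predicted):
    """2 x 2 table: rows are actual outcomes 0, 1; columns are predicted 0, 1."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in (0, 1)] for a in (0, 1)]

# Hypothetical inputs, for illustration only.
chi_square, p_value = wald_test(1.96, 1.0)
table = confusion_table([1, 0, 1, 1], [1, 0, 0, 1])
```

The diagonal of the table counts the successful predictions, so the success proportion is the sum of the diagonal divided by the total number of matches.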

Conclusions
Based on the statistical analysis reported in Section 3, we conclude that the DL rule can be justified by statistical considerations. This is the main finding of our work. In about 65% of the DL matches, the actual result of the match is consistent with that predicted by the logistic regression model, and this percentage is about the same as those in the training and the test datasets. Further, the logistic regression model fits the data satisfactorily.
Our analysis suggests that a team playing in its own country has a higher chance of winning than it has on a neutral ground. A team playing in its opponent's country has a lower chance of winning than it has on a neutral ground. However, these effects are not symmetric.
From Table 2, we conclude that a team which has been performing well over the last 6 months has a significantly higher chance of winning.
There is a feeling that the team batting second has an advantage, since it knows its target and can therefore deploy its resources in a better manner. We carried out another logistic regression analysis, not reported here, which included batting second as a regressor. This regressor turns out to be insignificant. Thus, such a feeling cannot be supported.
It may be remarked that a very high prediction success rate cannot be expected, as a number of uncertainties are involved in any game such as cricket, where these uncertainties are regarded as glories of the game. Thus, overall, the DL rule is reasonably fair.