Assessing quality of crash modification factors estimated by empirical Bayes before-after methods
Source journal: Journal of Central South University (English Edition), 2020, Issue 8
Authors: WU Ling-tao, CHEN Ying, HUANG Zhong-xiang
Pages: 2259-2268
Key words:traffic safety; empirical Bayes; crash modification factor; safety effectiveness evaluation
Abstract: Before-after study with the empirical Bayes (EB) method is the state-of-the-art approach for estimating crash modification factors (CMFs). The EB method not only addresses the regression-to-the-mean bias, but also improves accuracy. However, the performance of the CMFs derived from the EB method has never been fully investigated. This study aims to examine the accuracy of CMFs estimated with the EB method. Artificial realistic data (ARD) and real crash data are used to evaluate the CMFs. The results indicate that: 1) The CMFs derived from the EB before-after method are nearly the same as the true values. 2) The estimated CMF standard errors do not reflect the true values. The estimation remains at the same level regardless of the pre-assumed CMF standard error. The EB before-after study is not sensitive to the variation of CMF among sites. 3) The analyses with real-world traffic and crash data with a dummy treatment indicate that the EB method tends to underestimate the standard error of the CMF. Safety researchers should recognize that the CMF variance may be biased when evaluating safety effectiveness by the EB method. It is necessary to revisit the algorithm for estimating CMF variance with the EB method.
Cite this article as: CHEN Ying, WU Ling-tao, HUANG Zhong-xiang. Assessing quality of crash modification factors estimated by empirical Bayes before-after methods [J]. Journal of Central South University, 2020, 27(8): 2259-2268. DOI: https://doi.org/10.1007/s11771-020-4447-2.
J. Cent. South Univ. (2020) 27: 2259-2268
DOI: https://doi.org/10.1007/s11771-020-4447-2
CHEN Ying(陈英)1, 2, WU Ling-tao(吴玲涛)3, HUANG Zhong-xiang(黄中祥)1
1. School of Traffic and Transportation Engineering, Changsha University of Science & Technology, Changsha 410114, China;
2. School of Architecture, Changsha University of Science & Technology, Changsha 410114, China;
3. Center for Transportation Safety, Texas A&M Transportation Institute, Bryan, Texas, 77847, USA
Central South University Press and Springer-Verlag GmbH Germany, part of Springer Nature 2020
1 Introduction
Roadway safety management includes six steps: network screening, diagnosis, countermeasure selection, economic appraisal, project prioritization, and safety effectiveness evaluation. Evaluation is the last step. Nevertheless, it plays a critical role in the whole management process. The evaluation assesses how crashes (number and severity) have changed due to the treatment(s) [1, 2]. The safety effectiveness is typically represented in the form of a crash reduction factor (CRF) or a crash modification factor (CMF) [3, 4].
Safety analysts have proposed various approaches to estimate CMFs for treatments: the simple before-after study (also known as the naive before-after study), before-after studies with a comparison group, before-after studies with the empirical Bayes (EB) method, full Bayes (FB) before-after studies, the regression modeling approach, and the recently proposed propensity score matching method [3, 5-7]. The simple before-after method is straightforward. However, the CMFs estimated with this approach are less reliable. The method assumes that the crash count before the implementation of a treatment is an adequate estimate of what would have occurred in the after period had the countermeasure not been implemented [8, 9]. Moreover, the regression-to-the-mean (RTM) bias (which is quite common in safety evaluation) and changes unrelated to the treatment (e.g., traffic volume) are not accounted for. Hence, the simple before-after study is not recommended for estimating CMFs. The FB method requires Markov chain Monte Carlo (MCMC), which might be complicated for most safety practitioners. With the development of data management techniques (e.g., geographic information systems, database management) and statistical packages (e.g., SAS, R), the regression modeling approach has become popular among safety analysts. However, researchers have pointed out that this approach may not capture the underlying true relationship between roadway factors and safety. There might be unobserved heterogeneity [10, 11], omitted variable bias [10, 11], misspecified functional forms [12, 13], etc. More importantly, cross-sectional analyses may not reveal the cause-effect relationship [14-16].
The EB before-after study has been considered one of the most robust methods for estimating CMFs. In the EB approach, the estimate of safety is a weighted average of observed crashes and predicted crashes. Although the EB before-after study has some limitations, such as sample size issues, site selection bias, and mixed safety effects, it is the state-of-the-art method for estimating CMFs and has been widely used by safety researchers over the last two decades. In the US Federal Highway Administration’s CMF Clearinghouse, about 50% of the CMFs were developed with the EB approach [17]. CMFs estimated with the EB method typically have smaller standard errors, and they are believed to be more reliable. In the CMF Clearinghouse, those CMFs are assigned more stars (i.e., higher quality). The primary advantages of the EB before-after study are 1) accounting for the RTM bias, and 2) improving estimation accuracy. It is well known that the EB method accounts for the RTM bias [18]. However, the accuracy has never been fully examined. A few studies have reported that the standard error of a CMF estimated with the EB method might be underestimated [19]. Thus, the primary objective of this study is to thoroughly examine the quality and accuracy of CMFs estimated with EB before-after studies. In particular, this study focuses on the quality of the standard errors of the CMFs.
The organization of this paper is as follows: Section 2 documents the steps of the EB method for estimating CMFs and the methodology used to evaluate the CMF standard errors. Section 3 introduces the artificial realistic data (ARD) and the real-world crash dataset used in the evaluation process. Section 4 documents the analysis results. The last section summarizes the findings and discusses future work.
2 Methodology
This section first introduces the steps for estimating CMFs using an EB before-after study, then documents the process for examining the accuracy of standard errors of the CMFs.
2.1 EB method for estimating CMF
GROSS et al [3] provided the process for estimating the CMFs using the EB approach, and the process has been widely used [20, 21]. The primary steps are described below.
Step 1: Develop a safety performance function with reference sites.
Collect traffic and crash data at comparison sites (also known as reference sites). The comparison sites should have roadway geometric characteristics similar to those of the treatment sites. Develop a crash prediction model or safety performance function (SPF) with reference site data.
It is worth noting that a number of models can be used to develop SPFs [22], e.g., Poisson, Negative Binomial (NB), Sichel [23, 24], Conway–Maxwell–Poisson [25, 26], and finite mixture models [27-31]. The Poisson model is not recommended, since crash counts are often over-dispersed. For a specific dataset, researchers are encouraged to test the performance of different models. Different models may result in different parameters, and hence different CMF estimates. This study utilizes the widely used NB model. In the NB regression model, the SPF has the following form:
μ=f(ADT, site characteristics)×exp(ε) (1)
exp(ε)~gamma(1, α) (2)
Y~Poisson(μ) (3)
where μ is Poisson mean for a site (a segment or an intersection) in a certain time period; ε is independent model error; exp(ε) follows gamma distribution, with size parameter α and shape parameter 1/α; Y is observed crash count at the site in the time period.
Step 2: Calculate the number of expected crashes (before period).
Calculate the number of expected crashes at each site before implementing the countermeasure:
M = w×μ + (1−w)×Y (4)
where M is expected number of crashes (i.e., EB estimate); w is weighting factor; μ is predicted number of crashes; Y is observed crash count. The variance of the expected crash number is calculated as:
Var(M) = (1−w)×M (5)
Step 3: Calculate the proportion of crashes in the two periods (after and before).
Estimate the proportion of the after period crash estimate to the before period estimate (Pi), shown in Eq. (6).
Pi = μa,i / μb,i (6)
where μa,i is after period EB estimate at site i; μb,i is before period EB estimate at site i.
Step 4: Estimate the number of crashes and its variance (after period).
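Calculate the number of crashes that would have occurred during the after period had the treatment not been implemented:
πi = Pi × Mb,i (7)
where πi is the expected number of crashes at site i in the after period without the treatment, and Mb,i is the before period expected number of crashes (i.e., EB estimate in Step 2) at site i. Its variance is calculated as:
Var(πi) = Pi² × Var(Mb,i) (8)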
Step 5: Compute the sum of the expected number of crashes (after period).
Had the treatment not been installed, the total number of crashes during the after period is calculated as:
π = Σ(i=1..J) πi (9)
where J is the total number of treatment sites, and πi is the expected number (i.e., EB estimate in Step 4) of crashes at site i in the after period had the treatment not been implemented.
Step 6: Compute the total of the observed crashes.
For a treated site, crashes can be influenced by the treatment. The actual number of crashes for all the treated sites in the after period is calculated as:
λ = Σ(i=1..J) Li (10)
where λ is the total observed crash count in the after period, and Li is the observed crash count at site i in the after period.
Step 7: Calculate the variances.
With the Poisson distribution, the variance of the observed crash count at site i is equal to Li (observed crash count at site i). The variance of the total expected number of crashes (Step 5) can be estimated by using Eqs. (11)-(14):
(11)
(12)
(13)
(14)
Step 8: Estimate the safety effectiveness (i.e., CMF) of the countermeasure.
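The CMF is estimated using Eq. (15):
CMF = (λ/π) / [1 + Var(π)/π²] (15)
where λ is the total observed crash count in the after period (Step 6), and π is the total expected crash count in the after period had the treatment not been implemented (Step 5).
The variance and standard error of the CMF are calculated by:
Var(CMF) = CMF² × [Var(λ)/λ² + Var(π)/π²] / [1 + Var(π)/π²]² (16)
s.e.(CMF) = √Var(CMF) (17)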
For the detailed procedures of the EB before-after method, readers are referred to Refs. [8, 21]. In this study, we are particularly interested in the accuracy of the standard error of the CMF. The next two subsections introduce the method for evaluating the accuracy.
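Steps 2 to 8 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight is taken as w = 1/(1 + μ/α) with α the NB size parameter, the variance of the expected total is taken as the sum of the site-level variances (sites treated as independent, standing in for Eqs. (11)-(14)), and the function name `eb_cmf` and the dictionary keys are hypothetical.

```python
import math

def eb_cmf(sites):
    """Estimate a CMF and its standard error with an EB before-after sketch.

    Each site is a dict with hypothetical keys:
      mu_b, mu_a : SPF-predicted crashes in the before / after period
      y_b, y_a   : observed crashes in the before / after period
      alpha      : NB size parameter (assumed inverse dispersion)
    """
    lam = 0.0      # total observed crashes, after period
    pi = 0.0       # total expected crashes, after period, without treatment
    var_pi = 0.0   # variance of pi (assumes independent sites)
    for s in sites:
        w = 1.0 / (1.0 + s["mu_b"] / s["alpha"])   # assumed weight form
        m = w * s["mu_b"] + (1.0 - w) * s["y_b"]   # EB estimate, before period
        var_m = (1.0 - w) * m                      # variance of the EB estimate
        p = s["mu_a"] / s["mu_b"]                  # after/before proportion
        pi += p * m
        var_pi += p ** 2 * var_m
        lam += s["y_a"]
    if lam <= 0 or pi <= 0:
        raise ValueError("need positive observed and expected totals")
    var_lam = lam                                  # Poisson assumption
    ratio = var_pi / pi ** 2
    cmf = (lam / pi) / (1.0 + ratio)
    var_cmf = cmf ** 2 * (var_lam / lam ** 2 + ratio) / (1.0 + ratio) ** 2
    return cmf, math.sqrt(var_cmf)

# One illustrative site (made-up numbers, not from the paper)
cmf, se = eb_cmf([{"mu_b": 2.0, "mu_a": 2.0, "y_b": 4, "y_a": 2, "alpha": 2.0}])
print(round(cmf, 4), round(se, 4))
```

Note that the function never receives any information about how the treatment effect varies from site to site; only totals and their variances enter the final two formulas.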
2.2 Assessing accuracy of CMF standard errors: artificial realistic data approach
To examine the accuracy of estimated CMF standard error, it is desired to compare the estimate of standard error with a true value. Unfortunately, the exact CMF and its standard error of a treatment are unknown in practice, which makes it nearly impossible to assess the performance using observed real-world crash data. To address this issue, this study first used artificial realistic data (ARD), also known as simulated data. The ARD concept was first proposed by HAUER [32], and has been extensively used by safety analysts [4, 11]. In the ARD structure, a CMF and standard error are pre-assumed for a treatment, and crash counts at the treatment sites are then randomly generated based on the theoretical Poisson mean. Using the generated crash counts, researchers then apply the EB approach (discussed in the previous section) and estimate the CMF. The true values will be used to assess the quality of CMF, as well as estimated standard error. The steps of the ARD experiment used in this study are described below.
Step 1: Determine basic SPF.
As discussed in the previous section, an SPF is needed to calculate the predicted number of crashes at a site. With the ARD, the SPF from the HSM [1] (on rural two-lane highways) is adopted (Eq. (18)).
μi = AADTi × Li × 365 × 10^(−6) × exp(−0.312) (18)
where AADTi is traffic volume (i.e., annual average daily traffic, vehicles per day) on roadway site i; and Li is segment length (miles).
Step 2: Assume CMF and standard error value.
Assume a CMF and standard error for a treatment. This study will assume a CMF for the installation of rumble strips as CMFtrue, with a standard error setrue.
Step 3: Calculate Poisson mean values.
For every site, calculate the true long-term crash means in the before and after periods, separately, with the SPF and assumed CMF using Eqs. (19) and (20):
μi,bef = SPF(AADTi,bef, Li) (19)
μi,aft = SPF(AADTi,aft, Li) × CMFi (20)
where SPF(·) is the basic SPF in Eq. (18); μi,bef is true long-term crash mean in the before period at site i; μi,aft is true long-term crash mean in the after period at site i; AADTi,bef and AADTi,aft are traffic volumes in the two periods (before and after), respectively; and CMFi is the CMF for the treatment (i.e., rumble strip) at site i.
(21)
Note that the CMF is assumed to vary across sites with mean equal to CMFtrue and standard deviation equal to setrue. When setrue is 0, the CMF is a fixed value. In other words, the treatment has exactly the same safety effect at all sites, which is usually not true.
The true long-term crash means are the numbers of crashes that should occur at the site before and after the installation of the countermeasure, respectively. They are used to generate random crash counts.
Step 4: Generate discrete counts.
Generate random counts Yi,bef and Yi,aft, given that the means (i.e., μi,bef and μi,aft) for site i are gamma distributed with the dispersion parameter (or size factor) α and mean equal to 1 [33]. This study implemented the varying dispersion parameter function, αi=Li/0.236, consistent with the HSM [1].
Step 5: Estimate CMF and standard error.
Estimate the CMF for the treatment and standard error using steps of the EB before-after approach that has been introduced in the previous section (i.e., Eqs. (1)-(17)).
Step 6: Evaluate the values derived from the EB method.
Compare the CMF standard error estimated in Step 5 with the pre-assumed true value (i.e., setrue in Step 2). This study uses error percentage to assess the accuracy of the estimation. The calculation is shown in Eq. (22). Smaller error percentage indicates higher accuracy of the standard error for the CMF derived from the EB method.
error = |seEB − setrue| / setrue × 100% (22)
where seEB is estimated standard error of the CMF developed with the EB method; error is error percentage of estimated CMF standard error; setrue is pre-assumed true value of the standard error of the CMF.
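The data-generation part of the ARD experiment (Steps 1 to 4) can be sketched as follows. This is an illustrative sketch, not the authors' code, and it rests on several assumptions that the text does not confirm: site-level CMFs are drawn from a normal distribution (truncated at zero), one gamma multiplier per site is shared by both periods, the SPF output is treated as crashes per year and scaled by a hypothetical `years` argument, and segment length is in miles. The names `hsm_spf`, `poisson_draw`, and `generate_ard` are made up for this example.

```python
import math
import random

def hsm_spf(aadt, length_mi):
    """Basic rural two-lane SPF: AADT x L x 365 x 10^-6 x exp(-0.312)."""
    return aadt * length_mi * 365e-6 * math.exp(-0.312)

def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def generate_ard(sites, cmf_true, se_true, years=1.0, seed=0):
    """sites: iterable of (aadt_bef, aadt_aft, length_mi) tuples."""
    rng = random.Random(seed)
    rows = []
    for aadt_bef, aadt_aft, length_mi in sites:
        # Site-specific CMF; the normal distribution is an assumption
        cmf_i = max(0.0, rng.gauss(cmf_true, se_true))
        mu_bef = years * hsm_spf(aadt_bef, length_mi)
        mu_aft = years * hsm_spf(aadt_aft, length_mi) * cmf_i
        # Varying size parameter, alpha_i = L_i / 0.236
        alpha = length_mi / 0.236
        # Gamma multiplier with mean 1 (shape alpha, scale 1/alpha)
        g = rng.gammavariate(alpha, 1.0 / alpha)
        rows.append({
            "cmf_i": cmf_i,
            "mu_bef": mu_bef,
            "mu_aft": mu_aft,
            "y_bef": poisson_draw(rng, mu_bef * g),
            "y_aft": poisson_draw(rng, mu_aft * g),
        })
    return rows

# Two made-up segments, Texas CMF of 0.50 with se_true = 0.10
for row in generate_ard([(1000.0, 1000.0, 1.0), (2000.0, 2200.0, 2.0)], 0.50, 0.10):
    print(row)
```

Setting `se_true=0.0` reproduces the fixed-CMF case, in which every site receives exactly `cmf_true`.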
2.3 Assessing accuracy of CMF standard errors: real-world data approach
To further validate the examination results with the ARD data, this study also analyzes real crash data. As mentioned, the true safety effect of a countermeasure is unknown in practice, which makes it impossible to assess the accuracy of a CMF. To overcome this problem, this study uses a dummy treatment. That is, no actual treatments were implemented at the “treatment” sites. The authors assume that a dummy countermeasure has been installed at all of these sites. Since no significant safety changes have happened, the CMF for the dummy treatment is technically 1.0. With the traffic data and observed crash counts at the “treatment” sites as well as similar reference sites, this study estimates a CMF and its standard error with the EB method.
In addition, recent studies have reported temporal instability in roadway safety [34]. If the instability of crashes is not properly accounted for, the modeling results and CMF estimates can be biased. To account for the yearly variation problem, the selection of reference and “treatment” sites has been balanced across the two periods. Thus, the reference sites contain an equal number of sites in the before and after periods.
To summarize, the evaluation experiments have the following two scenarios:
Scenario I: ARD Data;
Scenario II: Real-world Data.
Details about the data manipulation are discussed in the next section.
3 Data preparation
The data preparation includes two subsections: one for the ARD data preparation, and the other for the real-world data collection.
3.1 ARD data
To prepare the ARD dataset, the authors selected 100 rural two-lane roadway segments from the Texas Department of Transportation (TxDOT) Roadway-Highway Inventory Network Offload (RHiNO) database. The HSM basic SPF was applied to generate the random crash counts in the before period. Two CMFs for rumble strips were utilized to generate the after period crash counts. One is from the HSM, with a value of 0.84 and a standard error of 0.13 (referred to as the HSM CMF hereafter). The other is from the TxDOT’s Highway Safety Improvement Program Work Codes Table, with a value of 0.50 (referred to as the Texas CMF hereafter). Four standard error values are assumed for the Texas CMF: 0.01, 0.05, 0.10, and 0.20. Summary statistics of the segment data and generated crash counts are presented in Table 1.
3.2 Real-world data
To further validate the evaluation results, this paper also analyzed a real-world dataset. This study selected 10000 segments on rural two-lane highways from the roadway network as reference sites, with 400 segments as treatment sites. There are more reference sites than treatment sites because in practice it is usually easier to find comparison sites with roadway characteristics similar to those of the treatment sites. Using a greater number of reference sites improves the stability and performance of SPFs. However, there are generally not as many treatment sites as reference sites due to limited safety improvement funding, especially if the treatment is costly. Several filters were used to make the segments more homogeneous, including pavement width, shoulder width and type, and horizontal curve. The roadway and traffic information were extracted from the RHiNO database, and six years of crash counts (2013-2018) were gathered from the Crash Records Information System (CRIS). The TxDOT’s Daily Work Report (DWR) was checked to make sure that no treatments had been installed on these segments.
Table 1 Summary statistics for ARD data (100 sites)
A dummy treatment was assumed to be implemented at 400 “treatment” segments on January 1st, 2016. Thus, the before period was from January 2013 to December 2015, and the after period was from January 2016 to December 2018. To further eliminate the yearly variation of crashes, half of the “treatment” sites were randomly selected and their two periods were switched. Since no actual treatments were implemented, there was no significant change in safety at the “treatment” sites and the safety levels of these sites remained similar. This supports the previous argument that the CMF is technically 1.0. Summary statistics of the reference sites and “treatment” sites are shown in Table 2.
It is worth mentioning that the ARD data were generated based on SPFs in the HSM for rural two-lane roadways. During this process, other factors affecting crash occurrences were assumed to be the same among all sites. The ARD crash count data were less dispersed than the real-world data. The former has an index of dispersion of 3.37 (i.e., 1.75²/0.91, before period), while the latter has an index of dispersion of 4.74 (i.e., 2.83²/1.69, before period). This is probably due to the unobserved heterogeneity in the real-world data.
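The two index-of-dispersion values can be checked directly as variance divided by mean. The short sketch below assumes that the quoted figures are the before-period standard deviation and mean of the crash counts; the function name is hypothetical.

```python
def index_of_dispersion(mean, sd):
    """Variance-to-mean ratio; values above 1 indicate over-dispersion."""
    return sd ** 2 / mean

ard = index_of_dispersion(mean=0.91, sd=1.75)    # ARD, before period
real = index_of_dispersion(mean=1.69, sd=2.83)   # real-world data, before period
print(round(ard, 2), round(real, 2))  # 3.37 4.74
```

Both ratios are well above 1, which is consistent with the earlier remark that a plain Poisson model is not recommended for these counts.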
4 Results
4.1 Scenario I: ARD data
In Scenario I, the true CMF for installing rumble strips was first assumed to be 0.84 with a standard error of 0.13. These values were adopted from the HSM and CMF Clearinghouse (CMF Clearinghouse, ID=115). The authors applied the EB process for developing CMFs. Table 3 documents the results. As can be seen, there are 81 crashes during the after period at the 100 “treated” sites. The estimates show that, had the treatment not been implemented, the expected crash number (i.e., EB estimate) would have been 101.5 in the after period. The CMF for the “installation” of rumble strips is estimated as 0.80. Its standard error is 0.10. Thus, the confidence interval of the CMF is from 0.604 to 0.996 (at the 95% level), which is statistically significant (i.e., 1.0 is not covered in the interval). The estimated CMF is close to the true value (i.e., 0.84). However, the estimated CMF standard error is lower than its true value (i.e., 0.13). The error percentage (using Eq. (22)) is 23.1% (see the first row of Table 4).
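The interval quoted above is consistent with a normal-approximation (Wald) interval, CMF ± 1.96 × s.e. The sketch below assumes that form; the function name `wald_ci` is hypothetical.

```python
def wald_ci(cmf, se, z=1.96):
    """Normal-approximation confidence interval for a CMF estimate."""
    return cmf - z * se, cmf + z * se

lo, hi = wald_ci(0.80, 0.10)
print(f"{lo:.3f} to {hi:.3f}")       # 0.604 to 0.996
print("covers 1.0:", lo <= 1.0 <= hi)  # covers 1.0: False
```

Because the upper bound sits only just below 1.0, a modest underestimate of the standard error is enough to change the significance conclusion.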
The authors followed the same procedure and generated crash counts using the Texas CMF (i.e., 0.50) with four pre-assumed standard error values: 0.01, 0.05, 0.10 and 0.20. The evaluation results are presented in Table 4. In each case, the estimated CMF value is quite close to the true value. However, the estimated standard error (see column s.e.(CMF)) does not reflect the true value in any of the cases. The minimum error percentage is above 20%. The maximum error percentage reaches 663.0% (i.e., case TX1). In this case, the assumed standard error for the CMF is 0.01, which is an extremely small value compared to the others. Since the error percentage is defined as the ratio between the difference and the true value, a higher error percentage is observed when the true value is relatively small, i.e., (0.0763–0.01)/0.01=6.63 or 663%.
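The two error percentages quoted in this section can be reproduced with a one-line function. The sketch assumes the error percentage is the absolute difference between the estimated and true standard errors divided by the true value, which matches both worked figures; the function name is hypothetical.

```python
def error_percentage(se_eb, se_true):
    """Absolute relative error of the estimated standard error, in percent."""
    return abs(se_eb - se_true) / se_true * 100.0

print(round(error_percentage(0.0763, 0.01), 1))  # 663.0 (case TX1)
print(round(error_percentage(0.10, 0.13), 1))    # 23.1 (HSM CMF case)
```

The TX1 figure is dominated by the small denominator: the absolute gap there (0.0663) is of the same order as the estimated standard error itself.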
Table 2 Summary statistics of segment data (real-world data)
Table 3 Estimating results of ARD data (HSM CMF for rumble strip=0.84, SE=0.13)
Another interesting finding is that the standard error of the CMF (see column s.e.(CMF) in Table 4) is quite stable regardless of the pre-assumed standard error. In the cases with the Texas CMF, the standard errors of the CMFs are always around 0.075. The before-after study with the EB method for evaluating countermeasure safety effects is not sensitive to the CMF standard error. Further analysis of the EB process indicates that this method does not account for the variance of the CMF among different sites. In the step of calculating the variance of the CMF (i.e., Eq. (16)), all four terms (the total observed and total expected crash counts in the after period, and their variances) are independent of the CMF standard error. This explains why the estimated CMF standard error values remain at the same level in the ARD data. In short, the CMF standard error estimated with the EB before-after studies does not reflect the true value in the ARD experiment.
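One way to see where the between-site variation could go missing is the law of total variance. The derivation below is an illustrative sketch, not part of the original analysis. It assumes that, conditional on the site-specific CMF, the after-period count L_i is Poisson with mean \mu_i \mathrm{CMF}_i, where \mu_i is the (fixed) after-period mean without treatment, and that \mathrm{CMF}_i has mean \mathrm{CMF}_{true} and standard deviation se_{true}.

```latex
\begin{aligned}
\operatorname{Var}(L_i)
  &= \mathbb{E}\big[\operatorname{Var}(L_i \mid \mathrm{CMF}_i)\big]
   + \operatorname{Var}\big(\mathbb{E}[L_i \mid \mathrm{CMF}_i]\big) \\
  &= \mu_i\,\mathrm{CMF}_{true} + \mu_i^{2}\, se_{true}^{2}
\end{aligned}
```

Under these assumptions, treating the variance of each observed count as its Poisson value keeps only the first term. The second term, which grows with se_{true}, has no counterpart in the EB variance calculation.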
4.2 Scenario II: Real-world segment with dummy treatment
While the previous section documents the evaluation results of the ARD data, this section discusses the results of the real-world dataset. Recall that for the real crash data, this work assumes a dummy treatment. Since there are no significant changes between the before and after periods, the CMF for the dummy treatment is technically 1.0. Hence, the estimated CMF confidence interval should cover 1.0.
With real-world data, an SPF for the reference sites needs to be developed. This study has utilized the NB model and assumed a varying form of dispersion for roadway segments, which is consistent with the HSM. The modeling results of the SPF are presented in Table 5. All the parameters (i.e., intercept, traffic volume, and parameter for dispersion) are statistically significant at the 99.9% level.
It is worth mentioning that this study considered traffic volume only in the crash prediction model (i.e., SPF results shown in Table 5). There are two advantages of flow-only models. First, this makes it consistent with the analyses using ARD data (i.e., both have traffic volume only). Second, flow-only models are relatively easier to be calibrated (e.g., less data collection efforts, and fewer CMFs needed for calibration) in practice. Nevertheless, the flow-only model may suffer from omitted variable bias.
The number of observed crashes, the expected number of crashes, and their variances at sample “treatment” sites in the after period are tabulated in Table 6. At the 400 “treatment” sites, the expected number (i.e., EB estimate) of crashes in the after period had the “treatment” not been implemented is 669.2. The actual number of observed crashes in the after period is 792. The CMF for the dummy treatment is calculated as 1.18 (using Eq. (15)), and the standard error is 0.057. The 95% confidence interval is from 1.07 to 1.29, which does not cover the theoretical value 1.0.
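As a quick arithmetic check of these figures, the ratio of observed to expected crashes and a normal-approximation interval can be written out as follows. This assumes the interval has the form CMF ± 1.96 × s.e. and ignores the small correction term in the CMF estimator, which does not change the value at two decimal places here.

```latex
\frac{792}{669.2} \approx 1.18,
\qquad
1.18 \pm 1.96 \times 0.057 \;\approx\; (1.07,\ 1.29)
```

For the interval to reach 1.0, the standard error would need to be at least about 0.18/1.96 ≈ 0.092, roughly 1.6 times the estimated 0.057.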
Table 4 Evaluation results of ARD data (HSM CMF and Texas CMF)
Table 5 Modeling results of SPFs (real-world data)
Table 6 CMF estimating results of real-world data
5 Discussion and conclusions
Although various methods have been proposed for estimating CMFs for countermeasures or treatments, the before-after study with the EB method has been recognized as a robust approach and is preferred when data are available. The expected crash number in the EB method combines two sources: the observed and the predicted number of crashes. The SPF is developed with reference sites (or comparison sites) that have geometric characteristics similar to those of the treatment sites. The primary advantages of the EB before-after study are 1) accounting for the RTM bias, and 2) improving the estimation accuracy. Previous studies have reported that the EB method is capable of addressing the RTM bias [35], but the accuracy of the CMFs estimated using the EB approach has not been fully examined.
This study has examined the accuracy of CMFs derived using the traditional EB before-after study, focusing in particular on CMF standard errors. The authors first generated a simulated dataset (i.e., ARD) with pre-assumed CMFs and associated standard errors. The CMFs were estimated using the traditional EB method, and the results were then compared with the pre-assumed true values. To further validate the evaluation, the authors collected traffic data along with six years of crash records on two-lane rural highways. Reference sites and “treatment” sites were carefully selected, and the periods were balanced for both groups to eliminate the yearly variation in crash data. A dummy treatment was assumed to be installed at the “treatment” sites. Since no actual treatments were implemented, the theoretical CMF for the dummy treatment is 1.0. Following the procedures of the EB before-after study, this study developed a basic SPF for the reference sites and estimated the CMF for the dummy treatment. The 95% confidence interval of the CMF was compared with 1.0.
The main conclusions can be summarized as follows: 1) With the simulated data, the CMFs derived with the EB before-after method are nearly the same as the pre-assumed true values. 2) The estimated standard errors of the CMFs do not reflect the true values. The estimate remains at the same level regardless of the pre-assumed standard error of the CMF. The EB before-after study is not sensitive to the variation of the CMF across sites. 3) The analyses of real-world traffic and crash data with a dummy treatment confirmed the results obtained with the ARD data. The EB before-after method tends to underestimate the standard errors of the CMFs. Preliminary examination of the EB before-after approach indicates that the algorithm for estimating the CMF variance does not depend on the variation of the CMFs among sites. According to these findings, researchers might have been overconfident about the estimated safety effectiveness of treatments when using EB before-after studies. The problem is most severe when a treatment has significantly varying effects at different sites.
It is worth mentioning that the quality of CMFs derived from an EB before-after study depends on various factors: the selection of reference sites, sample size, model selection for the SPFs, etc. To prevent low sample size bias and to simplify the analyses, this study has used a large number of sites for both reference and treatment groups. In practice, it may not be possible to find hundreds of sites that are treated with a single countermeasure. In addition, this study developed a basic SPF for reference sites. Although these sites were selected with multiple filters, heterogeneity still exists in the data [36, 37]. This might also affect the estimated CMF and its standard error. Nevertheless, the simulation experiment, in which unobserved heterogeneity is not an issue, has shown the biased estimate of the standard error of CMFs in the EB before-after study. It is necessary to revisit the algorithm for estimating the CMF variance in before-after studies with the EB method in the future.
References
[1] American Association of State Highway and Transportation Officials (AASHTO). Highway safety manual [M]. Washington DC, 2010.
[2] PIARC. Roadway safety manual [M]. World Road Association, 2003.
[3] GROSS F, PERSAUD B N, LYON C A. Guide to developing quality crash modification factors [R]. Washington DC: Federal Highway Administration, 2010.
[4] WU L T. Examining the use of regression models for developing crash modification factors [D]. Texas A&M University, 2016.
[5] GOOCH J P, GAYAH V V, DONNELL E T. Quantifying the safety effects of horizontal curves on two-way, two-lane rural roads [J]. Accident Analysis & Prevention, 2016, 92: 71-81. DOI: 10.1016/j.aap.2016.03.024.
[6] WOOD J S, GOOCH J P, DONNELL E T. Estimating the safety effects of lane widths on urban streets in Nebraska using the propensity scores-potential outcomes framework [J]. Accident Analysis & Prevention, 2015, 82: 180-191. DOI: 10.1016/j.aap.2015.06.002.
[7] SASIDHARAN L, DONNELL E T. Propensity scores-potential outcomes framework to incorporate severity probabilities in the highway safety manual crash prediction algorithm [J]. Accident Analysis & Prevention, 2014, 71: 183-193. DOI: 10.1016/j.aap.2014.05.017.
[8] HAUER E. Observational before-after studies in road safety: Estimating the effect of highway and traffic engineering measures on road safety [M]. Tarrytown, N.Y., USA: Pergamon, 1997.
[9] SHEN J, GAN A. Development of crash reduction factors: Methods, problems, and research needs [J]. Transportation Research Record, 2003, 1840(1): 50-56. DOI: 10.3141/1840-06.
[10] NOLAND R B, ADEDIJI Y. Are estimates of crash modification factors mis-specified? [J]. Accident Analysis & Prevention, 2018, 118: 29-37. DOI: 10.1016/j.aap.2018.05.017.
[11] WU L T, LORD D, ZOU Y J. Validation of crash modification factors derived from cross-sectional studies with regression models [J]. Transportation Research Record, 2015, 2514(1): 88-96. DOI: 10.3141/2514-10.
[12] WU L T, LORD D. Examining the influence of link function misspecification in conventional regression models for developing crash modification factors [J]. Accident Analysis & Prevention, 2017, 102: 123-135. DOI: 10.1016/j.aap.2017.02.012.
[13] WU L T, LORD D, GEEDIPALLY S R. Developing crash modification factors for horizontal curves on rural two-lane undivided highways using a cross-sectional study [J]. Transportation Research Record, 2017, 2636(1): 53-61. DOI: 10.3141/2636-07.
[14] HAUER E. The art of regression modeling in road safety [M]. New York: Springer, 2015.
[15] HAUER E. Even perfect regressions may not tell the effect of interventions [C]// The 92nd Annual Meeting of Transportation Research Board. Washington DC, 2013.
[16] HAUER E. Cause, effect and regression in road safety: A case study [J]. Accident Analysis & Prevention, 2010, 42(4): 1128-1135. DOI: 10.1016/j.aap.2009.12.027.
[17] FHWA. CMF clearinghouse brochure [EB/OL]. [2019-01-20]. http://www.cmfclearinghouse.org/collateral/CMF_brochure.pdf.
[18] MONTELLA A. Safety evaluation of curve delineation improvements: Empirical Bayes observational before-and-after study [J]. Transportation Research Record, 2009, 2103(1): 69-79. DOI: 10.3141/2103-09.
[19] WU L T, MENG Y, KONG X Q, ZOU Y J. A novel approach for estimating crash modification factors: Jointly modeling crash counts and time intervals between crashes [C]// Transportation Research Board 98th Annual Meeting, Washington DC, 2019.
[20] LORD D, GEEDIPALLY S R. Safety effects of the red-light camera enforcement program in Chicago, Illinois [R]. Lord Consulting, College Station, Texas, 2014.
[21] WU L T, GEEDIPALLY S R, PIKE A M. Safety evaluation of alternative audible lane departure warning treatments in reducing traffic crashes: an empirical Bayes observational before–after study [J]. Transportation Research Record, 2018, 2672(21): 30-40. DOI: 10.1177/0361198118776481.
[22] LORD D, MANNERING F. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives [J]. Transportation Research Part A: Policy and Practice, 2010, 44(5): 291-305. DOI: 10.1016/j.tra.2010.02.001.
[23] WU L T, ZOU Y J, LORD D. Comparison of Sichel and negative binomial models in hot spot identification [J]. Transportation Research Record, 2014, 2460(1): 107-116. DOI: 10.3141/2460-12.
[24] ZOU Y J, LORD D, ZHANG Y L, PENG Y C. Comparison of Sichel and negative binomial models in estimating empirical Bayes estimates [J]. Transportation Research Record, 2013, 2392(1): 11-21. DOI: 10.3141/2392-02.
[25] LORD D, GEEDIPALLY S R, GUIKEMA S D. Extension of the application of Conway-Maxwell-Poisson models: Analyzing traffic crash data exhibiting underdispersion [J]. Risk Analysis: An International Journal, 2010, 30(8): 1268-1276. DOI: 10.1111/j.1539-6924.2010.01417.x.
[26] LORD D, GUIKEMA S D, GEEDIPALLY S R. Application of the Conway–Maxwell–Poisson generalized linear model for analyzing motor vehicle crashes [J]. Accident Analysis & Prevention, 2008, 40(3): 1123-1134. DOI: 10.1016/j.aap.2007.12.003.
[27] ZOU Y J, ASH J E, PARK B J, LORD D, WU L T. Empirical Bayes estimates of finite mixture of negative binomial regression models and its application to highway safety [J]. Journal of Applied Statistics, 2018, 45(9): 1652-1669. DOI: 10.1080/02664763.2017.1389863.
[28] PARK B J, LORD D, WU L T. Finite mixture modeling approach for developing crash modification factors in highway safety analysis [J]. Accident Analysis & Prevention, 2016, 97: 274-287. DOI: 10.1016/j.aap.2016.10.023.
[29] ZOU Y J, HENRICKSON K, WU L T, WANG Y H, ZHANG Z R. Application of the empirical Bayes method with the finite mixture model for identifying accident-prone spots [J]. Mathematical Problems in Engineering, 2015, 2015(10): 1-10. DOI: 10.1155/2015/958206.
[30] PARK B J, LORD D, LEE C. Finite mixture modeling for vehicle crash data with application to hotspot identification [J]. Accident Analysis & Prevention, 2014, 71: 319-326. DOI: 10.1016/j.aap.2014.05.030.
[31] YANG X X, ZOU Y J, WU L T, ZHONG X Z, WANG Y H, IJAZ M, PENG Y C. Comparative analysis of the reported animal-vehicle collisions data and carcass removal data for hotspot identification [J]. Journal of Advanced Transportation, 2019, 2019(3): 1-13. DOI: 10.1155/2019/3521793.
[32] HAUER E. Trustworthiness of safety performance functions [C]// The 93rd Annual Meeting of the Transportation Research Board. Washington DC, 2014.
[33] LORD D. Modeling motor vehicle crashes using Poisson-gamma models: Examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter [J]. Accident Analysis & Prevention, 2006, 38(4): 751-766. DOI: 10.1016/j.aap.2006.02.001.
[34] MANNERING F. Temporal instability and the analysis of highway accident data [J]. Analytic Methods in Accident Research, 2018, 17: 1-13. DOI: 10.1016/j.amar.2017.10.002.
[35] HAUER E, HARWOOD D W, COUNCIL F M, GRIFFITH M S. Estimating safety by the empirical Bayes method: A tutorial [J]. Transportation Research Record, 2002, 1784(1): 126-131. DOI: 10.3141/1784-16.
[36] YANG X, ZOU Y, TANG J, LIANG J, IJAZ M. Evaluation of short-term freeway speed prediction based on periodic analysis using statistical models and machine learning models [J]. Journal of Advanced Transportation, 2020. DOI: 10.1155/2020/9628957.
[37] DAS S, MINJARES-KYLE L, WU L, HENK R. Understanding crash potential associated with teen driving: Survey analysis using multivariate graphical method [J]. Journal of Safety Research, 2019, 70: 213-222. DOI: 10.1016/j.jsr.2019.07.009.
(Edited by YANG Hua)
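Chinese summary
Accuracy analysis of crash modification factors estimated by the empirical Bayes before-after method
Abstract: Estimating crash modification factors (the safety effects of treatments) is an important part of traffic safety management, and the empirical Bayes (EB) method is currently the most advanced and preferred approach. The method addresses regression to the mean and improves estimation accuracy. However, no study has yet analyzed in detail the accuracy of the crash modification factors obtained with the EB method. This paper aims to fill that gap, focusing on the standard error of the crash modification factor. Both simulated data and real observed data were used to analyze the EB method. The results show that: 1) the crash modification factors obtained with the EB method are close to the theoretical values; 2) the standard errors of the crash modification factors obtained with the EB method do not reflect the true values, and the estimates do not change as the true values change; 3) the analysis based on real data shows that the EB method tends to underestimate the standard error of the crash modification factor. Traffic safety researchers should note that the variance of the crash modification factor may be biased when the EB method is used to evaluate safety effectiveness, and the variance algorithm in the EB method needs further refinement.
Key words: traffic safety; empirical Bayes; crash modification factor; safety effectiveness evaluation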
Foundation item: Project(51978082) supported by the National Natural Science Foundation of China; Project(19B022) supported by the Outstanding Youth Foundation of Hunan Education Department, China; Project(2019QJCZ056) supported by the Young Teacher Development Foundation of Changsha University of Science & Technology, China
Received date: 2020-02-21; Accepted date: 2020-07-10
Corresponding author: WU Ling-tao, PhD, Assistant Research Scientist; Tel: +1-979-317-2530; E-mail: wulingtao@gmail.com; ORCID: https://orcid.org/0000-0003-2337-7145