Real-time rear-end crash potential prediction on freeways
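Source journal: Journal of Central South University (English Edition), 2017, No. 11
Authors: QU Xu (曲栩), WANG Wei (王炜), WANG Wen-fu (王文夫), LIU Pan (刘攀)
Pages: 2664-2673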
Key words:freeway rear-end crash; crash potential prediction; crash precursors; case control strategy; support vector machine
Abstract: This study develops new real-time freeway rear-end crash potential predictors using the support vector machine (SVM) technique. The relationship between rear-end crash occurrences and traffic conditions was explored using historical loop detector data from Interstate-894 in Milwaukee, Wisconsin, USA. The extracted loop detector data were aggregated over different stations and time intervals to produce explanatory features. A feature selection process that accounts for the interaction between SVM classifiers and explanatory features was adopted to identify the features that significantly influence rear-end crashes. The significant explanatory features identified at three separate time levels were then used to train three SVM models. Finally, multi-layer perceptron (MLP) artificial neural network models were used as benchmarks to evaluate the performance of the SVM models. The results show that the proposed feature selection procedure greatly enhances the accuracy and generalization capability of the SVM models. Moreover, the optimal SVM classifier achieves an overall prediction precision rate of 81.1%. Compared with the MLP artificial neural networks, the SVM models provide better crash prediction accuracy and a lower false positive rate, which confirms the superior performance of the SVM technique in rear-end crash potential prediction.
Cite this article as: QU Xu, WANG Wei, WANG Wen-fu, LIU Pan. Real-time rear-end crash potential prediction on freeways [J]. Journal of Central South University, 2017, 24(11): 2664–2673. DOI: https://doi.org/10.1007/s11771-017-3679-2.
J. Cent. South Univ. (2017) 24: 2664-2673
DOI: https://doi.org/10.1007/s11771-017-3679-2
QU Xu(曲栩)1, WANG Wei(王炜)1, WANG Wen-fu(王文夫)2, LIU Pan(刘攀)1
1. School of Transportation, Southeast University, Nanjing 210096, China;
2. Civil Engineering Department, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
Central South University Press and Springer-Verlag GmbH Germany, part of Springer Nature 2017
1 Introduction
In the past decade, real-time crash potential prediction (RCPP) has attracted the interest of many researchers. Similar to the automatic incident detection (AID) problem, RCPP is essentially a pattern classification problem: it separates traffic and environmental patterns with high crash potential from those with low crash potential. MADANAT and LIU [1] introduced the concept of real-time prediction of freeway incident likelihood, which laid the foundation for proactive traffic control systems. They formulated incident likelihood prediction models for two critical types of freeway incidents, namely crashes and overheating vehicles, using a binary logit method with real-time measurements of traffic and weather variables. OH et al [2] quantified crash likelihood by applying a non-parametric Bayesian classification approach. Using a t-statistic, they identified the standard deviation of speed in the 5 min preceding a crash as a significant indicator of crash occurrence. Though limited in sample size, their research indicated the potential of real-time traffic data for reducing crash likelihood. LEE et al [3] attempted to relate crashes to various traffic flow characteristics prior to the crash (referred to as “crash precursors”) via log-linear models. Later, LEE et al [4] refined their model based on experimental results: they added a critical precursor representing queue formation and dissipation to the log-linear model, and eliminated the speed difference across adjacent lanes because of its insignificant impact on crash potential. PANDE and ABDEL-ATY [5] were the first to formulate crash prediction as a classification problem, which they solved with a probabilistic neural network (PNN). In their study, approximately 70% of the crashes in the evaluation dataset could be identified.
However, their PNN classifier, which rests on a strong statistical basis rather than a training mechanism, is not a typical artificial intelligence classifier. In summary, some of the above models achieved reasonable classification accuracy, but their selection of explanatory variables was based mainly on subjective judgment or filter feature selection techniques. The major disadvantage of these models is that they focus only on the intrinsic properties of the data and ignore the interaction between the explanatory variables and the classifier.
ABDEL-ATY et al [6] developed a case-control logistic regression model to compare crash and corresponding non-crash traffic flow features, so that external factors such as road geometry and time of day were controlled for. ABDEL-ATY et al [7] later refined this work with an expanded database and, using the same analysis methods, proposed separate models for crashes under high-speed and low-speed conditions. Yet even with such a split system, it is still difficult to account for all types of crashes, given the diverse traffic characteristics associated with each crash type. In this regard, GOLOB and RECKER [8, 9] employed nonlinear (nonparametric) canonical correlation analysis, which shed light on the relationships between traffic flow conditions and different types of crashes. Their research also indicated that some collision types are more common under certain traffic conditions, suggesting that later research on crash potential prediction should be crash-type specific where possible. XU et al [10] used the random forest (RF) technique to select the variables affecting crash risk under uncongested and congested traffic conditions, and then fitted a genetic programming (GP) model for each traffic state based on the RF results. SUN and SUN [11] proposed a dynamic Bayesian network (DBN) model of time-series traffic data to investigate the relationship between crash occurrence and dynamic speed conditions. The traffic conditions near the crash site were classified into several combinations according to the level of congestion and included in the DBN model.
Building on these studies, PANDE and ABDEL-ATY [12, 13] introduced a data mining process into RCPP modeling. They employed artificial intelligence classifiers to relate freeway traffic parameters to crashes (rear-end and lane-change-related crashes). In their rear-end crash potential prediction study, they used Kohonen vector quantization (KVQ) to cluster the crash data into two groups (a low-speed traffic regime and a high-speed traffic regime) with average speed parameters as inputs. A reliable variable selection algorithm based on classification trees was introduced to identify significant parameters, which then served as inputs to multi-layer perceptron (MLP) neural network models and normalized radial basis function (NRBF) neural network models. Classification models for rear-end crashes under the high-speed traffic regime were successfully formulated using these techniques (MLP and NRBF). A similar process was used in their lane-change-related crash potential prediction study, in which a voting-based hybrid model combining MLP and NRBF neural networks was proposed. The hybrid model outperformed the individual models (the MLP model and the NRBF model) in terms of classification accuracy on the validation dataset.
The support vector machine (SVM) is a relatively new pattern classifier based on the concept of structural risk minimization within machine learning theory [14]. In recent years, SVM has gained popularity for its remarkable generalization performance and has been introduced into traffic engineering. FANG and CHEU [15] employed SVM for freeway and arterial incident detection and showed that SVM was a superior pattern classifier for the AID problem; their study is believed to be the first application of SVM in traffic engineering. ZHANG and XIE [16] introduced the ν-support vector machine (ν-SVM) model to forecast short-term freeway volume. These studies also demonstrated that SVM models provide better (or at least similar) predictions compared with traditional artificial neural network models. LI et al [17] evaluated SVM models for predicting highway crash frequencies. Their results showed that SVM models outperformed the traditional negative binomial (NB) models in terms of effectiveness and accuracy; in addition, the SVM models did not over-fit the data and offered performance better than or comparable to that of back-propagation neural network (BPNN) models. LI et al [18] verified that SVM models can be used for crash injury severity analysis. Their results indicated that the SVM model provides better prediction accuracy than the ordered probit (OP) model for injury severities with small proportions, and that it can also support sensitivity analysis of the impacts of explanatory variables on crash injury severity. However, previous studies have rarely applied SVM to crash potential prediction.
The primary objective of this work is to develop new real-time rear-end crash identifiers based on SVM and to evaluate the performance of SVM classification in rear-end crash prediction. The secondary objective is to evaluate the impact of the feature selection procedure on the performance of SVM models.
The following section describes the extraction of historical loop detector data and the aggregation of traffic features. The third section presents the theoretical background of SVM and its operating procedure. The fourth section introduces the modeling process and the comparison results. The fifth section describes a simulation experiment that demonstrates the potential for real-life application. The last section summarizes the key results and provides recommendations for further study.
2 Data and features
2.1 Study area description
The Milwaukee-area Interstate-894 (I-894) under consideration is a 9.3-mile auxiliary freeway with a total of 20 mainline loop detector stations in each direction. Consecutive stations are spaced at approximately 0.5-mile intervals, and the loop detectors at these stations collect average speed, volume, and occupancy on each of the three through lanes (left, middle, and right). All reported crashes on I-894 from 1994 to the present are archived in the Wisconsin MV4000 crash database. Based on the records of the Wisconsin MV4000 crash database, loop detector data for crash cases and the corresponding non-crash data were obtained from the WisTransPortal V-SPOC Application Suite, which was developed by the Wisconsin Traffic Operations and Safety Laboratory at the University of Wisconsin-Madison, USA.
2.2 Data collection
Because this research focuses on the relationship between traffic flow characteristics and crash occurrence, crashes associated with alcohol, drug use, and work zones were removed from the original dataset. In addition, crashes that occurred within half an hour after another crash at the same location were excluded, because they can be classified as secondary crashes, which are not the subject of this research. After this initial screening, 708 rear-end crashes reported in the study area from January 2006 to December 2009 were selected as the data source of this research.
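The secondary-crash screening rule can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the record layout `(crash_id, location_key, crash_time)` and the use of a single `location_key` (for example, the nearest detector station) to decide "same location" are assumptions, and a crash exactly 30 min after another is treated as secondary here.

```python
from datetime import datetime, timedelta

# Assumed record layout: (crash_id, location_key, crash_time).
SECONDARY_WINDOW = timedelta(minutes=30)


def drop_secondary_crashes(crashes):
    """Keep crashes that do not follow another crash at the same
    location within SECONDARY_WINDOW (the boundary counts as secondary)."""
    kept = []
    last_seen = {}  # location_key -> time of the most recent crash there
    for crash_id, location, t in sorted(crashes, key=lambda c: c[2]):
        previous = last_seen.get(location)
        if previous is None or t - previous > SECONDARY_WINDOW:
            kept.append((crash_id, location, t))
        last_seen[location] = t  # secondary crashes also extend the window
    return kept


if __name__ == "__main__":
    demo = [
        ("c1", "S10", datetime(2008, 8, 15, 6, 47)),
        ("c2", "S10", datetime(2008, 8, 15, 7, 5)),   # 18 min after c1
        ("c3", "S12", datetime(2008, 8, 15, 7, 0)),   # different location
        ("c4", "S10", datetime(2008, 8, 15, 8, 0)),   # 55 min after c2
    ]
    print([c[0] for c in drop_secondary_crashes(demo)])  # ['c1', 'c3', 'c4']
```

Letting an excluded crash restart the 30-min window is a design choice of this sketch; it avoids keeping a crash that is itself part of a chain of incidents.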
A critical issue in this study is the length of the sampling interval. A 5-min sampling interval was selected after comparing the alternatives. Compared with coarser data, such as 15-min or 30-min data, 5-min traffic flow data describe the traffic conditions prior to rear-end crashes more accurately; compared with finer data (e.g., 30-s or 1-min data), 5-min data may miss small traffic fluctuations in actual operation, but they have the advantage of reducing random noise.
Another critical issue is identifying the actual time of crash occurrence, because the crash occurrence time determines the starting point of traffic data extraction. According to the Milwaukee traffic management authorities, the difference between the actual time of a crash and the recorded parameter ACCDTIME (in the MV4000 database) is within 2 min. Since this study used 5-min traffic flow data, an error of up to 2 min is acceptable. Therefore, ACCDTIME was used as the actual crash time for each crash case.
Loop detector data for each crash were extracted according to specific time and spatial rules. Time rules: first, round the observed crash time (ACCDTIME) down to the nearest 5-min mark and take the rounded time as the starting point of data extraction, which ensures that all examined data precede the crash; second, extract traffic data for the 35-min period before this starting point. The extracted 35-min data contain seven 5-min time slices with IDs 0 to 6, where slice 0 is the 5-min slice closest to the actual crash time and slice 6 the farthest from it. Slice 0 was excluded from further analysis to facilitate practical application: as recommended by ABDEL-ATY et al [6], a condition identified as crash-prone from data this close to the crash (i.e., slice 0) may not leave enough time for traffic control measures to modify the hazardous traffic condition. For example, for a crash that occurred on August 15, 2008 at 06:47 AM, the crash time is rounded to 06:45 AM. Records from 06:10 AM to 06:45 AM are then extracted, but the data in the interval 06:40 AM to 06:45 AM are excluded from further analysis.
Spatial rules: the station nearest to each crash site was identified (based on records in the Wisconsin MV4000 crash database) and labeled the “crash-station”. In this research, traffic data for all three lanes at the crash-station, the two stations upstream of it, and the two stations downstream of it were extracted according to the time rules above.
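The time rules can be expressed compactly. The following Python sketch (an illustration, not the authors' implementation; all function names are hypothetical) rounds a crash time down to the 5-min mark, maps slice IDs 0 to 6 to their intervals, and drops slice 0, reproducing the 06:47 AM example.

```python
from datetime import datetime, timedelta

SLICE = timedelta(minutes=5)
N_SLICES = 7  # slices 0..6 span the 35 min before the rounded crash time


def round_down_5min(t):
    """Round a timestamp down to the nearest 5-min mark."""
    return t.replace(minute=t.minute - t.minute % 5, second=0, microsecond=0)


def crash_slices(crash_time):
    """Map slice ID k (0..6) to its (start, end) interval.

    Slice 0 ends at the rounded crash time; slice 6 is the earliest.
    """
    end0 = round_down_5min(crash_time)
    return {k: (end0 - (k + 1) * SLICE, end0 - k * SLICE) for k in range(N_SLICES)}


def analysis_slices(crash_time):
    """Slices kept for modeling: slice 0 is dropped, as in the text."""
    return {k: v for k, v in crash_slices(crash_time).items() if k != 0}


if __name__ == "__main__":
    slices = analysis_slices(datetime(2008, 8, 15, 6, 47))
    for k, (start, end) in sorted(slices.items()):
        print(k, start.strftime("%H:%M"), end.strftime("%H:%M"))
```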
After data extraction, the 5-min raw data were aggregated into three levels, namely, the 15-min level (slices 1, 2, and 3), the 20-min level (slices 1, 2, 3, and 4), and the 30-min level (slices 1 to 6). For the crash above (August 15, 2008 at 06:47 AM), 15-min level aggregation uses data from 06:25 AM to 06:40 AM, 20-min level aggregation uses data from 06:20 AM to 06:40 AM, and 30-min level aggregation uses data from 06:10 AM to 06:40 AM. Owing to intermittent malfunctions of loop detectors and other random noise, traffic data are not available for every crash. Complete loop detector data were available for 440 rear-end crashes, and these crashes were used as the input to this research.
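A companion sketch for the three aggregation levels, under the same assumptions (hypothetical helper names, 5-min slices as defined by the time rules): each level is a set of slice IDs, and its time window follows from the earliest and latest slice in that set.

```python
from datetime import datetime, timedelta

# Time level -> slice IDs pooled at that level (slice 0 is never used).
LEVEL_SLICES = {1: (1, 2, 3), 2: (1, 2, 3, 4), 3: (1, 2, 3, 4, 5, 6)}
SLICE = timedelta(minutes=5)


def level_window(crash_time, level):
    """(start, end) of the data window used at a given time level."""
    rounded = crash_time.replace(
        minute=crash_time.minute - crash_time.minute % 5, second=0, microsecond=0
    )
    ids = LEVEL_SLICES[level]
    return rounded - (max(ids) + 1) * SLICE, rounded - min(ids) * SLICE


def pool_slices(records_by_slice, level):
    """Collect the 5-min records that belong to one time level."""
    return [records_by_slice[k] for k in LEVEL_SLICES[level]]


if __name__ == "__main__":
    crash = datetime(2008, 8, 15, 6, 47)
    for level in (1, 2, 3):
        start, end = level_window(crash, level)
        print(level, start.strftime("%H:%M"), end.strftime("%H:%M"))
    # Windows per the text: 06:25-06:40, 06:20-06:40, 06:10-06:40
```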
A 1:1 case-control strategy was adopted for non-crash sampling. The case-control sampling strategy is widely used in epidemiology to compare two groups, one with the outcome of interest and one without it. ABDEL-ATY et al [6] first introduced the case-control sampling method into freeway crash prediction. In this study, non-crash cases were collected at the same locations, over the same time periods, and on the same day of the week as the crash cases (on days when no crash occurred), under similar weather conditions. This sampling scheme controls for other essential factors affecting crash occurrence, such as time of day, day of week, roadway geometric design, and weather conditions, so that their impacts are implicitly accounted for. To improve the validity of the non-crash data, before each non-crash extraction it was verified that no crash was observed at the five target stations (the crash-station, two upstream stations, and two downstream stations) in the hour before or the hour after the target sampling period. This constraint ensures that the non-crash data were extracted under normal traffic conditions. Loop detector data for each non-crash case were extracted following the same time and spatial rules as for the crash cases.
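The matching and exclusion rules for non-crash sampling can be sketched as below. This is a hedged illustration: the search radius in weeks, the input list of crash times observed at the five target stations, and the function names are assumptions, and weather matching (which the text also requires) is not modeled. Each candidate window is the crash case's window shifted by whole weeks, so time of day and day of week are preserved.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def is_clean_period(start, end, station_crash_times):
    """True if no crash at the five stations falls within 1 h of the window."""
    return not any(start - ONE_HOUR <= t <= end + ONE_HOUR for t in station_crash_times)


def noncrash_windows(crash_start, crash_end, weeks, station_crash_times):
    """Candidate non-crash windows: same time of day and day of week,
    shifted by 1..weeks weeks in either direction, that pass the 1-h check."""
    windows = []
    for w in range(1, weeks + 1):
        for sign in (-1, 1):
            shift = timedelta(weeks=sign * w)
            start, end = crash_start + shift, crash_end + shift
            if is_clean_period(start, end, station_crash_times):
                windows.append((start, end))
    return sorted(windows)


if __name__ == "__main__":
    crash_times = [datetime(2008, 8, 15, 6, 47), datetime(2008, 8, 22, 7, 30)]
    for s, e in noncrash_windows(
        datetime(2008, 8, 15, 6, 10), datetime(2008, 8, 15, 6, 40), 2, crash_times
    ):
        print(s.date(), s.strftime("%H:%M"), "-", e.strftime("%H:%M"))
```

In the example, the window one week after the crash is rejected because another crash occurs at 07:30 AM, within one hour of its 06:40 AM end; for the 1:1 design, one clean window per crash would then be chosen.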
2.3 Traffic features description
Aggregated traffic features explored as potential inputs to the model were categorized into the following four groups.
1) Averages (represented by A) of volume, speed, and occupancy [6, 7]:
$$A = \frac{1}{nt}\sum_{i=1}^{n}\sum_{j=1}^{t} s_{ij} \tag{1}$$
where A is the average of volume, speed, or occupancy; $s_{ij}$ is the volume, speed, or occupancy on lane i at time slice j, i = 1, 2, …, n, j = 1, 2, …, t; n is the total number of lanes (n = 3 in this study); t is the total number of time slices, t = 3, 4, and 6 for the 15-min, 20-min, and 30-min levels, respectively.
2) Standard deviations of volume, speed, and occupancy over time at the same site on the same lane (represented by T). This group of features assesses the variation of traffic parameters over time.
$$T = \frac{1}{n}\sum_{i=1}^{n} \sigma_i \tag{2}$$
where T is the standard deviation of volume, speed, or occupancy over time; $\sigma_i$ is the standard deviation (of volume, speed, or occupancy) on lane i over time; other symbols are as defined above.
3) Standard deviations of volume, speed, and occupancy across lanes at the same site over the same time slices (represented by L). This group of features evaluates the horizontal variation (across lanes) of traffic parameters.
$$L = \frac{1}{t}\sum_{j=1}^{t} \sigma_j \tag{3}$$
where L is the standard deviation of volume, speed, or occupancy across lanes; $\sigma_j$ is the standard deviation (of volume, speed, or occupancy) across lanes at time slice j; other symbols are as defined above.
4) Differences of average volume, speed, and occupancy between two adjacent stations (represented by D). This group of features evaluates the longitudinal change of traffic parameters.
$$D = A_k - A_{k+1} \tag{4}$$
where D is the difference in average volume, speed, or occupancy between two adjacent stations; $A_k$ and $A_{k+1}$ are the averages of volume, speed, or occupancy at stations k and k+1, respectively.
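Equations (1)-(4) can be computed directly from a lane-by-slice matrix. The Python sketch below assumes `s[i][j]` holds one variable (volume, speed, or occupancy) for lane i and slice j at a single station, and uses the population standard deviation (`statistics.pstdev`); whether the original study used the population or the sample form is not stated, so that choice is an assumption.

```python
from statistics import mean, pstdev


def average_feature(s):
    """Eq. (1): mean of s[i][j] over all lanes i and slices j."""
    return mean(v for lane in s for v in lane)


def temporal_sd_feature(s):
    """Eq. (2): average over lanes of each lane's SD over time."""
    return mean(pstdev(lane) for lane in s)


def lateral_sd_feature(s):
    """Eq. (3): average over slices of the SD across lanes."""
    return mean(pstdev(column) for column in zip(*s))


def station_difference(s_k, s_next):
    """Eq. (4): A_k - A_{k+1} for two adjacent stations."""
    return average_feature(s_k) - average_feature(s_next)


if __name__ == "__main__":
    # 3 lanes x 3 slices (15-min level) of hypothetical speeds in mph
    upstream = [[55, 50, 45], [60, 52, 44], [58, 51, 46]]
    downstream = [[40, 35, 30], [42, 36, 31], [41, 34, 29]]
    print(average_feature(upstream), temporal_sd_feature(upstream),
          lateral_sd_feature(upstream), station_difference(upstream, downstream))
```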
In this study, the five target loop detector stations (the crash-station, two upstream stations, and two downstream stations) were labeled A through E, where A denotes the farthest upstream station and C the station nearest to the crash site. Likewise, the three data aggregation levels (15-min, 20-min, and 30-min) were given the identification numbers 1, 2, and 3, respectively. Each traffic feature is named with four characters: 1) the feature category (A, T, L, or D); 2) the variable, i.e., volume, speed, or occupancy (V, S, or O); 3) the station (A, B, C, D, or E); and 4) the time level (1, 2, or 3). For example, ASC1 denotes the average speed at the crash-station at the 15-min level. Initially, a total of 57 features (3 categories × 3 variables × 5 stations + 1 category × 3 variables × 4 adjacent-station differences) were included in the feature space at each time level.
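The naming scheme and the count of 57 features per time level can be checked with a short sketch. How a D feature's station character is chosen is not specified in the text; here it is assumed, hypothetically, to be the upstream station of the adjacent pair (so `DSA1` would be the speed difference between stations A and B at Time level 1).

```python
from itertools import product

CATEGORIES = ("A", "T", "L")            # per-station feature groups
VARIABLES = ("V", "S", "O")             # volume, speed, occupancy
STATIONS = ("A", "B", "C", "D", "E")    # A farthest upstream, C crash-station


def feature_names(level):
    """All candidate feature names for one time level (1, 2 or 3)."""
    names = [f"{c}{v}{st}{level}" for c, v, st in product(CATEGORIES, VARIABLES, STATIONS)]
    # Assumption: D features are labeled by the upstream station of each pair.
    names += [f"D{v}{st}{level}" for v, st in product(VARIABLES, STATIONS[:-1])]
    return names


if __name__ == "__main__":
    names = feature_names(1)
    print(len(names), names[:3])  # 57 ['AVA1', 'AVB1', 'AVC1']
```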
3 Modeling methodology
3.1 Introduction to SVM
Since this paper focuses on the application of SVM to rear-end crash potential prediction, this section presents the fundamentals of SVM theory and the general application procedure for crash potential analysis.
Each case in the training group is represented by $(x_k, y_k)$, k = 1, 2, …, m, where $x_k \in \mathbb{R}^n$ is the vector of input traffic features, n is the dimension of the input space, and $y_k \in \{1, -1\}$ is the corresponding class label, indicating whether the case is a crash ($y_k = 1$) or a non-crash ($y_k = -1$).
SVM classifies linearly non-separable datasets by mapping them into a higher-dimensional feature space (through the function φ(·)) and constructing a separating hyperplane in that space. The mapping φ(·) is realized implicitly through a Kernel function K(·,·), which defines an inner product in the higher-dimensional space. The SVM decision function (predictor) then takes the following form:
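$$f(x) = \operatorname{sgn}\left(\sum_{k=1}^{m} a_k y_k K(x_k, x) + b\right) \tag{5}$$
where sgn(·) returns the sign of its argument, x is the vector to be classified, $(x_k, y_k)$ is a training pair, $a_k \ge 0$ are the Lagrangian multipliers, and b is a constant (the bias term).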
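Equation (5) translates almost line for line into code. The pure-Python sketch below takes the support vectors, labels, multipliers, bias, and a Kernel function as inputs (names are illustrative); mapping a score of exactly zero to +1 is an arbitrary choice made here. The toy example uses a linear Kernel and hand-picked multipliers that satisfy the maximum-margin conditions for two points.

```python
def svm_decision(x, support, labels, alphas, b, kernel):
    """Eq. (5): sgn(sum_k a_k y_k K(x_k, x) + b); a zero score maps to +1."""
    score = sum(a * y * kernel(xk, x) for xk, y, a in zip(support, labels, alphas)) + b
    return 1 if score >= 0 else -1


def linear_kernel(u, v):
    """Plain inner product, K(u, v) = u . v."""
    return sum(p * q for p, q in zip(u, v))


if __name__ == "__main__":
    support = [(1.0, 0.0), (-1.0, 0.0)]   # one crash, one non-crash point
    labels = [1, -1]
    alphas = [0.5, 0.5]                    # satisfy sum_k a_k y_k = 0
    print(svm_decision((2.0, 0.0), support, labels, alphas, 0.0, linear_kernel))   # 1
    print(svm_decision((-3.0, 0.0), support, labels, alphas, 0.0, linear_kernel))  # -1
```

Only instances with $a_k > 0$ (the support vectors) contribute to the sum, which is why a trained model needs to store only those instances.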
To obtain the maximum-margin predictor above for classifying the instances in the testing group, SVM solves the following optimization problem:
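$$\min_{a}\ \frac{1}{2}a^{\mathrm{T}}Qa - e^{\mathrm{T}}a \quad \text{s.t.} \quad y^{\mathrm{T}}a = 0,\ 0 \le a_k \le C,\ k = 1, 2, \ldots, m \tag{6}$$
where e is the vector of all ones; a is the vector of the m Lagrangian multipliers $a_k$; y is the vector of class labels $y_k$; C > 0 is the upper bound of $a_k$; Q is an m × m matrix with $Q_{ij} = y_i y_j K(x_i, x_j)$, where $K(x_i, x_j) = \varphi(x_i)^{\mathrm{T}}\varphi(x_j)$ is the Kernel function. Note that Eq. (6) is the dual form obtained through the Lagrangian transformation. Readers can refer to Refs. [14, 19, 20] for more mathematical details of SVM learning theory. In this research, the radial basis function (RBF) Kernel is applied for the following reasons: an RBF Kernel with an appropriate parameter pair (γ, C) can achieve the same performance as the linear Kernel, and it has fewer numerical difficulties than the polynomial Kernel of high degree [21]. The RBF Kernel is defined as:
$$K(x_i, x_j) = \exp\left(-\gamma\lVert x_i - x_j\rVert^2\right), \quad \gamma > 0 \tag{7}$$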
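The quantities in Eqs. (6) and (7) can be evaluated for a candidate multiplier vector with the sketch below. It does not solve the quadratic program (LIBSVM does that internally); it only builds Q with the RBF Kernel, evaluates the dual objective, and checks the constraints. Function names and the tolerance are illustrative assumptions.

```python
import math


def rbf_kernel(u, v, gamma):
    """Eq. (7): exp(-gamma * ||u - v||^2), gamma > 0."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(u, v)))


def gram_q(X, y, kernel):
    """Q_ij = y_i * y_j * K(x_i, x_j), an m x m matrix."""
    return [[yi * yj * kernel(xi, xj) for xj, yj in zip(X, y)] for xi, yi in zip(X, y)]


def dual_objective(a, Q):
    """Objective of Eq. (6): (1/2) a^T Q a - e^T a."""
    m = len(a)
    quad = sum(a[i] * Q[i][j] * a[j] for i in range(m) for j in range(m))
    return 0.5 * quad - sum(a)


def dual_feasible(a, y, C, tol=1e-9):
    """Constraints of Eq. (6): y^T a = 0 and 0 <= a_k <= C."""
    balanced = abs(sum(ak * yk for ak, yk in zip(a, y))) <= tol
    return balanced and all(-tol <= ak <= C + tol for ak in a)


if __name__ == "__main__":
    X, y = [(1.0, 0.0), (-1.0, 0.0)], [1, -1]
    Q = gram_q(X, y, lambda u, v: rbf_kernel(u, v, gamma=0.5))
    a = [0.5, 0.5]
    print(Q, dual_objective(a, Q), dual_feasible(a, y, C=1.0))
```

Because $K(x, x) = 1$ for the RBF Kernel, the diagonal of Q is always 1, and γ controls how quickly the off-diagonal entries decay with distance.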
3.2 Feature selection and optimal Kernel parameters search
SVM does not directly assess feature relevance, and its performance suffers when a large number of irrelevant features are present [22]. Therefore, combining SVM with a proper feature selection strategy is essential for improving the accuracy of SVM classification.
The F-score is a simple technique that measures how well a feature discriminates between two sets of values. For a given aggregated traffic feature in this research, the larger the F-score, the more likely the feature is to be discriminative. Although a feature selection strategy based on the F-score alone is computationally simple, it addresses only the intrinsic properties of the features, so the features it identifies may not be well matched to the classifier. In this research, the F-score strategy is therefore combined with an n-fold cross-validation Kernel parameter search procedure [21], which accounts for the interaction between the intrinsic properties of the features and the SVM classifier. The procedure is as follows:
1) Calculate the F-score of every feature:
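$$F_i = \frac{\left(\bar{x}_i^{p} - \bar{x}_i\right)^2 + \left(\bar{x}_i^{n} - \bar{x}_i\right)^2}{\dfrac{1}{N_p - 1}\sum_{k=1}^{N_p}\left(x_{k,i}^{p} - \bar{x}_i^{p}\right)^2 + \dfrac{1}{N_n - 1}\sum_{t=1}^{N_n}\left(x_{t,i}^{n} - \bar{x}_i^{n}\right)^2} \tag{8}$$
where $\bar{x}_i$, $\bar{x}_i^{p}$, and $\bar{x}_i^{n}$ are the averages of the ith feature over the whole, positive (crash), and negative (non-crash) datasets, respectively; $N_p$ and $N_n$ are the total numbers of positive and negative instances, respectively; $x_{k,i}^{p}$ is the ith feature of the kth positive instance, k = 1, 2, …, $N_p$, $N_p \ge 2$, and $x_{t,i}^{n}$ is the ith feature of the tth negative instance, t = 1, 2, …, $N_n$, $N_n \ge 2$.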
2) Sort all features in descending order of F-score, then repeatedly eliminate the feature with the lowest F-score, each elimination producing a candidate feature subset, until only one feature is left.
3) Conduct the n-fold cross-validation procedure on each subset (subsets with 57, 56, 55, …, 3, 2, and 1 features) to search for the optimal Kernel parameter pair (γ, C), and identify the subset that provides the highest cross-validation accuracy.
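Steps 1) to 3) can be sketched in pure Python. The F-score follows Eq. (8); the cross-validation step is represented by a hypothetical callback `cv_accuracy(subset)` that would run the grid search and return the best accuracy for that subset. When two subsets tie, this sketch keeps the larger one because it is evaluated first, a choice the text does not specify.

```python
import math
from statistics import mean


def f_score(pos, neg):
    """Eq. (8) for one feature; pos/neg are lists of its crash/non-crash values."""
    m_all, m_p, m_n = mean(pos + neg), mean(pos), mean(neg)
    numerator = (m_p - m_all) ** 2 + (m_n - m_all) ** 2
    var_p = sum((v - m_p) ** 2 for v in pos) / (len(pos) - 1)
    var_n = sum((v - m_n) ** 2 for v in neg) / (len(neg) - 1)
    denominator = var_p + var_n
    if denominator == 0:
        return math.inf if numerator > 0 else 0.0
    return numerator / denominator


def select_features(scores, cv_accuracy):
    """Drop the lowest-F-score feature one at a time; keep the best subset.

    scores: {feature_name: F-score}
    cv_accuracy: hypothetical callback, subset -> best CV accuracy
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    best_subset, best_acc = None, -math.inf
    for size in range(len(ranked), 0, -1):
        subset = ranked[:size]
        acc = cv_accuracy(subset)
        if acc > best_acc:
            best_subset, best_acc = subset, acc
    return best_subset, best_acc
```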
In step 3, an n-fold cross-validation grid search was used to find the optimal Kernel parameters γ and C. Grid search tries exponentially growing sequences of γ and C within a given grid, and the pair with the best cross-validation accuracy is selected [21]. Cross-validation is used because it reduces overfitting to the training data and thus improves the generalization ability of the grid search result. The n-fold cross-validation grid search proceeds as follows:
1) Define the grid $\log_2 C \in \{-10, -9, \ldots, 9, 10\}$ and $\log_2 \gamma \in \{-10, -9, \ldots, 9, 10\}$.
2) For each parameter pair (γ, C), divide the training dataset into n (here n = 5) subsets of equal size.
3) In turn, predict each subset using a classifier trained on the remaining n − 1 subsets.
4) Calculate the overall cross-validation accuracy, i.e., the average validation accuracy over all five subsets.
5) Choose the parameter pair (γ, C) that leads to the highest overall cross-validation classification accuracy.
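The five steps of the grid search can be sketched as follows. The training routine is abstracted as a hypothetical callback `train_and_score(train_idx, val_idx, gamma, C)` (for example, a wrapper around LIBSVM training and prediction), and the folds are contiguous index blocks, which assumes the training data have already been shuffled.

```python
def grid(lo=-10, hi=10):
    """All (gamma, C) pairs with log2(gamma), log2(C) in {lo, ..., hi}."""
    exps = range(lo, hi + 1)
    return [(2.0 ** g, 2.0 ** c) for g in exps for c in exps]


def k_folds(m, n=5):
    """Split indices 0..m-1 into n contiguous folds of (near-)equal size.
    Assumes the training rows were shuffled beforehand."""
    base, extra = divmod(m, n)
    folds, start = [], 0
    for f in range(n):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(m, train_and_score, n=5):
    """Return ((gamma, C), accuracy) with the best mean validation accuracy.

    train_and_score(train_idx, val_idx, gamma, C) is a hypothetical callback
    that trains an RBF-SVM on train_idx and returns accuracy on val_idx.
    """
    folds = k_folds(m, n)
    best_pair, best_acc = None, float("-inf")
    for gamma, C in grid():
        accs = []
        for i, val_idx in enumerate(folds):
            train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
            accs.append(train_and_score(train_idx, val_idx, gamma, C))
        mean_acc = sum(accs) / n
        if mean_acc > best_acc:
            best_pair, best_acc = (gamma, C), mean_acc
    return best_pair, best_acc
```

With 441 parameter pairs and five folds, this amounts to 2,205 training runs per feature subset, which is why coarse exponential grids are preferred. LIBSVM's `svm-train` also provides a built-in n-fold cross-validation mode through its `-v n` option.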
3.3 Proposed procedure using LIBSVM
The integrated support vector classification software LIBSVM was used in this work. The procedure for applying LIBSVM is as follows:
1) Convert the original inductive loop detector data to the format required by LIBSVM;
2) Scale each feature of all the instances linearly to the range [0, 1];
3) Apply the cross-validation feature selection strategy to identify the significantly relevant features and the best parameter pair (γ, C) for the RBF Kernel;
4) Train a model with the selected feature subset and the best parameter pair, and use it as the predictor for instances in the testing group.
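Steps 1) and 2) can be sketched in pure Python. LIBSVM reads one instance per line as `<label> <index>:<value> ...` with 1-based feature indices, and omitted features are treated as zero. The scaling below fits the [0, 1] ranges on the training data and reuses them for the testing data, which is the practice recommended in the LIBSVM guide; the function names and sample values are illustrative. LIBSVM's own `svm-scale` tool (with `-l 0 -u 1`) performs the same kind of linear scaling.

```python
def to_libsvm_line(label, features):
    """Format one instance as '<label> <index>:<value> ...' (1-based indices).

    Zero-valued features are omitted; LIBSVM reads missing entries as 0.
    """
    parts = [str(int(label))]
    parts += [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join(parts)


def fit_minmax(rows):
    """Per-feature (min, max) computed on the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_minmax(rows, ranges):
    """Linearly map each feature to [0, 1] using the training ranges.

    Test values outside the training range fall outside [0, 1]; constant
    training features are mapped to 0.
    """
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]


if __name__ == "__main__":
    train = [[55.0, 0.12, 1800.0], [38.0, 0.25, 1500.0]]  # hypothetical AS, AO, AV
    ranges = fit_minmax(train)
    for label, row in zip([-1, 1], apply_minmax(train, ranges)):
        print(to_libsvm_line(label, row))
```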
3.4 Measures of classification performance
The performance of a classification model is evaluated by three statistical criteria, namely, the overall prediction precision rate (OPPR, $R_{opp}$), the crash prediction precision rate (CPPR), and the false positive rate (FPR, $R_{fp}$).
OPPR is the percentage of all (crash and non-crash) traffic patterns in the dataset that are classified correctly:
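$$R_{opp} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}} \times 100\% \tag{9}$$
where TP and FN are the numbers of crash cases classified correctly and incorrectly, and TN and FP are the numbers of non-crash cases classified correctly and incorrectly, respectively.
CPPR is defined as the ratio of the number of crash cases classified correctly to the actual number of crashes in the dataset, expressed as a percentage:
$$\mathrm{CPPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \times 100\% \tag{10}$$
FPR is defined as the ratio of the number of actual non-crash cases incorrectly labeled as crashes (false positives) to the total number of non-crash cases:
$$R_{fp} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \times 100\% \tag{11}$$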
A classification model that achieves high OPPR and CPPR values while keeping FPR low is regarded as superior. Since a balanced case-control sampling strategy was applied for non-crash sampling, OPPR is an effective measure of classification accuracy and is therefore used as the performance criterion in the cross-validation process. Note that if an unbalanced sampling strategy were adopted, OPPR alone would not effectively assess a classifier; for unbalanced datasets, CPPR and FPR are more reasonable evaluation criteria.
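Equations (9)-(11) can be written as a small function, which also makes the remark on balanced samples easy to verify: when the numbers of crash and non-crash cases are equal, OPPR = (CPPR + 100% − FPR)/2, so OPPR summarizes both error types. As a rough check, SVM(Level 1a)'s reported CPPR of 79.8% and FPR of 17.5% give (79.8 + 82.5)/2 = 81.15%, close to its reported OPPR of 81.1% (a random testing split need not be exactly balanced). The confusion counts in the example are hypothetical.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (OPPR, CPPR, FPR) in percent, per Eqs. (9)-(11).

    tp/fn: crash cases classified correctly/incorrectly
    tn/fp: non-crash cases classified correctly/incorrectly
    """
    oppr = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    cppr = 100.0 * tp / (tp + fn)
    fpr = 100.0 * fp / (fp + tn)
    return oppr, cppr, fpr


if __name__ == "__main__":
    # Hypothetical balanced test set: 10 crashes, 10 non-crashes
    oppr, cppr, fpr = classification_rates(tp=8, fn=2, tn=7, fp=3)
    print(oppr, cppr, fpr)          # 75.0 80.0 30.0
    print((cppr + 100 - fpr) / 2)   # 75.0, equal to OPPR because classes are balanced
```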
4 Results and analysis
In this research, the averages (AV, AS, AO), standard deviations (TV, TS, TO, LV, LS, LO), and differences between adjacent stations (DV, DS, DO) made up the explanatory feature space. Each time level (Time levels 1, 2, and 3) included a total of 57 explanatory features. Note that the data aggregated at the 30-min level (Time level 3) include the information contained at the 20-min level (Time level 2) and the 15-min level (Time level 1), and a similar relationship exists between Time level 2 and Time level 1. Therefore, the features of only one time level at a time were subjected to the feature selection process.
The dataset, containing 440 crashes and the corresponding 440 non-crash cases, was then randomly partitioned into training data (70%) and testing data (30%). Subsequently, the training-group explanatory feature spaces for Time levels 1, 2, and 3 were each subjected to the F-score n-fold cross-validation strategy. Table 1 shows the results of the feature selection: the highest cross-validation accuracy was obtained with subsets of 14 features for Time level 1, 10 features for Time level 2, and 8 features for Time level 3.
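A minimal sketch of the random 70/30 partition, assuming the 880 cases are pooled and shuffled without stratification (the text does not say whether each crash and its matched non-crash case were kept together); the seed is a hypothetical choice for reproducibility.

```python
import random


def split_train_test(cases, train_frac=0.7, seed=2017):
    """Randomly split cases into training and testing groups."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # seed is arbitrary, for repeatability
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # 440 crash cases and 440 matched non-crash cases (hypothetical IDs)
    cases = [("crash", i) for i in range(440)] + [("noncrash", i) for i in range(440)]
    train, test = split_train_test(cases)
    print(len(train), len(test))  # 616 264
```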
Table 1 Results of feature selection procedure
Since the F-score feature selection process is combined with the n-fold cross-validation Kernel parameter grid search, the optimal Kernel parameters C and γ at Time levels 1, 2, and 3 were also identified during feature selection. Table 2 lists the optimal parameter pairs and the corresponding best cross-validation accuracies.
Table 2 Results of cross-validation Kernel parameter grid search process
After the feature selection and Kernel parameter optimization, three SVM predictors, namely SVM(Level 1a), SVM(Level 2a), and SVM(Level 3a), were trained separately using the optimal Kernel parameter pairs and the significant explanatory feature subsets (shown in Table 1) as inputs. Table 3 shows the SVM prediction results: on the testing group datasets, all SVM models achieved an OPPR of at least 78.1%, a CPPR of at least 72.7%, and an FPR of at most 22.3%. Although SVM(Level 2a) achieves the best FPR of 16.1%, it also provides the worst CPPR of 72.7%; SVM(Level 1a) achieves the best OPPR of 81.1% and a CPPR of 79.8%, with a slightly higher FPR of 17.5%. In short, the SVM model built on 15-min level traffic flow inputs (i.e., SVM(Level 1a)) is superior to the models trained on 20-min or 30-min level data, which indicates that traffic conditions closer to the prediction target time provide better rear-end crash potential prediction.
Table 3 Classification results of SVM and MLP models
To assess the impact of the feature selection procedure on SVM classification accuracy, three additional SVM models were built without the feature selection process. Each was trained using all 57 features at the corresponding time level, and a similar grid search was used to identify the optimal Kernel parameters; the results are also listed in Table 2. The prediction results of these models, namely SVM(Level 1b), SVM(Level 2b), and SVM(Level 3b), are also provided in Table 3. All SVM models with unselected feature inputs achieve over 96.4% OPPR, over 97.3% CPPR, and less than 4.5% FPR on the training group datasets. However, their results on the testing group are considerably poorer, with OPPR below 73.2%, CPPR below 72.3%, and FPR above 25.9%. This large gap between training and testing performance indicates overfitting, and shows that SVM models with selected input features classify the testing group datasets more accurately. Therefore, F-score feature selection combined with the n-fold cross-validation Kernel parameter grid search can greatly enhance the accuracy and generalization capability of SVM models.
Since most traditional statistical RCPP models are one-size-fits-all approaches that do not include rear-end-specific feature selection and model building, it is difficult to compare their performance with that of the SVM models. Moreover, neural network models achieved reasonable accuracy in previous research on rear-end crash potential prediction [12]. Therefore, the multi-layer perceptron (MLP) network was selected as the benchmark for evaluating the SVM models. The two-layer (excluding the input layer) MLP neural network was trained with the supervised error back-propagation algorithm and used a tan-sigmoid transfer function (Tansig) in the hidden layer and a linear transfer function (Purelin) in the output layer. Three MLP models, MLP (Level 1), MLP (Level 2), and MLP (Level 3), were trained and tested on the same datasets as SVM (Level 1a), SVM (Level 2a), and SVM (Level 3a), respectively. A key issue in applying MLP networks is choosing the number of neurons in the hidden layer. To select an appropriate number, the performance of MLP models with 10 different hidden-layer sizes was compared: the range was 11 to 20 for both MLP (Level 1) and MLP (Level 2), and 1 to 10 for MLP (Level 3). The MLP classification results are shown in Table 4.
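The benchmark architecture (tan-sigmoid hidden layer, linear output layer) can be illustrated with a forward pass. This sketch covers prediction only: back-propagation training is not shown, the weights would come from training, and classifying by the sign of the single output assumes +1/−1 targets, which the text does not state. MATLAB's tansig, 2/(1 + exp(−2n)) − 1, is mathematically identical to tanh(n), so `math.tanh` is used here to avoid overflow for large negative inputs.

```python
import math


def tansig(n):
    """Tan-sigmoid; 2 / (1 + exp(-2n)) - 1 equals tanh(n)."""
    return math.tanh(n)


def mlp_forward(x, W1, b1, W2, b2):
    """Hidden layer: tansig(W1 x + b1); single output: purelin(W2 . h + b2).

    W1: one weight row per hidden neuron; b1: hidden biases;
    W2: output weights (one per hidden neuron); b2: output bias.
    """
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


def mlp_classify(x, W1, b1, W2, b2):
    """Assumed decision rule: crash (+1) if the linear output is positive."""
    return 1 if mlp_forward(x, W1, b1, W2, b2) > 0 else -1
```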
For each time level, the classification result of the best MLP model is also provided in Table 3. The results show that MLP (Level 1) is the best-performing MLP model, as it achieves the highest OPPR (75.5%), the highest CPPR (71.8%) and the lowest FPR (20.9%). Moreover, the comparison between the SVM and MLP models shows that the SVM classification models achieve higher classification accuracy and lower FPR than the MLP models; therefore, SVM models outperform MLP models and could be used as effective tools for rear-end crash potential prediction.
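The comparison above rests on three rates. Their formal definitions are not restated in this section, so the sketch below assumes the usual reading: OPPR as overall accuracy, CPPR as the share of crash cases correctly identified, and FPR as the share of non-crash cases wrongly labelled crash-prone. The function name is hypothetical.

```python
def prediction_rates(actual, predicted):
    """Compute OPPR, CPPR and FPR from paired labels (True = crash case).

    Assumed definitions:
      OPPR = correctly classified cases / all cases
      CPPR = correctly classified crash cases / all crash cases
      FPR  = non-crash cases classified as crash / all non-crash cases
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    nan = float("nan")
    return {
        "OPPR": (tp + tn) / len(pairs) if pairs else nan,
        "CPPR": tp / (tp + fn) if tp + fn else nan,
        "FPR": fp / (fp + tn) if fp + tn else nan,
    }
```

Under these definitions a model can raise CPPR simply by flagging more intervals, which is why FPR is reported alongside it.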
5 Real-time application simulations
To evaluate the performance of different predictors under actual traffic conditions, a simulation experiment was conducted using the trained SVM (Level 1a), SVM (Level 1b) and MLP (Level 1) predictors. The inputs to these predictors were aggregated real-life traffic data from the loop detector stations at CLEVELAND AVE on eastbound I-894 on September 22, 2005, when an actual rear-end crash occurred at this site at 07:15 AM. Note that, due to intermittent data loss, only 228 5-min intervals over the 24-h period had complete loop detector data.
Table 4 Classification results of MLP models with varying numbers of hidden layer neurons
The simulation procedure is as follows. First, collect the 5-min level traffic flow data for all through lanes at the target site (i.e., five stations in total: the loop detector station nearest to the target location, two stations upstream and two stations downstream of it). Then, transmit the collected traffic data to the prediction unit to compute the premier explanatory features. Finally, update the data continuously to achieve real-time prediction of rear-end crash potential. For instance, at 07:10 AM, once the prediction unit has received three 5-min intervals of traffic flow data, the premier subset of explanatory features (i.e., the features listed in Table 1) is calculated by a program and the input feature space of the trained predictors is updated; the prediction unit then predicts whether the traffic state at 07:15 AM is rear-end crash prone. If a crash-prone state is identified, the variable message sign (VMS) near the target detector station would issue a warning to drivers.
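The procedure above is essentially a sliding window over the five stations: keep the latest three complete 5-min intervals, recompute the features, and query the trained predictor for the next interval. The sketch below expresses that loop under stated assumptions. The class name, the record format and the callables are hypothetical placeholders (the actual Table 1 features and trained models are not reproduced), and resetting the window after an incomplete interval is an assumption, not a detail given in the paper.

```python
from collections import deque

N_STATIONS = 5  # nearest station + two upstream + two downstream (per the text)
WINDOW = 3      # three consecutive 5-min intervals feed one prediction (per the text)


class RealTimeMonitor:
    """Hypothetical wrapper around a trained predictor; not the authors' code.

    extract_features: callable mapping the buffered intervals to a feature vector
    predictor:        callable returning True when the features indicate a crash-prone state
    warn:             callable invoked (e.g. to drive a VMS message) when a risk is predicted
    """

    def __init__(self, extract_features, predictor, warn):
        self.buffer = deque(maxlen=WINDOW)  # oldest interval drops out automatically
        self.extract_features = extract_features
        self.predictor = predictor
        self.warn = warn

    def update(self, timestamp, station_records):
        """Ingest one 5-min interval; return the prediction for the next interval or None."""
        if len(station_records) != N_STATIONS or any(r is None for r in station_records):
            # Assumption: an incomplete interval breaks the window, so start over
            self.buffer.clear()
            return None
        self.buffer.append(station_records)
        if len(self.buffer) < WINDOW:
            return None  # not enough history yet
        crash_prone = bool(self.predictor(self.extract_features(list(self.buffer))))
        if crash_prone:
            self.warn(timestamp)
        return crash_prone
```

In this sketch, calling `update()` at 07:10 AM with the third complete interval returns the crash-prone decision for the 07:15 AM interval, mirroring the example in the text.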
The prediction results show that all three models classify the actual crash occurrence as having high crash potential. SVM (Level 1a) achieves the lowest FPR, 16.3%, whereas SVM (Level 1b) and MLP (Level 1) achieve FPRs of 20.3% and 18.0%, respectively.
According to the simulation results, although only one crash actually occurred, the three tested predictors label 16.3%, 20.3% and 18.0% of the 5-min intervals, respectively, as having high crash potential. Are all of these false positive predictions necessarily false alarms? To explore this issue, the occupancy profile at CLEVELAND AVE between 07:00 AM and 10:00 AM and the corresponding simulation results are plotted in Fig. 1. Most intervals predicted as having high crash potential (i.e., 07:00 AM to 07:20 AM, 08:10 AM to 08:20 AM, and 09:25 AM to 09:35 AM) fall in periods of high occupancy variation, which is characteristic of shock wave conditions. Since shock wave conditions have been associated with high rear-end crash risk [23], these false positive predictions may also correspond to crash-prone conditions. Therefore, not all false positive predictions are necessarily false alarms, and issuing warnings to drivers for all predicted high-risk intervals could help reduce the occurrence of rear-end crashes.
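The argument above links predicted high-risk intervals to periods of high occupancy variation, but the section does not state how that variation was quantified from Fig. 1. As a hedged sketch only, one simple proxy is the standard deviation of occupancy over a short trailing window of 5-min intervals; the window length, the threshold and both function names are hypothetical choices, not values from the paper. The second function reports what share of the predicted high-risk intervals fall in flagged periods.

```python
from statistics import pstdev


def high_variation_flags(occupancy, window=3, threshold=5.0):
    """Flag 5-min intervals whose trailing-window occupancy std. dev. exceeds threshold.

    occupancy: occupancy values (%) in time order.
    window and threshold are illustrative values, not taken from the paper.
    """
    flags = []
    for i in range(len(occupancy)):
        if i + 1 < window:
            flags.append(False)  # not enough history for a full window
            continue
        recent = occupancy[i + 1 - window:i + 1]
        flags.append(pstdev(recent) > threshold)
    return flags


def share_in_high_variation(predicted, flags):
    """Fraction of intervals predicted as high risk that fall in flagged periods."""
    hits = [f for p, f in zip(predicted, flags) if p]
    return sum(hits) / len(hits) if hits else 0.0
```

A high share would support the reading that many "false" positives coincide with shock-wave-like conditions; a more rigorous check would need a validated shock wave detection method.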
6 Conclusions
1) A framework for predicting freeway rear-end crash potential is proposed using support vector machine (SVM) classifiers. The SVM predictor using 15-min level traffic flow data achieves the highest prediction accuracy, which suggests that traffic flow data closer to the target prediction time form better “rear-end crash precursors”.
2) The feature selection procedure can greatly enhance the performance of SVM models in terms of accuracy and generalization capability. The SVM models possess better prediction capability than MLP models.
3) The proposed SVM models are capable of predicting rear-end crashes caused by adverse traffic flow conditions. Although the non-crash sampling strategy in this work controls for external factors such as roadway geometric design and weather conditions, these factors are not used as inputs for SVM modeling. This suggests that SVM classifiers that also take roadway and environmental conditions into account may perform better. Therefore, future research should be devoted to establishing SVM predictors that consider both real-time traffic flow conditions and external factors.
Fig. 1 Occupancy profile at CLEVELAND AVE and simulation experiment results
Acknowledgments
The authors would like to thank the Wisconsin Traffic Operations and Safety Laboratory, USA, for providing the data.
References
[1] MADANAT S, LIU P C. A prototype system for real-time incident likelihood prediction [R]. Washington D.C.: IDEA Project Final Report (ITS-2). Transportation Research Board of the National Academies, 1995.
[2] OH C, OH J, RITCHIE S, CHAN M. Real-time estimation of freeway accident likelihood [C]// CD-ROM, Transportation Research Board of the National Academies. Washington D.C., 2001.
[3] LEE C, SACCOMANNO F, HELLINGA B. Analysis of crash precursors on instrumented freeways [J]. Transportation Research Record: Journal of the Transportation Research Board, 2002, 1784: 1–8.
[4] LEE C, HELLINGA B, SACCOMANNO F. Real-time crash prediction model for application to crash prevention in freeway traffic [J]. Transportation Research Record: Journal of the Transportation Research Board, 2003, 1840: 67–77.
[5] PANDE A, ABDEL-ATY M. Classification of real-time traffic speed patterns to predict crashes on freeways [C]// CD-ROM, Transportation Research Board of the National Academies. Washington D.C., 2004.
[6] ABDEL-ATY M, UDDIN N, PANDE A, ABDALLA F, HSIA L. Predicting freeway crashes from loop detector data by matched case-control logistic regression [J]. Transportation Research Record: Journal of the Transportation Research Board, 2004, 1897: 88–95.
[7] ABDEL-ATY M, UDDIN N, PANDE A. Split models for predicting multivehicle crashes during high-speed and low-speed operating conditions on freeways [J]. Transportation Research Record: Journal of the Transportation Research Board, 2005, 1908: 51–58.
[8] GOLOB T F, RECKER W W. Relationships among urban freeway accidents, traffic flow, weather, and lighting conditions [J]. Journal of Transportation Engineering, 2003, 129: 342–353.
[9] GOLOB T F, RECKER W W. A method for relating type of crash to traffic flow characteristics on urban freeways [J]. Transportation Research Part A: Policy and Practice, 2004, 38: 53–80.
[10] XU Cheng-cheng, WANG Wei, LIU Pan. A genetic programming model for real-time crash prediction on freeways [J]. IEEE Transactions on Intelligent Transportation Systems, 2013, 14(2): 574–586.
[11] SUN Jie, SUN Jian. A dynamic Bayesian network model for real-time crash prediction using traffic speed conditions data [J]. Transportation Research Part C: Emerging Technologies, 2015, 54: 176–186.
[12] PANDE A, ABDEL-ATY M. Comprehensive analysis of the relationship between real-time traffic surveillance data and rear-end crashes on freeways [J]. Transportation Research Record: Journal of the Transportation Research Board, 2006, 1953: 31–40.
[13] PANDE A, ABDEL-ATY M. Assessment of freeway traffic parameters leading to lane-change related collisions [J]. Accident Analysis & Prevention, 2006, 38: 936–948.
[14] VAPNIK V N. The nature of statistical learning theory [M]. New York: Springer-Verlag, 1995.
[15] FANG Yuan, CHEU Ruey Long. Incident detection using support vector machines [J]. Transportation Research Part C: Emerging Technologies, 2003, 11: 309–328.
[16] ZHANG Yun-long, XIE Yuan-chang. Forecasting of short-term freeway volume with v-support vector machines [J]. Transportation Research Record: Journal of the Transportation Research Board, 2007, 2024: 92–99.
[17] LI Xiu-gang, LORD D, ZHANG Yun-long, XIE Yuan-chang. Predicting motor vehicle crashes using support vector machine models [J]. Accident Analysis & Prevention, 2008, 40: 1611–1618.
[18] LI Zhi-bin, LIU Pan, WANG Wei, XU Cheng-cheng. Using support vector machine models for crash injury severity analysis [J]. Accident Analysis & Prevention, 2012, 45: 478–486.
[19] VAPNIK V N. The support vector method of function estimation [M]. New York: Springer-Verlag, 1998.
[20] BURGES C J C. A tutorial on support vector machines for pattern recognition [M]. New York: Springer-Verlag, 1998.
[21] CHANG Chih-chung, LIN Chih-jen. LIBSVM: A library for support vector machines [J]. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): 27.
[22] CHEN Yi-wei, LIN Chih-jen. Combining SVMs with various feature selection strategies [J]. Studies in Fuzziness & Soft Computing, 2006, 207: 315–324.
[23] HOURDOS J N, GARG V, MICHALOPOULOS P G, DAVIS G. Real-time detection of crash-prone conditions at freeway high-crash locations [J]. Transportation Research Record: Journal of the Transportation Research Board, 2006, 1968: 83–91.
(Edited by FANG Jing-hua)
Foundation item: Project(BK20160685) supported by the Science Foundation of Jiangsu Province, China; Project(61620106002) supported by the National Natural Science Foundation of China
Received date: 2016-12-25; Accepted date: 2017-04-07
Corresponding author: QU Xu, Assistant Professor, PhD; Tel: +86–13584010880; E-mail: quxu@seu.edu.cn