Hybrid tracking model and GSLM based neural network for crowd behavior recognition
Source journal: Journal of Central South University (English Edition), 2017, Issue 9
Authors: Manoj Kumar, Charul Bhatnagar
Pages: 2071-2081
Key words:crowd video; crowd behavior; tracking; recognition; neural network; gravitational search algorithm
Abstract: Crowd behavior analysis is a state-of-the-art research topic in the field of computer vision, with applications in video surveillance ranging from crowd safety to event detection and security. The literature presents a number of works related to crowd behavior detection and analysis. In crowd behavior detection, the varying density of crowds and complex motion patterns with occlusions pose difficulties for researchers. This work presents a novel crowd behavior detection system to overcome these restrictions. The proposed crowd behavior detection system is developed using a hybrid tracking model and an integrated-features-enabled neural network. The object movement and activity in the proposed crowd behavior detection system are assessed using the proposed GSLM-based neural network. The GSLM-based neural network is developed by integrating the gravitational search algorithm with the LM algorithm of the neural network to improve the learning process of the network. The performance of the proposed crowd behavior detection system is validated over five different videos and analyzed using accuracy. The experimentation results in crowd behavior detection with a maximum accuracy of 93%, which proves the efficacy of the proposed system in video surveillance with security concerns.
Cite this article as: Manoj Kumar, Charul Bhatnagar. Hybrid tracking model and GSLM based neural network for crowd behavior recognition [J]. Journal of Central South University, 2017, 24(9): 2071–2081. DOI: https://doi.org/10.1007/s11771-017-3616-4.
J. Cent. South Univ. (2017) 24: 2071-2081
DOI: https://doi.org/10.1007/s11771-017-3616-4
Manoj Kumar, Charul Bhatnagar
Department of Computer Engineering and Applications, GLA University, Mathura, India
Central South University Press and Springer-Verlag GmbH Germany 2017
1 Introduction
In the field of computer vision, intelligent video surveillance is one of the state-of-the-art research areas because of sensitive security concerns. Many research topics are available in intelligent video surveillance; nevertheless, tracking and behavior analysis from crowded video are considered significant problems because of a number of applications comprising behavior modeling, traffic control, event monitoring and security applications [1-4]. However, tracking and behavior detection in a dense crowd are challenging [5, 6]. This is because, in a crowd, a large number of objects are close to each other, which makes it tough to establish correspondences across frames. In recent years, a number of security agencies focused on dense crowd management have emerged to respond to this need.
Attention to automatic crowd behavior detection systems has grown across the research community because of abnormal crowd behavior in public events [7]. Crowd behavior analysis is majorly performed in two tasks: 1) motion information extraction and 2) abnormal behavior modeling. Motion information extraction is centered on crowd tracking, which is the process used to estimate the speed, direction and location of the crowd in a video sequence. The latter task, abnormal behavior modeling, is used to detect anomalous events [8].
Conversely, abnormal behavior modeling is hampered by the general difficulties of the anomaly detection problem [9]. The basic restriction is the lack of a universal definition of anomaly. Another constraint is the infeasibility of enumerating the set of anomalies that are present in a given surveillance scenario [10]. Such an enumeration is impossible to achieve because of the sparseness, rarity and discontinuity of anomalous events, which limits the number of examples available to train an anomaly detection system. These constraints make anomaly detection an exceptionally challenging problem. While this has motivated a great diversity of solutions, it is generally quite difficult to quantitatively compare different methods. Normally, these combine dissimilar representations of motion and appearance with different graphical models of normalcy, which are usually tailored to specific scene domains. Abnormalities are themselves defined in a somewhat subjective form, sometimes according to what the algorithms can detect. In some cases, different authors even define different anomalies on common data sets [10].
In this work, we considered five different videos, comprising marathon sequences as well as pedestrian videos. Vehicles in the considered videos are treated as anomalies. This work presents a crowd behavior detection system using integrated features from a hybrid tracking model and a GSLM-based neural network. Primarily, the input videos are fed to the hybrid tracking model. In the hybrid tracking model, which comprises an EWMA model and a motion model, the object movements in the video are tracked. Subsequently, after tracking path identification, the characteristics of the objects in the video are found by making use of twelve features extracted from the tracking path. The extracted features are used to train the GSLM-based neural network to classify the direction of movement and the activity. The GSLM-based neural network is proposed by integrating the LM algorithm and the gravitational search algorithm (GSA) to enhance the learning process by finding the optimal training weights.
2 Literature review
Data-driven methods for crowd tracking exploit learned patterns to handle fast and dense crowd flow. Here, the computational task of searching motion patterns seems problematic [11]. FRADI et al [12] proposed a crowd-density-map-based tracking method for monitoring persons in a crowd in video surveillance data. It has the advantage of using local features as an observation of a probabilistic density function, whereas the inflexibility of person detection under large environmental changes remains tricky. An approach of confirmation-by-classification for crowd behavior detection under extreme occlusion was proposed by ALI et al [13]. However, the output is unreliable in this approach. A more robust crowd behavior detection system based on a convolutional neural network was proposed by CAO et al [14]. This method requires sufficiently large and diverse data. A discriminative structured prediction model captures the interdependence of multiple influence factors [15]. Labeling errors are a critical issue in this method.
An LNND descriptor based anomaly detection system was proposed by HU et al [16]. WU et al [17] proposed a Bayesian framework for escape detection by directly modeling crowd motion in both the presence and absence of escape events. The concepts of potential destinations and divergent centers are introduced in this method for characterizing crowd motion and constructing the corresponding class-conditional probability density functions of optical flow. An anomaly detection system based on an acceleration feature was proposed by CHEN et al [18]. Differing from the preceding works, this method explores the global moving relation between the current behavior state and the previous behavior state. LI et al [10] proposed an anomaly detector that spans time, space and spatial scale, using a joint representation of video appearance and dynamics and globally consistent inference. The crowd scenes are modeled with a hierarchy of mixture-of-dynamic-textures (MDT) models in this method, which equates temporal anomalies to background subtraction and spatial anomalies to discriminant saliency, and integrates anomaly scores across time, space and scale with a CRF. SAXENA et al [8] discussed crowd feature selection and extraction and proposed a multiple-frame feature point detection and tracking method based on the KLT tracker.
3 Motivation behind approach
Problem definition: Assume an input crowded video V represented as a sequence of frames, V = {v1, v2, …, vn}. The crowded input video contains multiple objects, and the utmost intention is to track the moving path of every jth object using the frame sequence. The deciding challenge considered here is to classify every jth object based on movement and behavior. The object's movement is to be classified as left, right, front or back, and the behavior of the object is to be classified as normal or abnormal.
Challenges: The classification of object’s movement and behavior becomes a challenging issue due to the following reasons:
1) The overlapping of objects in crowd video directly affects the estimation of direction as well as behavior;
2) The characteristic definition to classify the abnormal event is very challenging because when an abnormal event happens, everyone will try to escape from the location;
3) Low video quality makes it difficult to develop an automatic, robust system with noise adaptability;
4) The important challenge on algorithmic part is the correct selection and utilization of intelligence algorithms and image processing techniques to effectively classify the direction and behavior.
Contributions of this work: The first contribution is to effectively select the relevant feature set preserving the characteristics about direction and behavior. Here, we define 12 features to preserve the direction and behavior dependent characteristics.
The second contribution is to develop a hybrid learning algorithm for the neural network to update weight in training for classification of direction and behavior. Here, the existing Levenberg–Marquardt algorithm [19] is integrated with the GSA to develop a new learning algorithm, called GSLM.
4 Proposed method: crowd behavior recognition using hybrid tracking model and GSLM neural network
This section presents the proposed crowd behavior recognition using the hybrid tracking model and the GSLM neural network. The processing steps involved in the proposed crowd behavior detection system are: 1) object tracking; 2) feature extraction, and 3) GSLM-NN-based crowd behavior detection. In object tracking, the persons present in the crowd video are tracked using the zero-stopping-constraint-based hybrid tracking model [20]. In feature extraction, significant physical parameters of the tracked objects are extracted as features. Subsequently, the proposed GSLM-based neural network is used to detect the crowd behavior, namely the direction of movement (four directions) and the activity of the person (normal or abnormal). The block diagram of the proposed crowd behavior recognition is given in Fig. 1.
4.1 Object path tracking
Object path tracking is the first step in the proposed method. Path tracking is considered significant since the direction of movement as well as the behavior of a person is identified based on the object path characteristics. Object tracking, i.e. person tracking, in the proposed method is performed using the zero-stopping-constraint-based hybrid tracking model given in [20]. Primarily, head objects are estimated by a neighborhood search algorithm and visual-based tracking. After head object extraction, the movement of the head part of the object is tracked using a motion estimation model and an EWMA model. The tracking results are combined under the zero-stopping constraint.
4.1.1 Optimal head object extraction
For optimal head part extraction, a reference point from the video sequence is required. Initially, the input video V is read and the frames are extracted. The extracted frames are given as input to the hybrid tracking model. Let us assume that the input video contains n frames. At first, reference points are randomly selected from the first frame v1 to find the head parts of the human objects present in the first frame:

Rj = (xj, yj); 1 ≤ xj ≤ M, 1 ≤ yj ≤ N

where M×N is the size of the image.
After reference point Rj extraction, the head part is found using the neighborhood-based estimation procedure, in which a minimum bound rectangle is designed by setting the reference point as the centre pixel with height h and width w; then, the reference point is moved along the left and right directions in the same key frame to find the optimal reference point. The optimal reference point selected by the neighborhood search procedure is the optimal head object.
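The neighborhood search just described can be sketched as follows. This is a minimal sketch, not the paper's implementation: the head-likelihood function `score` is a hypothetical placeholder (the paper does not specify how candidate rectangles are scored), and the horizontal search range `max_shift` is an illustrative assumption.

```python
import numpy as np

def neighborhood_search(frame, ref, h, w, score, max_shift=5):
    """Slide a reference point left/right and keep the position whose
    h-by-w minimum bound rectangle scores highest under `score`.

    `score` is a hypothetical head-likelihood function taking an image
    patch and returning a float; it stands in for the unspecified
    visual criterion used in the paper.
    """
    rows, cols = frame.shape[:2]
    y, x = ref
    best_x, best_val = x, -np.inf
    for dx in range(-max_shift, max_shift + 1):   # search along the row
        cx = x + dx
        top, left = y - h // 2, cx - w // 2
        if top < 0 or left < 0 or top + h > rows or left + w > cols:
            continue                               # rectangle leaves the frame
        val = score(frame[top:top + h, left:left + w])
        if val > best_val:
            best_val, best_x = val, cx
    return (y, best_x)
```

With a brightness-based score, the search locks onto the brightest h×w patch reachable from the initial reference point along its row.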
4.1.2 Tracking procedure
When the optimal head object is detected, the movement of the head part is found using the motion-based estimation model and the EWMA model. Generally, the tracking procedure in motion-based estimation is done by matching the objects in the current frame to the next frame. Let Mj be the path tracked by the motion-based estimation procedure for the jth object. Consequently, the head location identified in the previous step is appended to this path.
According to the EWMA model, the tracking path is defined as follows:

Ej(t+1) = DF·Rj(t) + (1 − DF)·Ej(t)

where Ej(t+1) is the tracking location of the jth object in the (t+1)th frame by the EWMA model; Ej(t) is the tracking location of the jth object in the tth frame by the EWMA model; DF is the decay factor, and Rj(t) is the reference location of the jth object in the tth frame.
Fig. 1 Block diagram of the proposed crowd behavior recognition system
Finally, the tracked outputs from the motion-estimation model and the EWMA model are integrated under the zero-stopping constraint to obtain the original tracking path of the jth object in the input crowded video V.
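The EWMA smoothing step above (the new location as a decay-factor blend of the per-frame reference location and the previous estimate) can be sketched as follows. The decay factor value 0.3 and the simple per-frame loop are illustrative assumptions, not values from the paper.

```python
def ewma_step(prev, ref, df=0.3):
    """One EWMA update: new location = df * reference + (1 - df) * previous.
    df plays the role of the decay factor DF; 0.3 is an arbitrary
    illustrative value."""
    return tuple(df * r + (1 - df) * p for p, r in zip(prev, ref))

def track_path(refs, df=0.3):
    """Smooth a sequence of per-frame reference locations into a path."""
    path = [refs[0]]
    for ref in refs[1:]:
        path.append(ewma_step(path[-1], ref, df))
    return path
```

A small df keeps the estimate stable against jittery reference detections; a large df follows the detections more closely.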
4.2 Feature extraction
To find the direction of movement and the behavior of the jth object, feature points are extracted from the tracked path Tj of the jth object of the input video. Features that capture the characteristics of the direction and behavior of the object are necessary for crowd behavior detection. In this proposed method, six features, namely conditional increment, conditional decrement, conditional irregular, speed, variance and entropy, are selected as feature points. The features conditional increment, conditional decrement and conditional irregular preserve the characteristics of the direction of every object. The latter features, speed, variance and entropy, preserve the behavior of the crowd objects. The tracking path Tj of the jth object of the input video can be represented as follows:

Tj = {lj(1), lj(2), …, lj(n)};  lj(t) = (xj(t), yj(t))

where lj(1) = (xj(1), yj(1)) gives the x and y coordinates of the first location of the jth object. The differential path is computed from the tracked path by taking the successive difference of locations:

dj(t) = lj(t+1) − lj(t);  1 ≤ t ≤ n−1
From the differential path, the conditional increment, conditional decrement and conditional irregular feature points are computed. The feature parameters conditional increment and conditional decrement are used to check whether the direction of movement of an object is always in the positive direction or the negative direction, respectively. The feature parameter conditional irregular is utilized to capture the direction of movement when the direction changes irregularly over time. These features are computed from the sign pattern of the differential path.
The fourth feature considered here is the speed of the object. Generally, the speed of an object is the ratio of the distance travelled to the time duration. The speed of the crowd object is computed from the tracking path as follows:

Sj = (Nl·Dj·nT)/nx

where Nl is the mapping distance (m) corresponding to one pixel; Dj is the total distance travelled in terms of pixels; nx is the number of frames travelled, and nT is the number of frames per second.
The fifth and sixth parameters considered in the proposed method are statistical measures, which are also important for finding the behavior of the crowd object. The variance is computed as follows:

Var(lx) = (1/n) Σt (lx(t) − mean(lx))²

The entropy is computed as follows:

H(lx) = −Σ p(u(lx)) log2 p(u(lx))

where u(lx) denotes the unique data values in the location path lx and p(u(lx)) their relative frequencies. Likewise, the same six features are extracted from the y locations of the tracking path. Accordingly, from the locations of the tracked path, 12 feature points are extracted to detect the direction and behavior of the crowd object. The final feature vector of the jth object is represented as follows:

Fj = {f1, f2, …, f12}
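The per-axis feature extraction can be sketched as follows. Since the paper does not give exact formulas for the three conditional features, the counting interpretation below (fractions of increasing, decreasing and sign-changing steps of the differential path) is an assumption, as are the speed constants `n_l` (metres per pixel) and `n_t` (frames per second).

```python
import numpy as np

def axis_features(coords, n_l=0.05, n_t=25):
    """Six features for one coordinate axis of a tracked path (needs at
    least 3 points). The counting form of the conditional features and
    the constants n_l, n_t are illustrative assumptions."""
    c = np.asarray(coords, dtype=float)
    d = np.diff(c)                              # differential path
    inc = np.mean(d > 0)                        # conditional increment
    dec = np.mean(d < 0)                        # conditional decrement
    irr = np.mean(np.diff(np.sign(d)) != 0)     # conditional irregular
    dist = np.abs(d).sum()                      # pixel distance travelled
    speed = n_l * dist * n_t / len(d)           # metres per second
    var = c.var()                               # variance of the locations
    _, counts = np.unique(c, return_counts=True)
    p = counts / counts.sum()                   # relative frequencies
    entropy = -(p * np.log2(p)).sum()           # entropy over unique values
    return [inc, dec, irr, speed, var, entropy]

def path_features(path):
    """Concatenate x-axis and y-axis features into the 12-element vector."""
    xs, ys = zip(*path)
    return axis_features(xs) + axis_features(ys)
```

For a path moving steadily rightward, the x-axis conditional increment is 1, the conditional decrement and irregular are 0, and the y-axis features are near zero.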
4.3 GSLM-neural network based crowd behavior detection
When the features are extracted from the tracking path of every object in the input crowd video, the proposed GSLM-based neural network is exploited to find the direction and activity of the objects. The feed-forward neural network (FFNN) [21] is one of the widely used artificial intelligence models for performing classification using two important processes, training and testing. In the training process, the neurons in the network are trained by finding the optimal weights; the learnt weights are then used to find the class labels of the tracked objects for the direction of movement and the activity of the object. An optimal combination of connection weights in the learning process is compulsory to achieve classification with minimal error. However, in the learning algorithm of the FFNN, the weight solution converges locally but not globally.
4.3.1 GSLM network architecture
The first step is the initialization of the neural network. Primarily, the input and output layers are initialized with the appropriate dimensions. In the proposed method, the 12 features are initialized as inputs, and the six bit elements required to encode the two outputs (direction and activity) are initialized as outputs. The architecture of the adapted neural network is represented in Fig. 2.
The first four output bit elements in the network correspond to the direction of the object and the last two outputs correspond to the activity of the object. The number of hidden layers is set to one, and the number of hidden neurons is determined from experimentation.
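The 12-Q-6 architecture just described can be sketched as a minimal forward pass. The sigmoid activation and the example hidden size Q = 20 are assumptions for illustration; the paper fixes Q experimentally and does not name its activation function.

```python
import numpy as np

def forward(x, w_ih, b_h, w_ho, b_o):
    """Forward pass of the 12-Q-6 feed-forward network.
    Sigmoid activations are an assumption, not stated in the paper."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = sig(x @ w_ih + b_h)       # hidden layer, shape (Q,)
    return sig(hidden @ w_ho + b_o)    # (6,): 4 direction bits + 2 activity bits

Q = 20  # illustrative hidden-layer size; the paper tunes Q from 10 to 50
rng = np.random.default_rng(0)
w_ih, b_h = rng.normal(size=(12, Q)), rng.normal(size=Q)   # 12*Q + Q weights
w_ho, b_o = rng.normal(size=(Q, 6)), rng.normal(size=6)    # Q*6 + 6 weights
y = forward(rng.normal(size=12), w_ih, b_h, w_ho, b_o)
```

The weight shapes make the parameter count explicit: 12·Q input-to-hidden weights plus Q hidden biases, and Q·6 hidden-to-output weights plus 6 output biases.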
4.3.2 GSLM training process
The GSLM training process begins with random initialization of the weights. In the adapted neural network, 36 weighted elements connecting the input layer to the hidden layer and Q×6 weighted elements connecting the hidden layer to the output layer are initialized with random values at the start of the learning process. Besides, Q bias weights in the hidden layer and six bias elements connected to the output layer are needed. So, the weight vector can be represented as

W = {w1, w2, …, wx}

where the wi are the node weights and bias weights. The initialized weights are dynamically updated in consecutive iterations t. The formula used for updating the weights with the LM algorithm [19] is given as

W(t+1) = W(t) − (J^T·J + μI)^(−1)·g

where μ is the Levenberg damping factor, which ranges from 0 to 1; I is the identity matrix; J is the Jacobian matrix of the system, obtained by taking the first-order partial derivatives of the vector-valued network function F(r, W) with respect to each weight; r is the feature vector given to the neural network; W are the weights of the network, and x is the total number of weights defined in the neural network, including node and bias weights. The gradient matrix g is computed using the following equation:

g = J^T·E

where E is the vector of network output errors.
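The LM weight update described above, with gradient g = J^T·e and damped curvature J^T·J + μI, can be sketched as a single step. This is a generic Levenberg-Marquardt step, not the paper's code; the damping value is illustrative.

```python
import numpy as np

def lm_step(w, jacobian, error, mu=0.01):
    """One Levenberg-Marquardt update:
    W(t+1) = W(t) - (J^T J + mu*I)^(-1) g,  with gradient g = J^T e."""
    J = np.atleast_2d(jacobian)
    g = J.T @ error                       # gradient matrix g = J^T e
    H = J.T @ J + mu * np.eye(len(w))     # damped approximate Hessian
    return w - np.linalg.solve(H, g)
```

On a linear least-squares residual with mu = 0 this reduces to a Gauss-Newton step and reaches the exact minimizer in one update.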
The computed weights are applied to the neural network function, represented as

Y = F(r, WLM)

where F(r, WLM) is the network function computed for every feature vector r of the training signal using the LM-updated weight vector WLM, and Y is the output vector approximated (predicted) by the network. For each input vector used in training the network, the error value is computed as

ELM = Σ(Y − D)²

where D is the desired (target) output vector.
The error value in the output of the NN with the LM-updated weight can remain large, which raises the overfitting problem in the NN. As a solution, in the proposed neural network, a new weight based on the gravitational search algorithm [22] is used to find the optimal combination of weights in the learning process. In the current iteration, two weights, W(t) and the LM-updated weight WLM, are present. These weights are taken as the initial solutions of the GSA algorithm. After initialization, the fitness values of the solutions are calculated. After function evaluation, the gravitational constant is updated, and the masses, accelerations and forces for each solution are found. The solution with the heaviest mass is considered the fittest solution. From the gravitational search algorithm, position and velocity updates are applied to the solution vectors to generate a new weight. The search process is iterated with the generation of all masses and forces. The final weight vector obtained after the velocity and position update process is denoted WGSA.
Fig. 2 Architecture of adapted neural network
The GSA-generated weights are again applied to the neural network function to obtain the neural network output for error computation.
The error values due to the LM-updated weight and the GSA-updated weight, ELM and EGSA, are computed, and the one with the lower value is assigned as the error value E(t+1) for the current iteration; its corresponding weight is assigned as the final weight W(t+1) for the current iteration. Since the weight is generated by an optimal search procedure, the solution converges both locally and globally, minimizing the error value.
The error of the current iteration E(t+1) and that of the previous iteration E(t) are compared. If the error has decreased, μ is decreased by a factor v; if the error has increased, μ is increased by a factor v. This process is repeated for T iterations, and the final weights are taken as the trained weights, which are then used for the direction and activity estimation of the crowd object. Table 1 shows the pseudo code of the proposed GSLM algorithm.
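The selection-and-damping loop just described (take an LM candidate and a GSA candidate each iteration, keep whichever gives the lower error, then adapt μ by the factor v) can be sketched as a training skeleton. The `lm_step` and `gsa_step` callables are placeholders standing in for the two full update rules, and the default v and μ values are illustrative assumptions.

```python
def gslm_train(w, lm_step, gsa_step, error, T=100, mu=0.01, v=10.0):
    """GSLM training skeleton following the iteration structure described
    in the paper: each iteration produces an LM candidate and a GSA
    candidate, keeps the one with the lower error, and adapts the
    damping factor mu by the factor v."""
    e_prev = error(w)
    for _ in range(T):
        w_lm, w_gsa = lm_step(w, mu), gsa_step(w)
        # keep the candidate weight with the smaller error value
        w, e = min(((w_lm, error(w_lm)), (w_gsa, error(w_gsa))),
                   key=lambda pair: pair[1])
        mu = mu / v if e < e_prev else mu * v   # adapt damping factor
        e_prev = e
    return w
```

With toy update rules that shrink a scalar weight toward zero, the loop always picks the faster-shrinking candidate, so the weight decays at the stronger rate.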
5 Results and discussion
5.1 Experimental set up
The proposed crowd behavior recognition system is experimented on a system with the following configuration: 4 GB RAM, an Intel processor, and a 64-bit Windows OS. The software tool used for implementation is Matlab 8.3 (R2014a). In the experimentation, the number of hidden layers in the neural network is fixed to one and the number of hidden neurons is obtained from experimentation.
5.2 Dataset description
The experimental validation is performed over five different datasets. Three datasets, Marathon-1, Marathon-2 and Marathon-3, are taken from the high-density crowd dataset [23]. Two additional datasets, UCSD-bicycle and UCSD-car, are taken from the UCSD datasets [24].
Marathon-1: This dataset consists of a video sequence of participants in a marathon captured from an overhead camera. This sequence is considered hard because of the tight packing of the participants and the similar-looking outfits worn by them. The sequence has 492 frames, but each athlete remains in the field of view, on average, for 120 frames.
Table 1 GSLM algorithm
Marathon-2: This sequence also comprises a marathon. The variation is that the camera capturing the sequence is installed on a multistory building. As a result, the number of pixels on each individual is fewer. Illumination change is also present in this sequence because athletes move into the shadows of the neighbouring buildings. It consists of 333 frames.
Marathon-3: The third sequence is also a marathon sequence which is exceptionally challenging due to two aspects: 1) appearance drastically changes due to the U-shape of the path; 2) the number of pixels on target varies due to the perspective effect. The fewer pixels make it more difficult to resolve even partial occlusions. It consists of 453 frames.
UCSD-bicycle: This video has 100 frames which contain movement of multiple persons with bicycle.
UCSD-car: This video has 100 frames which contain movement of multiple persons with car.
5.3 Validation measure
The performance of the classification in estimation of direction as well as activity detection in proposed crowd behavior detection system is validated using classification accuracy A. It is defined as follows:
A = (TP + TN)/(TP + TN + FP + FN)

where true positives (TP) are correctly identified; false positives (FP) are incorrectly identified; true negatives (TN) are correctly rejected, and false negatives (FN) are incorrectly rejected.
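The accuracy measure defined above is a one-line computation; a minimal sketch:

```python
def accuracy(tp, tn, fp, fn):
    """Classification accuracy A = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)
```

For example, 90 true positives and 3 true negatives out of 100 decisions give A = 0.93, matching the paper's best reported figure.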
5.4 Comparative methodology
The performance of the proposed method is compared with four techniques: particle filtering+LM, particle filtering+GSLM, hybrid model+LM, and hybrid model+GSLM. The comparative techniques for the analysis are developed by integrating tracking models and classification models. Particle filtering [25] and the hybrid model [20] are the tracking models considered, and the LM-based neural network and the GSLM-based neural network are the classification models considered.
5.5 Experimental results
The sample results of the proposed crowd behavior detection method are explained in this section. Figure 3(a) shows a sample frame from marathon-1. Marathon-1 contains 492 frames with 100 objects. Figure 3(b) shows the optimal head objects of marathon-1 detected from the input frame using the neighborhood search algorithm, marked with rectangular boxes. After optimal head object estimation, path tracking is performed with the hybrid model. Figure 3(c) shows the path tracked by the hybrid tracking model, from which the features are extracted for crowd behavior detection. The consecutive circle markings in Fig. 3(c) represent the tracked path of the object. Similarly, Figs. 4-7 show the sample intermediate results for marathons-2 and 3, UCSD-bicycle and UCSD-car.
5.6 Performance analysis based on direction of movement detection
This section shows the performance evaluation of the proposed method in detecting the direction of movement of the 100 objects present in marathons 1-3; the UCSD videos contain 9 objects. Figure 8(a) shows the accuracy graph for marathon-1 in detecting the direction of the 100 objects for various numbers of hidden neurons (Q). Here, the number of hidden neurons is varied from 10 to 50 and the accuracy is obtained for four different methods. From the graph, the accuracies of particle filtering+LM, particle filtering+GSLM, hybrid model+LM and hybrid model+GSLM are 66.67%, 35%, 66.67% and 93%, respectively, when Q is fixed to 40. The accuracy of each method increases when Q increases, and the accuracies of the four methods are 87.5%, 77.77%, 87.5% and 93%, respectively, when Q is fixed to 50. The maximum accuracy of 93% is obtained by hybrid model+GSLM.
Fig. 3 Photos of sample frame from marathon-1 (a), detected object (b) and tracking path (c)
Fig. 4 Photos of sample frame from marathon-2 (a), detected object (b) and tracking path (c)
Fig. 5 Photos of sample frame from marathon-3 (a), detected object (b) and tracking path (c)
Fig. 6 Photos of sample frame from UCSD-bicycle (a), detected object (b) and tracking path (c)
Fig. 7 Photos of sample frame from UCSD-car (a), detected object (b) and tracking path (c)
Fig. 8 Accuracy graphs of performance in detecting direction of movement: (a) Marathon-1; (b) Marathon-2
Figure 8(b) shows the accuracy graph for marathon-2 for various Q values. When Q is fixed to 20, the accuracy values attained by particle filtering+LM, particle filtering+GSLM, hybrid model+LM and hybrid model+GSLM are 77.77%, 88%, 35% and 88.89%, respectively. From Fig. 8(b), hybrid model+GSLM achieves the maximum accuracy for all the different Q values. The maximum accuracy reached by hybrid model+GSLM is 93%, which is higher than that of all the existing methods.
Figure 9(a) compares the accuracy of the proposed method with the existing methods on marathon-3. From the analysis of marathon-3, the maximum accuracies of particle filtering+LM, particle filtering+GSLM, hybrid model+LM and hybrid model+GSLM are 88.89%, 88.89%, 87.5% and 93% when Q is fixed to 50. Comparing the performance of all four methods, the proposed hybrid model+GSLM outperforms all the existing methods for various Q values. Similarly, the performance comparison for UCSD-bicycle is given in Fig. 9(b). Here, the accuracy curve increases as Q is increased from 10 to 50. For a Q value of 40, the proposed hybrid model+GSLM achieves an accuracy of 88.89%, compared with particle filtering+LM, which obtains an accuracy of 22.22%. Overall, the maximum accuracy reached by the proposed hybrid model+GSLM is 93%, which is higher than that of the other methods.
Fig. 9 Accuracy graphs of performance in detecting direction of movement: (a) Marathon-3; (b) UCSD-bicycle
Figure 10 shows the accuracy graph for UCSD-car. The accuracy graph is plotted by varying Q from 10 to 50. Figure 10 compares the four methods, particle filtering+LM, particle filtering+GSLM, hybrid model+LM and hybrid model+GSLM, in estimating the direction of movement of objects. When Q is fixed to 50, particle filtering+LM obtains 35% and particle filtering+GSLM also obtains an accuracy of 35%. For the same Q value, hybrid model+LM and hybrid model+GSLM each obtain an accuracy of 66.67%. The best-case accuracy of 93% is obtained by the proposed method and the worst-case value of 22.22% is attained by particle filtering+LM and particle filtering+GSLM. From the graph, we can see that the proposed hybrid model+GSLM outperforms the existing methods for all Q values.
Fig. 10 Accuracy graph of performance detecting direction of movement for UCSD-car
5.7 Performance analysis based on activity detection
This section shows the performance evaluation of the proposed method in activity detection of the 100 objects present in marathons 1-3; UCSD-bicycle and UCSD-car contain 9 objects. The accuracy graph is plotted by varying Q from 10 to 50. Figure 11 compares the four methods, particle filtering+LM, particle filtering+GSLM, hybrid model+LM and hybrid model+GSLM, in detecting whether the activity of an object is normal or abnormal. From Fig. 11(a), the accuracy curve increases as Q is increased from 10 to 50. For a Q value of 10, the proposed hybrid model+GSLM achieves an accuracy of 88.89%. The maximum accuracy of 93% is attained by the proposed method. From Fig. 11(b), the maximum accuracy reached by the proposed hybrid model+GSLM on marathon-2 is 93%, which is higher than that of the other methods.
Figure 12(a) shows the accuracy graph for marathon-3. When Q is fixed to 10, particle filtering+LM obtains 88.89% and particle filtering+GSLM also obtains an accuracy of 88.89%. For the same Q value, hybrid model+LM and hybrid model+GSLM obtain accuracies of 66.67% and 93%, respectively. From the graph, we can see that the proposed hybrid model+GSLM outperforms the existing methods for all Q values. Similarly, the performance comparison for UCSD-bicycle is given in Fig. 12(b). For a Q value of 10, the proposed hybrid model+GSLM achieves an accuracy of 88.89%, whereas the existing particle filtering+LM, particle filtering+GSLM and hybrid model+LM methods obtain accuracies of 44.44%, 44.44% and 88.89%, respectively. The best-case accuracy of 93% is attained by the proposed method.
Figure 13 shows the accuracy graph for UCSD-car in detecting the activity of 9 objects for various numbers of hidden neurons (Q). Here, the number of hidden neurons is varied from 10 to 50 and the accuracy is obtained for the four methods. From the graph, particle filtering+LM, particle filtering+GSLM, hybrid model+LM and hybrid model+GSLM achieve accuracies of 33.33%, 22.22%, 88.89% and 88.89%, respectively, when the Q value is fixed to 10. The accuracy of each method increases with the Q value, and all four methods reach their maximum when Q is fixed to 50. The maximum accuracy of 93% is obtained by the hybrid model+GSLM.
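The accuracy values quoted above follow the standard definition: the percentage of objects whose predicted label (direction of movement, or normal/abnormal activity) matches the ground truth. As a minimal illustration (the helper function and sample labels below are our own, not taken from the paper):

```python
def accuracy(predicted, actual):
    """Percentage of objects whose predicted label matches the ground truth."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)

# e.g., 8 of the 9 UCSD-car objects classified correctly gives 8/9 = 88.89%
print(round(accuracy([1, 1, 0, 1, 1, 1, 0, 0, 1],
                     [1, 1, 0, 1, 1, 1, 0, 1, 1]), 2))
```

This also explains why the per-video accuracies cluster at values such as 88.89%, 66.667% and 22.22%: with 9 objects, each additional correct classification changes the accuracy in steps of 1/9.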
Fig. 11 Accuracy graphs of performance in activity detection:
Fig. 12 Accuracy graphs of performance in activity detection:
Fig. 13 Accuracy graph of performance in activity detection for UCSD-car
6 Conclusions
A novel crowd behavior detection system using a hybrid tracking model and a GSLM based neural network is presented. The hybrid tracking model tracks the path of each object in the crowded video sequence. Twelve features preserving the characteristics of the objects, used for estimating both direction and activity, are extracted from the tracked path. These features are fed to the proposed GSLM based neural network to identify the direction of movement of the objects as well as to recognize their activity. In the GSLM based NN, learning is performed by the combined action of the LM algorithm and the gravitational search algorithm. Experimentation is conducted over five different crowded video sequences, and the performance is validated against existing works using classification accuracy as the measure. The experimental results prove the efficacy of the proposed crowd behavior detection system, with a maximal accuracy of 93%. The idea of incorporating modern heuristic algorithms for finding the optimal neural network architecture can be extended in future works.
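For readers unfamiliar with the gravitational search component, the following is a minimal, illustrative sketch of the GSA update rules of Rashedi et al. [22]: agents with better fitness receive larger masses, every agent is accelerated toward heavier agents under a decaying gravitational constant, and positions are updated from the resulting velocities. The function, parameter names and the sphere test function are our own; this is not the paper's GSLM implementation, which couples these updates with LM training of the network weights.

```python
import math
import random

def gsa_minimize(f, dim, n_agents=20, iters=200, lo=-5.0, hi=5.0,
                 g0=10.0, alpha=20.0, seed=0):
    """Sketch of the gravitational search algorithm (Rashedi et al., 2009)."""
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    V = [[0.0] * dim for _ in range(n_agents)]
    best_x, best_f = None, float("inf")
    for t in range(iters):
        fit = [f(x) for x in X]
        best, worst = min(fit), max(fit)
        if best < best_f:
            best_f, best_x = best, list(X[fit.index(best)])
        # Masses: better (lower) fitness -> larger normalized mass
        denom = (best - worst) or 1e-12
        m = [(fi - worst) / denom for fi in fit]
        s = sum(m) or 1e-12
        M = [mi / s for mi in m]
        # Gravitational constant decays over the iterations
        G = g0 * math.exp(-alpha * t / iters)
        for i in range(n_agents):
            acc = [0.0] * dim
            for j in range(n_agents):
                if i == j:
                    continue
                R = math.dist(X[i], X[j]) + 1e-12
                for d in range(dim):
                    acc[d] += rng.random() * G * M[j] * (X[j][d] - X[i][d]) / R
            for d in range(dim):
                V[i][d] = rng.random() * V[i][d] + acc[d]
                X[i][d] = min(max(X[i][d] + V[i][d], lo), hi)
    return best_x, best_f
```

In the GSLM setting described above, the candidate solutions would be neural network weight vectors and the fitness would be the training error, with the LM step refining the solutions found by the gravitational search.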
References
[1] HOFMANN M, HAAG M, RIGOLL G. Unified hierarchical multi-object tracking using global data association [C]// IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), 2013: 22–28.
[2] PÄTZOLD M, SIKORA T. Real-time person counting by propagating networks flows [C]// IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS). Klagenfurt, Austria, 2011: 66–70.
[3] BREITENSTEIN M D, REICHLIN F, LEIBE B, MEIER E K, GOOL L V. Robust tracking-by-detection using a detector confidence particle filter [C]// IEEE International Conference on Computer Vision. Kyoto, Japan, 2009: 1515–1522.
[4] EISELEIN V, ARP D, PÄTZOLD M, SIKORA T. Real-time multi-human tracking using a probability hypothesis density filter and multiple detectors [C]// IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance (AVSS). Beijing, China, 2012: 325–330.
[5] WU Z, HRISTOV N, HEDRICK T, KUNZ T, BETKE M. Tracking a large number of objects from multiple views [C]// IEEE 12th International Conference on Computer Vision. Kyoto, Japan, 2009: 1546–1553.
[6] SONG X, SHAO X, ZHAO H, CUI J, SHIBASAKI R, ZHA H. An online approach: Learning-semantic-scene-by-tracking and tracking- by-learning-semantic-scene [C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA, USA, 2010: 739–746.
[7] ZHAN B B, MONEKOSSO D N, PAOLO R, SERGIO A V, XU L Q. Crowd analysis: A survey [J]. Machine Vision and Applications, 2008, 19(5): 345-357.
[8] SAXENA S, BREMOND F, THONNAT M, MA R. Crowd behavior recognition for video surveillance [C]// Proceedings of 10th International Conference on Advanced Concepts for Intelligent Vision Systems (LNCS 5259), 2008: 970–981.
[9] CHANDOLA V, BANERJEE A, KUMAR V. Anomaly detection: A survey [J]. ACM Computing Surveys, 2009, 41(3): 1–72.
[10] LI Wei-xin, MAHADEVAN V, VASCONCELOS N. Anomaly detection and localization in crowded scenes [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(1): 18–32.
[11] RODRIGUEZ M, SIVIC J, LAPTEV I, AUDIBERT J Y. Data-driven crowd analysis in videos [C]// IEEE International Conference on Computer Vision (ICCV). Barcelona, Spain, 2011: 1235–1242.
[12] FRADI H, EISELEIN V, DUGELAY J L, KELLER I, SIKORA T. Spatio-temporal crowd density model in a human detection and tracking framework [J]. Signal Processing: Image Communication, 2015, 31:100–111.
[13] ALI I, DAILEY M N. Multiple human tracking in high-density crowds [J]. Image and Vision Computing, 2012, 30(12): 966–977.
[14] CAO Li-jun, ZHANG Xu, REN Wei-qiang, HUANG Kai-qi. Large scale crowd analysis based on convolutional neural network [J]. Pattern Recognition, 2015, 48(10): 3016–3024.
[15] LIU Xiao, TAO Da-cheng, SONG Ming-li, ZHANG Lu-ming, BU Jia-jun, CHEN Chun. Learning to track multiple targets [J]. IEEE Transactions on Neural Networks and Learning Systems, 2014, 26(5): 1060-1073.
[16] HU Xing, HU Shi-qiang, ZHANG Xiao-yu, ZHANG Huan-long, LUO Ling-kun. Anomaly detection based on local nearest neighbor distance descriptor in crowded scenes [J]. The Scientific World Journal, 2014, Article ID: 632575.
[17] WU Si, WONG Hau-san, YU Zhi-wen. A bayesian model for crowd escape behavior detection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2014, 24(1): 85-98.
[18] CHEN Chun-yu, SHAO Yu, BI Xiao-jun. Detection of anomalous crowd behavior based on the acceleration feature [J]. IEEE Sensors Journal, 2015, 15(12): 7252-7261.
[19] HAGAN M T, MENHAJ M B. Training feedforward networks with the Marquardt algorithm [J]. IEEE Transactions on Neural Networks, 1994, 5(6): 989–993.
[20] KUMAR M, BHATNAGAR C. Zero-stopping constraint-based hybrid tracking model for dynamic and high dense crowd videos [J]. The Imaging Science Journal, 2017, 65(2): 75–86.
[21] TAHMASEBI P, HEZARKHANI A. A hybrid neural networks-fuzzy logic-genetic algorithm for grade estimation [J]. Computers & Geosciences, 2012, 42: 18–27.
[22] RASHEDI E, NEZAMABADI-POUR H, SARYAZDI S. GSA: A gravitational search algorithm [J]. Information Sciences, 2009, 179(13): 2232-2248.
[23] CRCV. Tracking in High Density Crowds Data Set [EB/OL]. http://crcv.ucf.edu/data/tracking.php.
[24] UCSD. UCSD Anomaly Detection Dataset [EB/OL]. http://www.svcl.ucsd.edu/projects/anomaly/dataset.htm.
[25] BERA A, WOLINSKI D, PETTRE J, MANOCHA D. Real-time crowd tracking using parameter optimized mixture of motion models [EB/OL]. arXiv preprint, 2014.
(Edited by FANG Jing-hua)
Received date: 2016-03-21; Accepted date: 2017-03-13
Corresponding author: Manoj Kumar; E-mail: choubey.manoj@gmail.com