Clustering method based on data division and partition
Source journal: Journal of Central South University (English Edition), 2014, No. 1
Authors: LU Zhi-mao(卢志茂), LIU Chen(刘晨), S. Massinanke, ZHANG Chun-xiang(张春祥), WANG Lei(王蕾)
Pages: 213-222
Key words: clustering; division; partition; very large data sets (VLDS)
Abstract: Many classical clustering algorithms perform well when their prerequisites are met but do not scale well to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) is proposed to solve this problem. DP cuts the source data set into data blocks and extracts the eigenvector of each data block to form the local feature set. The local feature set is then used in a second round of feature aggregation over the source data to find the global eigenvector. Finally, the data are assigned to clusters according to the global eigenvector by the minimum-distance criterion. The experimental results show that DP is more robust than conventional clustering methods. Its insensitivity to data dimensionality, data distribution, and the number of natural clusters gives it a wide range of applications in clustering VLDS.
LU Zhi-mao(卢志茂)1, LIU Chen(刘晨)1, S. Massinanke1, ZHANG Chun-xiang(张春祥)2, WANG Lei(王蕾)3
(1. School of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China;
2. School of Software, Harbin University of Science and Technology, Harbin 150001, China;
3. School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China)
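The two-round procedure outlined in the abstract can be illustrated with a minimal sketch. The abstract does not say how the per-block "eigenvector" is extracted or how the second round aggregates the local feature set, so this sketch assumes plain k-means centroids in both rounds; that choice is an assumption, not the authors' method. The names `kmeans`, `dp_cluster`, `n_blocks`, `local_k`, and `global_k` are hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's
    feature extraction); returns a (k, d) array of centroids."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct points drawn from the data.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid: shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids


def dp_cluster(data, n_blocks, local_k, global_k, seed=0):
    """Division-and-partition sketch; returns (labels, global_centroids)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Division: shuffle the rows, then cut the source data into blocks.
    order = rng.permutation(len(data))
    blocks = np.array_split(data[order], n_blocks)
    # Round 1: summarize each block by local representatives
    # (assumed here to be k-means centroids) to form the local feature set.
    local_features = np.vstack([kmeans(b, local_k, seed=seed) for b in blocks])
    # Round 2: aggregate the local feature set into global representatives.
    global_centroids = kmeans(local_features, global_k, seed=seed)
    # Partition: assign each source point to its nearest global representative.
    dists = np.linalg.norm(data[:, None, :] - global_centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), global_centroids


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated synthetic blobs, used only for illustration.
    demo = np.vstack([
        rng.normal(0.0, 1.0, size=(200, 2)),
        rng.normal(100.0, 1.0, size=(200, 2)),
    ])
    labels, centers = dp_cluster(demo, n_blocks=4, local_k=2, global_k=2)
    print(centers)
```

In this sketch only one block at a time is clustered in the first round, and the second round works on the much smaller local feature set, which is the property that would let such a scheme scale. The final assignment step here builds a full distance array in memory; for a genuinely very large data set it would presumably be applied block by block as well.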