Overview

Clustering method based on data division and partition

Source journal: Journal of Central South University (English Edition), 2014, Issue 1

Authors: LU Zhi-mao (卢志茂), LIU Chen (刘晨), S. Massinanke, ZHANG Chun-xiang (张春祥), WANG Lei (王蕾)

Pages: 213-222

Key words: clustering; division; partition; very large data sets (VLDS)

Abstract: Many classical clustering algorithms perform well when their prerequisites are met but do not scale well to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) is proposed to solve this problem. DP cuts the source data set into data blocks and extracts the eigenvector of each data block to form the local feature set. The local feature set is then used in a second round of feature aggregation over the source data to find the global eigenvector. Finally, the data are assigned according to the global eigenvector by the minimum-distance criterion. The experimental results show that DP is more robust than conventional clustering methods. Its insensitivity to data dimensionality, data distribution, and the number of natural clusters gives it a wide range of applications in clustering VLDS.
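The two-round flow described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the eigenvector of a block is extracted or how the local features are aggregated, so this sketch assumes plain k-means centroids for both rounds; the names `dp_cluster`, `kmeans`, `nearest`, `n_blocks`, `k_local`, and `k_global` are hypothetical and do not come from the paper. It shows only the overall shape (divide, extract local features, aggregate, assign by minimum distance), not the authors' implementation.

```python
import math
import random

def nearest(point, centers):
    """Index of the center with minimum Euclidean distance to point."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; an assumed stand-in for the extraction step."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the previous center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

def dp_cluster(data, n_blocks, k_local, k_global):
    """Two-round clustering: per-block features, then global features."""
    # Division: cut the data into blocks (strided split is an assumption).
    blocks = [data[i::n_blocks] for i in range(n_blocks)]
    # Round 1: extract local representatives from each block.
    local_features = [c for b in blocks for c in kmeans(b, k_local)]
    # Round 2: aggregate the local feature set into global representatives.
    global_features = kmeans(local_features, k_global)
    # Partition: assign every point to its nearest global representative.
    labels = [nearest(p, global_features) for p in data]
    return labels, global_features
```

In this sketch each k-means call sees either a single block or the much smaller local feature set, never the full data set, and only the final minimum-distance assignment makes a pass over all points.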

Details


LU Zhi-mao(卢志茂)1, LIU Chen(刘晨)1, S. Massinanke1, ZHANG Chun-xiang(张春祥)2, WANG Lei(王蕾)3

(1. School of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China;
2. School of Software, Harbin University of Science and Technology, Harbin 150001, China;
3. School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China)

