Summary

Instance reduction for supervised learning using input-output clustering method

Source journal: Journal of Central South University (English Edition), 2015, Issue 12

Authors: YODJAIPHET Anusorn, THEERA-UMPON Nipon, AUEPHANWIRIYAKUL Sansanee

Pages: 4740-4748

Key words: instance reduction; input-output clustering; fuzzy c-means clustering; support vector regression; supervised learning

Abstract: A method that applies a clustering technique to reduce the number of samples in large data sets using input-output clustering is proposed. The proposed method clusters the output data into groups and then clusters the input data in accordance with the groups of output data. A set of prototypes is then selected from the clustered input data, and the inessential data can ultimately be discarded from the data set. The proposed method can reduce the effect of outliers because only the prototypes are used. The method is applied to reduce the data set in regression problems. Two standard synthetic data sets and three standard real-world data sets are used for evaluation. The root-mean-square errors of support vector regression models trained with the original data sets are compared with those of models trained with the corresponding instance-reduced data sets. In the experiments, the proposed method provides good results on the reduction and the reconstruction of the standard synthetic and real-world data sets. The numbers of instances of the synthetic data sets are decreased by 25%-69%. The reduction rates for the real-world data sets of the automobile miles per gallon and the 1990 census in CA are 46% and 57%, respectively. The reduction rate of 96% for the electrocardiogram (ECG) data set is very good because of the redundant and periodic nature of ECG signals. For all of the data sets, the regression results are similar to those from the corresponding original data sets. Therefore, the regression performance of the proposed method is good while only a fraction of the data is needed in the training process.
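The three steps described in the abstract (cluster the outputs, cluster the inputs within each output group, select prototypes) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: plain hard k-means stands in for the fuzzy c-means clustering named in the key words, the prototype of each input cluster is assumed to be the member instance nearest the cluster centroid, and the names `kmeans`, `reduce_instances`, `reduction_rate`, `n_output_groups`, and `n_prototypes_per_group` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Hard k-means on a list of equal-length tuples (stand-in for fuzzy c-means)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: centroid = mean of its members (unchanged if empty).
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def reduce_instances(X, y, n_output_groups, n_prototypes_per_group):
    """Return a reduced (X, y) made of one prototype per input cluster.

    X is a list of feature tuples, y a list of scalar outputs.
    """
    # Step 1: cluster the output values into groups.
    _, groups = kmeans([(v,) for v in y], n_output_groups)
    reduced_X, reduced_y = [], []
    for g in range(n_output_groups):
        idx = [i for i, lab in enumerate(groups) if lab == g]
        if not idx:
            continue
        # Step 2: cluster the inputs that belong to this output group.
        k = min(n_prototypes_per_group, len(idx))
        centroids, labels = kmeans([X[i] for i in idx], k)
        # Step 3 (assumption): keep the member nearest each centroid.
        for j, c in enumerate(centroids):
            members = [idx[m] for m, lab in enumerate(labels) if lab == j]
            if members:
                best = min(members, key=lambda i: math.dist(X[i], c))
                reduced_X.append(X[best])
                reduced_y.append(y[best])
    return reduced_X, reduced_y


def reduction_rate(n_original, n_reduced):
    """Fraction of instances discarded, e.g. 0.46 for a 46% reduction."""
    return 1.0 - n_reduced / n_original
```

A regression model (support vector regression in the paper) would then be trained once on the original data and once on the reduced data, and the two root-mean-square errors compared. The numbers of groups and prototypes are free parameters in this sketch; the abstract does not state how they are chosen.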

Details

YODJAIPHET Anusorn (1), THEERA-UMPON Nipon (1, 2), AUEPHANWIRIYAKUL Sansanee (2, 3)

(1. Electrical Engineering Department, Faculty of Engineering, Chiang Mai University, Chiang Mai, Thailand;
2. Biomedical Engineering Center, Chiang Mai University, Chiang Mai, Thailand;
3. Computer Engineering Department, Faculty of Engineering, Chiang Mai University, Chiang Mai, Thailand)
