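Application of a Feature-Selection-Based Ensemble Classifier to Strip Steel Surface Defect Classification
Source journal: 冶金自动化 (Metallurgical Industry Automation), 2010, No. 2
Authors: Fei Jianghua, He Yonghui, Sun Chen, Huang Shengbiao
Pages: 19-23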
Keywords: strip steel surface defect classification; ensemble classifier; feature selection; ReliefF algorithm; Pearson correlation algorithm
Abstract: To address classification problems on large-scale data sets with many attributes, a new feature-selection-based ensemble classifier algorithm, FS-Bagging, is proposed. First, the ReliefF algorithm and the Pearson correlation algorithm are applied to the original feature set to obtain a sub-optimal feature set. Next, following the idea of Bagging, the sub-optimal feature set is randomly sampled with replacement to produce a series of feature subsets. The training data corresponding to each feature subset are then used to build a separate classifier, and the resulting classifiers are combined by simple majority voting. Finally, classification experiments on 18 main types of surface defects were carried out with real data from the strip steel surface inspection system of a steel plant in China. The results show that FS-Bagging outperforms classical Bagging in both efficiency and classification accuracy.
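The abstract only outlines the three stages of FS-Bagging, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It ranks features with a simplified ReliefF (every sample visited once, k nearest hits and misses, Manhattan distance), then greedily skips features whose absolute Pearson correlation with an already-kept feature exceeds a threshold; this greedy way of combining the two criteria is an assumption. It then draws feature subsets with replacement and combines the members by plurality vote. The names `relieff_weights`, `select_features`, `fs_bagging_fit`, `fs_bagging_predict`, the parameters `k`, `n_keep`, `corr_max`, `n_estimators`, and the nearest-centroid base learner are all hypothetical; the abstract does not name the base classifier or any parameter values.

```python
import numpy as np


def relieff_weights(X, y, k=5):
    """Simplified ReliefF (assumption: every sample is used once, m = n):
    reward features that differ on the k nearest misses (other classes),
    penalize features that differ on the k nearest hits (same class)."""
    n, d = X.shape
    # Min-max scale so each per-feature difference lies in [0, 1].
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Xs = (X - lo) / span
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to sample i
        for c in classes:
            idx = np.flatnonzero((y == c) & (np.arange(n) != i))
            if idx.size == 0:
                continue
            near = idx[np.argsort(dist[idx])[:k]]  # k nearest samples of class c
            diff = np.abs(Xs[near] - Xs[i]).mean(axis=0)
            if c == y[i]:
                w -= diff / n  # nearest hits
            else:
                # nearest misses, weighted by the prior probability of class c
                w += prior[c] / (1.0 - prior[y[i]]) * diff / n
    return w


def select_features(X, y, n_keep, corr_max=0.9, k=5):
    """Rank features by ReliefF weight, then greedily skip any feature whose
    |Pearson r| with an already-kept feature exceeds corr_max."""
    w = relieff_weights(X, y, k=k)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for f in np.argsort(-w):  # highest weight first
        if all(corr[f, g] <= corr_max for g in kept):
            kept.append(int(f))
        if len(kept) == n_keep:
            break
    return np.array(kept)


def centroid_fit(X, y):
    """Hypothetical base learner: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])


def centroid_predict(model, X):
    """Assign each row to the class with the nearest centroid."""
    classes, centroids = model
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]


def fs_bagging_fit(X, y, feats, n_estimators=25, rng=None):
    """Train one base learner per bootstrap sample of the selected features;
    every member sees all training samples."""
    rng = np.random.default_rng(rng)
    ensemble = []
    for _ in range(n_estimators):
        # Sample len(feats) features with replacement; keep the distinct ones.
        sub = np.unique(rng.choice(feats, size=len(feats), replace=True))
        ensemble.append((sub, centroid_fit(X[:, sub], y)))
    return ensemble


def fs_bagging_predict(ensemble, X):
    """Combine member predictions by simple majority (plurality) vote."""
    votes = np.array([centroid_predict(m, X[:, sub]) for sub, m in ensemble])
    result = []
    for column in votes.T:  # one column of votes per sample
        labels, counts = np.unique(column, return_counts=True)
        result.append(labels[counts.argmax()])  # ties: smallest label wins
    return np.array(result)
```

A typical call sequence would be `feats = select_features(X_train, y_train, n_keep=20)`, then `ens = fs_bagging_fit(X_train, y_train, feats)` and `fs_bagging_predict(ens, X_test)`, where `n_keep=20` is an arbitrary illustrative value. Unlike classical Bagging, which resamples training instances over the full feature set, each member here sees every training sample but only a resampled subset of the pre-screened features; this is one plausible reading of the efficiency gain the abstract reports.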
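Fei Jianghua¹, He Yonghui², Sun Chen¹, Huang Shengbiao¹
(1. Electromechanical Complete Equipment Division, Shanghai Baosight Software Co., Ltd., Shanghai
2. Baoshan Iron & Steel Co., Ltd.)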