基于闭合模式的高维基因表达谱多类分类

来源期刊:中南大学学报(自然科学版)2008年第5期

论文作者:李宏 李翔 吴敏 陈松乔 易丽君

文章页码:1035 - 1035

关键词:关联规则;闭合模式;多类别;权重算法

Key words:association rules; closed pattern; multi-class; weight algorithm

摘    要:针对多类高维基因表达谱的特点,提出一种基于闭合模式的多类分类算法CBCP,即根据垂直格式的数据集采用路径枚举的方法挖掘闭合模式,极大地减少了冗余模式的产生。然后,对所有闭合模式进行排序,通过覆盖训练集建立分类器。针对分类器无法识别的样本提出权重算法进行判断,克服了使用Default类预测不精确的问题。研究结果表明,CBCP与经典分类算法如CBA和C4.5相比具有更高的预测准确率,并且在基因数大幅增加而样本数不变的情况下仍具有较强的稳定性,证明CBCP的可扩展性强,适用于高维数据集的多类分类预测。

Abstract: According to the characteristics of multi-class high-dimension gene expression profile, a new multi-class classification algorithm(CBCP) based on closed pattern was designed. Firstly an approach called path enumeration was proposed to mine closed patterns based on the vertical formatted data-table, which can reduce most redundant patterns. Then closed patterns were sorted and used to cover train dataset for building the classifier. The unrecognized samples were classified by weight algorithm, which can overcome the inaccuracy caused by using Default Class. The results show that the algorithm is proved to be more accurate than classical classification algorithms such as CBA and C4.5. CBCP keeps high accuracy when the number of genes increases substantially with the increase of number of samples fixed, which proves it is suitable for multi-class classification of high dimension datasets, and it is easy to extend.

有色金属在线官网  |   会议  |   在线投稿  |   购买纸书  |   科技图书馆

中南大学出版社 技术支持 版权声明   电话:0731-88830515 88830516   传真:0731-88710482   Email:administrator@cnnmol.com

互联网出版许可证:(署)网出证(京)字第342号   京ICP备17050991号-6      京公网安备11010802042557号