Gene Expression Profile Data Analysis Based on an Improved Label Propagation Algorithm

Source journal: Journal of Central South University (Science and Technology) (中南大学学报(自然科学版)), 2014, No. 7

Authors: WANG Nian (王年), GE Fang (葛芳), WANG Junsheng (王俊生), TANG Jun (唐俊)

Pages: 2237-2244

Key words: semi-supervised learning; weight matrix; label propagation; gene expression profile data

Abstract: To address the excessive number of iterations and the uncertain choice of threshold in the original label propagation algorithm, an improved label propagation algorithm was proposed and applied to the analysis of gene expression profile data. First, the high-dimensional gene expression profile data were represented as a weight matrix, and a label sequence indicating the class of each sample was defined, in which a small number of samples were marked as labeled. Then, the label sequence was updated with an iterative formula derived from the Gauss-Seidel iteration, and the solution of the label sequence was proved to converge. Finally, using positive and negative labels, the samples were partitioned into classes according to the signs of the components of the label sequence. Experiments on the leukemia and colon cancer data sets show that the proposed method is feasible and effective.
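The abstract does not give the authors' weight matrix construction or their derived iteration formula, so the following is only a minimal sketch of the general pipeline it describes, under stated assumptions. It assumes a Gaussian-kernel weight matrix with a hypothetical bandwidth `sigma`, labels of +1 and -1 for the two known classes and no label (`None`) for the rest, and the standard harmonic update in which each unlabeled score becomes the weighted mean of its neighbours' scores. That linear system is solved with Gauss-Seidel sweeps, which reuse values already updated in the same sweep, and samples are then split by the sign of their final score. The function names `gaussian_weights`, `propagate_gauss_seidel` and `classify_by_sign` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple


def gaussian_weights(samples: Sequence[Sequence[float]], sigma: float) -> List[List[float]]:
    """Assumed construction: W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), W[i][i] = 0."""
    n = len(samples)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def propagate_gauss_seidel(
    w: List[List[float]],
    labels: Sequence[Optional[int]],
    tol: float = 1e-8,
    max_sweeps: int = 1000,
) -> Tuple[List[float], int]:
    """labels[i] is +1 or -1 for labeled samples and None for unlabeled ones.

    Labeled scores stay clamped. In each sweep, every unlabeled score is replaced
    by the weighted mean of all other scores, using values already updated earlier
    in the same sweep (the Gauss-Seidel ordering). Returns (scores, sweeps used).
    """
    n = len(labels)
    f = [0.0 if y is None else float(y) for y in labels]
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for i in range(n):
            if labels[i] is not None:
                continue  # known samples keep their +1 / -1 label
            degree = sum(w[i])
            if degree == 0.0:
                continue  # isolated sample: nothing to propagate from
            new = sum(w[i][j] * f[j] for j in range(n) if j != i) / degree
            max_change = max(max_change, abs(new - f[i]))
            f[i] = new
        if max_change < tol:
            return f, sweep
    return f, max_sweeps


def classify_by_sign(f: Sequence[float]) -> List[int]:
    """Plus-minus split: non-negative score -> class +1, negative score -> class -1."""
    return [1 if v >= 0.0 else -1 for v in f]


if __name__ == "__main__":
    # Toy data standing in for expression profiles: two well-separated groups.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    y = [1, None, None, -1, None, None]  # one labeled sample per class
    W = gaussian_weights(X, sigma=1.0)
    scores, sweeps = propagate_gauss_seidel(W, y)
    print(classify_by_sign(scores), sweeps)  # classes: [1, 1, 1, -1, -1, -1]
```

Splitting on the sign of each score means no separate decision threshold has to be chosen, which matches the motivation stated in the abstract. On real data such as the leukemia or colon cancer sets, each row of `X` would be one sample's gene expression vector, and the kernel bandwidth would need tuning; the paper's own formula and parameter choices may differ from this sketch.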
