Classification of Thermostable and Thermophilic Lipase Sequences Based on Support Vector Machines

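Source: Journal of Central South University (Science and Technology), 2011, Issue 9

Authors: ZHAO Wei, XU Youhou, ZHENG Jia, WANG Yuguang, ZHOU Hongbo

Pages: 2543-2550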

Keywords: amino acid composition; n-peptide composition; non-adjacent di-residue coupling; protein thermostability; support vector machines

Abstract:

We retrieved 77 thermophilic and 65 thermostable lipase sequences of microbial origin from GenBank. For each set we analyzed the occurrence frequencies of the 20 amino acids, the differences in dipeptide and tripeptide composition (neighboring n-peptides, n = 2, 3), and the preferences of non-adjacent di-residue coupling patterns. Based on this information, we developed a support vector machine (SVM) method to discriminate thermophilic from thermostable lipases. The results show that, with statistical significance, the hydrophobic residues Leu, Pro, Met, Phe and Trp, as well as the polar residue Tyr, occur more frequently in thermophilic lipases than in thermostable ones. The dipeptides KC, EE, KE, RE, VE, YI, EK, VK, EV, YV, EY, KY, VY and YY are also significantly more frequent in thermophilic proteins. Tripeptide frequencies and non-adjacent di-residue coupling preferences are likewise significantly correlated with thermostability. The dipeptide, tripeptide and non-adjacent di-residue compositions carry more information than amino acid composition alone, and this information points to possible thermostability mechanisms of microbial lipases. The classification accuracy is 99.65% on the training dataset and 98.41% on the real (test) dataset.
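The abstract names three feature families (amino acid composition, n-peptide composition and non-adjacent di-residue coupling) that feed an SVM. It does not give the exact feature definitions, normalisation, kernel or parameters. The Python sketch below shows one plausible way to compute such features and train a classifier; it is illustrative only and is not the authors' method. Assumptions: non-adjacent pairs are counted with 1 and 2 intervening residues, every block is normalised to relative frequencies, tripeptide features are omitted to keep the vector small, and a linear SVM trained with the Pegasos sub-gradient algorithm stands in for whatever SVM implementation the paper used. All function names, the label convention and the toy sequences are hypothetical.

```python
# Hypothetical sketch, not the authors' code: composition, dipeptide and
# gapped-pair features plus a linear SVM trained with Pegasos.
from itertools import product
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aa_composition(seq):
    """Relative frequency of each standard residue (20 features)."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues) or 1
    return [residues.count(a) / n for a in AMINO_ACIDS]


def pair_composition(seq, gap=0):
    """Relative frequency of ordered pairs (seq[i], seq[i + gap + 1]).

    gap=0 gives dipeptide composition; gap>=1 gives non-adjacent
    di-residue pairs with `gap` residues in between (400 features).
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    step = gap + 1
    for i in range(len(seq) - step):
        pair = seq[i] + seq[i + step]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
    total = sum(counts.values()) or 1
    return [counts[p] / total for p in PAIRS]


def extract_features(seq, max_gap=2):
    """Composition + dipeptides (gap 0) + non-adjacent pairs (gaps 1..max_gap, assumed)."""
    features = aa_composition(seq)
    for gap in range(max_gap + 1):
        features += pair_composition(seq, gap)
    return features


def _score(w, x):
    # zip stops at len(x); the extra last weight is the bias term.
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos sub-gradient descent on the L2-regularised hinge loss.

    X: feature vectors, y: labels in {+1, -1}. Returns weights with the
    bias stored as the last element (learned via a constant 1.0 feature).
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * _score(w, X[i]) < 1  # margin check before update
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: regularisation step
            if violated:  # hinge-loss sub-gradient step
                for j, xj in enumerate(X[i] + [1.0]):
                    w[j] += eta * y[i] * xj
    return w


def predict(w, seq):
    """Hypothetical label convention: +1 thermophilic, -1 thermostable."""
    return 1 if _score(w, extract_features(seq)) >= 0 else -1


# Invented toy sequences, NOT the paper's GenBank lipases.
hot = ["EEKVEKVYYE", "KEEVKEYVEK", "VEKEEYKVEE"]
warm = ["SGTSNSGTQS", "TSGNQSTSGS", "GSSTNQGTSS"]
X = [extract_features(s) for s in hot + warm]
y = [1] * len(hot) + [-1] * len(warm)
w = train_linear_svm(X, y)
print([predict(w, s) for s in hot + warm])
```

A linear SVM keeps the sketch dependency-free. With real lipase data one would more likely use an established SVM library with a tuned (possibly nonlinear) kernel. One would also report accuracy on sequences held out from training, as the separate training and test accuracies in the abstract imply.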

