基于改进的隐马尔科夫模型的词性标注方法

来源期刊:中南大学学报(自然科学版)2012年第8期

论文作者:袁里驰

文章页码:3053 - 3057

关键词:隐马尔可夫模型;马尔可夫族模型;词性标注;Viterbi 算法

Key words:hidden Markov model; Markov family model; part-of-speech tagging; Viterbi algorithm

摘    要:针对隐马尔可夫(HMM)词性标注模型状态输出独立同分布等与语言实际特性不够协调的假设,对隐马尔可夫模型进行改进,引入马尔可夫族模型。,该模型用条件独立性假设取代HMM模型的独立性假设。将马尔可夫族模型应用于词性标注,并结合句法分析进行词性标注。用改进的隐马尔可夫模型进行词性标注实验。实验结果表明:与条件独立性假设相比,独立性假设是过强假设,因而基于马尔可夫族模型的语言模型更符合语言等实际物理过程;在相同的测试条件下, 马尔可夫族模型明显好于隐马尔可夫模型,词性标注准确率从94.642%提高到97.126%。

Abstract: In order to defy the unrealistic assumption of the part-of-speech tagging method based on hidden Markov models that successive observations are independent and identically distributed within a state, Markov family model (MFM) was introduced. Independence assumption in HMM was placed by conditional independence assumption in MFM. Markov Family model was applied to part-of-speech tagging, and syntactic parsing was combined with part-of-speech tagging. The part-of-speech tagging experiments show that Markov family models (MFMs) have higher performance than hidden. From the view of the statistics, the assumption of independence is stronger than the assumption of conditional independence, so language model based on MFM is more realistic than HMM language mode. Markov models (HMMs) under the same testing conditions, the precision is enhanced from 94.642% to 97.126%.

有色金属在线官网  |   会议  |   在线投稿  |   购买纸书  |   科技图书馆

中南大学出版社 技术支持 版权声明   电话:0731-88830515 88830516   传真:0731-88710482   Email:administrator@cnnmol.com

互联网出版许可证:(署)网出证(京)字第342号   京ICP备17050991号-6      京公网安备11010802042557号