Overview

Mapping methods for output-based objective speech quality assessment using data mining

Source journal: Journal of Central South University (English Edition), 2014, Issue 5

Authors: WANG Jing (王晶), ZHAO Sheng-hui (赵胜辉), XIE Xiang (谢湘), KUANG Jing-ming (匡镜明)

Pages: 1919-1926

Key words: objective speech quality; data mining; multivariate non-linear regression; fuzzy neural network; support vector regression

Abstract: Objective speech quality is difficult to measure without the input reference speech. Mapping methods based on data mining are investigated and designed to improve an output-based speech quality assessment algorithm. The degraded speech is first separated into three classes (unvoiced, voiced and silence); the consistency between the degraded speech signal and a pre-trained reference model is then calculated for each class and mapped to an objective speech quality score using data mining. A fuzzy Gaussian mixture model (GMM), trained on perceptual linear predictive (PLP) features, is used to generate the artificial reference model. Three mean opinion score (MOS) mapping methods, namely multivariate non-linear regression (MNLR), fuzzy neural network (FNN) and support vector regression (SVR), are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best, with a 14.50% increase in the correlation coefficient and a 32.76% decrease in the root-mean-square MOS error.
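The two figures of merit named in the abstract, the correlation coefficient and the root-mean-square MOS error, can be illustrated with a minimal sketch. It assumes the correlation is the Pearson coefficient between predicted and subjective MOS, which the abstract does not state explicitly. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length score lists (assumed metric)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, target):
    """Root-mean-square error between predicted and subjective MOS."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(predicted))


def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical toy scores, for illustration only (not data from the paper).
    subjective_mos = [1.5, 2.0, 3.0, 4.0, 4.5]
    baseline_pred = [2.5, 2.0, 2.5, 3.5, 3.0]
    mapped_pred = [1.8, 2.2, 2.9, 3.8, 4.2]

    r_base = pearson(baseline_pred, subjective_mos)
    r_new = pearson(mapped_pred, subjective_mos)
    e_base = rmse(baseline_pred, subjective_mos)
    e_new = rmse(mapped_pred, subjective_mos)

    print("correlation change (%):", relative_change(r_base, r_new))
    print("RMSE change (%):", relative_change(e_base, e_new))
```

A positive change in correlation together with a negative change in RMSE corresponds to the kind of improvement the abstract reports for the FNN mapping.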


WANG Jing(王晶), ZHAO Sheng-hui(赵胜辉), XIE Xiang(谢湘), KUANG Jing-ming(匡镜明)

(School of Information and Technology, Beijing Institute of Technology, Beijing 100081, China)
