Restoration scheme of instantaneous amplitude and phase using Kalman filter with efficient linear prediction for speech enhancement, by Naushin Nower, Yang Liu, Masashi Unoki

Speech Communication

Similar

Autoregressive Parameter Estimation for Kalman Filtering Speech Enhancement

Authors:
Chang Huai You, Susanto Rahardja, Soo Ngee Koh
2007

Enhancement of microwave tomography using Kalman filter theory

Authors:
Ding Liang, Zhang Liang, Zhang Ziyi, Liu Peiguo, He Jianguo
2013

A rate control scheme using Kalman filtering for H.263

Authors:
Din-Yuen Chan, Shou-Jen Lin, Chun-Yuan Chang
2005

THE KALMAN FILTER AS A PREDICTION ERROR FILTER*

Authors:
N. OTT, H. G. MEDER
1972

LXVIII. The Abbot and Convent of Woburn to the King

Authors:
Thabbot and convent of Woburn
1843

Text

Available online at www.sciencedirect.com (ScienceDirect)

Speech Communication 70 (2015) 13–27, www.elsevier.com/locate/specom

Naushin Nower ⇑, Yang Liu, Masashi Unoki

School of Information Science, Japan Advanced Institute of Science and Technology (JAIST), 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan

Received 12 August 2014; received in revised form 18 February 2015; accepted 23 February 2015

Available online 5 March 2015

⇑ Corresponding author. E-mail addresses: naushin@jaist.ac.jp (N. Nower), yangliu@jaist.ac.jp (Y. Liu), unoki@jaist.ac.jp (M. Unoki).

http://dx.doi.org/10.1016/j.specom.2015.02.006
0167-6393/© 2015 Elsevier B.V. All rights reserved.

Abstract

This paper proposes a restoration scheme for the instantaneous amplitudes and phases in sub-bands by using a Kalman filter with linear prediction (LP). A few important studies have already proved that the phase spectrum in the short-time Fourier transform plays an important role in speech enhancement. Thus, the proposed scheme concentrates on simultaneously restoring both instantaneous amplitudes and phases. The Kalman filter, which is an optimal estimator in this scheme, is used for both instantaneous amplitudes and phases in the sub-band representation to remove the effect of noise. We found that the effectiveness of the Kalman filter depended on accurate estimates of LP coefficients. We propose an effective LP training phase to derive gender and content independent LP coefficients as central processing for Kalman filtering. We carried out objective and subjective tests under various noisy conditions to evaluate the effectiveness of the proposed scheme and compared it with typical methods. The signal to error ratio (SER), perceptual evaluation of speech quality (PESQ), and SNR loss were used as objective measures in these simulations. The mean preference score was used in subjective evaluations. The results revealed that the proposed scheme could effectively improve these objective and subjective measures more than those with typical methods. © 2015 Elsevier B.V. All rights reserved.

Keywords: Speech enhancement; Instantaneous amplitude and phase; Kalman filter; Gammatone filterbank; Linear prediction

1. Introduction

The required speech signal in real world scenarios is frequently smeared by various kinds of noise. This noise not only degrades the perceptual aspects of speech quality and speech intelligibility but also reduces the performance of various automated speech systems, such as automatic speech recognition systems, speaker recognition systems, and hearing aids. Therefore, the quality and intelligibility of speech signals in noisy environments have to be enhanced.

Speech enhancement is concerned with improving the quality and intelligibility of corrupted speech in the presence of noise. Various methods of speech enhancement have already been proposed during the past two decades to remove the effects of noise from noisy speech to improve its quality. Of these, classical methods of speech enhancement, such as spectral subtraction (SS) (Boll, 1979), the Ephraim–Malah algorithm (MMSE-STSA estimator) (Ephraim and Malah, 1984), and the Scalart–Filho algorithm (Wiener filtering) (Scalart and Filho, 1996), have attracted a great deal of attention because of their simplicity and efficiency in estimating spectral magnitude. The SS method (Boll, 1979) subtracts the estimated noise magnitude spectrum from the noisy speech magnitude spectrum, where the noise spectrum can be estimated and updated during periods when speech is absent. The Wiener filter (Scalart and Filho, 1996) algorithm filters noisy speech signals by using a filter derived based on the minimum mean-square error (MMSE) criterion. These methods employ the short-time Fourier analysis-modification-synthesis (AMS) framework for speech enhancement.
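The spectral subtraction step described above can be illustrated inside an AMS loop. The following is a minimal sketch under stated assumptions, not the implementation of Boll (1979) or of this paper: the function name `spectral_subtraction`, the frame length, the hop size, the spectral floor, and the choice to estimate noise once from a fixed number of leading frames assumed to be speech-free are all illustrative. A practical system would instead update the noise estimate whenever speech is judged absent.

```python
import numpy as np


def spectral_subtraction(noisy, noise_frames=5, frame_len=256, hop=128, floor=0.01):
    """Magnitude spectral subtraction in an STFT analysis-modification-synthesis loop.

    Assumption (illustrative): the first `noise_frames` frames contain noise only.
    """
    # Periodic Hann window. With hop = frame_len / 2 the shifted windows sum
    # to 1, so plain overlap-add reconstructs the interior of the signal.
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(noisy) - frame_len) // hop

    # Analysis: windowed frames -> short-time spectra.
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)

    # Noise magnitude estimate from the assumed speech-free leading frames.
    noise_mag = magnitude[:noise_frames].mean(axis=0)

    # Modification: subtract the noise magnitude, keeping a small spectral
    # floor so that no bin becomes negative.
    enhanced_mag = np.maximum(magnitude - noise_mag, floor * magnitude)

    # Synthesis: recombine with the unmodified noisy phase and overlap-add.
    enhanced_frames = np.fft.irfft(enhanced_mag * np.exp(1j * phase),
                                   n=frame_len, axis=1)
    enhanced = np.zeros(len(noisy))
    for i in range(n_frames):
        enhanced[i * hop:i * hop + frame_len] += enhanced_frames[i]
    return enhanced


# Usage on synthetic data: a tone that starts after 0.25 s, in white noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
clean = np.where(t > 0.25, np.sin(2 * np.pi * 440 * t), 0.0)
noisy = clean + 0.1 * rng.standard_normal(fs)
enhanced = spectral_subtraction(noisy)
```

Note that the sketch modifies only the magnitude and reuses the noisy phase at synthesis, which is the behavior of the classical methods discussed here.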

There are various statistical methods of model-based speech enhancement in the literature in addition to these. Modeling in the model-based approaches is done using the statistical properties of the speech signal over multiple frames. This modeling is performed using the hidden Markov model (HMM) (Ephraim et al., 1989; Ephraim, 1992; Zhao and Kleijn, 2007), the Gaussian mixture model (GMM), or codebook-based methods (Sriram et al., 2007). HMM-based speech enhancement is a renowned model-based technique and resolves common problems with classical methods of speech enhancement in dealing with rapid variations in noise characteristics (Veisi and Hossein, 2013). Nishikawa et al. (2003) combined an independent component analysis (ICA) based noise estimator with multi-channel-wise non-linear signal processing to reduce noise further in their method of noise reduction. However, the improvements offered by all of these methods were limited because they only considered the spectral domain.

Recent research has investigated the importance of speech enhancement in the modulation domain, such as the modulation-domain Kalman filter (MDKF) (So and Paliwal, 2011; Paliwal et al., 2012). So and Paliwal’s method modeled temporal changes in the magnitude spectrum for both speech and noise without taking into consideration noise in the phase component (So and Paliwal, 2011). In addition, the corpus-based approach (Ji et al., 2011), model-based speech enhancement with spectral estimation (Ruofei et al., 2012), and speech enhancement using nonnegative matrix factorization (NMF) (Mohammadiha et al., 2013; Sawada et al., 2013) are included in modern methods of speech enhancement. All the existing methods process corrupted speech signals by modifying or correcting only the temporal or spectral magnitude and keeping the phase component unchanged. This is because the phase spectrum has conventionally been considered unimportant and has been reported not to contribute much toward speech enhancement. Wang and Lim emphasized this (Wang and Lim, 1982), which is perhaps the most cited work to justify the unimportance of phase in speech enhancement.
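Kalman filtering of a trajectory with a linear prediction model, the idea behind the Kalman-based methods mentioned above, can be sketched as follows. This is an illustrative sketch only, not the MDKF of So and Paliwal nor the scheme proposed in this paper. The function names `lp_coefficients` and `kalman_lp`, the autocorrelation (Yule-Walker) fit on a clean training sequence, the synthetic AR(2) signal, and the known observation-noise variance are all assumptions made for the example.

```python
import numpy as np


def lp_coefficients(x, order):
    """Autocorrelation-method LP: solve the Yule-Walker normal equations.

    Returns (a, q): x[n] is predicted as sum_k a[k] * x[n-1-k], and q is the
    estimated variance of the prediction residual.
    """
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    q = (r[0] - a @ r[1:]) / len(x)
    return a, q


def kalman_lp(y, a, q, r):
    """Kalman filter for x[n] = sum_k a[k] x[n-1-k] + w[n], y[n] = x[n] + v[n].

    q and r are the variances of the process noise w and observation noise v.
    """
    p = len(a)
    F = np.zeros((p, p))          # companion-form state transition matrix
    F[0] = a
    F[1:, :-1] = np.eye(p - 1)
    H = np.zeros(p)               # only the newest sample is observed
    H[0] = 1.0
    Q = np.zeros((p, p))
    Q[0, 0] = q
    state, P = np.zeros(p), np.eye(p)
    out = np.empty(len(y))
    for n, obs in enumerate(y):
        # Predict with the LP model.
        state = F @ state
        P = F @ P @ F.T + Q
        # Update with the noisy observation.
        gain = P @ H / (H @ P @ H + r)
        state = state + gain * (obs - H @ state)
        P = P - np.outer(gain, H @ P)
        out[n] = state[0]
    return out


# Usage on a synthetic AR(2) sequence (hypothetical stand-in for a trajectory).
rng = np.random.default_rng(1)
true_a = np.array([1.8, -0.9])
x = np.zeros(4000)
w = 0.1 * rng.standard_normal(4000)
for n in range(2, 4000):
    x[n] = true_a[0] * x[n - 1] + true_a[1] * x[n - 2] + w[n]
a, q = lp_coefficients(x[:2000], order=2)      # fit on clean training data
noisy = x[2000:] + 0.5 * rng.standard_normal(2000)
restored = kalman_lp(noisy, a, q, r=0.25)
```

The quality of the restored sequence depends directly on how well `a` matches the underlying signal, which is why the accuracy of the LP coefficients matters for Kalman-based enhancement.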