Int J Speech Technol (2015) 18:97–111
A detection and classification method for nasalized vowels in noise using product spectrum based cepstra
Shamima Najnin · Celia Shahnaz
Received: 2 February 2013 / Accepted: 11 January 2014 / Published online: 1 October 2014 © Springer Science+Business Media New York 2014
Abstract In this paper, a method based on cepstra derived from the product spectrum is developed for the detection and classification of nasalized vowels with varying degrees of nasalization. Conventionally, features for detecting and classifying nasalized vowels are derived from the magnitude spectrum only, ignoring the phase spectrum. Exploiting the power spectrum and the group delay function of a band-limited vowel, the product spectrum is defined, thus incorporating the information of both the magnitude and phase spectra. Unlike conventional mel frequency cepstral coefficients (MFCCs) derived from the power spectrum, MFCCs computed from the product spectrum, namely MFPSCCs, are fed to a linear discriminant analysis (LDA) based classifier for the detection and classification of nasalized vowels. The performance of nasalized vowel detection and classification based on some state-of-the-art features, namely MFCCs and A1–P1, is compared with that of the proposed feature using not only an LDA based classifier but also a support vector machine based classifier. Detailed simulation results on the TIMIT database show that the proposed cepstral features derived from the product spectrum outperform the state-of-the-art features in the task of detecting and classifying nasalized vowels in clean as well as different noisy conditions.
Keywords Oral vowel · Nasalized vowel · Group delay function · Power spectrum · Product spectrum
Department of Electrical and Computer Engineering,
Herff College of Engineering, University of Memphis,
Tennessee, USA
e-mail: email@example.com
C. Shahnaz (B)
Department of Electrical and Electronic Engineering, Bangladesh
University of Engineering and Technology, Dhaka 1000, Bangladesh
e-mail: firstname.lastname@example.org
1 Introduction
In the event of nasalization, the velum drops to allow coupling between the oral and nasal cavities. Since the vocal tract is excited by vocal fold vibration during the production of a nasal consonant, the consonant is considered to be voiced. When nasal consonants are produced, air flows through the nasal tract and is radiated at the nostrils. The closed oral cavity and the sinuses of the nose form shunting cavities to the main path and substantially influence the resulting radiated sound. Nasalized vowels are pronounced in a manner similar to nasal consonants, except that the oral cavity is not blocked, thereby allowing air to flow through both the nasal and oral cavities. Over 99% of languages contain nasalized vowels or consonants (Maddieson 1984; O’Shaughnessy 2000).
In many languages, including American English, nasal consonants can have a profound effect on neighboring vowels. Vowel nasalization generally occurs as a result of coarticulation between vowels and adjacent nasal consonants: the velar lowering gesture associated with the nasal consonant overlaps with the vowel. Nasal coarticulation happens in both directions, anticipatory and carryover, and can extend across multiple segments and across word or syllable boundaries. Following the release of a nasal consonant, the initial portion of a following vowel will be nasalized during the time interval when the velum is closing. The same holds true for the final portion of a vowel preceding a nasal consonant. Coarticulatory nasalization of the vowel preceding a nasal consonant is a regular phenomenon in all languages of the world. The amount of coarticulatory nasalization depends upon the particular language and dialect. The coarticulation can, however, be so large that the nasal murmur (the sound produced with a complete closure at a point in the oral cavity, and with an appreciable amount of coupling of the nasal passages to the vocal tract) is completely deleted and the cue for the nasal consonant is only present as nasalization in the preceding vowel. This is especially true for spontaneous speech. Since anticipatory nasalization is common in American English, a sequence of a vowel plus a nasal consonant (VN) may, in many situations, be pronounced as a simple nasalized vowel, or a nasalized vowel plus a short, residual nasal murmur (Glass and Zue 1985). Phonetic descriptions indicate that vowel nasalization occurs more often and for a longer duration in anticipatory contexts (in vowel-nasal sequences) than in carryover contexts (in nasal-vowel sequences) (Bell-Berti 1993). Nasalized vowels are unique, because they are the only vowels where air flows through two channels and radiates from the nose and mouth.
During vowel nasalization, the open vocal tract is coupled with the nasal cavity, introducing additional pole–zero pairs in the transfer function (Chen et al. 2007). Nasal coupling results in energy losses at low frequencies, damping of oral formants, and introduction of nasal formants corresponding to the resonances of the nasal cavity and sinuses. These spectral modifications due to nasalization are gradient: the lower the velum travels, the wider the port opens, and the more nasal the sound. Therefore, this relationship suggests that velar position may be recovered from the acoustic signal by determining the degree of nasality in the vowel. Nasality is also introduced because of defects in the functionality of the velopharyngeal mechanism. These defects in the velopharyngeal mechanism could be due to anatomical defects (cleft palate or other trauma), central nervous system damage (cerebral palsy or traumatic brain injury), or peripheral nervous system damage (Cairns et al. 1996). Inadvertent nasalization is also one of the most common problems of deaf speakers.
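The effect of an additional pole–zero pair on the vowel spectrum can be illustrated with a small numerical sketch. The code below is not the method of this paper; it only builds a toy all-pole "oral" transfer function and a "nasalized" variant with one extra pole–zero pair, then compares their magnitude responses at two probe frequencies. The sampling rate, formant frequencies, bandwidths, and the locations of the added pole (250 Hz) and zero (500 Hz) are hypothetical values chosen for illustration, and the function names are likewise assumptions.

```python
import numpy as np

FS = 16000.0  # sampling rate in Hz (assumed for illustration)

def conjugate_pair(freq_hz, bw_hz, fs=FS):
    """Coefficients [1, a1, a2] of (1 - p z^-1)(1 - p* z^-1) for a complex
    root p placed at freq_hz with bandwidth bw_hz."""
    radius = np.exp(-np.pi * bw_hz / fs)
    angle = 2.0 * np.pi * freq_hz / fs
    return np.array([1.0, -2.0 * radius * np.cos(angle), radius ** 2])

def magnitude_db(num, den, freqs_hz, fs=FS):
    """Magnitude response in dB of num(z^-1) / den(z^-1) at freqs_hz."""
    z_inv = np.exp(-2j * np.pi * np.asarray(freqs_hz) / fs)
    # np.polyval expects the highest power first, so reverse the
    # coefficients of the polynomials in z^-1.
    numerator = np.polyval(num[::-1], z_inv)
    denominator = np.polyval(den[::-1], z_inv)
    return 20.0 * np.log10(np.abs(numerator / denominator))

# Toy oral vowel: all-pole model with three hypothetical formants.
oral_num = np.array([1.0])
oral_den = np.array([1.0])
for formant_hz, bandwidth_hz in [(700.0, 100.0), (1200.0, 120.0), (2500.0, 150.0)]:
    oral_den = np.convolve(oral_den, conjugate_pair(formant_hz, bandwidth_hz))

# Toy nasalized vowel: same formants plus one hypothetical pole-zero pair
# (extra pole at 250 Hz, extra zero at 500 Hz).
nasal_num = conjugate_pair(500.0, 100.0)
nasal_den = np.convolve(oral_den, conjugate_pair(250.0, 100.0))

probe_hz = np.array([250.0, 500.0])
oral_db = magnitude_db(oral_num, oral_den, probe_hz)
nasal_db = magnitude_db(nasal_num, nasal_den, probe_hz)
change_db = nasal_db - oral_db  # spectral change caused by the added pair

print("change at 250 Hz and 500 Hz (dB):", change_db)
```

In this toy model the added pole raises the spectrum near its own frequency and the added zero lowers it near the zero frequency, which is the kind of localized spectral modification that acoustic nasalization features try to capture.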
Though humans are sensitive to nasalized sounds, automated speech recognizers perform poorly when it comes to nasalized vowels (Johnson 2005). Nasalization makes the vowels themselves difficult to recognize because it contracts the perceptual vowel space. The increased confusion between nasalized vowels as compared to oral vowels is confirmed by using a simple vowel recognizer. Some speakers nasalize sounds indiscriminately. This could be due either to an anatomical or motor-based defect, or to deafness inhibiting the person's ability to exercise adequate control over the velum. Further, different speakers nasalize to different degrees (Seaver et al. 1991). Thus, a measure of the overall nasal quality of speech can be useful for a speaker recognition system that uses knowledge-based acoustic parameters to discriminate such speakers from others. Such acoustic parameters can hopefully be extracted as a byproduct of a system that detects vowel nasalization. A vowel nasalization detector is also essential for speech recognition in languages with phonemic nasalization (i.e., languages with minimal pairs of words that differ in meaning only in the nasalization of a vowel), and is therefore considered an important part of a landmark-based speech recognition system. Further, it is suggested in Johnson (2005) that detection of vowel nasalization is important to give the pronunciation model the ability to learn that a nasalized vowel is a high-probability substitute for a nasal consonant. Note that nasalization of the vowel might be the only feature distinguishing cat from can’t. So, the automatic detection of vowel nasalization is an important and challenging problem.