Pronunciation Modeling of Spontaneous Mandarin Speech
Using Phonetic Feature Distance and Optimal Gaussian Mixture Sharing
Liu Yi1, Pascale Fung1, William Byrne2, and Umar Ruhi3
1 Dept. EEE, Hong Kong University of Science and Technology, Hong Kong
2 CLSP/ECE, The Johns Hopkins University, Baltimore MD, USA
3 Dept. CS, University of Toronto, Canada
Presented: May 2001.
Pronunciations in spontaneous, conversational speech tend to be much
more varied than in carefully read speech. Pronunciation modeling is
an efficient way to improve recognition performance. In this paper,
we propose incorporating pronunciation variations into acoustic model
training. We present a method for incorporating phonetic feature distance
into phone variation probabilities. In addition, we present an efficient
criterion for choosing the optimal Gaussian mixture components from
surface states, and we share the selected components with canonical
states according to the variation probabilities. Experiments show that
state-level Gaussian mixture sharing based on phonetic feature distance
improves syllable accuracy by 4.17% absolute after re-training. Adding
optimal mixture component selection raises the gain significantly, to
4.78% absolute with respect to the baseline.
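
The sharing step described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the names Component and share_mixtures are hypothetical, and because the abstract does not state the selection criterion, keeping the top_k highest-weight components of each surface state is only a placeholder assumption. The sketch scales each borrowed component by the variation probability of its surface state and renormalizes the weights of the canonical state.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    """One diagonal-covariance Gaussian mixture component (hypothetical type)."""
    weight: float
    mean: List[float]
    var: List[float]


def share_mixtures(
    canonical: List[Component],
    surfaces: Dict[str, List[Component]],
    variation_probs: Dict[str, float],
    self_prob: float,
    top_k: int = 1,
) -> List[Component]:
    """Build a shared mixture for one canonical state.

    canonical       -- components of the canonical (baseform) state
    surfaces        -- components of each surface (realized) state, by name
    variation_probs -- P(surface | canonical) for each surface state
    self_prob       -- probability that the canonical form is realized as is
    top_k           -- ASSUMPTION: stand-in for the paper's selection criterion
    """
    # Canonical components keep their relative weights, scaled by self_prob.
    shared = [Component(self_prob * c.weight, c.mean, c.var) for c in canonical]

    for name, comps in surfaces.items():
        p = variation_probs.get(name, 0.0)
        # Placeholder selection: keep the top_k heaviest surface components.
        selected = sorted(comps, key=lambda c: c.weight, reverse=True)[:top_k]
        shared += [Component(p * c.weight, c.mean, c.var) for c in selected]

    # Renormalize so the mixture weights sum to one.
    total = sum(c.weight for c in shared)
    return [Component(c.weight / total, c.mean, c.var) for c in shared]
```

In this sketch the shared mixture would serve only as an initialization; the re-training mentioned in the abstract would then re-estimate the weights, means, and variances.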