This paper presents a series of experiments that test the use of sub-syllable acoustic data in the automatic detection of Tilt (Taylor, forthcoming) intonation events. A set of speaker-dependent HMMs is used to detect accents, boundaries, connections and silences. A first baseline is obtained, following Taylor, by training the models on fundamental frequency (F0) and RMS energy. A second baseline is obtained using normalized F0 and energy. These baseline figures are then compared with the results of a number of experiments that augment the F0 and energy data with auto-correlation peak, zero-crossing, or cepstral coefficients. In all cases, both the first and second derivatives of each feature are included. The baseline results for the normalized data are within one percentage point of those reported by Taylor for the same speaker, which supports comparing this study with Taylor's. The best results at present show a relative error reduction of 12% over the baseline.

References:
Taylor, P. forthcoming. Analysis and synthesis of intonation using the Tilt model.
(http://www.cstr.ed.ac.uk/publications/pending/Taylor_pending_d.ps)
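
Illustrative sketch: the abstract describes frame vectors built from normalized F0 and energy, each accompanied by its first and second derivatives. The Python below shows one way such vectors could be assembled. It is a sketch under stated assumptions, not the procedure used in the paper: the abstract does not say how normalization or the derivatives were computed, so the z-score normalization, the central-difference derivative, and the function names zscore, delta and frame_vectors are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def zscore(track):
    """Normalize a feature track to zero mean and unit variance.

    Assumption: z-score normalization over the whole track; the paper
    does not state which normalization it uses.
    """
    mu = mean(track)
    sigma = pstdev(track)
    if sigma == 0:
        # A constant track carries no variation to scale.
        return [0.0 for _ in track]
    return [(x - mu) / sigma for x in track]


def delta(track):
    """Approximate the time derivative with a central difference.

    Assumption: edge frames are repeated, so the first and last
    values are half of a one-sided difference.
    """
    n = len(track)
    out = []
    for t in range(n):
        prev = track[max(t - 1, 0)]
        nxt = track[min(t + 1, n - 1)]
        out.append((nxt - prev) / 2.0)
    return out


def frame_vectors(f0, energy):
    """Return one vector per frame:
    [f0, d(f0), dd(f0), energy, d(energy), dd(energy)].
    """
    if len(f0) != len(energy):
        raise ValueError("f0 and energy must have the same number of frames")
    columns = []
    for track in (zscore(f0), zscore(energy)):
        d1 = delta(track)   # first derivative
        d2 = delta(d1)      # second derivative
        columns.extend([track, d1, d2])
    # Transpose the feature columns into per-frame vectors.
    return [list(frame) for frame in zip(*columns)]
```

With two base features this yields six values per frame. Adding a further feature such as a zero-crossing measure would, under the same scheme, add three more values per frame.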
To download this paper, please return to Proceedings of the 1998 Postgraduate Conference