One of the tasks involved in speech synthesis is the generation of prosody. As synthetic speech quality advances, improved modelling of pitch and duration represents a step toward more natural speech. The experiment discussed here presents one method of generating natural-looking and natural-sounding pitch contours for use in speech synthesis.
The experiment uses the Tilt intonation labelling theory to produce natural fundamental frequency (F0) contours from prosodic and syllabic context. The labelling system provides four intonational events: accent, boundary, connection, and silence. The context features include accentedness, lexical stress, and segmental content. Such information can reasonably be expected to be available at F0 generation time.
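As a rough illustration of the inputs described above, the four event labels and a per-syllable context record might be represented as follows. This is only a sketch: the class names, field names, and the `feature_vector` encoding are assumptions made for illustration and are not taken from the experiment itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class EventType(Enum):
    """The four intonational event labels named in the text."""
    ACCENT = "accent"
    BOUNDARY = "boundary"
    CONNECTION = "connection"
    SILENCE = "silence"


@dataclass(frozen=True)
class SyllableContext:
    """Hypothetical per-syllable context record (field names are assumptions)."""
    accented: bool              # accentedness
    lexical_stress: bool        # lexical stress
    segments: Tuple[str, ...]   # segmental content, e.g. phone labels


def feature_vector(ctx: SyllableContext) -> List[int]:
    """Illustrative encoding only: two binary flags plus a segment count."""
    return [int(ctx.accented), int(ctx.lexical_stress), len(ctx.segments)]


# Example syllable with made-up phone labels.
example = SyllableContext(
    accented=True,
    lexical_stress=True,
    segments=("s", "p", "iy", "ch"),
)
print(EventType.ACCENT.value, feature_vector(example))
```

A real system would likely use a richer encoding of segmental content than a simple count; the point here is only that all of these features are known before F0 generation begins.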
First, a basic overview of the speech data and Tilt theory is presented. This is followed by a description of the experiment and the context features used. The results of the experiment are then discussed and related to other work in the field.