Speech technology now offers reliable techniques for producing naturalistic intonation in speech synthesis on the basis of a phonological representation such as the ToBI system (Pierrehumbert & Hirschberg 1990). However, the intonation produced in dialogue systems is not as effective as it could be, because no semantic theory of natural dialogue has yet been developed far enough to link discourse meaning to an appropriate phonological representation.
We propose that part of the reason the underlying research on the relationship between discourse semantics and intonation has been so problematic is the lack of data reliably annotated with ToBI or any other annotation system. For example, in an exchange such as the following, themes have been claimed to be marked with L+H* pitch accents and rhemes with H* accents (Steedman 2000):
A: So, the '98 model costs £1,000, how much is the 2000 model?
B: The 2000 model costs £1,500.
   L+H* LH%    H* LL%
   (theme)     (rheme)
However, the phonetic difference between these two accents is not well understood, and is often said not to exist at all (Silverman et al. 1992, Taylor 2000).
In a small production study, we show that when themes and rhemes are clearly marked with a primary pitch accent, the distinction is consistently reflected in the shape of the F0 contour relative to the segmental string, in a way that seems to correspond to the ToBI distinction between L+H* and H*. In particular, the rise to the F0 peak begins at the start of the consonant of the stressed syllable in the H* accent, but at the start of the stressed vowel in the L+H* accent. Further, the F0 level is substantially higher before an H* accent than immediately after it, whereas the F0 level is approximately the same before and after an L+H* accent. Much of the confusion in this area probably stems from the fact that themes are not often marked by pitch accents in English, or are marked only by very weak pitch accents. We hope to determine the nature of this marking in future research.
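The two cues described above (where the F0 rise begins relative to the stressed syllable, and how the F0 level before the accent compares with the level after it) can be illustrated with a small rule-based sketch. This is not the procedure used in the study: the AccentMeasurement record, the classify_accent function, and the level_ratio threshold of 1.15 are all hypothetical, chosen only to make the two criteria concrete, and the landmark times are assumed to have been measured by hand.

```python
from dataclasses import dataclass


@dataclass
class AccentMeasurement:
    """Hypothetical hand-measured landmarks for one pitch accent."""
    rise_start_s: float       # time at which the F0 rise to the peak begins
    consonant_onset_s: float  # start of the consonant of the stressed syllable
    vowel_onset_s: float      # start of the stressed vowel
    f0_before_hz: float       # F0 level just before the accent
    f0_after_hz: float        # F0 level just after the accent


def classify_accent(m: AccentMeasurement, level_ratio: float = 1.15) -> str:
    """Return 'H*', 'L+H*', or 'unclear' from the two cues.

    level_ratio is an assumed cut-off for "substantially higher";
    the source text gives no numeric value.
    """
    # Alignment cue: is the rise start closer to the consonant onset
    # than to the vowel onset?
    dist_consonant = abs(m.rise_start_s - m.consonant_onset_s)
    dist_vowel = abs(m.rise_start_s - m.vowel_onset_s)
    rise_at_consonant = dist_consonant < dist_vowel

    # Level cue: is F0 before the accent substantially higher than after it?
    level_drops = (m.f0_before_hz / m.f0_after_hz) >= level_ratio

    if rise_at_consonant and level_drops:
        return "H*"
    if not rise_at_consonant and not level_drops:
        return "L+H*"
    # The two cues disagree, so this sketch declines to label the accent.
    return "unclear"


if __name__ == "__main__":
    # Invented values, for illustration only.
    rheme_like = AccentMeasurement(0.50, 0.50, 0.58, 220.0, 170.0)
    theme_like = AccentMeasurement(0.58, 0.50, 0.58, 180.0, 178.0)
    print(classify_accent(rheme_like))
    print(classify_accent(theme_like))
```

Requiring both cues to agree keeps the sketch conservative: a token whose alignment and level cues point in different directions is left unlabelled rather than forced into one category.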
In a follow-up perception experiment using artificially manipulated F0 contours, we show that these differences can affect native speakers' acceptability judgments. These findings will enable us to test and develop theories relating discourse semantics and intonation more rigorously. This in turn should lead to better specification and realisation of intonation in the synthesised speech of dialogue systems.