Abstract

Are there better sub-word units than phones for automatic speech recognition?

Fiona Kenney

Most automatic speech recognition (ASR) systems use phones as the modelled sub-word unit. There are many reasons why phones are useful: phone-level transcriptions and pronunciation lexicons are readily available, and the model set is finite. There are also indications that they are not optimal: concatenating phone models does not adequately model co-articulation. My research investigates deriving optimal units from speech data, to be modelled with linear dynamic models (LDMs). This talk will discuss some of the challenges involved in finding new units. An initial experiment in modelling syllable units with LDMs will be described, and its results discussed.