The Fundamental Characteristics of Speech and Challenges in Speaker Recognition

Prof. John Mason
University of Wales Swansea

There are several components, or levels, of information embedded in the acoustic speech signal, the most obvious of which is the spoken message itself. In the context of biometrics, the key question is the identity of the person speaking. These two ideas lead, respectively, to automatic speech recognition and automatic speaker recognition. This presentation covers the fundamental aspects of automatic speaker recognition, many of which, at the signal processing level, are shared with the complementary task of automatic speech recognition.

The first part deals with features. Speech is very much a behavioural biometric: the important information is buried in the time-domain signal, within a limited acoustic bandwidth, and that signal varies with the message, the speaker, the time of recording, the recording conditions and so on. The task of speaker recognition is to extract the identity of the person speaking while neutralising all these variations, possibly including the text of the message itself. The complementary task of automatic speech recognition is to extract the message, or text, while neutralising all the other unwanted variations, including those due to the speaker. Interestingly, and perhaps a little counter-intuitively, the features used in both tasks tend to be the same: short-term, spectrally based cepstral representations. The fundamental ideas behind cepstra are presented.
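As an illustration of the cepstral idea, the sketch below computes the plain real cepstrum of short windowed frames: the inverse FFT of the log magnitude spectrum. The function names, the Hamming window, the 512-point FFT and the 16 kHz framing defaults (25 ms frames, 10 ms hop, 13 coefficients) are assumptions for illustration only; practical systems usually add a mel filterbank and a DCT (MFCCs), which are omitted here.

```python
import numpy as np


def real_cepstrum(frame: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """Real cepstrum of one frame: inverse FFT of the log magnitude spectrum."""
    windowed = frame * np.hamming(len(frame))      # taper the frame edges
    spectrum = np.fft.rfft(windowed, n=n_fft)      # short-term spectrum
    log_mag = np.log(np.abs(spectrum) + 1e-10)     # small floor avoids log(0)
    return np.fft.irfft(log_mag, n=n_fft)          # back to the quefrency domain


def cepstral_features(signal: np.ndarray, frame_len: int = 400,
                      hop: int = 160, n_coeffs: int = 13) -> np.ndarray:
    """Frame the signal and keep the low-quefrency cepstral coefficients.

    Hypothetical defaults assume 16 kHz audio: 25 ms frames, 10 ms hop.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, n_coeffs))
    for i in range(n_frames):
        frame = signal[i * hop: i * hop + frame_len]
        feats[i] = real_cepstrum(frame)[:n_coeffs]
    return feats


# Usage: a gain change only shifts c[0], because log|G X| = log G + log|X|.
rng = np.random.default_rng(0)
x = rng.standard_normal(400)
c_orig, c_loud = real_cepstrum(x), real_cepstrum(2.0 * x)
print(c_loud[0] - c_orig[0])   # approximately log(2) = 0.693
```

The same additivity is why a fixed convolutional channel, which multiplies the spectrum, becomes an additive offset in the cepstral domain and can then be reduced by subtracting a mean.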

The second part of the presentation considers aspects of classification, with emphasis on data-driven models and the important concept of normalisation. In speech recognition, vast quantities of speech data, perhaps more than any one person might hear in a lifetime, can be used to train a recogniser. This is clearly not possible for a speaker recogniser, where the speech available from a given speaker, whether for enrolment or under test, typically spans only seconds or minutes. Strategies for speaker modelling must reflect this practical limitation. The importance of both the quantity and the quality of speech data is discussed.
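The abstract does not say which normalisation schemes the talk covers. As one widely used example, the sketch below shows zero normalisation (Z-norm) of verification scores: each speaker model's raw score is re-expressed relative to the mean and standard deviation of the scores that model gives to a set of impostor utterances. The function names and the toy scores are hypothetical.

```python
import numpy as np


def znorm_params(impostor_scores) -> tuple[float, float]:
    """Mean and standard deviation of one speaker model's impostor scores."""
    s = np.asarray(impostor_scores, dtype=float)
    return float(s.mean()), float(s.std())


def znorm(raw_score: float, mu: float, sigma: float) -> float:
    """Express a raw score in impostor standard deviations above the impostor mean."""
    return (raw_score - mu) / sigma


# Usage: two hypothetical models whose raw scores live on different scales.
imp_a = [1.0, 2.0, 3.0]
imp_b = [10.0 * s + 5.0 for s in imp_a]       # same shape, shifted and stretched
za = znorm(4.0, *znorm_params(imp_a))
zb = znorm(45.0, *znorm_params(imp_b))        # 45 = 10 * 4 + 5
print(za, zb)                                 # both about 2.449
```

Because the impostor statistics absorb each model's offset and scale, a single decision threshold can then be shared across speaker models.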

The final part of the talk introduces assessment strategies that have evolved out of the open evaluations over the last 15 years.
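The abstract does not name specific assessment metrics. One figure commonly reported in speaker recognition evaluations, alongside detection-cost measures, is the equal error rate (EER): the operating point where the false-acceptance and false-rejection rates meet. The sketch below is a simple threshold sweep under the assumption that higher scores mean "same speaker"; the function name and toy scores are hypothetical, and it approximates the EER by averaging the two rates at the threshold where they are closest.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping every observed score as a threshold."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, eer, best_t = np.inf, 1.0, None
    for t in np.unique(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostor trials wrongly accepted
        frr = np.mean(genuine < t)     # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:             # keep the threshold where the rates are closest
            best_gap, eer, best_t = gap, (far + frr) / 2.0, float(t)
    return eer, best_t


# Usage with hypothetical, partly overlapping score sets.
eer, threshold = equal_error_rate([1, 2, 3, 4], [0, 1, 2, 3])
print(eer, threshold)   # 0.375 2.0
```

Evaluations typically also plot the full trade-off between the two error rates (the DET curve) rather than relying on this single point.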


