
Introduction to Speaker Recognition

Prof. John Mason
University of Wales Swansea

There are several components or levels of information embedded in the acoustic speech signal, the most obvious of which is the spoken message itself. In the context of biometrics the key question is the identity of the person speaking. These two ideas lead respectively to automatic speech recognition and automatic speaker recognition. This presentation covers the fundamental aspects of automatic speaker recognition, many of which happen to be shared with the complementary task of automatic speech recognition.

The first part deals with features. Speech is very much a behavioural biometric in that the important information components are buried in the time-domain signal, and this signal exhibits practically infinite variation, encompassing different messages, different people, different times, different conditions, and so on. The task of speaker recognition is to extract the identity of the person speaking while neutralising variations such as the text. Likewise the task of automatic speech recognition is to extract the message or text component while neutralising all the other unwanted variations, including that of the speaker. Interestingly, and perhaps a little counter-intuitively, the features that tend to be used in both of these tasks are the same: short-term, spectrum-based cepstral representations. The fundamental ideas behind cepstra are presented.
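As a rough illustration of the cepstral idea mentioned above, the following minimal sketch computes a plain real cepstrum for one short-term frame: window the frame, take the log magnitude spectrum, and transform back. It is not taken from the talk; the function name `real_cepstrum`, the 25 ms frame at 16 kHz, the Hamming window, and the choice of 13 coefficients are all assumptions made for illustration, and practical systems typically add steps such as mel-frequency warping that are omitted here.

```python
import numpy as np


def real_cepstrum(frame, n_coeffs=13, eps=1e-10):
    """Return the first n_coeffs real-cepstrum coefficients of one frame.

    Hypothetical helper for illustration only (not from the talk).
    """
    frame = np.asarray(frame, dtype=float)
    # Taper the frame to reduce spectral leakage at its edges.
    windowed = frame * np.hamming(len(frame))
    # Short-term magnitude spectrum of the real-valued frame.
    magnitude = np.abs(np.fft.rfft(windowed))
    # The log turns multiplicative effects into additive ones;
    # eps (an assumed small constant) guards against log(0).
    log_magnitude = np.log(magnitude + eps)
    # Inverse transform of the log spectrum gives the cepstrum.
    cepstrum = np.fft.irfft(log_magnitude, n=len(frame))
    return cepstrum[:n_coeffs]


# Example: a synthetic 25 ms frame at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
coeffs = real_cepstrum(frame)
```

One property worth noting in this sketch: scaling the frame by a constant gain shifts only the zeroth coefficient, because the gain becomes an additive constant in the log spectrum. This is one reason the log step is convenient when unwanted variations need to be neutralised.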

The second part of the presentation considers aspects of classification with emphasis on the idea of data-driven models and the important concept of normalisation. In speech recognition vast quantities of speech data, perhaps more than any one person might hear in a lifetime, can be used to train a speech recogniser. Clearly this is not possible in the case of a speaker recogniser, since typically data for a given speaker is likely to span only seconds or minutes. Strategies for speaker modelling must reflect this practical limitation. The importance of the quantity and, as with all biometrics, the quality of speech data is discussed.

The final part of the talk introduces assessment strategies that have evolved out of the open evaluations over the last 10 years.
