Signal Representations for Automatic Speech Recognition
Abstract
In this report we give an overview of methods for front-end processing of speech signals for automatic
speech recognition (ASR) that are described in the literature.
The most common representation of speech in this context seems to be mel-frequency cepstral coefficients
(MFCC) with delta- and double-delta coefficients, usually combined with cepstral mean normalization
(CMN). Other representations include perceptual linear prediction (PLP) and linear prediction cepstral coefficients
(LPCC).
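
As a rough illustration of the post-processing steps named above, the following sketch applies CMN to a matrix of cepstral coefficients and appends delta and double-delta coefficients. It assumes the cepstra have already been computed (one row per frame, one column per coefficient). The function names, the regression-style delta formula, and the window width of 2 frames are assumptions made for this example, not details taken from the report.

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Subtract the per-utterance mean of each cepstral coefficient."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def delta(features, width=2):
    """Regression-based delta coefficients over +/- `width` frames.

    Edge frames are handled by repeating the first and last frame
    (an assumption; other padding schemes are also in use).
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + num_frames]
        behind = padded[width - k : width - k + num_frames]
        out += k * (ahead - behind)
    return out / denom

def add_dynamic_features(cepstra):
    """Return [normalized static, delta, double-delta] stacked column-wise."""
    static = cepstral_mean_normalization(cepstra)
    d1 = delta(static)
    d2 = delta(d1)
    return np.hstack([static, d1, d2])
```

With 13 static coefficients per frame, `add_dynamic_features` would return 39 features per frame, which is a commonly used feature dimension for MFCC-based front ends.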