Efficiency of chosen speech descriptors in relation to emotion recognition
Citation
Dorota, K., Tomasz, S., & Gholamreza, A. (February 20, 2017). Efficiency of chosen speech descriptors in relation to emotion recognition. Eurasip Journal on Audio, Speech, and Music Processing, 2017, 1, 3.
Abstract
This paper presents a parametrization of emotional speech using a pool of features commonly used in emotion recognition: fundamental frequency, formants, energy, and MFCC, PLP, and LPC coefficients. The pool is expanded with perceptual coefficients (BFCC, HFCC, RPLP, and RASTA PLP) that are used in speech recognition but not in emotion detection. The main contribution of this work is a comparison of emotion detection accuracy for each feature type, based on results from both k-NN and SVM classifiers with 10-fold cross-validation. The analysis was performed on two Polish emotional speech databases: voice performances by professional actors, compared with the author's spontaneous speech.
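The evaluation protocol described in the abstract, scoring each feature type by classifier accuracy under 10-fold cross-validation, can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: it covers only a plain k-NN classifier (an SVM would be scored through the same cross-validation loop), and the two "feature sets" are synthetic Gaussian clusters standing in for real descriptors such as MFCC or BFCC. All function names, the value of k, and the data are assumptions made for the example.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    # Squared Euclidean distance is sufficient for ranking neighbours.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, n_folds=10, k=3, seed=0):
    """Return overall k-NN accuracy under n_folds-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into n_folds roughly equal folds.
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out
        )
    return correct / len(X)


def make_synthetic(n_per_class, centers, spread, seed):
    """Generate placeholder data: one Gaussian cluster per class (hypothetical)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_class):
            X.append([rng.gauss(c, spread) for c in center])
            y.append(label)
    return X, y


# Placeholder "feature types": synthetic data, NOT real speech descriptors.
feature_sets = {
    "well_separated": make_synthetic(
        30, [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)], spread=0.5, seed=1
    ),
    "overlapping": make_synthetic(
        30, [(0.0, 0.0), (0.5, 0.5), (0.0, 0.5)], spread=1.0, seed=2
    ),
}

# Score every feature type with the same classifier and the same protocol.
results = {name: cross_val_accuracy(X, y) for name, (X, y) in feature_sets.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Running every feature type through the same folds and the same classifier settings is what makes the resulting accuracies comparable; in a real study each entry in `feature_sets` would hold the vectors extracted from the recordings for one descriptor type.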