Posted on 2021-05-23, 15:40; authored by Yongjin Wang
In this work, we investigate the recognition of human emotional states from audiovisual signals. We extract prosodic, Mel-frequency cepstral coefficient (MFCC), and formant frequency features to represent the audio characteristics of emotional speech. A face detection scheme based on the HSV color model is used to separate the face from the background. Facial expressions are represented by Gabor wavelet features. We perform feature selection using a stepwise method based on the Mahalanobis distance. The selected features are used to classify the emotional data into their corresponding classes. Several classification algorithms, including the Gaussian Mixture Model (GMM), K-nearest neighbours (K-NN), Neural Network (NN), and Fisher's Linear Discriminant Analysis (FLDA), are compared in this study. An adaptive multi-classifier scheme involving the analysis of individual classes and combinations of different classes is proposed. Our recognition system is tested on a language-independent database. The proposed FLDA-based multi-classifier scheme achieves the best overall and individual-class recognition accuracy.
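To make the feature selection step more concrete, the following is a minimal sketch of a stepwise procedure driven by the Mahalanobis distance. The abstract does not give the details of the method, so everything here is an assumption for illustration: the two-class setting, the forward-only search (no removal step), the pooled covariance estimate, and the hypothetical function names `mahalanobis_sq` and `forward_select`.

```python
import numpy as np

def mahalanobis_sq(X_a, X_b, idx):
    """Squared Mahalanobis distance between two class means on the feature subset idx."""
    A = X_a[:, idx]
    B = X_b[:, idx]
    diff = A.mean(axis=0) - B.mean(axis=0)
    n_a, n_b = len(A), len(B)
    # Assumption: both classes share one covariance, estimated by pooling.
    cov_a = np.atleast_2d(np.cov(A, rowvar=False))
    cov_b = np.atleast_2d(np.cov(B, rowvar=False))
    pooled = ((n_a - 1) * cov_a + (n_b - 1) * cov_b) / (n_a + n_b - 2)
    # A small ridge keeps the solve stable when features are nearly collinear.
    pooled = pooled + 1e-9 * np.eye(len(idx))
    return float(diff @ np.linalg.solve(pooled, diff))

def forward_select(X_a, X_b, k):
    """Greedily add the feature that most increases the class separation."""
    selected = []
    remaining = list(range(X_a.shape[1]))
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda j: mahalanobis_sq(X_a, X_b, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic illustration (not data from the study): only feature 2 separates the classes.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(200, 4))
X_b = rng.normal(size=(200, 4))
X_b[:, 2] += 5.0
chosen = forward_select(X_a, X_b, k=2)
```

In this sketch, each step re-evaluates the distance on the candidate subset as a whole, so a feature that is redundant with one already selected gains little. A multi-class version would need some aggregate of pairwise distances, which the abstract does not specify.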