Speech-based human emotion recognition

Tabtabae, Talieh Seyed

doi:10.32920/ryerson.14651964.v1

thesis

posted on 2021-06-08, 10:09 authored by Talieh Seyed Tabtabae

Automatic Emotion Recognition (AER) is an emerging research area in the Human-Computer Interaction (HCI) field. As computers become more widespread, the study of interaction between humans (users) and computers is attracting more attention. To make the interface between humans and computers more natural and friendly, it would be beneficial to give computers the ability to recognize situations the same way a human does. Equipped with an emotion recognition system, a computer can recognize its user's emotional state and react appropriately. In today's HCI systems, machines can recognize the speaker and the content of the speech using speaker identification and speech recognition techniques. If machines are also equipped with emotion recognition techniques, they can know "how it is said", react more appropriately, and make the interaction more natural. One of the most important human communication channels is the auditory channel, which carries speech and vocal intonation. In fact, people can perceive each other's emotional state from the way they talk. Therefore, in this work speech signals are analyzed in order to set up an automatic system that recognizes the human emotional state. Six discrete emotional states are considered and categorized in this research: anger, happiness, fear, surprise, sadness, and disgust. A set of novel spectral features is proposed in this contribution. Two approaches are applied and their results are compared. In the first approach, all the acoustic features are extracted from consecutive frames along the speech signals, and statistical values of those features constitute the feature vectors. A Support Vector Machine (SVM), a relatively new approach in the field of machine learning, is used to classify the emotional states. In the second approach, spectral features are extracted from non-overlapping, logarithmically spaced frequency sub-bands. In order to make use of all the extracted information, sequence discriminant SVMs are adopted. The empirical results show that the employed techniques are very promising.
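The feature-extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the two per-frame features (log energy and zero-crossing rate), the frame length and hop, the choice of summary statistics, and the 100 Hz to 6400 Hz band range are all assumptions chosen for illustration, and the function names are hypothetical. The sketch shows how frame-level features might be summarized into one fixed-length vector per utterance, and how edges of non-overlapping, logarithmically spaced sub-bands could be computed. The SVM classification step is omitted.

```python
import math
import statistics


def split_frames(signal, frame_len, hop):
    """Split a sequence of samples into consecutive frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def frame_features(frame):
    """Two illustrative per-frame features (assumed, not from the thesis)."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return [math.log(energy + 1e-12), crossings / (len(frame) - 1)]


def utterance_vector(signal, frame_len=400, hop=200):
    """Summarize frame-level features with per-feature statistics."""
    per_frame = [frame_features(f)
                 for f in split_frames(signal, frame_len, hop)]
    vector = []
    for values in zip(*per_frame):  # one tuple of values per feature
        vector += [statistics.mean(values), statistics.pstdev(values),
                   min(values), max(values)]
    return vector


def log_band_edges(f_low, f_high, n_bands):
    """Edges of non-overlapping, logarithmically spaced sub-bands."""
    ratio = (f_high / f_low) ** (1.0 / n_bands)
    return [f_low * ratio ** k for k in range(n_bands + 1)]


# Synthetic one-second 220 Hz tone sampled at 16 kHz (stand-in for speech).
sr = 16000
tone = [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
vec = utterance_vector(tone)                  # 2 features x 4 statistics
edges = log_band_edges(100.0, 6400.0, 6)      # each band spans one octave
```

Every utterance yields a vector of the same length regardless of its duration, which is what allows a standard SVM to be trained on it. The sequence-based second approach would instead keep the per-frame values rather than collapsing them into statistics.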

Language

English

Degree

Master of Applied Science

Program

Electrical and Computer Engineering

Granting Institution

Ryerson University

LAC Thesis Type

Thesis

Thesis Advisor

Aziz Guergachi, Sridhar Krishnan

Year

2007

Keywords

User interfaces (Computer systems); Emotions; Human-computer interaction; Automatic speech recognition; Natural language processing (Computer science)

Licence

In Copyright
