This thesis explores features characterizing the temporal dynamics and the use of ensemble techniques to
improve the performance of environmental sound recognition (ESR) systems. Firstly, for acoustic scene
classification (ASC), the local binary pattern (LBP) technique is applied to capture the temporal evolution
of Mel-frequency cepstral coefficient (MFCC) features, and the D3C ensemble classifier is adopted to
optimize the system performance. The results show that the proposed method achieved a classification
improvement of 8% compared to the baseline system.
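To make the idea of encoding temporal evolution with LBP concrete, the following is a minimal sketch of a one-dimensional LBP computed along the time axis of a feature matrix such as MFCCs. It is an illustration under stated assumptions, not necessarily the exact variant used in this thesis: the function name `lbp_1d_histogram`, the `half_width` parameter, the `>=` comparison rule, and the per-coefficient histogram summary are all hypothetical choices made for this example.

```python
import numpy as np


def lbp_1d_histogram(features, half_width=4):
    """Summarize the temporal evolution of each feature row with 1-D LBP codes.

    features   : array of shape (n_coeffs, n_frames), e.g. MFCCs over time.
    half_width : number of neighbouring frames compared on each side
                 (assumed value; the thesis may use a different neighbourhood).

    Returns an array of shape (n_coeffs, 2 ** (2 * half_width)) holding one
    normalized LBP-code histogram per coefficient.
    """
    features = np.asarray(features, dtype=float)
    n_coeffs, n_frames = features.shape
    n_bits = 2 * half_width
    if n_frames <= n_bits:
        raise ValueError("need more than 2 * half_width frames")

    # Frames that have a full neighbourhood on both sides.
    centers = features[:, half_width:n_frames - half_width]
    offsets = [o for o in range(-half_width, half_width + 1) if o != 0]

    codes = np.zeros(centers.shape, dtype=np.int64)
    for bit, off in enumerate(offsets):
        # Neighbour frames aligned with each center frame at this offset.
        neighbours = features[:, half_width + off:n_frames - half_width + off]
        # Set this bit wherever the neighbour is at least as large as the center.
        codes += (neighbours >= centers).astype(np.int64) << bit

    # One histogram of codes per coefficient, normalized to sum to 1.
    hist = np.stack([np.bincount(row, minlength=2 ** n_bits) for row in codes])
    return hist / hist.sum(axis=1, keepdims=True)
```

The resulting histograms could then be concatenated into a fixed-length vector per recording and passed to any classifier; the D3C ensemble itself is not sketched here.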
Secondly, a new approach for sound event detection (SED) using Nonnegative Matrix Factor 2-D
Deconvolution (NMF2D) and RUSBoost techniques is presented. The idea is to capture the joint
two-dimensional spectral and temporal information from the time-frequency representation (TFR) while
potentially separating the sound mixture into several sources. In addition, the RUSBoost ensemble technique
is utilized in the event detection process to alleviate class imbalance in the training data. This method
reduced the total error rate by 5% compared to the baseline method.
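As an illustration of how RUSBoost addresses class imbalance, the sketch below shows only the random undersampling step that is applied to the training data before a weak learner is fitted in each boosting round. It is a simplified example, not the full RUSBoost algorithm and not the implementation used in this thesis: the function name `random_undersample` and the choice to balance every class down to the minority-class count are assumptions made for this example.

```python
import numpy as np


def random_undersample(labels, rng):
    """Return sorted indices of a class-balanced subset of the training data.

    labels : 1-D array of class labels (e.g. 1 = event active, 0 = inactive).
    rng    : numpy.random.Generator used for the random draw.

    Every class is reduced, without replacement, to the size of the smallest
    class (an assumed balancing target; other ratios are possible).
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()

    kept = [
        rng.choice(np.flatnonzero(labels == cls), size=n_keep, replace=False)
        for cls in classes
    ]
    return np.sort(np.concatenate(kept))


# Example: 95 "no event" frames and 5 "event" frames.
labels = np.array([0] * 95 + [1] * 5)
subset = random_undersample(labels, np.random.default_rng(0))
```

In a boosting loop, a fresh subset of this kind would be drawn in every round, so that each weak learner sees a balanced sample while the ensemble as a whole still draws on most of the majority-class data.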