Since multi-modal data contain rich information about the semantics present in the
sensory and media content, the valid interpretation and integration of multi-modal information
are recognized as central to the successful utilization of multimedia in a wide range
of applications. Thus, multi-modal information analysis is becoming an increasingly important
research topic in the multimedia community. However, the effective integration
of multi-modal information is a difficult problem, facing major challenges in identifying
and extracting complementary and discriminative features, and in effectively fusing
information from multiple channels. To address these challenges, in
this thesis we propose a discriminative analysis framework (DAF) for high-performance
multi-modal information fusion.
The proposed framework has two realizations. We first introduce Discriminative
Multiple Canonical Correlation Analysis (DMCCA) as the fusion component of the
framework. DMCCA is capable of extracting more discriminative characteristics from
multi-modal information. We demonstrate that the optimal performance of DMCCA can be
verified both analytically and graphically, and that Canonical Correlation Analysis (CCA), Multiple
Canonical Correlation Analysis (MCCA), and Discriminative Canonical Correlation
Analysis (DCCA) are all special cases of DMCCA, thus establishing a unified framework for
canonical correlation analysis.
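As a schematic illustration of this unification (the exact objective and constraints used in the thesis may differ in detail), classical CCA for two views $x$ and $y$ seeks projection directions that maximize the correlation of the projected data:
\[
\max_{w_x,\, w_y} \; w_x^{\top} C_{xy}\, w_y
\quad \text{s.t.} \quad w_x^{\top} C_{xx}\, w_x = w_y^{\top} C_{yy}\, w_y = 1,
\]
where $C_{xy}$ is the between-set covariance matrix and $C_{xx}$, $C_{yy}$ are the within-set covariance matrices. MCCA extends this objective by summing the pairwise correlations over $m > 2$ data sets, while DCCA replaces $C_{xy}$ with a class-label-weighted cross-covariance that strengthens within-class correlation and penalizes between-class correlation. DMCCA combines both generalizations, so restricting to $m = 2$ and/or removing the class weighting recovers the corresponding special cases.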
To further enhance the performance of discriminative analysis in multi-modal information
fusion, Kernel Entropy Component Analysis (KECA) is introduced to analyze
the projected vectors in the DMCCA space, forming the second realization of the
framework. In this way, not only is the discriminative relation considered in the DMCCA
space, but the inherent complementary representation of the input data is also revealed
by entropy estimation, leading to better utilization of the multi-modal information and
improved pattern recognition performance.
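For reference, a standard formulation of KECA (the thesis's instantiation on the DMCCA-projected vectors may differ in detail) is based on a Parzen-window estimate of the Rényi quadratic entropy, which decomposes over the eigenvalues $\lambda_i$ and eigenvectors $e_i$ of the kernel matrix $K$ computed from $N$ samples:
\[
\hat{V}(p) = \frac{1}{N^2}\, \mathbf{1}^{\top} K\, \mathbf{1}
= \frac{1}{N^2} \sum_{i=1}^{N} \left( \sqrt{\lambda_i}\; e_i^{\top} \mathbf{1} \right)^{2}.
\]
Unlike kernel PCA, KECA retains the axes that contribute most to the entropy estimate $\hat{V}(p)$ rather than those with the largest eigenvalues, thereby exposing structure in the data that a purely variance-based projection can miss.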
Finally, we implement a prototype of the proposed DAF to demonstrate its performance
in handwritten digit recognition, face recognition, and human emotion recognition.
Extensive experiments show that the proposed framework outperforms existing methods
based on similar principles, clearly demonstrating its generic nature.
Furthermore, this work offers a promising direction for designing advanced multi-modal
information fusion systems, with great potential to impact the development of intelligent
human-computer interaction systems.