Large-scale Content-based Multimedia Analysis And Applications Using Bag-Of-Words Model

posted on 2021-05-22, 10:26 authored by Ning Zhang
This dissertation focuses on the analysis of large-scale image and video data collections, with applications to multimedia indexing and retrieval. The bag-of-words (BoW) model is adopted and improved to meet the efficiency and effectiveness requirements of analyzing large-scale multimedia data. The BoW method originated in the text retrieval domain and has been successfully applied in computer vision, for example in image scene and object categorization. Specifically, we apply the BoW model to image classification and retrieval and tackle two challenges in large-scale multimedia applications: video analysis and mobile-based social activity recommendation using visual intents.

Incorporating the BoW model with unsupervised classification, we propose a scalable and generic approach to video analysis. The method aims to analyze unlabeled video systematically through genre identification, frame classification, and event detection. Unlike conventional approaches that depend on domain knowledge, the BoW model is domain-knowledge independent. Moreover, the system is mainly unsupervised and requires minimal human input. Our method is therefore capable of processing massive quantities of video generically. Sports video is used as the testing ground for evaluation.

Combining the BoW model with advanced retrieval algorithms, we propose a mobile-based visual search and social activity recommendation system. The merit of the BoW model in large-scale image retrieval is integrated with the flexible user interface provided by the mobile platform. Instead of text or voice input, the system takes images captured by the built-in camera and attempts to understand users’ intents through interactions. These intents are then recognized through a retrieval mechanism using the BoW model. Finally, visual results are mapped onto contextually relevant information and entities (i.e., local businesses) for social task suggestions. Hence, the system offers users the ability to search for information and make decisions on the go.
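For readers unfamiliar with the BoW model named in the abstract, the general idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the function names (`build_codebook`, `bow_histogram`, `cosine_similarity`), the plain k-means clustering, the cosine comparison, and the synthetic 2-D "descriptors" are all hypothetical choices made for the example. Local descriptors are clustered into a codebook of visual words, and each image is then represented as a normalized histogram of word counts that can be compared for retrieval.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centers):
    """Index of the codebook entry (visual word) closest to vec."""
    return min(range(len(centers)), key=lambda j: sq_dist(vec, centers[j]))

def build_codebook(descriptors, k, iters=10, seed=0):
    """Cluster descriptors with plain k-means; the k centers are the visual words."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(d, centers)].append(d)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def bow_histogram(descriptors, codebook):
    """Quantize each descriptor to its nearest word and return normalized counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def cosine_similarity(a, b):
    """Cosine similarity between two BoW histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Synthetic stand-ins for local image descriptors (assumption: real systems
# would extract these from image patches or keypoints).
rng = random.Random(1)

def blob(cx, cy, n):
    return [[cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)] for _ in range(n)]

image_a = blob(0, 0, 30) + blob(5, 5, 10)  # mostly one kind of local pattern
image_b = blob(0, 0, 10) + blob(5, 5, 30)  # mostly the other kind

codebook = build_codebook(image_a + image_b, k=2)
hist_a = bow_histogram(image_a, codebook)
hist_b = bow_histogram(image_b, codebook)
similarity = cosine_similarity(hist_a, hist_b)
```

Because the histogram discards where each descriptor occurred and keeps only how often each visual word appears, the representation needs no domain-specific rules, which is consistent with the domain-knowledge independence described above.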





Doctor of Philosophy


Electrical and Computer Engineering

Granting Institution

Ryerson University

LAC Thesis Type