Toronto Metropolitan University

Pose-Aware Embedding Networks and Multi-Modal Image-Language Retrieval

Posted on 2022-11-03, 16:57; authored by Domenico Curro

Inspired by recent work in human pose metric learning, this thesis explores a family of pose-aware embedding networks designed for image similarity retrieval. Circumventing the need for direct human joint localization, a series of CNN embedding networks is trained to respect a variety of Euclidean and language-primitive metric spaces. Querying with imagery alone presents certain limitations, so this thesis proposes a multi-modal image-language embedding space, extending the current model to allow for language-primitive queries. This additional language mode improves retrieval quality by 3% to 14% under the hit@k metric. Finally, two approaches are constructed to address the problem of conducting partial language-primitive queries: the first generates maximally likely descriptors, and the second exploits the network’s tendency to factorize the embedding space into (mostly) linearly separable sub-spaces. These two approaches improve recall by 13% and 17% over the provided baselines.
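
The abstract reports results under the hit@k metric without defining it on this page. The sketch below shows one common reading, assumed here rather than taken from the thesis: a query counts as a hit if at least one relevant gallery item appears among its k nearest neighbours in the embedding space. The function name `hit_at_k`, the Euclidean distance, and the toy 2-D embeddings are illustrative assumptions, not the author's implementation.

```python
import math


def hit_at_k(queries, gallery, relevant, k):
    """Fraction of queries with a relevant gallery item in their top-k neighbours.

    queries  : list of embedding vectors (one per query)
    gallery  : list of embedding vectors to retrieve from
    relevant : list of sets; relevant[i] holds gallery indices that are
               correct matches for queries[i] (assumed ground truth)
    k        : number of nearest neighbours to inspect
    """
    hits = 0
    for query, rel in zip(queries, relevant):
        # Rank gallery indices by Euclidean distance to the query embedding.
        ranked = sorted(range(len(gallery)),
                        key=lambda j: math.dist(query, gallery[j]))
        # A hit means any relevant index shows up in the first k results.
        if rel.intersection(ranked[:k]):
            hits += 1
    return hits / len(queries)


# Toy data (hypothetical): three gallery embeddings, two queries.
gallery = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
queries = [(0.1, 0.0), (4.0, 4.0)]
relevant = [{1}, {2}]

print(hit_at_k(queries, gallery, relevant, k=1))
print(hit_at_k(queries, gallery, relevant, k=2))
```

Under this reading, increasing k can only keep the score the same or raise it, which is why such results are usually reported for several values of k.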

  • Master of Science


  • Computer Science

Granting Institution

Ryerson University

LAC Thesis Type

  • Thesis