An Attempt at Defining And Quantifying Image Describability Through Semantic Connection Between Visual and Language
One of the most challenging tasks for modern artificial intelligence systems is image captioning, which requires a machine to comprehend the semantic content of visual data and map it correctly to a description in the language domain. To achieve acceptable performance, a learning system is generally presented with human-generated ground truth captions as targets. While significant progress has been made in building highly functional image captioning systems, little research has explored the nature of the ground truth itself. In this thesis, such ground truth captions are analyzed in an attempt to find the semantic connection between visual data and the language data describing it, revealing potential insights into human judgement and moving closer to defining and quantifying an abstract notion of image “describability”: the extent to which an image can be adequately described using language.
Language
- English
Degree
- Master of Science
Program
- Computer Science
Granting Institution
- Ryerson University
LAC Thesis Type
- Thesis