Toronto Metropolitan University

Implicit Entity Recognition and Linking in Tweets

thesis
posted on 2024-03-18, 16:55 authored by Hawre Hosseini
Linking textual content to entities in a knowledge graph, where surface form representations of entities are linked to the appropriate entities, has received increasing attention. This allows textual content, e.g., social user-generated content, to be interpreted at a higher semantic level. However, recent research has shown that at least 15% of social user-generated content does not have an explicit surface form representation of the entities it discusses. In other words, the subject of the content is only implied. For such cases, existing named entity recognition and linking methods, known as explicit entity linking, cannot perform linking because the entity surface form is missing. This dissertation introduces and publicly shares a comprehensive gold standard dataset for the tasks of implicit named entity recognition and linking, and proposes approaches to both tasks. We formulate the problem of recognizing implicit entity mentions in tweets, where we propose to leverage categorical and linguistically inspired features based on Systemic Functional Linguistics (SFL). Our implicit named entity recognizer achieves promising results on different evaluation metrics. Additionally, we propose two approaches for linking implicit mentions in tweets. In the first, we formulate implicit entity linking as an ad-hoc document retrieval process in which the input query is the tweet to be linked and the document space is the set of textual descriptions of entities in the knowledge graph. We systematically compare our work with existing work and show that our method improves on a range of retrieval measures. In the second approach, we model implicit entity linking as a learning to rank problem in which knowledge graph entities are ranked by their relevance to the input tweet. In doing so, we introduce and systematically classify appropriate features for identifying implicit entities.
In our experiments, we show that our proposed features improve on the state of the art. For SFL-based recognition of implicit entity mentions, as well as for the ad-hoc retrieval based and learning to rank based approaches to linking such mentions, we provide a qualitative assessment of the root causes of mislabeled instances in our experiments.
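
The retrieval formulation described in the abstract (tweet as query, entity descriptions as documents) can be illustrated with a small sketch. This is not the dissertation's method: it swaps in a generic TF-IDF cosine ranker, and the function names (`tokenize`, `rank_entities`), the entity names, and the descriptions are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_entities(tweet, entity_descriptions):
    """Rank entities by TF-IDF cosine similarity to the tweet.

    Assumption: a plain TF-IDF model stands in for whatever retrieval
    model the dissertation actually uses.
    """
    docs = {name: Counter(tokenize(desc))
            for name, desc in entity_descriptions.items()}
    n_docs = len(docs)

    # Document frequency: number of descriptions containing each term.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + f)) + 1.0 for t, f in df.items()}

    def weight(counts):
        # Terms unseen in any description carry no weight.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query = weight(Counter(tokenize(tweet)))
    scores = [(name, cosine(query, weight(counts)))
              for name, counts in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy knowledge graph: entity name -> textual description.
entities = {
    "Titanic (1997 film)": "romance and disaster movie about the sinking "
                           "of a ship starring Leonardo DiCaprio and Kate Winslet",
    "Inception": "science fiction movie about dream infiltration "
                 "starring Leonardo DiCaprio",
    "Gravity (2013 film)": "science fiction thriller about astronauts "
                           "stranded in space starring Sandra Bullock",
}

# The tweet never names the film; the entity is only implied.
tweet = "that movie about a sinking ship with DiCaprio made me cry"
for name, score in rank_entities(tweet, entities):
    print(f"{score:.3f}  {name}")
```

In this toy setting the top-ranked description is taken as the linked entity. A learning to rank variant would replace the single similarity score with a learned combination of several features.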

History

Language

eng

Degree

  • Doctor of Philosophy

Program

  • Electrical and Computer Engineering

Granting Institution

Ryerson University

LAC Thesis Type

  • Dissertation

Thesis Advisor

Ebrahim Bagheri

Year

2022
