Implicit Entity Recognition and Linking in Tweets

Hosseini, Hawre

doi:10.32920/25413838.v1

thesis

posted on 2024-03-18, 16:55 authored by Hawre Hosseini

Entity linking, in which surface form representations of entities in text are linked to the appropriate entities in a knowledge graph, has received increasing attention. It allows textual content, e.g., social user-generated content, to be interpreted at a higher semantic level. However, recent research has shown that at least 15% of social user-generated content does not contain an explicit surface form representation of the entities it discusses. In other words, the subject of the content is only implied. In such cases, existing named entity recognition and linking methods, known as explicit entity linking, cannot perform linking because the entity surface form is missing. The objective of this dissertation is to propose approaches to the tasks of implicit named entity recognition and linking, while also introducing and publicly sharing a comprehensive gold standard dataset for these tasks. We formulate the problem of recognizing implicit entity mentions in tweets, for which we propose to leverage categorical and linguistically inspired features based on Systemic Functional Linguistics (SFL). Our implicit named entity recognizer achieves promising results on different evaluation metrics. Additionally, we propose two approaches for linking implicit mentions in tweets. In the first, we formulate implicit entity linking as an ad-hoc document retrieval process in which the input query is the tweet to be implicitly linked and the document space is the set of textual descriptions of entities in the knowledge graph. We systematically compare our work with existing work and show that our method provides improvements on a range of retrieval measures. In the second approach, we model implicit entity linking as a learning-to-rank problem in which knowledge graph entities are ranked by their relevance to the input tweet. In doing so, we introduce and systematically classify appropriate features for identifying implicit entities. In our experiments, we show that our proposed features improve on the state of the art. For SFL-based recognition of implicit entity mentions, as well as for the ad-hoc retrieval-based and learning-to-rank-based approaches to linking such mentions, we provide a qualitative assessment of the root causes of mislabeled instances in our experiments.
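The retrieval formulation described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's method: it swaps in plain TF-IDF cosine similarity as a stand-in scoring function, and the function names (`tokenize`, `rank_entities`), the two entities, and their descriptions are all hypothetical, chosen only to show the tweet acting as the query and the entity descriptions acting as the document space.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_entities(tweet, entity_descriptions):
    """Rank entities by TF-IDF cosine similarity between tweet and description.

    Assumption: TF-IDF cosine is only an illustrative scoring function,
    not the retrieval model used in the dissertation.
    """
    docs = {name: Counter(tokenize(desc)) for name, desc in entity_descriptions.items()}
    n_docs = len(docs)

    # Document frequency: in how many descriptions each term occurs.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    def weights(counts):
        # Smoothed IDF so terms unseen in the document space stay finite.
        return {t: tf * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
                for t, tf in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    query = weights(Counter(tokenize(tweet)))
    scored = [(name, cosine(query, weights(counts))) for name, counts in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical document space: entity name -> textual description.
entities = {
    "Gravity (film)": "science fiction film about astronauts stranded in space "
                      "starring Sandra Bullock",
    "The Hobbit": "fantasy film about a hobbit on a journey with dwarves and a dragon",
}

# The tweet never names the film; the entity is only implied.
tweet = "that movie where Sandra Bullock is stranded in space was intense"

ranking = rank_entities(tweet, entities)
print(ranking[0][0])
```

In this toy setting the tweet shares several terms with one description and none with the other, so the implied entity is ranked first even though its name never appears in the tweet.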

Language

eng

Degree

Doctor of Philosophy

Program

Electrical and Computer Engineering

Granting Institution

Ryerson University

LAC Thesis Type

Dissertation

Thesis Advisor

Ebrahim Bagheri

Year

2022

Keywords

linking social media twitter textual content linking textual content social users

Licence

In Copyright
