Toronto Metropolitan University

Methods and Datasets for Feature-based query refinement collection

thesis
posted on 2024-05-06, 19:12 authored by Mahtab Tamannaee

  

With the massive and fast-growing amount of information on the Web, maintaining the effectiveness of Information Retrieval (IR) is a real challenge. An online search system must be able to search through billions of documents stored on millions of devices (Manning et al., 2010). Traditional IR systems rank documents for an input query mostly by lexical similarity and exact term matching between the query and the documents, using frequency-based methods. In other words, the relevance of a document to a query is judged by how closely the distribution of words in the candidate document matches the query. Since the user usually does not know the lexical content of the optimal response, the user formulates a query whose vocabulary may have minimal overlap with the vocabulary of the optimal document. Low overlap between query and document vocabulary is called term mismatch, and it shows up in retrieval results as poor recall. The term mismatch problem has also been referred to as the lexical gap or lexical chasm, with the query on one side of the gap and the documents on the other. IR systems use different techniques to bridge the lexical chasm and solve the term mismatch problem, and many query refinement techniques have already been developed. Given the user query, each refinement technique outputs a modified version of the query that can serve as a bridge across the lexical gap from the query side to the document side.
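The term mismatch described above can be illustrated with a small sketch. It is not taken from the thesis: the tokenizer, the function name `query_term_overlap`, the example query and document, and the hand-written refined query are all hypothetical, chosen only to show how a refined query can raise vocabulary overlap with a relevant document.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_term_overlap(query, document):
    """Fraction of distinct query terms that also occur in the document."""
    q_terms = tokenize(query)
    if not q_terms:
        return 0.0
    return len(q_terms & tokenize(document)) / len(q_terms)


# Hypothetical data: the document is relevant but shares no words with the query.
query = "cheap car repair"
relevant_doc = "Affordable automobile maintenance and servicing in your area"

# Hypothetical refinement: the original query expanded with related terms by hand.
refined_query = "cheap car repair affordable automobile maintenance"

overlap_before = query_term_overlap(query, relevant_doc)
overlap_after = query_term_overlap(refined_query, relevant_doc)

print(f"overlap before refinement: {overlap_before:.2f}")
print(f"overlap after refinement:  {overlap_after:.2f}")
```

In this toy case an exact-match system would find nothing in common between the original query and the relevant document, while the expanded query shares half of its terms with it. Real refinement techniques produce the modified query automatically rather than by hand.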

History

Language

English

Degree

  • Master of Applied Science

Program

  • Electrical and Computer Engineering

Granting Institution

Ryerson University

LAC Thesis Type

  • Thesis

Thesis Advisor

Dr. Ebrahim Bagheri

Year

2022
