Methods and Datasets for Feature-based Query Refinement Collection
With the massive and fast-growing amount of information on the Web, maintaining the effectiveness of Information Retrieval (IR) is a real challenge. An online search system must be able to search through billions of documents stored on millions of devices (Manning et al., 2010). Traditional IR systems process queries mainly by emphasizing lexical similarity and exact term matching between the query and documents, using frequency-based methods. In other words, the relevance of a document to a query is judged by how closely the distribution of words in the candidate document matches the query. Because users do not usually know the lexical content of the optimal response, they often formulate queries whose vocabulary overlaps only minimally with the vocabulary of the optimal document. This low overlap between query and document vocabulary is called term mismatch, and it shows up in retrieval results as poor recall. The term mismatch problem has also been referred to as the lexical gap or lexical chasm, with the query on one side of the gap and the documents on the other. IR systems use various techniques to bridge this lexical chasm, and many query refinement techniques have already been developed. Given a user query, each refinement technique outputs a modified version of that query, which can serve as an arch over the lexical gap from the query side to the document side.
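To make the term mismatch problem concrete, the sketch below scores a query against a document with a raw term-frequency cosine similarity (one simple frequency-based lexical measure) and then applies a toy synonym-based expansion as a stand-in for query refinement. The `SYNONYMS` table, the `expand_query` helper, and the example strings are hypothetical illustrations, not the refinement techniques or datasets studied in this thesis.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector: lowercase, whitespace tokenization, no stemming."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical hand-built synonym table standing in for a real refinement technique.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query: str) -> str:
    """Toy query expansion (hypothetical): append known synonyms of each query term."""
    terms = query.lower().split()
    expanded = list(terms)
    for t in terms:
        expanded.extend(SYNONYMS.get(t, []))
    return " ".join(expanded)


query = "cheap car"
document = "affordable automobile for sale"

# No shared terms, so the lexical score is zero: term mismatch.
print(cosine(tf_vector(query), tf_vector(document)))  # 0.0

refined = expand_query(query)
print(refined)  # cheap car inexpensive affordable automobile vehicle
print(round(cosine(tf_vector(refined), tf_vector(document)), 3))  # 0.408
```

Because the original query and the document share no terms, their lexical similarity is zero however relevant the document is; after expansion, the shared terms "affordable" and "automobile" give a non-zero score. Real refinement techniques must choose expansion terms far more carefully, since naive expansion can also add unrelated terms and hurt precision.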
Language
- English
Degree
- Master of Applied Science
Program
- Electrical and Computer Engineering
Granting Institution
- Ryerson University
LAC Thesis Type
- Thesis