Toronto Metropolitan University

Discovering Related Terms and Detecting Trends in Software Engineering Using Word Embeddings

thesis
posted on 2024-06-18, 19:07 authored by Janusan Baskararajah

The Software Engineering (SE) community is prolific, making it challenging for experts to keep up with the flood of new papers and for neophytes to enter the field. One proposed way to ease this burden is automatic summarization of papers. Although tools for term and trend summarization and analysis exist, the specialized language of SE requires bespoke solutions. We therefore posit that the community may benefit from a tool that extracts terms and their interrelations from the SE text corpus and shows how the terms trend over time. In this thesis, we build a prototype tool using word embeddings. We train the embeddings on the SE Body of Knowledge handbook and on the titles and abstracts of 15,233 research papers, and we create test cases to validate the trained embeddings. After gathering the trends of interrelated SE terms, we apply cluster analysis to them to help discover underlying patterns in how SE trends rise and fall in popularity. We provide representative examples showing that the embeddings may aid in summarizing terms and uncovering trends in the knowledge base, along with examples of patterns that may underlie the trends of interrelated SE terms.
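
To make the idea in the abstract concrete, the following is a minimal sketch of finding related terms and counting a term's yearly mentions. It is not the thesis's implementation: the abstract does not name the embedding algorithm, so simple count-based co-occurrence vectors with cosine similarity stand in for trained word embeddings here. The toy corpus and all function names (`cooccurrence_vectors`, `related_terms`, `yearly_counts`) are hypothetical, and the cluster analysis step is not shown.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy corpus: (year, text) pairs standing in for paper
# titles and abstracts. These are invented, not data from the thesis.
TOY_CORPUS = [
    (2018, "unit testing finds defects in code"),
    (2019, "regression testing finds defects in releases"),
    (2020, "unit testing and regression testing reduce defects"),
    (2021, "microservices architecture improves deployment"),
    (2021, "microservices deployment uses containers"),
]


def cooccurrence_vectors(corpus, window=2):
    """Map each word to a sparse vector counting neighbors within `window`."""
    vectors = defaultdict(Counter)
    for _, text in corpus:
        tokens = text.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def related_terms(vectors, term, top_n=3):
    """Return the `top_n` words most similar to `term`, ties broken alphabetically."""
    target = vectors.get(term, Counter())  # avoid mutating the dict while iterating
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != term]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:top_n]


def yearly_counts(corpus, term):
    """Count how often `term` appears per year: a crude trend signal."""
    counts = Counter()
    for year, text in corpus:
        counts[year] += text.lower().split().count(term)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    vecs = cooccurrence_vectors(TOY_CORPUS)
    print(related_terms(vecs, "testing"))
    print(yearly_counts(TOY_CORPUS, "testing"))
```

In this toy setting, words that share neighbors (for example "unit" and "regression") receive a positive similarity, while words from unrelated contexts (for example "testing" and "microservices") score zero. A real pipeline would presumably use a trained embedding model and a much larger corpus, and would feed the per-term yearly series into a clustering step.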

Language

eng

Degree

  • Master of Health Science

Program

  • Computer Science

Granting Institution

Ryerson University

LAC Thesis Type

  • Thesis

Thesis Advisor

Andriy Miranskyy

Year

2022
