Discovering Related Terms and Detecting Trends in Software Engineering Using Word Embeddings
The Software Engineering (SE) community is prolific, which makes it hard for experts to keep up with the flood of new papers and for newcomers to enter the field. One proposed way to ease this burden is automatic summarization of papers. Although tools for term and trend summarization and analysis exist, the specialized language of SE requires bespoke solutions. We therefore posit that the community may benefit from a tool that extracts terms and their interrelations from the SE community's text corpus and shows how those terms trend. In this paper, we build a prototype tool using word embeddings. We train the embeddings on the SE Body of Knowledge handbook and on the titles and abstracts of 15,233 research papers, and we create test cases to validate the training. After gathering the trends of interrelated SE terms, we apply cluster analysis to them to help discover underlying patterns in how SE trends rise and fall in popularity. We provide representative examples showing that the embeddings may aid in summarizing terms and uncovering trends in the knowledge base, along with examples of patterns that may underlie the trends of interrelated SE terms.
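The idea of finding related terms from a text corpus can be illustrated with a minimal sketch. It is not the tool described in the abstract: it swaps trained word embeddings for plain co-occurrence count vectors so that it runs with the Python standard library alone. The toy corpus and the names `cooccurrence_vectors`, `cosine`, and `related_terms` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Toy corpus: invented stand-ins for paper titles, NOT data from the thesis.
corpus = [
    "unit testing improves software quality",
    "regression testing improves software reliability",
    "agile teams plan short sprints",
    "scrum teams plan short sprints",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions.

    Assumption: count vectors stand in for trained embeddings here.
    """
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            context = vectors.setdefault(word, Counter())
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    context[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_terms(term, vectors, top_n=3):
    """Return the `top_n` words whose context vectors are closest to `term`."""
    target = vectors.get(term, Counter())
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


vectors = cooccurrence_vectors(corpus)
print(related_terms("agile", vectors, top_n=1))
```

In this toy corpus "agile" and "scrum" appear in identical contexts, so "scrum" ranks first for "agile". With a real corpus, a trained embedding model would take the place of the count vectors, and the same nearest-neighbour lookup would apply.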
Language
- eng
Degree
- Master of Health Science
Program
- Computer Science
Granting Institution
- Ryerson University
LAC Thesis Type
- Thesis