Toronto Metropolitan University
Browse
Alcantara_Jeanne.pdf (2.56 MB)

Performance Evaluation of a Big Data Application on Apache Spark

Download (2.56 MB)
thesis
posted on 2021-05-23, 09:18 authored by Jeanne Alcantara
Apache Spark enables a big data application—one that takes massive data as input and may produce massive data along its execution—to run in parallel on multiple nodes. Hence, for a big data application, performance is a vital issue. This project analyzes a WordCount application using Apache Spark, where the impact on the execution time and average utilization is assessed. To facilitate this assessment, the number of executor cores and the size of executor memory are varied across different sizes of data that the application has to process, and the different number of nodes in the cluster that the application runs on. It is concluded that different pairs (data size, number of nodes in the cluster) require different number of executor cores and different size of executor memory to obtain optimum results for execution time and average node utilization.

History

Language

eng

Degree

  • Master of Engineering

Program

  • Electrical and Computer Engineering

Granting Institution

Ryerson University

LAC Thesis Type

  • Thesis

Year

2019

Usage metrics

    Electrical and Computer Engineering (Theses)

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC