Power and Performance Based Autotuning of Heterogeneous Applications for CPU-GPU Systems
Recent advances in artificial intelligence (AI) and deep learning have been driven by the use of General Purpose Graphics Processing Units (GPUs) to implement these AI applications. GPU programming has become easier with the advent of high-level abstraction API frameworks such as OpenCL and CUDA. The portability of these frameworks, however, comes at a performance cost: GPU kernel performance is highly dependent on the underlying hardware architecture, so application kernels need to be re-tuned every time they execute on a new device. The work presented in this thesis focuses on OpenCL kernels running on heterogeneous CPU-GPU systems. First, we present an analytical approach to estimate the power and performance of a convolutional neural network (CNN) on a heterogeneous system, which is useful for power- and performance-based auto-tuning. We then present our main contribution, a multi-objective OpenCL kernel auto-tuner (MOKAT) for power and performance tuning. MOKAT tunes an OpenCL kernel without compromising either of the two objectives (power and performance) and provides a final set of Pareto-optimal kernels. MOKAT offers an integrated power calculation methodology for both online and offline tuning, and it uses the Non-Dominated Sorting Genetic Algorithm (NSGA-II) as its multi-objective evolutionary algorithm (MOEA). We describe the MOKAT API and the internal structure of our framework. The two kernel-tuning case studies are 2D convolution and General Matrix Multiplication.
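To illustrate the Pareto-optimality idea the abstract refers to, the minimal Python sketch below filters a set of kernel configurations down to the non-dominated points in the (runtime, power) objective space. The KernelConfig fields, work-group sizes, and measurement values are hypothetical and chosen for illustration only; this is not MOKAT's actual API or tuning loop, just the dominance test that underlies any such multi-objective tuner.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KernelConfig:
    """One tuning point: a hypothetical work-group size plus its measured objectives."""
    local_size: Tuple[int, int]  # assumed tuning parameter (work-group dimensions)
    runtime_ms: float            # measured kernel execution time (lower is better)
    power_w: float               # measured average power draw (lower is better)


def dominates(a: KernelConfig, b: KernelConfig) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in at least one."""
    return (a.runtime_ms <= b.runtime_ms and a.power_w <= b.power_w and
            (a.runtime_ms < b.runtime_ms or a.power_w < b.power_w))


def pareto_front(configs: List[KernelConfig]) -> List[KernelConfig]:
    """Keep only the non-dominated configurations (the Pareto-optimal set)."""
    return [c for c in configs
            if not any(dominates(other, c) for other in configs if other is not c)]


# Invented measurements that a tuner might collect for a 2D-convolution kernel.
measurements = [
    KernelConfig((16, 16), runtime_ms=4.2, power_w=95.0),
    KernelConfig((8, 32),  runtime_ms=5.1, power_w=80.0),
    KernelConfig((32, 8),  runtime_ms=5.3, power_w=96.0),  # dominated: slower and draws more power than (16, 16)
    KernelConfig((8, 8),   runtime_ms=6.0, power_w=78.0),
]

for cfg in pareto_front(measurements):
    print(cfg.local_size, cfg.runtime_ms, cfg.power_w)
```

In a full NSGA-II-based tuner, this non-dominated filtering is applied repeatedly during selection; the sketch only shows the final reduction of measured configurations to a Pareto-optimal set.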
Language
- English
Degree
- Master of Applied Science
Program
- Electrical and Computer Engineering
Granting Institution
- Toronto Metropolitan University
LAC Thesis Type
- Thesis