posted on 2023-06-07, 20:03 authored by Mohid Tayyub
<p>Heterogeneous platforms that include multi-core CPUs, GPGPUs, traditional GPUs and FPGAs are increasingly being employed for parallel computing to meet high performance demands. However, the application developer must thoroughly understand the application in order to parallelize its tasks. Careful architecture-specific adjustments are also known to be required to tune an application so that it effectively utilises the underlying heterogeneous devices. This work presents a cross-platform, expandable profiling framework that highlights bottlenecks in the application and guides design changes by providing both coarse- and fine-grained application statistics. Machine learning models are trained to understand the application behaviour and underline features relating to performance, while code instrumentation is used to expose individual code statistics of processes and kernels. The presented framework is applied to a variety of applications by profiling and tuning various benchmarks and real-life case studies such as collision detection. Through these case studies, comparisons are made with current industrial tools and other state-of-the-art tuning approaches. The results highlight the unique parameters that can be extracted from the proposed framework and demonstrate its effectiveness through the notable performance increase achieved.</p>