Distributed ML-Based Network Traffic Classification Using a Data Parallelization Approach
The ever-increasing volume of internet traffic produces very large datasets, which benefit ML/DL models through more accurate classification, greater diversity across different types of network traffic, and better handling of anomalous traffic for the prevention of potential cyber-attacks. We have deployed Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), hybrid Convolutional LSTM (ConvLSTM) and Convolutional GRU (ConvGRU), and XGBoost models using a data parallelization approach. Data parallelization allows faster training on large datasets, as our results show, and is also beneficial in cloud-edge environments, where it enables efficient distribution of computation and data across multiple nodes to improve the performance of edge devices. The experimental setup was implemented in the cloud, and parallel training was executed on Nvidia Tesla Graphics Processing Units (GPUs). Lastly, a comparison of performance metrics is presented between the non-parallel centralized (single-node) and data-parallel distributed (two-node and four-node) approaches.
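The abstract does not specify the training framework, so the data-parallel setup it describes can only be illustrated with a hedged sketch; the example below assumes PyTorch's DistributedDataParallel, and the dataset, model, and hyperparameters are hypothetical placeholders rather than the thesis configuration.

```python
# Minimal sketch of multi-node data-parallel training, assuming PyTorch DDP.
# The traffic dataset and classifier below are synthetic stand-ins.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # One process per GPU; rank and world size come from the launcher
    # (e.g. torchrun), which sets the environment variables read here.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder traffic-classification data: 1,000 flows, 64 features, 10 classes.
    features = torch.randn(1000, 64)
    labels = torch.randint(0, 10, (1000,))
    dataset = TensorDataset(features, labels)

    # DistributedSampler shards the dataset so each node/GPU trains on a
    # disjoint subset -- the core of the data-parallel approach.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    # Simple stand-in classifier; the thesis uses CNN/LSTM/ConvLSTM/ConvGRU models.
    model = torch.nn.Sequential(
        torch.nn.Linear(64, 128),
        torch.nn.ReLU(),
        torch.nn.Linear(128, 10),
    ).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(5):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()  # gradients are all-reduced across replicas
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with a distributed launcher such as torchrun across one, two, or four nodes, each replica processes its own data shard and gradients are synchronized after every backward pass, which is what yields the single-node versus multi-node comparison described above.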
Language
- English
Degree
- Master of Applied Science
Program
- Computer Networks
Granting Institution
- Toronto Metropolitan University
LAC Thesis Type
- Thesis