Optimum Regularization Parameter C in Support Vector Machine (SVM) Binary Classification
Support Vector Machines (SVMs) are widely used learning algorithms for data classification. In machine learning algorithms such as the SVM, one or more parameters control the smoothness of the solution and must be tuned to obtain the optimum solution. Such parameters are called regularization parameters, and they are critical for building robust and accurate models that neither overfit nor underfit. In SVM, the regularization parameter, denoted by C, controls the penalty on the training loss of misclassified data. Traditionally, C is first set to one, and if misclassifications are observed after training, C is tuned by K-fold Cross-Validation (CV), which is a time-consuming process. This thesis rigorously analyzes the behavior of the C value in SVM. The analysis shows that, for a linearly separable dataset, setting C to one does not always provide the optimum solution. It further shows that there exists a Minimum Acceptance Value (MAV) for C as a function of Separability and Scatteredness (S&S). S&S is a new notion defined in this thesis, inspired by the definition of the Signal-to-Noise Ratio (SNR), and is shown to be a critical parameter in the analysis of SVM classifiers. The study is then extended to linearly non-separable datasets, where it is shown that a lookup table based on the analysis of the bias-variance tradeoff (BVB C-Table) provides the optimum value of C. This approach not only outperforms the existing K-fold CV but is also much faster. For example, in a simple binary classification scenario, a typical K-fold CV can take more than two hours, whereas the proposed method requires only a couple of minutes in a Python-based environment. Owing to its efficiency, the proposed method of choosing the regularization parameter enables online binary classification and has potential benefits for One-vs-All and One-vs-One SVM classification.
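To make the baseline concrete, the following is a minimal sketch of tuning C by K-fold CV, the procedure the abstract describes as time-consuming. It is not the thesis's method: it does not implement the MAV or the BVB C-Table, whose details are not given here. The function names, the toy Gaussian data, the candidate grid for C, and the plain subgradient-descent trainer are all assumptions made for illustration. The sketch shows why the search is costly: every candidate C requires k separate trainings.

```python
import random

def train_linear_svm(X, y, C, epochs=200, lr0=0.1):
    """Hypothetical trainer: batch subgradient descent on the soft-margin
    objective 0.5*||w||^2 + C*sum(hinge), divided by C*n (same minimiser)."""
    n, d = len(X), len(X[0])
    lam = 1.0 / (C * n)  # weight of the ||w||^2 term after rescaling
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # margin violated, so the hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        step = lr0 / t ** 0.5  # decaying step size
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b

def accuracy(w, b, X, y):
    """Fraction of points whose predicted sign matches the label."""
    correct = 0
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
        correct += pred == yi
    return correct / len(X)

def kfold_cv_score(X, y, C, k=5, seed=0):
    """Mean held-out accuracy over k folds for one candidate C."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w, b = train_linear_svm([X[i] for i in train], [y[i] for i in train], C)
        scores.append(accuracy(w, b, [X[i] for i in held_out],
                               [y[i] for i in held_out]))
    return sum(scores) / k

def make_toy_data(n_per_class=50, seed=0):
    """Two 2-D Gaussian blobs with labels +1 and -1 (illustrative data only)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y

X, y = make_toy_data()
C_grid = [0.01, 0.1, 1.0, 10.0, 100.0]  # assumed candidate values
cv_scores = {C: kfold_cv_score(X, y, C) for C in C_grid}
best_C = max(cv_scores, key=cv_scores.get)
print(cv_scores, best_C)
```

With 5 candidate values and k = 5, this already trains 25 models. The cost grows with the size of the grid, the number of folds, and the dataset, which is the overhead a lookup-table approach would avoid.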
Language
- English
Degree
- Master of Applied Science
Program
- Electrical and Computer Engineering
Granting Institution
- Ryerson University
LAC Thesis Type
- Thesis