Stochastic-Based Hyperparameter Selection and Learnability Analysis in Supervised and Unsupervised Learning
The goal of a learning algorithm is to receive a training data set as input and produce a hypothesis that generalizes to all possible data points from a domain set. The hypothesis is chosen from hypothesis classes of potentially different complexity orders, which act as controlling parameters of the learning process, also known as hyperparameters. Linear regression modeling is an important category of learning algorithms. In practical applications, uncertainty in the target samples affects the generalization performance of the learned model. Failing to choose a proper model or hypothesis class can lead to serious issues such as underfitting or overfitting. These issues have been addressed by altering cost functions or by utilizing cross-validation methods. Such approaches can introduce new hyperparameters, with their own challenges and uncertainties, or increase the computational complexity of the learning algorithm. On the other hand, probably approximately correct (PAC) learning theory aims at defining learnability in a probabilistic setting. Despite its theoretical value, PAC does not address practical learning issues on many occasions. This thesis is inspired by the foundation of PAC and is motivated by existing regression learning issues. The proposed approach, denoted ε-Confidence Approximately Correct (ε-CoAC), utilizes the Kullback-Leibler divergence (relative entropy) and introduces a new related typical set in the set of hyperparameters to tackle the learnability issue. ε-CoAC learnability is able to validate the learning process as a function of both the data length and the complexity order of the hypothesis class. Moreover, it enables the learner to compare hypothesis classes of different complexity orders (hyperparameters) and to choose among them the optimum one, with the minimum ε, in the ε-CoAC framework.
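For context, the quantity at the core of the ε-CoAC criterion is the Kullback-Leibler divergence, D(p‖q) = Σ p log(p/q). The minimal sketch below computes it for discrete distributions; this is only the standard textbook definition, not the thesis' ε-CoAC construction, and the function name and smoothing constant are illustrative choices.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete
    probability distributions. Standard definition only -- the
    epsilon-CoAC typical-set construction built on it is not shown.
    eps is an illustrative smoothing constant to avoid log(0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize to valid distributions
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

By Gibbs' inequality the divergence is zero only when the two distributions coincide and grows as q departs from p, which is what makes it usable as a discrepancy measure between an assumed hypothesis class and the data-generating distribution.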
ε-CoAC learnability not only overcomes the issues of overfitting and underfitting, but also shows advantages over the well-known cross-validation method in terms of both computation time and accuracy. A valuable application of ε-CoAC learnability is presented for simultaneous model order and time-delay selection in LTI systems. Classical methods have approached this problem from two separate angles, with different cost functions for time-delay estimation and for order selection. The ε-CoAC approach solves the problem with a unified cost function. The proposed method not only outperforms existing approaches but is also shown to be more robust to variations of the signal-to-noise ratio (SNR). The approach is further extended to online impulse response estimation and introduces efficient stopping criteria that are extremely valuable in practical applications. As a second hyperparameter analysis in machine learning, the challenge of regularization hyperparameter selection for the Support Vector Machine (SVM) algorithm is addressed. The regularization parameter controls the model capacity and the trade-off between the training and generalization errors. Interestingly, it is shown that the introduced Separability and Scatteredness (S&S) ratio plays a key role in SVM hyperparameter selection, including kernel hyperparameters; the importance of the S&S ratio in this context is similar to that of the signal-to-noise ratio in the signal processing context. The proposed method outperforms existing cross-validation approaches, especially in terms of computational complexity. For hyperparameter selection in unsupervised learning, the fundamentals of ε-CoAC learnability are utilized by viewing the problem of clustering from a new angle. The application of the proposed stochastic-based hyperparameter selection algorithm can be generalized in the form of a validity index.
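To make the computational-cost comparison concrete, the sketch below shows the classical k-fold cross-validation baseline for a regularization hyperparameter, here for ridge regression rather than SVM so that each fit has a closed form. Every candidate value requires k separate model refits, which is the cost that a direct selection criterion avoids. This is a generic baseline under stated assumptions, not the proposed ε-CoAC or S&S method, and all names are illustrative.

```python
import numpy as np

def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Select a ridge-regression regularization weight by plain
    k-fold cross-validation (generic baseline, not epsilon-CoAC).
    Total work: k refits per candidate, i.e. k * len(lambdas) fits."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    best_lam, best_err = None, np.inf
    for lam in lambdas:
        errs = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            # closed-form ridge fit on the training folds
            A = X[trn].T @ X[trn] + lam * np.eye(X.shape[1])
            w = np.linalg.solve(A, X[trn].T @ y[trn])
            errs.append(np.mean((X[val] @ w - y[val]) ** 2))
        err = np.mean(errs)  # average held-out error for this candidate
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```

The k·|Λ| refit count in the comment is the quantity to keep in mind when the abstract claims an advantage in computational complexity over cross-validation.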
The new validity index is shown to be superior to state-of-the-art validity indices in terms of accuracy and robustness to cluster shape. Finally, the proposed validity index approach is extended to graph node clustering, where it shows advantages over existing methods in terms of conductance and graph-based normalized cuts.
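As context for that comparison, the sketch below implements the mean silhouette coefficient, a classical validity index of the kind such a proposed index would be evaluated against. This is the standard definition only, not the thesis' stochastic index, and it assumes Euclidean data with at least two clusters.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette coefficient (standard definition): for each
    point, a = mean distance to its own cluster, b = mean distance
    to the nearest other cluster, s = (b - a) / max(a, b).
    Classical baseline index, not the thesis' proposed index."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # pairwise Euclidean distance matrix
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # singleton cluster convention
            continue
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Scores approach 1 for compact, well-separated clusters; a new index claiming robustness to cluster shape would be benchmarked against exactly this kind of distance-based score on non-convex clusters, where silhouette is known to degrade.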
Language
- English
Degree
- Doctor of Philosophy
Program
- Electrical and Computer Engineering
Granting Institution
- Toronto Metropolitan University
LAC Thesis Type
- Dissertation