8. GLM and SVM

Objectives

  • Understand linear, Poisson, and logistic regression as generalised linear models (GLMs).

  • Apply GLMs to real-world problems and interpret the results.

  • Understand the basic concepts of support vector machine (SVM).

  • Apply SVM to real-world problems and interpret the results.

Expected time to complete: 4 hours

In this chapter, we will study generalised linear models (GLMs) and support vector machines (SVMs).

GLM is a flexible generalisation of linear regression that allows the linear model to be related to the response variable in a more complex way (via a link function) than a simple linear relationship. The logistic regression in Chapter 3 is one such generalised linear model. In this chapter, we will study another GLM, Poisson regression, which can predict count data.
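
For example, Poisson regression uses the logarithm as its link function: the log of the expected count, rather than the count itself, is modelled as a linear function of the features,

\[
\log \mathbb{E}[y \mid \mathbf{x}] = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p,
\qquad
\mathbb{E}[y \mid \mathbf{x}] = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p},
\]

which guarantees that the predicted counts are always non-negative.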

SVM is a supervised machine learning model that was developed for classification problems but can be used for regression problems as well. In binary classification, it embodies the idea of finding a hyperplane that best separates the classes by maximising the margin between the hyperplane and the nearest data points from the two classes. SVM has been shown to perform well in a variety of settings, and is often considered one of the best “out of the box” classifiers.
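
Concretely, for a linear SVM the decision boundary is the hyperplane \(\beta_0 + \boldsymbol{\beta}^\top \mathbf{x} = 0\), and (in the separable case) training finds the coefficients that maximise the margin \(M\) while keeping every training point on its correct side:

\[
\max_{\beta_0, \boldsymbol{\beta}, \, \|\boldsymbol{\beta}\| = 1} M
\quad \text{subject to} \quad
y_i \left( \beta_0 + \boldsymbol{\beta}^\top \mathbf{x}_i \right) \ge M, \quad i = 1, \ldots, N,
\]

where the labels are coded \(y_i \in \{-1, +1\}\). The soft-margin variant relaxes these constraints to tolerate some margin violations, controlled by the tuning parameter introduced below.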

Ingredients: Poisson regression

  • Input: features of data samples

  • Output: target values of data samples in the form of counts (non-negative integers)

  • Model: relate the log of the target values (counts) to the features by fitting a line (or plane/hyperplane) to the training data, and map the estimated value on the fitted line (or plane/hyperplane) for a test sample back to a count via the exponential function.

    • Hyperparameter(s): None.

    • Parameters: the intercept(s) and slope(s) of the fitted line (or plane/hyperplane), also known as the bias(es) and weight(s), respectively

  • Loss function: maximise the likelihood (probability) of the observed target values (counts) of the training data samples under the Poisson distribution, or equivalently, minimise the negative log-likelihood.

  • Learning algorithm: gradient descent on the negative log-likelihood of the training data samples (see the sketch after this list).
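
As a concrete illustration, here is a minimal sketch of Poisson regression using scikit-learn's PoissonRegressor on synthetic data; the data and variable names are purely illustrative, and alpha=0 switches off the library's default L2 penalty to match the no-hyperparameter description above.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Illustrative synthetic data: one feature, counts drawn from a Poisson model
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(200, 1))
lam = np.exp(0.5 + 1.2 * X[:, 0])   # true mean: log(lambda) = 0.5 + 1.2 * x
y = rng.poisson(lam)

# alpha=0 disables the library's L2 penalty; fitting then minimises the
# Poisson negative log-likelihood of the observed counts
model = PoissonRegressor(alpha=0.0)
model.fit(X, y)

print(model.intercept_, model.coef_)  # estimated bias and weight
print(model.predict([[1.0]]))         # predicted count: exp(b0 + b1 * 1.0)
```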

Transparency: Poisson regression

System logic

  • Condition to produce certain output: to produce an output count \(y\), take the log of \(y\), locate this \(\log(y)\) value on the fitted line (or plane/hyperplane), and then read off the corresponding input \(x\) (or \(\mathbf{x}\)) value (see the sketch below).
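
A minimal sketch of this inversion, assuming a one-feature model with hypothetical fitted parameters \(b_0 = 0.5\) and \(b_1 = 1.2\):

```python
import numpy as np

# Hypothetical fitted parameters (bias and weight) of a one-feature model
b0, b1 = 0.5, 1.2

# Which input x would produce a predicted count of y = 7?
# The model says log(y) = b0 + b1 * x, so invert: x = (log(y) - b0) / b1
y = 7
x = (np.log(y) - b0) / b1
print(x)  # the input value whose predicted count is 7
```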

Ingredients: SVM for binary classification

  • Input: features of data samples.

  • Output: binary class labels of data samples.

  • Model: fit a plane/hyperplane to the training data to separate the data points by their class labels with the largest margin.

    • Hyperparameter(s): a tuning/regularisation parameter \(\Omega\) for controlling the trade-off between the margin and the tolerance of misclassification (margin violation). If a kernel is used, other hyperparameters include the kernel type and the kernel parameters.

    • Parameter(s): the intercept (\(\beta_0\)) and one coefficient per training data sample (\(\alpha_1, \alpha_2, \ldots, \alpha_N\)); only the samples with non-zero coefficients, the support vectors, affect the fitted classifier.

  • Loss function: maximise the margin while minimising the misclassification (margin violations).

  • Learning algorithm: quadratic programming (see the sketch after this list).
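
A minimal sketch using scikit-learn's SVC on synthetic data; the data, parameter values, and variable names are illustrative. Note that scikit-learn's C parameter plays the role of the tuning parameter \(\Omega\) above, controlling the same margin-versus-violation trade-off.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative synthetic data: two features, two overlapping classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)),   # class 0 cluster
               rng.normal(+1.0, 1.0, size=(100, 2))])  # class 1 cluster
y = np.array([0] * 100 + [1] * 100)

# A linear kernel and the margin/violation trade-off parameter C
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)  # solved internally via quadratic programming

print(len(clf.support_))                     # number of support vectors
print(clf.predict([[0.5, 0.5]]))             # predicted class label
print(clf.decision_function([[0.5, 0.5]]))   # signed distance from the hyperplane
```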

Transparency: SVM for binary classification

System logic

  • Condition to produce certain output: to produce a class label \(y\), determine which side of the fitted hyperplane corresponds to that label; any data point \(x\) (or \(\mathbf{x}\)) on that side is a likely candidate to produce the label \(y\), and the further it lies from the hyperplane, the more likely/confident the prediction (see the decision function below).
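
In the notation above, the fitted classifier can be expressed as a decision function over the training samples (written here for a linear kernel, with the two classes coded as \(-1\) and \(+1\)): its sign gives the predicted label, and its magnitude grows with the distance from the hyperplane,

\[
f(\mathbf{x}) = \beta_0 + \sum_{i=1}^{N} \alpha_i \langle \mathbf{x}, \mathbf{x}_i \rangle,
\qquad
\hat{y} = \operatorname{sign}\big(f(\mathbf{x})\big),
\]

where \(\alpha_i \neq 0\) only for the support vectors.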