5. Cross-Validation and Bootstrap

Objectives

  • Understand the concepts and processes of cross-validation and bootstrap.

  • Evaluate the performance of machine learning models using cross-validation.

  • Quantify the uncertainty associated with a machine learning model using bootstrap.

Expected time to complete: 2.5 hours

As shown in previous chapters, the performance of a machine learning model is evaluated using a performance metric. The discussion of multiple hypothesis testing implies that if we compute the performance metric using only a single split of the data into training and test sets, we cannot place high confidence in the result or in the conclusions drawn from it, particularly when the metric is sensitive to the particular split of the data. We should therefore take the variability in the data into account.

In this chapter, we will learn about two resampling methods, cross-validation and bootstrap, to evaluate the performance of a machine learning model more rigorously and quantify the uncertainty associated with it. As the name “resampling” suggests, both methods involve resampling the data. They repeatedly draw samples from a training set and refit a model of interest on each set of drawn samples to obtain additional information about the fitted model. For example, to estimate the variability of a linear regression fit, we can repeatedly draw different samples from the training data, fit a linear regression to each new sample set, and then examine the extent to which the resulting fits differ. This approach allows us to obtain information that would not be available from fitting the model only once using the original training sample.
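To make the linear regression example concrete, the following is a minimal sketch in Python. The synthetic data, the true coefficients, the noise level, and the number of resamples (200) are all illustrative assumptions rather than choices made in this chapter:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Illustrative synthetic training data: y = 2x + 1 + noise.
n = 100
X = rng.uniform(0, 10, size=(n, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=2.0, size=n)

# Repeatedly draw samples (with replacement) from the training data,
# refit the model, and record the fitted slope each time.
slopes = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)  # indices of the resampled observations
    fit = LinearRegression().fit(X[idx], y[idx])
    slopes.append(fit.coef_[0])

# The spread of the refitted slopes estimates the variability of the fit,
# information a single fit on the original sample cannot provide.
print(f"slope: mean={np.mean(slopes):.3f}, std={np.std(slopes, ddof=1):.3f}")
```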

Process transparency: \(K\)-fold cross-validation

  • Starting point: a standardised dataset for a machine learning task with a performance metric defined and a machine learning model chosen to address a practical need/problem

  • Determine the number of folds \(K\) to use in cross-validation

  • Randomly split the dataset into \(K\) folds of roughly equal size

  • For each fold \(k\):

    • Train the model on the remaining \(K-1\) folds

    • Evaluate the model on fold \(k\)

  • Compute the average performance metric across the \(K\) folds

  • End point: Report the average performance metric and its standard deviation
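
The \(K\)-fold process above can be sketched in a few lines of Python. The dataset (scikit-learn's diabetes data), the model (linear regression), the performance metric (mean squared error), and \(K = 5\) are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = load_diabetes(return_X_y=True)

K = 5  # number of folds
kf = KFold(n_splits=K, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    # Train the model on the remaining K-1 folds ...
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    # ... and evaluate it on the held-out fold.
    y_pred = model.predict(X[test_idx])
    scores.append(mean_squared_error(y[test_idx], y_pred))

# Report the average performance metric and its standard deviation.
print(f"MSE: mean={np.mean(scores):.1f}, std={np.std(scores, ddof=1):.1f}")
```

Shuffling before splitting (with a fixed random seed for reproducibility) guards against any ordering in the dataset leaking into the folds.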

Process transparency: bootstrap

  • Starting point: a standardised dataset for a machine learning task with a performance metric defined and a machine learning model chosen to address a practical need/problem

  • Determine the number of bootstrap samples \(B\) to use

  • For each bootstrap sample \(b\):

    • Draw a bootstrap sample by sampling \(n\) observations from the dataset with replacement, where \(n\) is the size of the original dataset

    • Train the model on the bootstrap sample

    • Evaluate the model on the bootstrap sample

  • Compute the average performance metric across the \(B\) bootstrap samples

  • End point: Report the average performance metric and its standard deviation
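
A sketch of the bootstrap process, under the same illustrative assumptions as before (diabetes data, linear regression, mean squared error) with \(B = 200\) bootstrap samples:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True)
n = len(y)

B = 200  # number of bootstrap samples
rng = np.random.default_rng(0)

scores = []
for _ in range(B):
    # Draw a bootstrap sample: n observations sampled with replacement.
    idx = rng.integers(0, n, size=n)
    # Train and evaluate the model on the bootstrap sample, as in the
    # process above.
    model = LinearRegression().fit(X[idx], y[idx])
    y_pred = model.predict(X[idx])
    scores.append(mean_squared_error(y[idx], y_pred))

# Report the average performance metric and its standard deviation.
print(f"MSE: mean={np.mean(scores):.1f}, std={np.std(scores, ddof=1):.1f}")
```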