Cross-validation
Simulation Techniques, Statistical Analysis Techniques
Cross-validation techniques can be used to test the predictive performance of models and to help prevent a model being overfitted. Cross-validation involves repeatedly fitting a model to subsets of the data (known as training sets) and then using the rest of the data (known as validation sets) to test the performance of that model. There are several ways of performing a cross-validation analysis, including the following:
Repeated random subsampling validation
This method involves the following steps:
1. Randomly assign each observation to one of two groups: training and validation.
2. Fit the model to the observations in the training set.
3. Use the observations in the validation set to test the model's performance. Store this information.
4. Repeat steps 1 to 3 many times.
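The steps above can be sketched in Python. The "model" here is a deliberately trivial one that predicts the mean of the training targets, and the function and parameter names (`repeated_subsampling_cv`, `train_frac`, `n_repeats`) are illustrative, not from any particular library:

```python
import random

def fit_mean(train):
    """Toy 'model': predict the mean of the training targets."""
    return sum(train) / len(train)

def mse(prediction, validation):
    """Mean squared error of one constant prediction on the validation set."""
    return sum((y - prediction) ** 2 for y in validation) / len(validation)

def repeated_subsampling_cv(data, n_repeats=100, train_frac=0.7, seed=0):
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):                 # step 4: repeat many times
        shuffled = data[:]
        rng.shuffle(shuffled)                  # step 1: random train/validation split
        cut = int(len(shuffled) * train_frac)
        train, validation = shuffled[:cut], shuffled[cut:]
        model = fit_mean(train)                # step 2: fit on the training set
        scores.append(mse(model, validation))  # step 3: score on the validation set
    return sum(scores) / len(scores)           # average performance over all repeats
```

Because each repeat draws a fresh random split, some observations may appear in the validation set many times and others never, which is the main difference from K-fold cross-validation below.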
K-fold cross-validation
This method involves the following steps:
1. Randomly partition the observations into K groups of equal size.
2. For one group of observations, fit the model using all observations except those in that group.
3. Use that group's observations to test the model's predictive performance. Store this information.
4. Repeat steps 2 and 3 for each of the other groups.
When the number of folds (K) equals the number of observations in the data set, the method is known as leave-one-out cross-validation.
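A minimal sketch of the K-fold procedure, again using a toy mean-predicting "model"; the function name `k_fold_cv` and the slicing scheme used to build the folds are illustrative assumptions:

```python
import random

def k_fold_cv(data, k=5, seed=0):
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Step 1: partition into K (near-)equal groups by striding through the shuffle.
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, fold in enumerate(folds):           # step 4: loop over every group
        # Step 2: fit using all observations except those in fold i.
        train = [y for j, f in enumerate(folds) if j != i for y in f]
        model = sum(train) / len(train)        # toy 'model': the training mean
        # Step 3: test on the held-out group and store the score.
        scores.append(sum((y - model) ** 2 for y in fold) / len(fold))
    return sum(scores) / len(scores)           # average performance across folds
```

Setting `k=len(data)` makes each fold a single observation, which gives the leave-one-out case described above.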
K × 2 cross-validation
This method involves the following steps:
1. Randomly partition the observations into two groups of equal size.
2. Use one group to fit the model and the other group to test the model's performance. Store this information.
3. Repeat step 2 with the two groups switched around.
4. Repeat steps 1 to 3 several times.
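The same toy mean-predictor can illustrate the K × 2 scheme; `k_times_2_cv` and its parameters are illustrative names, not a standard API:

```python
import random

def k_times_2_cv(data, n_repeats=5, seed=0):
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):                 # step 4: repeat the whole procedure
        shuffled = data[:]
        rng.shuffle(shuffled)                  # step 1: split into two halves
        half = len(shuffled) // 2
        a, b = shuffled[:half], shuffled[half:]
        # Steps 2-3: each half serves once as training set, once as validation set.
        for train, validation in ((a, b), (b, a)):
            model = sum(train) / len(train)    # toy 'model': the training mean
            scores.append(sum((y - model) ** 2 for y in validation) / len(validation))
    return sum(scores) / len(scores)           # average over all 2 * n_repeats scores
```

Each repeat therefore yields two performance scores, so K repeats produce 2K scores in total, which is where the name K × 2 comes from.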
See also: Monte Carlo Methods; Bootstrapping
