Decoding, MVPA, and predictive models for neuroimaging diagnosis or prognosis all rely on cross-validation to measure the predictive accuracy of the model and, optionally, to tune the decoder. Cross-validation tests predictive power on left-out data unseen during training of the predictive model. It is appealing because it is non-parametric and asymptotically unbiased. Common practice in neuroimaging relies on leave-one-out, yet statistical theory suggests that this is suboptimal: the small test set leads to large variance and is easily biased by sample correlations.
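As a concrete illustration, leave-one-sample-out can be written in a few lines of scikit-learn (a sketch on synthetic data, not the study's pipeline):

```python
# Minimal sketch of leave-one-out cross-validation on synthetic data.
# Each sample is held out once and predicted by a model trained on the
# rest; each fold's score is 0 or 1, which makes the estimate high-variance.
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=50, n_features=20, random_state=0)
scores = cross_val_score(LinearSVC(), X, y, cv=LeaveOneOut())
print(scores.mean())
```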
Decoders usually come with a hyper-parameter that controls the regularization, i.e., a bias/variance tradeoff. In machine learning, this tradeoff is typically adjusted to the signal-to-noise ratio of the data using cross-validation to maximize predictive accuracy. In this case, the accuracy of the decoder must be measured on an independent "validation set", using a "nested cross-validation".
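A minimal sketch of nested cross-validation, assuming a scikit-learn-style setup (the dataset and parameter grid here are illustrative, not those of the study):

```python
# Nested cross-validation: an inner loop tunes the regularization
# parameter, an outer loop measures accuracy on data unseen by the tuning.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# Inner loop: select the regularization strength C by cross-validation.
inner = GridSearchCV(LogisticRegression(max_iter=1000),
                     param_grid={"C": np.logspace(-3, 3, 7)}, cv=5)

# Outer loop: the "validation" folds, untouched by the tuning.
scores = cross_val_score(inner, X, y, cv=5)
print(scores.mean(), scores.std())
```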
Here we assess these practices empirically on neuroimaging data, to derive guidelines.
Given 8 open datasets from openfMRI, we assess cross-validation on 35 decoding tasks, 15 of which are within-subject. We leave a large validation set untouched and perform nested cross-validation on the rest of the data. In a first experiment, we compare the accuracy of the decoder as measured by cross-validation with that measured on the left-out validation set. In a second experiment, we use the nested cross-validation to tune the decoders, either by refitting with the best parameter or by averaging the best models. We use standard linear decoders: SVM and logistic regression, both sparse (l1 penalty) and non-sparse (l2 penalty).
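For reference, the four decoder variants can be instantiated as follows in scikit-learn (a hypothetical configuration; solver settings are our assumptions, not details given by the study):

```python
# The four linear decoders: SVM and logistic regression, each with a
# sparse (l1) or non-sparse (l2) penalty. Solver choices are assumptions.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

decoders = {
    "svm-l2": LinearSVC(penalty="l2"),
    "svm-l1": LinearSVC(penalty="l1", dual=False),  # l1 needs the primal form
    "log-l2": LogisticRegression(penalty="l2"),
    "log-l1": LogisticRegression(penalty="l1", solver="liblinear"),
}
```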
We assess a variety of cross-validation strategies: leaving out single samples, leaving out full sessions or subjects, and repeated random splits leaving out 20% of the sessions or subjects.
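In scikit-learn terms, these strategies map onto the following splitters (a sketch, not the exact code of the study; the groups would encode sessions or subjects):

```python
# The three cross-validation strategies compared, as scikit-learn
# splitters; `groups` passed at split time encodes session/subject labels.
from sklearn.model_selection import (GroupShuffleSplit, LeaveOneGroupOut,
                                     LeaveOneOut)

loo = LeaveOneOut()            # leave out single samples
logo = LeaveOneGroupOut()      # leave out full sessions or subjects
rand = GroupShuffleSplit(n_splits=50, test_size=0.2)  # repeated random
                               # splits, leaving out 20% of the groups
```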
The first finding confirms the theory: repeated random splits should be preferred to leave-one-sample-out, as they are less fragile and less computationally costly.
Second, we find large error bars on cross-validation estimates of predictive power, of 10% or more, particularly in within-subject analyses, likely because of marked sample inhomogeneities.
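The spread of scores across repeated random splits gives a direct handle on such error bars; a sketch on synthetic data (sizes and splitter settings are ours):

```python
# Quantifying the error bar on a cross-validated accuracy as the spread
# of scores across repeated random (grouped) splits. Synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GroupShuffleSplit, cross_val_score
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
groups = np.repeat(np.arange(10), 20)  # e.g. 10 sessions of 20 samples each

cv = GroupShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
scores = cross_val_score(LinearSVC(), X, y, groups=groups, cv=cv)
print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```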
Finally, we find that setting decoder parameters by nested cross-validation does not lead to much gain in prediction, in particular for non-sparse models. This is probably a consequence of our second finding.
These conclusions are crucial for decoding and information mapping, which rely on measuring prediction accuracy. This measure is more fragile than practitioners often assume.