Assessing Learners

We assemble various notions of error that are useful in machine learning.

Throughout these notes, we deal with a learner $\hat f$ fit on a training set $\mathcal{T} = \{(x_i, y_i)\}_{i=1}^{N}$ and assessed by a loss function $L$.

To-Do

  • Notions of Error
    • Conditional Generalization/Test Error

    • Expected Generalization/Test Error

    • Empirical Error

    • In-sample error

  • Optimism
    • Statements about covariance and optimism

  • Is learning possible?
    • VC dimension for relating training and in-sample error (ESL, p. 239)

Binary Classifiers

The accuracy is the probability that the classifier's answer is correct when a sample is drawn uniformly at random from the observed data.

The notions below require a designation of positive and negative classes, that is, a Boolean structure on the labels.

The sensitivity of the classifier is the accuracy after discarding all observations of individuals who were not positive “in reality”.

The specificity of the classifier is the accuracy after discarding all but those individuals who are “in reality” negative.
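As a concrete illustration, here is a minimal sketch of the three metrics; y_true and y_pred are hypothetical 0/1 label arrays, with 1 marking the positive class.

    import numpy as np

    def accuracy(y_true, y_pred):
        # Fraction of observations whose prediction matches the observed label.
        return np.mean(y_true == y_pred)

    def sensitivity(y_true, y_pred):
        # Accuracy restricted to the observations that are positive "in reality".
        pos = y_true == 1
        return np.mean(y_pred[pos] == 1)

    def specificity(y_true, y_pred):
        # Accuracy restricted to the observations that are negative "in reality".
        neg = y_true == 0
        return np.mean(y_pred[neg] == 0)

    y_true = np.array([1, 1, 0, 0, 1, 0])
    y_pred = np.array([1, 0, 0, 1, 1, 0])
    print(accuracy(y_true, y_pred))     # 4/6
    print(sensitivity(y_true, y_pred))  # 2/3
    print(specificity(y_true, y_pred))  # 2/3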

Everything here is relative. In many cases of interest, “in reality” cannot be given any falsifiable meaning.

In practice, “in reality” means “according to some other learner”. In the cases above, that learner is the nonparametric, unregularized MLE, i.e., the “empirical distribution”.

This is something to keep in mind about these statistical assessments: they are always implicitly relative. Fortunately, hypothesis testing is tailor-made for the relative situation.

Notions of Error

Definition

Conditional Generalization Error

Given a learner $\hat f$ fit on a training set $\mathcal{T}$, the conditional generalization (or test) error is the expected loss on a fresh draw $(X, Y)$ from the population, with the training set held fixed:

$$\mathrm{Err}_{\mathcal{T}} = \mathbb{E}_{X,Y}\!\left[\, L\big(Y, \hat f(X)\big) \;\middle|\; \mathcal{T} \,\right].$$

Definition

Expected Generalization Error

The expected generalization (or test) error averages the conditional error over training sets as well:

$$\mathrm{Err} = \mathbb{E}_{\mathcal{T}}\big[\mathrm{Err}_{\mathcal{T}}\big].$$

Definition

Empirical Error

The empirical (training) error is the average loss over the training set itself:

$$\overline{\mathrm{err}} = \frac{1}{N} \sum_{i=1}^{N} L\big(y_i, \hat f(x_i)\big).$$

Definition

In-sample Error

The in-sample error keeps the training inputs $x_1, \dots, x_N$ fixed and averages over fresh responses $Y_i^0$ drawn at those same inputs:

$$\mathrm{Err}_{\mathrm{in}} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_{Y^0}\!\left[\, L\big(Y_i^0, \hat f(x_i)\big) \;\middle|\; \mathcal{T} \,\right].$$
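To make the distinctions concrete, here is a minimal simulation sketch under an assumed linear-Gaussian data model with squared-error loss; the model, the sample sizes, and every name below are illustrative assumptions, not anything the notes fix.

    import numpy as np

    rng = np.random.default_rng(0)
    N, d, sigma = 50, 5, 1.0
    beta = rng.normal(size=d)

    def draw(n):
        # Draw n observations from the assumed model y = x . beta + noise.
        X = rng.normal(size=(n, d))
        return X, X @ beta + sigma * rng.normal(size=n)

    # Fit ordinary least squares on one training set T.
    X, y = draw(N)
    b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

    # Empirical (training) error: average loss over T itself.
    emp_err = np.mean((y - X @ b_hat) ** 2)

    # Conditional generalization error Err_T: fresh (X, Y), T held fixed.
    X0, y0 = draw(100_000)
    cond_err = np.mean((y0 - X0 @ b_hat) ** 2)

    # In-sample error: same inputs x_i, fresh responses Y_i^0. Closed form
    # here because the noise is additive Gaussian with known sigma.
    in_err = sigma**2 + np.mean((X @ (beta - b_hat)) ** 2)

    # Expected generalization error Err: average Err_T over training sets.
    errs = []
    for _ in range(200):
        Xt, yt = draw(N)
        bt = np.linalg.lstsq(Xt, yt, rcond=None)[0]
        errs.append(np.mean((y0 - X0 @ bt) ** 2))
    exp_err = np.mean(errs)

    print(emp_err, in_err, cond_err, exp_err)

Typically emp_err comes out below the other three; that gap is exactly what the optimism below measures.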

Optimism

Definition

Optimism

The optimism is the amount by which the empirical error understates the in-sample error:

$$\mathrm{op} = \mathrm{Err}_{\mathrm{in}} - \overline{\mathrm{err}}.$$

For squared-error loss (and several other losses), its expectation over the training responses satisfies the covariance statement

$$\omega := \mathbb{E}_{\mathbf{y}}[\mathrm{op}] = \frac{2}{N} \sum_{i=1}^{N} \mathrm{Cov}\big(\hat y_i, y_i\big).$$

That is, the more strongly the fitted values track their own training responses, the more optimistic the empirical error.
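As a sanity check of the covariance statement, here is a small simulation sketch for ordinary least squares with the inputs held fixed, where the sum of covariances equals d * sigma^2 and so omega = 2 * d * sigma^2 / N; again, the setup and names are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    N, d, sigma = 50, 5, 1.0
    X = rng.normal(size=(N, d))           # inputs fixed across replications
    beta = rng.normal(size=d)
    H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix; trace(H) = d

    ops = []
    for _ in range(5000):
        y = X @ beta + sigma * rng.normal(size=N)
        y_hat = H @ y
        emp = np.mean((y - y_hat) ** 2)
        # In-sample error under fresh responses at the same inputs.
        err_in = sigma**2 + np.mean((X @ beta - y_hat) ** 2)
        ops.append(err_in - emp)

    # The average optimism should match 2 * d * sigma^2 / N = 0.2 here.
    print(np.mean(ops), 2 * d * sigma**2 / N)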

The Possibility of Learning