The Permutation Test
====================

The "permutation test" tests whether two subpopulations, A and B, differ by some metric, e.g. the mean or median of a numerical observation. The test originates (no surprise) in Fisher's work in the 1930s.

As a toy example that arises in practice, let's say we have a table where each row is an individual, and one column takes one of two possible values, called A or B. A priori, this column might be utterly meaningless in the context at hand. Moreover, for practical reasons, we want to limit the number of features which influence our decisions. In other words, we want to "make it hard" to establish that there is a statistically significant difference between these two subpopulations. Therefore, **the null hypothesis of the permutation test asserts that the difference between these two subpopulations is statistically indistinguishable from randomly splitting the population into two groups**. In other words, the designation of a sample as belonging to subpopulation A or B is completely arbitrary.

If we're going to test this hypothesis, we need to enter the realm where the null hypothesis is true. In this world, A and B are "from the same population", so any difference between them comes from the randomness of splitting this single population into two groups. Therefore, to test the hypothesis, we need to simulate splitting the pooled population into two groups and examine the differences between those two groups. The choice of a metric (i.e. test statistic) on these two groups can greatly simplify our task. As with most hypothesis tests, the null hypothesis (the difference is due to noise) generates a distribution: here, the distribution of the difference of the metric between the two simulated groups.

.. note:: We emphasize that the distribution of the test statistic under the null hypothesis is computed via simulation, unlike parametric tests, which rely on tables of a known distribution.

Let's say our alternative hypothesis is that B is better than A. Giving evidence for this difference requires showing that the observed difference between B and A is extreme relative to the null distribution, i.e. that only a small fraction of the differences generated under the null exceed it. The p-value associated with the difference between A and B is straightforward to compute: it is the number of simulated differences which are greater than the observed difference, divided by the total number of simulated differences.

.. note:: As p-values are computed by counting samples, we did not need to assume (or argue) that the distribution of the test statistic has a particular, analytically nice form. The statistic could be something like the median or mode, which resists an analytically computable description. This is the real power of the permutation test (and, more generally, of nonparametric tests): it is very flexible with respect to the test statistic!

.. warning:: Like all resampling methods, certain assumptions must be met! Namely, there must be enough samples that the empirical distribution (of the test statistic) adequately approximates the "true" distribution.

The "permutation" part of the name is a bit unfortunate, as it hides the underlying intent and logic of the test. It refers to one way of generating splittings of the pooled sample: permute an ordered list of the pooled samples, then cut the list so that the two pieces have the original group sizes.
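
To make the procedure concrete, here is a minimal sketch in Python. The function name, the reliance on NumPy, and the default choice of the difference in means as the test statistic are illustrative assumptions rather than part of the test itself:

.. code-block:: python

    import numpy as np

    def permutation_test(a, b, statistic=np.mean, n_simulations=10_000, seed=0):
        """One-sided permutation test of the alternative "B is better than A".

        Returns the observed difference statistic(b) - statistic(a) and its
        p-value under the null hypothesis that the A/B labels are arbitrary.
        """
        rng = np.random.default_rng(seed)
        a, b = np.asarray(a), np.asarray(b)
        observed = statistic(b) - statistic(a)

        # Under the null, A and B come from the same population, so pool them.
        pooled = np.concatenate([a, b])
        count = 0
        for _ in range(n_simulations):
            # One simulated split: permute the pooled samples, then cut the
            # list so the two pieces have the original group sizes.
            shuffled = rng.permutation(pooled)
            a_sim, b_sim = shuffled[:len(a)], shuffled[len(a):]
            if statistic(b_sim) - statistic(a_sim) >= observed:
                count += 1

        # p-value: the fraction of simulated differences at least as large
        # as the observed one.
        return observed, count / n_simulations

For example, with two synthetic samples whose means differ::

    rng = np.random.default_rng(1)
    a = rng.normal(0.0, 1.0, size=200)
    b = rng.normal(0.3, 1.0, size=200)
    diff, p = permutation_test(a, b)                     # difference in means
    diff_med, p_med = permutation_test(a, b, np.median)  # difference in medians

Counting with ``>=`` rather than a strict ``>`` is a common convention; with many simulations the two choices differ negligibly. Note also that swapping in ``np.median`` requires no changes at all, which is exactly the flexibility the second note above describes. (Recent versions of SciPy also ship a ready-made ``scipy.stats.permutation_test``, if you would rather not roll your own.)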