randtest2

 Performs a permutation or randomization test to compare the distributions of 
 two independent or paired data samples. 

 -- Function File: PVAL = randtest2 (X, Y)
 -- Function File: PVAL = randtest2 (X, Y, PAIRED)
 -- Function File: PVAL = randtest2 (X, Y, PAIRED, NREPS)
 -- Function File: PVAL = randtest2 (X, Y, PAIRED, NREPS, FUNC)
 -- Function File: PVAL = randtest2 (X, Y, PAIRED, NREPS, FUNC, SEED)
 -- Function File: PVAL = randtest2 ([X, GX], [Y, GY], ...)
 -- Function File: [PVAL, STAT] = randtest2 (...)
 -- Function File: [PVAL, STAT, FPR] = randtest2 (...)
 -- Function File: [PVAL, STAT, FPR, PERMSTAT] = randtest2 (...)

     'PVAL = randtest2 (X, Y)' performs a randomization (a.k.a. permutation)
     test to ascertain whether data samples X and Y come from populations with
     the same distribution. Distributions are compared using the Wasserstein
     metric [1,2], which is the area of the difference between the empirical
     cumulative distribution functions of X and Y. The data in X and Y should
     be column vectors that represent measurements of the same variable. The
     value returned is a 2-tailed p-value against the null hypothesis computed
     using the absolute values of the test statistics.
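     As a rough illustration of the statistic described above, the area
     between two empirical cumulative distribution functions can be computed
     directly for small samples. This is a minimal sketch, not the code used
     by 'randtest2'; the helper name ecdf_at and the toy data are assumptions
     made for the example.

```octave
% Illustrative sketch only: area between two empirical CDFs
% (the helper name ecdf_at is hypothetical, not part of the package)
ecdf_at = @(s, z) mean (bsxfun (@le, s(:), z(:)'), 1)';

x = [0; 1];
y = [1; 2];
z = sort ([x; y]);               % pooled, sorted evaluation points
Fx = ecdf_at (x, z);             % ECDF of x at each pooled point
Fy = ecdf_at (y, z);             % ECDF of y at each pooled point

% Both ECDFs are step functions that are constant between consecutive
% pooled points, so the area is a sum of rectangles
d = sum (abs (Fx(1:end-1) - Fy(1:end-1)) .* diff (z))   % d = 1
```

     Here y is x shifted by one unit, so the area between the two step
     functions is 1.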

     'PVAL = randtest2 (X, Y, PAIRED)' specifies whether X and Y should be
     treated as independent (unpaired) or paired samples. PAIRED accepts a
     logical scalar:
        o false (default): As above.
        o true: Performs a randomization or permutation test to ascertain
                whether paired or matched data samples X and Y come from
                populations with the same distribution. The vectors X and Y
                must contain the same number of sampling units.

     'PVAL = randtest2 (X, Y, PAIRED, NREPS)' specifies the number of resamples
     without replacement to take in the randomization test. By default, NREPS
     is 5000. If the number of possible permutations is smaller than NREPS, the
     test becomes exact. For example, if the number of sampling units across
     two independent samples is 6, then the number of possible permutations is
     factorial (6) = 720, so NREPS will be truncated at 720 and sampling will
     systematically evaluate all possible permutations. If the number of
     sampling units in each paired sample is 12, then the number of possible
     permutations is 2^12 = 4096, so NREPS will be truncated at 4096 and
     sampling will systematically evaluate all possible permutations. 
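     The truncation arithmetic in the two examples above can be checked with
     a few lines. This is only a sketch of the counting; the variable names
     are assumptions, and the enumeration at the end simply lists every
     swap/no-swap pattern for a small paired sample.

```octave
nreps = 5000;                         % default number of resamples

% Independent samples: 6 sampling units in total
n_units = 6;
nreps_unpaired = min (nreps, factorial (n_units))   % 720

% Paired samples: 12 sampling units in each sample
n_pairs = 12;
nreps_paired = min (nreps, 2 ^ n_pairs)             % 4096

% For a small paired design, every swap pattern can be listed explicitly:
% each row is one permutation, 1 = swap the pair, 0 = leave it as is
n = 3;
swap = dec2bin (0:(2 ^ n - 1)) - '0';               % 8-by-3 matrix of 0s and 1s
```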

     'PVAL = randtest2 (X, Y, PAIRED, NREPS, FUNC)' also specifies a custom
     function calculated on the original samples, and the permuted or
     randomized resamples. Note that FUNC must compute a difference statistic
     between samples X and Y, and should either be a:
        o function handle or anonymous function,
        o string of function name, or
        o a cell array where the first cell is one of the above function
          definitions and the remaining cells are (additional) input arguments 
          to that function (other than the data arguments).
        See the built-in demos for example usage with the mean or variance.
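     The cell-array form of FUNC can be pictured as follows. This sketch
     assumes that the extra cells are appended after the two data arguments
     when the function is called; that calling convention, the unpacking
     code and the example statistic (a difference in k-th raw moments) are
     assumptions for illustration rather than the package's implementation.

```octave
x = [1 2 3 4]';
y = [2 4 6 9]';

% Cell-array form: function definition first, extra (non-data) arguments after
func = {@(x, y, k) mean (x .^ k) - mean (y .^ k), 2};

% Hypothetical unpacking of the three accepted forms of FUNC
if iscell (func)
  args = func(2:end);            % additional input arguments
  func = func{1};                % handle or function name
else
  args = {};
end
if ischar (func)
  func = str2func (func);        % function name given as a string
end

% Difference in mean squares: 7.5 - 34.25 = -26.75
stat = func (x, y, args{:})
```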

     'PVAL = randtest2 (X, Y, PAIRED, NREPS, FUNC, SEED)' initialises the
     Mersenne Twister random number generator using an integer SEED value so
     that the results of 'randtest2' are reproducible when the test is
     approximate (i.e. when using randomization because not all permutations
     can be evaluated systematically).
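     The effect of seeding can be seen with the generator alone. The sketch
     below assumes the seed is applied with the rand ('twister', SEED) syntax,
     which may differ from what 'randtest2' does internally; it only shows
     that the same seed yields the same random permutation.

```octave
seed = 1;                             % any integer seed (illustrative value)

rand ('twister', seed);               % initialise the Mersenne Twister
[~, perm_a] = sort (rand (9, 1));     % a random permutation of 1:9

rand ('twister', seed);               % re-initialise with the same seed
[~, perm_b] = sort (rand (9, 1));     % the same permutation again

same = isequal (perm_a, perm_b)       % true
```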

     'PVAL = randtest2 ([X, GX], [Y, GY], ...)' also specifies the sampling
     units (i.e. clusters) using consecutive positive integers in GX and GY
     for X and Y respectively. Defining the sampling units has applications
     for clustered resampling, for example in the cases of nested experimental 
     designs. If PAIRED is false, numeric identifiers in GX and GY must be
     unique (e.g. 1,2,3 in GX, 4,5,6 in GY). If PAIRED is true, numeric
     identifiers in GX and GY must be identical (e.g. 1,2,3 in GX, 1,2,3 in
     GY). Note that when sampling units contain different numbers of values,
     function evaluations after sampling cannot be vectorized. If the parallel
     computing toolbox (Matlab) or package (Octave) is installed and loaded,
     then the function evaluations will be automatically accelerated by
     parallel processing on platforms with multiple processors.
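     One way to picture clustered resampling for independent samples is to
     reassign whole sampling units, rather than individual observations,
     between the two groups. The sketch below is an assumption about the
     general idea and not the package's code; it reuses the data from the
     clustered demonstration and fixes the seed only so that the example is
     repeatable.

```octave
X  = [21,26,33,22,18,25,26,24,21,25,35,28,32,36,38]';
GX = [1,1,1,1,2,2,2,2,2,2,3,3,3,3,3]';
Y  = [26,34,27,38,44,34,45,38,31,41,34,35,38,46]';
GY = [4,4,4,5,5,5,5,5,6,6,6,6,6,6]';

data = [X; Y];                        % pooled observations
g = [GX; GY];                         % pooled sampling-unit identifiers
ids = unique (g);                     % the 6 sampling units
nx = numel (unique (GX));             % 3 units belong to the first sample

rand ('twister', 1);                  % illustrative seed
[~, order] = sort (rand (numel (ids), 1));
ids_x = ids(order(1:nx));             % units reassigned to the first sample

in_x = ismember (g, ids_x);
Xr = data(in_x);                      % resampled first group (whole units)
Yr = data(~in_x);                     % resampled second group
```

     Because the units hold 4, 6, 5, 3, 5 and 6 values, the resampled groups
     can differ in size from one resample to the next, which is why the
     function evaluations cannot be vectorized in this situation.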

     '[PVAL, STAT] = randtest2 (...)' also returns the test statistic.

     '[PVAL, STAT, FPR] = randtest2 (...)' also returns the minimum false
     positive risk (FPR) calculated for the p-value, computed using the
     Sellke-Berger approach.
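     For orientation, the Sellke-Berger bound can be written out in a few
     lines. This sketch assumes equal prior probabilities for the null and
     alternative hypotheses and uses the bound -e * p * log (p) on the Bayes
     factor for p < 1/e; the exact form used inside 'randtest2' is not shown
     in this help text, so treat the code as illustrative.

```octave
p = [0.001 0.01 0.05 0.5];            % example p-values

% Upper bound on the Bayes factor in favour of the alternative hypothesis
bf = ones (size (p));                 % no evidence either way for p >= 1/e
k = p < exp (-1);
bf(k) = 1 ./ (-exp (1) .* p(k) .* log (p(k)));

% Minimum false positive risk with prior odds of 1:1
fpr = 1 ./ (1 + bf);
```

     Under these assumptions a p-value of 0.05 corresponds to a minimum false
     positive risk of about 0.29, and any p-value of 1/e or more gives 0.5.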

     '[PVAL, STAT, FPR, PERMSTAT] = randtest2 (...)' also returns the
     statistics of the permutation distribution.
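     Given STAT and PERMSTAT, a 2-tailed p-value based on absolute values can
     be recomputed by hand. The numbers below are made up for illustration,
     and the exact counting convention used by 'randtest2' (for example, how
     ties are handled) is an assumption.

```octave
stat = 2.1;                                        % hypothetical test statistic
permstat = [-3.0 -1.2 0.4 0.9 2.5 -2.1 1.7 0.2];   % hypothetical permutation statistics

% Proportion of permutation statistics at least as extreme as the observed one
pval = mean (abs (permstat) >= abs (stat))         % 3 of 8, i.e. 0.375
```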

  Bibliography:
  [1] Dowd (2020) A New ECDF Two-Sample Test Statistic. arXiv.
       https://doi.org/10.48550/arXiv.2007.01360
  [2] https://en.wikipedia.org/wiki/Wasserstein_metric

  randtest2 (version 2023.09.16)
  Author: Andrew Charles Penn
  https://www.researchgate.net/profile/Andrew_Penn/

  Copyright 2019 Andrew Charles Penn
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program.  If not, see http://www.gnu.org/licenses/

Demonstration 1

The following code


 % Mouse data from Table 2 (page 11) of Efron and Tibshirani (1993)
 treatment = [94 197 16 38 99 141 23]';
 control = [52 104 146 10 51 30 40 27 46]';

 % Randomization test comparing the distributions of observations from two
 % independent samples (assuming i.i.d and exchangeability) using the
 % Wasserstein metric
 pval = randtest2 (control, treatment, false, 5000)

 % Randomization test comparing the difference in means between two
 % independent samples (assuming i.i.d and exchangeability) 
 pval = randtest2 (control, treatment, false, 5000, ...
                           @(x, y) mean (x) - mean (y))

 % Randomization test comparing the ratio of variances between two
 % independent samples (assuming i.i.d and exchangeability). (Note that
 % the log transformation is necessary to make the p-value two-tailed)
 pval = randtest2 (control, treatment, false, 5000, ...
                           @(x, y) log (var (y) ./ var (x)))

Produces the following output

pval = 0.3562
pval = 0.277
pval = 0.30584

Demonstration 2

The following code


 % Example data from: 
 % https://www.biostat.wisc.edu/~kbroman/teaching/labstat/third/notes18.pdf
 A = [117.3 100.1 94.5 135.5 92.9 118.9 144.8 103.9 103.8 153.6 163.1]';
 B = [145.9 94.8 108 122.6 130.2 143.9 149.9 138.5 91.7 162.6 202.5]';

 % Randomization test comparing the distributions of observations from two
 % paired or matching samples (assuming i.i.d and exchangeability) using the
 % Wasserstein metric
 pval = randtest2 (A, B, true, 5000)

 % Randomization test comparing the difference in means between two
 % paired or matching samples (assuming i.i.d and exchangeability) 
 pval = randtest2 (A, B, true, 5000, @(A, B) mean (A) - mean (B))

 % Randomization test comparing the ratio of variances between two
 % paired or matching samples (assuming i.i.d and exchangeability). (Note
 % that the log transformation is necessary to make the p-value two-tailed)
 pval = randtest2 (A, B,  true, 5000, @(A, B) log (var (A) ./ var (B)))
                           

Produces the following output

pval = 0.12891
pval = 0.037109
pval = 0.51172

Demonstration 3

The following code


 X = [21,26,33,22,18,25,26,24,21,25,35,28,32,36,38]';
 GX = [1,1,1,1,2,2,2,2,2,2,3,3,3,3,3]';
 Y = [26,34,27,38,44,34,45,38,31,41,34,35,38,46]';
 GY = [4,4,4,5,5,5,5,5,6,6,6,6,6,6]';

 % Randomization test comparing the distributions of observations from two
 % independent samples (assuming i.i.d) using the Wasserstein metric
 pval = randtest2 (X, Y, false, 5000)

 % Randomization test comparing the distributions of clustered observations
 % from two independent samples using the Wasserstein metric
 pval = randtest2 ([X GX], [Y GY], false, 5000)

Produces the following output

pval = 0.0008
pval =   0.2

Demonstration 4

The following code


 X = [21,26,33,22,18,25,26,24,21,25,35,28,32,36,38]';
 GX = [1,1,1,1,2,2,2,2,2,2,3,3,3,3,3]';
 Y = [26,34,27,38,44,34,45,38,31,41,34,35,38,46,36]';
 GY = [1,1,1,1,2,2,2,2,2,2,3,3,3,3,3]';

 % Randomization test comparing the distributions of observations from two
 % paired or matched samples (assuming i.i.d) using the Wasserstein metric
 pval = randtest2 (X, Y, true, 5000)

 % Randomization test comparing the distributions of clustered observations
 % from two paired or matched samples using the Wasserstein metric
 pval = randtest2 ([X GX], [Y GY], true, 5000)

Produces the following output

pval = 0.0012
pval =  0.25

Package: statistics-resampling