Test for no adverse shift with outlier scores. Like goodness-of-fit testing, this two-sample comparison takes the training set, x_train, as the reference. The method checks whether the test set, x_test, is worse off relative to this reference. The function scorer assigns an outlier score to each instance/observation in both the training and test set.

pt_refit(x_train, x_test, scorer, n_pt = 2000)

Arguments

x_train

Training (reference/validation) sample.

x_test

Test sample.

scorer

Function which returns a named list with outlier scores from the training and test sample. The first argument to scorer must be x_train; the second, x_test. The returned named list contains two elements: train and test, each of which is a vector of corresponding (outlier) scores. See notes below for more information.
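For illustration, a minimal conforming scorer might use Mahalanobis distance from the training sample's center as the outlier score. This is a hedged sketch, not the package's recommended scorer; any model that produces per-instance outlier scores and returns the required named list works.

```r
# Sketch of a conforming scorer (illustrative only): Mahalanobis
# distance to the training mean serves as the outlier score.
mahalanobis_scorer <- function(x_train, x_test) {
  mu <- colMeans(x_train)
  sigma <- stats::cov(x_train)
  list(
    train = stats::mahalanobis(x_train, center = mu, cov = sigma),
    test = stats::mahalanobis(x_test, center = mu, cov = sigma)
  )
}
```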

n_pt

The number of permutations.

Value

A named list of class outlier.test containing:

  • statistic: observed WAUC statistic

  • seq_mct: sequential Monte Carlo test, when applicable

  • p_value: p-value

  • outlier_scores: outlier scores from training and test set

Details

The null distribution of the test statistic is based on n_pt permutations. For speed, this is implemented as a sequential Monte Carlo test via the simctest package; see Gandy (2009) for details. The prefix pt refers to permutation test. Because it does not rely on the asymptotic null distribution of the test statistic, this is the recommended approach for small samples. The test statistic is the weighted AUC (WAUC).
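The permutation logic can be sketched as below. This is an assumption-laden illustration, not the package's implementation: it uses a fixed number of permutations rather than the sequential stopping rule of Gandy (2009), and the plain AUC stands in for the WAUC.

```r
# Illustrative fixed-n permutation test. The real implementation uses a
# sequential Monte Carlo stopping rule (Gandy, 2009) and the WAUC; the
# plain AUC below is a stand-in for the test statistic.
pt_sketch <- function(x_train, x_test, scorer, n_pt = 2000) {
  auc_stat <- function(train_scores, test_scores) {
    # Rank-sum (Mann-Whitney) estimate of Pr(test score > train score)
    r <- rank(c(train_scores, test_scores))
    n_tr <- length(train_scores)
    n_te <- length(test_scores)
    (sum(r[-seq_len(n_tr)]) - n_te * (n_te + 1) / 2) / (n_tr * n_te)
  }
  scores <- scorer(x_train, x_test)
  observed <- auc_stat(scores$train, scores$test)
  pooled <- rbind(x_train, x_test)
  n_tr <- nrow(x_train)
  null_stats <- replicate(n_pt, {
    # Refit the scorer on each permuted split (the "refit" in pt_refit)
    idx <- sample(nrow(pooled))
    perm <- scorer(pooled[idx[seq_len(n_tr)], , drop = FALSE],
                   pooled[idx[-seq_len(n_tr)], , drop = FALSE])
    auc_stat(perm$train, perm$test)
  })
  # One-sided p-value with the +1 correction for permutation tests
  p_value <- (1 + sum(null_stats >= observed)) / (1 + n_pt)
  list(statistic = observed, p_value = p_value)
}
```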

Notes

The scoring function, scorer, predicts out-of-sample scores by refitting the underlying algorithm from scorer at every permutation. The suffix refit emphasizes this point. This is in contrast to the out-of-bag variant, pt_oob, which only fits once. This method can be computationally expensive.

References

Kamulete, V. M. (2022). Test for non-negligible adverse shifts. In The 38th Conference on Uncertainty in Artificial Intelligence. PMLR.

Gandy, A. (2009). Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk. Journal of the American Statistical Association, 104(488), 1504-1511.

See also

[pt_oob()] for (faster) p-value approximation via out-of-bag predictions. [at_oob()] for p-value approximation from asymptotic null distribution.

Other permutation-test: pt_from_os(), pt_oob()

Examples

# \donttest{
library(dsos)
set.seed(12345)
data(iris)
setosa <- iris[1:50, 1:4] # Training sample: Species == 'setosa'
versicolor <- iris[51:100, 1:4] # Test sample: Species == 'versicolor'
scorer <- function(tr, te) list(train = runif(nrow(tr)), test = runif(nrow(te)))
pt_test <- pt_refit(setosa, versicolor, scorer = scorer)
pt_test
#> 	Frequentist test for no adverse shift 
#> 
#> p-value = 0.44444, test statistic (weighted AUC/WAUC) = 0.101
#> 
#> Alternative hypothesis: Pr(WAUC >= 0.101)
#> => the test set is worse off than training.
#> Sample sizes: 50 in training and 50 in test set.
# }