Test for no adverse shift with outlier scores. Like goodness-of-fit testing, this two-sample comparison takes the training set, x_train, as the reference. The method checks whether the test set, x_test, is worse off relative to this reference set. The function scorer assigns an outlier score to each instance/observation in both the training and test sets.

at_from_os(os_train, os_test)

Arguments

os_train

Outlier scores in training (reference) set.

os_test

Outlier scores in test set.

Value

A named list of class outlier.test containing:

  • statistic: observed WAUC statistic

  • seq_mct: sequential Monte Carlo test, when applicable

  • p_value: p-value

  • outlier_scores: outlier scores from training and test set

Details

Li and Fine (2010) derive the asymptotic null distribution for the weighted AUC (WAUC), the test statistic. This approach does not use permutations and can, as a result, be much faster because it sidesteps the need to refit the scoring function scorer. It works well for large samples. The prefix at stands for asymptotic test, to tell it apart from the prefix pt, which stands for permutation test.

Notes

The outlier scores should all mimic out-of-sample behaviour. Make sure that the training scores are not in-sample, and thus biased (overfitted), while the test scores are out-of-sample. This mismatch (in-sample versus out-of-sample scores) voids the validity of the test. A simple fix is to get the training scores from an independent (fresh) validation set. This follows the train/validation/test sample splitting convention; the validation set is effectively the reference set or distribution in this case.
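A minimal sketch of the validation-set fix described above, under stated assumptions: the data are simulated, and the scorer (squared Mahalanobis distance to the training centre, via base R's mahalanobis()) is only an illustrative stand-in, not the scoring function dsos uses. It also assumes that larger scores mean more outlying. The scorer is fitted on the training split only, so the validation and test scores are both out-of-sample.

```r
set.seed(2024)

# Hypothetical data: 300 observations, 3 numeric features, no shift.
n <- 300
x <- matrix(rnorm(n * 3), ncol = 3)

# Three-way split: fit on train, use validation as the reference set.
idx <- sample(rep(c("train", "validation", "test"), each = n / 3))
x_train <- x[idx == "train", , drop = FALSE]
x_val <- x[idx == "validation", , drop = FALSE]
x_test <- x[idx == "test", , drop = FALSE]

# Illustrative scorer (an assumption): squared Mahalanobis distance
# to the training centre, estimated from the training split only.
center <- colMeans(x_train)
covmat <- cov(x_train)
score <- function(newdata) {
  mahalanobis(newdata, center = center, cov = covmat)
}

# Both sets of scores are out-of-sample with respect to the scorer.
os_val <- score(x_val)
os_test <- score(x_test)

# Run the test only if dsos is installed; validation scores take the
# place of the training (reference) scores.
if (requireNamespace("dsos", quietly = TRUE)) {
  result <- dsos::at_from_os(os_val, os_test)
  print(result)
}
```

Scoring x_train itself with this scorer would give in-sample scores, which is the mismatch the note warns against.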

References

Kamulete, V. M. (2022). Test for non-negligible adverse shifts. In The 38th Conference on Uncertainty in Artificial Intelligence. PMLR.

Gandy, A. (2009). Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk. Journal of the American Statistical Association, 104(488), 1504-1511.

See also

[at_oob()] for the variant requiring a scoring function. [pt_from_os()] for the permutation test with outlier scores.

Other asymptotic-test: at_oob()

Examples

# \donttest{
library(dsos)
set.seed(12345)
os_train <- rnorm(n = 100)
os_test <- rnorm(n = 100)
test_result <- at_from_os(os_train, os_test)
test_result
#> 	Frequentist test for no adverse shift 
#> 
#> p-value = 0.94177, test statistic (weighted AUC/WAUC) = 0.0624
#> 
#> Alternative hypothesis: Pr(WAUC >= 0.0624)
#> => the test set is worse off than training.
#> Sample sizes: 100 in training and 100 in test set.
# }