Test for no adverse shift with outlier scores. Like goodness-of-fit testing, this two-sample comparison takes the training (outlier) scores, os_train, as the reference. The method checks whether the test scores, os_test, are worse off relative to the training set.

bf_from_os(os_train, os_test, n_pt = 4000, threshold = 1/12)

Arguments

os_train

Outlier scores in training (reference) set.

os_test

Outlier scores in test set.

n_pt

The number of (bootstrap) permutations used to approximate the posterior distribution.

threshold

Threshold for adverse shift. Defaults to 1 / 12, the asymptotic value of the test statistic when the two samples are drawn from the same distribution.

Value

A named list of class outlier.bayes containing:

  • posterior: posterior distribution of the WAUC test statistic

  • threshold: WAUC threshold for adverse shift

  • adverse_probability: probability of adverse shift

  • bayes_factor: Bayes factor

  • outlier_scores: outlier scores from training and test set
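As a brief illustration, the elements listed above can be read off the returned list by name. This is a sketch rather than package output: it assumes dsos is installed (the guard skips the call otherwise), and it uses str() on the posterior because the exact type of that element is not stated here.

```r
# Sketch: access the named elements of the returned list.
# Assumes the dsos package is installed; otherwise nothing is run.
has_dsos <- requireNamespace("dsos", quietly = TRUE)
if (has_dsos) {
  set.seed(12345)
  res <- dsos::bf_from_os(os_train = rnorm(100), os_test = rnorm(100))
  print(res$bayes_factor)         # Bayes factor
  print(res$adverse_probability)  # probability of adverse shift
  print(res$threshold)            # WAUC threshold, 1 / 12 by default
  str(res$posterior)              # posterior of the WAUC test statistic
}
```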

Details

The posterior distribution of the test statistic is based on n_pt (bootstrap) permutations. The method uses the Bayesian bootstrap as a resampling procedure, as in Gu et al. (2008). Johnson (2005) shows how to turn a test statistic into a Bayes factor. The test statistic is the weighted AUC (WAUC).
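The resampling idea can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation. The statistic here is the plain AUC (null value 0.5), a stand-in for the WAUC (null value 1 / 12). The names draw_weights and n_rep are hypothetical. The Bayes factor is taken as the posterior odds Pr(statistic >= threshold) / Pr(statistic < threshold), consistent with the numerator and denominator reported in the printed output.

```r
# Sketch: Bayesian bootstrap posterior of a statistic, then a Bayes factor.
# NOTE: plain AUC is a stand-in here; dsos uses the weighted AUC (WAUC).
set.seed(12345)
os_train <- rnorm(n = 100)
os_test <- rnorm(n = 100)

# Dirichlet(1, ..., 1) weights via normalized standard exponentials
# (hypothetical helper).
draw_weights <- function(n) {
  w <- rexp(n)
  w / sum(w)
}

# Indicator matrix: entry [i, j] is 1 if test score i exceeds train score j.
exceeds <- outer(os_test, os_train, ">") * 1

# One replicate: AUC under random weights on both samples.
one_replicate <- function() {
  w_train <- draw_weights(length(os_train))
  w_test <- draw_weights(length(os_test))
  sum(w_test * drop(exceeds %*% w_train))
}

n_rep <- 1000  # plays the role of n_pt
posterior <- replicate(n_rep, one_replicate())

threshold <- 0.5  # null value of the plain AUC (stand-in for 1 / 12)
adverse_probability <- mean(posterior >= threshold)
bayes_factor <- adverse_probability / (1 - adverse_probability)
```

Each replicate reweights the observed scores instead of resampling them with replacement; that is what distinguishes the Bayesian bootstrap from the classical one.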

Notes

The outlier scores should all mimic out-of-sample behaviour. Make sure that the training scores are not in-sample, and thus biased (overfitted), while the test scores are out-of-sample. Such a mismatch (in-sample versus out-of-sample scores) voids the validity of the test. A simple fix is to get the training scores from an independent (fresh) validation set. This follows the train/validation/test sample-splitting convention; the validation set is then effectively the reference set or distribution.
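The validation-set fix might look like the sketch below. It is illustrative only: the data are simulated, the Mahalanobis distance from stats is a stand-in for whatever outlier scorer is actually used, and the names x_fit, x_val and score are hypothetical. The call to bf_from_os() is guarded so the sketch also runs without dsos installed.

```r
# Sketch: take reference scores from a fresh validation split, so that both
# reference and test scores are out-of-sample.
set.seed(12345)
x_ref <- matrix(rnorm(600), ncol = 2)   # 300 reference rows (simulated)
x_test <- matrix(rnorm(200), ncol = 2)  # 100 test rows (simulated)

# Split the reference data: 200 rows to fit the scorer, 100 to validate.
idx_fit <- sample(nrow(x_ref), size = 200)
x_fit <- x_ref[idx_fit, , drop = FALSE]
x_val <- x_ref[-idx_fit, , drop = FALSE]

# Stand-in outlier scorer: Mahalanobis distance, fitted on x_fit only.
center <- colMeans(x_fit)
covariance <- cov(x_fit)
score <- function(x) mahalanobis(x, center = center, cov = covariance)

os_val <- score(x_val)    # out-of-sample reference scores
os_test <- score(x_test)  # out-of-sample test scores

# The validation scores take the place of os_train.
if (requireNamespace("dsos", quietly = TRUE)) {
  bayes_test <- dsos::bf_from_os(os_train = os_val, os_test = os_test)
  print(bayes_test)
}
```

Scoring x_fit itself and passing those scores as os_train would reintroduce the in-sample bias described above.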

References

Kamulete, V. M. (2023). Are you OK? A Bayesian test for adverse shift. Manuscript in preparation.

Johnson, V. E. (2005). Bayes factors based on test statistics. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(5), 689-701.

Gu, J., Ghosal, S., & Roy, A. (2008). Bayesian bootstrap estimation of ROC curve. Statistics in Medicine, 27(26), 5407-5420.

See also

Other bayesian-test: as_bf(), as_pvalue(), bf_compare()

Examples

# \donttest{
library(dsos)
set.seed(12345)
os_train <- rnorm(n = 100)
os_test <- rnorm(n = 100)
bayes_test <- bf_from_os(os_train, os_test)
bayes_test
#> 	Bayesian test for no adverse shift 
#> 
#> Bayes factor (BF) = 0.07, cutoff (weighted AUC/WAUC) = 0.0833
#> 
#> Model: bayesian bootstrap with 4000 replicates (simulations) 
#> BF's numerator: Pr(WAUC >= 0.0833) 
#> BF's denominator: Pr(WAUC < 0.0833) 
#> => BF > 3 favors view that the test set is worse off than training.
#> Sample sizes: 100 in training and 100 in test set.
# To run in parallel on local cluster, uncomment the next two lines.
# library(future)
# future::plan(future::multisession)
parallel_test <- bf_from_os(os_train, os_test)
parallel_test
#> 	Bayesian test for no adverse shift 
#> 
#> Bayes factor (BF) = 0.06, cutoff (weighted AUC/WAUC) = 0.0833
#> 
#> Model: bayesian bootstrap with 4000 replicates (simulations) 
#> BF's numerator: Pr(WAUC >= 0.0833) 
#> BF's denominator: Pr(WAUC < 0.0833) 
#> => BF > 3 favors view that the test set is worse off than training.
#> Sample sizes: 100 in training and 100 in test set.
# }