nit
WeightedAUC
dataclass
Bases: CTST
Two-sample test for no adverse shift using the weighted AUC (WAUC).
This test compares scores from two independent samples. We reject the null hypothesis of no adverse shift for unusually high values of the WAUC, i.e., when the second sample is relatively worse than the first one. This is a robust nonparametric noninferiority test (NIT) with no pre-specified margin. Among other uses, it can detect dataset shift with outlier scores, hence the DSOS acronym.
Attributes:
| Name | Type | Description |
|---|---|---|
| actual | NDArray | Binary indicator for sample membership. |
| predicted | NDArray | Estimated (predicted) scores for corresponding samples in `actual`. |
| n_resamples | (int, optional) | Number of resampling iterations, by default 9999. |
| rng | (Generator, optional) | Random number generator, by default np.random.default_rng(). |
| n_jobs | (int, optional) | Number of parallel jobs, by default 1. |
| batch | (int or None, optional) | Batch size for parallel processing, by default None. |
See Also
bayes.as_bf : Convert a one-sided p-value to a Bayes factor.
bayes.as_pvalue : Convert a Bayes factor to a one-sided p-value.
Notes
The frequentist null distribution of the WAUC is based on permutations [1]. The Bayesian posterior distribution of the WAUC is based on the Bayesian bootstrap [2]. Because this is a one-tailed test of direction (it asks the question, 'are we worse off?'), we can convert a one-sided p-value into a Bayes factor and vice versa. We can also use these p-values for sequential testing [3].
The test assumes that the values in predicted are outlier scores or otherwise encode some notion of outlyingness: higher values of predicted indicate worse outcomes.
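The permutation null described in the notes above can be sketched as follows. This is a minimal illustration and not the library's implementation: it swaps in the plain (unweighted) AUC for the WAUC, and the helper names `plain_auc` and `permutation_pvalue` are hypothetical.

```python
import numpy as np


def plain_auc(actual, predicted):
    """Unweighted AUC (a stand-in for the WAUC), counting ties as one half."""
    actual = np.asarray(actual).astype(bool)
    predicted = np.asarray(predicted, dtype=float)
    second = predicted[actual]   # scores where the membership indicator is 1
    first = predicted[~actual]   # scores where the membership indicator is 0
    diff = second[:, None] - first[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


def permutation_pvalue(actual, predicted, n_resamples=999, rng=None):
    """One-sided permutation p-value: large statistics suggest adverse shift."""
    rng = np.random.default_rng() if rng is None else rng
    actual = np.asarray(actual)
    observed = plain_auc(actual, predicted)
    # Shuffling the membership labels simulates the null of no shift.
    null = np.array(
        [plain_auc(rng.permutation(actual), predicted) for _ in range(n_resamples)]
    )
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return float((1 + np.sum(null >= observed)) / (1 + n_resamples))
```

Only the upper tail is counted because the test is directional: it asks whether the second sample is worse, not merely different.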
References
.. [1] Kamulete, Vathy M. "Test for non-negligible adverse shifts." Uncertainty in Artificial Intelligence. PMLR, 2022.
.. [2] Gu, Jiezhun, Subhashis Ghosal, and Anindya Roy. "Bayesian bootstrap estimation of ROC curve." Statistics in medicine 27.26 (2008): 5407-5420.
.. [3] Kamulete, Vathy M. "Are you OK? A Bayesian Sequential Test for Adverse Shift." 2025.
Examples:
>>> import numpy as np
>>> from samesame.nit import WeightedAUC
>>> # alternatively: from samesame.nit import DSOS
>>> actual = np.array([0, 1, 1, 0])
>>> scores = np.array([0.2, 0.8, 0.6, 0.4])
>>> wauc = WeightedAUC(actual, scores)
>>> print(wauc.pvalue)
>>> print(wauc.bayes_factor)
>>> wauc_ = WeightedAUC.from_samples(scores, scores)
>>> isinstance(wauc_, WeightedAUC)
True
Source code in src/samesame/nit.py
bayes_factor
cached
property
Compute the Bayes factor using the Bayesian bootstrap.
Notes
The result is cached to avoid (expensive) recomputation.
posterior
cached
property
Compute the posterior distribution of the WAUC.
Returns:
| Type | Description |
|---|---|
| NDArray | The posterior distribution of the WAUC. |
Notes
The result is cached to avoid (expensive) recomputation since the posterior distribution uses the Bayesian bootstrap.
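To illustrate why this property is worth caching, here is a rough sketch of a Bayesian bootstrap for an AUC-type statistic. It is an assumption-laden illustration, not the library's code: it uses the plain AUC in place of the WAUC, draws flat Dirichlet weights independently for each sample, and the function name `bayesian_bootstrap_auc` is hypothetical.

```python
import numpy as np


def bayesian_bootstrap_auc(first_sample, second_sample, n_resamples=999, rng=None):
    """Posterior draws of the plain AUC under flat Dirichlet observation weights."""
    rng = np.random.default_rng() if rng is None else rng
    first = np.asarray(first_sample, dtype=float)
    second = np.asarray(second_sample, dtype=float)
    # kernel[i, j] is 1 if second[j] exceeds first[i], 0.5 on ties, else 0.
    kernel = (second[None, :] > first[:, None]) + 0.5 * (
        second[None, :] == first[:, None]
    )
    # One Dirichlet(1, ..., 1) weight vector per resample and per sample.
    w_first = rng.dirichlet(np.ones(first.size), size=n_resamples)
    w_second = rng.dirichlet(np.ones(second.size), size=n_resamples)
    # Each draw is the observation-weighted average of the kernel.
    return np.einsum("bi,ij,bj->b", w_first, kernel, w_second)
```

Each posterior draw reweights every pair of observations, so the cost grows with both the sample sizes and `n_resamples`.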
__init__(actual, predicted, n_resamples=9999, rng=np.random.default_rng(), n_jobs=1, batch=None)
Initialize WeightedAUC.
Source code in src/samesame/nit.py
from_samples(first_sample, second_sample, n_resamples=9999, rng=np.random.default_rng(), n_jobs=1, batch=None)
classmethod
Create a WeightedAUC instance from two samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| first_sample | NDArray | First sample of scores. These can be binary or continuous. | required |
| second_sample | NDArray | Second sample of scores. These can be binary or continuous. | required |
Returns:
| Type | Description |
|---|---|
| WeightedAUC | An instance of the WeightedAUC class. |
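One plausible way to picture what this constructor does is to stack the two samples and build the binary membership indicator that the main constructor expects. The sketch below is an assumption about that mapping rather than the library's code: the helper `stack_samples` is hypothetical, and labelling the second sample with 1 is a guess based on the class description.

```python
import numpy as np


def stack_samples(first_sample, second_sample):
    """Combine two samples into (actual, predicted) arrays."""
    first = np.asarray(first_sample).ravel()
    second = np.asarray(second_sample).ravel()
    # Assumption: 0 marks the first sample and 1 marks the second sample.
    actual = np.concatenate(
        [np.zeros(first.size, dtype=int), np.ones(second.size, dtype=int)]
    )
    predicted = np.concatenate([first, second])
    return actual, predicted
```

Under that assumption, the returned pair has the same shape as the `actual` and `predicted` arguments shown in the `__init__` signature.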
Source code in src/samesame/nit.py