
nit

WeightedAUC dataclass

Bases: CTST

Two-sample test for no adverse shift using the weighted AUC (WAUC).

This test compares scores from two independent samples. We reject the null hypothesis of no adverse shift for unusually high values of the WAUC, i.e., when the second sample is relatively worse than the first. This is a robust, nonparametric noninferiority test (NIT) with no pre-specified margin. Among other uses, it can detect dataset shift with outlier scores, hence the DSOS acronym.
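For intuition only, here is the ordinary, unweighted AUC between two samples of outlier scores: the probability that a score from the second sample exceeds a score from the first, with ties counted as one half. Values near 1 mean the second sample scores as more outlying, which is the direction this test looks for. This is a stand-in, not the WAUC that samesame computes (the WAUC weights these comparisons; see [1]), and the helper name `unweighted_auc` is hypothetical.

```python
import numpy as np


def unweighted_auc(first: np.ndarray, second: np.ndarray) -> float:
    """Hypothetical helper: P(second > first) + 0.5 * P(second == first).

    This is the plain (unweighted) AUC, not samesame's WAUC.
    """
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    # Compare every score in the second sample with every score in the first.
    greater = (second[:, None] > first[None, :]).mean()
    ties = (second[:, None] == first[None, :]).mean()
    return float(greater + 0.5 * ties)


first = np.array([0.1, 0.2, 0.3, 0.4])    # e.g. outlier scores on reference data
second = np.array([0.35, 0.5, 0.6, 0.7])  # e.g. outlier scores on new data
print(unweighted_auc(first, second))      # 0.9375: the second sample looks worse
```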

Attributes:

actual (NDArray): Binary indicator for sample membership.
predicted (NDArray): Estimated (predicted) scores for the corresponding samples in actual.
n_resamples (int, optional): Number of resampling iterations, by default 9999.
rng (Generator, optional): Random number generator, by default np.random.default_rng().
n_jobs (int, optional): Number of parallel jobs, by default 1.
batch (int or None, optional): Batch size for parallel processing, by default None.

See Also

bayes.as_bf : Convert a one-sided p-value to a Bayes factor.

bayes.as_pvalue : Convert a Bayes factor to a one-sided p-value.

Notes

The frequentist null distribution of the WAUC is based on permutations [1]. The Bayesian posterior distribution of the WAUC is based on the Bayesian bootstrap [2]. Because this is a one-tailed test of direction (it asks the question, 'are we worse off?'), we can convert a one-sided p-value into a Bayes factor and vice versa. We can also use these p-values for sequential testing [3].
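The permutation null described above can be sketched as follows, under stated assumptions: the unweighted AUC stands in for the WAUC, sample membership (actual) is shuffled while the scores (predicted) stay fixed, and the one-sided p-value uses the add-one convention (1 + #{null >= observed}) / (1 + n_resamples). The names `auc` and `permutation_pvalue` are hypothetical; samesame's own implementation may differ in its statistic, its p-value convention, and its parallelism (n_jobs, batch).

```python
import numpy as np


def auc(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Hypothetical stand-in for the WAUC: plain AUC of label 1 versus label 0."""
    pos = predicted[actual == 1]
    neg = predicted[actual == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


def permutation_pvalue(actual, predicted, n_resamples=9999, rng=None):
    """One-sided ("greater") permutation p-value for the AUC statistic."""
    rng = np.random.default_rng(rng)
    observed = auc(actual, predicted)
    # Null distribution: shuffle sample membership, keep the scores fixed.
    null = np.array(
        [auc(rng.permutation(actual), predicted) for _ in range(n_resamples)]
    )
    # Add-one rule: the observed labelling counts as one of the permutations.
    pvalue = (1 + np.sum(null >= observed)) / (1 + n_resamples)
    return observed, null, float(pvalue)


# Usage: the second sample is shifted upward, i.e. it looks more outlying.
gen = np.random.default_rng(0)
first = gen.normal(0.0, 1.0, size=50)
second = gen.normal(1.0, 1.0, size=50)
actual = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
predicted = np.concatenate([first, second])
observed, null, p = permutation_pvalue(actual, predicted, n_resamples=999, rng=1)
print(observed, p)
```

With a clear upward shift in the second sample, the p-value should come out small; with identical score distributions it should be roughly uniform between 0 and 1.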

The test assumes that predicted contains outlier scores, or at least scores that encode some notion of outlyingness: higher values of predicted indicate worse outcomes.
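If a model's scores run the other way (for example, a confidence where higher means better), flip their sign before passing them in. The sketch below uses a hypothetical plain-AUC helper, `prob_second_higher`, rather than the WAUC, to show why orientation matters: negating both samples turns an AUC of a into 1 - a, so the same shift reads as harmless or adverse depending on the sign convention.

```python
import numpy as np


def prob_second_higher(first, second) -> float:
    """Hypothetical helper: plain AUC, P(second > first) + 0.5 * P(tie)."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    diff = second[:, None] - first[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Hypothetical confidence scores where HIGHER means BETTER (the wrong orientation).
conf_first = np.array([0.9, 0.8, 0.85, 0.95])
conf_second = np.array([0.6, 0.7, 0.8, 0.65])

# Negate so that higher values mean worse outcomes, as the test assumes.
out_first, out_second = -conf_first, -conf_second

print(prob_second_higher(conf_first, conf_second))  # near 0: shift looks harmless
print(prob_second_higher(out_first, out_second))    # near 1: shift flagged as adverse
```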

References

[1] Kamulete, Vathy M. "Test for non-negligible adverse shifts." Uncertainty in Artificial Intelligence. PMLR, 2022.

[2] Gu, Jiezhun, Subhashis Ghosal, and Anindya Roy. "Bayesian bootstrap estimation of ROC curve." Statistics in Medicine 27.26 (2008): 5407-5420.

[3] Kamulete, Vathy M. "Are you OK? A Bayesian Sequential Test for Adverse Shift." 2025.

Examples:

>>> import numpy as np
>>> from samesame.nit import WeightedAUC
>>> # alternatively: from samesame.nit import DSOS
>>> actual = np.array([0, 1, 1, 0])
>>> scores = np.array([0.2, 0.8, 0.6, 0.4])
>>> wauc = WeightedAUC(actual, scores)
>>> print(wauc.pvalue)
>>> print(wauc.bayes_factor)
>>> wauc_ = WeightedAUC.from_samples(scores, scores)
>>> isinstance(wauc_, WeightedAUC)
True
Source code in src/samesame/nit.py
@dataclass
class WeightedAUC(CTST):
    """
    Two-sample test for no adverse shift using the weighted AUC (WAUC).

    This test compares scores from two independent samples. We reject the
    null hypothesis of no adverse shift for unusually high values of the
    WAUC, i.e., when the second sample is relatively worse than the first.
    This is a robust, nonparametric noninferiority test (NIT) with no
    pre-specified margin. Among other uses, it can detect dataset shift
    with outlier scores, hence the DSOS acronym.

    Attributes
    ----------
    actual : NDArray
        Binary indicator for sample membership.
    predicted : NDArray
        Estimated (predicted) scores for corresponding samples in `actual`.
    n_resamples : int, optional
        Number of resampling iterations, by default 9999.
    rng : np.random.Generator, optional
        Random number generator, by default np.random.default_rng().
    n_jobs : int, optional
        Number of parallel jobs, by default 1.
    batch : int or None, optional
        Batch size for parallel processing, by default None.

    See Also
    --------
    bayes.as_bf : Convert a one-sided p-value to a Bayes factor.

    bayes.as_pvalue : Convert a Bayes factor to a one-sided p-value.

    Notes
    -----
    The frequentist null distribution of the WAUC is based on permutations
    [1]. The Bayesian posterior distribution of the WAUC is based on the
    Bayesian bootstrap [2]. Because this is a one-tailed test of direction
    (it asks the question, 'are we worse off?'), we can convert a one-sided
    p-value into a Bayes factor and vice versa. We can also use these p-values
    for sequential testing [3].

    The test assumes that `predicted` contains outlier scores, or at least
    scores that encode some notion of outlyingness: higher values of
    `predicted` indicate worse outcomes.

    References
    ----------
    .. [1] Kamulete, Vathy M. "Test for non-negligible adverse shifts."
       Uncertainty in Artificial Intelligence. PMLR, 2022.

    .. [2] Gu, Jiezhun, Subhashis Ghosal, and Anindya Roy. "Bayesian bootstrap
       estimation of ROC curve." Statistics in Medicine 27.26 (2008): 5407-5420.

    .. [3] Kamulete, Vathy M. "Are you OK? A Bayesian Sequential Test for
       Adverse Shift." 2025.

    Examples
    --------
    >>> import numpy as np
    >>> from samesame.nit import WeightedAUC
    >>> # alternatively: from samesame.nit import DSOS
    >>> actual = np.array([0, 1, 1, 0])
    >>> scores = np.array([0.2, 0.8, 0.6, 0.4])
    >>> wauc = WeightedAUC(actual, scores)
    >>> print(wauc.pvalue) # doctest: +SKIP
    >>> print(wauc.bayes_factor) # doctest: +SKIP
    >>> wauc_ = WeightedAUC.from_samples(scores, scores)
    >>> isinstance(wauc_, WeightedAUC)
    True
    """

    def __init__(
        self,
        actual: NDArray,
        predicted: NDArray,
        n_resamples: int = 9999,
        rng: np.random.Generator = np.random.default_rng(),
        n_jobs: int = 1,
        batch: int | None = None,
    ):
        """Initialize WeightedAUC."""
        super().__init__(
            actual=actual,
            predicted=predicted,
            metric=wauc,
            n_resamples=n_resamples,
            rng=rng,
            n_jobs=n_jobs,
            batch=batch,
            alternative="greater",
        )

    @cached_property
    def posterior(self) -> NDArray:
        """
        Compute the posterior distribution of the WAUC.

        Returns
        -------
        NDArray
            The posterior distribution of the WAUC.

        Notes
        -----
        The result is cached to avoid (expensive) recomputation since the
        posterior distribution uses the Bayesian bootstrap.
        """
        return bayesian_posterior(
            self.actual,
            self.predicted,
            self.metric,
            self.n_resamples,
            self.rng,
        )

    @cached_property
    def bayes_factor(self):
        """
        Compute the Bayes factor using the Bayesian bootstrap.

        Notes
        -----
        The result is cached to avoid (expensive) recomputation.
        """
        bayes_threshold = float(np.mean(self.null))
        bf_ = _bayes_factor(self.posterior, bayes_threshold)
        return bf_

    @classmethod
    def from_samples(
        cls,
        first_sample: NDArray,
        second_sample: NDArray,
        n_resamples: int = 9999,
        rng: np.random.Generator = np.random.default_rng(),
        n_jobs: int = 1,
        batch: int | None = None,
    ):
        """
        Create a WeightedAUC instance from two samples.

        Parameters
        ----------
        first_sample : NDArray
            First sample of scores. These can be binary or continuous.
        second_sample : NDArray
            Second sample of scores. These can be binary or continuous.

        Returns
        -------
        WeightedAUC
            An instance of the WeightedAUC class.
        """
        assert type_of_target(first_sample) == type_of_target(second_sample)
        samples = (first_sample, second_sample)
        actual = assign_labels(samples)
        predicted = concat_samples(samples)
        return cls(
            actual,
            predicted,
            n_resamples,
            rng,
            n_jobs,
            batch,
        )

bayes_factor cached property

Compute the Bayes factor from the Bayesian-bootstrap posterior of the WAUC, using the mean of the permutation null distribution as the threshold.

Notes

The result is cached to avoid (expensive) recomputation.
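In the source above, bayes_factor compares the Bayesian-bootstrap posterior against a threshold equal to the mean of the permutation null, via `_bayes_factor`, whose body is not shown on this page. Purely as an illustration, the sketch below assumes that comparison takes the form of posterior odds that the statistic exceeds the threshold. That quantity equals a Bayes factor only under even prior odds, and `posterior_odds_above` is a hypothetical name, not samesame's helper.

```python
import numpy as np


def posterior_odds_above(posterior: np.ndarray, threshold: float) -> float:
    """Hypothetical stand-in for `_bayes_factor`: odds that the draws exceed `threshold`."""
    p_above = float(np.mean(posterior > threshold))
    if p_above == 1.0:
        # Every posterior draw lies above the threshold: the odds are unbounded.
        return float("inf")
    return p_above / (1.0 - p_above)


null = np.array([0.48, 0.50, 0.52, 0.50])       # hypothetical permutation-null draws
posterior = np.array([0.55, 0.60, 0.45, 0.70])  # hypothetical posterior draws
threshold = float(np.mean(null))                 # mean of the null, as in bayes_factor
print(posterior_odds_above(posterior, threshold))
```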

posterior cached property

Compute the posterior distribution of the WAUC.

Returns:

NDArray: The posterior distribution of the WAUC.

Notes

The result is cached to avoid (expensive) recomputation since the posterior distribution uses the Bayesian bootstrap.
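The Bayesian bootstrap behind posterior can be sketched for the plain AUC as follows: instead of resampling observations, each draw gives the observations in each sample Dirichlet(1, ..., 1) weights and computes the weighted proportion of (second, first) pairs in which the second score is higher, in the spirit of Gu, Ghosal and Roy [2]. This is a sketch under assumptions; the function name `bayesian_bootstrap_auc` is hypothetical, and samesame's `bayesian_posterior` targets the WAUC rather than the plain AUC.

```python
import numpy as np


def bayesian_bootstrap_auc(first, second, n_resamples=4000, rng=None):
    """Posterior draws of the plain AUC via the Bayesian bootstrap (hypothetical)."""
    rng = np.random.default_rng(rng)
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    # Pairwise kernel, shape (n_second, n_first): 1 if second > first, 0.5 on ties.
    kernel = (second[:, None] > first[None, :]) + 0.5 * (second[:, None] == first[None, :])
    # Independent Dirichlet(1, ..., 1) weights for each sample, one row per draw.
    w_first = rng.dirichlet(np.ones(first.size), size=n_resamples)
    w_second = rng.dirichlet(np.ones(second.size), size=n_resamples)
    # Weighted AUC per draw: sum over j, i of w_second[r, j] * kernel[j, i] * w_first[r, i].
    return np.einsum("rj,ji,ri->r", w_second, kernel, w_first)


gen = np.random.default_rng(0)
first = gen.normal(0.0, 1.0, size=100)
second = gen.normal(0.5, 1.0, size=100)
posterior = bayesian_bootstrap_auc(first, second, rng=1)
print(posterior.mean(), np.quantile(posterior, [0.025, 0.975]))
```

Averaged over draws, the posterior is centered near the plain AUC of the observed data, because independent Dirichlet(1, ..., 1) weights have expectation 1/n per observation.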

__init__(actual, predicted, n_resamples=9999, rng=np.random.default_rng(), n_jobs=1, batch=None)

Initialize WeightedAUC.

from_samples(first_sample, second_sample, n_resamples=9999, rng=np.random.default_rng(), n_jobs=1, batch=None) classmethod

Create a WeightedAUC instance from two samples.

Parameters:

first_sample (NDArray, required): First sample of scores. These can be binary or continuous, but both samples must have the same target type.
second_sample (NDArray, required): Second sample of scores. These can be binary or continuous, but both samples must have the same target type.

Returns:

WeightedAUC: An instance of the WeightedAUC class.
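from_samples stacks the two samples into the (actual, predicted) form that the constructor expects. A minimal sketch of that conversion, assuming (this page does not show assign_labels) that the first sample is labelled 0 and the second 1, and leaving out the type_of_target consistency check; `to_actual_predicted` is a hypothetical name:

```python
import numpy as np


def to_actual_predicted(first_sample, second_sample):
    """Hypothetical equivalent of assign_labels + concat_samples.

    Assumes the first sample is labelled 0 and the second sample 1.
    """
    first_sample = np.asarray(first_sample)
    second_sample = np.asarray(second_sample)
    actual = np.concatenate(
        [np.zeros(first_sample.size, dtype=int), np.ones(second_sample.size, dtype=int)]
    )
    predicted = np.concatenate([first_sample.ravel(), second_sample.ravel()])
    return actual, predicted


actual, predicted = to_actual_predicted([0.2, 0.4], [0.8, 0.6, 0.9])
print(actual)     # [0 0 1 1 1]
print(predicted)  # [0.2 0.4 0.8 0.6 0.9]
```

Under that labelling assumption, the pair returned here is the kind of input WeightedAUC(actual, predicted) takes directly.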
