Testing functions

Use this page for the two primary user-facing tests: test_shift(...) and test_adverse_shift(...). Start here if you are new to the package or want the simplest API surface.

What you get back

  • test_shift(...) returns ShiftDetails with .statistic, .pvalue, .statistic_name, and .null_distribution
  • test_adverse_shift(...) returns AdverseShiftDetails with .statistic, .pvalue, .direction, and .null_distribution

For Bayesian output or advanced controls, see the advanced page.
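
A minimal sketch of the basic flow, assuming test_shift and test_adverse_shift are exported from the top-level samesame package (the source lives in src/samesame/_api.py) and that your outlier scores are plain NumPy arrays:

import numpy as np
from samesame import test_shift, test_adverse_shift  # assumed top-level exports

rng = np.random.default_rng(0)
source = rng.normal(size=500)           # baseline outlier scores
target = rng.normal(loc=0.3, size=500)  # new outlier scores to compare

res = test_shift(source=source, target=target)
print(res.statistic_name, res.statistic, res.pvalue)

adv = test_adverse_shift(source=source, target=target, direction="higher-is-worse")
print(adv.direction, adv.pvalue)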

Task-first API for hypothesis tests over outlier scores.

The primary API exposes:

  • test_shift — test whether two outlier score distributions differ
  • test_adverse_shift — test for harmful shifts with explicit direction
  • adverse_shift_posterior — Bayesian evidence layer on top of an adverse-shift result

All test functions return a full result including the null distribution.
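
Because the full null distribution is returned, you can work with it directly. A small sketch, continuing the res object from the example above, comparing the observed statistic against an empirical permutation quantile:

q95 = float(np.quantile(res.null_distribution, 0.95))
print(res.statistic, q95, res.statistic > q95)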

AdverseShiftDetails dataclass

Bases: TestResult

Result of an adverse-shift test, including the full null distribution.

Source code in src/samesame/_types.py
@dataclass(frozen=True)
class AdverseShiftDetails(TestResult):
    """Result of an adverse-shift test, including the full null distribution."""

    direction: Direction
    null_distribution: NDArray[np.float64]

BayesianEvidence dataclass

Bayesian evidence layer computed on top of an adverse-shift result.

Source code in src/samesame/_types.py
@dataclass(frozen=True)
class BayesianEvidence:
    """Bayesian evidence layer computed on top of an adverse-shift result."""

    posterior: NDArray[np.float64]
    bayes_factor: float

ContextualWeights dataclass

Importance weights for source and target groups, used to correct for covariate shift between source and target during a shift test.

Attributes:

  • source (NDArray[np.float64]): Importance weights for source samples, normalized to sum to len(source).
  • target (NDArray[np.float64]): Importance weights for target samples, normalized to sum to len(target).

Source code in src/samesame/weights.py
@dataclass(frozen=True)
class ContextualWeights:
    """Importance weights for source and target groups, used to
    correct for covariate shift between source and target during a shift test.

    Attributes
    ----------
    source : NDArray[np.float64]
        Importance weights for source samples, normalized to sum to
        ``len(source)``.
    target : NDArray[np.float64]
        Importance weights for target samples, normalized to sum to
        ``len(target)``.
    """

    source: NDArray[np.float64]
    target: NDArray[np.float64]
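
A minimal sketch of constructing these weights directly, using uniform weights, which trivially satisfy the normalization convention above (each array sums to its group's size). In practice you would usually build them from domain probabilities with samesame.weights.contextual_weights instead (its signature is not shown on this page):

import numpy as np
from samesame.weights import ContextualWeights

n_source, n_target = 500, 400
weights = ContextualWeights(
    source=np.ones(n_source),  # sums to len(source), per the convention above
    target=np.ones(n_target),  # sums to len(target)
)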

ShiftDetails dataclass

Bases: TestResult

Result of a shift test, including the full null distribution.

Source code in src/samesame/_types.py
@dataclass(frozen=True)
class ShiftDetails(TestResult):
    """Result of a shift test, including the full null distribution."""

    statistic_name: str
    null_distribution: NDArray[np.float64]

TestResult dataclass

Shared fields for all test results.

Source code in src/samesame/_types.py
@dataclass(frozen=True)
class TestResult:
    """Shared fields for all test results."""

    statistic: float
    pvalue: float

adverse_shift_posterior(*, source, target, direction, n_resamples=9999, rng=None, weights=None, threshold=1 / 12)

Compute Bayesian evidence for adverse shift using a bootstrap posterior.

Provides a Bayesian evidence layer on top of the adverse-shift test: runs a Bayesian bootstrap over the WAUC metric and returns posterior draws together with a Bayes factor against a reference threshold.

Parameters:

  • source (ArrayLike, required): Baseline outlier scores, typically from training or reference data.
  • target (ArrayLike, required): New outlier scores to compare against source, typically from production or deployment data.
  • direction ({'higher-is-worse', 'higher-is-better'}, required): Whether higher outlier scores indicate worse outcomes ('higher-is-worse') or better outcomes ('higher-is-better'). Required to determine the direction of adverse shift.
  • n_resamples (int, default 9999): Number of Bayesian bootstrap resamples.
  • rng (numpy.random.Generator or None, default None): Random number generator for reproducibility. None creates a fresh one.
  • weights (ContextualWeights or None, default None): Importance weights to correct for covariate shift and related concerns between source and target. Build from domain probabilities using samesame.weights.contextual_weights, or construct ContextualWeights(source=..., target=...) directly. Pass None (default) to run an unweighted test.
  • threshold (float, default 1/12): WAUC value used as the null reference for the Bayes factor; 1/12 is the asymptotic expected WAUC under the null hypothesis that source and target are from the same distribution.

Returns:

  • BayesianEvidence: Immutable result with posterior draws and bayes_factor.

See Also

test_adverse_shift : Run the permutation test for adverse shift.

Source code in src/samesame/_api.py
def adverse_shift_posterior(
    *,
    source: ArrayLike,
    target: ArrayLike,
    direction: Direction,
    n_resamples: int = 9999,
    rng: np.random.Generator | None = None,
    weights: ContextualWeights | None = None,
    threshold: float = 1 / 12,
) -> BayesianEvidence:
    """Compute Bayesian evidence for adverse shift using a bootstrap posterior.

    Provides a Bayesian evidence layer on top of the adverse-shift test:
    runs a Bayesian bootstrap over the WAUC metric and returns posterior
    draws together with a Bayes factor against a reference threshold.

    Parameters
    ----------
    source : ArrayLike
        Baseline outlier scores, typically from training or reference data.
    target : ArrayLike
        New outlier scores to compare against ``source``, typically from
        production or deployment data.
    direction : {'higher-is-worse', 'higher-is-better'}
        Whether higher outlier scores indicate worse outcomes
        (``'higher-is-worse'``) or better outcomes (``'higher-is-better'``).
        Required to determine the direction of adverse shift.
    n_resamples : int, optional
        Number of Bayesian bootstrap resamples, by default ``9999``.
    rng : numpy.random.Generator or None, optional
        Random number generator for reproducibility. ``None`` creates a
        fresh one.
    weights : ContextualWeights or None, optional
        Importance weights to correct for covariate shift and related concerns
        between source and target. Build from domain probabilities using
        :func:`~samesame.weights.contextual_weights`, or construct
        ``ContextualWeights(source=..., target=...)`` directly.
        Pass ``None`` (default) to run an unweighted test.
    threshold : float, optional
        WAUC value used as the null reference for the Bayes factor.
        Defaults to ``1/12``, the asymptotic expected WAUC under the null
        hypothesis that source and target are from the same distribution.

    Returns
    -------
    BayesianEvidence
        Immutable result with ``posterior`` draws and ``bayes_factor``.

    See Also
    --------
    test_adverse_shift : Run the permutation test for adverse shift.
    """
    dataset = build_two_sample_dataset(source, target)
    actual, predicted = dataset.labels, dataset.scores
    validated_direction = validate_direction(direction)
    if validated_direction == "higher-is-better":
        predicted = -predicted
    effective_weight = _resolve_weights(weights, dataset.n_source, dataset.n_target)
    posterior = np.asarray(
        bayesian_posterior(
            actual,
            predicted,
            wauc,
            n_resamples=n_resamples,
            rng=rng,
            base_weight=effective_weight,
        ),
        dtype=np.float64,
    )
    bayes_factor_val = float(_bayes_factor(posterior, threshold))
    return BayesianEvidence(
        posterior=posterior,
        bayes_factor=bayes_factor_val,
    )
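
A sketch of calling this function and summarizing its output; it assumes adverse_shift_posterior is exported from the top-level package and reuses the source and target arrays from the first example:

from samesame import adverse_shift_posterior  # assumed top-level export

ev = adverse_shift_posterior(
    source=source,
    target=target,
    direction="higher-is-worse",
    rng=np.random.default_rng(42),
)
lo, hi = np.quantile(ev.posterior, [0.025, 0.975])  # 95% credible interval for the WAUC
print(ev.posterior.mean(), (lo, hi), ev.bayes_factor)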

test_adverse_shift(*, source, target, direction, n_resamples=9999, batch=None, rng=None, weights=None)

Test whether the target sample is harmfully shifted.

Parameters:

  • source (ArrayLike, required): Baseline outlier scores, typically from training or reference data.
  • target (ArrayLike, required): New outlier scores to compare against source, typically from production or deployment data.
  • direction ({'higher-is-worse', 'higher-is-better'}, required): Whether higher outlier scores indicate worse outcomes ('higher-is-worse') or better outcomes ('higher-is-better'). Required to determine the direction of adverse shift.
  • n_resamples (int, default 9999): Number of permutation resamples.
  • batch (int or None, default None): Number of resamples to process per batch. None uses a single batch.
  • rng (numpy.random.Generator or None, default None): Random number generator for reproducibility. None creates a fresh one.
  • weights (ContextualWeights or None, default None): Importance weights to correct for covariate shift and related concerns between source and target. Build from domain probabilities using samesame.weights.contextual_weights, or construct ContextualWeights(source=..., target=...) directly. Pass None (default) to run an unweighted test.

Returns:

  • AdverseShiftDetails: Immutable result with statistic, pvalue, direction, and null_distribution.

See Also

adverse_shift_posterior : Compute Bayesian evidence on top of this result.

Source code in src/samesame/_api.py
def test_adverse_shift(
    *,
    source: ArrayLike,
    target: ArrayLike,
    direction: Direction,
    n_resamples: int = 9999,
    batch: int | None = None,
    rng: np.random.Generator | None = None,
    weights: ContextualWeights | None = None,
) -> AdverseShiftDetails:
    """Test whether the target sample is harmfully shifted.

    Parameters
    ----------
    source : ArrayLike
        Baseline outlier scores, typically from training or reference data.
    target : ArrayLike
        New outlier scores to compare against ``source``, typically from
        production or deployment data.
    direction : {'higher-is-worse', 'higher-is-better'}
        Whether higher outlier scores indicate worse outcomes
        (``'higher-is-worse'``) or better outcomes (``'higher-is-better'``).
        Required to determine the direction of adverse shift.
    n_resamples : int, optional
        Number of permutation resamples, by default ``9999``.
    batch : int or None, optional
        Number of resamples to process per batch. ``None`` uses a single
        batch.
    rng : numpy.random.Generator or None, optional
        Random number generator for reproducibility. ``None`` creates a
        fresh one.
    weights : ContextualWeights or None, optional
        Importance weights to correct for covariate shift and related concerns
        between source and target. Build from domain probabilities using
        :func:`~samesame.weights.contextual_weights`, or construct
        ``ContextualWeights(source=..., target=...)`` directly.
        Pass ``None`` (default) to run an unweighted test.

    Returns
    -------
    AdverseShiftDetails
        Immutable result with ``statistic``, ``pvalue``, ``direction``,
        and ``null_distribution``.

    See Also
    --------
    adverse_shift_posterior : Compute Bayesian evidence on top of this result.
    """
    dataset = build_two_sample_dataset(source, target)
    actual, predicted = dataset.labels, dataset.scores
    validated_direction = validate_direction(direction)
    if validated_direction == "higher-is-better":
        predicted = -predicted
    effective_weight = _resolve_weights(weights, dataset.n_source, dataset.n_target)
    result = _run_permutation_test(
        actual,
        predicted,
        wauc,
        n_resamples=n_resamples,
        alternative="greater",
        sample_weight=effective_weight,
        rng=rng,
        batch=batch,
    )
    return AdverseShiftDetails(
        statistic=float(result.statistic),
        pvalue=float(result.pvalue),
        direction=validated_direction,
        null_distribution=np.asarray(result.null_distribution, dtype=np.float64),
    )
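
A sketch combining the pieces above: a reproducible adverse-shift test with explicit importance weights, reusing source and target from the first example. The weights here are uniform, which should behave like the unweighted default; this is illustration only:

from samesame import test_adverse_shift  # assumed top-level export
from samesame.weights import ContextualWeights

w = ContextualWeights(
    source=np.ones(len(source)),
    target=np.ones(len(target)),
)
adv = test_adverse_shift(
    source=source,
    target=target,
    direction="higher-is-worse",
    rng=np.random.default_rng(7),  # fixed seed for a reproducible permutation null
    weights=w,
)
print(adv.statistic, adv.pvalue, adv.direction)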

test_shift(*, source, target, statistic='roc_auc', alternative='two-sided', n_resamples=9999, batch=None, rng=None, weights=None)

Test whether the source and target outlier score distributions differ.

Parameters:

  • source (ArrayLike, required): Baseline outlier scores, typically from training or reference data.
  • target (ArrayLike, required): New outlier scores to compare against source, typically from production or deployment data.
  • statistic ({'roc_auc', 'balanced_accuracy', 'matthews_corrcoef'}, default 'roc_auc'): Named built-in statistic used inside the permutation test.
  • alternative ({'two-sided', 'less', 'greater'}, default 'two-sided'): Alternative hypothesis for the permutation test.
  • n_resamples (int, default 9999): Number of permutation resamples.
  • batch (int or None, default None): Number of resamples to process per batch. None uses a single batch.
  • rng (numpy.random.Generator or None, default None): Random number generator for reproducibility. None creates a fresh one.
  • weights (ContextualWeights or None, default None): Importance weights to correct for covariate shift and related concerns between source and target. Build from domain probabilities using samesame.weights.contextual_weights, or construct ContextualWeights(source=..., target=...) directly. Pass None (default) to run an unweighted test.

Returns:

  • ShiftDetails: Immutable result with statistic, pvalue, statistic_name, and null_distribution.

Source code in src/samesame/_api.py
def test_shift(
    *,
    source: ArrayLike,
    target: ArrayLike,
    statistic: ShiftStatistic = "roc_auc",
    alternative: Literal["less", "greater", "two-sided"] = "two-sided",
    n_resamples: int = 9999,
    batch: int | None = None,
    rng: np.random.Generator | None = None,
    weights: ContextualWeights | None = None,
) -> ShiftDetails:
    """Test whether the source and target outlier score distributions differ.

    Parameters
    ----------
    source : ArrayLike
        Baseline outlier scores, typically from training or reference data.
    target : ArrayLike
        New outlier scores to compare against ``source``, typically from
        production or deployment data.
    statistic : {'roc_auc', 'balanced_accuracy', 'matthews_corrcoef'}, optional
        Named built-in statistic used inside the permutation test.
    alternative : {'two-sided', 'less', 'greater'}, optional
        Alternative hypothesis for the permutation test, by default
        ``'two-sided'``.
    n_resamples : int, optional
        Number of permutation resamples, by default ``9999``.
    batch : int or None, optional
        Number of resamples to process per batch. ``None`` uses a single
        batch.
    rng : numpy.random.Generator or None, optional
        Random number generator for reproducibility. ``None`` creates a
        fresh one.
    weights : ContextualWeights or None, optional
        Importance weights to correct for covariate shift and related concerns
        between source and target. Build from domain probabilities using
        :func:`~samesame.weights.contextual_weights`, or construct
        ``ContextualWeights(source=..., target=...)`` directly.
        Pass ``None`` (default) to run an unweighted test.

    Returns
    -------
    ShiftDetails
        Immutable result with ``statistic``, ``pvalue``, ``statistic_name``,
        and ``null_distribution``.
    """
    dataset = build_two_sample_dataset(source, target)
    actual, predicted = dataset.labels, dataset.scores
    statistic_name, metric = get_shift_metric(statistic)
    _validate_shift_scores(statistic_name, predicted)
    effective_weight = _resolve_weights(weights, dataset.n_source, dataset.n_target)
    result = _run_permutation_test(
        actual,
        predicted,
        metric,
        n_resamples=n_resamples,
        alternative=alternative,
        sample_weight=effective_weight,
        rng=rng,
        batch=batch,
    )
    return ShiftDetails(
        statistic=float(result.statistic),
        pvalue=float(result.pvalue),
        statistic_name=statistic_name,
        null_distribution=np.asarray(result.null_distribution, dtype=np.float64),
    )
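
A sketch of a one-sided variant, reusing source and target from the first example; it keeps the default roc_auc statistic and asks whether the observed statistic is larger than expected under the permutation null:

res_greater = test_shift(
    source=source,
    target=target,
    statistic="roc_auc",
    alternative="greater",  # one-sided permutation p-value
    rng=np.random.default_rng(3),
)
print(res_greater.statistic_name, res_greater.pvalue)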