samesame

Development Status Python Downloads Static Badge UAI 2022 uv Ruff

Same, same but different ...

samesame compares a reference group with a new group and tells you whether the new group looks different, and whether it moved in a worse direction.

In the package, the reference group is called source and the new group is called target. That could mean training vs production data, a baseline batch vs a fresh batch, or one segment vs another.

The package is built around two practical questions:

  • Did anything change?
  • Did the change point in a worse direction?

You answer those questions with the signal that matches your use case: predicted risk, model confidence, prediction error, or a classifier score used to compare two datasets.

Start here

Quick example

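import numpy as np
import samesame as ss

# Both groups are drawn from the same standard normal distribution,
# so this example contains no real shift between source and target.
rng = np.random.default_rng(123_456)
source_scores = rng.normal(size=600)
target_scores = rng.normal(size=600)

# Did anything change between source and target?
shift = ss.shift.detect_shift(source_scores, target_scores)

# Did the target move in the direction declared as worse?
harm = ss.shift.detect_harm(
    source_scores,
    target_scores,
    direction="higher-is-worse",
)

print(f"Shift p-value: {shift.pvalue:.4f}")
print(f"Harm  p-value: {harm.pvalue:.4f}")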

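A small p-value from detect_shift(...) is evidence that the two groups differ. A small p-value from detect_harm(...) is evidence that the target group also moved in the declared worse direction. A large p-value from either test means the data do not show that effect; it does not prove the groups are identical.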

Common signals

Choose the signal that matches the decision you need to make:

  • Predicted risk when higher values already mean higher business risk.
  • Prediction error when labels are available and you want to measure accuracy directly.
  • Confidence score when you want to monitor certainty rather than business impact.
  • Domain-classifier score when your goal is to detect distribution shift between datasets.

The package does not force one interpretation on you. It gives you a small set of tests you can reuse across these settings.
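
As a rough illustration of the prediction-error signal, the sketch below turns labels and predicted probabilities into one error value per row for each group. The helper name absolute_errors and the toy data are hypothetical and not part of samesame. The point is only that a signal is a plain sequence of numbers per group, here with higher values meaning worse.

```python
def absolute_errors(labels, predicted_probs):
    """Per-row absolute error between a 0/1 label and a predicted probability."""
    if len(labels) != len(predicted_probs):
        raise ValueError("labels and predicted_probs must have the same length")
    return [abs(y - p) for y, p in zip(labels, predicted_probs)]


# Toy data, made up for illustration only.
source_labels = [0, 1, 1, 0]
source_probs = [0.1, 0.8, 0.9, 0.2]
target_labels = [0, 1, 1, 0]
target_probs = [0.4, 0.5, 0.6, 0.7]

# One error per row; these two lists play the role of source and target signals.
source_errors = absolute_errors(source_labels, source_probs)
target_errors = absolute_errors(target_labels, target_probs)
```

Because larger errors are worse, signals built this way would pair naturally with direction="higher-is-worse" in the quick example.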

Why it works well in practice

samesame is statistically grounded, but the working model is simple:

  1. Build a numeric signal for source and target.
  2. Test for any change with ss.shift.detect_shift(...).
  3. Test for directional harm with ss.shift.detect_harm(...) when direction matters.

Both tests are permutation-based, which keeps the assumptions light. When source and target differ in feature support, ss.weights.from_domain_probabilities(...) lets you focus the test on the region where the two groups are genuinely comparable.
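
To make "permutation-based" concrete, here is a minimal sketch of a generic permutation test using only the standard library. It is an assumption-laden stand-in: the function permutation_pvalue and its difference-in-means statistic are illustrative and are not samesame's actual implementation or test statistic. The two-sided variant mirrors the "did anything change?" question, and the one-sided variant mirrors "did it move in the worse direction?" when higher values are worse.

```python
import random
from statistics import fmean


def permutation_pvalue(source, target, one_sided=False, n_permutations=2000, seed=0):
    """Permutation p-value for the difference in means (target minus source)."""
    rng = random.Random(seed)
    pooled = list(source) + list(target)
    n_source = len(source)

    def statistic(values):
        # The first n_source entries act as "source", the rest as "target".
        diff = fmean(values[n_source:]) - fmean(values[:n_source])
        return diff if one_sided else abs(diff)

    observed = statistic(pooled)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group membership at random
        if statistic(pooled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Synthetic data for illustration: the target mean is shifted upward by 1.
data_rng = random.Random(1)
source = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [data_rng.gauss(1.0, 1.0) for _ in range(200)]

p_any = permutation_pvalue(source, shifted)                    # any change
p_worse = permutation_pvalue(source, shifted, one_sided=True)  # higher is worse
```

The only assumption the procedure leans on is that group labels are exchangeable when nothing has changed, which is why permutation tests are often described as light on assumptions.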

Installation

python -m pip install samesame