# samesame
Same, same but different ...
samesame helps you answer a question every data scientist faces after deploying a model:
"Has my data changed in a way that could hurt my model?"
It provides two complementary statistical tests:
- CTST — detects whether two datasets come from different distributions ("something changed")
- DSOS — detects whether that change is actually harmful ("things got worse")
This distinction matters. Not every distributional difference is a problem.
samesame helps you tell the two apart so you can avoid unnecessary alerts and focus on real issues.
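To make the classifier two-sample test (CTST) idea concrete, here is a minimal sketch of the underlying principle using only scikit-learn — an illustration, not samesame's implementation (all names here are ours): label each point by which sample it came from, and ask a classifier to tell the samples apart. If it cannot do better than chance (AUC near 0.5), the distributions look the same.

```python
# Sketch of the classifier two-sample test (CTST) principle,
# using plain scikit-learn -- not samesame's implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, size=(500, 2))  # sample A
b = rng.normal(loc=1.0, size=(500, 2))  # sample B, shifted mean

# Label points by origin and train a classifier to separate them.
X = np.vstack([a, b])
y = np.r_[np.zeros(len(a)), np.ones(len(b))]
auc = cross_val_score(
    LogisticRegression(), X, y, cv=5, scoring="roc_auc"
).mean()
print(f"cross-validated AUC: {auc:.2f}")  # well above 0.5 -> distributions differ
```

AUC near 0.5 means "same distribution"; the further above 0.5, the stronger the evidence of a shift.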
## Who is this for?
samesame is useful whenever you need to compare two datasets statistically, for example:
- Model monitoring — Is my production data starting to look different from my training data?
- Data validation — Does this new data batch match the distribution I expect?
- Drift detection — Has the input distribution shifted between last month and this month?
- A/B testing — Are the two groups I'm comparing actually comparable?
## Installation

```bash
python -m pip install samesame
```
## Quick Start
The example below shows why having two tests matters.
Imagine you have outlier scores from a training set and a test set — higher score means more unusual. You want to know: (a) are the distributions different? and (b) is the test set actually worse?
```python
import numpy as np
from sklearn.metrics import roc_auc_score

from samesame.ctst import CTST
from samesame.nit import DSOS

rng = np.random.default_rng(123_456)
os_train = rng.normal(size=600)  # outlier scores from training
os_test = rng.normal(size=600)   # outlier scores from deployment

# Question 1: Are the distributions different?
ctst = CTST.from_samples(os_train, os_test, metric=roc_auc_score)
print(f"CTST p-value: {ctst.pvalue:.4f}")
# CTST p-value: 0.0358 → distributions differ (small p-value)

# Question 2: Is the test set actually worse (more outliers)?
dsos = DSOS.from_samples(os_train, os_test)
print(f"DSOS p-value: {dsos.pvalue:.4f}")
# DSOS p-value: 0.9500 → no adverse shift detected (large p-value)
```
What this means: CTST flags a difference (the distributions are not identical), but DSOS says the test set is not disproportionately worse. This is a common real-world situation — minor statistical differences that do not signal a real problem. Without DSOS, you might raise a false alarm.
## Modules

| Module | What it does |
|---|---|
| `samesame.ctst` | Classifier two-sample tests — did the distribution change? |
| `samesame.nit` | Noninferiority tests — is the change actually harmful? |
| `samesame.bayes` | Bayesian inference — convert p-values to Bayes factors |
| `samesame.ood` | Out-of-distribution scoring — flag unusual inputs |
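The Quick Start assumed you already had outlier scores. One common way to produce them is with an anomaly detector — sketched here with scikit-learn's `IsolationForest`, not `samesame.ood` itself (whose exact API may differ):

```python
# Producing outlier scores like those used in the Quick Start,
# via scikit-learn's IsolationForest (not samesame.ood's own API).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
train = rng.normal(size=(500, 3))
new_batch = rng.normal(loc=0.5, size=(200, 3))  # slightly shifted batch

forest = IsolationForest(random_state=0).fit(train)
# score_samples returns higher = more normal; negate so higher = more unusual
os_train = -forest.score_samples(train)
os_new = -forest.score_samples(new_batch)
print(f"mean score, train: {os_train.mean():.3f}  new: {os_new.mean():.3f}")
```

Scores like `os_train` and `os_new` can then be fed to CTST or DSOS to ask whether the new batch differs, and whether it is worse.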
## Test result attributes
Every test result object exposes these attributes (where applicable):
| Attribute | Description |
|---|---|
| `.statistic` | The observed test statistic |
| `.null` | The null distribution (from permutations) |
| `.pvalue` | The p-value |
| `.posterior` | The Bayesian posterior distribution (DSOS only) |
| `.bayes_factor` | The Bayes factor (DSOS only) |
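To see how `.statistic`, `.null`, and `.pvalue` fit together, here is a generic permutation test written from scratch, using the difference in means as the statistic (samesame's actual statistics and API differ; this only illustrates the recipe):

```python
# Generic permutation test: observed statistic, permutation null,
# and p-value -- the same three pieces a samesame result exposes.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = rng.normal(loc=0.3, size=200)

statistic = y.mean() - x.mean()  # observed test statistic
pooled = np.concatenate([x, y])
null = np.empty(2000)            # null distribution via permutations
for i in range(null.size):
    perm = rng.permutation(pooled)
    null[i] = perm[len(x):].mean() - perm[:len(x)].mean()
# p-value: how often a permuted statistic is at least as extreme
pvalue = (1 + np.sum(null >= statistic)) / (1 + null.size)
print(f"statistic={statistic:.3f}, pvalue={pvalue:.4f}")
```

Shuffling the pooled data breaks any real difference between the samples, so the permuted statistics show what "no difference" looks like; the p-value is just the observed statistic's rank within that null.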
## Examples
Step-by-step worked examples are available in the documentation:
- Detecting distribution shifts
- Noninferiority testing
- Credit risk: shift and degradation
- Credit OOD detection
## Dependencies
samesame has minimal dependencies. It is built on top of, and fully compatible with,
scikit-learn and numpy.