How to: Restrict testing to common support on both sides
Use this guide when both source and target contain low-overlap observations and source-only reweighting is not enough.
This is the most aggressive weighting mode. Use it only when you have a real reason to believe the problem sits on both sides of the comparison.
Step 1 - Start from the source-reweighting setup
This guide continues from "Focus harmful-shift testing on shared support". At that point you already have:
- source_prob and target_prob from the domain classifier
- train_risk and deployment_risk as the harmful-shift signal
Step 2 - Weight both groups
from samesame.weights import from_domain_probabilities

# Weight source and target observations toward their common support.
weights_both = from_domain_probabilities(
    source_prob=source_prob,
    target_prob=target_prob,
    mode="both",
    lambda_=0.5,
)

# Rerun the harmful-shift test with the two-sided weights.
double_weighted = ss.shift.detect_harm(
    source=train_risk,
    target=deployment_risk,
    direction="higher-is-worse",
    weights=weights_both,
    random_state=12345,
)
print(f"Double-weighted p-value: {double_weighted.pvalue:.4f}")
Step 3 - Compare the three views
# Source-only weights, for comparison with the two-sided weights from Step 2.
weights_source = from_domain_probabilities(
    source_prob=source_prob,
    target_prob=target_prob,
    mode="source",
    lambda_=0.5,
)

source_weighted = ss.shift.detect_harm(
    source=train_risk,
    target=deployment_risk,
    direction="higher-is-worse",
    weights=weights_source,
    random_state=12345,
)

# `unweighted` is assumed to be the unweighted result from the earlier guide.
print(f"Unweighted p-value: {unweighted.pvalue:.4f}")
print(f"Source-weighted p-value: {source_weighted.pvalue:.4f}")
print(f"Double-weighted p-value: {double_weighted.pvalue:.4f}")
Think of the three results this way:
- Unweighted looks at the full populations.
- Source-weighted focuses on overlap from the source side.
- Double-weighted focuses on common support from both sides.
If the signal shrinks only after double-weighting, target-side outliers were still influencing the result after source reweighting.
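The comparison logic above can be sketched as a small helper. This is a minimal illustration and not part of samesame: the function name `describe_views` and the `alpha=0.05` threshold are assumptions made for the example, and it reads "the signal shrinks" simply as a p-value crossing that threshold.

```python
def describe_views(unweighted_p, source_p, double_p, alpha=0.05):
    """Hypothetical helper: report at which weighting step the signal fades."""
    if unweighted_p >= alpha:
        return "no signal in the full populations"
    if source_p >= alpha:
        return "signal fades after source reweighting"
    if double_p >= alpha:
        return "signal fades only after double-weighting"
    return "signal persists on common support"


# Made-up p-values, for illustration only.
print(describe_views(0.001, 0.004, 0.31))
```

In practice you would pass `unweighted.pvalue`, `source_weighted.pvalue`, and `double_weighted.pvalue`. A hard threshold is a simplification; comparing the size of the change between views is usually more informative than a pass/fail label.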
Choosing lambda_
lambda_=0.5 is the safest starting point.
- Lower values make the correction stronger and the variance higher.
- Higher values move closer to uniform weights.
If you are unsure, start at 0.5, check how much the p-value changes for nearby values, and only move lower when you trust the domain probabilities and the overlap story.
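To make the trade-off concrete, here is a minimal sketch of how a smoothing parameter of this kind can affect weight variance. It assumes a relative density-ratio form, `t / (lam * t + (1 - lam) * s)`, which matches the behaviour described above (lower is stronger, higher is closer to uniform) but is not confirmed to be the formula samesame uses. The function names and the probabilities are hypothetical.

```python
def relative_ratio_weights(source_prob, target_prob, lam):
    """Hypothetical smoothed weights: t / (lam * t + (1 - lam) * s).

    lam=0 gives the raw ratio t / s; lam=1 gives uniform weights.
    This form is an assumption, not necessarily what samesame computes.
    """
    return [t / (lam * t + (1.0 - lam) * s)
            for s, t in zip(source_prob, target_prob)]


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Made-up domain-classifier probabilities for five source observations.
target_prob = [0.1, 0.3, 0.5, 0.7, 0.9]
source_prob = [1.0 - t for t in target_prob]

for lam in (0.1, 0.5, 0.9, 1.0):
    w = relative_ratio_weights(source_prob, target_prob, lam)
    ess = effective_sample_size(w)
    print(f"lambda_={lam:.1f}  effective sample size={ess:.2f}")
```

Under this assumed form, the effective sample size grows as the parameter increases. A small effective sample size means a few observations dominate the test, which is the variance cost of a stronger correction.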
For the intuition behind the formulas, see "When importance weights help".