metrics
Compute the weighted area under the ROC curve.
wauc(actual, predicted, *, sample_weight=None)
Compute the weighted area under the ROC curve (WAUC).
Calculates the WAUC by weighting the true positive rate (TPR) at each false positive rate (FPR) threshold, optionally using sample weights. The weight applied at each threshold is the squared empirical weighted cumulative distribution function (EW-CDF) of the predicted scores for the negative class.
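To make the squared EW-CDF concrete, here is a minimal plain-Python sketch. The helper name `ewcdf`, the toy scores and weights, and the right-continuous convention (counting scores `<= t`) are assumptions for illustration, not the library's implementation.

```python
def ewcdf(scores, weights, t):
    """Weighted empirical CDF: share of the total weight on scores <= t.

    Hypothetical helper; the <= (right-continuous) convention is an assumption.
    """
    total = sum(weights)
    return sum(w for s, w in zip(scores, weights) if s <= t) / total


# Toy negative-class scores with non-uniform sample weights (illustrative values).
neg_scores = [0.1, 0.35, 0.6]
neg_weights = [1.0, 2.0, 1.0]

# Weight on scores <= 0.35 is 1 + 2 = 3 out of a total of 4.
cdf_at_035 = ewcdf(neg_scores, neg_weights, 0.35)  # 0.75

# Squaring gives the weight that would multiply the TPR at this threshold.
squared_weight = cdf_at_035 ** 2  # 0.5625
```

Because the CDF is squared, thresholds low in the negative-score distribution (the high-FPR end of the ROC curve) contribute much less than thresholds near the top of it.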
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| actual | NDArray[int_] | Ground truth binary labels (0 or 1). | required |
| predicted | NDArray | Predicted scores or probabilities. | required |
| sample_weight | NDArray or None | Sample weights. If None, all samples are given equal weight. | None |
Returns:
| Type | Description |
|---|---|
| float | The weighted area under the ROC curve. |
Notes
The function uses roc_curve from scikit-learn to compute FPR, TPR,
and thresholds. The empirical weighted CDF is computed for the negative
class predictions using ECDFDiscrete. The WAUC is calculated using the
trapezoidal rule, weighting the TPR by the squared EW-CDF at each
threshold [1].
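The steps in these notes can be sketched in plain Python, without scikit-learn or ECDFDiscrete. This is one possible reading, not the library's source. The name `wauc_sketch`, the right-continuous CDF convention (scores `<= threshold`), and the choice to apply each threshold's weight to the trapezoid that ends at that threshold are all assumptions. The sketch also assumes both classes are present.

```python
def wauc_sketch(actual, predicted, sample_weight=None):
    """Illustrative weighted AUC; hypothetical, not samesame's implementation."""
    if sample_weight is None:
        sample_weight = [1.0] * len(actual)
    rows = list(zip(actual, predicted, sample_weight))

    pos_total = sum(w for y, _, w in rows if y == 1)
    neg_total = sum(w for y, _, w in rows if y == 0)

    def squared_ewcdf(t):
        # Squared weighted ECDF of negative-class scores at t (assumes <=).
        below = sum(w for y, s, w in rows if y == 0 and s <= t)
        return (below / neg_total) ** 2

    # ROC points: start at (0, 0), then one point per distinct score,
    # from the highest threshold to the lowest.
    fpr, tpr, weights = [0.0], [0.0], [1.0]
    for t in sorted(set(predicted), reverse=True):
        tp = sum(w for y, s, w in rows if y == 1 and s >= t)
        fp = sum(w for y, s, w in rows if y == 0 and s >= t)
        tpr.append(tp / pos_total)
        fpr.append(fp / neg_total)
        weights.append(squared_ewcdf(t))

    # Trapezoidal rule over FPR, scaling each trapezoid by the weight
    # at its right-hand threshold (an assumed convention).
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_tpr = 0.5 * (tpr[i] + tpr[i - 1])
        area += width * mean_tpr * weights[i]
    return area


result = wauc_sketch([0, 1, 0, 1], [0.1, 0.4, 0.35, 0.8])
```

Under these assumptions, `result` is 0.625 for the inputs used in the docstring example. The unweighted AUC for the same inputs is 1.0, so the gap comes entirely from down-weighting the high-FPR end of the curve.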
References
[1] Li, Jialiang, and Jason P. Fine. "Weighted Area under the Receiver Operating Characteristic Curve and Its Application to Gene Selection." Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 59, no. 4, 2010, pp. 673-692.
Examples:
>>> import numpy as np
>>> from samesame.metrics import wauc
>>> actual = np.array([0, 1, 0, 1])
>>> predicted = np.array([0.1, 0.4, 0.35, 0.8])
>>> wauc(actual, predicted)
np.float64(0.625)
Source code in src/samesame/metrics.py