regression.rank
Methods to score the quality of each label in a regression dataset. These can be used to rank the examples whose Y-value is most likely erroneous.
Note: Label quality scores are most accurate when they are computed from out-of-sample predictions
from your regression model.
To obtain out-of-sample predictions for every datapoint in your dataset, you can use cross-validation, which is recommended because it gives more reliable scores.
If you have a sklearn-compatible regression model, consider using cleanlab.regression.learn.CleanLearning
instead, which can more accurately identify noisy label values.
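As a minimal sketch of the cross-validation step described in the note, the snippet below uses scikit-learn's `cross_val_predict` so that each prediction comes from a model that never saw that example during training. The synthetic data, the choice of `LinearRegression`, and the 5-fold split are assumptions made for illustration only; substitute your own features, labels, and model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
labels = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Each example is predicted by a model fit on the folds that exclude it,
# so every entry of `predictions` is out-of-sample.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
predictions = cross_val_predict(LinearRegression(), X, labels, cv=cv)

print(predictions.shape)
```

The resulting `predictions` array has one entry per example and can be passed, together with `labels`, to `get_label_quality_scores`.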
Functions:
- get_label_quality_scores: Returns label quality score for each example in the regression dataset.
- cleanlab.regression.rank.get_label_quality_scores(labels, predictions, *, method='outre')
Returns label quality score for each example in the regression dataset.
Each score is a continuous value in the range [0,1]:
1 - clean label (given label is likely correct).
0 - dirty label (given label is likely incorrect).
- Parameters:
labels (array_like) – Raw labels from original dataset. 1D array of shape (N,) containing the given labels for each example (aka. Y-value, response/target/dependent variable), where N is number of examples in the dataset.
predictions (np.ndarray) – 1D array of shape (N,) containing the predicted label for each example in the dataset. These should be out-of-sample predictions from a trained regression model, which you can obtain for every example in your dataset via cross-validation.
method ({"residual", "outre"}, default "outre") – String specifying which method to use for scoring the quality of each label and identifying which labels appear most noisy.
- Return type:
ndarray
- Returns:
label_quality_scores – Array of shape (N,) of scores between 0 and 1, one per example in the dataset. Lower scores indicate examples more likely to contain a label issue.
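To make the 0-to-1 scale concrete, here is a minimal sketch of a residual-style score. The helper name `residual_style_scores` and the mapping exp(-|prediction - label|) are assumptions chosen for illustration. This page does not state how the "residual" or "outre" methods compute their scores, so this sketch should not be expected to reproduce the library's values.

```python
import numpy as np

def residual_style_scores(labels, predictions):
    """Hypothetical helper (not part of cleanlab): map absolute residuals to (0, 1].

    A residual of 0 gives a score of 1; larger residuals push the score toward 0.
    """
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    if labels.shape != predictions.shape:
        raise ValueError("labels and predictions must have the same shape")
    return np.exp(-np.abs(predictions - labels))

labels = np.array([1, 2, 3, 4])
predictions = np.array([2, 2, 5, 4.1])
scores = residual_style_scores(labels, predictions)
print(scores)
```

In this sketch the third example, whose prediction is furthest from its label, receives the lowest score, and the second example, predicted exactly, receives a score of 1.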
Examples
>>> import numpy as np
>>> from cleanlab.regression.rank import get_label_quality_scores
>>> labels = np.array([1,2,3,4])
>>> predictions = np.array([2,2,5,4.1])
>>> label_quality_scores = get_label_quality_scores(labels, predictions)
>>> label_quality_scores
array([0.00323821, 0.33692597, 0.00191686, 0.33692597])
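Because lower scores indicate more likely label issues, the scores can be turned into a ranking with plain NumPy. The sketch below reuses the score values from the example output above. The `threshold` of 0.01 is a hypothetical cutoff chosen only for illustration, not a value recommended by the library.

```python
import numpy as np

# Scores copied from the example output above.
label_quality_scores = np.array([0.00323821, 0.33692597, 0.00191686, 0.33692597])

# Indices ordered from lowest score (most suspicious) to highest.
# A stable sort keeps tied scores in their original order.
ranked_indices = np.argsort(label_quality_scores, kind="stable")
print(ranked_indices)  # [2 0 1 3]

# Hypothetical cutoff: flag examples whose score falls below it.
threshold = 0.01
flagged = np.flatnonzero(label_quality_scores < threshold)
print(flagged)  # [0 2]
```

Reviewing examples in `ranked_indices` order puts the labels most likely to be erroneous first.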