regression.rank
Methods to score the quality of each label in a regression dataset. These can be used to rank the examples whose Y-value is most likely erroneous.
Note: Label quality scores are most accurate when they are computed based on out-of-sample predictions from your regression model.
To obtain out-of-sample predictions for every datapoint in your dataset, you can use cross-validation. This is encouraged to get better results.
If you have a sklearn-compatible regression model, consider using cleanlab.regression.learn.CleanLearning instead, which can more accurately identify noisy label values.
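The cross-validation advice above can be sketched as follows. This is a minimal illustration, not part of cleanlab: the helper `out_of_fold_predictions`, the least-squares model, and the synthetic data are all assumptions chosen to keep the example dependent on NumPy only. In practice, scikit-learn's `cross_val_predict` serves the same purpose for any sklearn-compatible regressor.

```python
import numpy as np


def out_of_fold_predictions(X, y, n_folds=5, seed=0):
    """Return one out-of-sample prediction per example via K-fold cross-validation.

    Hypothetical helper for illustration: fits ordinary least squares
    on each training split and predicts only the held-out fold.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Add an intercept column so the linear model is not forced through the origin.
    X1 = np.column_stack([np.ones(n), X])
    # Shuffle the indices once, then split them into roughly equal folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    predictions = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # Fit on the training folds only ...
        coef, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        # ... and predict the held-out fold, which the model never saw.
        predictions[held_out] = X1[held_out] @ coef
    return predictions


# Synthetic data, for illustration only: y = 3x + 1 plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
labels = 3 * X[:, 0] + 1 + rng.normal(scale=0.5, size=100)
labels[7] += 40  # corrupt one label on purpose

predictions = out_of_fold_predictions(X, labels)
# `predictions` has shape (N,) and can be passed as the `predictions` argument:
# from cleanlab.regression.rank import get_label_quality_scores
# scores = get_label_quality_scores(labels, predictions)
```

Because each prediction comes from a model that never saw that example's label, a corrupted label cannot pull its own prediction toward itself, so its residual stays large.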
Functions:
| get_label_quality_scores | Returns label quality score for each example in the regression dataset. |
- cleanlab.regression.rank.get_label_quality_scores(labels, predictions, *, method='outre')
- Returns label quality score for each example in the regression dataset.
  Each score is a continuous value in the range [0, 1]:
  - 1: clean label (given label is likely correct).
  - 0: dirty label (given label is likely incorrect).
 - Parameters:
- labels (array_like) – Raw labels from the original dataset. 1D array of shape (N,) containing the given labels for each example (aka Y-value, response/target/dependent variable), where N is the number of examples in the dataset.
- predictions (np.ndarray) – 1D array of shape (N,) containing the predicted label for each example in the dataset. These should be out-of-sample predictions from a trained regression model, which you can obtain for every example in your dataset via cross-validation.
- method ({"residual", "outre"}, default "outre") – String specifying which method to use for scoring the quality of each label and identifying which labels appear most noisy.
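This page does not spell out how either method computes its score. As a rough intuition for a residual-style score, the sketch below maps the absolute residual through `exp(-|prediction - label|)`, so a perfect prediction scores 1 and large residuals approach 0. The helper name `residual_quality_scores` and the exact formula are assumptions for illustration, not a confirmed description of cleanlab's implementation, and the "outre" method is not sketched here.

```python
import numpy as np


def residual_quality_scores(labels, predictions):
    """Hypothetical residual-style score in (0, 1]; higher means more trustworthy."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    # Absolute gap between the given label and the model's prediction.
    residual = np.abs(predictions - labels)
    # Zero residual maps to 1; the score decays toward 0 as the gap grows.
    return np.exp(-residual)


labels = np.array([1, 2, 3, 4])
predictions = np.array([2, 2, 5, 4.1])
scores = residual_quality_scores(labels, predictions)
```

A score of this form depends on the units of the target variable, so rescaling the labels changes the scores.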
 
- Return type:
- ndarray
- Returns:
- label_quality_scores – Array of shape (N,) of scores between 0 and 1, one per example in the dataset. Lower scores indicate examples more likely to contain a label issue.
 - Examples
>>> import numpy as np
>>> from cleanlab.regression.rank import get_label_quality_scores
>>> labels = np.array([1,2,3,4])
>>> predictions = np.array([2,2,5,4.1])
>>> label_quality_scores = get_label_quality_scores(labels, predictions)
>>> label_quality_scores
array([0.00323821, 0.33692597, 0.00191686, 0.33692597])
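Since lower scores indicate likelier label issues, the returned array can be turned into a ranking with a plain sort. The sketch below reuses the score values from the example output; the variable names `ranked_indices`, `k`, and `most_suspect` are illustrative, not part of the cleanlab API.

```python
import numpy as np

# Scores copied from the example output above.
label_quality_scores = np.array([0.00323821, 0.33692597, 0.00191686, 0.33692597])

# Sort ascending so the most suspicious labels come first.
# kind="stable" keeps tied scores in their original order.
ranked_indices = np.argsort(label_quality_scores, kind="stable")
print(ranked_indices)  # [2 0 1 3]

# Review only the k lowest-scoring examples.
k = 2
most_suspect = ranked_indices[:k]
```

Here examples 2 and 0 rank first, which matches the two largest gaps between label and prediction in the example data.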