model_outputs

This module contains the ModelOutput class, which Datalab uses internally to represent model outputs (e.g. predictions, probabilities, etc.) and process them for issue finding. This class and its associated naming conventions are subject to change and are not meant to be used by users.

Classes:

ModelOutput(data)

An abstract class for representing model outputs (e.g. predictions, probabilities, etc.) for internal use within Datalab.

MultiClassPredProbs(data)

A class for representing a model's predicted probabilities for each class in a multi-class classification problem.

RegressionPredictions(data)

A class for representing a model's predictions for a regression problem.

MultiLabelPredProbs(data)

A class for representing a model's predicted probabilities for each class in a multilabel classification problem.

class cleanlab.datalab.internal.model_outputs.ModelOutput(data)

Bases: ABC

An abstract class for representing model outputs (e.g. predictions, probabilities, etc.) for internal use within Datalab. This class is not meant to be used by users.

It is used internally within the issue-finding process Datalab runs to assign types to the data and process it accordingly.

Parameters:

data (array-like) – The model outputs. Not to be confused with the data used to train the model. This is mainly intended for NumPy arrays.

Attributes:

Methods:

validate()

Validate the data format and content.

collect()

Fetch the data for issue finding.

data: ndarray

abstract validate()

Validate the data format and content. E.g. a pred_probs object used for classification should be a 2D array with values between 0 and 1, where each row sums to 1.

abstract collect()

Fetch the data for issue finding. Usually this is just the data itself, but sometimes it may be a transformation of the data (e.g. a 1D array of predictions from a 2D array of predicted probabilities).
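The validate/collect contract described above can be illustrated with a minimal sketch. This is not cleanlab's implementation: the names SketchModelOutput and SketchPassThrough are hypothetical, and coercing the input with np.asarray is an assumption made for the example.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchModelOutput(ABC):
    """Hypothetical stand-in for an abstract model-output wrapper (not cleanlab's code)."""

    def __init__(self, data):
        # Assumption: array-like input is coerced to an ndarray for this sketch.
        self.data = np.asarray(data)

    @abstractmethod
    def validate(self):
        """Raise an error if the data has the wrong format or content."""

    @abstractmethod
    def collect(self):
        """Return the data in the form used for issue finding."""


class SketchPassThrough(SketchModelOutput):
    """Hypothetical concrete subclass that returns its data unchanged."""

    def validate(self):
        if self.data.size == 0:
            raise ValueError("model outputs must not be empty")

    def collect(self):
        return self.data


output = SketchPassThrough([0.2, 0.8])
output.validate()
collected = output.collect()
```

Because both methods are abstract, instantiating SketchModelOutput directly raises a TypeError; only subclasses that implement validate() and collect() can be created.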

class cleanlab.datalab.internal.model_outputs.MultiClassPredProbs(data)

Bases: ModelOutput

A class for representing a model’s predicted probabilities for each class in a multi-class classification problem. This class is not meant to be used by users.

Attributes:

Methods:

validate()

Validate the data format and content.

collect()

Fetch the data for issue finding.

argument = 'pred_probs'

validate()

Validate the data format and content. E.g. a pred_probs object used for classification should be a 2D array with values between 0 and 1, where each row sums to 1.

collect()

Fetch the data for issue finding. Usually this is just the data itself, but sometimes it may be a transformation of the data (e.g. a 1D array of predictions from a 2D array of predicted probabilities).

data: ndarray
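As a rough illustration of the checks named in the validate() description (2D array, values between 0 and 1, rows summing to 1), here is a sketch using plain NumPy. The function name validate_multiclass_pred_probs and the atol tolerance are assumptions for this example and are not part of cleanlab's API. The final line shows the kind of transformation the collect() description mentions, turning 2D probabilities into 1D predictions.

```python
import numpy as np


def validate_multiclass_pred_probs(pred_probs, atol=1e-6):
    """Hypothetical check of multi-class predicted probabilities (not cleanlab's code)."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    if pred_probs.ndim != 2:
        raise ValueError(f"pred_probs must be a 2D array, got {pred_probs.ndim}D")
    if pred_probs.min() < 0 or pred_probs.max() > 1:
        raise ValueError("pred_probs values must lie between 0 and 1")
    # The tolerance is an assumption; floating-point rows rarely sum to exactly 1.
    if not np.allclose(pred_probs.sum(axis=1), 1.0, atol=atol):
        raise ValueError("each row of pred_probs must sum to 1")
    return pred_probs


pred_probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
checked = validate_multiclass_pred_probs(pred_probs)

# A 1D array of predicted class indices derived from the 2D probabilities.
predicted_labels = checked.argmax(axis=1)
```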
class cleanlab.datalab.internal.model_outputs.RegressionPredictions(data)

Bases: ModelOutput

A class for representing a model’s predictions for a regression problem. This class is not meant to be used by users.

Attributes:

Methods:

validate()

Validate the data format and content.

collect()

Fetch the data for issue finding.

argument = 'predictions'

validate()

Validate the data format and content. E.g. a pred_probs object used for classification should be a 2D array with values between 0 and 1, where each row sums to 1.

collect()

Fetch the data for issue finding. Usually this is just the data itself, but sometimes it may be a transformation of the data (e.g. a 1D array of predictions from a 2D array of predicted probabilities).

data: ndarray
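The validate() description above gives a classification example; for regression, a plausible analogue is checking that the predictions form a 1D array with one value per example. The sketch below assumes exactly that and nothing more. The function name validate_regression_predictions is hypothetical, and the 1D requirement is an assumption rather than something this page states.

```python
import numpy as np


def validate_regression_predictions(predictions):
    """Hypothetical check of regression predictions (not cleanlab's code)."""
    predictions = np.asarray(predictions, dtype=float)
    # Assumption: one predicted value per example, so the array is 1D.
    if predictions.ndim != 1:
        raise ValueError(
            f"predictions must be a 1D array, got {predictions.ndim}D"
        )
    return predictions


predictions = validate_regression_predictions([2.5, 0.1, 7.3])
```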
class cleanlab.datalab.internal.model_outputs.MultiLabelPredProbs(data)

Bases: ModelOutput

A class for representing a model’s predicted probabilities for each class in a multilabel classification problem. This class is not meant to be used by users.

Attributes:

Methods:

validate()

Validate the data format and content.

collect()

Fetch the data for issue finding.

argument = 'pred_probs'

validate()

Validate the data format and content. E.g. a pred_probs object used for classification should be a 2D array with values between 0 and 1, where each row sums to 1.

collect()

Fetch the data for issue finding. Usually this is just the data itself, but sometimes it may be a transformation of the data (e.g. a 1D array of predictions from a 2D array of predicted probabilities).

data: ndarray
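In a multilabel problem the classes are not mutually exclusive, so a row of predicted probabilities generally does not sum to 1. The sketch below therefore checks only the shape and the value range. The function name validate_multilabel_pred_probs is hypothetical, and the choice of checks is an assumption for illustration, not a description of cleanlab's code.

```python
import numpy as np


def validate_multilabel_pred_probs(pred_probs):
    """Hypothetical check of multilabel predicted probabilities (not cleanlab's code)."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    if pred_probs.ndim != 2:
        raise ValueError(f"pred_probs must be a 2D array, got {pred_probs.ndim}D")
    if pred_probs.min() < 0 or pred_probs.max() > 1:
        raise ValueError("pred_probs values must lie between 0 and 1")
    # No row-sum check: each class probability is independent of the others.
    return pred_probs


pred_probs = np.array([[0.9, 0.8, 0.1], [0.2, 0.1, 0.05]])
checked = validate_multilabel_pred_probs(pred_probs)
row_sums = checked.sum(axis=1)
```

Here the first example is likely to carry two labels at once and the second is likely to carry none, which is why neither row sums to 1.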