latent_algebra
Functions:

compute_inv_noise_matrix – Compute the inverse noise matrix if py := P(true_label=k) is given.

compute_noise_matrix_from_inverse – Compute the noise matrix P(label=k_s|true_label=k_y).

compute_ps_py_inv_noise_matrix – Compute ps := P(labels=k), py := P(true_labels=k), and the inverse noise matrix.

compute_py – Compute py := P(true_labels=k) from ps := P(labels=k), noise_matrix, and the inverse noise matrix.

compute_py_inv_noise_matrix – Compute py := P(true_label=k), and the inverse noise matrix.

compute_pyx – Compute pyx := P(true_label=k|x) from pred_probs := P(label=k|x), and the noise_matrix and inverse noise matrix.
cleanlab.internal.latent_algebra.compute_inv_noise_matrix(py, noise_matrix, *, ps=None)
Compute the inverse noise matrix if py := P(true_label=k) is given.
 Parameters
py (np.array of shape (K, 1)) – The fraction (prior probability) of each TRUE class label, P(true_label = k).
noise_matrix (np.array of shape (K, K), K = number of classes) – A conditional probability matrix of the form P(label=k_s|true_label=k_y) containing the fraction of examples in every class, labeled as every other class. Assumes columns of noise_matrix sum to 1.
ps (np.array of shape (K, 1)) – The fraction (prior probability) of each NOISY given label, P(labels = k). ps is easily computable from py and should only be provided if it has already been precomputed, to increase code efficiency.
Examples
For loop based implementation:
    # Number of classes
    K = len(py)

    # 'ps' is p(labels=k) = noise_matrix * p(true_labels=k)
    # because in *vector computation*: P(label=k|true_label=k) * p(true_label=k) = P(label=k)
    if ps is None:
        ps = noise_matrix.dot(py)

    # Estimate the (K, K) inverse noise matrix P(true_label = k_y | label = k_s)
    inverse_noise_matrix = np.empty(shape=(K, K))
    # k_s is the class value k of noisy label `label == k`
    for k_s in range(K):
        # k_y is the (guessed) class value k of true label y
        for k_y in range(K):
            # P(true_label|label) = P(label|true_label) * P(true_label) / P(label)
            inverse_noise_matrix[k_y][k_s] = noise_matrix[k_s][k_y] * py[k_y] / ps[k_s]
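The double loop above applies Bayes' rule one entry at a time, so the same result can be obtained with NumPy broadcasting. The following is a minimal sketch, not the library's implementation: the helper name `inv_noise_matrix_vectorized` and the example values are hypothetical, and it assumes every entry of ps is strictly positive.

```python
import numpy as np


def inv_noise_matrix_vectorized(py, noise_matrix, ps=None):
    """Hypothetical helper: Bayes' rule applied to the whole matrix at once."""
    py = np.asarray(py, dtype=float).ravel()
    noise_matrix = np.asarray(noise_matrix, dtype=float)
    if ps is None:
        # P(label=k_s) = sum over k_y of P(label=k_s|true_label=k_y) * P(true_label=k_y)
        ps = noise_matrix.dot(py)
    else:
        ps = np.asarray(ps, dtype=float).ravel()
    # joint[k_s, k_y] = P(label=k_s, true_label=k_y)
    joint = noise_matrix * py[np.newaxis, :]
    # Divide each row k_s by P(label=k_s), then transpose so rows index k_y.
    return (joint / ps[:, np.newaxis]).T


# Illustrative values only (columns of noise_matrix sum to 1).
noise_matrix = np.array([[0.9, 0.2],
                         [0.1, 0.8]])
py = np.array([0.6, 0.4])
inv = inv_noise_matrix_vectorized(py, noise_matrix)
```

Each column of the result sums to 1, which matches the column convention stated for inverse_noise_matrix elsewhere on this page.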
cleanlab.internal.latent_algebra.compute_noise_matrix_from_inverse(ps, inverse_noise_matrix, *, py=None)
Compute the noise matrix P(label=k_s|true_label=k_y).
 Parameters
py (np.array of shape (K, 1)) – The fraction (prior probability) of each TRUE class label, P(true_label = k). If None, it is computed as inverse_noise_matrix.dot(ps).
inverse_noise_matrix (np.array of shape (K, K), K = number of classes) – A conditional probability matrix of the form P(true_label=k_y|label=k_s) representing the estimated fraction of observed examples in each class k_s that are mislabeled examples from every other class k_y. If None, the inverse_noise_matrix will be computed from pred_probs and labels. Assumes columns of inverse_noise_matrix sum to 1.
ps (np.array of shape (K, 1)) – The fraction (prior probability) of each observed NOISY label, P(labels = k). ps is easily computable from py and should only be provided if it has already been precomputed, to increase code efficiency.
 Returns
noise_matrix – A conditional probability matrix of the form P(label=k_s|true_label=k_y) containing the fraction of examples in every class, labeled as every other class. Columns of noise_matrix sum to 1.
 Return type
np.array of shape (K, K), K = number of classes
Examples
For loop based implementation:
    # Number of classes
    K = len(ps)

    # 'py' is p(true_label=k) = inverse_noise_matrix * p(label=k)
    # because in *vector computation*: P(true_label=k|label=k) * p(label=k) = P(true_label=k)
    if py is None:
        py = inverse_noise_matrix.dot(ps)

    # Estimate the (K, K) noise matrix P(labels = k_s | true_labels = k_y)
    noise_matrix = np.empty(shape=(K, K))
    # k_s is the class value k of noisy label `labels == k`
    for k_s in range(K):
        # k_y is the (guessed) class value k of true label y
        for k_y in range(K):
            # P(label|true_label) = P(true_label|label) * P(label) / P(true_label)
            noise_matrix[k_s][k_y] = inverse_noise_matrix[k_y][k_s] * ps[k_s] / py[k_y]
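The reverse direction has the same structure, so it can also be written with broadcasting. This is a sketch under stated assumptions rather than the library's code: `noise_matrix_from_inverse_vectorized` is a hypothetical name, the joint distribution is made up for illustration, and every entry of py is assumed to be strictly positive.

```python
import numpy as np


def noise_matrix_from_inverse_vectorized(ps, inverse_noise_matrix, py=None):
    """Hypothetical helper: recover P(label|true_label) from P(true_label|label)."""
    ps = np.asarray(ps, dtype=float).ravel()
    inverse_noise_matrix = np.asarray(inverse_noise_matrix, dtype=float)
    if py is None:
        # P(true_label=k_y) = sum over k_s of P(true_label=k_y|label=k_s) * P(label=k_s)
        py = inverse_noise_matrix.dot(ps)
    else:
        py = np.asarray(py, dtype=float).ravel()
    # joint[k_y, k_s] = P(true_label=k_y, label=k_s)
    joint = inverse_noise_matrix * ps[np.newaxis, :]
    # Divide each row k_y by P(true_label=k_y), then transpose so rows index k_s.
    return (joint / py[:, np.newaxis]).T


# Illustrative joint distribution: rows index the noisy label k_s, columns the true label k_y.
joint_sy = np.array([[0.54, 0.08],
                     [0.06, 0.32]])
ps = joint_sy.sum(axis=1)                          # P(label=k_s)
inverse_noise_matrix = (joint_sy / ps[:, np.newaxis]).T
recovered = noise_matrix_from_inverse_vectorized(ps, inverse_noise_matrix)
```

With these values the recovered matrix has columns that sum to 1, consistent with the convention stated under Returns.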
cleanlab.internal.latent_algebra.compute_ps_py_inv_noise_matrix(labels, noise_matrix)
Compute ps := P(labels=k), py := P(true_labels=k), and the inverse noise matrix.
 Parameters
labels (np.array) – A discrete vector of noisy labels, i.e. some labels may be erroneous. Format requirements: for a dataset with K classes, labels must be in {0, 1, …, K-1}.
noise_matrix (np.array of shape (K, K), K = number of classes) – A conditional probability matrix of the form P(label=k_s|true_label=k_y) containing the fraction of examples in every class, labeled as every other class. Assumes columns of noise_matrix sum to 1.
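Since ps := P(labels=k) is the empirical distribution of the noisy labels, it can be estimated by counting. The snippet below is a minimal sketch with made-up data; it assumes labels are integers in {0, 1, …, K-1} and takes K from the shape of noise_matrix. It is not necessarily how the library computes ps internally.

```python
import numpy as np

# Illustrative inputs only (columns of noise_matrix sum to 1).
noise_matrix = np.array([[0.8, 0.1, 0.0],
                         [0.1, 0.8, 0.1],
                         [0.1, 0.1, 0.9]])
labels = np.array([0, 0, 1, 2, 2, 2, 1, 0])

K = noise_matrix.shape[0]
# minlength=K keeps a zero entry for any class that never appears in labels.
counts = np.bincount(labels, minlength=K)
ps = counts / counts.sum()
```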
cleanlab.internal.latent_algebra.compute_py(ps, noise_matrix, inverse_noise_matrix, *, py_method='cnt', true_labels_class_counts=None)
Compute py := P(true_labels=k) from ps := P(labels=k), noise_matrix, and the inverse noise matrix.
This method is robust when py_method = 'cnt': it may work well even when the noise matrices are estimated poorly, because it uses the diagonals of the matrices instead of all the probabilities in the entire matrix.
 Parameters
ps (np.array of shape (K,) or (1, K)) – The fraction (prior probability) of each observed, noisy label, P(labels = k).
noise_matrix (np.array of shape (K, K), K = number of classes) – A conditional probability matrix of the form P(label=k_s|true_label=k_y) containing the fraction of examples in every class, labeled as every other class. Assumes columns of noise_matrix sum to 1.
inverse_noise_matrix (np.array of shape (K, K), K = number of classes) – A conditional probability matrix of the form P(true_label=k_y|label=k_s) representing the estimated fraction of observed examples in each class k_s that are mislabeled examples from every other class k_y. If None, the inverse_noise_matrix will be computed from pred_probs and labels. Assumes columns of inverse_noise_matrix sum to 1.
py_method (str, options: ["cnt", "eqn", "marginal", "marginal_ps"]) – How to compute the latent prior p(true_label=k). Default is "cnt", as it often works well even when the noise matrices are estimated poorly by using the matrix diagonals instead of all the probabilities.
true_labels_class_counts (np.array of shape (K,) or (1, K)) – The marginal counts of the confident joint (like cj.sum(axis=0)).
 Returns
py – The fraction (prior probability) of each TRUE class label, P(true_label = k).
 Return type
np.array of shape (K,) or (1, K)
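To illustrate why the diagonals alone can be enough, note that Bayes' rule on the diagonal gives P(true_label=k|label=k) = P(label=k|true_label=k) * P(true_label=k) / P(label=k), which can be solved for P(true_label=k). The sketch below implements only that identity. Whether it matches what py_method="cnt" does internally is an assumption, and `py_from_diagonals` is a hypothetical name; it also assumes the diagonal of noise_matrix has no zeros.

```python
import numpy as np


def py_from_diagonals(ps, noise_matrix, inverse_noise_matrix):
    """Hypothetical helper: solve the diagonal Bayes identity for P(true_label=k)."""
    ps = np.asarray(ps, dtype=float).ravel()
    # py[k] = P(true_label=k|label=k) * P(label=k) / P(label=k|true_label=k)
    return np.diagonal(inverse_noise_matrix) * ps / np.diagonal(noise_matrix)


# Illustrative, mutually consistent values.
noise_matrix = np.array([[0.9, 0.2],
                         [0.1, 0.8]])
true_py = np.array([0.6, 0.4])
ps = noise_matrix.dot(true_py)
inverse_noise_matrix = (noise_matrix * true_py[np.newaxis, :] / ps[:, np.newaxis]).T

py_est = py_from_diagonals(ps, noise_matrix, inverse_noise_matrix)
```

Only K diagonal entries from each matrix enter the computation, so errors in the off-diagonal estimates do not affect the result.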
cleanlab.internal.latent_algebra.compute_py_inv_noise_matrix(ps, noise_matrix)
Compute py := P(true_label=k), and the inverse noise matrix.
 Parameters
ps (np.array of shape (K,) or (1, K)) – The fraction (prior probability) of each observed, NOISY class, P(labels = k).
noise_matrix (np.array of shape (K, K), K = number of classes) – A conditional probability matrix of the form P(label=k_s|true_label=k_y) containing the fraction of examples in every class, labeled as every other class. Assumes columns of noise_matrix sum to 1.
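Because ps = noise_matrix · py, one way to recover py from ps is to solve that linear system and then apply Bayes' rule to obtain the inverse noise matrix. The following is a sketch of that idea, not a description of the library's internals: `py_and_inverse_sketch` is a hypothetical name, noise_matrix is assumed to be invertible, and with poorly estimated inputs the solved py may need clipping and renormalizing, which is omitted here.

```python
import numpy as np


def py_and_inverse_sketch(ps, noise_matrix):
    """Hypothetical helper: solve noise_matrix @ py = ps, then apply Bayes' rule."""
    ps = np.asarray(ps, dtype=float).ravel()
    noise_matrix = np.asarray(noise_matrix, dtype=float)
    py = np.linalg.solve(noise_matrix, ps)
    # inverse[k_y, k_s] = P(label=k_s|true_label=k_y) * P(true_label=k_y) / P(label=k_s)
    inverse = (noise_matrix * py[np.newaxis, :] / ps[:, np.newaxis]).T
    return py, inverse


# Illustrative values only.
noise_matrix = np.array([[0.9, 0.2],
                         [0.1, 0.8]])
ps = np.array([0.62, 0.38])
py, inverse_noise_matrix = py_and_inverse_sketch(ps, noise_matrix)
```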
cleanlab.internal.latent_algebra.compute_pyx(pred_probs, noise_matrix, inverse_noise_matrix)
Compute pyx := P(true_label=k|x) from pred_probs := P(label=k|x), and the noise_matrix and inverse noise matrix.
This method is robust, meaning it works well even when the noise matrices are estimated poorly, because it uses only the diagonals of the matrices, which tend to be easy to estimate correctly.
 Parameters
pred_probs (np.array of shape (N, K)) – P(label=k|x) is a matrix with K model-predicted probabilities. Each row of this matrix corresponds to an example x and contains the model-predicted probabilities that x belongs to each possible class. The columns must be ordered such that these probabilities correspond to class 0, 1, 2, … pred_probs should have been computed using 3 (or higher) fold cross-validation.
noise_matrix (np.array of shape (K, K), K = number of classes) – A conditional probability matrix of the form P(label=k_s|true_label=k_y) containing the fraction of examples in every class, labeled as every other class. Assumes columns of noise_matrix sum to 1.
inverse_noise_matrix (np.array of shape (K, K), K = number of classes) – A conditional probability matrix of the form P(true_label=k_y|label=k_s) representing the estimated fraction of observed examples in each class k_s that are mislabeled examples from every other class k_y. If None, the inverse_noise_matrix will be computed from pred_probs and labels. Assumes columns of inverse_noise_matrix sum to 1.
 Returns
pyx – P(true_label=k|x) is a matrix with K model-predicted probabilities. Each row of this matrix corresponds to an example x and contains the model-predicted probabilities that x belongs to each possible class. The columns must be ordered such that these probabilities correspond to class 0, 1, 2, … pred_probs should have been computed using 3 (or higher) fold cross-validation.
 Return type
np.array of shape (N, K)
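A diagonal-only conversion from P(label=k|x) to P(true_label=k|x) could look like the sketch below: each column k of pred_probs is rescaled by the ratio of the two diagonals, and the rows are then clipped and renormalized so they remain valid probability vectors. This is an assumption about the general approach based on the description above, not the library's exact code; `pyx_sketch` is a hypothetical name, and the sketch assumes no zero diagonal entries in noise_matrix and no all-zero rows after clipping.

```python
import numpy as np


def pyx_sketch(pred_probs, noise_matrix, inverse_noise_matrix):
    """Hypothetical helper: rescale noisy-label probabilities using only the diagonals."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    # Per-class factor P(true_label=k|label=k) / P(label=k|true_label=k)
    scale = np.diagonal(inverse_noise_matrix) / np.diagonal(noise_matrix)
    pyx = np.clip(pred_probs * scale, 0.0, 1.0)
    # Renormalize each row so it sums to 1.
    return pyx / pyx.sum(axis=1, keepdims=True)


# Illustrative values only.
noise_matrix = np.array([[0.9, 0.2],
                         [0.1, 0.8]])
py = np.array([0.6, 0.4])
ps = noise_matrix.dot(py)
inverse_noise_matrix = (noise_matrix * py[np.newaxis, :] / ps[:, np.newaxis]).T
pred_probs = np.array([[0.7, 0.3],
                       [0.2, 0.8],
                       [0.5, 0.5]])
pyx = pyx_sketch(pred_probs, noise_matrix, inverse_noise_matrix)
```

When both matrices are the identity (no label noise), the scale factors are all 1 and the output equals pred_probs.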