
Text classification with FastText models that are compatible with cleanlab. This module allows you to easily find label issues in your text datasets.

You must first install the fasttext package (pip install fasttext).


data_loader([fn, indices, label, batch_size])

Returns a generator, yielding two lists containing [labels], [text].


FastTextClassifier(train_data_fn[, ...])

cleanlab.experimental.fasttext.data_loader(fn=None, indices=None, label='__label__', batch_size=1000)

Returns a generator, yielding two lists containing [labels], [text]. Items are always returned in the order in which they appear in the file, regardless of whether indices are provided.
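To make the batching behavior concrete, here is a minimal sketch of a generator with the same shape. It assumes each line of the file follows the layout `__label__<name> <text>` with a single label per line. The name `iter_label_text_batches` is hypothetical and this is not cleanlab's implementation; it only illustrates that filtering by indices never changes the file order.

```python
import os
import tempfile


def iter_label_text_batches(fn, indices=None, label="__label__", batch_size=1000):
    """Yield ([labels], [texts]) batches from a fastText-style file (hypothetical helper)."""
    wanted = None if indices is None else set(indices)
    labels, texts = [], []
    with open(fn, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if wanted is not None and i not in wanted:
                continue  # skip lines that were not requested
            line = line.rstrip("\n")
            if not line.startswith(label):
                continue  # assumption: lines without the label prefix are ignored
            first, _, text = line.partition(" ")
            labels.append(first[len(label):])
            texts.append(text)
            if len(labels) == batch_size:
                yield labels, texts
                labels, texts = [], []
    if labels:
        yield labels, texts  # final, possibly smaller, batch


# Tiny demonstration on a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("__label__pos great movie\n__label__neg dull plot\n__label__pos loved it\n")
    demo_path = tmp.name

# indices are given out of order on purpose; output still follows file order.
batches = list(iter_label_text_batches(demo_path, indices=[2, 0], batch_size=2))
os.remove(demo_path)
```

In this sketch the indices are turned into a set, so they act purely as a filter and the rows come back in file order.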

class cleanlab.experimental.fasttext.FastTextClassifier(train_data_fn, test_data_fn=None, labels=None, tmp_dir='', label='__label__', del_intermediate_data=True, kwargs_train_supervised={}, p_at_k=1, batch_size=1000)

Bases: BaseEstimator


fit([X, y, sample_weight])

Trains the fastText classifier.

predict_proba([X, train_data, return_labels])

Produces a probability matrix with examples on rows and classes on columns, where each row sums to 1 and captures the probability of the example belonging to each class.

predict([X, train_data, return_labels])

Predicts the labels of X.

score([X, y, sample_weight, k])

Compute the average precision @ k (single label) of the labels predicted from X and the true labels given by y.


get_params([deep])

Get parameters for this estimator.


set_params(**params)

Set the parameters of this estimator.

fit(X=None, y=None, sample_weight=None)

Trains the fastText classifier. Typical usage requires no arguments: simply call fit().

  • X (iterable, e.g. list or numpy array; default None) – The list of indices of the data to use. When in doubt, set as None. None defaults to range(len(data)).

  • y (None) – Leave this as None. It is a placeholder required by the scikit-learn estimator API.

  • sample_weight (None) – Leave this as None. It is a placeholder required by the scikit-learn estimator API.

predict_proba(X=None, train_data=True, return_labels=False)

Produces a probability matrix with examples on rows and classes on columns, where each row sums to 1 and captures the probability of the example belonging to each class.
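As an illustration of that layout, the sketch below builds such a matrix from per-example label scores. The helper name `to_prob_matrix`, the class names, and the sample values are all hypothetical; this is not cleanlab's code, only a picture of "examples on rows, classes on columns, rows summing to 1".

```python
def to_prob_matrix(per_example_scores, classes):
    """Return a list of rows; row i holds P(class | example i) in `classes` order."""
    matrix = []
    for scores in per_example_scores:
        # Place each score in its class column; missing classes get 0.
        row = [float(scores.get(c, 0.0)) for c in classes]
        total = sum(row)
        if total <= 0:
            raise ValueError("each example needs at least one positive score")
        matrix.append([v / total for v in row])  # renormalize so the row sums to 1
    return matrix


classes = ["neg", "pos"]  # column order (hypothetical class names)
scores = [{"pos": 0.75, "neg": 0.25}, {"neg": 3.0, "pos": 1.0}]
pred_probs = to_prob_matrix(scores, classes)
```

Fixing the column order once, up front, is what lets the column index be used directly as the class index.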

predict(X=None, train_data=True, return_labels=False)

Predicts the labels of X.

score(X=None, y=None, sample_weight=None, k=None)

Compute the average precision @ k (single label) of the labels predicted from X and the true labels given by y. score expects a y variable. In this case, y contains the noisy labels.
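One common single-label reading of precision @ k is that an example counts as correct when its given label is among the k classes with the highest predicted probability, and the score is the mean over all examples. The sketch below follows that reading as an assumption; `precision_at_k` is a hypothetical helper, not the method's actual implementation, and the sample numbers are made up.

```python
def precision_at_k(pred_probs, y, k=1):
    """Fraction of examples whose label index y[i] is among the top-k classes of row i."""
    if len(pred_probs) != len(y):
        raise ValueError("pred_probs and y must have the same length")
    hits = 0
    for row, label in zip(pred_probs, y):
        # Class indices sorted from most to least probable.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(y)


pred_probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1]]
noisy_labels = [0, 1, 0]  # given (possibly noisy) class indices
p_at_1 = precision_at_k(pred_probs, noisy_labels, k=1)
p_at_2 = precision_at_k(pred_probs, noisy_labels, k=2)
```

With k=1 this reduces to plain accuracy against the given labels.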


get_params(deep=True)

Get parameters for this estimator.

deep (bool, default True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns: params (dict) – Parameter names mapped to their values.


set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.


**params (dict) – Estimator parameters.


Returns: self (estimator instance) – Estimator instance.
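The `<component>__<parameter>` naming rule mentioned for nested objects can be illustrated without any estimator. The sketch below only shows how such keys split at the first double underscore; `split_nested_params` and the sample parameter names are hypothetical and not part of scikit-learn or cleanlab.

```python
def split_nested_params(params):
    """Separate plain parameters from nested ones, grouped by component name."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split at the first "__" only; the remainder is passed on to the component.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"batch_size": 500, "clf__C": 10, "clf__solver__tol": 1e-3}
)
```

Splitting only at the first delimiter leaves deeper names such as `solver__tol` intact, so each component can resolve its own nested parameters in turn.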