fasttext

Text classification with FastText models that are compatible with cleanlab. This module allows you to easily find label issues in your text datasets.

You must first install fasttext: pip install fasttext

Classes:

FastTextClassifier(train_data_fn[, ...])

Functions:

data_loader([fn, indices, label, batch_size])

Returns a generator, yielding two lists containing [labels], [text].

class cleanlab.experimental.fasttext.FastTextClassifier(train_data_fn, test_data_fn=None, labels=None, tmp_dir='', label='__label__', del_intermediate_data=True, kwargs_train_supervised={}, p_at_k=1, batch_size=1000)

Bases: sklearn.base.BaseEstimator

Methods:

fit([X, y, sample_weight])

Trains the fastText classifier.

get_params([deep])

Get parameters for this estimator.

predict([X, train_data, return_labels])

Predict labels of X.

predict_proba([X, train_data, return_labels])

Produces a probability matrix with examples on rows and classes on columns, where each row sums to 1 and captures the probability of the example belonging to each class.

score([X, y, sample_weight, k])

Compute the average precision @ k (single label) of the labels predicted from X and the true labels given by y.

set_params(**params)

Set the parameters of this estimator.

fit(X=None, y=None, sample_weight=None)

Trains the fastText classifier. Typical usage requires no arguments: clf.fit()

Parameters
  • X (iterable, e.g. list or numpy array; default None) – Indices of the examples to use. When in doubt, leave as None, which defaults to range(len(data)).

  • y (None) – Leave this as None. It is a placeholder required by the scikit-learn estimator interface.

  • sample_weight (None) – Leave this as None. It is a placeholder required by the scikit-learn estimator interface.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

predict(X=None, train_data=True, return_labels=False)

Predict labels of X.

predict_proba(X=None, train_data=True, return_labels=False)

Produces a probability matrix with examples on rows and classes on columns, where each row sums to 1 and captures the probability of the example belonging to each class.
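As an illustration of the matrix layout described above, here is a minimal sketch in plain Python. It is not the library's implementation: `to_prob_matrix`, the `classes` list, and the sample predictions are hypothetical. It assumes each prediction arrives as a pair of (labels, probabilities) sorted by probability, and rearranges them into one row per example with a fixed column per class, renormalizing so each row sums to 1.

```python
from typing import List, Sequence, Tuple


def to_prob_matrix(
    predictions: Sequence[Tuple[Sequence[str], Sequence[float]]],
    classes: Sequence[str],
) -> List[List[float]]:
    """Hypothetical helper: build an (examples x classes) probability matrix."""
    # Fixed column position for every class name.
    column = {name: j for j, name in enumerate(classes)}
    matrix = []
    for labels, probs in predictions:
        row = [0.0] * len(classes)
        for name, p in zip(labels, probs):
            row[column[name]] = float(p)
        total = sum(row)
        if total == 0:
            raise ValueError("an example has no probability mass")
        # Renormalize so the row sums to 1.
        matrix.append([p / total for p in row])
    return matrix


# Made-up predictions for two examples and two classes.
classes = ["__label__neg", "__label__pos"]
predictions = [
    (["__label__pos", "__label__neg"], [0.75, 0.25]),
    (["__label__neg", "__label__pos"], [0.5, 0.5]),
]
pred_probs = to_prob_matrix(predictions, classes)
```

Here row i of `pred_probs` belongs to example i, and column j holds the probability of `classes[j]`.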

score(X=None, y=None, sample_weight=None, k=None)

Compute the average precision @ k (single label) of the labels predicted from X against the labels given by y. score expects a y argument; in this case, y holds the given (possibly noisy) labels.
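One way to read "precision @ k (single label)" is that an example counts as a hit when its given label is among the k classes with the highest predicted probability, and the score is the fraction of hits. The sketch below works under that assumption and is not taken from the library's source; `precision_at_k` and the sample data are hypothetical.

```python
from typing import Sequence


def precision_at_k(
    pred_probs: Sequence[Sequence[float]],
    y: Sequence[int],
    k: int = 1,
) -> float:
    """Hypothetical helper: fraction of examples whose label is in the top k."""
    hits = 0
    for row, given_class in zip(pred_probs, y):
        # Class indices ordered from most to least probable, truncated to k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += given_class in top_k
    return hits / len(y)


# Made-up probabilities for three examples and three classes.
pred_probs = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
y = [1, 1, 0]  # given (possibly noisy) class indices

p_at_1 = precision_at_k(pred_probs, y, k=1)
p_at_2 = precision_at_k(pred_probs, y, k=2)
```

With k=1 this reduces to plain accuracy; larger k is more forgiving because the given label only has to appear somewhere in the top k.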

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

cleanlab.experimental.fasttext.data_loader(fn=None, indices=None, label='__label__', batch_size=1000)

Returns a generator, yielding two lists containing [labels], [text]. Items are always returned in the order they appear in the file, regardless of whether indices are provided.
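The batching and ordering behaviour can be pictured with a small stand-in generator. This is a simplified sketch, not the library's code: `iter_label_text_batches` is a hypothetical name, and it assumes the usual fastText supervised format with one example per line, where the label token (prefixed with `__label__`) comes first and is followed by a space and the text.

```python
import os
import tempfile
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_label_text_batches(
    fn: str,
    indices: Optional[Iterable[int]] = None,
    label: str = "__label__",
    batch_size: int = 1000,
) -> Iterator[Tuple[List[str], List[str]]]:
    """Hypothetical stand-in: yield ([labels], [texts]) batches in file order."""
    wanted = None if indices is None else set(indices)
    labels: List[str] = []
    texts: List[str] = []
    with open(fn, encoding="utf-8") as f:
        for line_number, line in enumerate(f):
            if wanted is not None and line_number not in wanted:
                continue
            if not line.startswith(label):
                raise ValueError(f"line {line_number} does not start with {label!r}")
            # Split the label token from the text at the first space.
            first, _, rest = line.rstrip("\n").partition(" ")
            labels.append(first)
            texts.append(rest)
            if len(labels) == batch_size:
                yield labels, texts
                labels, texts = [], []
    if labels:  # final, possibly smaller, batch
        yield labels, texts


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "train.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("__label__pos great movie\n")
        f.write("__label__neg boring plot\n")
        f.write("__label__pos loved it\n")
    # Indices are given out of order; results still follow file order.
    batches = list(iter_label_text_batches(path, indices=[2, 0], batch_size=2))
```

Because the file is read sequentially and the indices only act as a filter, the order in which indices are passed has no effect on the order of the yielded items.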