fasttext

Text classification with FastText models that are compatible with cleanlab. This module allows you to easily find label issues in your text datasets.

You must first install the fasttext package: pip install fasttext

Classes:

 FastTextClassifier(train_data_fn[, ...])

Functions:

 data_loader([fn, indices, label, batch_size]) Returns a generator, yielding two lists containing [labels], [text].
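The generator contract described for data_loader can be pictured with a small sketch. This is not cleanlab's implementation: iter_label_text_batches is a hypothetical helper, and it assumes the fastText supervised file format, where each line starts with a label carrying the __label__ prefix, followed by a space and the text, with one label per line. Whether the real function strips the prefix is not stated here; the sketch strips it.

```python
import os
import tempfile


def iter_label_text_batches(fn, label="__label__", batch_size=1000):
    """Yield (labels, texts) list pairs from a fastText-style file.

    Hypothetical helper for illustration; assumes one label per line.
    """
    labels, texts = [], []
    with open(fn, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split "<label> <text>" on the first space only.
            tag, _, text = line.partition(" ")
            labels.append(tag[len(label):] if tag.startswith(label) else tag)
            texts.append(text)
            if len(labels) == batch_size:
                yield labels, texts
                labels, texts = [], []
    if labels:  # final, possibly partial, batch
        yield labels, texts


# Usage: write three labeled lines to a temporary file, read them in batches of two.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("__label__pos great movie\n__label__neg dull plot\n__label__pos loved it\n")
batches = list(iter_label_text_batches(tmp.name, batch_size=2))
os.remove(tmp.name)
```

Yielding batches keeps memory use bounded for large files, which is presumably why the documented function exposes a batch_size argument.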
class cleanlab.experimental.fasttext.FastTextClassifier(train_data_fn, test_data_fn=None, labels=None, tmp_dir='', label='__label__', del_intermediate_data=True, kwargs_train_supervised={}, p_at_k=1, batch_size=1000)

Bases: BaseEstimator

Methods:

 fit([X, y, sample_weight]) Trains the fast text classifier.
 get_params([deep]) Get parameters for this estimator.
 predict([X, train_data, return_labels]) Predict labels of X
 predict_proba([X, train_data, return_labels]) Produces a probability matrix with examples on rows and classes on columns, where each row sums to 1 and captures the probability of the example belonging to each class.
 score([X, y, sample_weight, k]) Compute the average precision @ k (single label) of the labels predicted from X and the true labels given by y.
 set_params(**params) Set the parameters of this estimator.

fit(X=None, y=None, sample_weight=None)

Trains the fastText classifier. Typical usage requires no arguments: just call clf.fit().

Parameters:
• X (iterable, e.g. list or numpy array; default None) – Indices of the examples to use. When in doubt, leave as None, which uses every example, i.e. range(len(data)).

• y (None) – Leave this as None. It is a placeholder kept for compatibility with the scikit-learn estimator API.

• sample_weight (None) – Leave this as None. It is a placeholder kept for compatibility with the scikit-learn estimator API.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

predict(X=None, train_data=True, return_labels=False)

Predict labels of X.

predict_proba(X=None, train_data=True, return_labels=False)

Produces a probability matrix with examples on rows and classes on columns, where each row sums to 1 and captures the probability of the example belonging to each class.
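As a rough picture of the matrix layout described above, the sketch below uses a plain list of lists with made-up numbers rather than real model output. It only illustrates the row-sums-to-1 property and how a predicted class index can be read off each row.

```python
# Toy probability matrix: 3 examples (rows) x 2 classes (columns).
# The numbers are made up for illustration, not produced by a model.
pred_probs = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]

# Each row is a distribution over classes, so it should sum to 1.
row_sums = [sum(row) for row in pred_probs]
assert all(abs(s - 1.0) < 1e-9 for s in row_sums)

# The most likely class for each example is the column with the largest value.
# On a tie, max() keeps the first (lowest) column index.
predicted = [max(range(len(row)), key=row.__getitem__) for row in pred_probs]
```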

score(X=None, y=None, sample_weight=None, k=None)

Compute the average precision @ k (single label) of the labels predicted from X against the labels given by y. score expects a y argument; here, y holds the noisy (given) labels rather than verified ground truth.
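One way to read "precision @ k (single label)" is as the fraction of examples whose given label appears among the model's top k predictions. The sketch below works under that assumption; precision_at_k_single_label is a hypothetical name, the data is made up, and this is not taken from cleanlab's source.

```python
def precision_at_k_single_label(top_k_predictions, given_labels):
    """Fraction of examples whose given label is among their top-k predictions.

    Assumed definition for illustration: each example scores 1 if its label
    is in its top-k list and 0 otherwise, and the scores are averaged.
    """
    if not given_labels:
        raise ValueError("need at least one example")
    if len(top_k_predictions) != len(given_labels):
        raise ValueError("need one prediction list per label")
    hits = [label in preds for preds, label in zip(top_k_predictions, given_labels)]
    return sum(hits) / len(hits)


# Made-up top-2 predictions for four examples, and their (possibly noisy) labels.
top2 = [["pos", "neg"], ["neg", "neu"], ["neu", "pos"], ["pos", "neu"]]
y = ["pos", "neu", "neg", "pos"]
score = precision_at_k_single_label(top2, y)  # 3 of 4 labels are in the top 2
```

Because y may contain label errors, a score computed this way measures agreement with the given labels, not accuracy against ground truth.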

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.
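The <component>__<parameter> convention can be illustrated with a toy class. This is a simplified sketch, not scikit-learn's code: ToyEstimator is hypothetical, and the real set_params also validates parameter names against the estimator's constructor.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator; not scikit-learn code."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for key, value in params.items():
            # "clf__lr" splits into component "clf" and parameter "lr".
            name, delim, sub_key = key.partition("__")
            if delim:
                # Forward the remainder of the key to the nested component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self  # returning self allows call chaining


pipeline = ToyEstimator(clf=ToyEstimator(lr=0.1), verbose=False)
pipeline.set_params(clf__lr=0.5, verbose=True)
```

After the call, pipeline.clf.lr is 0.5 and pipeline.verbose is True, showing how one call can update both the outer object and a nested component.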