fasttext
Text classification with fastText models that are compatible with cleanlab. This module allows you to easily find label issues in your text datasets.
You must have fastText installed: pip install fasttext.
Tips:
Check out our example using this class: fasttext_amazon_reviews
Our unit tests also provide basic usage examples.
Functions:
data_loader([fn, indices, label, batch_size]) – Returns a generator, yielding two lists containing [labels], [text].
Classes:
FastTextClassifier(train_data_fn[, ...]) – Instantiate a fastText classifier that is compatible with CleanLearning.
- cleanlab.models.fasttext.data_loader(fn=None, indices=None, label='__label__', batch_size=1000)
Returns a generator, yielding two lists containing [labels], [text] for each batch of at most batch_size examples. Items are always returned in the order in which they appear in the file, regardless of whether indices are provided.
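The batching and ordering behavior described above can be sketched in plain Python. This is a hypothetical re-implementation, not cleanlab's code: the name batched_loader is made up, it takes an iterable of lines instead of a file name so it runs without a file, and it strips the label prefix from each label, which the real data_loader may or may not do. Note how indices only filter rows; the output order always follows the file.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def batched_loader(
    lines: Iterable[str],
    indices: Optional[Iterable[int]] = None,
    label: str = "__label__",
    batch_size: int = 1000,
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield ([labels], [text]) batches in file order (hypothetical sketch)."""
    # A set gives fast membership checks; order comes from the file, not from indices.
    keep = None if indices is None else set(indices)
    labels: List[str] = []
    texts: List[str] = []
    for i, line in enumerate(lines):
        if keep is not None and i not in keep:
            continue
        # fastText lines look like "__label__pos some text"; split off the first token.
        first, _, text = line.rstrip("\n").partition(" ")
        labels.append(first[len(label):] if first.startswith(label) else first)
        texts.append(text)
        if len(labels) == batch_size:
            yield labels, texts
            labels, texts = [], []  # fresh lists, so yielded batches are never mutated
    if labels:  # flush the final, possibly smaller, batch
        yield labels, texts
```

For example, passing indices=[2, 0] still yields line 0 before line 2.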
- class cleanlab.models.fasttext.FastTextClassifier(train_data_fn, test_data_fn=None, labels=None, tmp_dir='', label='__label__', del_intermediate_data=True, kwargs_train_supervised={}, p_at_k=1, batch_size=1000)
Bases: BaseEstimator
Instantiate a fastText classifier that is compatible with CleanLearning.
- Parameters:
train_data_fn (str) – File name of the training data in the format compatible with fastText.
test_data_fn (str, optional) – File name of the test data in the format compatible with fastText.
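fastText's supervised input format puts one example per line: the label, prefixed with __label__ (the default value of this class's label parameter), followed by a space and the text. The sketch below writes such a file from in-memory data. write_fasttext_file is a hypothetical helper, not part of cleanlab or fastText, and collapsing whitespace is a design choice made here so that multi-line texts stay on a single line.

```python
import os
import tempfile
from typing import Sequence


def write_fasttext_file(path: str, labels: Sequence[str], texts: Sequence[str],
                        label: str = "__label__") -> None:
    """Write one "<prefix><label> <text>" line per example (hypothetical helper)."""
    if len(labels) != len(texts):
        raise ValueError("labels and texts must have the same length")
    with open(path, "w", encoding="utf-8") as f:
        for y, text in zip(labels, texts):
            # fastText reads one example per line, so collapse newlines and extra spaces.
            clean = " ".join(text.split())
            f.write(f"{label}{y} {clean}\n")


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "train.txt")
    write_fasttext_file(demo_path, ["pos", "neg"], ["Great\nproduct", "Broke fast"])
    # demo_path could then be passed as train_data_fn (requires fasttext and cleanlab):
    # clf = FastTextClassifier(train_data_fn=demo_path)
```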
Methods:
fit([X, y, sample_weight]) – Trains the fastText classifier.
predict_proba([X, train_data, return_labels]) – Produces a probability matrix with examples on rows and classes on columns, where each row sums to 1 and captures the probability of the example belonging to each class.
predict([X, train_data, return_labels]) – Predict labels of X.
score([X, y, sample_weight, k]) – Compute the average precision @ k (single label) of the labels predicted from X and the true labels given by y.
get_params([deep]) – Get parameters for this estimator.
set_params(**params) – Set the parameters of this estimator.
- fit(X=None, y=None, sample_weight=None)
Trains the fastText classifier. Typical usage requires no parameters: just call clf.fit().
- Parameters:
X (iterable, e.g. list or numpy array, default None) – The list of indices of the data to use. When in doubt, leave it as None, which uses all examples, i.e. range(len(data)).
y (None) – Leave this as None. It is a placeholder that satisfies scikit-learn's API requirements.
sample_weight (None) – Leave this as None. It is a placeholder that satisfies scikit-learn's API requirements.
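Because fit takes X as row indices into the training data rather than as feature vectors, cross-validation only needs to hand fit different index subsets. The sketch below assumes that convention and uses a hypothetical helper, kfold_indices (not part of cleanlab), to build disjoint train/holdout index lists in plain Python; it follows the fold-size rule of scikit-learn's KFold, which is the usual library choice for this step. The commented classifier calls are an assumed usage pattern, not a documented recipe.

```python
from typing import List, Tuple


def kfold_indices(n: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into n_folds contiguous (train, holdout) index pairs."""
    if not 1 < n_folds <= n:
        raise ValueError("need 1 < n_folds <= n")
    # The first n % n_folds folds get one extra example, as in scikit-learn's KFold.
    sizes = [n // n_folds + (1 if f < n % n_folds else 0) for f in range(n_folds)]
    splits, start = [], 0
    for size in sizes:
        holdout = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, holdout))
        start += size
    return splits

# Hypothetical usage, assuming X indexes rows of the training file:
# for train_idx, holdout_idx in kfold_indices(n=len(labels), n_folds=5):
#     clf.fit(X=train_idx)
#     probs = clf.predict_proba(X=holdout_idx)
```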
- predict_proba(X=None, train_data=True, return_labels=False)
Produces a probability matrix with examples on rows and classes on columns, where each row sums to 1 and captures the probability of the example belonging to each class.
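fastText's predict method reports, for each text, a sequence of label strings and a matching sequence of probabilities, ordered by probability rather than by class. A matrix with one column per class can be assembled from such output as below. This is a hypothetical sketch (to_prob_matrix is not a cleanlab function); it assumes the label prefix has already been stripped and that the class ordering is known, and it renormalizes each row because the reported probabilities need not sum to 1, for example when only the top k labels are returned.

```python
from typing import List, Sequence, Tuple


def to_prob_matrix(preds: Sequence[Tuple[Sequence[str], Sequence[float]]],
                   classes: Sequence[str]) -> List[List[float]]:
    """Turn per-example (labels, probs) pairs into rows that each sum to 1."""
    col = {c: j for j, c in enumerate(classes)}  # class name -> column index
    matrix = []
    for labels, probs in preds:
        row = [0.0] * len(classes)
        for lab, p in zip(labels, probs):
            row[col[lab]] = float(p)
        total = sum(row)
        # Renormalize so the row is a proper distribution; fall back to uniform if empty.
        if total > 0:
            matrix.append([v / total for v in row])
        else:
            matrix.append([1.0 / len(classes)] * len(classes))
    return matrix
```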
- score(X=None, y=None, sample_weight=None, k=None)
Compute the average precision @ k (single label) of the labels predicted from X and the true labels given by y. Unlike fit, score requires y; here, y contains the noisy labels.
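For single-label data, precision at k counts a hit when the true label is among the top k predicted labels and divides by k, so with k = 1 it reduces to plain accuracy. The sketch below implements that textbook definition, which is also how fastText's own test command reports precision; precision_at_k is a hypothetical name, and whether cleanlab's score normalizes by k in exactly this way is an assumption worth checking against its source.

```python
from typing import Sequence


def precision_at_k(y_true: Sequence[str], ranked_preds: Sequence[Sequence[str]],
                   k: int = 1) -> float:
    """Mean over examples of (1 if the true label is in the top k else 0) / k."""
    if k < 1 or len(y_true) != len(ranked_preds) or not y_true:
        raise ValueError("need k >= 1 and equal-length, non-empty inputs")
    # Each example contributes at most one hit, out of k predicted labels.
    hits = sum(y in preds[:k] for y, preds in zip(y_true, ranked_preds))
    return hits / (len(y_true) * k)
```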
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
deep (bool, default True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params (dict) – Parameter names mapped to their values.
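FastTextClassifier inherits get_params from scikit-learn's BaseEstimator, which derives parameter names from the constructor signature and reads each value from the attribute of the same name. The toy classes below mimic that mechanism with the standard inspect module; MiniEstimator and Toy are hypothetical names, and the real BaseEstimator handles more edge cases. The deep=True branch shows where the <component>__<parameter> keys come from.

```python
import inspect
from typing import Any, Dict


class MiniEstimator:
    """Toy stand-in for scikit-learn's BaseEstimator.get_params (a sketch)."""

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # Parameter names come from the __init__ signature, excluding self, *args, **kwargs.
        sig = inspect.signature(type(self).__init__)
        names = [p.name for p in sig.parameters.values()
                 if p.name != "self" and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)]
        params: Dict[str, Any] = {}
        for name in names:
            value = getattr(self, name)
            params[name] = value
            # Nested estimators contribute "<component>__<parameter>" entries.
            if deep and hasattr(value, "get_params") and not isinstance(value, type):
                for sub_key, sub_val in value.get_params(deep=True).items():
                    params[f"{name}__{sub_key}"] = sub_val
        return params


class Toy(MiniEstimator):
    def __init__(self, batch_size: int = 1000, inner: Any = None):
        self.batch_size = batch_size  # stored under the same name as the argument
        self.inner = inner
```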
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self (estimator instance) – Estimator instance.
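The <component>__<parameter> convention amounts to splitting each key at its first double underscore and forwarding the remainder to the named component, which then repeats the process. The sketch below shows only that routing step; route_params is a hypothetical helper, not part of scikit-learn or cleanlab, and a full set_params would then set its own parameters as attributes and call set_params on each component. FastTextClassifier's constructor arguments are all plain values, so flat calls such as clf.set_params(batch_size=500) are the typical case for this class.

```python
from typing import Any, Dict, Tuple


def route_params(params: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Split set_params-style keys into own params and per-component params."""
    own: Dict[str, Any] = {}
    nested: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        # Split only on the first "__" so deeper paths stay intact for the component.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value  # single underscores (e.g. p_at_k) are not separators
    return own, nested
```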