outlier

Helper functions used internally for outlier detection tasks.

Functions:

transform_distances_to_scores(distances, k, t)

Returns an outlier score for each example based on its average distance to its k nearest neighbors.

cleanlab.internal.outlier.transform_distances_to_scores(distances, k, t)

Returns an outlier score for each example based on its average distance to its k nearest neighbors.

The transformation of a distance, d, to a score, o, is based on the following formula:

o = \exp\left(-dt\right)

where t scales the distance to a score in the range [0,1].
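As a rough illustration of the formula, the sketch below maps a few distances to scores for two values of t. The helper name distance_to_score is hypothetical and is not part of cleanlab; the sketch also assumes non-negative distances and a positive t, which is what keeps the result in [0,1].

```python
import numpy as np

def distance_to_score(d, t=1):
    """Hypothetical helper (not part of cleanlab): o = exp(-d * t)."""
    # Assumes d >= 0 and t > 0, so the result lies in (0, 1].
    return np.exp(-np.asarray(d, dtype=float) * t)

# Example average distances, smallest to largest.
avg_distances = np.array([0.0, 0.05, 0.175, 2.0])

scores_t1 = distance_to_score(avg_distances, t=1)
scores_t5 = distance_to_score(avg_distances, t=5)

print(scores_t1)
print(scores_t5)
```

A distance of 0 maps to a score of exactly 1, and scores shrink toward 0 as the distance grows. A larger t makes that decay steeper, so the same distance yields a lower score.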

Parameters:
  • distances (np.ndarray) – An array of distances of shape (N, num_neighbors), where N is the number of examples. Each row contains the distances to each example’s num_neighbors nearest neighbors. It is assumed that each row is sorted in ascending order.

  • k (int) – Number of nearest neighbors used to compute each example’s average neighbor distance. The second dimension of distances is assumed to be k or greater; the first k columns are selected by slicing, so a k larger than num_neighbors does not raise an indexing error.

  • t (int) – Controls transformation of distances between examples into similarity scores that lie in [0,1].

Return type:

ndarray

Returns:

ood_features_scores (np.ndarray) – An array of outlier scores of shape (N,) for N examples.
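The parameters above can be tied together in a minimal NumPy sketch. This is an illustrative re-implementation under the assumptions stated in the parameter descriptions (rows sorted in ascending order, first k columns averaged via slicing), not cleanlab's actual source; the name sketch_distances_to_scores is hypothetical.

```python
import numpy as np

def sketch_distances_to_scores(distances, k, t=1):
    """Hypothetical sketch (not cleanlab's source) of the documented behavior."""
    distances = np.asarray(distances, dtype=float)
    # Slicing keeps at most k columns; if k exceeds the number of columns,
    # every available column is used and no IndexError is raised.
    avg_distances = distances[:, :k].mean(axis=1)
    # Apply o = exp(-d * t) to each example's average distance.
    return np.exp(-avg_distances * t)

distances = np.array([[0.0, 0.1, 0.25],
                      [0.15, 0.2, 0.3]])

scores_k2 = sketch_distances_to_scores(distances, k=2)  # averages 2 columns
scores_k5 = sketch_distances_to_scores(distances, k=5)  # falls back to all 3

print(scores_k2.shape, scores_k5.shape)
```

In this sketch, both calls return an array of shape (N,). With k=5 the slice simply covers all three available columns, so the result equals the score computed from each row's overall mean distance.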

Examples

>>> import numpy as np
>>> from cleanlab.internal.outlier import transform_distances_to_scores
>>> distances = np.array([[0.0, 0.1, 0.25],
...                       [0.15, 0.2, 0.3]])
>>> transform_distances_to_scores(distances, k=2, t=1)
array([0.95122942, 0.83945702])