glassbox.inspector.outliers

IQR-based outlier detection for numeric columns.


OutlierDetector

Detects outliers within numerical columns of a dataset.

flag_outliers

flag_outliers(data, cols)

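Count IQR-based outliers in each of the specified columns. NaN values are ignored; a column with no valid values is reported with a count of 0 and NaN bounds.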

Parameters:

Name | Type | Description | Default
data | Dataset | The dataset containing the columns. | required
cols | List[str] | A list of column names to check for outliers. | required

Returns:

Type | Description
Dict | Mapping from column names to OutlierInfo objects.

Source code in glassbox/inspector/outliers.py
def flag_outliers(self, data: Dataset, cols: List[str]) -> Dict[str, OutlierInfo]:
    """
    Identify outliers for specified columns.

    Parameters
    ----------
    data : Dataset
        The dataset containing the columns.
    cols : List[str]
        A list of column names to check for outliers.

    Returns
    -------
    Dict
        Mapping from column names to OutlierInfo objects.
    """
    results = {}
    for col_name in cols:
        # Extract the column and cast it to float so that vectorized operations
        # work even when the dataset stores mixed-type object arrays
        col_data = data.get_columns(col_name).data[:, 0].astype(float)
        col_valid = col_data[~np.isnan(col_data)]
        if len(col_valid) == 0:
            results[col_name] = OutlierInfo(
                count=0, lower_bound=float("nan"), upper_bound=float("nan")
            )
            continue
        lower, upper = calc_iqr(col_valid)

        count = int(np.sum((col_valid < lower) | (col_valid > upper)))
        results[col_name] = OutlierInfo(
            count=count, lower_bound=lower, upper_bound=upper
        )
    return results
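
The listing above does not show calc_iqr, so the exact bounds it returns are not documented here. The following is a minimal, dependency-free sketch of the same flow for a single column, assuming the conventional Tukey rule (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR). The names OutlierSummary, tukey_fences and summarize_outliers are hypothetical stand-ins and not part of glassbox; the real calc_iqr may use a different multiplier or quantile method.

```python
import math
import statistics
from typing import List, NamedTuple, Tuple


class OutlierSummary(NamedTuple):
    # Hypothetical stand-in for OutlierInfo, carrying the same three fields.
    count: int
    lower_bound: float
    upper_bound: float


def tukey_fences(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    # Assumed rule (not confirmed by the source): Q1 - k*IQR and Q3 + k*IQR.
    if len(values) == 1:
        # A single value has zero spread, so both fences collapse onto it.
        return values[0], values[0]
    # method="inclusive" interpolates linearly between sorted data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def summarize_outliers(values: List[float]) -> OutlierSummary:
    # Drop NaNs first, mirroring the col_valid step in flag_outliers.
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        # No usable values: report zero outliers and NaN bounds.
        return OutlierSummary(0, float("nan"), float("nan"))
    lower, upper = tukey_fences(valid)
    # A value is an outlier only if it lies strictly outside the fences.
    count = sum(1 for v in valid if v < lower or v > upper)
    return OutlierSummary(count, lower, upper)


summary = summarize_outliers([1.0, 2.0, 3.0, 4.0, 100.0, float("nan")])
print(summary)
```

As in flag_outliers, the comparisons are strict, so values exactly equal to a bound are not counted as outliers.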