glassbox.inspector.report

Data classes and supporting types for the EDA report: FeatureType, CollinearityPair, OutlierInfo, MissingInfo, NumericStats, CategoricalStats, EDAReport.


FeatureType

Bases: Enum

Enumerates possible statistical data types for features.

CollinearityPair dataclass

CollinearityPair(feature_a, feature_b, score, metric)

Represents an association or correlation between two features.

OutlierInfo dataclass

OutlierInfo(count, lower_bound, upper_bound)

Stores outlier bounds and count for a single feature.
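
This page does not say how the bounds are derived. As a rough sketch only, assuming the common 1.5 × IQR rule (an assumption, not something this module confirms), a stand-in dataclass with the same three fields could be filled in as follows. `OutlierInfoSketch` and `iqr_outliers` are hypothetical names used for illustration.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class OutlierInfoSketch:
    """Hypothetical stand-in mirroring OutlierInfo(count, lower_bound, upper_bound)."""
    count: int
    lower_bound: float
    upper_bound: float


def iqr_outliers(values, k=1.5):
    """Assumed rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Drop NaN entries so they do not distort the quartiles.
    clean = sorted(float(v) for v in values if not math.isnan(float(v)))
    q1, _, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    count = sum(1 for v in clean if v < lower or v > upper)
    return OutlierInfoSketch(count=count, lower_bound=lower, upper_bound=upper)


info = iqr_outliers([1, 2, 3, 4, 5, 100])
print(info)
```

Here the quartiles are 2.25 and 4.75, so only the value 100 falls outside the resulting bounds of -1.5 and 8.5.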

MissingInfo dataclass

MissingInfo(count, percentage)

Stores the count and percentage of missing values.
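
To illustrate how the two fields relate, here is a minimal sketch. It assumes that both `None` and float NaN count as missing and that the percentage is expressed on a 0 to 100 scale; neither assumption is confirmed by this page. `MissingInfoSketch` and `summarize_missing` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class MissingInfoSketch:
    """Hypothetical stand-in mirroring MissingInfo(count, percentage)."""
    count: int
    percentage: float


def summarize_missing(values):
    """Count None/NaN entries and express them as a share of all entries."""
    total = len(values)
    count = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    # Assumed convention: percentage on a 0-100 scale; 0.0 for empty input.
    percentage = 100.0 * count / total if total else 0.0
    return MissingInfoSketch(count=count, percentage=percentage)


missing = summarize_missing([1.0, None, float("nan"), 4.0])
print(missing)
```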

NumericStats dataclass

NumericStats(mean, median, std, skew, kurt)

Stores basic summary statistics for a numerical feature.

CategoricalStats dataclass

CategoricalStats(mode, cardinality)

Stores summary statistics for a categorical feature.
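
As an illustration of the two fields, the sketch below takes the mode to be the most frequent value and the cardinality to be the number of distinct values. Excluding `None` from both is an assumption made here, and `CategoricalStatsSketch` and `summarize_categorical` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any


@dataclass
class CategoricalStatsSketch:
    """Hypothetical stand-in mirroring CategoricalStats(mode, cardinality)."""
    mode: Any
    cardinality: int


def summarize_categorical(values):
    """Return the most frequent value and the number of distinct values."""
    # Assumption: missing entries (None) are excluded from both statistics.
    counts = Counter(v for v in values if v is not None)
    mode = counts.most_common(1)[0][0] if counts else None
    return CategoricalStatsSketch(mode=mode, cardinality=len(counts))


stats = summarize_categorical(["red", "blue", "red", None, "green"])
print(stats)
```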

EDAReport dataclass

EDAReport(
    feature_types,
    missing_values,
    outliers_info,
    summary_stats,
    collinearity_map,
)

Container for the complete Exploratory Data Analysis report.

to_json

to_json()

Serialize the entire EDA report to a JSON string.

Returns:

Type    Description
str     JSON representation of the report.

Source code in glassbox/inspector/report.py
def to_json(self) -> str:
    """
    Serialize the entire EDA report to a JSON string.

    Returns
    -------
    str
        JSON representation of the report.
    """

    class EnumEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, Enum):
                return obj.name
            if isinstance(obj, float) and np.isnan(obj):
                return None
            return super().default(obj)

    # Note: json.dumps serializes float NaN natively as the bare token NaN and never
    # passes floats to default(), so the NaN check in EnumEncoder does not produce null.
    # NaN is not valid in standard JSON, so strict parsers may reject this output.
    return json.dumps(dataclasses.asdict(self), cls=EnumEncoder)
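
If standards-compliant output is needed, one possible approach is to replace NaN values before encoding, since `json.dumps` only calls `default()` for objects it cannot serialize natively. The sketch below is not the library's implementation: `KindSketch`, `ReportSketch`, and `nan_to_none` are hypothetical stand-ins, and the real `FeatureType` members and `EDAReport` field types are not shown on this page.

```python
import dataclasses
import json
import math
from dataclasses import dataclass
from enum import Enum


class KindSketch(Enum):
    """Hypothetical stand-in for FeatureType; the real members are not listed here."""
    NUMERIC = 1


@dataclass
class ReportSketch:
    """Hypothetical, much smaller stand-in for EDAReport."""
    feature_types: dict
    summary_stats: dict


class EnumEncoder(json.JSONEncoder):
    def default(self, obj):
        # Reached only for objects json cannot serialize natively, such as Enum members.
        if isinstance(obj, Enum):
            return obj.name
        return super().default(obj)


def nan_to_none(value):
    """Recursively replace float NaN with None so it is emitted as JSON null."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [nan_to_none(item) for item in value]
    return value


report = ReportSketch(
    feature_types={"age": KindSketch.NUMERIC},
    summary_stats={"age": {"mean": float("nan")}},
)

# Encoder alone: the NaN float is written as the non-standard token NaN.
raw = json.dumps(dataclasses.asdict(report), cls=EnumEncoder)

# Sanitized first: allow_nan=False would raise if any NaN were left behind.
strict = json.dumps(
    nan_to_none(dataclasses.asdict(report)), cls=EnumEncoder, allow_nan=False
)
print(raw)
print(strict)
```

Passing `allow_nan=False` in the second call acts as a guard: it makes `json.dumps` raise `ValueError` rather than silently emit a non-standard token.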