glassbox.inspector
Non-destructive Exploratory Data Analysis (EDA) toolkit.
DataAuditor
Orchestrates the EDA process to generate a complete report.
run_audit
Perform a full audit on the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | The dataset to audit. | required |
Returns:
| Type | Description |
|---|---|
| `EDAReport` | A comprehensive report containing EDA results. |
Source code in glassbox/inspector/auditor.py
AutoTyper
Infers logical data types for dataset columns.
infer_types
Infer feature types for all columns in the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | The dataset to analyze. | required |
Returns:
| Type | Description |
|---|---|
| `Dict` | Mapping from column names to their inferred FeatureType. |
Source code in glassbox/inspector/auto_typer.py
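
The following is a minimal, standalone sketch of what column type inference can look like. It does not reproduce the code in `auto_typer.py`. It assumes a dataset can be represented as a plain dict of column name to list of values, and the `FeatureType` members shown (`NUMERIC`, `CATEGORICAL`, `BOOLEAN`) are hypothetical, since this page does not list the real ones.

```python
from enum import Enum
from typing import Any, Dict, List


class FeatureType(Enum):
    # Hypothetical members; the real FeatureType enum may differ.
    NUMERIC = "numeric"
    CATEGORICAL = "categorical"
    BOOLEAN = "boolean"


def infer_types(data: Dict[str, List[Any]]) -> Dict[str, FeatureType]:
    """Map each column name to an inferred logical type."""
    result: Dict[str, FeatureType] = {}
    for name, values in data.items():
        # Missing values (None) do not influence the inferred type.
        present = [v for v in values if v is not None]
        if present and all(isinstance(v, bool) for v in present):
            result[name] = FeatureType.BOOLEAN
        elif present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        ):
            result[name] = FeatureType.NUMERIC
        else:
            # Fallback for strings, mixed values, and empty columns.
            result[name] = FeatureType.CATEGORICAL
    return result


sample = {
    "age": [31, 45, None],
    "city": ["Oslo", "Lima", "Oslo"],
    "active": [True, False, True],
}
types = infer_types(sample)
```

Booleans are checked before numbers because `bool` is a subclass of `int` in Python; without that ordering a flag column would be reported as numeric.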
OutlierDetector
Detects outliers within numerical columns of a dataset.
flag_outliers
Identify outliers in the specified columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | The dataset containing the columns. | required |
| `cols` | `List[str]` | A list of column names to check for outliers. | required |
Returns:
| Type | Description |
|---|---|
| `Dict` | Mapping from column names to OutlierInfo objects. |
Source code in glassbox/inspector/outliers.py
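
This page does not say which detection rule `flag_outliers` applies. As an illustration only, the sketch below uses the common 1.5 × IQR fence, which yields the bounds-plus-count shape that `OutlierInfo` is described as storing. The field names (`lower_bound`, `upper_bound`, `count`), the `k` parameter, and the dict-of-lists dataset are all assumptions.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import Dict, List, Optional


@dataclass
class OutlierInfo:
    # Hypothetical field names; the real dataclass may differ.
    lower_bound: float
    upper_bound: float
    count: int


def flag_outliers(
    data: Dict[str, List[Optional[float]]],
    cols: List[str],
    k: float = 1.5,  # assumed fence multiplier, not taken from the source
) -> Dict[str, OutlierInfo]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] for each column."""
    info: Dict[str, OutlierInfo] = {}
    for col in cols:
        # Skip missing values; at least two values are needed for quartiles.
        values = [v for v in data[col] if v is not None]
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        count = sum(1 for v in values if v < lower or v > upper)
        info[col] = OutlierInfo(lower, upper, count)
    return info


sample = {"income": [1, 2, 3, 4, 5, 6, 7, 8, 100]}
result = flag_outliers(sample, ["income"])
```

The input is only read, never modified, which matches the non-destructive intent stated at the top of this page.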
EDAReport (dataclass)
Container for the complete Exploratory Data Analysis report.
to_json
Serialize the entire EDA report to a JSON string.
Returns:
| Type | Description |
|---|---|
| `str` | JSON representation of the report. |
Source code in glassbox/inspector/report.py
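
One common way to serialize a dataclass-based report is `dataclasses.asdict` followed by `json.dumps`. The sketch below shows that pattern with stand-in classes. The fields on both classes are hypothetical, and the real `to_json` may be implemented differently.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class OutlierInfo:
    # Hypothetical fields, for illustration only.
    lower_bound: float
    upper_bound: float
    count: int


@dataclass
class EDAReport:
    # Hypothetical fields; the real report likely holds more sections.
    feature_types: Dict[str, str] = field(default_factory=dict)
    outliers: Dict[str, OutlierInfo] = field(default_factory=dict)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses, dicts, and lists,
        # so the result contains only JSON-friendly containers.
        return json.dumps(asdict(self), indent=2)


report = EDAReport(
    feature_types={"income": "numeric"},
    outliers={"income": OutlierInfo(-3.0, 13.0, 1)},
)
payload = report.to_json()
parsed = json.loads(payload)
```

Values that `json` cannot encode directly (enums, dates, NumPy scalars) would need a `default=` hook passed to `json.dumps`.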
OutlierInfo (dataclass)
Stores outlier bounds and count for a single feature.
AssociationAnalyzer
Analyzes pairwise correlations and associations between features.
build_associations
Compute pairwise correlations and associations across the specified columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | Input dataset. | required |
| `num_cols` | `List[str]` | Numerical columns to inspect with Pearson correlation. | required |
| `cat_cols` | `List[str]` | Categorical columns to inspect with Cramér's V. | required |
Returns:
| Type | Description |
|---|---|
| `List` | A list of CollinearityPair objects containing scores. |
Source code in glassbox/inspector/statistics.py
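
To make the two measures concrete, here is a small pure-Python sketch: Pearson's r for numeric pairs and Cramér's V (without bias correction) for categorical pairs. It is not the code in `statistics.py`. The `CollinearityPair` fields, the dict-of-lists dataset, and the choice to return 0.0 for constant columns are assumptions, and the sketch expects columns without missing values.

```python
import math
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Sequence


@dataclass
class CollinearityPair:
    # Hypothetical field names; the real dataclass may differ.
    feature_a: str
    feature_b: str
    method: str
    score: float


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant columns; 0.0 is this sketch's choice
    return cov / math.sqrt(var_x * var_y)


def cramers_v(x: Sequence[str], y: Sequence[str]) -> float:
    """Cramér's V from the chi-squared statistic of the contingency table."""
    n = len(x)
    joint, rows, cols = Counter(zip(x, y)), Counter(x), Counter(y)
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            chi2 += (joint.get((r, c), 0) - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    if k == 0:
        return 0.0  # a single-category column carries no association
    return math.sqrt(chi2 / (n * k))


def build_associations(
    data: Dict[str, list], num_cols: List[str], cat_cols: List[str]
) -> List[CollinearityPair]:
    pairs = []
    for a, b in combinations(num_cols, 2):
        pairs.append(CollinearityPair(a, b, "pearson", pearson(data[a], data[b])))
    for a, b in combinations(cat_cols, 2):
        pairs.append(CollinearityPair(a, b, "cramers_v", cramers_v(data[a], data[b])))
    return pairs


sample = {
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "color": ["r", "r", "b", "b"],
    "size": ["S", "S", "L", "L"],
}
pairs = build_associations(sample, ["x", "y"], ["color", "size"])
```

Pearson's r ranges from -1 to 1, while Cramér's V ranges from 0 to 1 and carries no sign, so scores from the two methods are not directly comparable.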
StatProfiler
Calculates summary statistics for dataset columns.
calculate_numeric_stats
Compute statistics for numerical columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | The dataset containing the inputs. | required |
| `cols` | `List[str]` | List of column names to analyze. | required |
Returns:
| Type | Description |
|---|---|
| `Dict` | Mapping from column names to NumericStats objects. |
Source code in glassbox/inspector/statistics.py
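
A rough sketch of per-column numeric profiling using only the standard library is shown below. The `NumericStats` fields are hypothetical, since this page does not list which statistics the real class holds, and the dataset is again assumed to be a dict of column name to list of values.

```python
from dataclasses import dataclass
from statistics import mean, median, stdev
from typing import Dict, List, Optional


@dataclass
class NumericStats:
    # Hypothetical fields; the real NumericStats may track other statistics.
    count: int
    missing: int
    mean: float
    median: float
    std: float
    minimum: float
    maximum: float


def calculate_numeric_stats(
    data: Dict[str, List[Optional[float]]], cols: List[str]
) -> Dict[str, NumericStats]:
    """Summarize each requested column, ignoring missing (None) values."""
    stats: Dict[str, NumericStats] = {}
    for col in cols:
        # Assumes each column has at least one non-missing value.
        values = [v for v in data[col] if v is not None]
        stats[col] = NumericStats(
            count=len(values),
            missing=len(data[col]) - len(values),
            mean=mean(values),
            median=median(values),
            # The sample standard deviation needs at least two values.
            std=stdev(values) if len(values) > 1 else 0.0,
            minimum=min(values),
            maximum=max(values),
        )
    return stats


sample = {"height": [2, 4, None, 4, 6]}
stats = calculate_numeric_stats(sample, ["height"])
```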
calculate_categorical_stats
Compute statistics for categorical columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dataset` | The dataset containing the inputs. | required |
| `cols` | `List[str]` | List of column names to analyze. | required |
Returns:
| Type | Description |
|---|---|
| `Dict` | Mapping from column names to CategoricalStats objects. |
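
For categorical columns, a frequency count covers most of what a summary typically needs. The sketch below is illustrative rather than the library's implementation. The `CategoricalStats` fields are hypothetical, and the dataset is assumed to be a dict of column name to list of values.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CategoricalStats:
    # Hypothetical fields; the real CategoricalStats may differ.
    count: int
    missing: int
    unique: int
    mode: Optional[str]
    mode_count: int


def calculate_categorical_stats(
    data: Dict[str, List[Optional[str]]], cols: List[str]
) -> Dict[str, CategoricalStats]:
    """Summarize each requested column, ignoring missing (None) values."""
    stats: Dict[str, CategoricalStats] = {}
    for col in cols:
        values = [v for v in data[col] if v is not None]
        freq = Counter(values)
        # most_common(1) returns an empty list for an all-missing column.
        top = freq.most_common(1)
        mode, mode_count = top[0] if top else (None, 0)
        stats[col] = CategoricalStats(
            count=len(values),
            missing=len(data[col]) - len(values),
            unique=len(freq),
            mode=mode,
            mode_count=mode_count,
        )
    return stats


sample = {"city": ["Oslo", "Lima", "Oslo", None]}
stats = calculate_categorical_stats(sample, ["city"])
```

When several categories tie for the highest frequency, `Counter.most_common` returns the one encountered first.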