glassbox.cleaner
Scikit-learn-style data cleaning transformers.
LabelEncoder
Bases: BaseTransformer
Encode target labels as integers between 0 and n_classes-1.
Source code in glassbox/cleaner/encoders.py
fit
Learn the vocabulary of the labels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input array of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| Self | Fitted encoder instance. |
Source code in glassbox/cleaner/encoders.py
transform
Transform labels to normalized encoding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input array of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Array of encoded labels. |
Source code in glassbox/cleaner/encoders.py
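
Example (illustrative sketch). The fit/transform contract described above can be pictured with plain NumPy. `SketchLabelEncoder` is a hypothetical stand-in written for this page, not the glassbox implementation; it assumes a 1-D label array and assumes that unseen labels should raise an error, neither of which is confirmed by the reference above.

```python
import numpy as np


class SketchLabelEncoder:
    """Illustrative stand-in for a label encoder (not the glassbox class)."""

    def fit(self, X: np.ndarray) -> "SketchLabelEncoder":
        # np.unique returns the sorted distinct labels; each label's
        # position in this vocabulary becomes its integer code.
        self.classes_ = np.unique(X)
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # searchsorted maps every label to its index in the sorted vocabulary.
        codes = np.searchsorted(self.classes_, X)
        # Clip so that out-of-range positions can be checked safely below.
        codes = np.clip(codes, 0, len(self.classes_) - 1)
        # Assumption: labels not seen during fit are treated as an error.
        if not np.array_equal(self.classes_[codes], X):
            raise ValueError("X contains labels not seen during fit")
        return codes


labels = np.array(["cat", "dog", "cat", "bird"])
encoded = SketchLabelEncoder().fit(labels).transform(labels)
print(encoded)  # [1 2 1 0]
```

The codes run from 0 to n_classes-1 because the vocabulary is sorted and indexed from zero.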
OneHotEncoder
Bases: BaseTransformer
Encode categorical features as a one-hot numeric array.
Source code in glassbox/cleaner/encoders.py
fit
Learn the categorical levels for encoding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input array of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| Self | Fitted encoder instance. |
Source code in glassbox/cleaner/encoders.py
transform
Transform the dataset into a one-hot encoded representation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input array of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | One-hot encoded array. |
Source code in glassbox/cleaner/encoders.py
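
Example (illustrative sketch). One way to picture one-hot encoding is one indicator column per level learned during fit. `SketchOneHotEncoder` and its `categories_` attribute are hypothetical names used only for this illustration; the column order and the handling of unseen levels in glassbox may differ.

```python
import numpy as np


class SketchOneHotEncoder:
    """Illustrative stand-in for a one-hot encoder (not the glassbox class)."""

    def fit(self, X: np.ndarray) -> "SketchOneHotEncoder":
        # Record the sorted distinct levels of every column.
        self.categories_ = [np.unique(X[:, j]) for j in range(X.shape[1])]
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        blocks = []
        for j, levels in enumerate(self.categories_):
            # Broadcast-compare the column against its levels: this yields
            # one 0/1 indicator column per level. In this sketch a level
            # that was not seen during fit produces an all-zero row block.
            blocks.append((X[:, [j]] == levels[None, :]).astype(float))
        # Place the per-feature indicator blocks side by side.
        return np.hstack(blocks)


X = np.array([["red", "S"], ["blue", "M"], ["red", "M"]])
one_hot = SketchOneHotEncoder().fit(X).transform(X)
print(one_hot.shape)  # (3, 4)
```

Here the first feature has levels `blue` and `red` and the second has `M` and `S`, so two input columns expand to four output columns.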
ImputationStrategy
Bases: Enum
Strategies available for imputing missing values.
SimpleImputer
Bases: BaseTransformer
Replaces missing values using a specified statistical strategy.
Notes
This imputer supports basic strategies like mean, median, mode, or a constant value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| strategy | ImputationStrategy | The strategy used for missing value imputation. | ImputationStrategy.MEAN |
| constant_value | Union[float, str, None] | The value to use when strategy is CONSTANT. | 0.0 |
Source code in glassbox/cleaner/imputers.py
fit
Learn the imputation values from the training data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input array of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| Self | Fitted imputer instance. |
Source code in glassbox/cleaner/imputers.py
transform
Impute missing values in the given dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input array of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Transformed array with missing values imputed. |
Source code in glassbox/cleaner/imputers.py
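
Example (illustrative sketch). The snippet below mimics the documented `strategy` and `constant_value` parameters with plain NumPy. `Strategy` and `SketchImputer` are hypothetical stand-ins: only the `MEAN` and `CONSTANT` members are named in the reference above, the `MEDIAN` member name is an assumption, the mode strategy is omitted for brevity, and the sketch assumes missing values are encoded as NaN in a float array.

```python
from enum import Enum

import numpy as np


class Strategy(Enum):
    """Hypothetical stand-in for ImputationStrategy."""

    MEAN = "mean"
    MEDIAN = "median"  # member name assumed, not confirmed by the docs
    CONSTANT = "constant"


class SketchImputer:
    """Illustrative stand-in for an imputer (not the glassbox class)."""

    def __init__(self, strategy: Strategy = Strategy.MEAN, constant_value: float = 0.0):
        self.strategy = strategy
        self.constant_value = constant_value

    def fit(self, X: np.ndarray) -> "SketchImputer":
        # Learn one fill value per column, ignoring NaN entries.
        if self.strategy is Strategy.MEAN:
            self.fill_ = np.nanmean(X, axis=0)
        elif self.strategy is Strategy.MEDIAN:
            self.fill_ = np.nanmedian(X, axis=0)
        else:
            self.fill_ = np.full(X.shape[1], self.constant_value, dtype=float)
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # Replace each NaN with the fill value of its column; np.where
        # returns a new array, so the input is left untouched.
        return np.where(np.isnan(X), self.fill_, X)


X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])
imputed = SketchImputer(Strategy.MEAN).fit(X).transform(X)
print(imputed)
```

The column means are computed from the observed values only (2.0 and 6.0 here), which is why fit must run on training data before transform is applied to new data.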
OutlierCapper
Bases: BaseTransformer
Identifies and caps numerical outliers based on specified bounds.
Source code in glassbox/cleaner/outliers.py
fit
Detect boundaries for outlier capping from the training data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input array of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| Self | Fitted outlier capper instance. |
Source code in glassbox/cleaner/outliers.py
transform
Cap outliers in the input dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input array of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Transformed array with outliers capped. |
Source code in glassbox/cleaner/outliers.py
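
Example (illustrative sketch). The reference above does not say how the capping bounds are derived, so this sketch assumes the common interquartile-range rule (bounds at Q1 - 1.5 * IQR and Q3 + 1.5 * IQR). `SketchOutlierCapper` and its `factor` parameter are hypothetical; glassbox may compute its bounds differently.

```python
import numpy as np


class SketchOutlierCapper:
    """Illustrative stand-in using the IQR rule (not the glassbox class)."""

    def __init__(self, factor: float = 1.5):
        # Hypothetical parameter: how many IQRs beyond the quartiles to allow.
        self.factor = factor

    def fit(self, X: np.ndarray) -> "SketchOutlierCapper":
        # Per-column first and third quartiles of the training data.
        q1, q3 = np.percentile(X, [25, 75], axis=0)
        iqr = q3 - q1
        self.lower_ = q1 - self.factor * iqr
        self.upper_ = q3 + self.factor * iqr
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # Values outside the learned bounds are pulled back to the bound;
        # values inside are returned unchanged.
        return np.clip(X, self.lower_, self.upper_)


X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
capped = SketchOutlierCapper().fit(X).transform(X)
print(capped.ravel())  # [1. 2. 3. 4. 7.]
```

With quartiles 2.0 and 4.0 the upper bound is 7.0, so only the value 100.0 is changed.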
MinMaxScaler
Bases: BaseTransformer
Transforms features by scaling each feature to a given range.
Source code in glassbox/cleaner/scalers.py
fit
Compute the minimum and maximum to be used for later scaling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input array of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| Self | Fitted scaler instance. |
Source code in glassbox/cleaner/scalers.py
transform
Scale the features of X to the configured feature range.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input array of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Scaled array. |
Source code in glassbox/cleaner/scalers.py
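
Example (illustrative sketch). Min-max scaling maps each feature with (x - min) / (max - min) and then stretches the result to the target range. `SketchMinMaxScaler` is a hypothetical stand-in; the `feature_range` parameter name and the handling of constant columns are assumptions, not documented glassbox behavior.

```python
import numpy as np


class SketchMinMaxScaler:
    """Illustrative stand-in for a min-max scaler (not the glassbox class)."""

    def __init__(self, feature_range=(0.0, 1.0)):
        # Hypothetical parameter: (low, high) bounds of the output range.
        self.feature_range = feature_range

    def fit(self, X: np.ndarray) -> "SketchMinMaxScaler":
        self.min_ = X.min(axis=0)
        span = X.max(axis=0) - self.min_
        # Assumption: constant columns get a span of 1 to avoid dividing by zero.
        self.span_ = np.where(span == 0, 1.0, span)
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        low, high = self.feature_range
        # Rescale to [0, 1] with the training min and span, then to [low, high].
        return (X - self.min_) / self.span_ * (high - low) + low


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
scaled = SketchMinMaxScaler().fit(X).transform(X)
print(scaled)
```

Because the minimum and span come from the data passed to fit, values outside the training range map outside the target range when transformed later.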
StandardScaler
Bases: BaseTransformer
Standardizes features by removing the mean and scaling to unit variance.
Source code in glassbox/cleaner/scalers.py
fit
Compute the mean and standard deviation to be used for later scaling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input array of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| Self | Fitted scaler instance. |
Source code in glassbox/cleaner/scalers.py
transform
Perform standardization by centering and scaling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input array of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Standardized array. |
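
Example (illustrative sketch). Standardization subtracts the per-feature mean and divides by the per-feature standard deviation, z = (x - mean) / std. `SketchStandardScaler` is a hypothetical stand-in; it assumes the population standard deviation (ddof=0) and a scale of 1 for constant columns, which may not match glassbox.

```python
import numpy as np


class SketchStandardScaler:
    """Illustrative stand-in for a standard scaler (not the glassbox class)."""

    def fit(self, X: np.ndarray) -> "SketchStandardScaler":
        self.mean_ = X.mean(axis=0)
        # Assumption: population standard deviation (NumPy's default, ddof=0).
        std = X.std(axis=0)
        # Assumption: constant columns are left unscaled rather than divided by zero.
        self.scale_ = np.where(std == 0, 1.0, std)
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # Center with the training mean, then scale to unit variance.
        return (X - self.mean_) / self.scale_


X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = SketchStandardScaler().fit(X_train)
Z = scaler.transform(X_train)
print(Z.mean(axis=0), Z.std(axis=0))
```

After the transform each training column has mean 0 and standard deviation 1, up to floating-point error; new data transformed with the same fitted scaler will generally not have exactly these statistics.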