Models

The glassbox.models module provides machine learning algorithms for classification and regression. All models follow the fit → predict contract defined by BaseModel.


Model API

model.fit(X, y)       # Train on (n_samples, n_features) array
model.predict(X)      # Returns predictions array
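
A minimal end-to-end sketch of this contract (using DecisionTreeClassifier from this page on toy data; any BaseModel subclass works the same way):

import numpy as np
from glassbox.models import DecisionTreeClassifier

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 1])  # logical OR of the two features

model = DecisionTreeClassifier().fit(X, y)  # fit returns the model (Self), so calls chain
model.predict(np.array([[0.9, 0.9]]))       # array([1]) on this toy data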

Decision Trees

CART-style decision trees that recursively split the data on feature thresholds to minimize a cost function.

DecisionTreeClassifier

Uses Gini impurity as the split criterion and majority vote (mode) for leaf predictions.

from glassbox.models import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=10, min_samples_split=5)
model.fit(X_train, y_train)
preds = model.predict(X_test)
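
The criterion itself is easy to compute by hand; here is a standalone sketch of Gini impurity (illustrative only, not the library's internal helper):

import numpy as np

def gini_impurity(y: np.ndarray) -> float:
    # Gini = 1 - sum(p_c^2) over the class proportions p_c at a node
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

gini_impurity(np.array([0, 0, 1, 1]))  # 0.5 -> maximally impure for two classes
gini_impurity(np.array([1, 1, 1, 1]))  # 0.0 -> pure leaf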

DecisionTreeRegressor

Uses variance reduction as the split criterion and mean for leaf predictions.

from glassbox.models import DecisionTreeRegressor

model = DecisionTreeRegressor(max_depth=15)
model.fit(X_train, y_train)
preds = model.predict(X_test)
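
For reference, a standalone sketch of the variance-reduction criterion (illustrative only, not the library's internal helper):

import numpy as np

def variance_reduction(y_parent: np.ndarray, y_left: np.ndarray, y_right: np.ndarray) -> float:
    # Parent variance minus the size-weighted variance of the two children;
    # the best split is the one that maximizes this quantity.
    n = len(y_parent)
    weighted_child_var = (len(y_left) / n) * np.var(y_left) + (len(y_right) / n) * np.var(y_right)
    return float(np.var(y_parent) - weighted_child_var)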

Parameters

Parameter          Default  Description
max_depth          100      Maximum depth of the tree.
min_samples_split  2        Minimum samples needed to split a node.

Random Forests

Ensemble of decision trees trained on bootstrapped samples with random feature subsets (√n_features).
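
These two sources of randomness can be sketched in a few lines (an illustration of the idea, not the library's internals):

import numpy as np

rng = np.random.default_rng(0)

def bootstrap_sample(X: np.ndarray, y: np.ndarray):
    # Draw n_samples rows with replacement: each tree sees a different resample.
    idx = rng.integers(0, X.shape[0], size=X.shape[0])
    return X[idx], y[idx]

def feature_subset(n_features: int) -> np.ndarray:
    # Each tree considers only ~sqrt(n_features) randomly chosen features.
    k = max(1, int(np.sqrt(n_features)))
    return rng.choice(n_features, size=k, replace=False)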

RandomForestClassifier

Aggregates predictions via majority vote.

from glassbox.models import RandomForestClassifier

model = RandomForestClassifier(n_estimators=50, max_depth=10)
model.fit(X_train, y_train)
preds = model.predict(X_test)

RandomForestRegressor

Aggregates predictions via averaging.

from glassbox.models import RandomForestRegressor

model = RandomForestRegressor(n_estimators=50, max_depth=10)
model.fit(X_train, y_train)
preds = model.predict(X_test)

Parameters

Parameter          Default  Description
n_estimators       100      Number of trees in the forest.
max_depth          100      Maximum depth of each tree.
min_samples_split  2        Minimum samples needed to split a node.

K-Nearest Neighbors

Instance-based learning that predicts based on the k closest training samples.

Configuration Enums

DistanceMetric  Formula
EUCLIDEAN       √Σ(xᵢ − yᵢ)²
MANHATTAN       Σ|xᵢ − yᵢ|

SearchAlgorithm  Description
BRUTE_FORCE      Exhaustive pairwise distance computation.
KD_TREE          Space-partitioning tree for faster lookup.
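
To make the table concrete, here is a brute-force sketch of both metrics and the majority vote (an illustration, independent of the library's KD_TREE index):

import numpy as np

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.sum((a - b) ** 2)))

def manhattan(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sum(np.abs(a - b)))

def knn_vote(x: np.ndarray, X_train: np.ndarray, y_train: np.ndarray, k: int = 5, dist=euclidean):
    # BRUTE_FORCE: score every training point, then vote among the k closest.
    distances = np.array([dist(x, row) for row in X_train])
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]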

KNeighborsClassifier

Predicts via majority vote among the k nearest neighbors.

from glassbox.models import KNeighborsClassifier, DistanceMetric, SearchAlgorithm

model = KNeighborsClassifier(
    k=5,
    metric=DistanceMetric.EUCLIDEAN,
    algorithm=SearchAlgorithm.KD_TREE,
)
model.fit(X_train, y_train)
preds = model.predict(X_test)

KNeighborsRegressor

Predicts via averaging the k nearest neighbors' targets.

from glassbox.models import KNeighborsRegressor

model = KNeighborsRegressor(k=7, metric=DistanceMetric.MANHATTAN)
model.fit(X_train, y_train)
preds = model.predict(X_test)

Parameters

Parameter  Default      Description
k          5            Number of neighbors.
metric     EUCLIDEAN    Distance metric.
algorithm  BRUTE_FORCE  Nearest-neighbor search strategy.

Single-sample prediction

KNN models accept both batch input (n_samples, n_features) and single-sample input (n_features,) in predict().
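
For example, with the fitted regressor above:

preds = model.predict(X_test)    # batch input: shape (n_samples, n_features)
pred = model.predict(X_test[0])  # single-sample input: shape (n_features,)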


Gaussian Naive Bayes

A probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Features are assumed to follow a Gaussian distribution.
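
Concretely, prediction selects the class with the highest log-posterior, which is exactly what the predict_proba implementation in the API reference below computes:

\hat{y} = \arg\max_{c}\Big[\log P(c) + \sum_{i=1}^{n_{\text{features}}} \log \mathcal{N}(x_i \mid \mu_{c,i},\, \sigma^2_{c,i})\Big]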

GaussianNB

from glassbox.models import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)
preds = model.predict(X_test)

Linear Models

Models that fit a linear surface to the data, trained using gradient descent optimization.
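
For linear regression with mean-squared-error loss, each epoch applies the updates below (matching the fit() source in the API reference), where η is the learning rate:

\nabla_w = \frac{2}{n} X^\top (Xw + b - y), \qquad \nabla_b = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i), \qquad w \leftarrow w - \eta\,\nabla_w, \quad b \leftarrow b - \eta\,\nabla_b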

LinearRegression

Predicts a continuous target variable by finding the line (or hyperplane, with multiple features) of best fit.

from glassbox.models import LinearRegression

model = LinearRegression(learning_rate=0.01, max_epochs=1000)
model.fit(X_train, y_train)
preds = model.predict(X_test)

LogisticRegression

Predicts a categorical target variable using the logistic (sigmoid) function to output probabilities.
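
The sigmoid maps the linear score z = w·x + b to a probability in (0, 1):

\sigma(z) = \frac{1}{1 + e^{-z}}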

from glassbox.models import LogisticRegression

model = LogisticRegression(learning_rate=0.1, max_epochs=1000)
model.fit(X_train, y_train)
preds = model.predict(X_test)

Parameters

Parameter      Default   Description
learning_rate  0.01      Step size for gradient descent updates.
max_epochs     1000      Maximum number of optimization epochs.
tol            1e-6      Convergence tolerance for early stopping.
schedule       CONSTANT  Learning-rate schedule across epochs.

API Reference

BaseModel

Bases: ABC

fit abstractmethod

fit(X, y)

Fits the model to the training data.

Parameters:

Name  Type     Description                                      Default
X     ndarray  Training data of shape (n_samples, n_features).  required
y     ndarray  Target values of shape (n_samples,).             required

Returns:

Type  Description
Self  The fitted model.

Source code in glassbox/models/_base.py
@abstractmethod
def fit(self, X: np.ndarray, y: np.ndarray) -> Self:
    """
    Fits the model to the training data.

    Parameters
    ----------
    X : np.ndarray
        Training data of shape (n_samples, n_features).
    y : np.ndarray
        Target values of shape (n_samples,).

    Returns
    -------
    Self
        The fitted model.
    """
    raise NotImplementedError

predict abstractmethod

predict(X, **kwargs)

Predicts target values for the given data.

Parameters:

Name      Type     Description                                            Default
X         ndarray  Data to predict on, of shape (n_samples, n_features).  required
**kwargs  Any      Additional keyword arguments.                          {}

Returns:

Type     Description
ndarray  Predicted target values.

Source code in glassbox/models/_base.py
@abstractmethod
def predict(self, X: np.ndarray, **kwargs: Any) -> np.ndarray:
    """
    Predicts target values for the given data.

    Parameters
    ----------
    X : np.ndarray
        Data to predict on, of shape (n_samples, n_features).
    **kwargs : Any
        Additional keyword arguments.

    Returns
    -------
    np.ndarray
        Predicted target values.
    """
    raise NotImplementedError

DecisionTreeClassifier

DecisionTreeClassifier(max_depth=100, min_samples_split=2)

Bases: BaseTree

A decision tree classifier.

Source code in glassbox/models/trees/_base.py
def __init__(self, max_depth: int = 100, min_samples_split: int = 2) -> None:
    """
    Initialize the base tree model.

    Parameters
    ----------
    max_depth : int, default=100
        Maximum depth of the tree.
    min_samples_split : int, default=2
        Minimum number of samples required to split an internal node.
    """
    self.max_depth = max_depth if max_depth is not None else float("inf")
    self.min_samples_split = min_samples_split
    self.root: Optional[_Node] = None

DecisionTreeRegressor

DecisionTreeRegressor(max_depth=100, min_samples_split=2)

Bases: BaseTree

A decision tree regressor.

Source code in glassbox/models/trees/_base.py
def __init__(self, max_depth: int = 100, min_samples_split: int = 2) -> None:
    """
    Initialize the base tree model.

    Parameters
    ----------
    max_depth : int, default=100
        Maximum depth of the tree.
    min_samples_split : int, default=2
        Minimum number of samples required to split an internal node.
    """
    self.max_depth = max_depth if max_depth is not None else float("inf")
    self.min_samples_split = min_samples_split
    self.root: Optional[_Node] = None

RandomForestClassifier

RandomForestClassifier(
    n_estimators=100, max_depth=100, min_samples_split=2
)

Bases: BaseRandomForest

Random Forest classifier using Decision Tree classification models.

Initialize the random forest classifier.

Parameters:

Name               Type  Description                                                     Default
n_estimators       int   The number of trees in the forest.                              100
max_depth          int   Maximum depth of individual trees.                              100
min_samples_split  int   Minimum number of samples required to split an internal node.  2
Source code in glassbox/models/ensemble/classifier.py
def __init__(
    self, n_estimators: int = 100, max_depth: int = 100, min_samples_split: int = 2
) -> None:
    """
    Initialize the random forest classifier.

    Parameters
    ----------
    n_estimators : int, default=100
        The number of trees in the forest.
    max_depth : int, default=100
        Maximum depth of individual trees.
    min_samples_split : int, default=2
        Minimum number of samples required to split an internal node.
    """
    super().__init__(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
    )
    self.trees: List[DecisionTreeClassifier] = []

RandomForestRegressor

RandomForestRegressor(
    n_estimators=100, max_depth=100, min_samples_split=2
)

Bases: BaseRandomForest

Random Forest regressor using Decision Tree regression models.

Initialize the random forest regressor.

Parameters:

Name               Type  Description                                                     Default
n_estimators       int   The number of trees in the forest.                              100
max_depth          int   Maximum depth of individual trees.                              100
min_samples_split  int   Minimum number of samples required to split an internal node.  2
Source code in glassbox/models/ensemble/regressor.py
def __init__(
    self, n_estimators: int = 100, max_depth: int = 100, min_samples_split: int = 2
) -> None:
    """
    Initialize the random forest regressor.

    Parameters
    ----------
    n_estimators : int, default=100
        The number of trees in the forest.
    max_depth : int, default=100
        Maximum depth of individual trees.
    min_samples_split : int, default=2
        Minimum number of samples required to split an internal node.
    """
    super().__init__(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
    )
    self.trees: List[DecisionTreeRegressor] = []

KNeighborsClassifier

KNeighborsClassifier(
    k=5, metric=EUCLIDEAN, algorithm=BRUTE_FORCE
)

Bases: BaseKNN

Source code in glassbox/models/neighbors/_knn.py
def __init__(
    self,
    k: int = 5,
    metric: DistanceMetric = DistanceMetric.EUCLIDEAN,
    algorithm: SearchAlgorithm = SearchAlgorithm.BRUTE_FORCE,
) -> None:
    """
    Initialize the BaseKNN estimator.

    Parameters
    ----------
    k : int, default=5
        Number of neighbors to use.
    metric : DistanceMetric, default=DistanceMetric.EUCLIDEAN
        Distance metric to compute distances.
    algorithm : SearchAlgorithm, default=SearchAlgorithm.BRUTE_FORCE
        Algorithm used to compute the nearest neighbors.
    """
    self.k: int = k
    self.metric: DistanceMetric = metric
    self.algorithm: SearchAlgorithm = algorithm
    self.index: BaseIndex | None = None
    self.y_train: np.ndarray | None = None

KNeighborsRegressor

KNeighborsRegressor(
    k=5, metric=EUCLIDEAN, algorithm=BRUTE_FORCE
)

Bases: BaseKNN

Source code in glassbox/models/neighbors/_knn.py
def __init__(
    self,
    k: int = 5,
    metric: DistanceMetric = DistanceMetric.EUCLIDEAN,
    algorithm: SearchAlgorithm = SearchAlgorithm.BRUTE_FORCE,
) -> None:
    """
    Initialize the BaseKNN estimator.

    Parameters
    ----------
    k : int, default=5
        Number of neighbors to use.
    metric : DistanceMetric, default=DistanceMetric.EUCLIDEAN
        Distance metric to compute distances.
    algorithm : SearchAlgorithm, default=SearchAlgorithm.BRUTE_FORCE
        Algorithm used to compute the nearest neighbors.
    """
    self.k: int = k
    self.metric: DistanceMetric = metric
    self.algorithm: SearchAlgorithm = algorithm
    self.index: BaseIndex | None = None
    self.y_train: np.ndarray | None = None

DistanceMetric

Bases: Enum

SearchAlgorithm

Bases: Enum

GaussianNB

GaussianNB(epsilon=1e-09)

Bases: BaseModel

Gaussian Naive Bayes classifier.

A probabilistic classifier based on Bayes' theorem with the assumption that features follow a Gaussian (normal) distribution within each class.

Parameters:

Name     Type   Description                                                          Default
epsilon  float  Small constant to avoid division by zero in variance calculations.  1e-9

Attributes:

Name             Type     Description
epsilon          float    Small constant to avoid division by zero.
classes          ndarray  Unique class labels, shape (n_classes,).
class_priors     dict     Prior probability for each class.
class_means      dict     Mean of each feature per class.
class_variances  dict     Variance of each feature per class.
Source code in glassbox/models/gaussian_nb/gaussian_nb.py
def __init__(self, epsilon: float = 1e-9) -> None:
    """
    Initialize the Gaussian Naive Bayes classifier.

    Parameters
    ----------
    epsilon : float, default=1e-9
        Small constant to avoid division by zero in variance calculations.
    """
    self.epsilon: float = epsilon
    self.classes: np.ndarray = np.array([])
    self.class_priors: dict = {}
    self.class_means: dict = {}
    self.class_variances: dict = {}

fit

fit(X, y)

Fit the Gaussian Naive Bayes model to training data.

Calculates the mean, variance, and prior probability for each feature in each class.

Parameters:

Name  Type     Description                                      Default
X     ndarray  Training data of shape (n_samples, n_features).  required
y     ndarray  Target values of shape (n_samples,).             required

Returns:

Type  Description
Self  The fitted model.

Raises:

Type        Description
ValueError  If X and y have incompatible dimensions.

Source code in glassbox/models/gaussian_nb/gaussian_nb.py
def fit(self, X: np.ndarray, y: np.ndarray) -> Self:
    """
    Fit the Gaussian Naive Bayes model to training data.

    Calculates the mean, variance, and prior probability for each feature
    in each class.

    Parameters
    ----------
    X : np.ndarray
        Training data of shape (n_samples, n_features).
    y : np.ndarray
        Target values of shape (n_samples,).

    Returns
    -------
    Self
        The fitted model.

    Raises
    ------
    ValueError
        If X and y have incompatible dimensions.
    """
    if X.shape[0] != y.shape[0]:
        raise ValueError(
            f"X and y must have the same number of samples, "
            f"got {X.shape[0]} and {y.shape[0]}"
        )

    self.classes = np.unique(y)

    for cls in self.classes:
        X_cls = X[y == cls]
        self.class_means[cls] = np.mean(X_cls, axis=0)
        self.class_variances[cls] = np.var(X_cls, axis=0)
        self.class_priors[cls] = X_cls.shape[0] / X.shape[0]

    return self

predict

predict(X, **kwargs)

Predict class labels for samples in X.

Parameters:

Name      Type     Description                                            Default
X         ndarray  Data to predict on, of shape (n_samples, n_features).  required
**kwargs  Any      Additional keyword arguments (unused).                 {}

Returns:

Type     Description
ndarray  Predicted class labels of shape (n_samples,).

Raises:

Type        Description
ValueError  If model has not been fitted yet.

Source code in glassbox/models/gaussian_nb/gaussian_nb.py
def predict(self, X: np.ndarray, **kwargs: Any) -> np.ndarray:
    """
    Predict class labels for samples in X.

    Parameters
    ----------
    X : np.ndarray
        Data to predict on, of shape (n_samples, n_features).
    **kwargs : Any
        Additional keyword arguments (unused).

    Returns
    -------
    np.ndarray
        Predicted class labels of shape (n_samples,).

    Raises
    ------
    ValueError
        If model has not been fitted yet.
    """
    if len(self.classes) == 0:
        raise ValueError("Model has not been fitted yet")

    probabilities = self.predict_proba(X)
    class_indices = np.argmax(probabilities, axis=1)
    return self.classes[class_indices]

predict_proba

predict_proba(X)

Predict class probabilities for samples in X.

Parameters:

Name  Type     Description                                            Default
X     ndarray  Data to predict on, of shape (n_samples, n_features).  required

Returns:

Type     Description
ndarray  Predicted class probabilities of shape (n_samples, n_classes); each row sums to 1.0.

Raises:

Type        Description
ValueError  If model has not been fitted yet.

Source code in glassbox/models/gaussian_nb/gaussian_nb.py
def predict_proba(self, X: np.ndarray) -> np.ndarray:
    """
    Predict class probabilities for samples in X.

    Parameters
    ----------
    X : np.ndarray
        Data to predict on, of shape (n_samples, n_features).

    Returns
    -------
    np.ndarray
        Predicted class probabilities of shape (n_samples, n_classes).
        Each row sums to 1.0.

    Raises
    ------
    ValueError
        If model has not been fitted yet.
    """
    if len(self.classes) == 0:
        raise ValueError("Model has not been fitted yet")

    n_samples = X.shape[0]
    n_classes = len(self.classes)
    log_posteriors = np.zeros((n_samples, n_classes))

    for class_idx, cls in enumerate(self.classes):
        log_prior = np.log(self.class_priors[cls])
        pdf = self._calculate_pdf(class_idx, X)
        log_likelihood = np.sum(np.log(pdf), axis=1)
        log_posteriors[:, class_idx] = log_prior + log_likelihood

    # Convert from log probabilities to probabilities using softmax
    # Subtract max for numerical stability
    max_log_posteriors = np.max(log_posteriors, axis=1, keepdims=True)
    log_posteriors_stable = log_posteriors - max_log_posteriors
    probabilities = np.exp(log_posteriors_stable)
    probabilities = probabilities / np.sum(probabilities, axis=1, keepdims=True)

    return probabilities

BaseLinearModel

BaseLinearModel(
    learning_rate=0.01,
    max_epochs=1000,
    tol=1e-06,
    schedule=CONSTANT,
)

Bases: BaseModel

Abstract base class for linear models trained with gradient-based optimization.

Parameters:

Name           Type              Description                                                Default
learning_rate  float             Initial learning rate used by the optimizer.              0.01
max_epochs     int               Maximum number of optimization epochs.                    1000
tol            float             Convergence tolerance used by stopping criteria.          1e-6
schedule       LearningSchedule  Strategy used to update the learning rate across epochs.  LearningSchedule.CONSTANT
Source code in glassbox/models/linear_model/_base.py
def __init__(
    self,
    learning_rate: float = 0.01,
    max_epochs: int = 1000,
    tol: float = 1e-6,
    schedule: LearningSchedule = LearningSchedule.CONSTANT,
) -> None:
    """
    Initialize shared linear-model hyperparameters and learned coefficients.

    Parameters
    ----------
    learning_rate : float, default=0.01
        Initial learning rate used by the optimizer.
    max_epochs : int, default=1000
        Maximum number of optimization epochs.
    tol : float, default=1e-6
        Convergence tolerance used by stopping criteria.
    schedule : LearningSchedule, default=LearningSchedule.CONSTANT
        Strategy used to update the learning rate across epochs.
    """
    if learning_rate <= 0:
        raise ValueError("learning_rate must be strictly positive")
    if max_epochs <= 0:
        raise ValueError("max_epochs must be strictly positive")
    if tol < 0:
        raise ValueError("tol must be non-negative")

    self.learning_rate = learning_rate
    self.max_epochs = max_epochs
    self.tol = tol
    self.schedule = schedule
    self.weights: np.ndarray = np.array([])
    self.bias: float = 0.0

fit abstractmethod

fit(X, y)

Fit the linear model to training data.

Parameters:

Name  Type     Description                                                 Default
X     ndarray  Training feature matrix of shape (n_samples, n_features).  required
y     ndarray  Training target vector of shape (n_samples,).              required

Returns:

Type  Description
Self  The fitted model instance.

Source code in glassbox/models/linear_model/_base.py
@abstractmethod
def fit(self, X: np.ndarray, y: np.ndarray) -> Self:
    """
    Fit the linear model to training data.

    Parameters
    ----------
    X : np.ndarray
        Training feature matrix of shape (n_samples, n_features).
    y : np.ndarray
        Training target vector of shape (n_samples,).

    Returns
    -------
    Self
        The fitted model instance.
    """
    raise NotImplementedError

predict abstractmethod

predict(X, **kwargs)

Predict target values for input samples.

Parameters:

Name      Type     Description                                              Default
X         ndarray  Input feature matrix of shape (n_samples, n_features).  required
**kwargs  Any      Additional keyword arguments for prediction.            {}

Returns:

Type     Description
ndarray  Predicted values of shape (n_samples,).

Source code in glassbox/models/linear_model/_base.py
@abstractmethod
def predict(self, X: np.ndarray, **kwargs: Any) -> np.ndarray:
    """
    Predict target values for input samples.

    Parameters
    ----------
    X : np.ndarray
        Input feature matrix of shape (n_samples, n_features).
    **kwargs : Any
        Additional keyword arguments for prediction.

    Returns
    -------
    np.ndarray
        Predicted values of shape (n_samples,).
    """
    raise NotImplementedError

LinearRegression

LinearRegression(
    learning_rate=0.01,
    max_epochs=1000,
    tol=1e-06,
    schedule=CONSTANT,
)

Bases: BaseLinearModel

Linear regression model.

Source code in glassbox/models/linear_model/_base.py
def __init__(
    self,
    learning_rate: float = 0.01,
    max_epochs: int = 1000,
    tol: float = 1e-6,
    schedule: LearningSchedule = LearningSchedule.CONSTANT,
) -> None:
    """
    Initialize shared linear-model hyperparameters and learned coefficients.

    Parameters
    ----------
    learning_rate : float, default=0.01
        Initial learning rate used by the optimizer.
    max_epochs : int, default=1000
        Maximum number of optimization epochs.
    tol : float, default=1e-6
        Convergence tolerance used by stopping criteria.
    schedule : LearningSchedule, default=LearningSchedule.CONSTANT
        Strategy used to update the learning rate across epochs.
    """
    if learning_rate <= 0:
        raise ValueError("learning_rate must be strictly positive")
    if max_epochs <= 0:
        raise ValueError("max_epochs must be strictly positive")
    if tol < 0:
        raise ValueError("tol must be non-negative")

    self.learning_rate = learning_rate
    self.max_epochs = max_epochs
    self.tol = tol
    self.schedule = schedule
    self.weights: np.ndarray = np.array([])
    self.bias: float = 0.0

fit

fit(X, y)

Fit the linear regression model to training data.

Parameters:

Name  Type     Description                                                 Default
X     ndarray  Training feature matrix of shape (n_samples, n_features).  required
y     ndarray  Training target vector of shape (n_samples,).              required

Returns:

Type  Description
Self  The fitted model instance.

Source code in glassbox/models/linear_model/linear.py
def fit(self, X: np.ndarray, y: np.ndarray) -> Self:
    """
    Fit the linear regression model to training data.

    Parameters
    ----------
    X : np.ndarray
        Training feature matrix of shape (n_samples, n_features).
    y : np.ndarray
        Training target vector of shape (n_samples,).

    Returns
    -------
    Self
        The fitted model instance.
    """
    X_arr = np.asarray(X, dtype=float)
    y_arr = np.asarray(y, dtype=float)

    if X_arr.ndim != 2:
        raise ValueError("X must be a 2D array")
    if y_arr.ndim != 1:
        raise ValueError("y must be a 1D array")
    if X_arr.shape[0] != y_arr.shape[0]:
        raise ValueError("X and y must contain the same number of samples")
    if X_arr.shape[0] == 0:
        raise ValueError("X and y cannot be empty")

    n_samples, n_features = X_arr.shape
    self.weights = np.zeros(n_features, dtype=float)
    self.bias = 0.0

    previous_loss = np.inf
    for epoch in range(self.max_epochs):
        learning_rate = self._update_learning_rate(epoch)

        predictions = X_arr @ self.weights + self.bias
        errors = predictions - y_arr

        gradient_w = (2.0 / n_samples) * (X_arr.T @ errors)
        gradient_b = 2.0 * np.mean(errors)

        self.weights -= learning_rate * gradient_w
        self.bias -= learning_rate * gradient_b

        current_loss = float(np.mean(errors**2))
        if abs(previous_loss - current_loss) <= self.tol:
            break
        previous_loss = current_loss

    return self

predict

predict(X, **kwargs)

Predict continuous target values for input samples.

Parameters:

Name      Type     Description                                              Default
X         ndarray  Input feature matrix of shape (n_samples, n_features).  required
**kwargs  Any      Additional keyword arguments for prediction.            {}

Returns:

Type     Description
ndarray  Predicted values of shape (n_samples,).

Source code in glassbox/models/linear_model/linear.py
def predict(self, X: np.ndarray, **kwargs: Any) -> np.ndarray:
    """
    Predict continuous target values for input samples.

    Parameters
    ----------
    X : np.ndarray
        Input feature matrix of shape (n_samples, n_features).
    **kwargs : Any
        Additional keyword arguments for prediction.

    Returns
    -------
    np.ndarray
        Predicted values of shape (n_samples,).
    """
    if self.weights.size == 0:
        raise RuntimeError("Model is not fitted yet.")

    X_arr = np.asarray(X, dtype=float)
    if X_arr.ndim != 2:
        raise ValueError("X must be a 2D array")
    if X_arr.shape[1] != self.weights.shape[0]:
        raise ValueError("X must have the same number of features used during fit")

    return X_arr @ self.weights + self.bias

LogisticRegression

LogisticRegression(
    learning_rate=0.01,
    max_epochs=1000,
    tol=1e-06,
    schedule=CONSTANT,
)

Bases: BaseLinearModel

Logistic regression model for binary classification.

Source code in glassbox/models/linear_model/_base.py
def __init__(
    self,
    learning_rate: float = 0.01,
    max_epochs: int = 1000,
    tol: float = 1e-6,
    schedule: LearningSchedule = LearningSchedule.CONSTANT,
) -> None:
    """
    Initialize shared linear-model hyperparameters and learned coefficients.

    Parameters
    ----------
    learning_rate : float, default=0.01
        Initial learning rate used by the optimizer.
    max_epochs : int, default=1000
        Maximum number of optimization epochs.
    tol : float, default=1e-6
        Convergence tolerance used by stopping criteria.
    schedule : LearningSchedule, default=LearningSchedule.CONSTANT
        Strategy used to update the learning rate across epochs.
    """
    if learning_rate <= 0:
        raise ValueError("learning_rate must be strictly positive")
    if max_epochs <= 0:
        raise ValueError("max_epochs must be strictly positive")
    if tol < 0:
        raise ValueError("tol must be non-negative")

    self.learning_rate = learning_rate
    self.max_epochs = max_epochs
    self.tol = tol
    self.schedule = schedule
    self.weights: np.ndarray = np.array([])
    self.bias: float = 0.0

fit

fit(X, y)

Fit the logistic regression model to training data.

Parameters:

Name  Type     Description                                                 Default
X     ndarray  Training feature matrix of shape (n_samples, n_features).  required
y     ndarray  Training target vector of shape (n_samples,).              required

Returns:

Type  Description
Self  The fitted model instance.

Source code in glassbox/models/linear_model/logistic.py
def fit(self, X: np.ndarray, y: np.ndarray) -> Self:
    """
    Fit the logistic regression model to training data.

    Parameters
    ----------
    X : np.ndarray
        Training feature matrix of shape (n_samples, n_features).
    y : np.ndarray
        Training target vector of shape (n_samples,).

    Returns
    -------
    Self
        The fitted model instance.
    """
    X_arr = np.asarray(X, dtype=float)
    y_arr = np.asarray(y)

    if X_arr.ndim != 2:
        raise ValueError("X must be a 2D array")
    if y_arr.ndim != 1:
        raise ValueError("y must be a 1D array")
    if X_arr.shape[0] != y_arr.shape[0]:
        raise ValueError("X and y must contain the same number of samples")
    if X_arr.shape[0] == 0:
        raise ValueError("X and y cannot be empty")

    classes = np.unique(y_arr)
    if not np.all(np.isin(classes, np.array([0, 1]))):
        raise ValueError(f"y must contain binary labels encoded as 0 and 1, but found: {classes.tolist()}")

    y_bin = y_arr.astype(float)

    n_samples, n_features = X_arr.shape
    self.weights = np.zeros(n_features, dtype=float)
    self.bias = 0.0

    previous_loss = np.inf
    for epoch in range(self.max_epochs):
        learning_rate = self._update_learning_rate(epoch)

        logits = X_arr @ self.weights + self.bias
        probabilities = self._sigmoid(logits)
        errors = probabilities - y_bin

        gradient_w = (X_arr.T @ errors) / n_samples
        gradient_b = float(np.mean(errors))

        self.weights -= learning_rate * gradient_w
        self.bias -= learning_rate * gradient_b

        probabilities_clipped = np.clip(probabilities, 1e-15, 1.0 - 1e-15)
        current_loss = float(
            -np.mean(
                y_bin * np.log(probabilities_clipped)
                + (1.0 - y_bin) * np.log(1.0 - probabilities_clipped)
            )
        )
        if abs(previous_loss - current_loss) <= self.tol:
            break
        previous_loss = current_loss

    return self

predict

predict(X, **kwargs)

Predict class labels for input samples.

Parameters:

Name      Type     Description                                              Default
X         ndarray  Input feature matrix of shape (n_samples, n_features).  required
**kwargs  Any      Additional keyword arguments for prediction.            {}

Returns:

Type     Description
ndarray  Predicted class labels of shape (n_samples,).

Source code in glassbox/models/linear_model/logistic.py
def predict(self, X: np.ndarray, **kwargs: Any) -> np.ndarray:
    """
    Predict class labels for input samples.

    Parameters
    ----------
    X : np.ndarray
        Input feature matrix of shape (n_samples, n_features).
    **kwargs : Any
        Additional keyword arguments for prediction.

    Returns
    -------
    np.ndarray
        Predicted class labels of shape (n_samples,).
    """
    threshold = kwargs.get("threshold", 0.5)
    if not isinstance(threshold, (int, float)):
        raise ValueError("threshold must be a numeric value")
    if threshold < 0.0 or threshold > 1.0:
        raise ValueError("threshold must be in the [0.0, 1.0] interval")

    probabilities = self.predict_proba(X)
    return (probabilities >= float(threshold)).astype(int)
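
The threshold keyword (default 0.5) sets the probability cutoff for predicting class 1, for example:

preds = model.predict(X_test, threshold=0.8)  # predict 1 only when P(y=1) >= 0.8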

predict_proba

predict_proba(X)

Predict class probabilities for input samples.

Parameters:

Name  Type     Description                                              Default
X     ndarray  Input feature matrix of shape (n_samples, n_features).  required

Returns:

Type     Description
ndarray  Predicted probabilities of shape (n_samples,).

Source code in glassbox/models/linear_model/logistic.py
def predict_proba(self, X: np.ndarray) -> np.ndarray:
    """
    Predict class probabilities for input samples.

    Parameters
    ----------
    X : np.ndarray
        Input feature matrix of shape (n_samples, n_features).

    Returns
    -------
    np.ndarray
        Predicted probabilities of shape (n_samples,).
    """
    if self.weights.size == 0:
        raise RuntimeError("Model is not fitted yet.")

    X_arr = np.asarray(X, dtype=float)
    if X_arr.ndim != 2:
        raise ValueError("X must be a 2D array")
    if X_arr.shape[1] != self.weights.shape[0]:
        raise ValueError("X must have the same number of features used during fit")

    logits = X_arr @ self.weights + self.bias
    return self._sigmoid(logits)

LearningSchedule

Bases: Enum

Learning rate scheduling strategies for linear models.

Attributes:

Name         Type              Description
CONSTANT     LearningSchedule  Keep the learning rate fixed across epochs.
TIME_DECAY   LearningSchedule  Decrease the learning rate proportionally with epoch growth.
EXPONENTIAL  LearningSchedule  Decrease the learning rate exponentially over epochs.
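
A short usage sketch (assuming LearningSchedule is exported from glassbox.models alongside the linear models):

from glassbox.models import LinearRegression, LearningSchedule  # export path assumed

model = LinearRegression(learning_rate=0.05, schedule=LearningSchedule.TIME_DECAY)
model.fit(X_train, y_train)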