Models

The glassbox.models module provides machine learning algorithms for classification and regression. All models follow the fit → predict contract defined by BaseModel.


Model API

model.fit(X, y)       # Train on (n_samples, n_features) array
model.predict(X)      # Returns predictions array
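
A minimal end-to-end sketch of this contract (using DecisionTreeClassifier from this page on toy data; any BaseModel subclass works the same way):

import numpy as np
from glassbox.models import DecisionTreeClassifier

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 1])  # logical OR of the two features

model = DecisionTreeClassifier().fit(X, y)  # fit returns the model (Self), so calls chain
model.predict(np.array([[0.9, 0.9]]))       # array([1]) on this toy data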

Decision Trees

CART-style decision trees that recursively split the data on feature thresholds to minimize a cost function.

DecisionTreeClassifier

Uses Gini impurity as the split criterion and majority vote (mode) for leaf predictions.

from glassbox.models import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=10, min_samples_split=5)
model.fit(X_train, y_train)
preds = model.predict(X_test)
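
The criterion itself is easy to compute by hand; here is a standalone sketch of Gini impurity (illustrative only, not the library's internal helper):

import numpy as np

def gini_impurity(y: np.ndarray) -> float:
    # Gini = 1 - sum(p_c^2) over the class proportions p_c at a node
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

gini_impurity(np.array([0, 0, 1, 1]))  # 0.5 -> maximally impure for two classes
gini_impurity(np.array([1, 1, 1, 1]))  # 0.0 -> pure leaf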

DecisionTreeRegressor

Uses variance reduction as the split criterion and mean for leaf predictions.

from glassbox.models import DecisionTreeRegressor

model = DecisionTreeRegressor(max_depth=15)
model.fit(X_train, y_train)
preds = model.predict(X_test)
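
For reference, a standalone sketch of the variance-reduction criterion (illustrative only, not the library's internal helper):

import numpy as np

def variance_reduction(y_parent: np.ndarray, y_left: np.ndarray, y_right: np.ndarray) -> float:
    # Parent variance minus the size-weighted variance of the two children;
    # the best split is the one that maximizes this quantity.
    n = len(y_parent)
    weighted_child_var = (len(y_left) / n) * np.var(y_left) + (len(y_right) / n) * np.var(y_right)
    return float(np.var(y_parent) - weighted_child_var)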

Parameters

Parameter          Default  Description
max_depth          100      Maximum depth of the tree.
min_samples_split  2        Minimum samples needed to split a node.

Random Forests

Ensemble of decision trees trained on bootstrapped samples with random feature subsets (√n_features).
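
These two sources of randomness can be sketched in a few lines (an illustration of the idea, not the library's internals):

import numpy as np

rng = np.random.default_rng(0)

def bootstrap_sample(X: np.ndarray, y: np.ndarray):
    # Draw n_samples rows with replacement: each tree sees a different resample.
    idx = rng.integers(0, X.shape[0], size=X.shape[0])
    return X[idx], y[idx]

def feature_subset(n_features: int) -> np.ndarray:
    # Each tree considers only ~sqrt(n_features) randomly chosen features.
    k = max(1, int(np.sqrt(n_features)))
    return rng.choice(n_features, size=k, replace=False)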

RandomForestClassifier

Aggregates predictions via majority vote.

from glassbox.models import RandomForestClassifier

model = RandomForestClassifier(n_estimators=50, max_depth=10)
model.fit(X_train, y_train)
preds = model.predict(X_test)

RandomForestRegressor

Aggregates predictions via averaging.

from glassbox.models import RandomForestRegressor

model = RandomForestRegressor(n_estimators=50, max_depth=10)
model.fit(X_train, y_train)
preds = model.predict(X_test)

Parameters

Parameter          Default  Description
n_estimators       100      Number of trees in the forest.
max_depth          100      Maximum depth of each tree.
min_samples_split  2        Minimum samples needed to split a node.

K-Nearest Neighbors

Instance-based learning that predicts based on the k closest training samples.

Configuration Enums

DistanceMetric  Formula
EUCLIDEAN       √Σ(xᵢ − yᵢ)²
MANHATTAN       Σ|xᵢ − yᵢ|

SearchAlgorithm  Description
BRUTE_FORCE      Exhaustive pairwise distance computation.
KD_TREE          Space-partitioning tree for faster lookup.
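
To make the table concrete, here is a brute-force sketch of both metrics and the majority vote (an illustration, independent of the library's KD_TREE index):

import numpy as np

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.sum((a - b) ** 2)))

def manhattan(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sum(np.abs(a - b)))

def knn_vote(x: np.ndarray, X_train: np.ndarray, y_train: np.ndarray, k: int = 5, dist=euclidean):
    # BRUTE_FORCE: score every training point, then vote among the k closest.
    distances = np.array([dist(x, row) for row in X_train])
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]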

KNeighborsClassifier

Predicts via majority vote among the k nearest neighbors.

from glassbox.models import KNeighborsClassifier, DistanceMetric, SearchAlgorithm

model = KNeighborsClassifier(
    k=5,
    metric=DistanceMetric.EUCLIDEAN,
    algorithm=SearchAlgorithm.KD_TREE,
)
model.fit(X_train, y_train)
preds = model.predict(X_test)

KNeighborsRegressor

Predicts via averaging the k nearest neighbors' targets.

from glassbox.models import KNeighborsRegressor

model = KNeighborsRegressor(k=7, metric=DistanceMetric.MANHATTAN)
model.fit(X_train, y_train)
preds = model.predict(X_test)

Parameters

Parameter  Default      Description
k          5            Number of neighbors.
metric     EUCLIDEAN    Distance metric.
algorithm  BRUTE_FORCE  Nearest-neighbor search strategy.

Single-sample prediction

KNN models accept both batch input (n_samples, n_features) and single-sample input (n_features,) in predict().
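
For example, with the fitted regressor above:

preds = model.predict(X_test)    # batch input: shape (n_samples, n_features)
pred = model.predict(X_test[0])  # single-sample input: shape (n_features,)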


Gaussian Naive Bayes

A probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Features are assumed to follow a Gaussian distribution.
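
Concretely, prediction selects the class with the highest log-posterior, which is exactly what the predict_proba implementation in the API reference below computes:

\hat{y} = \arg\max_{c}\Big[\log P(c) + \sum_{i=1}^{n_{\text{features}}} \log \mathcal{N}(x_i \mid \mu_{c,i},\, \sigma^2_{c,i})\Big]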

GaussianNB

from glassbox.models import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)
preds = model.predict(X_test)

Linear Models

Models that fit a linear surface to the data, trained using gradient descent optimization.
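
For linear regression with mean-squared-error loss, each epoch applies the updates below (matching the fit() source in the API reference), where η is the learning rate:

\nabla_w = \frac{2}{n} X^\top (Xw + b - y), \qquad \nabla_b = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i), \qquad w \leftarrow w - \eta\,\nabla_w, \quad b \leftarrow b - \eta\,\nabla_b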

LinearRegression

Predicts a continuous target variable by finding the line (or hyperplane, with multiple features) of best fit.

from glassbox.models import LinearRegression

model = LinearRegression(learning_rate=0.01, max_epochs=1000)
model.fit(X_train, y_train)
preds = model.predict(X_test)

LogisticRegression

Predicts a categorical target variable using the logistic (sigmoid) function to output probabilities.
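
The sigmoid maps the linear score z = w·x + b to a probability in (0, 1):

\sigma(z) = \frac{1}{1 + e^{-z}}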

from glassbox.models import LogisticRegression

model = LogisticRegression(learning_rate=0.1, max_epochs=1000)
model.fit(X_train, y_train)
preds = model.predict(X_test)

Parameters

Parameter      Default   Description
learning_rate  0.01      Step size for gradient descent updates.
max_epochs     1000      Maximum number of optimization epochs.
tol            1e-6      Convergence tolerance for early stopping.
schedule       CONSTANT  Learning-rate schedule across epochs.

API Reference

BaseModel

Bases: ABC

fit abstractmethod

fit(X, y)

Fits the model to the training data.

Parameters:

Name  Type     Description                                      Default
X     ndarray  Training data of shape (n_samples, n_features).  required
y     ndarray  Target values of shape (n_samples,).             required

Returns:

Type  Description
Self  The fitted model.

Source code in glassbox/models/_base.py
@abstractmethod
def fit(self, X: np.ndarray, y: np.ndarray) -> Self:
    """
    Fits the model to the training data.

    Parameters
    ----------
    X : np.ndarray
        Training data of shape (n_samples, n_features).
    y : np.ndarray
        Target values of shape (n_samples,).

    Returns
    -------
    Self
        The fitted model.
    """
    raise NotImplementedError

predict abstractmethod

predict(X, **kwargs)

Predicts target values for the given data.

Parameters:

Name      Type     Description                                            Default
X         ndarray  Data to predict on, of shape (n_samples, n_features).  required
**kwargs  Any      Additional keyword arguments.                          {}

Returns:

Type     Description
ndarray  Predicted target values.

Source code in glassbox/models/_base.py
@abstractmethod
def predict(self, X: np.ndarray, **kwargs: Any) -> np.ndarray:
    """
    Predicts target values for the given data.

    Parameters
    ----------
    X : np.ndarray
        Data to predict on, of shape (n_samples, n_features).
    **kwargs : Any
        Additional keyword arguments.

    Returns
    -------
    np.ndarray
        Predicted target values.
    """
    raise NotImplementedError

DecisionTreeClassifier

DecisionTreeClassifier(max_depth=100, min_samples_split=2)

Bases: BaseTree

A decision tree classifier.

Source code in glassbox/models/trees/_base.py
def __init__(self, max_depth: int = 100, min_samples_split: int = 2) -> None:
    """
    Initialize the base tree model.

    Parameters
    ----------
    max_depth : int, default=100
        Maximum depth of the tree.
    min_samples_split : int, default=2
        Minimum number of samples required to split an internal node.
    """
    self.max_depth = max_depth if max_depth is not None else float("inf")
    self.min_samples_split = min_samples_split
    self.root: Optional[_Node] = None

DecisionTreeRegressor

DecisionTreeRegressor(max_depth=100, min_samples_split=2)

Bases: BaseTree

A decision tree regressor.

Source code in glassbox/models/trees/_base.py
def __init__(self, max_depth: int = 100, min_samples_split: int = 2) -> None:
    """
    Initialize the base tree model.

    Parameters
    ----------
    max_depth : int, default=100
        Maximum depth of the tree.
    min_samples_split : int, default=2
        Minimum number of samples required to split an internal node.
    """
    self.max_depth = max_depth if max_depth is not None else float("inf")
    self.min_samples_split = min_samples_split
    self.root: Optional[_Node] = None

RandomForestClassifier

RandomForestClassifier(
    n_estimators=100, max_depth=100, min_samples_split=2
)

Bases: BaseRandomForest

Random Forest classifier using Decision Tree classification models.

Initialize the random forest classifier.

Parameters:

Name               Type  Description                                                     Default
n_estimators       int   The number of trees in the forest.                              100
max_depth          int   Maximum depth of individual trees.                              100
min_samples_split  int   Minimum number of samples required to split an internal node.  2
Source code in glassbox/models/ensemble/classifier.py
def __init__(
    self, n_estimators: int = 100, max_depth: int = 100, min_samples_split: int = 2
) -> None:
    """
    Initialize the random forest classifier.

    Parameters
    ----------
    n_estimators : int, default=100
        The number of trees in the forest.
    max_depth : int, default=100
        Maximum depth of individual trees.
    min_samples_split : int, default=2
        Minimum number of samples required to split an internal node.
    """
    super().__init__(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
    )
    self.trees: List[DecisionTreeClassifier] = []

RandomForestRegressor

RandomForestRegressor(
    n_estimators=100, max_depth=100, min_samples_split=2
)

Bases: BaseRandomForest

Random Forest regressor using Decision Tree regression models.

Initialize the random forest regressor.

Parameters:

Name               Type  Description                                                     Default
n_estimators       int   The number of trees in the forest.                              100
max_depth          int   Maximum depth of individual trees.                              100
min_samples_split  int   Minimum number of samples required to split an internal node.  2
Source code in glassbox/models/ensemble/regressor.py
def __init__(
    self, n_estimators: int = 100, max_depth: int = 100, min_samples_split: int = 2
) -> None:
    """
    Initialize the random forest regressor.

    Parameters
    ----------
    n_estimators : int, default=100
        The number of trees in the forest.
    max_depth : int, default=100
        Maximum depth of individual trees.
    min_samples_split : int, default=2
        Minimum number of samples required to split an internal node.
    """
    super().__init__(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
    )
    self.trees: List[DecisionTreeRegressor] = []

KNeighborsClassifier

KNeighborsClassifier(
    k=5, metric=EUCLIDEAN, algorithm=BRUTE_FORCE
)

Bases: BaseKNN

Source code in glassbox/models/neighbors/_knn.py
def __init__(
    self,
    k: int = 5,
    metric: DistanceMetric = DistanceMetric.EUCLIDEAN,
    algorithm: SearchAlgorithm = SearchAlgorithm.BRUTE_FORCE,
) -> None:
    """
    Initialize the BaseKNN estimator.

    Parameters
    ----------
    k : int, default=5
        Number of neighbors to use.
    metric : DistanceMetric, default=DistanceMetric.EUCLIDEAN
        Distance metric to compute distances.
    algorithm : SearchAlgorithm, default=SearchAlgorithm.BRUTE_FORCE
        Algorithm used to compute the nearest neighbors.
    """
    self.k: int = k
    self.metric: DistanceMetric = metric
    self.algorithm: SearchAlgorithm = algorithm
    self.index: BaseIndex | None = None
    self.y_train: np.ndarray | None = None

KNeighborsRegressor

KNeighborsRegressor(
    k=5, metric=EUCLIDEAN, algorithm=BRUTE_FORCE
)

Bases: BaseKNN

Source code in glassbox/models/neighbors/_knn.py
def __init__(
    self,
    k: int = 5,
    metric: DistanceMetric = DistanceMetric.EUCLIDEAN,
    algorithm: SearchAlgorithm = SearchAlgorithm.BRUTE_FORCE,
) -> None:
    """
    Initialize the BaseKNN estimator.

    Parameters
    ----------
    k : int, default=5
        Number of neighbors to use.
    metric : DistanceMetric, default=DistanceMetric.EUCLIDEAN
        Distance metric to compute distances.
    algorithm : SearchAlgorithm, default=SearchAlgorithm.BRUTE_FORCE
        Algorithm used to compute the nearest neighbors.
    """
    self.k: int = k
    self.metric: DistanceMetric = metric
    self.algorithm: SearchAlgorithm = algorithm
    self.index: BaseIndex | None = None
    self.y_train: np.ndarray | None = None

DistanceMetric

Bases: Enum

SearchAlgorithm

Bases: Enum

GaussianNB

GaussianNB(epsilon=1e-09)

Bases: BaseModel

Gaussian Naive Bayes classifier.

A probabilistic classifier based on Bayes' theorem with the assumption that features follow a Gaussian (normal) distribution within each class.

Parameters:

Name     Type   Description                                                          Default
epsilon  float  Small constant to avoid division by zero in variance calculations.  1e-9

Attributes:

Name             Type     Description
epsilon          float    Small constant to avoid division by zero.
classes          ndarray  Unique class labels, shape (n_classes,).
class_priors     dict     Prior probability for each class.
class_means      dict     Mean of each feature per class.
class_variances  dict     Variance of each feature per class.
Source code in glassbox/models/gaussian_nb/gaussian_nb.py
def __init__(self, epsilon: float = 1e-9) -> None:
    """
    Initialize the Gaussian Naive Bayes classifier.

    Parameters
    ----------
    epsilon : float, default=1e-9
        Small constant to avoid division by zero in variance calculations.
    """
    self.epsilon: float = epsilon
    self.classes: np.ndarray = np.array([])
    self.class_priors: dict = {}
    self.class_means: dict = {}
    self.class_variances: dict = {}

fit

fit(X, y)

Fit the Gaussian Naive Bayes model to training data.

Calculates the mean, variance, and prior probability for each feature in each class.

Parameters:

Name  Type     Description                                      Default
X     ndarray  Training data of shape (n_samples, n_features).  required
y     ndarray  Target values of shape (n_samples,).             required

Returns:

Type  Description
Self  The fitted model.

Raises:

Type        Description
ValueError  If X and y have incompatible dimensions.

Source code in glassbox/models/gaussian_nb/gaussian_nb.py
def fit(self, X: np.ndarray, y: np.ndarray) -> Self:
    """
    Fit the Gaussian Naive Bayes model to training data.

    Calculates the mean, variance, and prior probability for each feature
    in each class.

    Parameters
    ----------
    X : np.ndarray
        Training data of shape (n_samples, n_features).
    y : np.ndarray
        Target values of shape (n_samples,).

    Returns
    -------
    Self
        The fitted model.

    Raises
    ------
    ValueError
        If X and y have incompatible dimensions.
    """
    if X.shape[0] != y.shape[0]:
        raise ValueError(
            f"X and y must have the same number of samples, "
            f"got {X.shape[0]} and {y.shape[0]}"
        )

    self.classes = np.unique(y)

    for cls in self.classes:
        X_cls = X[y == cls]
        self.class_means[cls] = np.mean(X_cls, axis=0)
        self.class_variances[cls] = np.var(X_cls, axis=0)
        self.class_priors[cls] = X_cls.shape[0] / X.shape[0]

    return self

predict

predict(X, **kwargs)

Predict class labels for samples in X.

Parameters:

Name      Type     Description                                            Default
X         ndarray  Data to predict on, of shape (n_samples, n_features).  required
**kwargs  Any      Additional keyword arguments (unused).                 {}

Returns:

Type     Description
ndarray  Predicted class labels of shape (n_samples,).

Raises:

Type        Description
ValueError  If model has not been fitted yet.

Source code in glassbox/models/gaussian_nb/gaussian_nb.py
def predict(self, X: np.ndarray, **kwargs: Any) -> np.ndarray:
    """
    Predict class labels for samples in X.

    Parameters
    ----------
    X : np.ndarray
        Data to predict on, of shape (n_samples, n_features).
    **kwargs : Any
        Additional keyword arguments (unused).

    Returns
    -------
    np.ndarray
        Predicted class labels of shape (n_samples,).

    Raises
    ------
    ValueError
        If model has not been fitted yet.
    """
    if len(self.classes) == 0:
        raise ValueError("Model has not been fitted yet")

    probabilities = self.predict_proba(X)
    class_indices = np.argmax(probabilities, axis=1)
    return self.classes[class_indices]

predict_proba

predict_proba(X)

Predict class probabilities for samples in X.

Parameters:

Name  Type     Description                                            Default
X     ndarray  Data to predict on, of shape (n_samples, n_features).  required

Returns:

Type     Description
ndarray  Predicted class probabilities of shape (n_samples, n_classes); each row sums to 1.0.

Raises:

Type        Description
ValueError  If model has not been fitted yet.

Source code in glassbox/models/gaussian_nb/gaussian_nb.py
def predict_proba(self, X: np.ndarray) -> np.ndarray:
    """
    Predict class probabilities for samples in X.

    Parameters
    ----------
    X : np.ndarray
        Data to predict on, of shape (n_samples, n_features).

    Returns
    -------
    np.ndarray
        Predicted class probabilities of shape (n_samples, n_classes).
        Each row sums to 1.0.

    Raises
    ------
    ValueError
        If model has not been fitted yet.
    """
    if len(self.classes) == 0:
        raise ValueError("Model has not been fitted yet")

    n_samples = X.shape[0]
    n_classes = len(self.classes)
    log_posteriors = np.zeros((n_samples, n_classes))

    for class_idx, cls in enumerate(self.classes):
        log_prior = np.log(self.class_priors[cls])
        pdf = self._calculate_pdf(class_idx, X)
        log_likelihood = np.sum(np.log(pdf), axis=1)
        log_posteriors[:, class_idx] = log_prior + log_likelihood

    # Convert from log probabilities to probabilities using softmax
    # Subtract max for numerical stability
    max_log_posteriors = np.max(log_posteriors, axis=1, keepdims=True)
    log_posteriors_stable = log_posteriors - max_log_posteriors
    probabilities = np.exp(log_posteriors_stable)
    probabilities = probabilities / np.sum(probabilities, axis=1, keepdims=True)

    return probabilities

BaseLinearModel

BaseLinearModel(
    learning_rate=0.01,
    max_epochs=1000,
    tol=1e-06,
    schedule=CONSTANT,
)

Bases: BaseModel

Abstract base class for linear models trained with gradient-based optimization.

Parameters:

Name           Type              Description                                                Default
learning_rate  float             Initial learning rate used by the optimizer.              0.01
max_epochs     int               Maximum number of optimization epochs.                    1000
tol            float             Convergence tolerance used by stopping criteria.          1e-6
schedule       LearningSchedule  Strategy used to update the learning rate across epochs.  LearningSchedule.CONSTANT
Source code in glassbox/models/linear_model/_base.py
def __init__(
    self,
    learning_rate: float = 0.01,
    max_epochs: int = 1000,
    tol: float = 1e-6,
    schedule: LearningSchedule = LearningSchedule.CONSTANT,
) -> None:
    """
    Initialize shared linear-model hyperparameters and learned coefficients.

    Parameters
    ----------
    learning_rate : float, default=0.01
        Initial learning rate used by the optimizer.
    max_epochs : int, default=1000
        Maximum number of optimization epochs.
    tol : float, default=1e-6
        Convergence tolerance used by stopping criteria.
    schedule : LearningSchedule, default=LearningSchedule.CONSTANT
        Strategy used to update the learning rate across epochs.
    """
    if learning_rate <= 0:
        raise ValueError("learning_rate must be strictly positive")
    if max_epochs <= 0:
        raise ValueError("max_epochs must be strictly positive")
    if tol < 0:
        raise ValueError("tol must be non-negative")

    self.learning_rate = learning_rate
    self.max_epochs = max_epochs
    self.tol = tol
    self.schedule = schedule
    self.weights: np.ndarray = np.array([])
    self.bias: float = 0.0

fit abstractmethod

fit(X, y)

Fit the linear model to training data.

Parameters:

Name  Type     Description                                                 Default
X     ndarray  Training feature matrix of shape (n_samples, n_features).  required
y     ndarray  Training target vector of shape (n_samples,).              required

Returns:

Type  Description
Self  The fitted model instance.

Source code in glassbox/models/linear_model/_base.py
@abstractmethod
def fit(self, X: np.ndarray, y: np.ndarray) -> Self:
    """
    Fit the linear model to training data.

    Parameters
    ----------
    X : np.ndarray
        Training feature matrix of shape (n_samples, n_features).
    y : np.ndarray
        Training target vector of shape (n_samples,).

    Returns
    -------
    Self
        The fitted model instance.
    """
    raise NotImplementedError

predict abstractmethod

predict(X, **kwargs)

Predict target values for input samples.

Parameters:

Name      Type     Description                                              Default
X         ndarray  Input feature matrix of shape (n_samples, n_features).  required
**kwargs  Any      Additional keyword arguments for prediction.            {}

Returns:

Type     Description
ndarray  Predicted values of shape (n_samples,).

Source code in glassbox/models/linear_model/_base.py
@abstractmethod
def predict(self, X: np.ndarray, **kwargs: Any) -> np.ndarray:
    """
    Predict target values for input samples.

    Parameters
    ----------
    X : np.ndarray
        Input feature matrix of shape (n_samples, n_features).
    **kwargs : Any
        Additional keyword arguments for prediction.

    Returns
    -------
    np.ndarray
        Predicted values of shape (n_samples,).
    """
    raise NotImplementedError

LinearRegression

LinearRegression(
    learning_rate=0.01,
    max_epochs=1000,
    tol=1e-06,
    schedule=CONSTANT,
)

Bases: BaseLinearModel

Linear regression model.

Source code in glassbox/models/linear_model/_base.py
def __init__(
    self,
    learning_rate: float = 0.01,
    max_epochs: int = 1000,
    tol: float = 1e-6,
    schedule: LearningSchedule = LearningSchedule.CONSTANT,
) -> None:
    """
    Initialize shared linear-model hyperparameters and learned coefficients.

    Parameters
    ----------
    learning_rate : float, default=0.01
        Initial learning rate used by the optimizer.
    max_epochs : int, default=1000
        Maximum number of optimization epochs.
    tol : float, default=1e-6
        Convergence tolerance used by stopping criteria.
    schedule : LearningSchedule, default=LearningSchedule.CONSTANT
        Strategy used to update the learning rate across epochs.
    """
    if learning_rate <= 0:
        raise ValueError("learning_rate must be strictly positive")
    if max_epochs <= 0:
        raise ValueError("max_epochs must be strictly positive")
    if tol < 0:
        raise ValueError("tol must be non-negative")

    self.learning_rate = learning_rate
    self.max_epochs = max_epochs
    self.tol = tol
    self.schedule = schedule
    self.weights: np.ndarray = np.array([])
    self.bias: float = 0.0

fit

fit(X, y)

Fit the linear regression model to training data.

Parameters:

Name  Type     Description                                                 Default
X     ndarray  Training feature matrix of shape (n_samples, n_features).  required
y     ndarray  Training target vector of shape (n_samples,).              required

Returns:

Type  Description
Self  The fitted model instance.

Source code in glassbox/models/linear_model/linear.py
def fit(self, X: np.ndarray, y: np.ndarray) -> Self:
    """
    Fit the linear regression model to training data.

    Parameters
    ----------
    X : np.ndarray
        Training feature matrix of shape (n_samples, n_features).
    y : np.ndarray
        Training target vector of shape (n_samples,).

    Returns
    -------
    Self
        The fitted model instance.
    """
    X_arr = np.asarray(X, dtype=float)
    y_arr = np.asarray(y, dtype=float)

    if X_arr.ndim != 2:
        raise ValueError("X must be a 2D array")
    if y_arr.ndim != 1:
        raise ValueError("y must be a 1D array")
    if X_arr.shape[0] != y_arr.shape[0]:
        raise ValueError("X and y must contain the same number of samples")
    if X_arr.shape[0] == 0:
        raise ValueError("X and y cannot be empty")

    n_samples, n_features = X_arr.shape
    self.weights = np.zeros(n_features, dtype=float)
    self.bias = 0.0

    previous_loss = np.inf
    for epoch in range(self.max_epochs):
        learning_rate = self._update_learning_rate(epoch)

        predictions = X_arr @ self.weights + self.bias
        errors = predictions - y_arr

        gradient_w = (2.0 / n_samples) * (X_arr.T @ errors)
        gradient_b = 2.0 * np.mean(errors)

        self.weights -= learning_rate * gradient_w
        self.bias -= learning_rate * gradient_b

        current_loss = float(np.mean(errors**2))
        if abs(previous_loss - current_loss) <= self.tol:
            break
        previous_loss = current_loss

    return self

predict

predict(X, **kwargs)

Predict continuous target values for input samples.

Parameters:

Name      Type     Description                                              Default
X         ndarray  Input feature matrix of shape (n_samples, n_features).  required
**kwargs  Any      Additional keyword arguments for prediction.            {}

Returns:

Type     Description
ndarray  Predicted values of shape (n_samples,).

Source code in glassbox/models/linear_model/linear.py
def predict(self, X: np.ndarray, **kwargs: Any) -> np.ndarray:
    """
    Predict continuous target values for input samples.

    Parameters
    ----------
    X : np.ndarray
        Input feature matrix of shape (n_samples, n_features).
    **kwargs : Any
        Additional keyword arguments for prediction.

    Returns
    -------
    np.ndarray
        Predicted values of shape (n_samples,).
    """
    if self.weights.size == 0:
        raise RuntimeError("Model is not fitted yet.")

    X_arr = np.asarray(X, dtype=float)
    if X_arr.ndim != 2:
        raise ValueError("X must be a 2D array")
    if X_arr.shape[1] != self.weights.shape[0]:
        raise ValueError("X must have the same number of features used during fit")

    return X_arr @ self.weights + self.bias

LogisticRegression

LogisticRegression(
    learning_rate=0.01,
    max_epochs=1000,
    tol=1e-06,
    schedule=CONSTANT,
)

Bases: BaseLinearModel

Logistic regression model for binary classification.

Source code in glassbox/models/linear_model/_base.py
def __init__(
    self,
    learning_rate: float = 0.01,
    max_epochs: int = 1000,
    tol: float = 1e-6,
    schedule: LearningSchedule = LearningSchedule.CONSTANT,
) -> None:
    """
    Initialize shared linear-model hyperparameters and learned coefficients.

    Parameters
    ----------
    learning_rate : float, default=0.01
        Initial learning rate used by the optimizer.
    max_epochs : int, default=1000
        Maximum number of optimization epochs.
    tol : float, default=1e-6
        Convergence tolerance used by stopping criteria.
    schedule : LearningSchedule, default=LearningSchedule.CONSTANT
        Strategy used to update the learning rate across epochs.
    """
    if learning_rate <= 0:
        raise ValueError("learning_rate must be strictly positive")
    if max_epochs <= 0:
        raise ValueError("max_epochs must be strictly positive")
    if tol < 0:
        raise ValueError("tol must be non-negative")

    self.learning_rate = learning_rate
    self.max_epochs = max_epochs
    self.tol = tol
    self.schedule = schedule
    self.weights: np.ndarray = np.array([])
    self.bias: float = 0.0

fit

fit(X, y)

Fit the logistic regression model to training data.

Parameters:

Name  Type     Description                                                 Default
X     ndarray  Training feature matrix of shape (n_samples, n_features).  required
y     ndarray  Training target vector of shape (n_samples,).              required

Returns:

Type  Description
Self  The fitted model instance.

Source code in glassbox/models/linear_model/logistic.py
def fit(self, X: np.ndarray, y: np.ndarray) -> Self:
    """
    Fit the logistic regression model to training data.

    Parameters
    ----------
    X : np.ndarray
        Training feature matrix of shape (n_samples, n_features).
    y : np.ndarray
        Training target vector of shape (n_samples,).

    Returns
    -------
    Self
        The fitted model instance.
    """
    X_arr = np.asarray(X, dtype=float)
    y_arr = np.asarray(y)

    if X_arr.ndim != 2:
        raise ValueError("X must be a 2D array")
    if y_arr.ndim != 1:
        raise ValueError("y must be a 1D array")
    if X_arr.shape[0] != y_arr.shape[0]:
        raise ValueError("X and y must contain the same number of samples")
    if X_arr.shape[0] == 0:
        raise ValueError("X and y cannot be empty")

    classes = np.unique(y_arr)
    if not np.all(np.isin(classes, np.array([0, 1]))):
        raise ValueError(f"y must contain binary labels encoded as 0 and 1, but found: {classes.tolist()}")

    y_bin = y_arr.astype(float)

    n_samples, n_features = X_arr.shape
    self.weights = np.zeros(n_features, dtype=float)
    self.bias = 0.0

    previous_loss = np.inf
    for epoch in range(self.max_epochs):
        learning_rate = self._update_learning_rate(epoch)

        logits = X_arr @ self.weights + self.bias
        probabilities = self._sigmoid(logits)
        errors = probabilities - y_bin

        gradient_w = (X_arr.T @ errors) / n_samples
        gradient_b = float(np.mean(errors))

        self.weights -= learning_rate * gradient_w
        self.bias -= learning_rate * gradient_b

        probabilities_clipped = np.clip(probabilities, 1e-15, 1.0 - 1e-15)
        current_loss = float(
            -np.mean(
                y_bin * np.log(probabilities_clipped)
                + (1.0 - y_bin) * np.log(1.0 - probabilities_clipped)
            )
        )
        if abs(previous_loss - current_loss) <= self.tol:
            break
        previous_loss = current_loss

    return self

predict

predict(X, **kwargs)

Predict class labels for input samples.

Parameters:

Name      Type     Description                                              Default
X         ndarray  Input feature matrix of shape (n_samples, n_features).  required
**kwargs  Any      Additional keyword arguments for prediction.            {}

Returns:

Type     Description
ndarray  Predicted class labels of shape (n_samples,).

Source code in glassbox/models/linear_model/logistic.py
def predict(self, X: np.ndarray, **kwargs: Any) -> np.ndarray:
    """
    Predict class labels for input samples.

    Parameters
    ----------
    X : np.ndarray
        Input feature matrix of shape (n_samples, n_features).
    **kwargs : Any
        Additional keyword arguments for prediction.

    Returns
    -------
    np.ndarray
        Predicted class labels of shape (n_samples,).
    """
    threshold = kwargs.get("threshold", 0.5)
    if not isinstance(threshold, (int, float)):
        raise ValueError("threshold must be a numeric value")
    if threshold < 0.0 or threshold > 1.0:
        raise ValueError("threshold must be in the [0.0, 1.0] interval")

    probabilities = self.predict_proba(X)
    return (probabilities >= float(threshold)).astype(int)
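
The threshold keyword (default 0.5) sets the probability cutoff for predicting class 1, for example:

preds = model.predict(X_test, threshold=0.8)  # predict 1 only when P(y=1) >= 0.8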

predict_proba

predict_proba(X)

Predict class probabilities for input samples.

Parameters:

Name  Type     Description                                              Default
X     ndarray  Input feature matrix of shape (n_samples, n_features).  required

Returns:

Type     Description
ndarray  Predicted probabilities of shape (n_samples,).

Source code in glassbox/models/linear_model/logistic.py
def predict_proba(self, X: np.ndarray) -> np.ndarray:
    """
    Predict class probabilities for input samples.

    Parameters
    ----------
    X : np.ndarray
        Input feature matrix of shape (n_samples, n_features).

    Returns
    -------
    np.ndarray
        Predicted probabilities of shape (n_samples,).
    """
    if self.weights.size == 0:
        raise RuntimeError("Model is not fitted yet.")

    X_arr = np.asarray(X, dtype=float)
    if X_arr.ndim != 2:
        raise ValueError("X must be a 2D array")
    if X_arr.shape[1] != self.weights.shape[0]:
        raise ValueError("X must have the same number of features used during fit")

    logits = X_arr @ self.weights + self.bias
    return self._sigmoid(logits)

LearningSchedule

Bases: Enum

Learning rate scheduling strategies for linear models.

Attributes:

Name         Type              Description
CONSTANT     LearningSchedule  Keep the learning rate fixed across epochs.
TIME_DECAY   LearningSchedule  Decrease the learning rate proportionally with epoch growth.
EXPONENTIAL  LearningSchedule  Decrease the learning rate exponentially over epochs.
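
A short usage sketch (assuming LearningSchedule is exported from glassbox.models alongside the linear models):

from glassbox.models import LinearRegression, LearningSchedule  # export path assumed

model = LinearRegression(learning_rate=0.05, schedule=LearningSchedule.TIME_DECAY)
model.fit(X_train, y_train)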