Models¶
The glassbox.models module provides machine learning algorithms for classification and regression. All models follow the fit → predict contract defined by BaseModel.
Model API¶
model.fit(X, y) # Train on (n_samples, n_features) array
model.predict(X) # Returns predictions array
Decision Trees¶
CART-style decision trees that recursively split the data on feature thresholds to minimize a cost function.
DecisionTreeClassifier¶
Uses Gini impurity as the split criterion and majority vote (mode) for leaf predictions.
from glassbox.models import DecisionTreeClassifier
model = DecisionTreeClassifier(max_depth=10, min_samples_split=5)
model.fit(X_train, y_train)
preds = model.predict(X_test)
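The Gini criterion has a compact closed form; as an illustrative sketch (not the library's internal implementation):

```python
import numpy as np

def gini_impurity(y):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# A pure node has impurity 0; a perfectly mixed binary node has impurity 0.5.
```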
DecisionTreeRegressor¶
Uses variance reduction as the split criterion and mean for leaf predictions.
from glassbox.models import DecisionTreeRegressor
model = DecisionTreeRegressor(max_depth=15)
model.fit(X_train, y_train)
preds = model.predict(X_test)
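Variance reduction can be sketched as the drop from the parent node's variance to the size-weighted variance of the two children (an illustration, not the library's code):

```python
import numpy as np

def variance_reduction(y, left_mask):
    """Drop in variance from splitting y into left/right subsets."""
    left, right = y[left_mask], y[~left_mask]
    n = len(y)
    weighted = (len(left) / n) * np.var(left) + (len(right) / n) * np.var(right)
    return np.var(y) - weighted
```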
Parameters¶
| Parameter | Default | Description |
|---|---|---|
| `max_depth` | 100 | Maximum depth of the tree. |
| `min_samples_split` | 2 | Minimum samples needed to split a node. |
Random Forests¶
Ensemble of decision trees trained on bootstrap samples, with a random subset of √n_features features considered at each split.
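The two sources of randomness can be sketched with NumPy (illustrative; the library's internal sampling may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 150, 16

# Bootstrap: each tree trains on row indices drawn with replacement.
boot_idx = rng.integers(0, n_samples, size=n_samples)

# Feature subsampling: sqrt(n_features) candidate features per split.
k = int(np.sqrt(n_features))
feat_idx = rng.choice(n_features, size=k, replace=False)
```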
RandomForestClassifier¶
Aggregates predictions via majority vote.
from glassbox.models import RandomForestClassifier
model = RandomForestClassifier(n_estimators=50, max_depth=10)
model.fit(X_train, y_train)
preds = model.predict(X_test)
RandomForestRegressor¶
Aggregates predictions via averaging.
from glassbox.models import RandomForestRegressor
model = RandomForestRegressor(n_estimators=50, max_depth=10)
model.fit(X_train, y_train)
preds = model.predict(X_test)
Parameters¶
| Parameter | Default | Description |
|---|---|---|
| `n_estimators` | 100 | Number of trees in the forest. |
| `max_depth` | 100 | Maximum depth of each tree. |
| `min_samples_split` | 2 | Minimum samples needed to split a node. |
K-Nearest Neighbors¶
Instance-based learning that predicts based on the k closest training samples.
Configuration Enums¶
| DistanceMetric | Formula |
|---|---|
| `EUCLIDEAN` | √Σ(xᵢ − yᵢ)² |
| `MANHATTAN` | Σ\|xᵢ − yᵢ\| |

| SearchAlgorithm | Description |
|---|---|
| `BRUTE_FORCE` | Exhaustive pairwise distance computation. |
| `KD_TREE` | Space-partitioning tree for faster lookup. |
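The two metrics correspond to short NumPy expressions; an illustrative sketch of the formulas above:

```python
import numpy as np

def euclidean(x, y):
    """Straight-line distance: sqrt of summed squared differences."""
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    """City-block distance: sum of absolute differences."""
    return np.sum(np.abs(x - y))
```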
KNeighborsClassifier¶
Predicts via majority vote among the k nearest neighbors.
from glassbox.models import KNeighborsClassifier, DistanceMetric, SearchAlgorithm
model = KNeighborsClassifier(
k=5,
metric=DistanceMetric.EUCLIDEAN,
algorithm=SearchAlgorithm.KD_TREE,
)
model.fit(X_train, y_train)
preds = model.predict(X_test)
KNeighborsRegressor¶
Predicts via averaging the k nearest neighbors' targets.
from glassbox.models import KNeighborsRegressor
model = KNeighborsRegressor(k=7, metric=DistanceMetric.MANHATTAN)
model.fit(X_train, y_train)
preds = model.predict(X_test)
Parameters¶
| Parameter | Default | Description |
|---|---|---|
| `k` | 5 | Number of neighbors. |
| `metric` | `EUCLIDEAN` | Distance metric. |
| `algorithm` | `BRUTE_FORCE` | Nearest-neighbor search strategy. |
Single-sample prediction
KNN models accept both batch input (n_samples, n_features) and single-sample input (n_features,) in predict().
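Handling both shapes is typically a one-line normalization; a sketch of the idiom (not necessarily the library's exact code):

```python
import numpy as np

def as_batch(X):
    """Promote a single sample (n_features,) to a batch (1, n_features)."""
    X = np.asarray(X)
    return X.reshape(1, -1) if X.ndim == 1 else X
```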
Gaussian Naive Bayes¶
A probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Features are assumed to follow a Gaussian distribution.
GaussianNB¶
from glassbox.models import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)
preds = model.predict(X_test)
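Under the hood, each feature's class-conditional likelihood is a Gaussian density, with a small `epsilon` added to the variance (see the API reference below). An illustrative version of that density:

```python
import math

def gaussian_pdf(x, mean, var, eps=1e-9):
    """Likelihood of x under a Gaussian with the given mean and variance."""
    var = var + eps  # epsilon guards against zero variance
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
```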
Linear Models¶
Models that fit a linear surface to the data, trained using gradient descent optimization.
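As a sketch of the training loop (illustrative, not the library's implementation), batch gradient descent on mean-squared error for a linear model looks like:

```python
import numpy as np

def gd_linear_fit(X, y, lr=0.01, n_iter=1000):
    """Minimal batch gradient descent on mean-squared error."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        err = X @ w + b - y            # residuals
        w -= lr * (2 / n) * X.T @ err  # gradient w.r.t. weights
        b -= lr * (2 / n) * err.sum()  # gradient w.r.t. bias
    return w, b
```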
LinearRegression¶
Predicts a continuous target variable by finding the line of best fit.
from glassbox.models import LinearRegression
model = LinearRegression(learning_rate=0.01, max_epochs=1000)
model.fit(X_train, y_train)
preds = model.predict(X_test)
LogisticRegression¶
Predicts a categorical target variable using the logistic (sigmoid) function to output probabilities.
from glassbox.models import LogisticRegression
model = LogisticRegression(learning_rate=0.1, max_epochs=1000)
model.fit(X_train, y_train)
preds = model.predict(X_test)
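The sigmoid mapping that turns a linear score into a probability is (illustrative sketch):

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))
```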
Parameters¶
| Parameter | Default | Description |
|---|---|---|
| `learning_rate` | 0.01 | Step size for gradient descent optimization. |
| `max_epochs` | 1000 | Maximum number of optimization epochs. |
| `tol` | 1e-6 | Convergence tolerance used by stopping criteria. |
| `schedule` | `LearningSchedule.CONSTANT` | Strategy used to update the learning rate across epochs. |
API Reference¶
BaseModel¶
Bases: ABC
fit (abstractmethod)¶
Fits the model to the training data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ndarray | Training data of shape (n_samples, n_features). | required |
| `y` | ndarray | Target values of shape (n_samples,). | required |
Returns:
| Type | Description |
|---|---|
| Self | The fitted model. |
Source code in glassbox/models/_base.py
predict (abstractmethod)¶
Predicts target values for the given data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ndarray | Data to predict on, of shape (n_samples, n_features). | required |
| `**kwargs` | Any | Additional keyword arguments. | {} |
Returns:
| Type | Description |
|---|---|
| ndarray | Predicted target values. |
Source code in glassbox/models/_base.py
DecisionTreeClassifier¶
Bases: BaseTree
A decision tree classifier.
Source code in glassbox/models/trees/_base.py
DecisionTreeRegressor¶
Bases: BaseTree
A decision tree regressor.
Source code in glassbox/models/trees/_base.py
RandomForestClassifier¶
Bases: BaseRandomForest
Random Forest classifier using Decision Tree classification models.
Initialize the random forest classifier.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_estimators` | int | The number of trees in the forest. | 100 |
| `max_depth` | int | Maximum depth of individual trees. | 100 |
| `min_samples_split` | int | Minimum number of samples required to split an internal node. | 2 |
Source code in glassbox/models/ensemble/classifier.py
RandomForestRegressor¶
Bases: BaseRandomForest
Random Forest regressor using Decision Tree regression models.
Initialize the random forest regressor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_estimators` | int | The number of trees in the forest. | 100 |
| `max_depth` | int | Maximum depth of individual trees. | 100 |
| `min_samples_split` | int | Minimum number of samples required to split an internal node. | 2 |
Source code in glassbox/models/ensemble/regressor.py
KNeighborsClassifier¶
Bases: BaseKNN
Source code in glassbox/models/neighbors/_knn.py
KNeighborsRegressor¶
Bases: BaseKNN
Source code in glassbox/models/neighbors/_knn.py
DistanceMetric¶
Bases: Enum
SearchAlgorithm¶
Bases: Enum
GaussianNB¶
Bases: BaseModel
Gaussian Naive Bayes classifier.
A probabilistic classifier based on Bayes' theorem with the assumption that features follow a Gaussian (normal) distribution within each class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `epsilon` | float | Small constant to avoid division by zero in variance calculations. | 1e-9 |
Attributes:
| Name | Type | Description |
|---|---|---|
| `epsilon` | float | Small constant to avoid division by zero. |
| `classes` | ndarray | Unique class labels, shape (n_classes,). |
| `class_priors` | dict | Prior probability for each class. |
| `class_means` | dict | Mean of each feature per class. |
| `class_variances` | dict | Variance of each feature per class. |
Source code in glassbox/models/gaussian_nb/gaussian_nb.py
fit¶
Fit the Gaussian Naive Bayes model to training data.
Calculates the mean, variance, and prior probability for each feature in each class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ndarray | Training data of shape (n_samples, n_features). | required |
| `y` | ndarray | Target values of shape (n_samples,). | required |
Returns:
| Type | Description |
|---|---|
| Self | The fitted model. |
Raises:
| Type | Description |
|---|---|
| ValueError | If X and y have incompatible dimensions. |
Source code in glassbox/models/gaussian_nb/gaussian_nb.py
predict¶
Predict class labels for samples in X.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ndarray | Data to predict on, of shape (n_samples, n_features). | required |
| `**kwargs` | Any | Additional keyword arguments (unused). | {} |
Returns:
| Type | Description |
|---|---|
| ndarray | Predicted class labels of shape (n_samples,). |
Raises:
| Type | Description |
|---|---|
| ValueError | If the model has not been fitted yet. |
Source code in glassbox/models/gaussian_nb/gaussian_nb.py
predict_proba¶
Predict class probabilities for samples in X.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ndarray | Data to predict on, of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Predicted class probabilities of shape (n_samples, n_classes). Each row sums to 1.0. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the model has not been fitted yet. |
Source code in glassbox/models/gaussian_nb/gaussian_nb.py
BaseLinearModel¶
Bases: BaseModel
Abstract base class for linear models trained with gradient-based optimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `learning_rate` | float | Initial learning rate used by the optimizer. | 0.01 |
| `max_epochs` | int | Maximum number of optimization epochs. | 1000 |
| `tol` | float | Convergence tolerance used by stopping criteria. | 1e-6 |
| `schedule` | LearningSchedule | Strategy used to update the learning rate across epochs. | LearningSchedule.CONSTANT |
Source code in glassbox/models/linear_model/_base.py
fit (abstractmethod)¶
Fit the linear model to training data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ndarray | Training feature matrix of shape (n_samples, n_features). | required |
| `y` | ndarray | Training target vector of shape (n_samples,). | required |
Returns:
| Type | Description |
|---|---|
| Self | The fitted model instance. |
Source code in glassbox/models/linear_model/_base.py
predict (abstractmethod)¶
Predict target values for input samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ndarray | Input feature matrix of shape (n_samples, n_features). | required |
| `**kwargs` | Any | Additional keyword arguments for prediction. | {} |
Returns:
| Type | Description |
|---|---|
| ndarray | Predicted values of shape (n_samples,). |
Source code in glassbox/models/linear_model/_base.py
LinearRegression¶
Bases: BaseLinearModel
Linear regression model.
Source code in glassbox/models/linear_model/_base.py
fit¶
Fit the linear regression model to training data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ndarray | Training feature matrix of shape (n_samples, n_features). | required |
| `y` | ndarray | Training target vector of shape (n_samples,). | required |
Returns:
| Type | Description |
|---|---|
| Self | The fitted model instance. |
Source code in glassbox/models/linear_model/linear.py
predict¶
Predict continuous target values for input samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ndarray | Input feature matrix of shape (n_samples, n_features). | required |
| `**kwargs` | Any | Additional keyword arguments for prediction. | {} |
Returns:
| Type | Description |
|---|---|
| ndarray | Predicted values of shape (n_samples,). |
Source code in glassbox/models/linear_model/linear.py
LogisticRegression¶
Bases: BaseLinearModel
Logistic regression model for binary classification.
Source code in glassbox/models/linear_model/_base.py
fit¶
Fit the logistic regression model to training data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ndarray | Training feature matrix of shape (n_samples, n_features). | required |
| `y` | ndarray | Training target vector of shape (n_samples,). | required |
Returns:
| Type | Description |
|---|---|
| Self | The fitted model instance. |
Source code in glassbox/models/linear_model/logistic.py
predict¶
Predict class labels for input samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ndarray | Input feature matrix of shape (n_samples, n_features). | required |
| `**kwargs` | Any | Additional keyword arguments for prediction. | {} |
Returns:
| Type | Description |
|---|---|
| ndarray | Predicted class labels of shape (n_samples,). |
Source code in glassbox/models/linear_model/logistic.py
predict_proba¶
Predict class probabilities for input samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ndarray | Input feature matrix of shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Predicted probabilities of shape (n_samples,). |
Source code in glassbox/models/linear_model/logistic.py
LearningSchedule¶
Bases: Enum
Learning rate scheduling strategies for linear models.
Attributes:
| Name | Type | Description |
|---|---|---|
| `CONSTANT` | LearningSchedule | Keep the learning rate fixed across epochs. |
| `TIME_DECAY` | LearningSchedule | Decrease the learning rate proportionally with epoch growth. |
| `EXPONENTIAL` | LearningSchedule | Decrease the learning rate exponentially over epochs. |
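The exact update rules are defined in the source; common textbook forms of these three schedules (illustrative sketches, not guaranteed to match the implementation, and the decay constant `k` is an assumed parameter) are:

```python
import math

def constant(lr0, epoch):
    """CONSTANT: the learning rate never changes."""
    return lr0

def time_decay(lr0, epoch, k=0.01):
    """TIME_DECAY: rate shrinks proportionally with epoch growth."""
    return lr0 / (1.0 + k * epoch)

def exponential(lr0, epoch, k=0.01):
    """EXPONENTIAL: rate shrinks exponentially over epochs."""
    return lr0 * math.exp(-k * epoch)
```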