Orchestrator¶
The glassbox.orchestrator module provides tools for model selection and hyperparameter tuning through cross-validation.
Hyperparameter Search¶
Automate the search for the hyperparameter values that score best under cross-validation.
GridSearchCV¶
Exhaustive search over specified parameter values for an estimator. Each combination is evaluated using cross-validation.
from glassbox.orchestrator import GridSearchCV
from glassbox.models import DecisionTreeClassifier
param_grid = {
    'max_depth': [5, 10, 15],
    'min_samples_split': [2, 5]
}
model = DecisionTreeClassifier()
search = GridSearchCV(model, param_grid, cv=5)
search.fit(X_train, y_train)
# The best model is automatically refitted on the entire training set
print(search.best_params_)
preds = search.predict(X_test)
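To make "each combination" concrete, the sketch below enumerates the grid from the example above using only the standard library. It is an illustration, not glassbox's implementation; `expand_grid` is a hypothetical helper name.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination of the listed parameter values."""
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


param_grid = {
    "max_depth": [5, 10, 15],
    "min_samples_split": [2, 5],
}
candidates = list(expand_grid(param_grid))
print(len(candidates))  # 6 (3 values * 2 values)
print(candidates[0])    # {'max_depth': 5, 'min_samples_split': 2}
```

Assuming one fit per candidate per fold, these 6 candidates with cv=5 would imply 30 cross-validation fits, plus the final refit of the best model.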
RandomizedSearchCV¶
Randomized search over a hyperparameter space. Instead of evaluating every combination, it evaluates n_iter sampled candidates, trading exhaustiveness for computational speed.
from glassbox.orchestrator import RandomizedSearchCV
from glassbox.models import RandomForestClassifier
param_space = {
    'max_depth': [5, 10, 15, 20],
    'n_estimators': [10, 50, 100]
}
model = RandomForestClassifier()
search = RandomizedSearchCV(model, param_space, n_iter=10, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
preds = search.predict(X_test)
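The sampling step can be pictured with the sketch below. It assumes each parameter is drawn independently and uniformly from its list, so duplicate candidates are possible; glassbox may sample differently. `sample_candidates` and its `seed` argument are hypothetical names used only for illustration.

```python
import random


def sample_candidates(param_space, n_iter, seed=None):
    """Draw n_iter candidates, choosing each parameter value uniformly at random."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return [
        {name: rng.choice(values) for name, values in param_space.items()}
        for _ in range(n_iter)
    ]


param_space = {
    "max_depth": [5, 10, 15, 20],
    "n_estimators": [10, 50, 100],
}
candidates = sample_candidates(param_space, n_iter=10, seed=42)
print(len(candidates))  # 10 candidates, versus 12 in the full grid
```

The saving grows with the size of the space: the number of evaluated candidates stays at n_iter no matter how many values each list holds.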
Cross-Validation Splitters¶
Strategies to split data into training and validation sets.
KFoldSplitter¶
Splits the dataset into n_splits consecutive folds. Without shuffling, the folds follow the original sample order.
from glassbox.orchestrator import KFoldSplitter
splitter = KFoldSplitter(n_splits=5, shuffle=True, random_seed=42)
for train_idx, val_idx in splitter.split(X):
    X_fold_train, X_fold_val = X[train_idx], X[val_idx]
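For intuition about what the splitter yields, here is a minimal pure-Python sketch of K-fold index generation. It is not the glassbox source; `kfold_indices` is a hypothetical function, and the choice to give the first folds one extra sample when the data does not divide evenly is an assumption.

```python
import random


def kfold_indices(n_samples, n_splits=5, shuffle=False, seed=None):
    """Yield (train_idx, val_idx) lists for consecutive folds."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # The first (n_samples % n_splits) folds receive one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in kfold_indices(10, n_splits=3):
    print(val_idx)
# [0, 1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
```

Every sample lands in exactly one validation fold, so each one is used for validation once and for training n_splits - 1 times.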
StratifiedKFoldSplitter¶
Splits the dataset into folds while preserving the percentage of samples for each class in y. Well suited to classification problems with imbalanced labels, where plain K-fold can leave a fold with few or no samples of a rare class.
from glassbox.orchestrator import StratifiedKFoldSplitter
splitter = StratifiedKFoldSplitter(n_splits=5, shuffle=True, random_seed=42)
for train_idx, val_idx in splitter.split(X, y):
    y_fold_train, y_fold_val = y[train_idx], y[val_idx]
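One simple way to preserve class proportions is to deal the samples of each class into folds round-robin. The sketch below shows that idea under this assumption; glassbox's actual strategy is not documented here, and `stratified_fold_ids` is a hypothetical name.

```python
from collections import defaultdict


def stratified_fold_ids(y, n_splits=5):
    """Return a fold id per sample so each fold gets a near-equal share of every class."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    fold_of = [0] * len(y)
    for members in by_class.values():
        # Deal this class's samples into folds 0, 1, ..., n_splits - 1, 0, 1, ...
        for position, i in enumerate(members):
            fold_of[i] = position % n_splits
    return fold_of


y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # 8 samples of class 0, 2 of class 1
fold_of = stratified_fold_ids(y, n_splits=2)
for fold in range(2):
    val_idx = [i for i, f in enumerate(fold_of) if f == fold]
    print(fold, [y[i] for i in val_idx])
# 0 [0, 0, 0, 0, 1]
# 1 [0, 0, 0, 0, 1]
```

Both validation folds keep the original 4:1 class ratio, whereas an unshuffled plain K-fold on the same labels would put both positive samples in the last fold.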
API Reference¶
BaseSearch¶
Bases: ABC
Abstract base class for search-based model selection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `estimator` | `BaseModel` | The model to optimize. | required |
| `param_space` | `Dict` | Parameter search space. | required |
| `cv_engine` | `BaseSplitter` | Cross-validation splitter. | required |
| `scoring_func` | `Callable` | Scoring function used to evaluate candidates. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| `best_params_` | `Dict` | Best found parameter set. |
| `best_score_` | `float` | Best scoring value. |
| `best_estimator_` | `BaseModel` | Best estimator instance. |
Source code in glassbox/orchestrator/base_search.py
fit¶
Fit the search object and select the best estimator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Training data of shape (n_samples, n_features). | required |
| `y` | `ndarray` | Target values of shape (n_samples,). | required |
Returns:
| Type | Description |
|---|---|
| `Self` | The fitted search object. |
Source code in glassbox/orchestrator/base_search.py
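The selection step that fit performs can be sketched as follows. This is a simplified illustration, not the code in base_search.py: `select_best` and the `fit_and_score` callback are hypothetical, and the sketch assumes that a higher score is better.

```python
def select_best(candidates, folds, fit_and_score):
    """Return (best_params, best_score) using the mean score across folds.

    fit_and_score(params, train_idx, val_idx) -> float is a hypothetical
    callback that trains on train_idx and scores on val_idx.
    """
    best_params, best_score = None, float("-inf")
    for params in candidates:
        scores = [fit_and_score(params, tr, va) for tr, va in folds]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:  # assumes higher is better
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, train_idx, val_idx):
    """Toy scorer that pretends max_depth=10 is ideal on every fold."""
    return -abs(params["max_depth"] - 10)


candidates = [{"max_depth": d} for d in (5, 10, 15)]
folds = [([0, 1], [2]), ([0, 2], [1]), ([1, 2], [0])]
print(select_best(candidates, folds, toy_score))  # ({'max_depth': 10}, 0.0)
```

In terms of the attributes listed above, the returned pair corresponds to best_params_ and best_score_; best_estimator_ would then be the estimator refit with those parameters.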
BaseSplitter¶
Bases: ABC
Abstract base class for cross-validation splitters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_splits` | `int` | Number of splits. | `5` |
| `shuffle` | `bool` | Whether to shuffle data before splitting. | `False` |
Source code in glassbox/orchestrator/base_splitter.py
split (abstractmethod)¶
Generate train/test indices for cross-validation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Data array of shape (n_samples, n_features). | required |
| `y` | `ndarray` | Target values of shape (n_samples,). | required |
Returns:
| Type | Description |
|---|---|
| `Generator[Tuple[ndarray, ndarray], None, None]` | Generator yielding training and validation index tuples. |
Source code in glassbox/orchestrator/base_splitter.py
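Because split is abstract, a custom strategy only needs to subclass the base and yield index pairs. The sketch below uses a stand-in base class rather than importing glassbox, so it runs on its own. `SplitterSketch` and `InterleavedSplitter` are hypothetical names, and making `y` optional is an assumption based on the `splitter.split(X)` usage shown earlier.

```python
from abc import ABC, abstractmethod


class SplitterSketch(ABC):
    """Stand-in for BaseSplitter; not the glassbox class."""

    def __init__(self, n_splits=5, shuffle=False):
        self.n_splits = n_splits
        self.shuffle = shuffle

    @abstractmethod
    def split(self, X, y=None):
        """Yield (train_idx, val_idx) pairs."""


class InterleavedSplitter(SplitterSketch):
    """Hypothetical strategy: fold k validates on samples where index % n_splits == k."""

    def split(self, X, y=None):
        n_samples = len(X)
        for k in range(self.n_splits):
            val_idx = [i for i in range(n_samples) if i % self.n_splits == k]
            train_idx = [i for i in range(n_samples) if i % self.n_splits != k]
            yield train_idx, val_idx


X = [[float(i)] for i in range(6)]
for train_idx, val_idx in InterleavedSplitter(n_splits=3).split(X):
    print(val_idx)
# [0, 3]
# [1, 4]
# [2, 5]
```

Instantiating the stand-in base directly raises TypeError, which is how the abstract method forces every concrete splitter to define its own split.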
GridSearchCV¶
Bases: BaseSearch
Exhaustive grid search over a parameter space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `estimator` | `BaseModel` | The model to optimize. | required |
| `param_space` | `Dict` | Parameter grid for exhaustive search. | required |
| `cv_engine` | `BaseSplitter` | Cross-validation splitter. | required |
| `scoring_func` | `Callable` | Scoring function used to evaluate candidates. | required |
Source code in glassbox/orchestrator/base_search.py
RandomizedSearchCV¶
Bases: BaseSearch
Randomized search over a parameter space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `estimator` | `BaseModel` | The model to optimize. | required |
| `param_space` | `Dict` | Distribution for random search. | required |
| `cv_engine` | `BaseSplitter` | Cross-validation splitter. | required |
| `scoring_func` | `Callable` | Scoring function used to evaluate candidates. | required |
| `n_iter` | `int` | Number of random parameter candidates to evaluate. | `10` |
| `time_budget` | `float` | Maximum time budget for the search. | `0.0` |
Source code in glassbox/orchestrator/randomized_search.py
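How n_iter and time_budget might interact can be sketched as a loop that stops at whichever limit is reached first. The reference above does not state the unit of time_budget or the meaning of its 0.0 default, so this sketch assumes seconds and treats 0.0 as "no limit". `run_with_budget` and `evaluate` are hypothetical names.

```python
import time


def run_with_budget(candidates, evaluate, time_budget=0.0):
    """Evaluate candidates in order, stopping once the time budget is spent.

    Assumption: time_budget is in seconds and 0.0 disables the limit.
    """
    start = time.monotonic()
    results = []
    for params in candidates:
        if time_budget > 0 and time.monotonic() - start >= time_budget:
            break  # budget exhausted; remaining candidates are skipped
        results.append((params, evaluate(params)))
    return results


candidates = [{"max_depth": d} for d in (5, 10, 15)]
results = run_with_budget(candidates, lambda p: -abs(p["max_depth"] - 10))
print(len(results))  # 3, since the default budget of 0.0 imposes no limit here
```

The budget is checked before each candidate, so a candidate that has already started always runs to completion and the total time can exceed the budget by one evaluation.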
KFoldSplitter¶
Bases: BaseSplitter
K-fold cross-validation splitter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_splits` | `int` | Number of folds. | `5` |
| `shuffle` | `bool` | Whether to shuffle data before splitting. | `False` |
Source code in glassbox/orchestrator/base_splitter.py
split¶
Generate train/test splits for K-fold cross-validation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Data array of shape (n_samples, n_features). | required |
| `y` | `ndarray` | Target values of shape (n_samples,). | required |
Returns:
| Type | Description |
|---|---|
| `Generator[Tuple[ndarray, ndarray], None, None]` | Generator yielding training and validation index tuples. |
Source code in glassbox/orchestrator/splitters.py
StratifiedKFoldSplitter¶
Bases: BaseSplitter
Stratified K-fold cross-validation splitter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_splits` | `int` | Number of folds. | `5` |
| `shuffle` | `bool` | Whether to shuffle data before splitting. | `False` |
Source code in glassbox/orchestrator/base_splitter.py
split¶
Generate stratified train/test splits for K-fold cross-validation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Data array of shape (n_samples, n_features). | required |
| `y` | `ndarray` | Target values of shape (n_samples,). | required |
Returns:
| Type | Description |
|---|---|
| `Generator[Tuple[ndarray, ndarray], None, None]` | Generator yielding training and validation index tuples. |