
Getting Started

Installation

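GlassBox requires Python 3.11+ and NumPy. Run one of the following from the project's root directory. The second command also installs the optional `dev` extras.

```bash
pip install .
pip install ".[dev]"
```

The quotes around `.[dev]` stop shells such as zsh from treating the brackets as a glob pattern.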
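To confirm that an environment meets these requirements after installing, a small check like the one below can help. It is a sketch, not part of GlassBox: `meets_minimum` and `check_environment` are hypothetical helpers, and they only test that the `numpy` and `glassbox` packages can be found, not which versions are installed.

```python
import importlib.util
import sys


def meets_minimum(version, minimum=(3, 11)):
    """Return True if a (major, minor, ...) version tuple is at least `minimum`."""
    # Compare only as many components as `minimum` has; tuple comparison is lexicographic.
    return tuple(version[: len(minimum)]) >= tuple(minimum)


def check_environment():
    """Report whether the interpreter and the required packages look usable."""
    return {
        "python_ok": meets_minimum(sys.version_info),
        # find_spec returns None when a top-level package cannot be found.
        "numpy_found": importlib.util.find_spec("numpy") is not None,
        "glassbox_found": importlib.util.find_spec("glassbox") is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```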

Quickstart

A minimal end-to-end workflow: load → inspect → clean → train → predict.

1. Load a Dataset

```python
from glassbox.frame import read_csv

ds = read_csv("data.csv")
print(ds)  # Dataset(shape=(1000, 12), columns=[...])
```
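`read_csv` returns a `Dataset` whose shape, column names, and `get_columns(...).data` array are used in the later steps. As a rough mental model only, here is a hypothetical `MiniDataset` container built with the standard `csv` module and NumPy. It is not GlassBox's implementation; `MiniDataset` and `mini_read_csv` are illustrative names, and `get_columns` merely mimics the calls this guide makes.

```python
import csv
import io

import numpy as np


class MiniDataset:
    """Hypothetical column-addressable table backed by a 2-D NumPy object array."""

    def __init__(self, columns, data):
        self.columns = list(columns)
        self.data = np.asarray(data, dtype=object)

    @property
    def shape(self):
        return self.data.shape

    def get_columns(self, names):
        # Accept a single name or a list of names, like the quickstart calls.
        if isinstance(names, str):
            names = [names]
        idx = [self.columns.index(n) for n in names]
        return MiniDataset(names, self.data[:, idx])

    def __repr__(self):
        return f"MiniDataset(shape={self.shape}, columns={self.columns})"


def mini_read_csv(source):
    """Parse CSV from a file-like object (open files with newline="").

    The first row is the header; empty fields become NaN, other cells stay strings.
    """
    rows = list(csv.reader(source))
    header, body = rows[0], rows[1:]
    cleaned = [[float("nan") if cell == "" else cell for cell in row] for row in body]
    return MiniDataset(header, cleaned)


if __name__ == "__main__":
    text = "feat_1,feat_2,target\n1.0,2.0,0\n,4.0,1\n"
    ds = mini_read_csv(io.StringIO(text))
    print(ds)
    # astype(float) converts numeric strings and keeps NaN for missing cells.
    print(ds.get_columns(["feat_1", "feat_2"]).data.astype(float))
```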

2. Inspect the Data

```python
from glassbox.inspector import DataAuditor

auditor = DataAuditor()
report = auditor.run_audit(ds)

# Feature types detected automatically
print(report.feature_types)

# Outlier counts per numeric column
print(report.outliers_info)

# Export to JSON
print(report.to_json())
```
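How `DataAuditor` decides feature types and flags outliers is not described here. The sketch below shows one common approach under two stated assumptions: a column counts as numeric if every non-missing value parses as a float, and outliers are counted with Tukey's fences (values beyond 1.5 × IQR from the quartiles). Both rules, and the helpers `infer_feature_type`, `count_iqr_outliers`, and `audit`, are hypothetical and may differ from what GlassBox actually does.

```python
import numpy as np


def infer_feature_type(values):
    """Assumed rule: 'numeric' if every non-missing value parses as a float."""
    present = [v for v in values if v not in ("", None)]
    if not present:
        return "unknown"
    try:
        for v in present:
            float(v)
    except (TypeError, ValueError):
        return "categorical"
    return "numeric"


def count_iqr_outliers(column, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring NaNs (Tukey's fences)."""
    x = np.asarray(column, dtype=float)
    x = x[~np.isnan(x)]
    if x.size == 0:
        return 0
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return int(np.count_nonzero((x < lower) | (x > upper)))


def audit(columns):
    """Map each column name to its inferred type and, if numeric, an outlier count."""
    report = {}
    for name, values in columns.items():
        kind = infer_feature_type(values)
        entry = {"type": kind}
        if kind == "numeric":
            nums = [np.nan if v in ("", None) else float(v) for v in values]
            entry["outliers"] = count_iqr_outliers(nums)
        report[name] = entry
    return report


if __name__ == "__main__":
    print(audit({
        "age": ["23", "31", "", "29", "240"],
        "city": ["Oslo", "Lima", "Pune", "", "Oslo"],
    }))
```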

3. Clean the Data

```python
from glassbox.cleaner import SimpleImputer, StandardScaler

# "feat_1" and "feat_2" stand in for your own numeric feature columns
X = ds.get_columns(["feat_1", "feat_2"]).data.astype(float)

# Impute missing values with the column mean
imputer = SimpleImputer()
X = imputer.fit_transform(X)

# Standardize to zero-mean, unit-variance
scaler = StandardScaler()
X = scaler.fit_transform(X)
```
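Assuming `SimpleImputer` and `StandardScaler` behave as their comments describe (column-mean imputation, then zero-mean, unit-variance scaling), the arithmetic they perform can be sketched in plain NumPy. The function names are hypothetical, and using the population standard deviation (`ddof=0`) is an assumption. The design point worth noting is the split between fitting and applying: statistics are learned once, then reused unchanged on new rows, so data outside the training set is scaled with the training means and standard deviations.

```python
import numpy as np


def fit_mean_imputer(X):
    """Learn per-column means, ignoring NaNs (an all-NaN column yields NaN and a warning)."""
    return np.nanmean(np.asarray(X, dtype=float), axis=0)


def apply_mean_imputer(X, means):
    """Return a copy of X with each NaN replaced by its column's learned mean."""
    X = np.array(X, dtype=float, copy=True)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = means[cols]
    return X


def fit_standard_scaler(X):
    """Learn per-column mean and population standard deviation (ddof=0, assumed)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # A constant column has zero spread; dividing by 1 leaves it at 0 after centering.
    std = np.where(std == 0, 1.0, std)
    return mean, std


def apply_standard_scaler(X, mean, std):
    """Center and scale X with previously learned statistics."""
    return (np.asarray(X, dtype=float) - mean) / std


if __name__ == "__main__":
    X_train = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, np.nan]])
    means = fit_mean_imputer(X_train)
    X_filled = apply_mean_imputer(X_train, means)
    mean, std = fit_standard_scaler(X_filled)
    print(apply_standard_scaler(X_filled, mean, std))
    # New rows reuse the training statistics; nothing is refit.
    X_new = np.array([[np.nan, 12.0]])
    print(apply_standard_scaler(apply_mean_imputer(X_new, means), mean, std))
```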

4. Train a Model

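```python
from glassbox.models import DecisionTreeClassifier

# .data is 2-D, so [:, 0] extracts the single target column as a 1-D array
y = ds.get_columns("target").data[:, 0].astype(float)

# max_depth limits how deep the tree can grow
model = DecisionTreeClassifier(max_depth=10)
model.fit(X, y)
```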
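Which split criterion `DecisionTreeClassifier` uses is not stated here. Gini impurity is a common choice, and the sketch below assumes it to show the search a tree typically runs at each node: try every feature and every candidate threshold, then keep the split that minimizes the size-weighted impurity of the two children. The helpers `gini` and `best_split` are hypothetical. In a typical tree, `max_depth` caps how many times this search is applied along any root-to-leaf path.

```python
import numpy as np


def gini(y):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k of y."""
    if y.size == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / y.size
    return float(1.0 - np.sum(p ** 2))


def best_split(X, y):
    """Return (feature, threshold, weighted_impurity) for the best axis-aligned split.

    Rows with X[:, feature] <= threshold go left, the rest go right.
    Returns None when no split is possible (every feature is constant).
    """
    n_samples, n_features = X.shape
    best = None
    for j in range(n_features):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            right = ~left
            score = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n_samples
            if best is None or score < best[2]:
                best = (j, float(t), float(score))
    return best


if __name__ == "__main__":
    X = np.array([[2.0, 7.0], [3.0, 1.0], [10.0, 4.0], [11.0, 6.0]])
    y = np.array([0.0, 0.0, 1.0, 1.0])
    print(best_split(X, y))
```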

5. Predict

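```python
# These are predictions on the training data itself (in-sample); use rows
# held out from training to estimate performance on unseen data.
predictions = model.predict(X)
print(predictions[:10])  # first 10 predicted labels
```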
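At prediction time, a fitted decision tree routes each row from the root to a leaf and returns that leaf's label. The sketch below shows this traversal on a small hand-built tree stored as nested dicts. The node layout (`feature`, `threshold`, `left`, `right`, `value`), the "go left when `<=` threshold" convention, and the helpers `predict_row`, `predict`, and `accuracy` are hypothetical, chosen for illustration rather than taken from GlassBox.

```python
import numpy as np


def predict_row(node, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while "value" not in node:
        # Internal node: go left when the feature value is <= the threshold.
        if row[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]


def predict(tree, X):
    """Return one prediction per row of X."""
    return np.array([predict_row(tree, row) for row in np.asarray(X, dtype=float)])


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


# A depth-2 tree over two features, written out by hand.
tree = {
    "feature": 0, "threshold": 0.0,
    "left": {"value": 0.0},
    "right": {
        "feature": 1, "threshold": 1.5,
        "left": {"value": 1.0},
        "right": {"value": 2.0},
    },
}

if __name__ == "__main__":
    X = np.array([[-1.0, 9.0], [0.5, 1.0], [0.5, 2.0]])
    preds = predict(tree, X)
    print(preds, accuracy([0.0, 1.0, 1.0], preds))
```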

What's Next?

  • Frame — Data loading and manipulation.
  • Inspector — Exploratory Data Analysis.
  • Cleaner — Data preprocessing pipeline.
  • Models — Machine learning algorithms.
  • API Reference — Auto-generated from source docstrings.