Frame
The glassbox.frame module provides a lightweight data container and CSV I/O utilities. All data is stored internally as np.ndarray, so there is no pandas dependency.
Loading Data
Use read_csv to load a CSV file into a Dataset:
```python
from glassbox.frame import read_csv

ds = read_csv("students.csv")
print(ds)
# Dataset(shape=(1000, 12), columns=['Age', 'Gender', 'Score', ...])
```
Type inference
- Columns whose values can all be cast to float are stored as float64.
- Mixed or string columns remain as object dtype.
- Empty cells and "NA" values are converted to np.nan.
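The three inference rules can be applied to a single column with plain NumPy, as sketched below. This is a minimal illustration and not glassbox's parser. The helper name infer_column is hypothetical, and the sketch assumes missing tokens are matched exactly, with no whitespace trimming.

```python
from typing import List

import numpy as np

MISSING = {"", "NA"}  # tokens treated as missing, per the documented rules


def infer_column(raw: List[str]) -> np.ndarray:
    """Hypothetical helper applying the documented inference rules to one column."""
    # Missing tokens become NaN; everything else stays a raw string for now
    cleaned = [np.nan if v in MISSING else v for v in raw]
    try:
        # Succeeds only if every remaining value parses as a float
        return np.array([float(v) for v in cleaned], dtype=np.float64)
    except ValueError:
        # Mixed or string column: keep Python objects
        return np.array(cleaned, dtype=object)


ages = infer_column(["18", "NA", "21.5"])  # float64, NaN in position 1
genders = infer_column(["F", "", "M"])     # object, NaN in position 1
```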
The Dataset Class
Dataset wraps a 2-D NumPy array with named columns.
Properties
| Property | Type | Description |
|---|---|---|
| data | np.ndarray | The underlying 2-D array. |
| columns | List[str] | Column names. |
| shape | Tuple[int, int] | (n_rows, n_cols). |
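The sketch below shows how the three properties relate, using a hypothetical stand-in class, MiniDataset, rather than glassbox's Dataset. It follows the documented rule that the number of column names must match data.shape[1]. Raising ValueError on a mismatch is an assumption, because the real constructor's error type is not documented.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class MiniDataset:
    """Hypothetical stand-in mirroring the documented Dataset properties."""
    data: np.ndarray
    columns: List[str]

    def __post_init__(self) -> None:
        # Documented invariant: a 2-D array with one name per column
        if self.data.ndim != 2:
            raise ValueError("data must be a 2-D array")
        if len(self.columns) != self.data.shape[1]:
            raise ValueError("len(columns) must equal data.shape[1]")

    @property
    def shape(self) -> Tuple[int, int]:
        # (n_rows, n_cols), derived from the underlying array
        return self.data.shape


mini = MiniDataset(np.array([[18.0, 71.5], [21.0, 88.0]]), ["Age", "Score"])

try:
    MiniDataset(np.zeros((2, 3)), ["Age", "Score"])
except ValueError as err:
    print(err)  # len(columns) must equal data.shape[1]
```

Deriving shape from the array, instead of storing it separately, means it cannot drift out of sync after columns are added or dropped.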
Selecting Columns
```python
# Single column name → np.ndarray holding that column
ages = ds.get_columns("Age")
# List of names → np.ndarray holding those columns
subset = ds.get_columns(["Age", "Score"])
```
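Column selection by name reduces to a name-to-position lookup followed by NumPy indexing on axis 1. The plain-NumPy sketch below uses a hypothetical helper, select_columns, and does not reproduce glassbox's code. Here a single name yields an (n_rows, 1) array. Whether get_columns returns a 1-D or 2-D array for one name is not stated, so treat that detail as an assumption.

```python
from typing import List, Union

import numpy as np


def select_columns(data: np.ndarray, columns: List[str],
                   names: Union[str, List[str]]) -> np.ndarray:
    """Hypothetical name-based column selection on a plain 2-D array."""
    if isinstance(names, str):
        names = [names]
    # list.index raises ValueError for an unknown name
    idx = [columns.index(n) for n in names]
    # Integer-list indexing on axis 1 keeps the result 2-D: (n_rows, len(names))
    return data[:, idx]


data = np.array([[18.0, 0.0, 71.5],
                 [21.0, 1.0, 88.0]])
columns = ["Age", "Gender", "Score"]
print(select_columns(data, columns, ["Age", "Score"]).shape)  # (2, 2)
```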
Selecting Rows
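According to the API reference, get_rows takes an integer array of row indices and returns a new Dataset. The plain-NumPy sketch below shows the indexing this implies. It also shows one way to turn a boolean filter into such indices, using np.flatnonzero. The values are made up, and the sketch does not call glassbox.

```python
import numpy as np

data = np.array([[18.0, 71.5],
                 [21.0, 88.0],
                 [19.0, 64.0]])

# Integer row indices, the kind of array get_rows is documented to take
indices = np.array([0, 2])
picked = data[indices]                    # rows 0 and 2, shape (2, 2)

# A boolean condition can be turned into integer indices first
high = np.flatnonzero(data[:, 1] > 70.0)  # rows whose second column exceeds 70

# NumPy integer-array indexing returns a copy, so edits do not leak back
picked[0, 0] = -1.0
```

An array like high can be passed to get_rows to filter rows by condition. NumPy itself copies on integer-array indexing. Whether glassbox adds further copying is not documented here.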
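Modifying Data
```python
# new_scores: np.ndarray of replacement values for the Score column
# extra_ds: another Dataset whose column names are not already in ds

# Overwrite an existing column in place
ds.update_column("Score", new_scores)

# Drop columns
ds.drop_columns(["Unused_1", "Unused_2"])

# Append the columns of another Dataset
ds.add_columns(extra_ds)
```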
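update_column returns None, so it modifies the dataset in place. The sketch below reproduces that behavior on a plain 2-D array with a hypothetical function of the same name. It is not glassbox's method. The length check and the ValueError for an unknown name are assumptions the documentation does not state.

```python
from typing import List

import numpy as np


def update_column(data: np.ndarray, columns: List[str], name: str,
                  new_data: np.ndarray) -> None:
    """Hypothetical in-place column overwrite on a plain 2-D array."""
    j = columns.index(name)  # ValueError if the column does not exist (assumption)
    new_data = np.asarray(new_data)
    if new_data.shape != (data.shape[0],):
        # Assumption: exactly one new value per row is required
        raise ValueError("new_data must have one value per row")
    data[:, j] = new_data    # writes into the existing array; nothing is returned


data = np.array([[18.0, 71.5],
                 [21.0, 88.0]])
columns = ["Age", "Score"]
update_column(data, columns, "Score", data[:, 1] * 1.1)
```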
Saving Data
Tip
to_csv automatically writes whole-number floats without a decimal point (for example, 3.0 is written as 3) and correctly escapes commas and quotes in string values.
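The two formatting behaviors in the tip can be reproduced with the standard csv module, as sketched below. The helper format_cell is hypothetical, and the sketch is not to_csv's implementation. Writing NaN as an empty cell is an assumption. Escaping comes from csv's default QUOTE_MINIMAL mode, which wraps fields containing commas or quotes in double quotes and doubles any embedded quotes.

```python
import csv
import io
import math


def format_cell(value) -> str:
    """Hypothetical formatter mimicking the documented to_csv number formatting."""
    if isinstance(value, float):  # also true for np.float64
        v = float(value)
        if math.isnan(v):
            return ""              # assumption: NaN written as an empty cell
        if v.is_integer():
            return str(int(v))     # 3.0 -> "3"
        return str(v)              # 2.5 -> "2.5"
    return str(value)


buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")  # default QUOTE_MINIMAL quoting
writer.writerow(["Name", "Score"])
writer.writerow([format_cell("Doe, Jane"), format_cell(3.0)])
writer.writerow([format_cell('Say "hi"'), format_cell(2.5)])
csv_text = buf.getvalue()
```

The two data rows come out as "Doe, Jane",3 and "Say ""hi""",2.5.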
API Reference
Dataset
Container for a 2-D data matrix with named columns and several helper methods.
Attributes:
| Name | Type | Description |
|---|---|---|
| data | ndarray | Data arranged as a 2-D matrix of shape (n_rows, n_cols). Rows are records; take the transpose (data.T) to work column by column. |
| columns | List[str] | Column names, stored as a list. |
| shape | Tuple[int, int] | Shape of the dataset as (n_rows, n_cols). |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | ndarray | Data arranged as a 2-D matrix of shape (n_rows, n_cols). | required |
| columns | List[str] | Column names; the length must equal data.shape[1]. | required |
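The attribute note about the transpose follows from the (n_rows, n_cols) layout. data.T has shape (n_cols, n_rows), so iterating over it yields one column at a time. The illustration below uses made-up values and plain NumPy, independent of glassbox.

```python
import numpy as np

# Made-up values in the documented layout: one row per record, one column per field
data = np.array([[18.0, 71.5],
                 [21.0, 88.0],
                 [19.0, 64.0]])
columns = ["Age", "Score"]

# Iterating over data.T pairs each name with that column's 1-D values
col_means = {name: values.mean() for name, values in zip(columns, data.T)}

# Equivalent direct access to a single column by position
scores = data[:, columns.index("Score")]
```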
Source code in glassbox/frame/dataset.py
get_columns
Retrieve data for specific columns by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| names | str \| List[str] | A single column name or a list of column names. | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Array slice representing the requested columns. |
get_rows
Select rows by integer index and return them as a new Dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| indices | ndarray | Integer array of row indices. | required |
Returns:
| Type | Description |
|---|---|
| Dataset | A new Dataset instance containing the selected rows. |
update_column
Overwrite the values of an existing column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the column to update. | required |
| new_data | ndarray | New values for the column. | required |
Returns:
| Type | Description |
|---|---|
| None | Modifies the dataset in place. |
drop_columns
Remove columns by name from the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| names | str \| List[str] | A single column name or a list of column names to remove. | required |
Returns:
| Type | Description |
|---|---|
| None | Modifies the dataset in place. |
Raises:
| Type | Description |
|---|---|
| KeyError | If any of the named columns does not exist in the dataset. |
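The documented KeyError contract can be sketched with plain NumPy using a hypothetical drop_columns function, which is not glassbox's method. A NumPy array cannot change its number of columns in place. For that reason, the sketch returns the new array and names. The real method returns None and presumably rebinds its internal array. The sketch validates every name before dropping anything, which is also an assumption.

```python
from typing import List, Tuple, Union

import numpy as np


def drop_columns(data: np.ndarray, columns: List[str],
                 names: Union[str, List[str]]) -> Tuple[np.ndarray, List[str]]:
    """Hypothetical drop following the documented KeyError contract."""
    if isinstance(names, str):
        names = [names]
    missing = [n for n in names if n not in columns]
    if missing:
        # Fail before touching the data if any requested name is absent
        raise KeyError(f"columns not found: {missing}")
    keep = [j for j, c in enumerate(columns) if c not in names]
    return data[:, keep], [columns[j] for j in keep]


data = np.arange(6.0).reshape(2, 3)
columns = ["Age", "Unused_1", "Score"]
data, columns = drop_columns(data, columns, "Unused_1")

try:
    drop_columns(data, columns, ["Missing"])
except KeyError as err:
    print("KeyError:", err)
```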
add_columns
Append the columns of another Dataset to this one.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| new_dataset | Dataset | Dataset whose columns are appended. | required |
Returns:
| Type | Description |
|---|---|
| None | Modifies the dataset in place. |
Raises:
| Type | Description |
|---|---|
| ValueError | If any column in new_dataset already exists in this dataset. |
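The sketch below implements a side-by-side append with the documented ValueError on duplicate names, using plain NumPy and a hypothetical add_columns function, not glassbox's method. As in the drop sketch, it returns the result because a plain array cannot grow in place. The row-count check is an assumption the documentation does not state.

```python
from typing import List, Tuple

import numpy as np


def add_columns(data: np.ndarray, columns: List[str], new_data: np.ndarray,
                new_columns: List[str]) -> Tuple[np.ndarray, List[str]]:
    """Hypothetical column append following the documented ValueError contract."""
    clashes = [c for c in new_columns if c in columns]
    if clashes:
        raise ValueError(f"columns already exist: {clashes}")
    if new_data.shape[0] != data.shape[0]:
        # Assumption: row counts must agree for a side-by-side append
        raise ValueError("row counts differ")
    return np.hstack([data, new_data]), columns + new_columns


data = np.array([[18.0], [21.0]])
data, columns = add_columns(data, ["Age"], np.array([[71.5], [88.0]]), ["Score"])

try:
    add_columns(data, columns, np.array([[1.0], [2.0]]), ["Age"])
except ValueError as err:
    print(err)  # columns already exist: ['Age']
```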
read_csv
Load a CSV file into a Dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | Path to the CSV file. | required |
Returns:
| Type | Description |
|---|---|
| Dataset | Loaded dataset object. |
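The core of a CSV loader like this can be sketched as follows: the header row becomes the column names and the remaining rows become a 2-D array. The hypothetical parse_csv_text below reads from a string to stay self-contained. A file-based version would wrap open(filepath, newline=""). It leaves values as raw strings, with the documented type inference expected to run per column afterwards. Skipping blank lines and raising ValueError on ragged rows are assumptions, not documented glassbox behavior.

```python
import csv
import io
from typing import List, Tuple

import numpy as np


def parse_csv_text(text: str) -> Tuple[np.ndarray, List[str]]:
    """Hypothetical loader core: header row -> names, remaining rows -> 2-D array."""
    reader = csv.reader(io.StringIO(text))
    columns = next(reader)                 # first row holds the column names
    rows = [row for row in reader if row]  # assumption: blank lines are skipped
    for i, row in enumerate(rows, start=1):
        if len(row) != len(columns):
            raise ValueError(f"data row {i}: expected {len(columns)} fields, got {len(row)}")
    # Raw strings for now; per-column type inference would follow
    data = np.array(rows, dtype=object).reshape(len(rows), len(columns))
    return data, columns


text = 'Name,Age\n"Doe, Jane",18\nLee,NA\n'
data, columns = parse_csv_text(text)
```

Because csv.reader understands quoting, the comma inside "Doe, Jane" stays part of a single field rather than splitting the row.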