Autoconfig

This module automatically configures a tabular data engine by searching for optimal parameters. Its main class, Autoconfig, runs a grid search over model architectures and batch sizes and selects the configuration with the lowest reconstruction loss.

class clearbox_synthetic.utils.autoconfig.autoconfig.Autoconfig(train_ds: ndarray, numerical_features_sizes: int, categorical_features_sizes: List, y_train_ds: ndarray | None = None)

Bases: object

A class that automatically configures a tabular engine by searching for optimal parameters.

train_ds

The training dataset.

Type:

np.ndarray

y_train_ds

The target values for the training dataset.

Type:

np.ndarray, optional

numerical_features_sizes

The size of the numerical (ordinal) features.

Type:

int

categorical_features_sizes

The sizes of categorical features.

Type:

List

Performs a grid search to find the optimal model configuration.

The grid search iterates over model architectures and batch sizes, fits each model using multiple threads, and evaluates each fitted model to find the configuration with the lowest mean reconstruction loss.

Returns:

The optimal configuration (architecture and batch size) based on the evaluation loss.

Return type:

list
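The search described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: evaluate, grid_search, the toy "model" (keep the first k features, fill the rest with column means), and the candidate architectures and batch sizes are all hypothetical stand-ins, chosen only to show the pattern of scoring every (architecture, batch size) pair in a thread pool and returning the pair with the lowest mean reconstruction loss.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from statistics import fmean


def evaluate(rows, architecture, batch_size):
    """Toy stand-in for 'fit a model, then score it' (hypothetical).

    The 'model' keeps the first k features of each row, where k is the
    smallest layer in `architecture`, and reconstructs the remaining
    features with their column means. The score is the mean of the
    per-batch squared reconstruction errors.
    """
    kept = min(architecture)
    column_means = [fmean(column) for column in zip(*rows)]
    batch_losses = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        errors = [
            (value - column_means[j]) ** 2 if j >= kept else 0.0
            for row in batch
            for j, value in enumerate(row)
        ]
        batch_losses.append(fmean(errors))
    return fmean(batch_losses)


def grid_search(rows, architectures, batch_sizes, max_workers=4):
    """Return [architecture, batch_size] with the lowest mean loss."""
    candidates = list(product(architectures, batch_sizes))
    # Score every candidate concurrently; map preserves candidate order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        losses = list(pool.map(lambda c: evaluate(rows, *c), candidates))
    best_index = min(range(len(candidates)), key=losses.__getitem__)
    architecture, batch_size = candidates[best_index]
    return [architecture, batch_size]


# Made-up data: 200 rows with 8 numerical features each.
random.seed(0)
toy_rows = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]

best_config = grid_search(
    toy_rows,
    architectures=[[8, 2], [8, 4], [8, 6]],
    batch_sizes=[32, 64],
)
print(best_config)
```

In this toy setup the architecture with the widest bottleneck always wins, because it leaves the fewest features to reconstruct; a real search would instead train and evaluate an actual model for each candidate.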

clearbox_synthetic.utils.autoconfig.autoconfig.learning_rule(training_rows_size: int)

Determines the learning rate, number of epochs, and batch size based on the size of the training data.

Parameters:

training_rows_size (int) – The number of rows in the training dataset.

Returns:

A tuple containing (learning_rate, epochs, batch_size).

Return type:

Tuple[int, int, int]
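As a rough illustration of what a size-based rule of this kind might look like, the sketch below maps a row count to a (learning_rate, epochs, batch_size) tuple. The function name toy_learning_rule and every threshold and value in it are invented for the example; the thresholds actually used by learning_rule are not documented here.

```python
from typing import Tuple


def toy_learning_rule(training_rows_size: int) -> Tuple[float, int, int]:
    """Hypothetical size-based rule; all thresholds and values are made up."""
    if training_rows_size <= 0:
        raise ValueError("training_rows_size must be positive")
    if training_rows_size < 1_000:
        # Small dataset: small batches and more passes over the data.
        return 1e-3, 100, 32
    if training_rows_size < 100_000:
        # Medium dataset: moderate batch size and epoch count.
        return 1e-3, 50, 128
    # Large dataset: larger batches, fewer epochs, smaller learning rate.
    return 5e-4, 20, 512


learning_rate, epochs, batch_size = toy_learning_rule(5_000)
print(learning_rate, epochs, batch_size)
```

The design idea is simply that larger datasets tolerate larger batches and need fewer epochs, so a single lookup on the row count yields all three settings.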