Validating model fit

treecat.validate

treecat.validate.eval(dataset_path, param_csv_path, models_dir, result_path, **options)

Evaluate trained models.

treecat.validate.read_param_csv(param_csv_path, **options)

Reads configs from a csv file.

Args:
param_csv_path: The path to a csv file with one line per config.
options: A dict of extra config parameters.
Returns:
A pair (header, configs), where header is a list of parameters and configs is a list of config dicts.
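
As a rough illustration of the (header, configs) return shape described above, here is a minimal sketch. It is not the treecat implementation. It assumes the first csv line is a header of parameter names, that values are kept as strings, and that csv values override same-named entries in options; read_param_csv_sketch and the parameter names used in the usage note are hypothetical.

```python
import csv


def read_param_csv_sketch(param_csv_path, **options):
    """Hypothetical stand-in for read_param_csv; not treecat's code."""
    with open(param_csv_path, newline="") as f:
        reader = csv.reader(f)
        # Assumption: the first line names the parameters.
        header = next(reader)
        configs = []
        for row in reader:
            if not row:
                continue  # Skip blank lines.
            # Start from the extra options, then overlay this line's values.
            # Assumption: csv values win over options on a name clash.
            config = dict(options)
            config.update(zip(header, row))
            configs.append(config)
    return header, configs
```

For a file containing the lines "param_a,param_b", "1,2" and "3,4", this sketch returns the header ["param_a", "param_b"] and two config dicts, each also carrying any keyword options passed in.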
treecat.validate.split_data(ragged_index, num_rows, num_parts, partid)

Split a dataset into training + holdout for n-fold crossvalidation.

This splits a dataset into num_parts disjoint parts by randomly holding out cells. Note that whereas supervised crossvalidation typically holds out entire rows, our unsupervised crossvalidation is intended to evaluate a model of the full joint distribution.

Args:
ragged_index: A [V+1]-shaped numpy array of indices into the ragged data array, where V is the number of features.
num_rows: An integer, the number of rows in the dataset.
num_parts: An integer, the number of folds in n-fold crossvalidation.
partid: An integer in [0, num_parts).

Returns:
A [N,R]-shaped mask where True means held-out and False means training. Here N = num_rows and R = ragged_index[-1].
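
The cell-wise holdout described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the treecat implementation: it assumes ragged_index[0] == 0, assigns each (row, feature) cell to one fold uniformly at random, and holds out every ragged column belonging to a held-out cell. The name split_data_sketch and the seed argument are hypothetical.

```python
import numpy as np


def split_data_sketch(ragged_index, num_rows, num_parts, partid, seed=0):
    """Hypothetical stand-in for split_data; not treecat's code."""
    ragged_index = np.asarray(ragged_index)
    if not 0 <= partid < num_parts:
        raise ValueError("partid must lie in [0, num_parts)")
    num_features = len(ragged_index) - 1  # V
    # Assumption: a fixed seed, so every partid sees the same assignment
    # and the num_parts masks are disjoint.
    rng = np.random.RandomState(seed)
    # Assign each (row, feature) cell to exactly one fold.
    fold = rng.randint(num_parts, size=(num_rows, num_features))
    # Number of ragged columns occupied by each feature.
    widths = np.diff(ragged_index)
    # Expand each cell's decision across that feature's ragged columns,
    # giving an [N, R]-shaped boolean mask (True means held-out).
    return np.repeat(fold == partid, widths, axis=1)
```

Calling the sketch once per partid with the same seed yields num_parts masks that together mark every cell as held-out exactly once, which is the property n-fold crossvalidation relies on.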
treecat.validate.train(dataset_path, param_csv_path, models_dir, **options)

Tune parameters specified in a csv file.

treecat.validate.train_task(dataset_path, model_path, config_str)

INTERNAL: Train a single model.