Developer tooling

treecat.profile

treecat.profile.eval(rows=100, cols=10, cats=4, tool='timers')

Profile treecat.validate.eval on a random dataset. Available tools: timers, time, snakeviz, line_profiler, pdb

treecat.profile.serve(rows=100, cols=10, cats=4, tool='timers')

Profile TreeCatServer on a random dataset. Available tools: timers, time, snakeviz, line_profiler, pdb

treecat.profile.serve_files(model_path, config_path, num_samples)

INTERNAL: Serve from a pickled model and config.

treecat.profile.train(rows=100, cols=10, epochs=5, clusters=32, parallel=False, tool='timers')

Profile TreeCatTrainer on a random dataset. Available tools: timers, time, snakeviz, line_profiler, pdb
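The tool names listed above suggest that each profiling run is wrapped in an external command. The following is a minimal sketch of how such a mapping could look. It is not treecat's implementation: build_profile_command, its arguments, and the default profile_path are hypothetical names introduced for illustration, and the handling of 'timers' (running the script unwrapped and relying on built-in timers) is an assumption.

```python
import sys


def build_profile_command(tool, script, script_args, profile_path="profile.prof"):
    """Return the command lines to run for a given tool name.

    Hypothetical helper for illustration only; each command is a list of
    strings suitable for subprocess.check_call.
    """
    args = list(script_args)
    plain = [sys.executable, script] + args
    if tool == "timers":
        # Assumption: the library's own timers report when run normally.
        return [plain]
    if tool == "time":
        # GNU time; -v prints wall-clock time and peak memory usage.
        return [["/usr/bin/time", "-v"] + plain]
    if tool == "snakeviz":
        # Record a cProfile dump, then open it in the snakeviz viewer.
        return [
            [sys.executable, "-m", "cProfile", "-o", profile_path, script] + args,
            ["snakeviz", profile_path],
        ]
    if tool == "line_profiler":
        # kernprof ships with line_profiler; -l profiles line by line, -v prints results.
        return [["kernprof", "-l", "-v", script] + args]
    if tool == "pdb":
        # Run the script under the standard Python debugger.
        return [[sys.executable, "-m", "pdb", script] + args]
    raise ValueError("Unknown tool: {}".format(tool))


# The script name and arguments here are placeholders.
commands = build_profile_command("snakeviz", "train_script.py", ["100", "10"])
```

Returning a list of commands rather than running them keeps the mapping easy to inspect and test; a caller could pass each entry to subprocess.check_call in order.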

treecat.profile.train_files(dataset_path, config_path)

INTERNAL: Train from a pickled dataset and config.

treecat.generate

treecat.generate.clean()

Clean out cache of generated datasets.

treecat.generate.generate_clean_dataset(tree, num_rows, num_cats)

Generate a dataset whose structure should be easy to learn.

This generates a highly correlated uniformly distributed dataset with given tree structure. This is useful to test that structure learning can recover a known structure.
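One way to obtain data that is both highly correlated along a tree and uniformly distributed in every column is to let each column copy its parent's value with high probability and otherwise draw a fresh uniform value. The sketch below illustrates that idea in plain Python. It is not treecat's code: sample_correlated_rows, the parents list encoding, and copy_prob are hypothetical stand-ins for the TreeStructure-based implementation.

```python
import random


def sample_correlated_rows(parents, num_rows, num_cats, copy_prob=0.9, seed=0):
    """Sample rows in which each column mostly copies its tree parent.

    parents[v] is the parent column of column v, or None for the root.
    Columns must be ordered so that every parent precedes its children.
    (Hypothetical encoding; treecat uses a TreeStructure instance.)
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(num_rows):
        row = [None] * len(parents)
        for v, parent in enumerate(parents):
            if parent is None or rng.random() >= copy_prob:
                # Root, or an occasional resample: draw uniformly.
                row[v] = rng.randrange(num_cats)
            else:
                # Copy the parent's value, which creates strong correlation.
                row[v] = row[parent]
        rows.append(row)
    return rows


# A small tree: column 0 is the root, columns 1 and 2 hang off 0, column 3 off 1.
rows = sample_correlated_rows([None, 0, 0, 1], num_rows=200, num_cats=4)
```

Because a copied value inherits the parent's uniform marginal and a resampled value is uniform by construction, every column stays uniformly distributed while adjacent columns agree most of the time.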

Args:

tree: A TreeStructure instance.
num_rows: The number of rows in the generated dataset.
num_cats: The number of categories in the generated categorical dataset. This will also be used for the number of latent classes.

Returns:
A dict with key ‘table’ and value a Table object.

treecat.generate.generate_dataset(num_rows, num_cols, num_cats=4, rate=1.0)

Generate a random dataset.

Returns:
A dataset dict with fields ‘schema’ and ‘table’.

treecat.generate.generate_dataset_file(num_rows, num_cols, num_cats=4, rate=1.0)

Generate a random dataset and save it to a file.

Returns:
The path to a gzipped pickled data table.

treecat.generate.generate_model_file(num_rows, num_cols, num_cats=4, rate=1.0)

Generate a random model and save it to a file.

Returns:
The path to a gzipped pickled model.
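
The file-based generators above return paths to gzipped pickles. As a rough illustration of that storage format using only the standard library, the sketch below writes and reads such a file. The helper names, the file name, and the contents of the example dict are placeholders, not treecat's actual API or data layout.

```python
import gzip
import os
import pickle
import tempfile


def save_gzipped_pickle(obj, path):
    """Pickle obj and write it to path with gzip compression."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_gzipped_pickle(path):
    """Read a gzip-compressed pickle from path and return the object."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# Placeholder data standing in for a real dataset dict.
dataset = {"schema": {"num_cols": 3}, "table": [[0, 1, 2], [2, 1, 0]]}
path = os.path.join(tempfile.mkdtemp(), "dataset.pkl.gz")
save_gzipped_pickle(dataset, path)
restored = load_gzipped_pickle(path)
```

Unpickling executes code embedded in the file, so files like these should only be loaded from trusted sources.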