Serving a trained model

treecat.serving

class treecat.serving.DataServer(dataset, ensemble)

A schema-aware server interface for TreeCat and ensemble models.

edge_logits

A [K]-shaped array of log odds of edges in the complete graph, where K = V(V-1)/2 is the number of edges in the complete graph on V vertices.

estimate_tree()

Returns a tuple of edges. Each edge is a (vertex,vertex) pair.
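For intuition, a maximum-weight spanning tree over the edge log odds can be recovered with a Prim-style greedy pass. This is only a sketch: `max_spanning_tree` is a hypothetical helper, and both the edge enumeration order and the algorithm treecat actually uses may differ.

```python
import numpy as np

def max_spanning_tree(edge_logits, V):
    """Prim-style maximum-weight spanning tree over the complete graph on
    V vertices. Edge k carries weight edge_logits[k], with edges assumed
    to be enumerated as (0,1), (0,2), (1,2), (0,3), ... (v1 < v2)."""
    weight = np.full((V, V), -np.inf)
    k = 0
    for v2 in range(V):
        for v1 in range(v2):
            weight[v1, v2] = weight[v2, v1] = edge_logits[k]
            k += 1
    in_tree = {0}
    edges = []
    while len(in_tree) < V:
        # Greedily attach the heaviest edge crossing the cut.
        _, u, v = max((weight[u, v], u, v)
                      for u in in_tree for v in range(V) if v not in in_tree)
        edges.append((min(u, v), max(u, v)))
        in_tree.add(v)
    return tuple(sorted(edges))

print(max_spanning_tree(np.array([2.0, 0.1, 1.5]), V=3))  # -> ((0, 1), (1, 2))
```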

feature_density()

Returns a [V]-shaped array of feature densities in [0, 1].

feature_names

Returns a tuple containing the names of all features.

latent_correlation()

Compute correlation matrix among latent features.

This computes the generalization of Pearson’s correlation to discrete data. Let I(X;Y) be the mutual information. Then define correlation as

rho(X,Y) = sqrt(1 - exp(-2 I(X;Y)))
Returns:
A [V, V]-shaped numpy array of feature-feature correlations.
latent_perplexity()

Compute perplexity = exp(entropy) of latent variables.

Perplexity is an information theoretic measure of the number of clusters or latent classes. Perplexity is a real number in the range [1, M], where M is model_num_clusters.

Returns:
A [V]-shaped numpy array of perplexity.
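Perplexity itself is easy to sketch for a single discrete distribution; this is generic information theory, not treecat's internal code:

```python
import numpy as np

def perplexity(probs):
    """Perplexity = exp(entropy) of a discrete distribution (0 log 0 := 0)."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(np.exp(entropy))

# A uniform distribution over M classes has perplexity exactly M;
# a deterministic distribution has perplexity 1.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # -> 4.0
print(perplexity([1.0, 0.0, 0.0]))           # -> 1.0
```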
logprob(rows, evidence=None)

Compute non-normalized log probabilities of many rows of data.

If evidence is specified, compute conditional log probability; otherwise compute unconditional log probability.

Args:
rows: A list of rows of data, where each row is a sparse dict
mapping feature name to feature value.
evidence: An optional row of conditioning data, as a sparse dict
mapping feature name to feature value.
Returns:
A [len(rows)]-shaped numpy array of log probabilities.
median(evidence)

Compute an L1-loss-minimizing row of data conditioned on evidence.

Args:
evidence: A single row of conditioning data, as a sparse dict
mapping feature name to feature value.
Returns:
A row of data as a full dict mapping feature name to feature value.
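For a single discrete feature, the L1-loss-minimizing value is a median of its conditional marginal. A minimal sketch of that selection rule, independent of treecat (the distribution below is hypothetical):

```python
import numpy as np

# The L1-minimizing point estimate of a discrete distribution is a median:
# the smallest value whose cumulative probability reaches 0.5.
probs = np.array([0.2, 0.1, 0.4, 0.3])  # hypothetical marginal over 4 values
median_value = int(np.searchsorted(np.cumsum(probs), 0.5))
print(median_value)  # -> 2
```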
mode(evidence)

Compute a maximum a posteriori row of data conditioned on evidence.

Args:
evidence: A single row of conditioning data, as a sparse dict
mapping feature name to feature value.
Returns:
A row of data as a full dict mapping feature name to feature value.
observed_perplexity()

Compute perplexity = exp(entropy) of observed variables.

Perplexity is an information theoretic measure of the number of clusters or observed classes. Perplexity is a real number in the range [1, dim[v]], where dim[v] is the number of categories in an observed categorical variable or 2 for an ordinal variable.

Returns:
A [V]-shaped numpy array of perplexity.
sample(N, evidence=None)

Draw N samples from the posterior distribution.

Args:
N: The number of samples to draw.
evidence: An optional single row of conditioning data, as a sparse
dict mapping feature name to feature value.
Returns:
An [N, R]-shaped numpy array of sampled multinomial data.
sample_tree(num_samples)

Returns a num_samples-long list of trees, each a list of pairs.

class treecat.serving.EnsembleServer(ensemble)

Class for serving queries against a trained TreeCat ensemble.

latent_correlation()

Compute correlation matrix among latent features.

This computes the generalization of Pearson’s correlation to discrete data. Let I(X;Y) be the mutual information. Then define correlation as

rho(X,Y) = sqrt(1 - exp(-2 I(X;Y)))
Returns:
A [V, V]-shaped numpy array of feature-feature correlations.
latent_perplexity()

Compute perplexity = exp(entropy) of latent variables.

Perplexity is an information theoretic measure of the number of clusters or latent classes. Perplexity is a real number in the range [1, M], where M is model_num_clusters.

Returns:
A [V]-shaped numpy array of perplexity.
marginals(data)

Compute observed marginals conditioned on data.

mode(counts, data)

Compute a maximum a posteriori data value conditioned on data.

observed_perplexity(counts)

Compute perplexity = exp(entropy) of observed variables.

Perplexity is an information theoretic measure of the number of clusters or observed classes. Perplexity is a real number in the range [1, dim[v]], where dim[v] is the number of categories in an observed categorical variable or 2 for an ordinal variable.

Args:
counts: A [V]-shaped array of multinomial counts.
Returns:
A [V]-shaped numpy array of perplexity.
class treecat.serving.ServerBase(ragged_index)

Base class for TreeCat and Ensemble servers.

edge_logits

Get edge log probabilities on the complete graph.

estimate_tree()

Return the maximum a posteriori estimated tree structure.

latent_correlation()

Compute correlation matrix among latent features.

latent_perplexity()

Compute perplexity = exp(entropy) of latent variables.

logprob(data)

Compute non-normalized log probabilities of many rows of data.

make_zero_row()

Make an empty data row.

marginals(data)

Compute observed marginals conditioned on data.

median(counts, data)

Compute L1-loss-minimizing quantized marginals conditioned on data.

Args:
counts: A [V]-shaped numpy array of quantization resolutions.
data: An [N, R]-shaped ragged numpy array of conditioning data, as
multinomial counts, where R = server.ragged_size.
Returns:
An array of the same shape as data, but with specified counts.
mode(counts, data)

Compute a maximum a posteriori data value conditioned on data.

observed_perplexity(counts)

Compute perplexity = exp(entropy) of observed variables.

sample(N, counts, data=None)

Draw N samples from the posterior distribution.

sample_tree(num_samples)

Return a num_samples-long list of trees, each a list of pairs.

class treecat.serving.TreeCatServer(model)

Class for serving queries against a trained TreeCat model.

latent_correlation()

Compute correlation matrix among latent features.

This computes the generalization of Pearson’s correlation to discrete data. Let I(X;Y) be the mutual information. Then define correlation as

rho(X,Y) = sqrt(1 - exp(-2 I(X;Y)))
Returns:
A [V, V]-shaped numpy array of feature-feature correlations.
latent_perplexity()

Compute perplexity = exp(entropy) of latent variables.

Perplexity is an information theoretic measure of the number of clusters or latent classes. Perplexity is a real number in the range [1, M], where M is model_num_clusters.

Returns:
A [V]-shaped numpy array of perplexity.
logprob(data)

Compute non-normalized log probabilities of many rows of data.

To compute conditional probability, use the identity:

log P(data|evidence) = server.logprob(data + evidence)
                     - server.logprob(evidence)
Args:
data: An [N, R]-shaped ragged numpy array of multinomial count data,
where N is the number of rows, and R = server.ragged_size.
Returns:
An [N]-shaped numpy array of log probabilities.
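The conditional-probability identity can be checked on a toy joint distribution standing in for the model; all names below are hypothetical:

```python
import numpy as np

# Hypothetical joint P(X, Y) over two binary features.
joint = np.array([[0.1, 0.2],
                  [0.3, 0.4]])

def logprob(x=None, y=None):
    """Log probability of the observed cells, marginalizing out the rest
    (plays the role of server.logprob on this toy model)."""
    p = joint
    if x is not None:
        p = p[x:x + 1, :]
    if y is not None:
        p = p[:, y:y + 1]
    return np.log(p.sum())

# log P(Y=1 | X=0) = logprob(data + evidence) - logprob(evidence)
lhs = np.log(joint[0, 1] / joint[0, :].sum())
rhs = logprob(x=0, y=1) - logprob(x=0)
print(np.allclose(lhs, rhs))  # -> True
```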
marginals(data)

Compute observed marginals conditioned on data.

Args:
data: An [N, R]-shaped ragged numpy array of conditioning data, as
multinomial counts, where R = server.ragged_size.
Returns:
A real-valued array of the same shape as data.
mode(counts, data)

Compute a maximum a posteriori data value conditioned on data.

observed_perplexity(counts)

Compute perplexity = exp(entropy) of observed variables.

Perplexity is an information theoretic measure of the number of clusters or observed classes. Perplexity is a real number in the range [1, dim[v]], where dim[v] is the number of categories in an observed categorical variable or 2 for an ordinal variable.

Args:
counts: A [V]-shaped array of multinomial counts.
Returns:
A [V]-shaped numpy array of perplexity.
sample(N, counts, data=None)

Draw N samples from the posterior distribution.

Args:
N: The number of samples to draw.
counts: A [V]-shaped numpy array of requested counts of
multinomials to sample.
data: An optional single row of conditioning data, as a [R]-shaped
ragged numpy array of multinomial counts, where R = server.ragged_size.
Returns:
An [N, R]-shaped numpy array of sampled multinomial data.
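The shape contract above can be illustrated with plain numpy multinomial draws. The feature sizes, counts, and packing below are hypothetical, chosen only to show how per-feature counts pack into a ragged [N, R] row:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetically, two features with 3 and 2 categories; counts[v] draws are
# taken per feature and the results packed into a ragged [N, R] count array.
probs = [np.array([0.2, 0.3, 0.5]), np.array([0.6, 0.4])]
counts = [1, 1]                 # one draw per feature
N = 4                           # number of sampled rows
R = sum(len(p) for p in probs)  # ragged row size

samples = np.zeros((N, R), dtype=np.int8)
for n in range(N):
    col = 0
    for p, c in zip(probs, counts):
        samples[n, col:col + len(p)] = rng.multinomial(c, p)
        col += len(p)

print(samples.shape)  # -> (4, 5)
```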
sample_tree(num_samples)

Returns a num_samples-long list of trees, each a list of pairs.

treecat.serving.correlation(probs)

Compute correlation rho(X,Y) = sqrt(1 - exp(-2 I(X;Y))).

Args:
probs: An [M, M]-shaped numpy array representing a joint distribution.
Returns:
A number in [0,1) representing the information-theoretic correlation.
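A minimal sketch of this computation, assuming only that probs is a normalized joint over two discrete variables; this is not the library's exact implementation:

```python
import numpy as np

def correlation(probs):
    """rho(X, Y) = sqrt(1 - exp(-2 I(X; Y))) for a joint distribution."""
    probs = np.asarray(probs, dtype=float)
    px = probs.sum(axis=1, keepdims=True)   # marginal P(X)
    py = probs.sum(axis=0, keepdims=True)   # marginal P(Y)
    mask = probs > 0                        # skip 0 log 0 terms
    mutual_info = np.sum(probs[mask] * np.log(probs[mask] / (px * py)[mask]))
    return float(np.sqrt(1.0 - np.exp(-2.0 * mutual_info)))

# Independent variables have zero mutual information, hence rho = 0;
# perfect dependence pushes rho toward 1.
independent = np.outer([0.5, 0.5], [0.5, 0.5])
dependent = np.array([[0.5, 0.0], [0.0, 0.5]])
print(correlation(independent))  # -> 0.0
print(correlation(dependent))
```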
treecat.serving.multinomial_entropy(probs, count)

Compute entropy of multinomial distribution with given probs and count.

Args:
probs: A 1-dimensional array of normalized probabilities.
count: The number of draws in a multinomial distribution.
Returns:
A number in [0, count * len(probs)] representing entropy.
treecat.serving.serve_model(dataset, model)

Create a server object from the given dataset and model.

Args:
dataset: Either the filename of a pickled dataset or an already loaded
dataset.
model: Either the filename of a pickled TreeCat model or ensemble, or
an already loaded model or ensemble.
Returns:
A DataServer object.