Serving a trained model

treecat.serving
class treecat.serving.DataServer(dataset, ensemble)
    A schema-aware server interface for TreeCat and ensemble models.
    edge_logits
        A [K]-shaped array of log odds of edges in the complete graph.
    estimate_tree()
        Returns a tuple of edges. Each edge is a (vertex, vertex) pair.
    feature_density()
        Returns a [V]-shaped array of feature densities in [0, 1].
    feature_names
        Returns a tuple containing the names of all features.
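For example, the estimated tree structure can be printed with human-readable feature names. A minimal sketch, assuming `server` is a DataServer created by treecat.serving.serve_model and assuming (not guaranteed here) that the vertex indices returned by estimate_tree() index into feature_names:

```python
# A minimal sketch: print the MAP tree structure with feature names.
# Assumes `server` is a DataServer and that edge vertex indices
# index into server.feature_names (an assumption, not stated above).
names = server.feature_names
for v1, v2 in server.estimate_tree():
    print('{} -- {}'.format(names[v1], names[v2]))
```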
    latent_correlation()
        Compute correlation matrix among latent features.

        This computes the generalization of Pearson's correlation to
        discrete data. Let I(X;Y) be the mutual information. Then define
        correlation as

            rho(X,Y) = sqrt(1 - exp(-2 I(X;Y)))

        Returns:
            A [V, V]-shaped numpy array of feature-feature correlations.
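For example, the correlation matrix can be scanned for the most strongly associated pair of features. A minimal sketch, assuming `server` is an already constructed DataServer:

```python
import numpy as np

# A minimal sketch: find the most strongly correlated pair of features.
# Assumes `server` is a DataServer created by treecat.serving.serve_model.
rho = server.latent_correlation().copy()   # [V, V] array with entries in [0, 1)
np.fill_diagonal(rho, 0.0)                 # ignore each feature's self-correlation
v1, v2 = np.unravel_index(np.argmax(rho), rho.shape)
print('Most correlated pair: features {} and {}, rho = {:.3f}'.format(v1, v2, rho[v1, v2]))
```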
    latent_perplexity()
        Compute perplexity = exp(entropy) of latent variables.

        Perplexity is an information theoretic measure of the number of
        clusters or latent classes. Perplexity is a real number in the
        range [1, M], where M is model_num_clusters.

        Returns:
            A [V]-shaped numpy array of perplexity.
    logprob(rows, evidence=None)
        Compute non-normalized log probabilities of many rows of data.

        If evidence is specified, compute conditional log probability;
        otherwise compute unconditional log probability.

        Args:
            rows: A list of rows of data, where each row is a sparse dict
                mapping feature name to feature value.
            evidence: An optional row of conditioning data, as a sparse dict
                mapping feature name to feature value.
        Returns:
            A [len(rows)]-shaped numpy array of log probabilities.
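For example, conditional log probabilities can be computed for a few candidate rows given partial evidence. A minimal sketch; the feature names and values are purely illustrative and depend on the dataset's schema:

```python
# A minimal sketch: score candidate rows under the model, conditioned on evidence.
# Assumes `server` is a DataServer; 'income' and 'age' are hypothetical feature names.
rows = [{'income': 'high'}, {'income': 'low'}]
evidence = {'age': '30-40'}
logprobs = server.logprob(rows, evidence=evidence)   # [len(rows)]-shaped array
print(logprobs)
```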
    median(evidence)
        Compute an L1-loss-minimizing row of data conditioned on evidence.

        Args:
            evidence: A single row of conditioning data, as a sparse dict
                mapping feature name to feature value.
        Returns:
            A row of data as a full dict mapping feature name to feature value.
    mode(evidence)
        Compute a maximum a posteriori row of data conditioned on evidence.

        Args:
            evidence: A single row of conditioning data, as a sparse dict
                mapping feature name to feature value.
        Returns:
            A row of data as a full dict mapping feature name to feature value.
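Both methods can be used for imputation: given a partially observed row, they fill in every remaining feature. A minimal sketch, with hypothetical feature names:

```python
# A minimal sketch: impute missing features from partial evidence.
# Assumes `server` is a DataServer; the feature names are hypothetical.
evidence = {'age': '30-40', 'income': 'high'}
map_row = server.mode(evidence)     # full dict, MAP completion of the row
l1_row = server.median(evidence)    # full dict, L1-loss-minimizing completion
print(map_row)
print(l1_row)
```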
    observed_perplexity()
        Compute perplexity = exp(entropy) of observed variables.

        Perplexity is an information theoretic measure of the number of
        clusters or observed classes. Perplexity is a real number in the
        range [1, dim[v]], where dim[v] is the number of categories in an
        observed categorical variable or 2 for an ordinal variable.

        Returns:
            A [V]-shaped numpy array of perplexity.
    sample(N, evidence=None)
        Draw N samples from the posterior distribution.

        Args:
            N: The number of samples to draw.
            evidence: An optional single row of conditioning data, as a
                sparse dict mapping feature name to feature value.
        Returns:
            An [N, R]-shaped numpy array of sampled multinomial data.
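For example, posterior samples conditioned on partial evidence can be drawn as below. A minimal sketch; per the docstring above, the result comes back in the server's [N, R]-shaped multinomial encoding rather than as dicts:

```python
# A minimal sketch: draw conditional samples from the posterior.
# Assumes `server` is a DataServer; 'age' is a hypothetical feature name.
samples = server.sample(100, evidence={'age': '30-40'})
print(samples.shape)   # expected (100, R), where R is the ragged row size
```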
    sample_tree(num_samples)
        Returns a num_samples-long list of trees, each a list of pairs.
class treecat.serving.EnsembleServer(ensemble)
    Class for serving queries against a trained TreeCat ensemble.
    latent_correlation()
        Compute correlation matrix among latent features.

        This computes the generalization of Pearson's correlation to
        discrete data. Let I(X;Y) be the mutual information. Then define
        correlation as

            rho(X,Y) = sqrt(1 - exp(-2 I(X;Y)))

        Returns:
            A [V, V]-shaped numpy array of feature-feature correlations.
    latent_perplexity()
        Compute perplexity = exp(entropy) of latent variables.

        Perplexity is an information theoretic measure of the number of
        clusters or latent classes. Perplexity is a real number in the
        range [1, M], where M is model_num_clusters.

        Returns:
            A [V]-shaped numpy array of perplexity.
    marginals(data)
        Compute observed marginals conditioned on data.
    mode(counts, data)
        Compute a maximum a posteriori data value conditioned on data.
    observed_perplexity(counts)
        Compute perplexity = exp(entropy) of observed variables.

        Perplexity is an information theoretic measure of the number of
        clusters or observed classes. Perplexity is a real number in the
        range [1, dim[v]], where dim[v] is the number of categories in an
        observed categorical variable or 2 for an ordinal variable.

        Args:
            counts: A [V]-shaped array of multinomial counts.
        Returns:
            A [V]-shaped numpy array of perplexity.
class treecat.serving.ServerBase(ragged_index)
    Base class for TreeCat and Ensemble servers.
    edge_logits
        Get edge log probabilities on the complete graph.
    estimate_tree()
        Return the maximum a posteriori estimated tree structure.
    latent_correlation()
        Compute correlation matrix among latent features.
    latent_perplexity()
        Compute perplexity = exp(entropy) of latent variables.
    logprob(data)
        Compute non-normalized log probabilities of many rows of data.
    make_zero_row()
        Make an empty data row.
    marginals(data)
        Compute observed marginals conditioned on data.
    median(counts, data)
        Compute L1-loss-minimizing quantized marginals conditioned on data.

        Args:
            counts: A [V]-shaped numpy array of quantization resolutions.
            data: An [N, R]-shaped ragged numpy array of rows of conditioning
                data as multinomial counts, where R = server.ragged_size.
        Returns:
            An array of the same shape as data, but with the specified counts.
    mode(counts, data)
        Compute a maximum a posteriori data value conditioned on data.
    observed_perplexity(counts)
        Compute perplexity = exp(entropy) of observed variables.
    sample(N, counts, data=None)
        Draw N samples from the posterior distribution.
    sample_tree(num_samples)
        Return a num_samples-long list of trees, each a list of pairs.
class treecat.serving.TreeCatServer(model)
    Class for serving queries against a trained TreeCat model.
    latent_correlation()
        Compute correlation matrix among latent features.

        This computes the generalization of Pearson's correlation to
        discrete data. Let I(X;Y) be the mutual information. Then define
        correlation as

            rho(X,Y) = sqrt(1 - exp(-2 I(X;Y)))

        Returns:
            A [V, V]-shaped numpy array of feature-feature correlations.
    latent_perplexity()
        Compute perplexity = exp(entropy) of latent variables.

        Perplexity is an information theoretic measure of the number of
        clusters or latent classes. Perplexity is a real number in the
        range [1, M], where M is model_num_clusters.

        Returns:
            A [V]-shaped numpy array of perplexity.
    logprob(data)
        Compute non-normalized log probabilities of many rows of data.

        To compute conditional probability, use the identity:

            log P(data|evidence) = server.logprob(data + evidence)
                                   - server.logprob(evidence)

        Args:
            data: An [N, R]-shaped ragged numpy array of multinomial count
                data, where N is the number of rows and R = server.ragged_size.
        Returns:
            An [N]-shaped numpy array of log probabilities.
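The identity above translates directly into code. A minimal sketch, under the stated data-layout assumptions:

```python
# A minimal sketch of the conditional-probability identity quoted above.
# Assumes `server` is a TreeCatServer and that `data` and `evidence` are
# [N, R]-shaped ragged numpy arrays of multinomial counts (R = server.ragged_size)
# with disjoint support, so that data + evidence is a valid combined row.
cond_logprob = server.logprob(data + evidence) - server.logprob(evidence)
print(cond_logprob)   # an [N]-shaped array of log P(data | evidence)
```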
    marginals(data)
        Compute observed marginals conditioned on data.

        Args:
            data: An [N, R]-shaped ragged numpy array of rows of conditioning
                data as multinomial counts, where R = server.ragged_size.
        Returns:
            A real-valued array of the same shape as data.
    mode(counts, data)
        Compute a maximum a posteriori data value conditioned on data.
    observed_perplexity(counts)
        Compute perplexity = exp(entropy) of observed variables.

        Perplexity is an information theoretic measure of the number of
        clusters or observed classes. Perplexity is a real number in the
        range [1, dim[v]], where dim[v] is the number of categories in an
        observed categorical variable or 2 for an ordinal variable.

        Args:
            counts: A [V]-shaped array of multinomial counts.
        Returns:
            A [V]-shaped numpy array of perplexity.
    sample(N, counts, data=None)
        Draw N samples from the posterior distribution.

        Args:
            N: The number of samples to draw.
            counts: A [V]-shaped numpy array of requested counts of
                multinomials to sample.
            data: An optional single row of conditioning data, as an
                [R]-shaped ragged numpy array of multinomial counts, where
                R = server.ragged_size.
        Returns:
            An [N, R]-shaped numpy array of sampled multinomial data.
    sample_tree(num_samples)
        Returns a num_samples-long list of trees, each a list of pairs.
treecat.serving.correlation(probs)
    Compute correlation rho(X,Y) = sqrt(1 - exp(-2 I(X;Y))).

    Args:
        probs: An [M, M]-shaped numpy array representing a joint distribution.
    Returns:
        A number in [0, 1) representing the information-theoretic correlation.
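For example, the correlation of two dependent binary variables can be computed from their [2, 2]-shaped joint distribution. A minimal sketch, which also recomputes the same quantity by hand from the mutual information for comparison:

```python
import numpy as np
from treecat.serving import correlation

# A minimal sketch: correlation of two dependent binary variables
# from their joint distribution (rows = X, columns = Y).
probs = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(correlation(probs))

# For comparison, the same quantity computed by hand:
# I(X;Y) = sum_ij p_ij * log(p_ij / (p_i * p_j)), then rho = sqrt(1 - exp(-2 I)).
px = probs.sum(axis=1, keepdims=True)
py = probs.sum(axis=0, keepdims=True)
mi = np.sum(probs * np.log(probs / (px * py)))
print(np.sqrt(1.0 - np.exp(-2.0 * mi)))
```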
treecat.serving.multinomial_entropy(probs, count)
    Compute entropy of a multinomial distribution with given probs and count.

    Args:
        probs: A 1-dimensional array of normalized probabilities.
        count: The number of draws in a multinomial distribution.
    Returns:
        A number in [0, count * len(probs)] representing entropy.
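With a single draw (count=1) this reduces to the ordinary categorical entropy, so exponentiating it gives the same exp(entropy) notion of perplexity used by the *_perplexity methods above. A minimal sketch:

```python
import numpy as np
from treecat.serving import multinomial_entropy

# A minimal sketch: entropy of a uniform 4-category variable with a single draw.
probs = np.array([0.25, 0.25, 0.25, 0.25])
h = multinomial_entropy(probs, 1)
print(h, np.exp(h))   # expected roughly log(4) and a perplexity of about 4
```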
treecat.serving.serve_model(dataset, model)
    Create a server object from the given dataset and model.

    Args:
        dataset: Either the filename of a pickled dataset or an already
            loaded dataset.
        model: Either the filename of a pickled TreeCat model or ensemble,
            or an already loaded model or ensemble.
    Returns:
        A DataServer object.
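Putting the pieces together, a typical serving session creates a DataServer from pickled artifacts and then issues schema-aware queries against it. A minimal sketch; the file names and the 'age' feature are purely illustrative:

```python
from treecat.serving import serve_model

# A minimal sketch of an end-to-end serving session.
# 'dataset.pkz' and 'model.pkz' are hypothetical paths to a pickled dataset
# and a pickled TreeCat model or ensemble; 'age' is a hypothetical feature.
server = serve_model('dataset.pkz', 'model.pkz')

print(server.feature_names)                # all features in the schema
print(server.estimate_tree())              # MAP tree structure as (vertex, vertex) edges
print(server.mode({'age': '30-40'}))       # MAP completion of a partial row
print(server.logprob([{'age': '30-40'}]))  # log probability of a candidate row
```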