LIP0017
LIP | 17 |
---|---|
Title | Revising dense SPN layers |
Author | J. van de Wolfshaar |
Status | Draft |
Type | Standard |
Discussion | Issue #47 |
PR | |
Created | September 1, 2018 |
This LIP proposes a new way of implementing dense SPNs, inspired by some of the comments made in *Probabilistic Deep Learning using Random Sum-Product Networks*. It is very likely to result in much more scalable SPNs, allowing many more mixtures and subsets at a lower computational cost.
The idea is to assign separate tensor axes to (i) the scopes Sc, (ii) the decompositions D and (iii) the nodes N.
For now, we assume that the decompositions only occur at the top-most level, and are thus not recursively generated. By doing so, the decomposition parameter actually comes closer to the number of 'repetitions' in the paper mentioned above. For a layer Layer0, we now have an output tensor of shape [B, D, Sc0, N0] (batch, decomposition, scope and node).
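For illustration, with hypothetical sizes (a batch of 32, 4 decompositions, 16 scopes and 8 nodes per scope; these numbers are made up), the output of Layer0 would look as follows:

```python
import tensorflow as tf

# Hypothetical sizes: batch of 32, 4 decompositions, 16 scopes, 8 nodes per scope
batch_size, num_decomps, num_scopes, num_nodes = 32, 4, 16, 8

# Stand-in for the log values produced by Layer0, shaped [B, D, Sc0, N0]
layer0_out = tf.zeros([batch_size, num_decomps, num_scopes, num_nodes])
print(layer0_out.shape)  # (32, 4, 16, 8)
```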
A sums layer computes sums for each decomposition and each subset. Let Sum1 be the node connected to Layer0. Its weight tensor will have shape [D, Sc0, N0, N1]. The value computation of Sum1 will go as follows:
class SumsLayer:

    def log_value(self, incoming, log_weights):
        # Transpose so that the dimensions go from [B,D,Sc,N] to [D,Sc,B,N]
        trans_dsbn = tf.transpose(incoming, (1, 2, 0, 3))
        # Compute the matrix multiplication in log space, using tf.matmul's ability
        # to parallelize over the outer-most dimensions (D and Sc in our case)
        out_dsbn = logmatmul(trans_dsbn, log_weights)
        # Transpose back so that we get [B,D,Sc,N] again
        out = tf.transpose(out_dsbn, (2, 0, 1, 3))
        return out
Here we can use logmatmul, which is already implemented in jostosh's fork. By doing so, the two tensors are implicitly broadcast against each other in a convenient way, leading to much faster computation than using tf.reduce_logsumexp (which requires several more ops and much more memory to realize).
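The actual implementation in jostosh's fork may differ, but as a rough sketch, a numerically stable log-space matrix multiplication could look like this (the max-subtraction stabilization shown here is an assumption, not necessarily what the fork does):

```python
def logmatmul(log_a, log_b):
    # Sketch only: log_a has shape [..., M, K], log_b has shape [..., K, N],
    # both holding log values. Subtract maxima over the contraction axis
    # for numerical stability before going to linear space.
    max_a = tf.reduce_max(log_a, axis=-1, keepdims=True)  # [..., M, 1]
    max_b = tf.reduce_max(log_b, axis=-2, keepdims=True)  # [..., 1, N]
    # Multiply the stabilized operands in linear space, then return to log space
    prod = tf.matmul(tf.exp(log_a - max_a), tf.exp(log_b - max_b))  # [..., M, N]
    return tf.log(prod) + max_a + max_b
```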
A products layer will be parameterized by how many scopes it merges (the number of subsets that are joined). Two adjacent subsets along the scope dimension (Sc0) will be merged by an outer product, so that we maximally exploit TensorFlow's broadcasting mechanism. Let's assume each product takes two inputs (num_subsets == 2 in the old terminology); then we can realize the computation of products as follows:
class ProductsLayer:

    def log_value(self, incoming):
        # Straightforward to generalize this to more than two subsets
        num_subsets = 2
        _, num_decomps, num_scopes, num_nodes = incoming.shape.as_list()
        # Reshape the incoming tensor to have a dimension for the number of
        # subsets. The new shape is [B, D, Sc0/2, 2, N]
        reshaped = tf.reshape(
            incoming, [-1, num_decomps, num_scopes // num_subsets,
                       num_subsets, num_nodes])
        # Split the tensor so that each output corresponds to a subset
        # to merge with the other
        split_sc0, split_sc1 = tf.split(
            reshaped, axis=3, num_or_size_splits=num_subsets)
        # Prepare for the outer product (through broadcasting)
        split_sc1 = tf.reshape(
            split_sc1, [-1, num_decomps, num_scopes // num_subsets, num_nodes, 1])
        # Compute the outer products in log space (by broadcasting an addition);
        # the resulting tensor will have shape [B, D, Sc0/2, N, N]
        out_bdsnn = split_sc0 + split_sc1
        # Reshape by flattening the two innermost dimensions
        out = tf.reshape(
            out_bdsnn, [-1, num_decomps, num_scopes // num_subsets,
                        num_nodes ** num_subsets])
        return out
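As a quick sanity check on the shapes, here is a hypothetical usage of the two layers above chained together (the concrete sizes are made up for illustration):

```python
# Hypothetical sizes: B=32, D=4, Sc0=16, N0=8, N1=6
incoming = tf.zeros([32, 4, 16, 8])      # Layer0 output, [B, D, Sc0, N0]
log_weights = tf.zeros([4, 16, 8, 6])    # Sum1 weights, [D, Sc0, N0, N1]

sums_out = SumsLayer().log_value(incoming, log_weights)
print(sums_out.shape)   # (32, 4, 16, 6) -> [B, D, Sc0, N1]

prods_out = ProductsLayer().log_value(sums_out)
print(prods_out.shape)  # (32, 4, 8, 36) -> [B, D, Sc0/2, N1**2]
```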
If we were to implement the layers as defined above, we would need the number of scopes at the first layer to be a power of the number of subsets. We can realize this with padding. So if we have 2 subsets per product and 14 variables to start with, we use a scope dimension of size 16 (the first power of 2 greater than 14) and pad accordingly.
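A minimal sketch of such padding is shown below; it assumes the padded scopes are filled with log(1) = 0 so that they act as neutral elements in the products (this particular choice is an assumption, not taken from the proposal):

```python
import tensorflow as tf

def pad_scopes(incoming, num_subsets=2):
    # 'incoming' holds log values with shape [B, D, Sc, N]
    num_scopes = incoming.shape.as_list()[2]
    # Smallest power of num_subsets that is >= num_scopes (e.g. 14 -> 16 for 2 subsets)
    target = 1
    while target < num_scopes:
        target *= num_subsets
    # Pad the scope axis at the end with zeros, i.e. log(1), so the padded
    # scopes leave the products unaffected
    return tf.pad(incoming, [[0, 0], [0, 0], [0, target - num_scopes], [0, 0]])
```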
This layout also makes it much easier to keep track of the scopes, so validity checking becomes simpler as well.
- Internally pass on tensors in the [D,Sc,B,N] format (so not batch-major) so that the tf.transpose invocations in the sums become obsolete. This will speed things up even more, as sketched below.
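A minimal sketch of what the sums computation could reduce to under that internal convention (reusing the logmatmul sketch from above; this is an illustration of the idea, not a final design):

```python
class SumsLayer:

    def log_value(self, incoming, log_weights):
        # 'incoming' already has shape [D, Sc, B, N0], so it lines up with the
        # [D, Sc, N0, N1] weights without any transposes
        return logmatmul(incoming, log_weights)  # -> [D, Sc, B, N1]
```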