
LIP 17 - Revising dense SPN layers

LIP: 17
Title: Revising dense SPN layers
Author: J. van de Wolfshaar
Status: Draft
Type: Standard
Discussion: Issue #47
PR:
Created: September 1, 2018

Introduction

This LIP proposes a new way of implementing dense SPNs, inspired by some of the comments made in Probabilistic Deep Learning using Random Sum-Product Networks. It is very likely to result in much more scalable SPNs, allowing many more mixtures and subsets at a lower computational cost.

Technical Background

The idea is to assign separate tensor axes to (i) the scopes Sc, (ii) the decompositions D and (iii) the nodes N.

For now, we assume that the decompositions only occur at the top-most level, and are thus not recursively generated. By doing so, the decomposition parameter actually comes closer to the number of 'repetitions' in the paper mentioned above. For a layer Layer0, we now have an output tensor of shape [B,D,Sc0,N0] (batch, decomposition, scope and node).
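To make this layout concrete, here is a minimal sketch of a tensor holding the log-values of Layer0 (all sizes are purely illustrative and not part of the proposal):

import tensorflow as tf

# Illustrative sizes for B, D, Sc0 and N0
batch_size, num_decomps, num_scopes, num_nodes = 32, 4, 16, 8

# Log-values of Layer0 in the proposed layout [B, D, Sc0, N0]
layer0_out = tf.placeholder(
    tf.float32, shape=[batch_size, num_decomps, num_scopes, num_nodes],
    name="layer0_log_values")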

Computing sums

A sums layer computes sums for each decomposition and each subset. Let Sum1 be the node connected to Layer0. Its weight tensor will have shape [D,Sc0,N0,N1].
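As a sketch of how such a weight tensor could be constructed (the variable name, sizes and normalization chosen here are assumptions for illustration, not part of the proposal), we can keep the weights in log space and normalize them over the input-node axis N0:

import tensorflow as tf

# Illustrative sizes for D, Sc0, N0 and N1
num_decomps, num_scopes, num_in_nodes, num_sums = 4, 16, 8, 8

# Log-weights of Sum1 with shape [D, Sc0, N0, N1]; normalizing over axis 2 (N0)
# makes the weights of each individual sum node sum to one
log_weights = tf.nn.log_softmax(
    tf.get_variable(
        "sum1_log_weights",
        shape=[num_decomps, num_scopes, num_in_nodes, num_sums]),
    axis=2)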

Value computation

The value computation of Sum1 will go as follows:

class SumsLayer:
    def log_value(self, incoming, log_weights):
        # Transpose so that dimensions go from [B,D,Sc,N] to [D,Sc,B,N]
        trans_dsbn = tf.transpose(incoming, (1, 2, 0, 3))
        
        # Compute the matrix product in log space, using tf.matmul's ability to
        # parallelize over the outer-most dimensions (D and Sc in our case)
        out_dsbn = logmatmul(trans_dsbn, log_weights)
        
        # Transpose back so that we get [B,D,Sc,N] again
        out = tf.transpose(out_dsbn, (2, 0, 1, 3))
        return out

Here, logmatmul is the log-space matrix multiplication that is already implemented in jostosh's fork. Because the two tensors share their leading D and Sc dimensions, tf.matmul conveniently batches the computation over them, which is much faster than an equivalent construction with tf.reduce_logsumexp (which requires several more ops and much more memory).
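For reference, a numerically stabilized log-space matrix multiplication could look roughly as follows (this is only a sketch and not necessarily identical to the implementation in jostosh's fork):

import tensorflow as tf

def logmatmul(log_a, log_b):
    # Subtract the maxima over the contraction axis for numerical stability
    max_a = tf.reduce_max(log_a, axis=-1, keepdims=True)  # [..., M, 1]
    max_b = tf.reduce_max(log_b, axis=-2, keepdims=True)  # [..., 1, N]
    # tf.matmul batches over all leading dimensions (D and Sc in our case)
    ab = tf.matmul(tf.exp(log_a - max_a), tf.exp(log_b - max_b))
    # Go back to log space and undo the stabilization
    return tf.log(ab) + max_a + max_b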

Computing products

A products layer will be parameterized by how many scopes it merges (the number of subsets that are joined). Two adjacent subsets along the scope dimension (Sc0) will be merged by an outer product, so that we maximally exploit TensorFlow's broadcasting mechanism.

Value computation

Let's assume each product takes two inputs (num_subsets == 2 in the old terminology). The product computation can then be realized as follows:

class ProductsLayer:
    def log_value(self, incoming):
        # Straightforward to generalize this
        num_subsets = 2  
        _, num_decomps, num_scopes, num_nodes = incoming.shape.as_list()
        
        # Reshape the incoming tensor to have a dimension for the number of subsets. The new shape
        # is [B, D, Sc0/2, 2, N]
        reshaped = tf.reshape(
            incoming, [-1, num_decomps, num_scopes // num_subsets, 
            num_subsets, num_nodes])
        
        # Split along the subset axis, so that each output corresponds to one of the
        # subsets to merge; both halves have shape [B, D, Sc0/2, 1, N]
        split_sc0, split_sc1 = tf.split(
            reshaped, axis=3, num_or_size_splits=num_subsets)
        
        # Prepare for outer product (through broadcasting)
        split_sc1 = tf.reshape(
            split_sc1, [-1, num_decomps, num_scopes // num_subsets, num_nodes, 1])
        
        # Compute outer products in log space (by broadcasting an addition), the 
        # resulting tensor will have shape [B,D,Sc0/2,N,N]
        out_bdsnn = split_sc0 + split_sc1
        
        # Reshape by flattening the two innermost dimensions
        out = tf.reshape(
            out_bdsnn, [-1, num_decomps, num_scopes // num_subsets, 
                        num_nodes ** num_subsets])
        return out        
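Putting the two layers together (reusing the logmatmul sketch above and purely illustrative sizes), the shapes evolve as follows: the sums map the N0 inputs of each scope to N1 mixtures, after which the products halve the scope dimension and square the node dimension:

import tensorflow as tf

batch_size, num_decomps, num_scopes, num_nodes, num_sums = 32, 4, 16, 8, 8

layer0_out = tf.placeholder(
    tf.float32, [batch_size, num_decomps, num_scopes, num_nodes])
log_weights = tf.nn.log_softmax(
    tf.get_variable("w", shape=[num_decomps, num_scopes, num_nodes, num_sums]),
    axis=2)

sums_out = SumsLayer().log_value(layer0_out, log_weights)
prods_out = ProductsLayer().log_value(sums_out)

print(sums_out.shape)   # [B, D, Sc0, N1]      -> (32, 4, 16, 8)
print(prods_out.shape)  # [B, D, Sc0/2, N1**2] -> (32, 4, 8, 64)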

Some things to consider

If we implement the layers as defined above, the number of scopes at the first layer needs to be a power of the number of subsets. We can realize this with padding: if we have 2 subsets per product and 14 variables to start with, we use a scope dimension of size 16 (the first power of 2 greater than 14) and pad accordingly.
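One possible way to realize this padding is sketched below; it assumes we pad with zeros in log space (i.e. probability 1), so that the padded scopes act as neutral elements for the products:

import tensorflow as tf

def pad_scope_dim(incoming, num_subsets=2):
    # incoming has shape [B, D, Sc, N]
    num_scopes = incoming.shape.as_list()[2]

    # Smallest power of num_subsets that is at least num_scopes,
    # e.g. 14 scopes with 2 subsets per product -> 16
    target = 1
    while target < num_scopes:
        target *= num_subsets

    # Zeros in log space correspond to probability 1, so the padded
    # scopes leave the products unaffected
    return tf.pad(incoming, [[0, 0], [0, 0], [0, target - num_scopes], [0, 0]])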

With this layout it also becomes much easier to keep track of the scopes, so validity checking becomes simpler as well.

Extensions

  • Internally pass on Tensors in the [D,Sc,B,N] format (so not batch-major) so that the tf.transpose invocations in the sums become obsolete, as sketched below. This will speed things up even more.
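A sketch of what the sums computation could look like in that case (the class name is made up for illustration; it assumes the rest of the graph adopts the same layout):

class ScopeMajorSumsLayer:
    def log_value(self, incoming, log_weights):
        # incoming already has shape [D, Sc, B, N0] and log_weights has shape
        # [D, Sc, N0, N1], so no transposes are needed around the batched matmul
        return logmatmul(incoming, log_weights)  # [D, Sc, B, N1]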

Decision
