Jos van de Wolfshaar edited this page Feb 23, 2018 · 3 revisions

LIP 6 - Sums node with varying value input sizes

LIP: 6
Title: Sums node with varying value input sizes
Author: J. van de Wolfshaar
Status: Draft
Type: Standard
Discussion: Issue #36
PR:
Created: Feb 7, 2018

Introduction

At this point, the Sums class in the multi_nodes_dense_generator branch of nash911/libspn can only model multiple sums of the same dimensionality, that is, each sum considers the same number of input columns. This LIP proposes a way to lift that constraint. The new SumsLayer class could then model all sums at a single depth of a tree-structured SPN. By concentrating many computations in fewer TensorFlow Ops, we are likely to improve efficiency and we reduce the graph size significantly.

Technical Background

This LIP requires us to alter many of the existing methods in Sums. For simplicity, we'll consider the _compute_value method, which in pseudocode does something like:

function _compute_value(self, weight_tensor, ivs_tensor, *value_tensors)
    # Get indices per value input
    value_tensor_indices = get_indices_for_all_value_inputs()

    # For each value input, gather the set of columns to sum
    values_subsets = map_gather(value_tensors, value_tensor_indices)

    # Concatenate, reshape to [batch, num_sums, sum_size] and reduce
    values = concatenate_columns(values_subsets)
    values = reshape(values, sum_reduce_shape)
    return sum_over_last_dim(values * weight_tensor)

So this takes subsets of columns, concatenates them, reshapes the result, and sums over the last dimension after element-wise multiplication with the weights.

The reshaping operation directly relies on the assumption that all sums are of the same dimensionality, so we will propose a different approach that would work for sums with different dimension sizes. Other class methods will have other pieces of code that rely on this assumption, but we'll limit our discussion to this function for brevity.
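To see why the reshape step assumes equally sized sums, consider a NumPy sketch of the computation above (NumPy stands in for TensorFlow here; the shapes and names are illustrative, not the library's actual code):

```python
import numpy as np

batch_size, num_sums, sum_size = 2, 3, 4

# Concatenated columns of all sums: [batch, num_sums * sum_size]
values = np.random.rand(batch_size, num_sums * sum_size)
weights = np.random.rand(num_sums, sum_size)

# Reshape to [batch, num_sums, sum_size]: this is only valid because
# every sum has exactly sum_size inputs.
values_3d = values.reshape(batch_size, num_sums, sum_size)
sums_out = np.sum(values_3d * weights, axis=-1)  # [batch, num_sums]
```

With heterogeneous sum sizes there is no single sum_size, so the reshape to a dense 3D tensor is no longer possible without padding.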

Proposal

We want the class to be able to model multiple sums with heterogeneous sizes. As noted before, this requires altering multiple methods, but we'll focus once again on the example from the previous section. It could be replaced by the following:

function _compute_value(self, masked_weight_tensor, ivs_tensor, *value_tensors)
    # Get indices per value input
    value_tensor_indices = get_indices_for_all_value_inputs()
    # Concatenate the indices AND value input tensors
    value_tensor_indices_concat = concatenate_indices(value_tensor_indices)
    values_concat = concatenate_columns(value_tensors)

    # Gather the set of columns of each sum, padding with zeros where needed
    values = gather_cols_3d_with_padding(
        values_concat, value_tensor_indices_concat)
    return sum_over_last_dim(values * masked_weight_tensor)

The gather_cols_3d_with_padding functionality already exists as an Op in multi_nodes_dense_generator and is called gather_cols_3d. This Op takes an input tensor and a list of indices. The list of indices may be nested, meaning that we could specify:

gathered_tensor = gather_cols_3d(value_tensor, [[0, 1, 2], [0, 2], [3, 4, 5], [1]])

This Op will return a tensor with dimensions [batch_size, num_sums, max_num_cols], or in this case [batch_size, 4, 3], since we have 4 lists in the nested list and a maximum inner list length of 3 (either [0, 1, 2] or [3, 4, 5]).

Note that gathered_tensor[:, 1, :] will contain the zeroth and second columns of value_tensor and a trailing zero-column, so the padding is done implicitly.
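The padding behavior described above can be emulated in NumPy as follows (this is a sketch of the semantics, not the actual Op implementation):

```python
import numpy as np

def gather_cols_3d(tensor, index_lists):
    """Gather column subsets per sum, zero-padding each to the longest list.

    Returns an array of shape [batch_size, len(index_lists), max_list_len].
    """
    batch_size = tensor.shape[0]
    max_len = max(len(idx) for idx in index_lists)
    out = np.zeros((batch_size, len(index_lists), max_len), dtype=tensor.dtype)
    for i, idx in enumerate(index_lists):
        # Shorter index lists leave trailing zero columns in place.
        out[:, i, :len(idx)] = tensor[:, idx]
    return out

value_tensor = np.arange(12, dtype=np.float64).reshape(2, 6)
gathered = gather_cols_3d(value_tensor, [[0, 1, 2], [0, 2], [3, 4, 5], [1]])
# gathered.shape == (2, 4, 3); gathered[:, 1, :] holds columns 0 and 2
# followed by one implicit zero column.
```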

The _compute_log_value method would require a similar implementation, where the element-wise multiplication is replaced by element-wise addition.
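In log space, the weighted reduction of each sum becomes a log-sum-exp over the last dimension; a minimal NumPy sketch (the exact treatment of padded entries in LibSPN is not specified by this LIP, so these shapes and values are purely illustrative):

```python
import numpy as np

# Two sums of size 2: log-values and log-weights per sum.
log_values = np.log(np.array([[0.2, 0.3], [0.5, 0.1]]))
log_weights = np.log(np.array([[0.6, 0.4], [0.7, 0.3]]))

# Element-wise addition replaces element-wise multiplication, followed
# by a numerically stable log-sum-exp over the last dimension.
summed = log_values + log_weights
m = np.max(summed, axis=-1, keepdims=True)
log_out = np.squeeze(m, -1) + np.log(np.sum(np.exp(summed - m), axis=-1))
# exp(log_out) equals the linear-space weighted sums.
```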

The weight_tensor will have zero values for the 'padded' parts of the modeled sums. Say we have a SumsLayer that models three sums: one of size 5, one of size 3 and one of size 2. We would need 10 weights in total, but to be able to perform the component-wise multiplication after the gather_cols_3d operation, we construct a weight matrix with shape [num_sums, max_sum_size], which in this case amounts to [3, 5].
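Constructing such a padded weight matrix could look as follows (a NumPy sketch of the idea; variable names are hypothetical):

```python
import numpy as np

sum_sizes = [5, 3, 2]                 # sizes of the three modeled sums
flat_weights = np.arange(1.0, 11.0)   # the 10 actual weights
max_size = max(sum_sizes)

# Shape [num_sums, max_sum_size] == [3, 5]; padded slots stay zero so
# they contribute nothing to the element-wise product with the
# zero-padded gathered values.
weight_matrix = np.zeros((len(sum_sizes), max_size))
offset = 0
for row, size in enumerate(sum_sizes):
    weight_matrix[row, :size] = flat_weights[offset:offset + size]
    offset += size
```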

In short, we propose the following:

  • Create a SumsLayer class that can model sums of different dimension sizes
    • Revise methods relying on assumptions of equally sized sums, including:
      • _compute_value
      • _compute_log_value
      • _compute_mpe_value
      • _compute_log_mpe_value
      • _compute_value_common
      • _compute_mpe_path
      • _compute_log_mpe_path
      • _compute_mpe_path_common

Examples

After this, we should be able to replace the following:

iv_x = spn.IVs(num_vars=3, num_vals=2, name="iv_x")
sums_1 = spn.ParSums((iv_x, [0,1]), num_sums=2, name="sums_1")
sums_2 = spn.ParSums((iv_x, [2,3,4,5]), num_sums=1, name="sums_2")

with:

iv_x = spn.IVs(num_vars=3, num_vals=2, name="iv_x")
sums_1_2 = spn.SumsLayer([(iv_x, [0,1]), (iv_x, [0,1]), (iv_x, [2,3,4,5])],
    num_sums_or_sizes=[2, 2, 4], name="sums_1_2")

The parameter num_sums_or_sizes designates either the total number of sums (an int) or the size of each modeled sum (a list). If an int is passed to the constructor, the sums are equally sized and divide the tensor holding the concatenation of all inputs into equal parts. If a list is provided, the concatenated inputs are divided in the same order, but into sums of potentially different sizes.
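The interpretation of num_sums_or_sizes could be sketched as follows (the helper name and validation logic are hypothetical, not part of the proposed API):

```python
def resolve_sum_sizes(num_sums_or_sizes, total_input_size):
    """Turn the constructor argument into a list of per-sum sizes."""
    if isinstance(num_sums_or_sizes, int):
        # An int means equally sized sums dividing the concatenated input.
        if total_input_size % num_sums_or_sizes != 0:
            raise ValueError("Input size not divisible by number of sums")
        size = total_input_size // num_sums_or_sizes
        return [size] * num_sums_or_sizes
    # A list gives the size of each sum explicitly.
    if sum(num_sums_or_sizes) != total_input_size:
        raise ValueError("Sum sizes do not add up to the input size")
    return list(num_sums_or_sizes)

# E.g. 8 concatenated columns split into 2 equal sums of 4,
# or explicitly into sums of sizes [2, 2, 4].
```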

Extensions

There are a few extensions:

  • Automatically convert multiple ParSums nodes into a single Sums
  • Revise the Products class to handle products of varying dimensionality
  • Devise another input format that simplifies the first argument of the SumsLayer constructor

Performance comparison

In the test below, we model 100 sum nodes. The sums' inputs are divided in 10 groups, each having its own IVs input. We can model this using 10 ParSums where each ParSum connects to a single IVs and computes 10 sums. We compare this against a single SumsLayer that models the very same structure but combines the full component-wise application of latent IVs, the component-wise weighting and the reduction in non-log or log space to compute the outcome of each sum. The test measures the wall time for 250 runs of MPE path calculation.

It can be seen that the sums_layer implementations yield much smaller graph sizes and are significantly faster for all configurations. We consider three different versions of sums_layer:

  • sums_layer computes MPE counts and scatters the counts to each input, even though some input nodes might not be unique, so summing of the counts happens later in the graph for these non-unique nodes.
  • sums_layer_v2 computes the sum of the counts of each unique input before passing it to _scatter_to_input_tensors, yielding a considerably smaller graph size.
  • sums_layer_v3 computes sums of counts as sums_layer_v2 does, but also reuses the component-wise weighted tensor computed during the forward pass in the backward pass (see also LIP 7).
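The count-summing step of sums_layer_v2 can be illustrated with a small NumPy sketch: counts belonging to duplicated input columns are accumulated once before scattering, so only one scatter per unique column remains in the graph (the column ids and counts below are made-up example data):

```python
import numpy as np

# Flattened MPE counts for the concatenated columns; column ids identify
# which unique input column each count belongs to. Columns 0 and 1 are
# used by two different sums, hence they appear twice.
column_ids = np.array([0, 1, 0, 1, 2, 3, 4, 5])
counts = np.array([1., 0., 0., 2., 3., 0., 1., 0.])

# Accumulate counts of duplicated columns before scattering them back to
# the inputs (np.add.at performs unbuffered in-place accumulation).
summed = np.zeros(column_ids.max() + 1)
np.add.at(summed, column_ids, counts)
# summed == [1., 2., 3., 0., 1., 0.]
```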
-----------------------
InferenceType: MPE
-----------------------
CPU          op  size indices   ivs  setup_time weights_init_time  first_run_time  rest_run_time    correct
GPU          op  size indices   ivs  setup_time weights_init_time  first_run_time  rest_run_time    correct
       par_sums   342     Yes    No      150.62             16.50           34.53           5.42       True
     sums_layer   359     Yes    No      141.29             13.50           10.24           4.69       True
  sums_layer_v2    57     Yes    No       61.04             49.43           18.15           4.64       True
  sums_layer_v3    52     Yes    No       56.04             49.98           17.33           4.69       True

-----------------------
InferenceType: MPE
-----------------------
CPU          op  size indices   ivs  setup_time weights_init_time  first_run_time  rest_run_time    correct
GPU          op  size indices   ivs  setup_time weights_init_time  first_run_time  rest_run_time    correct
       par_sums   532     Yes   Yes      221.21             20.78           38.81           6.94       True
     sums_layer   378     Yes   Yes      227.03             13.81           12.20           5.29       True
  sums_layer_v2    76     Yes   Yes       67.63             49.98           20.58           5.06       True
  sums_layer_v3    68     Yes   Yes       63.75             50.69           19.98           5.52       True

-----------------------
InferenceType: MPE-LOG
-----------------------
CPU          op  size indices   ivs  setup_time weights_init_time  first_run_time  rest_run_time    correct
GPU          op  size indices   ivs  setup_time weights_init_time  first_run_time  rest_run_time    correct
       par_sums   363     Yes    No      160.13             17.09           23.44           5.67       True
     sums_layer   362     Yes    No      142.26             13.25            9.78           4.58       True
  sums_layer_v2    60     Yes    No       61.77             49.30           17.80           4.49       True
  sums_layer_v3    55     Yes    No       56.47             49.26           18.15           4.74       True

-----------------------
InferenceType: MPE-LOG
-----------------------
CPU          op  size indices   ivs  setup_time weights_init_time  first_run_time  rest_run_time    correct
GPU          op  size indices   ivs  setup_time weights_init_time  first_run_time  rest_run_time    correct
       par_sums   563     Yes   Yes      317.90             21.56           41.37           7.35       True
     sums_layer   382     Yes   Yes      149.24             13.64           12.20           5.20       True
  sums_layer_v2    80     Yes   Yes       70.97             55.02           34.58           5.22       True
  sums_layer_v3    72     Yes   Yes       64.55             50.17           20.28           5.22       True

Decision
