LIP0013
LIP | 13 |
---|---|
Title | Gathering columns into a 3D tensor (with padding if necessary) |
Author | A. Ranganath |
Status | Draft |
Type | Standard |
Discussion | Issue #44 |
PR | #45 |
Created | March 6, 2018 |
A custom tf-op for gathering columns from a 2D tensor into a 3D tensor. If the number of columns to be gathered differs among the 2D slices of the resulting 3D tensor, the shorter slices are padded with a padding value.
Multi-op nodes like Sums, Products and PermProds operate on 3D tensors, performing reduction operations during value and path calculations. These 3D tensors are first gathered as a single wide 2D tensor and then reshaped to 3D, where the first dimension corresponds to the batch size, the second to the number of ops modeled, and the third to the input size.
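The gather-then-reshape pattern described above can be sketched as follows. This is a minimal illustration, assuming plain nested Python lists in place of TensorFlow tensors; the helper names `gather_cols` and `reshape_to_3d` are hypothetical here and do not reproduce any LibSPN code.

```python
# Sketch of the gather-then-reshape pattern with plain nested lists
# standing in for tensors. Helper names are hypothetical.

def gather_cols(params, cols):
    """Gather the listed column indices from every row of a 2D list."""
    return [[row[c] for c in cols] for row in params]

def reshape_to_3d(wide, num_ops, input_size):
    """Split each wide row into num_ops consecutive chunks of input_size."""
    assert all(len(row) == num_ops * input_size for row in wide)
    return [[row[o * input_size:(o + 1) * input_size] for o in range(num_ops)]
            for row in wide]

# Two ops, each reading three inputs (homogeneous input size).
indices = [[0, 2, 4], [1, 3, 5]]
params = [[10, 11, 12, 13, 14, 15],
          [20, 21, 22, 23, 24, 25]]

flat = [c for op_cols in indices for c in op_cols]          # [0, 2, 4, 1, 3, 5]
wide = gather_cols(params, flat)                             # batch x 6
out = reshape_to_3d(wide, len(indices), len(indices[0]))     # batch x 2 x 3
sums = [[sum(op) for op in batch_row] for batch_row in out]  # Sums-style reduction
print(out)
print(sums)
```

The reshape only works because every op reads the same number of columns, which is exactly the assumption discussed next.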
An assumption in these nodes is that all ops modeled within have the same (homogeneous) input size. With newer multi-op nodes like `SumsLayer` and `ProductsLayer`, this assumption is dropped, as each op modeled can have a different (heterogeneous) input size.
To model ops with heterogeneous input sizes, it would be necessary to insert column vectors of zeros or ones (for sums or products, respectively) into the wide 2D tensor before reshaping it into a 3D tensor.
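A rough sketch of this workaround, assuming nested Python lists instead of tensors: padding entries are inserted into the wide row after every op whose index list is shorter than the longest one, so that the usual reshape still applies. The function name `gather_with_inserted_padding` is hypothetical. In an actual TensorFlow graph the inserted columns would presumably require additional constant and concatenation ops; this sketch only models the resulting values.

```python
def gather_with_inserted_padding(params, indices, pad_value):
    """Workaround sketch: build the wide 2D row op by op, inserting pad_value
    entries after every op whose index list is shorter than the longest one,
    so that a plain reshape to (batch, num_ops, max_len) still works."""
    max_len = max(len(ind) for ind in indices)
    out = []
    for row in params:
        wide_row = []
        for ind in indices:
            wide_row.extend(row[c] for c in ind)                 # real inputs
            wide_row.extend([pad_value] * (max_len - len(ind)))  # inserted padding
        # Reshape the wide row into len(indices) chunks of max_len.
        out.append([wide_row[o * max_len:(o + 1) * max_len]
                    for o in range(len(indices))])
    return out

params = [[1, 2, 3], [4, 5, 6]]
indices = [[0, 1, 2], [2]]  # heterogeneous: 3 inputs vs 1 input
padded = gather_with_inserted_padding(params, indices, pad_value=0)  # zeros for sums
print(padded)
```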
Instead, a better approach would be to develop a custom tf-op that gathers columns from a 2D tensor directly into a 3D tensor, padding slices with fewer columns to gather with a padding value (e.g., 0, 1 or -inf) that is set as an attribute of the op during graph construction.
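The example pad values work because each one is the identity element of the reduction applied to the padded slice, so padding leaves the result unchanged. The sketch below checks this for sum (0), product (1), and for max and a log-space sum (-inf); pairing -inf with max and log-sum-exp is our assumption about where that value would be used, and the helper names are hypothetical.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    if m == -math.inf:  # all inputs are -inf
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Each reduction paired with the pad value that should leave its result unchanged.
REDUCTIONS = {
    "sum": (sum, 0.0),
    "product": (math.prod, 1.0),
    "max": (max, -math.inf),
    "logsumexp": (logsumexp, -math.inf),
}

def padding_is_neutral(values, num_pad):
    """Return {name: bool}: True if appending num_pad pad values kept the result."""
    return {name: fn(values + [pad] * num_pad) == fn(values)
            for name, (fn, pad) in REDUCTIONS.items()}

print(padding_is_neutral([2.0, 3.0], num_pad=2))
```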
Create a custom tf-op, with both CPU and GPU OpKernels, for gathering columns from a 2D tensor into a 3D tensor. The `params` parameter would accept either a 1D or a 2D tensor, and `indices` would be a nested list of indices, wherein each inner list is the set of column indices to be gathered for one 2D slice of the resulting 3D tensor.
The lengths of the inner lists of the `indices` parameter can be either homogeneous or heterogeneous. If homogeneous, the OpKernels simply gather values from the `params` tensor into a tensor of shape `(batch x len(indices) x len(indices[0]))`. If heterogeneous, the OpKernels first initialize the output tensor with the `pad-elem` value (an attribute of the op), and then gather values from the `params` tensor into a tensor of shape `(batch x len(indices) x max(len(ind) for ind in indices))`.
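The semantics above can be sketched as a pure-Python reference, assuming a 2D `params` only (the 1D case is not modeled here). The function name `gather_cols_3d_reference` is hypothetical, and `pad_elem` stands in for the op's `pad-elem` attribute. For simplicity the sketch always pre-fills the output, whereas the proposed kernels would only do so in the heterogeneous case; in the homogeneous case every slot is overwritten anyway, so the result is the same.

```python
def gather_cols_3d_reference(params, indices, pad_elem=0):
    """Pure-Python sketch of the proposed semantics (2D params only).

    params:   2D list, shape (batch, num_cols)
    indices:  nested list; indices[i] holds the columns for slice i
    pad_elem: stands in for the op's pad-elem attribute
    Returns a 3D list of shape (batch, len(indices), max_len).
    """
    max_len = max(len(ind) for ind in indices)
    # Pre-fill with pad_elem, as described for the heterogeneous case;
    # in the homogeneous case every slot is overwritten below.
    out = [[[pad_elem] * max_len for _ in indices] for _ in params]
    for b, row in enumerate(params):
        for i, ind in enumerate(indices):
            for j, col in enumerate(ind):
                out[b][i][j] = row[col]
    return out

params = [[1, 2, 3, 4],
          [5, 6, 7, 8]]

# Homogeneous: shape (2 x 2 x 2) = (batch x len(indices) x len(indices[0]))
homog = gather_cols_3d_reference(params, [[0, 1], [2, 3]])
# Heterogeneous: shape (2 x 2 x 3); the shorter slice is padded with -1
heter = gather_cols_3d_reference(params, [[0, 1, 2], [3]], pad_elem=-1)
print(homog)
print(heter)
```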
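The following performance metrics compare three alternatives: (a) the custom `gather_cols` op, (b) the proposed `gather_cols_3d` op, and (c) TF's `gather` op, listed in the tables as `custom_gather`, `custom_gather_3d` and `tf_gather`, respectively. Test cases include 'Non-padded' (i.e., homogeneous column sizes) and 'Padded' (heterogeneous column sizes) cases. The `size` column is the graph size.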
Non-padded

| device | op | dt | size | setup_time | first_run_time | rest_run_time | correct |
|---|---|---|---|---|---|---|---|
| CPU | custom_gather | int32 | 69 | 89.60 | 79.66 | 70.00 | True |
| CPU | custom_gather | int64 | 69 | 153.92 | 75.09 | 72.80 | True |
| CPU | custom_gather_3d | int32 | 49 | 14.86 | 21.49 | 19.91 | True |
| CPU | custom_gather_3d | int64 | 49 | 14.87 | 20.13 | 41.93 | True |
| CPU | tf_gather | int32 | 79 | 96.24 | 73.18 | 59.88 | True |
| CPU | tf_gather | int64 | 79 | 167.78 | 72.18 | 61.73 | True |
| GPU | custom_gather | int32 | 69 | 161.59 | 284.48 | 1.50 | True |
| GPU | custom_gather | int64 | 69 | 101.24 | 8.45 | 1.31 | True |
| GPU | custom_gather_3d | int32 | 49 | 34.92 | 10.44 | 1.35 | True |
| GPU | custom_gather_3d | int64 | 49 | 16.96 | 6.68 | 1.34 | True |
| GPU | tf_gather | int32 | 79 | 139.51 | 8.92 | 1.37 | True |
| GPU | tf_gather | int64 | 79 | 103.47 | 8.14 | 1.31 | True |

Padded

| device | op | dt | size | setup_time | first_run_time | rest_run_time | correct |
|---|---|---|---|---|---|---|---|
| CPU | custom_gather | int32 | 2019 | 738.01 | 172.90 | 12.31 | True |
| CPU | custom_gather | int64 | 2019 | 708.18 | 163.64 | 12.52 | True |
| CPU | custom_gather_3d | int32 | 49 | 22.29 | 41.60 | 34.66 | True |
| CPU | custom_gather_3d | int64 | 49 | 29.13 | 41.32 | 34.96 | True |
| CPU | tf_gather | int32 | 2519 | 1132.94 | 198.98 | 12.37 | True |
| CPU | tf_gather | int64 | 2519 | 1013.32 | 204.24 | 12.32 | True |
| GPU | custom_gather | int32 | 2019 | 778.28 | 161.96 | 15.83 | True |
| GPU | custom_gather | int64 | 2019 | 922.06 | 179.78 | 14.24 | True |
| GPU | custom_gather_3d | int32 | 49 | 37.36 | 9.33 | 1.59 | True |
| GPU | custom_gather_3d | int64 | 49 | 23.63 | 7.43 | 1.65 | True |
| GPU | tf_gather | int32 | 2519 | 1016.83 | 205.57 | 14.25 | True |
| GPU | tf_gather | int64 | 2519 | 1034.35 | 213.37 | 14.26 | True |
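For the 'Non-padded' case, the proposed op produces a smaller graph (size 49 versus 69 and 79) and performs comparably to the other two alternatives on GPU and faster on CPU in these measurements. For the 'Padded' case, the improvement is significant: graph size drops from 2019 and 2519 to 49, setup and first-run times fall sharply, and on GPU `rest_run_time` drops from about 14 to 16 down to about 1.6. On CPU, however, the `rest_run_time` of the proposed op (about 35) is higher than that of the alternatives (about 12).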
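As a quick check on the 'Padded' comparison, the sketch below computes how many times larger each alternative's value is than that of `custom_gather_3d`, using the int32 rows of the tables above. The ratios are unitless, so they do not depend on the unit of the timing columns. Values greater than 1 favor the proposed op; the CPU `rest_run_time` ratio below 1 reflects the slower steady-state CPU run.

```python
# Values copied from the 'Padded' int32 rows of the tables above.
sizes = {"custom_gather": 2019, "custom_gather_3d": 49, "tf_gather": 2519}
gpu_rest = {"custom_gather": 15.83, "custom_gather_3d": 1.59, "tf_gather": 14.25}
cpu_rest = {"custom_gather": 12.31, "custom_gather_3d": 34.66, "tf_gather": 12.37}

def ratio_vs_3d(metric):
    """How many times larger each alternative's value is than gather_cols_3d's."""
    base = metric["custom_gather_3d"]
    return {op: round(v / base, 1)
            for op, v in metric.items() if op != "custom_gather_3d"}

print("graph size:", ratio_vs_3d(sizes))
print("GPU rest_run_time:", ratio_vs_3d(gpu_rest))
print("CPU rest_run_time:", ratio_vs_3d(cpu_rest))
```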