Hints for Performance Testing
-
Always feed all inputs. Constant inputs let TensorFlow pre-compute parts of the graph (most likely through constant folding), which can result in ops being run on the CPU even if they are supposed to run on a GPU.
-
Use tf.identity(tf.placeholder()) to provide inputs. This way, we can expect that the input is copied to the GPU only once, to the identity op, and then served to other ops.
-
Instead of stacking ops, make them dependent on each other using control flow ops. This simulates op stacking, while permitting the use of arbitrary inputs. For instance:
    ops = op_fun(params_op, indices)
    for _ in range(self.num_ops - 1):
        ops = op_fun(tf.tuple([params_op, ops])[0], indices)
Here, the tuple adds a dependency on the output of the previous op, but still feeds the proper input to the next op.
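As a rough, self-contained version of the snippet above, the sketch below feeds both inputs through identity-wrapped placeholders and chains the ops with tf.tuple. It assumes the TF 1.x graph API (reached through tensorflow.compat.v1 on newer installs). The choice of tf.gather as op_fun, the shapes, and num_ops = 4 are illustrative assumptions, not taken from the original benchmark.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # placeholders and sessions need graph mode

num_ops = 4          # assumed chain length (stands in for self.num_ops)
op_fun = tf.gather   # assumed op under test

graph = tf.Graph()
with graph.as_default():
    # Feed every input; the identity op is expected to hold the single
    # copy that all downstream ops read from.
    params_ph = tf.placeholder(tf.float32, shape=[None, 8])
    indices_ph = tf.placeholder(tf.int32, shape=[None])
    params_op = tf.identity(params_ph)
    indices = tf.identity(indices_ph)

    ops = op_fun(params_op, indices)
    for _ in range(num_ops - 1):
        # tf.tuple returns params_op gated on the previous op's output,
        # so each op runs only after the one before it has finished.
        ops = op_fun(tf.tuple([params_op, ops])[0], indices)

params_val = np.arange(800, dtype=np.float32).reshape(100, 8)
indices_val = np.array([0, 5, 7], dtype=np.int32)

with tf.Session(graph=graph) as sess:
    result = sess.run(ops, feed_dict={params_ph: params_val,
                                      indices_ph: indices_val})
```

Every op in the chain receives the same fed params, so the final result equals a single gather; only the execution order is constrained.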
-
Always test performance on inputs of different dtypes. The performance may vary greatly between dtypes.
-
Test on several machines, since performance varies between GPU/CPU configurations.
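One way to organize a dtype sweep is a small timing harness like the sketch below. The names time_by_dtype and make_run are hypothetical, and the array-based summation is only a stand-in workload so the example runs with the standard library alone. In a real test, make_run would build the graph for the given dtype and return a function that calls session.run with a feed.

```python
import time
from array import array


def time_by_dtype(make_run, dtypes, warmup=2, repeats=10):
    """Return {dtype: best wall-clock seconds per call} for each dtype."""
    results = {}
    for dtype in dtypes:
        run = make_run(dtype)       # build once, outside the timed region
        for _ in range(warmup):     # warm-up calls are not timed
            run()
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            best = min(best, time.perf_counter() - start)
        results[dtype] = best
    return results


# Stand-in workload (assumption): sum a typed array of 10000 elements.
# Replace with a function that runs the real op for the given dtype.
def make_sum(typecode):
    data = array(typecode, range(10000))
    return lambda: sum(data)


timings = time_by_dtype(make_sum, ["i", "f", "d"])
for typecode, seconds in timings.items():
    print(typecode, "%.6f s" % seconds)
```

Reporting the best of several repeats is a design choice that reduces noise from other processes. Running the same script unchanged on each machine keeps the numbers comparable across GPU/CPU configurations.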