I tried to run the NCP example that is part of the documentation in noise.py (with minor additions to get a runnable program):
import edward2 as ed
import tensorflow as tf

batch_size, dataset_size = 128, 1000

# some random data
features = tf.random.normal((dataset_size, 25))
labels = tf.random.normal((dataset_size, 1))

inputs = tf.keras.layers.Input(shape=(25,))
x = ed.layers.NCPNormalPerturb()(inputs)  # double input batch
x = tf.keras.layers.Dense(64, activation='relu')(x)
x = tf.keras.layers.Dense(64, activation='relu')(x)
means = ed.layers.DenseVariationalDropout(1, activation=None)(x)  # get mean
means = ed.layers.NCPNormalOutput(labels)(means)  # halve input batch
stddevs = tf.keras.layers.Dense(1, activation='softplus')(x[:batch_size])
outputs = tf.keras.layers.Lambda(lambda x: ed.Normal(x[0], x[1]))([means, stddevs])
model = tf.keras.Model(inputs=inputs, outputs=outputs)
optimizer = tf.optimizers.Adam(learning_rate=1e-3)

# Run training loop.
num_steps = 1000
for _ in range(num_steps):
  with tf.GradientTape() as tape:
    predictions = model(features)
    loss = -tf.reduce_mean(predictions.distribution.log_prob(labels))
    loss += model.losses[0] / dataset_size  # KL regularizer for output layer
    loss += model.losses[-1]
  trainable_vars = model.trainable_variables
  gradients = tape.gradient(loss, trainable_vars)
  optimizer.apply_gradients(zip(gradients, trainable_vars))
and ran into:
ValueError: Arguments `loc` and `scale` must have compatible shapes; loc.shape=(1000, 1), scale.shape=(128, 1).
That's clear because the training loop runs full-batch updates and stddevs = tf.keras.layers.Dense(1, activation='softplus')(x[:batch_size]) only uses the first batch_size elements. But that's not related to my main question. Changing the training loop to use mini-batches
...
ds = tf.data.Dataset.from_tensor_slices((features, labels)).batch(batch_size)

# Run training loop.
num_steps = 1000
for i in range(num_steps):
  print(i)
  for features_batch, labels_batch in ds:
    with tf.GradientTape() as tape:
      predictions = model(features_batch)
      loss = -tf.reduce_mean(predictions.distribution.log_prob(labels_batch))
...
fixes the above problem but introduces a new problem in NCPNormalOutput:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1000,1] vs. [128,1] [Op:SquaredDifference]
i.e. the shape of the labels passed as a constructor argument to NCPNormalOutput is incompatible with the shape of the mini-batch. Does NCPNormalOutput (when centering at the labels) not support mini-batch updates at the moment?
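For completeness, here is a sketch of the full-batch variant that I would expect to be shape-consistent: slicing with dataset_size instead of batch_size, so that the stddev head, the labels baked into NCPNormalOutput at construction time, and the log_prob targets all line up. It reuses inputs, x, means, features, labels, optimizer and num_steps from the snippet above, and it sidesteps rather than answers the mini-batch question:

# Workaround sketch: keep full-batch training so every tensor has shape (dataset_size, 1).
stddevs = tf.keras.layers.Dense(1, activation='softplus')(x[:dataset_size])
outputs = tf.keras.layers.Lambda(lambda x: ed.Normal(x[0], x[1]))([means, stddevs])
model = tf.keras.Model(inputs=inputs, outputs=outputs)

for _ in range(num_steps):
  with tf.GradientTape() as tape:
    predictions = model(features)  # the full dataset in every step
    loss = -tf.reduce_mean(predictions.distribution.log_prob(labels))
    loss += model.losses[0] / dataset_size  # KL regularizer for output layer
    loss += model.losses[-1]
  gradients = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))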
Semi-related: the layers NCPNormalPerturb and NCPNormalOutput are only needed during training; at test/prediction time they seem to have no influence on the result. So why is the NCP-related functionality designed as layers at all? Shouldn't this be a concern of the loss function only? Edit: OK, I see that NCPNormalOutput creates a distribution from its input and samples from that distribution, so it does influence the result. Nevertheless, that behavior doesn't seem to be related to NCPs, so my previous question remains.
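To make the question concrete, this is roughly what I have in mind with "a concern of the loss function only": an ordinary model, with the input perturbation and the label-centered output prior applied as an extra penalty inside the training step. This is only a rough caricature of the NCP idea, with a made-up helper (ncp_style_loss) and arbitrary noise/prior scales, not the regularizer actually implemented by the library:

import edward2 as ed
import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(25,))
h = tf.keras.layers.Dense(64, activation='relu')(inputs)
h = tf.keras.layers.Dense(64, activation='relu')(h)
means = tf.keras.layers.Dense(1)(h)
stddevs = tf.keras.layers.Dense(1, activation='softplus')(h)
outputs = tf.keras.layers.Lambda(lambda t: ed.Normal(t[0], t[1]))([means, stddevs])
model = tf.keras.Model(inputs=inputs, outputs=outputs)

def ncp_style_loss(features_batch, labels_batch, input_stddev=0.1, prior_stddev=1.0):
  # Negative log-likelihood on the clean mini-batch.
  predictions = model(features_batch)
  nll = -tf.reduce_mean(predictions.distribution.log_prob(labels_batch))
  # Penalty: predictions at noise-perturbed inputs should stay close to a
  # prior centered at the labels (rough stand-in for the NCP output prior).
  perturbed = features_batch + tf.random.normal(tf.shape(features_batch), stddev=input_stddev)
  perturbed_predictions = model(perturbed)
  prior = ed.Normal(labels_batch, prior_stddev)
  penalty = tf.reduce_mean(prior.distribution.kl_divergence(perturbed_predictions.distribution))
  return nll + penalty

# Usage inside the mini-batch loop above:
#   loss = ncp_style_loss(features_batch, labels_batch)

With something like this, the labels only ever enter through the loss for the current batch, which is why I would have expected mini-batch training to be unproblematic.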
Environment
I installed Edward2 with tf-nightly as a dependency; Python version is 3.7.9.