-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add gradient calculation for the covariance between points in GPyModelWrapper #347
base: main
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #347 +/- ##
==========================================
+ Coverage 89.43% 89.46% +0.02%
==========================================
Files 123 123
Lines 4051 4061 +10
Branches 462 461 -1
==========================================
+ Hits 3623 3633 +10
Misses 332 332
Partials 96 96
Continue to review full report at Codecov.
|
Apologies for taking time on this one, it's a bit hard for me to make complete sense of this change. I promise to get through it! |
No worries, let me know if there is anything I can do to explain and document it better in the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two major concerns:
- This PR introduces a new method that isn't declared in any model interfaces. That's backwards from Emukit's point of view. Are you planning to use this method anywhere? Perhaps create a new acquisition? If so, let's define an interface, or augment an existing one if that makes more sense.
- Both changes carry some pretty heavy matrix algebra, which is really hard to follow. Can you please add some comments throughout, to clarify what's going on? Thanks!
There are also some minor comments, which can be found below.
dkx1X_dx = np.zeros((d, q1*q1, n)) | ||
dkx1x2_dx = np.zeros((d, q1*q1, q2)) | ||
for i in range(d): | ||
dkx1X_dx[i, ::q1 + 1, :] = kern.dK_dX(x1, x_train, i) | ||
dkx1x2_dx[i, ::q1 + 1, :] = kern.dK_dX(x1, x2, i) | ||
dkx1X_dx = dkx1X_dx.reshape((d, q1, q1, n)) | ||
dkx1x2_dx = dkx1x2_dx.reshape((d, q1, q1, q2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming here is hard to get through: dkx1X
? Can it be improved?
:return: gradient of the covariance matrix of shape (q1, q2) between outputs at X1 and X2 | ||
(return shape is (q1, q2, q1, d)). | ||
""" | ||
dcov_dx1 = dCov(X1, X2, self.model.X, self.model.kern, self.model.posterior.woodbury_inv) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three inputs come directly from the model. I wander if it makes sense to just pass model in. Or not extract dCov
as a stateless method at all. COnisder this: all this method is doing is just calling dCov, literally nothing else. And so far dCov is only used here. It seems pretty specific to me to not get reused. Is that impression correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's fair to say, I was primarily just trying to mirror the implementation of dSigma
that was already in the module. Happy to not separate it out.
To answer the two points:
This is indeed intended for an implementation of a new acquisition. It comes in handy whenever optimizing any of many batch acquisition functions sequentially – i.e. by selecting optimising batch acquisition with respect to one point in the batch at a time. Currently,
Fair point, I'll try and make it easier to follow. |
…kit into gradients-for-covariance-gpy
Codecov Report
@@ Coverage Diff @@
## main #347 +/- ##
==========================================
- Coverage 89.01% 88.99% -0.03%
==========================================
Files 128 128
Lines 4244 4254 +10
Branches 609 609
==========================================
+ Hits 3778 3786 +8
- Misses 369 371 +2
Partials 97 97
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are good to go here, just one outdated docstring to update
:param x1: Prediction inputs of shape (q1, d) | ||
:param x2: Prediction inputs of shape (q2, d) | ||
:param x_train: Training inputs of shape (n_train, d) | ||
:param kern: Covariance of the GP model | ||
:param w_inv: Woodbury inverse of the posterior fit of the GP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is a bit out of date now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed redundant flags
:return: nd array of shape (q1, q2, d) representing the gradient of the posterior covariance between x1 and x2 | ||
with respect to x1. res[i, j, k] is the gradient of Cov(y1[i], y2[j]) with respect to x1[i, k] | ||
""" | ||
# Get the relevent shapes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Get the relevent shapes | |
# Get the relevant shapes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
dkxX_dx = np.zeros((d, q*q, n)) | ||
# Tensor for the gradients of full covariance matrix at points x_predict (of shape (q, q) with respect to | ||
# x_predict (of shape (q, d)) | ||
dkxx_dx = np.zeros((d, q*q, q)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming here is a bit unfortunate. I know it was already there, but do you have any ideas how to improve it? the only way to tell the two vars apart is one upper/lower case X, and that's so poor to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to something more verbose, let me know if you prefer it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to improve it, let me know what you think!
What about the interface discussion? Is this resolved @apaleyes ? Where exactly is the new method required in Emukit, i.e., is there an acquisition function that requires the method? |
Hey @mmahsereci , good to hear from you again! @BrunoKM promised us an acquisition later down the line. But that's a good point that I forgot all about - the function shall be there as an implementation of some interface. Thanks for catching! |
That sounds great, thanks for catching me up @apaleyes! And, yes I am back now it seems. :) |
@apaleyes sorry, resolving the interfaces has been on the stack of my todos for the past couple of weeks, I will try to get around to it soon. Yes, this is for a new acquisition that is still in the pipeline :) |
@BrunoKM are you still planning to work on this PR (and the follow up of it)? Or shall we just close it? |
Thanks for the follow-up @apaleyes! I'm still happy to finish off this PR (if I don't do it before the end-of-year, feel free to close it). I've been a bit on the fence about whether to push the new acquisition soon afterwards. It's one with particularly tricky gradients: i.e. manually writing a gradient function for it in pure NumPy turned out to be a pain. The current implementation internally uses either jax/pytorch to automatically get the gradient function. I'd assume adding these dependencies is not in line with the philosophy of the rest of the EmuKit repo. I'm still hoping I'll be able to add the manual gradients later down the line and push the new acquisition function, but due to limited time, I can't say how soon that might be. |
Co-authored-by: Andrei Paleyes <apaleyes@users.noreply.github.com>
…kit into gradients-for-covariance-gpy
I added an interface that specifies the covariance between points derivative method. Let me know your thoughts! As for the new acquisition function, I think I'll be able to add it without an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great, just one comment
@@ -72,6 +72,35 @@ def get_joint_prediction_gradients(self, X: np.ndarray) -> Tuple[np.ndarray, np. | |||
raise NotImplementedError | |||
|
|||
|
|||
class ICrossCovarianceDifferentiable: | |||
def get_covariance_between_points(self, X1: np.ndarray, X2: np.ndarray) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have an interface that defines nearly the same method: https://github.com/EmuKit/emukit/blob/main/emukit/bayesian_optimization/interfaces/models.py
While this isn't that big of an issue, would be nice if we could separate them somehow. Few ideas:
- Just rename this method
- Rename the other mehtod
- Drag this method into a separate interface
Pick whichever you prefer!
This change adds a method
get_covariance_between_points_gradients
to theGPyModelWrapper
to facilitate obtaining the gradients of the correspondingget_covariance_between_points
method implemented in the wrapper.These gradients are useful when optimising any batch acquisition function that relies on the covariance between batch points (such as Multi-point Expected Improvement) sequentially, i.e. when optimising with respect to one element of the batch only.
A second change introduced is vectorization of the gradient calculation for full covariance in the
GPyModelWrapper
class. Replacing the for-loop with numpy array operations results in ~50% speed-up for larger batches of points (> 50).By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.