Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gradient calculation for the covariance between points in GPyModelWrapper #347

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

BrunoKM
Copy link
Contributor

@BrunoKM BrunoKM commented Feb 17, 2021

This change adds a method get_covariance_between_points_gradients to the GPyModelWrapper to facilitate obtaining the gradients of the corresponding get_covariance_between_points method implemented in the wrapper.

These gradients are useful when optimising any batch acquisition function that relies on the covariance between batch points (such as Multi-point Expected Improvement) sequentially, i.e. when optimising with respect to one element of the batch only.

A second change introduced is vectorization of the gradient calculation for full covariance in the GPyModelWrapper class. Replacing the for-loop with numpy array operations results in ~50% speed-up for larger batches of points (> 50).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-io
Copy link

codecov-io commented Feb 17, 2021

Codecov Report

Merging #347 (6738faa) into master (b793989) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #347      +/-   ##
==========================================
+ Coverage   89.43%   89.46%   +0.02%     
==========================================
  Files         123      123              
  Lines        4051     4061      +10     
  Branches      462      461       -1     
==========================================
+ Hits         3623     3633      +10     
  Misses        332      332              
  Partials       96       96              
Impacted Files Coverage Δ
emukit/model_wrappers/gpy_model_wrappers.py 84.66% <100.00%> (+1.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b793989...6738faa. Read the comment docs.

@apaleyes
Copy link
Collaborator

apaleyes commented Mar 9, 2021

Apologies for taking time on this one, it's a bit hard for me to make complete sense of this change. I promise to get through it!

@BrunoKM
Copy link
Contributor Author

BrunoKM commented Mar 10, 2021

No worries, let me know if there is anything I can do to explain and document it better in the code.

Copy link
Collaborator

@apaleyes apaleyes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two major concerns:

  1. This PR introduces a new method that isn't declared in any model interfaces. That's backwards from Emukit's point of view. Are you planning to use this method anywhere? Perhaps create a new acquisition? If so, let's define an interface, or augment an existing one if that makes more sense.
  2. Both changes carry some pretty heavy matrix algebra, which is really hard to follow. Can you please add some comments throughout, to clarify what's going on? Thanks!

There are also some minor comments, which can be found below.

Comment on lines 240 to 246
dkx1X_dx = np.zeros((d, q1*q1, n))
dkx1x2_dx = np.zeros((d, q1*q1, q2))
for i in range(d):
dkx1X_dx[i, ::q1 + 1, :] = kern.dK_dX(x1, x_train, i)
dkx1x2_dx[i, ::q1 + 1, :] = kern.dK_dX(x1, x2, i)
dkx1X_dx = dkx1X_dx.reshape((d, q1, q1, n))
dkx1x2_dx = dkx1x2_dx.reshape((d, q1, q1, q2))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming here is hard to get through: dkx1X? Can it be improved?

:return: gradient of the covariance matrix of shape (q1, q2) between outputs at X1 and X2
(return shape is (q1, q2, q1, d)).
"""
dcov_dx1 = dCov(X1, X2, self.model.X, self.model.kern, self.model.posterior.woodbury_inv)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three inputs come directly from the model. I wander if it makes sense to just pass model in. Or not extract dCov as a stateless method at all. COnisder this: all this method is doing is just calling dCov, literally nothing else. And so far dCov is only used here. It seems pretty specific to me to not get reused. Is that impression correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fair to say, I was primarily just trying to mirror the implementation of dSigma that was already in the module. Happy to not separate it out.

@BrunoKM
Copy link
Contributor Author

BrunoKM commented Apr 11, 2021

To answer the two points:

  1. This PR introduces a new method that isn't declared in any model interfaces. That's backwards from Emukit's point of view. Are you planning to use this method anywhere? Perhaps create a new acquisition? If so, let's define an interface, or augment an existing one if that makes more sense.

This is indeed intended for an implementation of a new acquisition. It comes in handy whenever optimizing any of many batch acquisition functions sequentially – i.e. by selecting optimising batch acquisition with respect to one point in the batch at a time. Currently, get_covariance_between_points is part of the IEntropySearchModel interface, so maybe something like IDifferentiableEntropySeachModel would make sense?

  1. Both changes carry some pretty heavy matrix algebra, which is really hard to follow. Can you please add some comments throughout, to clarify what's going on? Thanks!

Fair point, I'll try and make it easier to follow.

@codecov-commenter
Copy link

codecov-commenter commented Jun 11, 2021

Codecov Report

Merging #347 (c280f6d) into main (cdcb0d0) will decrease coverage by 0.02%.
The diff coverage is 93.10%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #347      +/-   ##
==========================================
- Coverage   89.01%   88.99%   -0.03%     
==========================================
  Files         128      128              
  Lines        4244     4254      +10     
  Branches      609      609              
==========================================
+ Hits         3778     3786       +8     
- Misses        369      371       +2     
  Partials       97       97              
Impacted Files Coverage Δ
emukit/core/interfaces/__init__.py 100.00% <ø> (ø)
emukit/core/interfaces/models.py 61.76% <60.00%> (-0.31%) ⬇️
emukit/model_wrappers/gpy_model_wrappers.py 84.24% <100.00%> (+0.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cdcb0d0...c280f6d. Read the comment docs.

Copy link
Collaborator

@apaleyes apaleyes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are good to go here, just one outdated docstring to update

Comment on lines 126 to 130
:param x1: Prediction inputs of shape (q1, d)
:param x2: Prediction inputs of shape (q2, d)
:param x_train: Training inputs of shape (n_train, d)
:param kern: Covariance of the GP model
:param w_inv: Woodbury inverse of the posterior fit of the GP
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is a bit out of date now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed redundant flags

:return: nd array of shape (q1, q2, d) representing the gradient of the posterior covariance between x1 and x2
with respect to x1. res[i, j, k] is the gradient of Cov(y1[i], y2[j]) with respect to x1[i, k]
"""
# Get the relevent shapes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Get the relevent shapes
# Get the relevant shapes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

emukit/model_wrappers/gpy_model_wrappers.py Outdated Show resolved Hide resolved
Comment on lines 221 to 224
dkxX_dx = np.zeros((d, q*q, n))
# Tensor for the gradients of full covariance matrix at points x_predict (of shape (q, q) with respect to
# x_predict (of shape (q, d))
dkxx_dx = np.zeros((d, q*q, q))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming here is a bit unfortunate. I know it was already there, but do you have any ideas how to improve it? the only way to tell the two vars apart is one upper/lower case X, and that's so poor to read

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to something more verbose, let me know if you prefer it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to improve it, let me know what you think!

@mmahsereci
Copy link
Contributor

mmahsereci commented Jun 11, 2021

What about the interface discussion? Is this resolved @apaleyes ? Where exactly is the new method required in Emukit, i.e., is there an acquisition function that requires the method?

@apaleyes
Copy link
Collaborator

Hey @mmahsereci , good to hear from you again!

@BrunoKM promised us an acquisition later down the line. But that's a good point that I forgot all about - the function shall be there as an implementation of some interface. Thanks for catching!

@mmahsereci
Copy link
Contributor

That sounds great, thanks for catching me up @apaleyes! And, yes I am back now it seems. :)

@BrunoKM
Copy link
Contributor Author

BrunoKM commented Jun 14, 2021

@apaleyes sorry, resolving the interfaces has been on the stack of my todos for the past couple of weeks, I will try to get around to it soon.

Yes, this is for a new acquisition that is still in the pipeline :)

@apaleyes
Copy link
Collaborator

apaleyes commented Dec 9, 2021

@BrunoKM are you still planning to work on this PR (and the follow up of it)? Or shall we just close it?

@BrunoKM
Copy link
Contributor Author

BrunoKM commented Dec 11, 2021

Thanks for the follow-up @apaleyes!

I'm still happy to finish off this PR (if I don't do it before the end-of-year, feel free to close it).

I've been a bit on the fence about whether to push the new acquisition soon afterwards. It's one with particularly tricky gradients: i.e. manually writing a gradient function for it in pure NumPy turned out to be a pain. The current implementation internally uses either jax/pytorch to automatically get the gradient function. I'd assume adding these dependencies is not in line with the philosophy of the rest of the EmuKit repo.

I'm still hoping I'll be able to add the manual gradients later down the line and push the new acquisition function, but due to limited time, I can't say how soon that might be.

@BrunoKM
Copy link
Contributor Author

BrunoKM commented Jan 1, 2022

I added an interface that specifies the covariance between points derivative method. Let me know your thoughts!

As for the new acquisition function, I think I'll be able to add it without an evaluate_with_gradients method relatively soon. An update with evaluate_with_gradients could follow later down the line.

Copy link
Collaborator

@apaleyes apaleyes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, just one comment

@@ -72,6 +72,35 @@ def get_joint_prediction_gradients(self, X: np.ndarray) -> Tuple[np.ndarray, np.
raise NotImplementedError


class ICrossCovarianceDifferentiable:
def get_covariance_between_points(self, X1: np.ndarray, X2: np.ndarray) -> np.ndarray:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have an interface that defines nearly the same method: https://github.com/EmuKit/emukit/blob/main/emukit/bayesian_optimization/interfaces/models.py

While this isn't that big of an issue, would be nice if we could separate them somehow. Few ideas:

  • Just rename this method
  • Rename the other mehtod
  • Drag this method into a separate interface

Pick whichever you prefer!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants