Loosen the Neuropod backend version match #539

Open
jiaguofang opened this issue Mar 22, 2022 · 2 comments

Comments

@jiaguofang

Feature

Is your feature request related to a problem? Please describe.

In Michelangelo, the Neuropod JNI and the backends are packaged in different places: the Neuropod JNI is built into michelangelo.jar, while the backends are built into a Docker image. When upgrading Neuropod (e.g. from 0.3.0-rc5 to 0.3.0-rc6), we often see the following error across our use cases. The root cause is that michelangelo.jar has been upgraded to the newer Neuropod JNI version (e.g. 0.3.0-rc6), while the backends in the Docker image are still at the old version (e.g. 0.3.0-rc5). This is unavoidable because many users build their own training pipelines, and we don't have an efficient way to roll out the new version everywhere.

An error occurred while calling o894.transform.
: com.uber.neuropod.NeuropodJNIException: Neuropod Error: The model being loaded requires a Neuropod backend for type 'python' and version range '*'. However, a backend satisfying these requirements was not found. See the installation instructions at https://neuropod.ai/installing to install a backend. Retry with log level TRACE for more information.
	at com.uber.neuropod.Neuropod.nativeNew(Native Method)
	at com.uber.neuropod.Neuropod.<init>(Neuropod.java:49)

Describe the solution you'd like

Instead of requiring an exact version match between the Neuropod JNI and the backends, could you maintain a list of backward-compatible versions? For example, Neuropod JNI 0.3.0-rc6 could be compatible with backends [0.3.0-rc6, 0.3.0-rc5, 0.3.0-rc4, ...]. That way, the Neuropod JNI version in michelangelo.jar could differ from the backend version in the Docker image, and users would no longer hit the error above during a Neuropod version upgrade.
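A minimal sketch contrasting the two policies, assuming each library version ships a hand-maintained table of the backend versions it accepts. The table `COMPATIBLE_BACKENDS` and both helper functions are hypothetical and are not part of Neuropod's backend loader:

```python
# Hypothetical table: library (JNI) version -> backend versions it accepts.
# Neuropod does not ship such a table; this only illustrates the proposal.
COMPATIBLE_BACKENDS = {
    "0.3.0-rc6": ["0.3.0-rc6", "0.3.0-rc5", "0.3.0-rc4"],
    "0.3.0-rc5": ["0.3.0-rc5", "0.3.0-rc4"],
}


def exact_match(library_version: str, backend_version: str) -> bool:
    """Exact-match policy: the behavior described in this issue."""
    return library_version == backend_version


def listed_compatible(library_version: str, backend_version: str) -> bool:
    """Proposed policy: accept any backend listed for this library version."""
    # Library versions missing from the table fall back to an exact match.
    allowed = COMPATIBLE_BACKENDS.get(library_version, [library_version])
    return backend_version in allowed


jni_version = "0.3.0-rc6"      # upgraded michelangelo.jar
backend_version = "0.3.0-rc5"  # backend still in the old Docker image
print(exact_match(jni_version, backend_version))        # False
print(listed_compatible(jni_version, backend_version))  # True
```

The table itself is only a promise: it is safe only if every listed pair really is ABI-compatible, which is the hard part.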

Describe alternatives you've considered

Additional context

@VivekPanyam
Collaborator

This is a good suggestion and something I've thought about a bit. Unfortunately, it's kinda tricky to do correctly.

The backends depend on code compiled into libneuropod.so. For example, all the backends use code in /internal and some use code in /bindings and /core as well. This effectively means that it's quite likely that a header file change almost anywhere will break ABI compatibility for backends.
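To illustrate how a header change can silently break a backend, here is a minimal sketch using Python's `ctypes`, which lays out `Structure` fields following the platform's C rules. `OptionsV1` and `OptionsV2` are hypothetical stand-ins for a type declared in a shared header, not actual Neuropod types. After a field is inserted, a backend compiled against the old header keeps reading `timeout_ms` at the old offset:

```python
import ctypes


# Hypothetical struct as declared in an older version of a shared header.
class OptionsV1(ctypes.Structure):
    _fields_ = [
        ("batch_size", ctypes.c_int64),
        ("timeout_ms", ctypes.c_int64),
    ]


# The same struct after a field is inserted in a newer version of the header.
class OptionsV2(ctypes.Structure):
    _fields_ = [
        ("batch_size", ctypes.c_int64),
        ("num_threads", ctypes.c_int64),  # new field shifts everything after it
        ("timeout_ms", ctypes.c_int64),
    ]


print(OptionsV1.timeout_ms.offset, ctypes.sizeof(OptionsV1))  # 8 16
print(OptionsV2.timeout_ms.offset, ctypes.sizeof(OptionsV2))  # 16 24

# The newer library fills in a V2 struct...
new = OptionsV2(batch_size=32, num_threads=4, timeout_ms=500)
raw = ctypes.string_at(ctypes.addressof(new), ctypes.sizeof(OptionsV1))

# ...but a backend built against the old header reinterprets the same bytes.
stale = OptionsV1.from_buffer_copy(raw)
print(stale.batch_size, stale.timeout_ms)  # 32 4 (timeout_ms actually reads num_threads)
```

In C++ the same effect also appears when a virtual method is added (shifting vtable slots) or when inline/templated code changes, since the backend keeps the copy it was compiled with.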

We can use tools to track this and ensure we don't break ABI compatibility for backends within a {MAJOR/MINOR}[1] version, but since most header file changes are likely to break it, this approach might not be particularly useful at the moment. Header-only/templated classes also make finding accidental breakages much more complex. This also requires deciding on a versioning strategy and the guarantees we're willing to stick to (e.g. a backend will work with all Neuropod libraries within a specific {MAJOR/MINOR}[1] version, there will be a release with breaking changes at most once a quarter, etc.). This also significantly impacts backend loading logic, so we have to be thoughtful about our approach.
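One way to express a {MAJOR/MINOR} guarantee is a predicate that compares only the version components the policy promises to keep stable. This sketch is hypothetical (Neuropod does not define `abi_compatible` or `parse_version`) and assumes versions look like `0.3.0-rc6`. With `granularity="minor"`, every release within one MAJOR.MINOR line is treated as interchangeable, which is the kind of promise that ABI tooling and CI would then have to enforce:

```python
import re
from typing import Optional, Tuple

# Matches versions such as "0.3.0" or "0.3.0-rc6" (MAJOR.MINOR.PATCH[-PRERELEASE]).
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.]+))?$")


def parse_version(version: str) -> Tuple[int, int, int, Optional[str]]:
    """Split a version string into (major, minor, patch, prerelease)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch, prerelease = match.groups()
    return int(major), int(minor), int(patch), prerelease


def abi_compatible(library: str, backend: str, granularity: str = "minor") -> bool:
    """Hypothetical policy: a backend works with every library that shares
    its MAJOR version (granularity="major") or its MAJOR.MINOR version
    (granularity="minor"). PATCH and pre-release tags are ignored."""
    lib_major, lib_minor, _, _ = parse_version(library)
    be_major, be_minor, _, _ = parse_version(backend)
    if granularity == "major":
        return lib_major == be_major
    if granularity == "minor":
        return (lib_major, lib_minor) == (be_major, be_minor)
    raise ValueError(f"unknown granularity: {granularity!r}")


print(abi_compatible("0.3.0-rc6", "0.3.0-rc5"))                # True
print(abi_compatible("0.4.0", "0.3.0-rc5"))                    # False
print(abi_compatible("1.4.0", "1.2.1", granularity="major"))   # True
```

Which granularity is safe is a policy decision, not something the code can determine. Semver makes no stability promise for 0.x releases, and some projects break compatibility in MINOR releases.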

All of the above will also likely make our CI tooling more complex.

Adding version-mismatch flexibility will be fairly complex to do (and test) correctly, so we need to figure out whether it's worth it. Maybe there's a simpler solution (e.g. mounting the backends as a Docker volume at runtime instead of building them into the image, injecting them the same way you inject the .jar file, etc.).

[1] This depends on whether we want to follow semver. For example, PyTorch does a minor release every quarter (~90 days), but these minor releases contain backward-incompatible changes (which goes against semver rules): https://github.com/pytorch/pytorch/releases

@VivekPanyam
Collaborator

VivekPanyam commented Mar 29, 2022

This effectively means that it's quite likely that a header file change almost anywhere will break ABI compatibility for backends.

To clarify, we haven't changed a header file in a way that should impact ABI compatibility since June of last year. However, the point of the quote above is that it would be very easy to accidentally break ABI compatibility.
