Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove submodules from checkout actions #1332

Merged
merged 2 commits into from
Jun 18, 2024
Merged

Conversation

ocefpaf
Copy link
Collaborator

@ocefpaf ocefpaf commented Jun 14, 2024

@jswhit I'm not a big fan of submodules b/c they add an extra layer of complexity, specially with code tracking/versioning. This change will ship the latest submodule code with the sdist. We are already using the same when building the wheels. Which, BTW, stomped me for a while until I realized they were missing.

Closes #1331 and #1329.

@sebastic
Copy link
Contributor

sdist are not a good choice for packagers because maintaining MANIFEST.in is too error prone, that's why the Debian package uses the GitHub tar archives.

Vendoring the bits of nc-complex that are required to build is a better choice.

@ocefpaf
Copy link
Collaborator Author

ocefpaf commented Jun 14, 2024

sdist are not a good choice for packagers because maintaining MANIFEST.in is too error prone, that's why the Debian package uses the GitHub tar archives.

In a perfect GitHub world I agree with you but I've seeing so many bad practices on both PyPI sdist and GH tarballs that I don't think one is better than the other. I'm one of the core maintainers for conda-forge and we deal with tons of Python scientific packages that, due to some "magic" are only possible to build from PyPI or GH depending on the level of acrobatics the maintainers are doing. Regardless, the use of submodules here broke both and we need to fix it.

Vendoring the bits of nc-complex that are required to build is a better choice.

Agreed. I'm not a main maintainer here. Let's see what they have to say. Maybe an alternative solution is to make that code optional. Not sure...

@jswhit
Copy link
Collaborator

jswhit commented Jun 14, 2024

@ZedThree - having nc-complex as a submodule in this repo is causing some issues with releases (see also issue #1331 and issue #1329). Do you have any suggestions on how to fix this? Could we just include a copy of the C source and header files here?

@sebastic
Copy link
Contributor

Do you have any suggestions on how to fix this?

Vendor the three required files, e.g. apply this patch:

https://salsa.debian.org/debian-gis-team/netcdf4-python/-/blob/debian/1.7.0-1/debian/patches/nc-complex.patch?ref_type=tags

@dopplershift
Copy link
Member

If you use setuptools_scm, you don't need to worry about MANIFEST.in.

@jswhit
Copy link
Collaborator

jswhit commented Jun 14, 2024

Do you have any suggestions on how to fix this?

Vendor the three required files, e.g. apply this patch:

https://salsa.debian.org/debian-gis-team/netcdf4-python/-/blob/debian/1.7.0-1/debian/patches/nc-complex.patch?ref_type=tags

by 'vendoring' you mean just including a fork of the needed source files directly in this repo?

@sebastic
Copy link
Contributor

by 'vendoring' you mean just including a fork of the needed source files directly in this repo?

Yes, just copy the files needed for netcdf4-python into its source tree. And copy them again when you would have updated the submodule for a newer version.

@jswhit
Copy link
Collaborator

jswhit commented Jun 14, 2024

If you use setuptools_scm, you don't need to worry about MANIFEST.in.

done in PR #1337

@aaraney
Copy link

aaraney commented Jun 14, 2024

I would recommend continuing to use the submodule over a naive vendoring approach. You lose provenance, version, and licensing information (the last not so applicable to this project). While submodules can be a pain, updating vendored code is far more error prone and will be more difficult maintain in the long term. It will also be more difficult to determine where and when breaking changes occurred.

@ocefpaf
Copy link
Collaborator Author

ocefpaf commented Jun 15, 2024

@aaraney while all that is true, the reasons why these where submodules in the first place are not strong enough to justify using it. Maybe this code should always have been part of the main code base.

I will modify this PR to remove the submodule init to conform with #1337.

@aaraney
Copy link

aaraney commented Jun 15, 2024

@aaraney while all that is true, the reasons why these where submodules in the first place are not strong enough to justify using it

Can you elaborate on why you think that is the case? I will also note, vendoring code in this way is also an increased security risk. It's much easier to verify where code came using submodules. A naive vendoring approach is much more susceptible to for example, a social engineering attack.

From other issues I've read, the main justification has been it's burdensome when building wheels in CI. This is remedied by initializing the submodules after the checkout step in the gh action and including the necessary files in MANIFEST.in.

@sebastic
Copy link
Contributor

From other issues I've read, the main justification has been it's burdensome when building wheels in CI. This is remedied by initializing the submodules after the checkout step in the gh action and including the necessary files in MANIFEST.in.

It also makes it impossible for distributions to build packages using the GitHub tarball. git archive does not recursively include the submodules. PyPI sdists are not an acceptable alternative, it only provides the latest version whereas prior versions can be downloaded via the GitHub tags.

@aaraney
Copy link

aaraney commented Jun 15, 2024

Thanks for the added context, @sebastic! That is really unfortunate... it looks like there have been several unsuccessful requests to add such a feature:

I the spirit of trying to find a compromise it seems like there are two main options:

  • Use a github action to add a release artifact that includes submodules. This is what pytorch does, for example (PR introducing feature, gh action workflow).
  • Vendor files from an external source (e.g. git repo) and use a tool in CI / local development to synchronize changes. For example, a tool like, vendir or something hand rolled. The tool should be configured using checked in files that leave a paper trail to the source and a version of the vendored files.

I lean slightly to the second option b.c. fewer / no upstream changes should be required by for example, package managers. Not to mention it avoids the whole xz security conversation. My main concern is provenance and specifically version / tag tracking for transparency and debugging sake (e.g. debugging netCDF4-python builds in a, lets call it, unique environment 😄).

@sebastic
Copy link
Contributor

If nc-complex installs its headers and code as a header-only library it doesn't need to be vendored, nor when it provides the functions via a shared library.

@aaraney
Copy link

aaraney commented Jun 15, 2024

If nc-complex installs its headers and code as a header-only library it doesn't need to be vendored, nor when it provides the functions via a shared library.

If I am following correctly, you are talking about the case when nc-complex is already compiled and present on the system?

@sebastic
Copy link
Contributor

If nc-complex installs its headers and code as a header-only library it doesn't need to be vendored, nor when it provides the functions via a shared library.

If I am following correctly, you are talking about the case when nc-complex is already compiled and present on the system?

Yes.

I haven't dug into which bits of nc-complex are being used by netcdf4-python, and whether or not it's feasible to expect those to be exposed by public headers and functions, but in theory it would remove the need to vendor this code in netcdf4-python.

@aaraney
Copy link

aaraney commented Jun 15, 2024

I see what you are saying. It could be treated like the netcdf-c and netcdf-fortran dependencies. I assume that's not being done because nc-complex is not readily available on package indexes?

@ocefpaf ocefpaf changed the title ship sdist with submodules remove submodules from checkout actions Jun 17, 2024
@ocefpaf
Copy link
Collaborator Author

ocefpaf commented Jun 17, 2024

This one will pass only after #1337 is merged.

@dopplershift
Copy link
Member

PyPI sdists are not an acceptable alternative, it only provides the latest version

You have access to every sdist that was ever uploaded to PyPI, provided they weren't deleted. Also, sdists on PyPI are static and never change (stable sha256) whereas the hashes for github tarballs have been known to change, which creates its own problems. I avoid using the github tarballs whenever I can for packaging.

@jswhit
Copy link
Collaborator

jswhit commented Jun 18, 2024

This one will pass only after #1337 is merged.

PR #1337 is merged.

@jswhit jswhit merged commit 36cdc8b into Unidata:master Jun 18, 2024
29 of 34 checks passed
@ocefpaf ocefpaf deleted the ship_submodules branch June 18, 2024 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1.7.0 fails to build (cc1: fatal error: external/nc_complex/src/nc_complex.c: No such file or directory)
5 participants