-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve symlink contents in sdists #713
base: main
Are you sure you want to change the base?
Conversation
Wheels do not support symlinks (as they are ZIP files), but tarballs do. For consistent results however, expand those symlinks to the files they reference. This is consistent with 0.16.0, and fixes this regression in 0.17.0.
35d3a09
to
b1eb285
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure that having the result of meson dist
contain symbolic links and expect meson-python
to turn them into regular files is a legitimate expectation. Why do you have symbolic links in your repository? I also have a few comments regarding the implementation.
if member.islnk() or member.issym(): | ||
# Symlinks are relative to member directory, but hard links are relative to tarball root. | ||
path = member.name.rsplit('/', 1)[0] + '/' if member.issym() else '' | ||
orig = meson_dist.getmember(path + member.linkname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are meson or git enforcing that symlinks do not point to location outside the tarball or outside the repository?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git does not appear to enforce any rules about committed symlinks.
# Symlinks are relative to member directory, but hard links are relative to tarball root. | ||
path = member.name.rsplit('/', 1)[0] + '/' if member.issym() else '' | ||
orig = meson_dist.getmember(path + member.linkname) | ||
member = copy.copy(member) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the copy here necessary?
member.mode = orig.mode | ||
member.mtime = orig.mtime | ||
member.size = orig.size | ||
member.type = tarfile.REGTYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't the link point to a directory, or to a special file?
# Meson 1.1.0, see https://github.com/mesonbuild/meson/pull/11432. | ||
# Run the test anyway to ensure that meson-python can produce a | ||
# wheel also for older versions of Meson. | ||
if MESON_VERSION >= (1, 1, 99): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't use exlude_files
or exclude_directories
in your test, why this version check?
This is not a regression. We decided to do not support symbolic links and other special files. Supporting them is not exactly trivial and I didn't see an use case for them. Why does matplotlib need symbolic links in its source repository? |
Is this documented someplace? We looked at the release notes for 0.17 and did not see any mention of this change. As this appeared to be an undocumented behavior change that broke us, from our point of view it looked like a regression.
We have files that we want to keep in-sync that we want to have different names (the second name is imposed on us via GTK) and while there are a bunch of ways we could solve this (just have two copies and be careful to keep them the same or add a build step that makes the second copy) using a symlink is the simplest way (trust the file system to do its job and git sorts it out on windows). I suspect reasonable people will disagree about if this is a "good enough" reason or not. However, I am not sure that it is worth debating the merits of our use of symlinks. It is a fact that we do and I can not believe mpl is the only project that will trip on this so what can we do to save the next project? My preference would be to restore the behavior from 0.16, but if that is not on the table I think the second best option is to hard-fail the build when symlinks are encountered. The behavior in 0.17 of just dropping the files is the worst of all options and I would still argue is a bug even if not supporting symlinks is the intended behavior. |
I was just typing a reply that kinda overlaps with @tacaswell's. Whether this is a regression or not is imho a semantic difference that isn't super important. This looks like a case of Hyrum's Law: we didn't explicitly promise or test that symlinks work for an sdist, but Matplotlib did rely on it and now it broke. Since matplotlib/matplotlib#29229 looks like it has a pretty high impact, let's see what we can do here to mitigate it - either temporarily or permanently. Re symlinks in general: I still have hope for support for symlinks in wheels materializing (it's been giving a thumbs up in principle, however wheel 2.0 is a heavy lift). So if the change in this PR isn't too onerous, supporting symlinks in sdist creation doesn't seem unreasonable to me. If that sdist creation turns symlinks into copies, that means wheel builds work too, so there is a use case (and IIRC we've had questions about this before, like with |
I was not trying to argue that we should not support symlinks. I was just curious to see what is the motivation to have them in the repository of a cross-platform project, given that AFAIK git turns them into regular files on Windows. The main problem with supporting symlinks is with sanitizing their destination. Including a file in the sdist outside the repository is obviously bad for reproducibility, but it is also a data exfiltration vector that I don't think we should support. Other than that, supporting symlinks to regular files within the repository by turning them into regular files is most likely a desirable feature. I'm much less convinced that hard links should be supported. I don't think git has support for hard links anyway. Does it? Someone could be creating hard links in a dist-script, but I cannot think about any reason why someone would do that.
IIRC
Unfortunately that would be a compatibility break as bad as silently dropping support for symlinks: someone may be relying on the fact that special files are silently dropped and their build would be broken by this change. However, it is a good idea to add a warning for special files that do not make into the sdist. |
Anecdotally, meson itself has a bunch of symlinks in the testsuite, some of which are because we are in fact testing the functionality of meson functions that are documented to interact with symlinks in specific ways. We had to delete them from git and start creating them in |
Wheels do not support symlinks (as they are ZIP files), but tarballs do. For consistent results however, expand those symlinks to the files they reference.
This is consistent with 0.16.0, and fixes this regression in 0.17.0.
In 0.16.0, the test was whether the source directory path was a file:
https://github.com/mesonbuild/meson-python/blob/0.16.0/mesonpy/__init__.py#L893
Then this file would be opened and inserted into the tarball:
https://github.com/mesonbuild/meson-python/blob/0.16.0/mesonpy/__init__.py#L909-L910
This path check returns True for symlinks, and the file opening would insert the target contents into the tarball.
As of 0.17.0 (specifically #587), the check is whether the
TarInfo
is a file:https://github.com/mesonbuild/meson-python/blob/0.17.0/mesonpy/__init__.py#L882
This does not return True for a symlink, and then they get skipped.
If we just check
member.issym()
, then what ends up in the tarball is a symlink, so we need to copy information from the original in order to produce the contents as used to occur in 0.16.0.