Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the way GAP terminates subprocesses to avoid a race condition that could lead to unwanted warnings or even hangs #5804

Merged
merged 1 commit into from
Oct 2, 2024

Conversation

fingolfin
Copy link
Member

This might help with oscar-system/Oscar.jl#4136 where we the singular GAP package is loaded, and when GAP exits, an AtExit function is run which ends up invoking CloseStream and thus CLOSE_PTY_IOSTREAM -- and we end up closing the file descriptor to the child process before the child properly terminated.

It seems logical to me to call kill, waitpid, close in that order, which this PR implements. Before we did kill, close, waitpid.

Note that I have not yet confirmed this actually fixes the issue (and that issue is not always reproducible, and apparently not at all on macOS, just on Linux... Ah well)

@fingolfin fingolfin added kind: bug Issues describing general bugs, and PRs fixing them topic: kernel labels Sep 24, 2024
@fingolfin
Copy link
Member Author

In some quick tests it seemed to resolve the issue.

@ChrisJefferson
Copy link
Contributor

It wouldn't shock me if this upset something else -- this kind of thing is hard to do. The better option (but this would be a much bigger change) would be to stop singular printing to stderr, because then it can shout all it likes as it is closed, and it won't disturb anyone.

@fingolfin
Copy link
Member Author

The printing is the least concern; but it also sometimes hangs. Presumably because it is trying to interact with its stdin or stdout which is now gone. So silencing stderr here would in a sense make things worse because when it hangs you'd have even less of a clue as to what is wrong.

@fingolfin
Copy link
Member Author

@ChrisJefferson I've been testing this for a week now and it works well.

I don't see how this could cause problems, do you have anything specific in mind, @ChrisJefferson ?

@ChrisJefferson
Copy link
Contributor

I didn't have any concrete plan, just experience that process cleaning up a annoying :) But seeing as it seems to be improving things, lets try merging and see if anything else pops up.

@ChrisJefferson ChrisJefferson merged commit 00e1e10 into gap-system:master Oct 2, 2024
30 checks passed
@fingolfin fingolfin deleted the mh/kill-wait branch October 4, 2024 07:50
@fingolfin fingolfin changed the title kernel: wait for child process to die before closing pty Improve the way GAP kills of subprocesses to avoid a race condition that could lead to unwanted warnings or even hangs Oct 14, 2024
@fingolfin fingolfin added the release notes: use title For PRs: the title of this PR is suitable for direct use in the release notes label Oct 14, 2024
@fingolfin fingolfin changed the title Improve the way GAP kills of subprocesses to avoid a race condition that could lead to unwanted warnings or even hangs Improve the way GAP terminates subprocesses to avoid a race condition that could lead to unwanted warnings or even hangs Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: bug Issues describing general bugs, and PRs fixing them release notes: use title For PRs: the title of this PR is suitable for direct use in the release notes topic: kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants