Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graceful shutdown #11

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Graceful shutdown #11

wants to merge 2 commits into from

Conversation

kendru
Copy link

@kendru kendru commented Apr 30, 2021

This PR does 3 things:

  • Cause the child process to be stopped with SIGTERM instead of SIGKILL. SIGKILL is still sent if the process has not stopped after a fixed period.
  • Start the child process in its own process group so that stopping the child will also stop any of its owned processes.
  • Intercept SIGKILL/SIGTERM in uni itself and trigger a shutdown of its child process

Resolves #10

Copy link
Member

@brandonbloom brandonbloom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Comments are pretty minor. Up to you whether to address or not. I think it's correct for all the "Cases to handle". There is a comment with that string "Cases to handle" at the top of run.go, which we should probably move to internal/watch.go.

go func() {
_ = <-shutdown
fmt.Fprintf(os.Stderr, "Received shutdown signal. Waiting for process to finish.\n")
cancel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two things going on here that are not "library-safe", but I think it's OK because this is "cmd" code and not "internal" code.

  1. I believe you're supposed to defer cancel() right after you call WithCancel, so that you can cleanup any channel resources that are allocated in order to make cancelling possible.
  2. This go routine may block on shutdown forever.

Not sure there is any action for this comment, just thinking aloud as I review this code.

err = internal.Run(repo, runOpts)
ctx, cancel := context.WithCancel(cmd.Context())
shutdown := make(chan os.Signal, 1)
signal.Notify(shutdown, syscall.SIGINT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to handle SIGKILL too?

signal.Notify(shutdown, syscall.SIGINT)
go func() {
_ = <-shutdown
fmt.Fprintf(os.Stderr, "Received shutdown signal. Waiting for process to finish.\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kinda a lie, if we've got a timeout and then kill without waiting. Also, if we're logging this, we probably should log when that timeout fires. If we have both of those logging things, maybe they belong close together in the code?

@@ -118,10 +121,11 @@ if (typeof main === 'function') {
node.Stdin = os.Stdin
node.Stdout = os.Stdout
node.Stderr = os.Stderr
node.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this incantation. A comment would help.

@@ -132,11 +136,22 @@ func (proc *cmdProcess) Start() error {
return proc.cmd.Start()
}

func (proc *cmdProcess) Kill() error {
func (proc *cmdProcess) Stop() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b/c of the timeout, this really StopAndThenKill(), instead of either Stop or Kill. Maybe it's Kill(gracePeriod)?

@@ -152,9 +162,17 @@ func (opts buildAndWatch) Run() error {
close(abort)
return err
}
case <-ctx.Done():
close(abort)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewer note: this looks right b/c of the following return, so you can't unintentionally close the abort channel twice 👍🏻

_ = syscall.Kill(-proc.cmd.Process.Pid, syscall.SIGKILL)
}()

return syscall.Kill(-proc.cmd.Process.Pid, syscall.SIGTERM)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is strange to me. We async start a kill after a sleep, then we attempt a term and return that error code. Then the calling code needs the waitDone logic b/c this Stop function doesn't actually wait for the kill. If instead, this function is Kill(gracePeriod), and you do the term first, then race between proc.cmd.Wait() and time.Sleep(), now the call site can just be Kill(gracePeriod) and the waitDone thing is not necessary. Right?

@@ -86,6 +87,11 @@ func (opts buildAndWatch) Run() error {
for {
proc := opts.CreateProcess()
done := make(chan error, 1)
waitDone := func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not necessary. See elsewhere.

@brandonbloom
Copy link
Member

Note: @kendru reported that this has some subtle bugs still.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Attempt to stop child process gracefully
2 participants