-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graceful shutdown #11
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Comments are pretty minor. Up to you whether to address or not. I think it's correct for all the "Cases to handle". There is a comment with that string "Cases to handle" at the top of run.go, which we should probably move to internal/watch.go.
go func() { | ||
_ = <-shutdown | ||
fmt.Fprintf(os.Stderr, "Received shutdown signal. Waiting for process to finish.\n") | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two things going on here that are not "library-safe", but I think it's OK because this is "cmd" code and not "internal" code.
- I believe you're supposed to
defer cancel()
right after you callWithCancel
, so that you can cleanup any channel resources that are allocated in order to make cancelling possible. - This go routine may block on
shutdown
forever.
Not sure there is any action for this comment, just thinking aloud as I review this code.
err = internal.Run(repo, runOpts) | ||
ctx, cancel := context.WithCancel(cmd.Context()) | ||
shutdown := make(chan os.Signal, 1) | ||
signal.Notify(shutdown, syscall.SIGINT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to handle SIGKILL
too?
signal.Notify(shutdown, syscall.SIGINT) | ||
go func() { | ||
_ = <-shutdown | ||
fmt.Fprintf(os.Stderr, "Received shutdown signal. Waiting for process to finish.\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kinda a lie, if we've got a timeout and then kill without waiting. Also, if we're logging this, we probably should log when that timeout fires. If we have both of those logging things, maybe they belong close together in the code?
@@ -118,10 +121,11 @@ if (typeof main === 'function') { | |||
node.Stdin = os.Stdin | |||
node.Stdout = os.Stdout | |||
node.Stderr = os.Stderr | |||
node.SysProcAttr = &syscall.SysProcAttr{Setpgid: true} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with this incantation. A comment would help.
@@ -132,11 +136,22 @@ func (proc *cmdProcess) Start() error { | |||
return proc.cmd.Start() | |||
} | |||
|
|||
func (proc *cmdProcess) Kill() error { | |||
func (proc *cmdProcess) Stop() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b/c of the timeout, this really StopAndThenKill(), instead of either Stop or Kill. Maybe it's Kill(gracePeriod)?
@@ -152,9 +162,17 @@ func (opts buildAndWatch) Run() error { | |||
close(abort) | |||
return err | |||
} | |||
case <-ctx.Done(): | |||
close(abort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewer note: this looks right b/c of the following return, so you can't unintentionally close the abort channel twice 👍🏻
_ = syscall.Kill(-proc.cmd.Process.Pid, syscall.SIGKILL) | ||
}() | ||
|
||
return syscall.Kill(-proc.cmd.Process.Pid, syscall.SIGTERM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is strange to me. We async start a kill after a sleep, then we attempt a term and return that error code. Then the calling code needs the waitDone
logic b/c this Stop function doesn't actually wait for the kill. If instead, this function is Kill(gracePeriod), and you do the term first, then race between proc.cmd.Wait() and time.Sleep(), now the call site can just be Kill(gracePeriod)
and the waitDone
thing is not necessary. Right?
@@ -86,6 +87,11 @@ func (opts buildAndWatch) Run() error { | |||
for { | |||
proc := opts.CreateProcess() | |||
done := make(chan error, 1) | |||
waitDone := func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not necessary. See elsewhere.
Note: @kendru reported that this has some subtle bugs still. |
This PR does 3 things:
Resolves #10