Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggest retry if lockfile disappears #34

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bboreham
Copy link

I saw very occasional failures like open /tmp/ignite-snapshot.lock: no such file or directory", and suspect it could be here.

Seems possible if the lock file was present when we do the Lstat(), but has been removed by the time we get to reading it.

I considered re-doing the switch so this would be a case at top level, but went for minimal lines changed. Let me know if you prefer something else.

@nightlyone
Copy link
Owner

If the lockfile disappears and we cannot find the process owning it between existance and disappearance, the owner is probably not in the same pid space of the test runner or on a remote network file. So that use case of the lockfile package is not supported, since you cannot identify the PID of the owner. @bboreham can you confirm this theory?

@bboreham
Copy link
Author

bboreham commented Feb 5, 2021

I didn’t understand “cannot find the process owning it”.

What if the process has exited?

@zimmski
Copy link

zimmski commented Apr 27, 2021

We (@symflower) have the same problem: sometimes "no such file or directory" is returned as an error for LockFile.TryLock. The change provided by @bboreham fixes that (THANKS!).

Here is a reproducer which reproduces the problem for me in <1min, usually in a few seconds. With the change the problem is gone. I know it does not make sense to acquire a lock with this package like that, but it reproduces the problem and that was what i am aiming for.

package main

import (
	"fmt"
	"math/rand"
	"os"
	"sync"
	"time"

	"github.com/nightlyone/lockfile"
)

func main() {
	path, err := os.MkdirTemp("", "provoke-go-lockfile")
	if err != nil {
		panic(err)
	}

	var wg sync.WaitGroup
	wg.Add(1)

	for i := 0; i < 10; i++ {
		go func(i int) {
			for {
				fmt.Printf("%d: Create lock\n", i)

				lock, err := lockfile.New(path + "/you-better.lock")
				if err != nil {
					panic(err)
				}

				fmt.Printf("%d: Try to lock\n", i)

				for {
					err = lock.TryLock()
					if err == nil {
						break
					}

					if err != lockfile.ErrBusy && err != lockfile.ErrNotExist {
						panic(err)
					}

					time.Sleep(time.Millisecond)
				}

				fmt.Printf("%d: LOCKED\n", i)

				time.Sleep(time.Duration(rand.Int63n(2)+1) * time.Second)

				fmt.Printf("%d: Let's unlock\n", i)

				lock.Unlock()
			}
		}(i)
	}

	wg.Wait()
}

@chancila
Copy link

I ran into this bug as well...it's pretty easy to reproduce with many processes spamming for the lock.

this is just a race condition in your code, assuming a process p1 already has a lock, if a 2nd process p2 is trying to grab a lock and it successfully reaches the ownership check...that is to say it failed to create a link to it's own pid file, and it verified that the existing lockfile exists, between the call to os.SameFile(fiTmp, fiLock) and the read in GetOwner the lockfile can be deleted by p1.

@zimmski
Copy link

zimmski commented Jan 13, 2023

@nightlyone could you take another look at the change and the reproducer I posted. This is still a problem for us.

@vistaarjuneja
Copy link

hey @nightlyone! we're facing the same issue while trying multiple VM creations in parallel. Would be great if you can review this PR once

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants