Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky "unsafe pointer conversion"-panics with -race and go1.14rc1 #29

Open
imsodin opened this issue Feb 10, 2020 · 15 comments
Open

Flaky "unsafe pointer conversion"-panics with -race and go1.14rc1 #29

imsodin opened this issue Feb 10, 2020 · 15 comments

Comments

@imsodin
Copy link

imsodin commented Feb 10, 2020

I cannot reproduce the panic below with a minimal example and it occurs very often, but neither all the time nor in the same test - so this report might be entirely unhelpful, but I am reporting it anyway in case it makes any sense to you:

When running the Syncthing database testsuite on go1.14rc1 with race detection enabled on linux/amd64 I get the following panic:

$ go test -v -race -short -count 1 github.com/syncthing/syncthing/lib/db/ 2>&1 | tee test.out
fatal error: checkptr: unsafe pointer conversion

goroutine 148 [running]:
runtime.throw(0xd106c0, 0x23)
	/media/ext4_data/Linux/source/go/src/runtime/panic.go:1112 +0x72 fp=0xc00022ba50 sp=0xc00022ba20 pc=0x4630d2
runtime.checkptrAlignment(0xc00011e3f1, 0xc40880, 0x1)
	/media/ext4_data/Linux/source/go/src/runtime/checkptr.go:13 +0xd0 fp=0xc00022ba80 sp=0xc00022ba50 pc=0x4348a0
github.com/spaolacci/murmur3.(*digest128).bmix(0xc000123c80, 0xc00011e3f1, 0x20, 0x2f, 0x0, 0x48, 0x58)
	/media/ext4_data/Coding/go/pkg/mod/github.com/spaolacci/murmur3@v1.1.0/murmur128.go:67 +0xc0 fp=0xc00022bae8 sp=0xc00022ba80 pc=0xb649f0
github.com/spaolacci/murmur3.(*digest).Write(0xc000123c80, 0xc00011e3f1, 0x20, 0x2f, 0xb64474, 0xc000123c80, 0xc000123c80)
	/media/ext4_data/Coding/go/pkg/mod/github.com/spaolacci/murmur3@v1.1.0/murmur.go:51 +0x127 fp=0xc00022bb98 sp=0xc00022bae8 pc=0xb63dd7
github.com/spaolacci/murmur3.(*digest128).Write(0xc000123c80, 0xc00011e3f1, 0x20, 0x2f, 0x10, 0x10, 0x21)
	<autogenerated>:1 +0x6a fp=0xc00022bbf8 sp=0xc00022bb98 pc=0xb6558a
github.com/willf/bloom.baseHashes(0xc00011e3f1, 0x20, 0x2f, 0x0, 0x0, 0x0, 0x0)
	/media/ext4_data/Coding/go/pkg/mod/github.com/willf/bloom@v2.0.3+incompatible/bloom.go:97 +0xb9 fp=0xc00022bc68 sp=0xc00022bbf8 pc=0xb6d179
github.com/willf/bloom.(*BloomFilter).Test(0xc000134700, 0xc00011e3f1, 0x20, 0x2f, 0xdf6fe0)
	/media/ext4_data/Coding/go/pkg/mod/github.com/willf/bloom@v2.0.3+incompatible/bloom.go:183 +0x6b fp=0xc00022bd20 sp=0xc00022bc68 pc=0xb6deab
github.com/syncthing/syncthing/lib/db.(*Lowlevel).gcBlocks(0xc000098870, 0x0, 0x0)
	/media/ext4_data/Coding/go/src/github.com/syncthing/syncthing/lib/db/lowlevel.go:570 +0x4d4 fp=0xc00022beb8 sp=0xc00022bd20 pc=0xb798b4
github.com/syncthing/syncthing/lib/db.(*Lowlevel).gcRunner(0xc000098870)
	/media/ext4_data/Coding/go/src/github.com/syncthing/syncthing/lib/db/lowlevel.go:483 +0x279 fp=0xc00022bfd8 sp=0xc00022beb8 pc=0xb78fb9
runtime.goexit()
	/media/ext4_data/Linux/source/go/src/runtime/asm_amd64.s:1375 +0x1 fp=0xc00022bfe0 sp=0xc00022bfd8 pc=0x4969c1
created by github.com/syncthing/syncthing/lib/db.NewLowlevel
	/media/ext4_data/Coding/go/src/github.com/syncthing/syncthing/lib/db/lowlevel.go:65 +0x36a

(rest of the backtrace does not include murmur3 nor bloom).

Do you have any ideas what this might be about? Thanks in advance for any help.

@calmh
Copy link

calmh commented Feb 26, 2020

I suspect the problem is that there is no guarantee that the alignment is correct after the pointer conversion here:

t := (*[2]uint64)(unsafe.Pointer(&p[i*16]))

@calmh
Copy link

calmh commented Feb 26, 2020

This test reproduces it reliably:

func TestUnalignedWrite(t *testing.T) {
	b := make([]byte, 128)
	for i := 0; i < 16; i++ {
		Sum128(b[i:])
	}
}

(Also one of the existing string tests)

calmh added a commit to calmh/murmur3 that referenced this issue Feb 26, 2020
This removes the incorrect unsafe usage, instead using the marshalling
operations from encoding/binary.

Obviously, it's a bit slower:

	benchmark                    old ns/op     new ns/op     delta
	Benchmark32/1-8              4.68          5.02          +7.26%
	Benchmark32/2-8              5.15          5.16          +0.19%
	Benchmark32/4-8              5.05          5.56          +10.10%
	Benchmark32/8-8              6.10          6.95          +13.93%
	Benchmark32/16-8             8.18          9.94          +21.52%
	Benchmark32/32-8             12.5          15.4          +23.20%
	Benchmark32/64-8             20.7          26.7          +28.99%
	Benchmark32/128-8            38.3          55.9          +45.95%
	Benchmark32/256-8            75.8          94.1          +24.14%
	Benchmark32/512-8            150           191           +27.33%
	Benchmark32/1024-8           299           375           +25.42%
	Benchmark32/2048-8           586           730           +24.57%
	Benchmark32/4096-8           1170          1450          +23.93%
	Benchmark32/8192-8           2347          2888          +23.05%
	BenchmarkPartial32/8-8       157           155           -1.27%
	BenchmarkPartial32/16-8      185           184           -0.54%
	BenchmarkPartial32/32-8      243           240           -1.23%
	BenchmarkPartial32/64-8      201           206           +2.49%
	BenchmarkPartial32/128-8     223           240           +7.62%
	Benchmark64/1-8              12.3          12.2          -0.81%
	Benchmark64/2-8              12.8          12.9          +0.78%
	Benchmark64/4-8              14.2          13.9          -2.11%
	Benchmark64/8-8              16.1          15.5          -3.73%
	Benchmark64/16-8             13.7          15.3          +11.68%
	Benchmark64/32-8             16.2          19.2          +18.52%
	Benchmark64/64-8             20.3          26.4          +30.05%
	Benchmark64/128-8            28.8          40.8          +41.67%
	Benchmark64/256-8            47.3          68.8          +45.45%
	Benchmark64/512-8            81.8          126           +54.03%
	Benchmark64/1024-8           151           242           +60.26%
	Benchmark64/2048-8           291           477           +63.92%
	Benchmark64/4096-8           566           948           +67.49%
	Benchmark64/8192-8           1115          1873          +67.98%
	Benchmark128/1-8             12.9          12.9          +0.00%
	Benchmark128/2-8             13.2          13.5          +2.27%
	Benchmark128/4-8             14.3          14.2          -0.70%
	Benchmark128/8-8             15.9          16.1          +1.26%
	Benchmark128/16-8            13.6          16.0          +17.65%
	Benchmark128/32-8            15.9          20.0          +25.79%
	Benchmark128/64-8            20.5          26.9          +31.22%
	Benchmark128/128-8           29.1          41.2          +41.58%
	Benchmark128/256-8           46.5          69.6          +49.68%
	Benchmark128/512-8           81.0          126           +55.56%
	Benchmark128/1024-8          150           244           +62.67%
	Benchmark128/2048-8          292           471           +61.30%
	Benchmark128/4096-8          566           921           +62.72%
	Benchmark128/8192-8          1115          1868          +67.53%

Nonetheless, it's correct and doesn't crash.
calmh added a commit to calmh/murmur3 that referenced this issue Feb 26, 2020
This removes the incorrect unsafe usage, instead using the marshalling
operations from encoding/binary.

Obviously, it's a bit slower:

	benchmark                    old ns/op     new ns/op     delta
	Benchmark32/1-8              4.68          5.02          +7.26%
	Benchmark32/2-8              5.15          5.16          +0.19%
	Benchmark32/4-8              5.05          5.56          +10.10%
	Benchmark32/8-8              6.10          6.95          +13.93%
	Benchmark32/16-8             8.18          9.94          +21.52%
	Benchmark32/32-8             12.5          15.4          +23.20%
	Benchmark32/64-8             20.7          26.7          +28.99%
	Benchmark32/128-8            38.3          55.9          +45.95%
	Benchmark32/256-8            75.8          94.1          +24.14%
	Benchmark32/512-8            150           191           +27.33%
	Benchmark32/1024-8           299           375           +25.42%
	Benchmark32/2048-8           586           730           +24.57%
	Benchmark32/4096-8           1170          1450          +23.93%
	Benchmark32/8192-8           2347          2888          +23.05%
	BenchmarkPartial32/8-8       157           155           -1.27%
	BenchmarkPartial32/16-8      185           184           -0.54%
	BenchmarkPartial32/32-8      243           240           -1.23%
	BenchmarkPartial32/64-8      201           206           +2.49%
	BenchmarkPartial32/128-8     223           240           +7.62%
	Benchmark64/1-8              12.3          12.2          -0.81%
	Benchmark64/2-8              12.8          12.9          +0.78%
	Benchmark64/4-8              14.2          13.9          -2.11%
	Benchmark64/8-8              16.1          15.5          -3.73%
	Benchmark64/16-8             13.7          15.3          +11.68%
	Benchmark64/32-8             16.2          19.2          +18.52%
	Benchmark64/64-8             20.3          26.4          +30.05%
	Benchmark64/128-8            28.8          40.8          +41.67%
	Benchmark64/256-8            47.3          68.8          +45.45%
	Benchmark64/512-8            81.8          126           +54.03%
	Benchmark64/1024-8           151           242           +60.26%
	Benchmark64/2048-8           291           477           +63.92%
	Benchmark64/4096-8           566           948           +67.49%
	Benchmark64/8192-8           1115          1873          +67.98%
	Benchmark128/1-8             12.9          12.9          +0.00%
	Benchmark128/2-8             13.2          13.5          +2.27%
	Benchmark128/4-8             14.3          14.2          -0.70%
	Benchmark128/8-8             15.9          16.1          +1.26%
	Benchmark128/16-8            13.6          16.0          +17.65%
	Benchmark128/32-8            15.9          20.0          +25.79%
	Benchmark128/64-8            20.5          26.9          +31.22%
	Benchmark128/128-8           29.1          41.2          +41.58%
	Benchmark128/256-8           46.5          69.6          +49.68%
	Benchmark128/512-8           81.0          126           +55.56%
	Benchmark128/1024-8          150           244           +62.67%
	Benchmark128/2048-8          292           471           +61.30%
	Benchmark128/4096-8          566           921           +62.72%
	Benchmark128/8192-8          1115          1868          +67.53%

Nonetheless, it's correct and doesn't crash.
@geoah
Copy link

geoah commented Feb 26, 2020

Sigh. go1.14 is officially out (yay), and checkptr validation is now a real thing :D

@calmh thank you for taking the time to work on this, I stole your test to try and solve this the way go-cmp did, with #31 being the result.

Benchmark results seem a bit better than using encoding/binary but not sure if it's enough to worth keeping unsafe around there.

Sigh. Sorry it's getting late, that was completely wrong, let me try again. :D

Converting a Pointer to a uintptr produces the memory address of the value pointed at, as an integer. The usual use for such a uintptr is to print it.

Conversion of a uintptr back to Pointer is not valid in general.

A uintptr is an integer, not a reference. Converting a Pointer to a uintptr creates an integer value with no pointer semantics. Even if a uintptr holds the address of some object, the garbage collector will not update that uintptr’s value if the object moves, nor will that uintptr keep the object from being reclaimed.

If p points into an allocated object, it can be advanced through the object by conversion to uintptr, addition of an offset, and conversion back to Pointer.

I was going for something like this but checkptr still doesn't like it.

        nblocks := len(p) / 4
        for i := 0; i < nblocks; i++ {
-               k1 := *(*uint32)(unsafe.Pointer(&p[i*4]))
+               k1 := *(*uint32)(unsafe.Pointer(uintptr(unsafe.Pointer(&p[0])) + uintptr(i*4)*unsafe.Sizeof(&p[0])))

@calmh
Copy link

calmh commented Feb 27, 2020

The check enforces this:

When converting unsafe.Pointer to *T, the resulting pointer must be aligned appropriately for T.

Any given byte is not going to be properly aligned for uint32 or uint64 so we can’t assume that we can do that conversion unfortunately.

@twmb
Copy link

twmb commented Feb 28, 2020

Fundamentally, the unsafe code needs changing. Feel free to take my patches: twmb/murmur3@b0c3ada...HEAD

Note that using binary.LittleEndian (or my path) will resolve this repos tickets around issues across architectures, as well.

@calmh
Copy link

calmh commented Feb 28, 2020

That patch looks better than mine, which was just a quick fix. We might end up depending on your repo instead, @twmb. :)

@mangup
Copy link

mangup commented Apr 1, 2020

not merged. So, I've switched my code to github.com/DataDog/mmh3

@oliverpool
Copy link

The fork of @twmb seems quite nice: https://github.com/twmb/murmur3

@twmb
Copy link

twmb commented Apr 2, 2020

FWIW DataDog/mmh3 theoretically same unaligned input problem, but I think it goes through unsafe & reflect.SliceHeader in a roundabout enough way that the new checks don't catch it:

        h := *(*reflect.SliceHeader)(unsafe.Pointer(&key))
        h.Len = nblocks * 2
        b := *(*[]uint64)(unsafe.Pointer(&h))

@kumakichi
Copy link

seems commit 229247d33b has already solved this problem, maybe we can close this issue?

@twmb
Copy link

twmb commented Jul 2, 2020

Likely, but for what it's worth only the warning was removed, the underlying issue the warning was about still exists. Given that nobody complained about the issue in years, it may not a large issue.

"The main benefit of the check is to catch people making alignment assumptions on x86, where their program would then fail on another architecture. That just doesn't seem that much of a pain point to me. Even if it fails on another architecture, at least it does so loudly, without silently corrupting data"

@jgheewala
Copy link

I am facing similar issue when migrating to go1.14.x

fatal error: checkptr: unsafe pointer arithmetic

goroutine 26 [running]:
runtime.throw(0xecd19c, 0x23)
/usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc000054d68 sp=0xc000054d38 pc=0x467402
runtime.checkptrArithmetic(0xc000054e38, 0x0, 0x0, 0x0)
/usr/local/go/src/runtime/checkptr.go:43 +0xb5 fp=0xc000054d98 sp=0xc000054d68 pc=0x438a95
github.com/spaolacci/murmur3.Sum32WithSeed(0xc000054e38, 0x1b, 0x20, 0xc000000000, 0x1b)
/go/pkg/mod/github.com/spaolacci/murmur3@v1.1.0/murmur32.go:129 +0x8a fp=0xc000054e00 sp=0xc000054d98 pc=0xcf070a
github.com/spaolacci/murmur3.Sum32(...)
/go/pkg/mod/github.com/spaolacci/murmur3@v1.1.0/murmur32.go:111

@dragonsinth
Copy link

This is still broken in go1.15.5 darwin/amd64 for me. In particular, murmur32 is tripping checkptr even when the pointer is actually aligned. Seems like a bug in Go to me.

@sheregeda
Copy link

sheregeda commented Nov 22, 2023

I am facing the same issue

fatal error: checkptr: pointer arithmetic result points to invalid allocation

goroutine 46 [running]:
runtime.throw({0x1036bb2d3?, 0x3cb?})
	/Users/sheregeda/.gvm/gos/go1.19.8/src/runtime/panic.go:1047 +0x40 fp=0xc0002ad0f0 sp=0xc0002ad0c0 pc=0x102e03100
runtime.checkptrArithmetic(0x50?, {0x0, 0x0, 0xc0007da358?})
	/Users/sheregeda/.gvm/gos/go1.19.8/src/runtime/checkptr.go:69 +0xac fp=0xc0002ad120 sp=0xc0002ad0f0 pc=0x102dd063c
github.com/spaolacci/murmur3.Sum32WithSeed({0xc0002ad248, 0xd, 0x20}, 0x0)
	/Users/sheregeda/.gvm/pkgsets/go1.19.8/tabby-notify/pkg/mod/github.com/spaolacci/murmur3@v1.1.0/murmur32.go:129 +0x7c fp=0xc0002ad190 sp=0xc0002ad120 pc=0x1035f581c
github.com/spaolacci/murmur3.Sum32(...)
	/Users/sheregeda/.gvm/pkgsets/go1.19.8/tabby-notify/pkg/mod/github.com/spaolacci/murmur3@v1.1.0/murmur32.go:111
gitlab.com/tabby.ai/notify/internal/services/sms.generateSeedByPhone(...)
	/Users/sheregeda/work/tabby/notify/internal/services/sms/service.go:312
gitlab.com/tabby.ai/notify/internal/services/sms.shuffleRouting({0xc0003fc2d0, 0x3, 0x10368dca8?}, {0xc00011b830, 0x3, 0x102dd06d0?}, {0xc0007c9953, 0xd})
	/Users/sheregeda/work/tabby/notify/internal/services/sms/service.go:289 +0x100 fp=0xc0002ad350 sp=0xc0002ad190 pc=0x10365b860
gitlab.com/tabby.ai/notify/internal/services/sms.(*Service).getProviders(0xc000595bd0, {0x103a6b2f0, 0xc0001a6018}, {0x102e47940?}, 0x0)
	/Users/sheregeda/work/tabby/notify/internal/services/sms/service.go:205 +0x74c fp=0xc0002ad690 sp=0xc0002ad350 pc=0x10365ac2c
gitlab.com/tabby.ai/notify/internal/services/sms.(*Service).SendSMS(0xc000595bd0, {0x103a6b2f0, 0xc0001a6018}, {0x0?}, {0x10368c0af, 0x9}, 0x103f63960?)
	/Users/sheregeda/work/tabby/notify/internal/services/sms/service.go:121 +0x280 fp=0xc0002ad900 sp=0xc0002ad690 pc=0x103659c30
gitlab.com/tabby.ai/notify/internal/services/sms.(*ServiceTestSuite).Test_SendSMS.func2()
	/Users/sheregeda/work/tabby/notify/internal/services/sms/service_test.go:624 +0x58c fp=0xc0002ade50 sp=0xc0002ad900 pc=0x10366217c
github.com/stretchr/testify/suite.(*Suite).Run.func2(0x0?)
	/Users/sheregeda/.gvm/pkgsets/go1.19.8/tabby-notify/pkg/mod/github.com/stretchr/testify@v1.8.2/suite/suite.go:112 +0x4c fp=0xc0002ade90 sp=0xc0002ade50 pc=0x103642fec
testing.tRunner(0xc0004df040, 0xc0001bc780)
	/Users/sheregeda/.gvm/gos/go1.19.8/src/testing/testing.go:1446 +0x18c fp=0xc0002adfa0 sp=0xc0002ade90 pc=0x102f1c18c
testing.(*T).Run.func1()
	/Users/sheregeda/.gvm/gos/go1.19.8/src/testing/testing.go:1493 +0x44 fp=0xc0002adfd0 sp=0xc0002adfa0 pc=0x102f1d2f4
runtime.goexit()
	/Users/sheregeda/.gvm/gos/go1.19.8/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc0002adfd0 sp=0xc0002adfd0 pc=0x102e38bf4
created by testing.(*T).Run
	/Users/sheregeda/.gvm/gos/go1.19.8/src/testing/testing.go:1493 +0x560

@oliverpool
Copy link

@sheregeda did you read the thread? Especially #29 (comment)

The fork of @twmb seems quite nice: https://github.com/twmb/murmur3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants