Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster shiftNRuneBytes #305

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

zhangyunhao116
Copy link

@zhangyunhao116 zhangyunhao116 commented Mar 9, 2020

The new implementation is 1.1x faster than the previous version. Since the shiftNRuneBytes is a Hot Path in lookup function, a small improvement would be meaningful.

A simple benchmark for this change:

func BenchmarkShiftNRuneBytes(b *testing.B) {
	var x [4]byte
	x[0], x[1], x[2], x[3] = 1, 2, 3, 4
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		res = shiftNRuneBytes(x, b.N%4)
	}
}

benchstat:

name                  time/op
ShiftNRuneBytes-8     1.35ns ± 3%
ShiftNRuneBytesNew-8  1.23ns ± 2%

Furthermore, the new implementation instructions size reduce to 88(the previous implementation is 143).
Go assembly output:

"".shiftNRuneBytes STEXT nosplit size=143 args=0x18 locals=0x0
	0x0000 00000 (main.go:13)	TEXT	"".shiftNRuneBytes(SB), NOSPLIT|ABIInternal, $0-24
	0x0000 00000 (main.go:13)	MOVL	$0, "".~r2+24(SP)
	0x0008 00008 (main.go:14)	MOVQ	"".n+16(SP), AX
	0x000d 00013 (main.go:14)	CMPQ	AX, $1
	0x0011 00017 (main.go:17)	JGT	86
	0x0013 00019 (main.go:15)	TESTQ	AX, AX
	0x0016 00022 (main.go:15)	JEQ	77
	0x0018 00024 (main.go:14)	CMPQ	AX, $1
	0x001c 00028 (main.go:17)	JEQ	41
	0x001e 00030 (main.go:24)	MOVL	""..stmp_0(SB), AX
	0x0024 00036 (main.go:24)	MOVL	AX, "".~r2+24(SP)
	0x0028 00040 (main.go:24)	RET
	0x0029 00041 (main.go:18)	MOVBLZX	"".rb+9(SP), AX
	0x002e 00046 (main.go:18)	MOVBLZX	"".rb+10(SP), CX
	0x0033 00051 (main.go:18)	MOVBLZX	"".rb+11(SP), DX
	0x0038 00056 (main.go:18)	MOVL	$0, "".~r2+24(SP)
	0x0040 00064 (main.go:18)	MOVB	AL, "".~r2+24(SP)
	0x0044 00068 (main.go:18)	MOVB	CL, "".~r2+25(SP)
	0x0048 00072 (main.go:18)	MOVB	DL, "".~r2+26(SP)
	0x004c 00076 (main.go:18)	RET
	0x004d 00077 (main.go:16)	MOVL	"".rb+8(SP), AX
	0x0051 00081 (main.go:16)	MOVL	AX, "".~r2+24(SP)
	0x0055 00085 (main.go:16)	RET
	0x0056 00086 (main.go:19)	CMPQ	AX, $2
	0x005a 00090 (main.go:19)	JEQ	116
	0x005c 00092 (main.go:21)	CMPQ	AX, $3
	0x0060 00096 (main.go:21)	JNE	30
	0x0062 00098 (main.go:22)	MOVBLZX	"".rb+11(SP), AX
	0x0067 00103 (main.go:22)	MOVL	$0, "".~r2+24(SP)
	0x006f 00111 (main.go:22)	MOVB	AL, "".~r2+24(SP)
	0x0073 00115 (main.go:22)	RET
	0x0074 00116 (main.go:20)	MOVBLZX	"".rb+10(SP), AX
	0x0079 00121 (main.go:20)	MOVBLZX	"".rb+11(SP), CX
	0x007e 00126 (main.go:20)	MOVL	$0, "".~r2+24(SP)
	0x0086 00134 (main.go:20)	MOVB	AL, "".~r2+24(SP)
	0x008a 00138 (main.go:20)	MOVB	CL, "".~r2+25(SP)
	0x008e 00142 (main.go:20)	RET
"".shiftNRuneBytesNew STEXT nosplit size=88 args=0x18 locals=0x0
	0x0000 00000 (main.go:28)	TEXT	"".shiftNRuneBytesNew(SB), NOSPLIT|ABIInternal, $0-24
	0x0000 00000 (main.go:28)	MOVL	$0, "".res+24(SP)
	0x0008 00008 (main.go:29)	MOVQ	"".n+16(SP), AX
	0x000d 00013 (main.go:29)	TESTQ	AX, AX
	0x0010 00016 (main.go:29)	JEQ	79
	0x0012 00018 (main.go:31)	CMPQ	AX, $1
	0x0016 00022 (main.go:31)	JNE	44
	0x0018 00024 (main.go:32)	MOVWLZX	"".rb+9(SP), AX
	0x001d 00029 (main.go:32)	MOVBLZX	"".rb+11(SP), CX
	0x0022 00034 (main.go:32)	MOVW	AX, "".res+24(SP)
	0x0027 00039 (main.go:32)	MOVB	CL, "".res+26(SP)
	0x002b 00043 (main.go:38)	RET
	0x002c 00044 (main.go:33)	CMPQ	AX, $2
	0x0030 00048 (main.go:33)	JNE	62
	0x0032 00050 (main.go:34)	MOVWLZX	"".rb+10(SP), AX
	0x0037 00055 (main.go:34)	MOVW	AX, "".res+24(SP)
	0x003c 00060 (main.go:34)	JMP	43
	0x003e 00062 (main.go:35)	CMPQ	AX, $3
	0x0042 00066 (main.go:35)	JNE	43
	0x0044 00068 (main.go:36)	MOVBLZX	"".rb+11(SP), AX
	0x0049 00073 (main.go:36)	MOVB	AL, "".res+24(SP)
	0x004d 00077 (main.go:36)	JMP	43
	0x004f 00079 (main.go:30)	MOVL	"".rb+8(SP), AX
	0x0053 00083 (main.go:30)	MOVL	AX, "".res+24(SP)
	0x0057 00087 (main.go:30)	RET

Copy link
Owner

@julienschmidt julienschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the switch have an impact in the (reliably) measurable range? I'd personally prefer it for its slightly better readability.
We're speaking about sub-ns differences anyway.

@zhangyunhao116
Copy link
Author

zhangyunhao116 commented Mar 10, 2020

I also agree that the previous version is more readable. And I found that the previous benchmark was not reliable, since it was tested on my laptop.

A more precise benchmark for this change, the new implementation is 1.26x faster in linux environment. (CPU: 9700k goarch: amd64 goos: linux)
go test -bench=. -benchtime=100000000x -count=50
benchstat:

name                  time/op
ShiftNRuneBytes-8     1.24ns ± 0%
ShiftNRuneBytesNew-8  0.98ns ± 1%

Maybe it is worth to do that? it's your call :)

@julienschmidt
Copy link
Owner

Your change consists of two changes: using if/else instead of the switch and using assignments instead of assembling a new array. How does the following perform?

switch n {
case 0:
	return rb
case 1:
	res[0], res[1], res[2] = rb[1], rb[2], rb[3]
case 2:
	res[0], res[1] = rb[2], rb[3]
case 3:
	res[0] = rb[3]
}
return

@zhangyunhao116
Copy link
Author

According to the benchmark, looks like the performance improvement comes from both change the switch and using assignment. I guess there is a compiler optimization in this situation.

func shiftNRuneBytesNew2(rb [4]byte, n int) (res [4]byte) {
	switch n {
	case 0:
		return rb
	case 1:
		res[0], res[1], res[2] = rb[1], rb[2], rb[3]
	case 2:
		res[0], res[1] = rb[2], rb[3]
	case 3:
		res[0] = rb[3]
	}
	return
}

func shiftNRuneBytesNew3(rb [4]byte, n int) [4]byte {
	if n == 0 {
		return rb
	} else if n == 1 {
		return [4]byte{rb[1], rb[2], rb[3], 0}
	} else if n == 2 {
		return [4]byte{rb[2], rb[3]}
	} else if n == 3 {
		return [4]byte{rb[3]}
	}
	return [4]byte{}
}

benchstat

name                   time/op
ShiftNRuneBytes-8      1.24ns ± 1%
ShiftNRuneBytesNew-8   0.98ns ± 1%
ShiftNRuneBytesNew2-8  1.24ns ± 1%
ShiftNRuneBytesNew3-8  1.24ns ± 0%

@julienschmidt julienschmidt added this to the v1.4 milestone Jul 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants