Michael Munday
f31a18ded4
cmd/compile: add some generic composite type optimizations
...
Propagate values through some wide Zero/Move operations. Among
other things this allows us to optimize some kinds of array
initialization. For example, the following code no longer
requires a temporary be allocated on the stack. Instead it
writes the values directly into the return value.
func f(i uint32) [4]uint32 {
return [4]uint32{i, i+1, i+2, i+3}
}
The return value is unnecessarily cleared but removing that is
probably a task for dead store analysis (I think it needs to
be able to match multiple Store ops to wide Zero ops).
In order to reliably remove stack variables that are rendered
unnecessary by these new rules I've added a new generic version
of the unread autos elimination pass.
These rules are triggered more than 5000 times when building and
testing the standard library.
Updates #15925 (fixes for arrays of up to 4 elements).
Updates #24386 (fixes for up to 4 kept elements).
Updates #24416 .
compilebench results:
name old time/op new time/op delta
Template 353ms ± 5% 359ms ± 3% ~ (p=0.143 n=10+10)
Unicode 219ms ± 1% 217ms ± 4% ~ (p=0.740 n=7+10)
GoTypes 1.26s ± 1% 1.26s ± 2% ~ (p=0.549 n=9+10)
Compiler 6.00s ± 1% 6.08s ± 1% +1.42% (p=0.000 n=9+8)
SSA 15.3s ± 2% 15.6s ± 1% +2.43% (p=0.000 n=10+10)
Flate 237ms ± 2% 240ms ± 2% +1.31% (p=0.015 n=10+10)
GoParser 285ms ± 1% 285ms ± 1% ~ (p=0.878 n=8+8)
Reflect 797ms ± 3% 807ms ± 2% ~ (p=0.065 n=9+10)
Tar 334ms ± 0% 335ms ± 4% ~ (p=0.460 n=8+10)
XML 419ms ± 0% 423ms ± 1% +0.91% (p=0.001 n=7+9)
StdCmd 46.0s ± 0% 46.4s ± 0% +0.85% (p=0.000 n=9+9)
name old user-time/op new user-time/op delta
Template 337ms ± 3% 346ms ± 5% ~ (p=0.053 n=9+10)
Unicode 205ms ±10% 205ms ± 8% ~ (p=1.000 n=10+10)
GoTypes 1.22s ± 2% 1.21s ± 3% ~ (p=0.436 n=10+10)
Compiler 5.85s ± 1% 5.93s ± 0% +1.46% (p=0.000 n=10+8)
SSA 14.9s ± 1% 15.3s ± 1% +2.62% (p=0.000 n=10+10)
Flate 229ms ± 4% 228ms ± 6% ~ (p=0.796 n=10+10)
GoParser 271ms ± 3% 275ms ± 4% ~ (p=0.165 n=10+10)
Reflect 779ms ± 5% 775ms ± 2% ~ (p=0.971 n=10+10)
Tar 317ms ± 4% 319ms ± 5% ~ (p=0.853 n=10+10)
XML 404ms ± 4% 409ms ± 5% ~ (p=0.436 n=10+10)
name old alloc/op new alloc/op delta
Template 34.9MB ± 0% 35.0MB ± 0% +0.26% (p=0.000 n=10+10)
Unicode 29.3MB ± 0% 29.3MB ± 0% +0.02% (p=0.000 n=10+10)
GoTypes 115MB ± 0% 115MB ± 0% +0.30% (p=0.000 n=10+10)
Compiler 519MB ± 0% 521MB ± 0% +0.30% (p=0.000 n=10+10)
SSA 1.55GB ± 0% 1.57GB ± 0% +1.34% (p=0.000 n=10+9)
Flate 24.1MB ± 0% 24.2MB ± 0% +0.10% (p=0.000 n=10+10)
GoParser 28.1MB ± 0% 28.1MB ± 0% +0.07% (p=0.000 n=10+10)
Reflect 78.7MB ± 0% 78.7MB ± 0% +0.03% (p=0.000 n=8+10)
Tar 34.4MB ± 0% 34.5MB ± 0% +0.12% (p=0.000 n=10+10)
XML 43.2MB ± 0% 43.2MB ± 0% +0.13% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Template 330k ± 0% 330k ± 0% -0.01% (p=0.017 n=10+10)
Unicode 337k ± 0% 337k ± 0% +0.01% (p=0.000 n=9+10)
GoTypes 1.15M ± 0% 1.15M ± 0% +0.03% (p=0.000 n=10+10)
Compiler 4.77M ± 0% 4.77M ± 0% +0.03% (p=0.000 n=9+10)
SSA 12.5M ± 0% 12.6M ± 0% +1.16% (p=0.000 n=10+10)
Flate 221k ± 0% 221k ± 0% +0.05% (p=0.000 n=9+10)
GoParser 275k ± 0% 275k ± 0% +0.01% (p=0.014 n=10+9)
Reflect 944k ± 0% 944k ± 0% -0.02% (p=0.000 n=10+10)
Tar 324k ± 0% 323k ± 0% -0.12% (p=0.000 n=10+10)
XML 384k ± 0% 384k ± 0% -0.01% (p=0.001 n=10+10)
name old object-bytes new object-bytes delta
Template 476kB ± 0% 476kB ± 0% -0.04% (p=0.000 n=10+10)
Unicode 218kB ± 0% 218kB ± 0% ~ (all equal)
GoTypes 1.58MB ± 0% 1.58MB ± 0% -0.04% (p=0.000 n=10+10)
Compiler 6.25MB ± 0% 6.24MB ± 0% -0.09% (p=0.000 n=10+10)
SSA 15.9MB ± 0% 16.1MB ± 0% +1.22% (p=0.000 n=10+10)
Flate 304kB ± 0% 304kB ± 0% -0.13% (p=0.000 n=10+10)
GoParser 370kB ± 0% 370kB ± 0% -0.00% (p=0.000 n=10+10)
Reflect 1.27MB ± 0% 1.27MB ± 0% -0.12% (p=0.000 n=10+10)
Tar 421kB ± 0% 419kB ± 0% -0.64% (p=0.000 n=10+10)
XML 518kB ± 0% 517kB ± 0% -0.12% (p=0.000 n=10+10)
name old export-bytes new export-bytes delta
Template 16.7kB ± 0% 16.7kB ± 0% ~ (all equal)
Unicode 6.52kB ± 0% 6.52kB ± 0% ~ (all equal)
GoTypes 29.2kB ± 0% 29.2kB ± 0% ~ (all equal)
Compiler 88.0kB ± 0% 88.0kB ± 0% ~ (all equal)
SSA 109kB ± 0% 109kB ± 0% ~ (all equal)
Flate 4.49kB ± 0% 4.49kB ± 0% ~ (all equal)
GoParser 8.10kB ± 0% 8.10kB ± 0% ~ (all equal)
Reflect 7.71kB ± 0% 7.71kB ± 0% ~ (all equal)
Tar 9.15kB ± 0% 9.15kB ± 0% ~ (all equal)
XML 12.3kB ± 0% 12.3kB ± 0% ~ (all equal)
name old text-bytes new text-bytes delta
HelloSize 676kB ± 0% 672kB ± 0% -0.59% (p=0.000 n=10+10)
CmdGoSize 7.26MB ± 0% 7.24MB ± 0% -0.18% (p=0.000 n=10+10)
name old data-bytes new data-bytes delta
HelloSize 10.2kB ± 0% 10.2kB ± 0% ~ (all equal)
CmdGoSize 248kB ± 0% 248kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal)
CmdGoSize 145kB ± 0% 145kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.46MB ± 0% 1.45MB ± 0% -0.31% (p=0.000 n=10+10)
CmdGoSize 14.7MB ± 0% 14.7MB ± 0% -0.17% (p=0.000 n=10+10)
Change-Id: Ic72b0c189dd542f391e1c9ab88a76e9148dc4285
Reviewed-on: https://go-review.googlesource.com/106495
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 10:31:21 +00:00
Ben Shi
098ca846c7
cmd/compile: emit more compact 386 instructions
...
ADDL/SUBL/ANDL/ORL/XORL can have a memory operand as destination,
and this CL optimize the compiler to emit such instructions on
386 for more compact binary.
Here is test report:
1. The total size of pkg/linux_386/ and pkg/tool/linux_386/ decreases
about 14KB.
(pkg/linux_386/cmd/compile/ and pkg/tool/linux_386/compile are excluded)
2. The go1 benchmark shows little change, excluding ±2% noise.
name old time/op new time/op delta
BinaryTree17-4 3.34s ± 2% 3.38s ± 2% +1.27% (p=0.000 n=40+39)
Fannkuch11-4 3.55s ± 1% 3.51s ± 1% -1.33% (p=0.000 n=40+40)
FmtFprintfEmpty-4 46.3ns ± 3% 46.9ns ± 4% +1.41% (p=0.002 n=40+40)
FmtFprintfString-4 80.8ns ± 3% 80.4ns ± 6% -0.54% (p=0.044 n=40+40)
FmtFprintfInt-4 93.0ns ± 3% 92.2ns ± 4% -0.88% (p=0.007 n=39+40)
FmtFprintfIntInt-4 144ns ± 5% 145ns ± 2% +0.78% (p=0.015 n=40+40)
FmtFprintfPrefixedInt-4 184ns ± 2% 182ns ± 2% -1.06% (p=0.004 n=40+40)
FmtFprintfFloat-4 415ns ± 4% 419ns ± 4% ~ (p=0.434 n=40+40)
FmtManyArgs-4 615ns ± 3% 619ns ± 3% ~ (p=0.100 n=40+40)
GobDecode-4 7.30ms ± 6% 7.36ms ± 6% ~ (p=0.074 n=40+40)
GobEncode-4 7.10ms ± 6% 7.21ms ± 5% ~ (p=0.082 n=40+39)
Gzip-4 364ms ± 3% 362ms ± 6% -0.71% (p=0.020 n=40+40)
Gunzip-4 42.4ms ± 3% 42.2ms ± 3% ~ (p=0.303 n=40+40)
HTTPClientServer-4 62.9µs ± 1% 62.9µs ± 1% ~ (p=0.768 n=38+39)
JSONEncode-4 21.4ms ± 4% 21.5ms ± 5% ~ (p=0.210 n=40+40)
JSONDecode-4 67.7ms ± 3% 67.9ms ± 4% ~ (p=0.713 n=40+40)
Mandelbrot200-4 5.18ms ± 3% 5.21ms ± 3% +0.59% (p=0.021 n=40+40)
GoParse-4 3.35ms ± 3% 3.34ms ± 2% ~ (p=0.996 n=40+40)
RegexpMatchEasy0_32-4 98.5ns ± 5% 96.3ns ± 4% -2.15% (p=0.001 n=40+40)
RegexpMatchEasy0_1K-4 851ns ± 4% 850ns ± 5% ~ (p=0.700 n=40+40)
RegexpMatchEasy1_32-4 105ns ± 7% 107ns ± 4% +1.50% (p=0.017 n=40+40)
RegexpMatchEasy1_1K-4 1.03µs ± 5% 1.03µs ± 4% ~ (p=0.992 n=40+40)
RegexpMatchMedium_32-4 130ns ± 6% 128ns ± 4% -1.66% (p=0.012 n=40+40)
RegexpMatchMedium_1K-4 44.0µs ± 5% 43.6µs ± 3% ~ (p=0.704 n=40+40)
RegexpMatchHard_32-4 2.29µs ± 3% 2.23µs ± 4% -2.38% (p=0.000 n=40+40)
RegexpMatchHard_1K-4 69.0µs ± 3% 68.1µs ± 3% -1.28% (p=0.003 n=40+40)
Revcomp-4 1.85s ± 2% 1.87s ± 3% +1.11% (p=0.000 n=40+40)
Template-4 69.8ms ± 3% 69.6ms ± 3% ~ (p=0.125 n=40+40)
TimeParse-4 442ns ± 5% 440ns ± 3% ~ (p=0.585 n=40+40)
TimeFormat-4 419ns ± 3% 420ns ± 3% ~ (p=0.824 n=40+40)
[Geo mean] 67.3µs 67.2µs -0.11%
name old speed new speed delta
GobDecode-4 105MB/s ± 6% 104MB/s ± 6% ~ (p=0.074 n=40+40)
GobEncode-4 108MB/s ± 7% 107MB/s ± 5% ~ (p=0.080 n=40+39)
Gzip-4 53.3MB/s ± 3% 53.7MB/s ± 6% +0.73% (p=0.021 n=40+40)
Gunzip-4 458MB/s ± 3% 460MB/s ± 3% ~ (p=0.301 n=40+40)
JSONEncode-4 90.8MB/s ± 4% 90.3MB/s ± 4% ~ (p=0.213 n=40+40)
JSONDecode-4 28.7MB/s ± 3% 28.6MB/s ± 4% ~ (p=0.679 n=40+40)
GoParse-4 17.3MB/s ± 3% 17.3MB/s ± 2% ~ (p=1.000 n=40+40)
RegexpMatchEasy0_32-4 325MB/s ± 5% 333MB/s ± 4% +2.44% (p=0.000 n=40+38)
RegexpMatchEasy0_1K-4 1.20GB/s ± 4% 1.21GB/s ± 5% ~ (p=0.684 n=40+40)
RegexpMatchEasy1_32-4 303MB/s ± 7% 298MB/s ± 4% -1.52% (p=0.022 n=40+40)
RegexpMatchEasy1_1K-4 995MB/s ± 5% 996MB/s ± 4% ~ (p=0.996 n=40+40)
RegexpMatchMedium_32-4 7.67MB/s ± 6% 7.80MB/s ± 4% +1.68% (p=0.011 n=40+40)
RegexpMatchMedium_1K-4 23.3MB/s ± 5% 23.5MB/s ± 3% ~ (p=0.697 n=40+40)
RegexpMatchHard_32-4 14.0MB/s ± 3% 14.3MB/s ± 4% +2.43% (p=0.000 n=40+40)
RegexpMatchHard_1K-4 14.8MB/s ± 3% 15.0MB/s ± 3% +1.30% (p=0.003 n=40+40)
Revcomp-4 137MB/s ± 2% 136MB/s ± 3% -1.10% (p=0.000 n=40+40)
Template-4 27.8MB/s ± 3% 27.9MB/s ± 3% ~ (p=0.128 n=40+40)
[Geo mean] 79.6MB/s 79.9MB/s +0.28%
Change-Id: I02a3efc125dc81e18fc8495eb2bf1bba59ab8733
Reviewed-on: https://go-review.googlesource.com/110157
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-05-08 06:44:54 +00:00
Martin Möhrmann
b9a59d9f2e
cmd/compile: optimize len([]rune(string))
...
Adds a new runtime function to count runes in a string.
Modifies the compiler to detect the pattern len([]rune(string))
and replaces it with the new rune counting runtime function.
RuneCount/lenruneslice/ASCII 27.8ns ± 2% 14.5ns ± 3% -47.70% (p=0.000 n=10+10)
RuneCount/lenruneslice/Japanese 126ns ± 2% 60ns ± 2% -52.03% (p=0.000 n=10+10)
RuneCount/lenruneslice/MixedLength 104ns ± 2% 50ns ± 1% -51.71% (p=0.000 n=10+9)
Fixes #24923
Change-Id: Ie9c7e7391a4e2cca675c5cdcc1e5ce7d523948b9
Reviewed-on: https://go-review.googlesource.com/108985
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-05-06 05:31:01 +00:00
Martin Möhrmann
a8a60ac2a7
cmd/compile: optimize append(x, make([]T, y)...) slice extension
...
Changes the compiler to recognize the slice extension pattern
append(x, make([]T, y)...)
and replace it with growslice and an optional memclr to avoid an allocation for make([]T, y).
Memclr is not called in case growslice already allocated a new cleared backing array
when T contains pointers.
amd64:
name old time/op new time/op delta
ExtendSlice/IntSlice 103ns ± 4% 57ns ± 4% -44.55% (p=0.000 n=18+18)
ExtendSlice/PointerSlice 155ns ± 3% 77ns ± 3% -49.93% (p=0.000 n=20+20)
ExtendSlice/NoGrow 50.2ns ± 3% 5.2ns ± 2% -89.67% (p=0.000 n=18+18)
name old alloc/op new alloc/op delta
ExtendSlice/IntSlice 64.0B ± 0% 32.0B ± 0% -50.00% (p=0.000 n=20+20)
ExtendSlice/PointerSlice 64.0B ± 0% 32.0B ± 0% -50.00% (p=0.000 n=20+20)
ExtendSlice/NoGrow 32.0B ± 0% 0.0B -100.00% (p=0.000 n=20+20)
name old allocs/op new allocs/op delta
ExtendSlice/IntSlice 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=20+20)
ExtendSlice/PointerSlice 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=20+20)
ExtendSlice/NoGrow 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20)
Fixes #21266
Change-Id: Idc3077665f63cbe89762b590c5967a864fd1c07f
Reviewed-on: https://go-review.googlesource.com/109517
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-05-06 04:28:23 +00:00
Martin Möhrmann
500d79c410
cmd/compile: refactor memclrrange for arrays and slices
...
Rename memclrrange to signify that it does not handle
all types of range clears.
Simplify checks to detect the range clear idiom for
arrays and slices.
Add tests to verify the optimization for the slice
range clear idiom is being applied by the compiler.
Change-Id: I5c3b7c9a479699ebdb4c407fde692f30f377860c
Reviewed-on: https://go-review.googlesource.com/110477
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-02 04:20:25 +00:00
Ben Shi
aaf73c6d1e
cmd/compile: optimize ARM64 with shifted register indexed load/store
...
ARM64 supports efficient instructions which combine shift, addition, load/store
together. Such as "MOVD (R0)(R1<<3), R2" and "MOVWU R6, (R4)(R1<<2)".
This CL optimizes the compiler to emit such efficient instuctions. And below
is some test data.
1. binary size before/after
binary size change
pkg/linux_arm64 +80.1KB
pkg/tool/linux_arm64 +121.9KB
go -4.3KB
gofmt -64KB
2. go1 benchmark
There is big improvement for the test case Fannkuch11, and slight
improvement for sme others, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 43.9s ± 2% 44.0s ± 2% ~ (p=0.820 n=30+30)
Fannkuch11-4 30.6s ± 2% 24.5s ± 3% -19.93% (p=0.000 n=25+30)
FmtFprintfEmpty-4 500ns ± 0% 499ns ± 0% -0.11% (p=0.000 n=23+25)
FmtFprintfString-4 1.03µs ± 0% 1.04µs ± 3% ~ (p=0.065 n=29+30)
FmtFprintfInt-4 1.15µs ± 3% 1.15µs ± 4% -0.56% (p=0.000 n=30+30)
FmtFprintfIntInt-4 1.80µs ± 5% 1.82µs ± 0% ~ (p=0.094 n=30+24)
FmtFprintfPrefixedInt-4 2.17µs ± 5% 2.20µs ± 0% ~ (p=0.100 n=30+23)
FmtFprintfFloat-4 3.08µs ± 3% 3.09µs ± 4% ~ (p=0.123 n=30+30)
FmtManyArgs-4 7.41µs ± 4% 7.17µs ± 1% -3.26% (p=0.000 n=30+23)
GobDecode-4 93.7ms ± 0% 94.7ms ± 4% ~ (p=0.685 n=24+30)
GobEncode-4 78.7ms ± 7% 77.1ms ± 0% ~ (p=0.729 n=30+23)
Gzip-4 4.01s ± 0% 3.97s ± 5% -1.11% (p=0.037 n=24+30)
Gunzip-4 389ms ± 4% 384ms ± 0% ~ (p=0.155 n=30+23)
HTTPClientServer-4 536µs ± 1% 537µs ± 1% ~ (p=0.236 n=30+30)
JSONEncode-4 179ms ± 1% 182ms ± 6% ~ (p=0.763 n=24+30)
JSONDecode-4 843ms ± 0% 839ms ± 6% -0.42% (p=0.003 n=25+30)
Mandelbrot200-4 46.5ms ± 0% 46.5ms ± 0% +0.02% (p=0.000 n=26+26)
GoParse-4 44.3ms ± 6% 43.3ms ± 0% ~ (p=0.067 n=30+27)
RegexpMatchEasy0_32-4 1.07µs ± 7% 1.07µs ± 4% ~ (p=0.835 n=30+30)
RegexpMatchEasy0_1K-4 5.51µs ± 0% 5.49µs ± 0% -0.35% (p=0.000 n=23+26)
RegexpMatchEasy1_32-4 1.01µs ± 0% 1.02µs ± 4% +0.96% (p=0.014 n=24+30)
RegexpMatchEasy1_1K-4 7.43µs ± 0% 7.18µs ± 0% -3.41% (p=0.000 n=23+24)
RegexpMatchMedium_32-4 1.78µs ± 0% 1.81µs ± 4% +1.47% (p=0.012 n=23+30)
RegexpMatchMedium_1K-4 547µs ± 1% 542µs ± 3% -0.90% (p=0.003 n=24+30)
RegexpMatchHard_32-4 30.4µs ± 0% 29.7µs ± 0% -2.15% (p=0.000 n=19+23)
RegexpMatchHard_1K-4 913µs ± 0% 915µs ± 6% +0.25% (p=0.012 n=24+30)
Revcomp-4 6.32s ± 1% 6.42s ± 4% ~ (p=0.342 n=25+30)
Template-4 868ms ± 6% 878ms ± 6% +1.15% (p=0.000 n=30+30)
TimeParse-4 4.57µs ± 4% 4.59µs ± 3% +0.65% (p=0.010 n=29+30)
TimeFormat-4 4.51µs ± 0% 4.50µs ± 0% -0.27% (p=0.000 n=27+24)
[Geo mean] 695µs 689µs -0.92%
name old speed new speed delta
GobDecode-4 8.19MB/s ± 0% 8.12MB/s ± 4% ~ (p=0.680 n=24+30)
GobEncode-4 9.76MB/s ± 7% 9.96MB/s ± 0% ~ (p=0.616 n=30+23)
Gzip-4 4.84MB/s ± 0% 4.89MB/s ± 4% +1.16% (p=0.030 n=24+30)
Gunzip-4 49.9MB/s ± 4% 50.6MB/s ± 0% ~ (p=0.162 n=30+23)
JSONEncode-4 10.9MB/s ± 1% 10.7MB/s ± 6% ~ (p=0.575 n=24+30)
JSONDecode-4 2.30MB/s ± 0% 2.32MB/s ± 5% +0.72% (p=0.003 n=22+30)
GoParse-4 1.31MB/s ± 6% 1.34MB/s ± 0% +2.26% (p=0.002 n=30+27)
RegexpMatchEasy0_32-4 30.0MB/s ± 6% 30.0MB/s ± 4% ~ (p=1.000 n=30+30)
RegexpMatchEasy0_1K-4 186MB/s ± 0% 187MB/s ± 0% +0.35% (p=0.000 n=23+26)
RegexpMatchEasy1_32-4 31.8MB/s ± 0% 31.5MB/s ± 4% -0.92% (p=0.012 n=25+30)
RegexpMatchEasy1_1K-4 138MB/s ± 0% 143MB/s ± 0% +3.53% (p=0.000 n=23+24)
RegexpMatchMedium_32-4 560kB/s ± 0% 553kB/s ± 4% -1.19% (p=0.005 n=23+30)
RegexpMatchMedium_1K-4 1.87MB/s ± 0% 1.89MB/s ± 3% +1.04% (p=0.002 n=24+30)
RegexpMatchHard_32-4 1.05MB/s ± 0% 1.08MB/s ± 0% +2.40% (p=0.000 n=19+23)
RegexpMatchHard_1K-4 1.12MB/s ± 0% 1.12MB/s ± 5% +0.12% (p=0.006 n=25+30)
Revcomp-4 40.2MB/s ± 1% 39.6MB/s ± 4% ~ (p=0.242 n=25+30)
Template-4 2.24MB/s ± 6% 2.21MB/s ± 6% -1.15% (p=0.000 n=30+30)
[Geo mean] 7.87MB/s 7.91MB/s +0.44%
Change-Id: If374cb7abf83537aa0a176f73c0f736f7800db03
Reviewed-on: https://go-review.googlesource.com/108735
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-27 20:02:05 +00:00
Milan Knezevic
2959128dc5
cmd/compile: add softfloat support to mips64{,le}
...
mips64 softfloat support is based on mips implementation and introduces
new enviroment variable GOMIPS64.
GOMIPS64 is a GOARCH=mips64{,le} specific option, for a choice between
hard-float and soft-float. Valid values are 'hardfloat' (default) and
'softfloat'. It is passed to the assembler as
'GOMIPS64_{hardfloat,softfloat}'.
Change-Id: I7f73078627f7cb37c588a38fb5c997fe09c56134
Reviewed-on: https://go-review.googlesource.com/108475
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-27 14:50:17 +00:00
Josh Bleecher Snyder
d9a50a6531
cmd/compile: use prove pass to detect Ctz of non-zero values
...
On amd64, Ctz must include special handling of zeros.
But the prove pass has enough information to detect whether the input
is non-zero, allowing a more efficient lowering.
Introduce new CtzNonZero ops to capture and use this information.
Benchmark code:
func BenchmarkVisitBits(b *testing.B) {
b.Run("8", func(b *testing.B) {
for i := 0; i < b.N; i++ {
x := uint8(0xff)
for x != 0 {
sink = bits.TrailingZeros8(x)
x &= x - 1
}
}
})
// and similarly so for 16, 32, 64
}
name old time/op new time/op delta
VisitBits/8-8 7.27ns ± 4% 5.58ns ± 4% -23.35% (p=0.000 n=28+26)
VisitBits/16-8 14.7ns ± 7% 10.5ns ± 4% -28.43% (p=0.000 n=30+28)
VisitBits/32-8 27.6ns ± 8% 19.3ns ± 3% -30.14% (p=0.000 n=30+26)
VisitBits/64-8 44.0ns ±11% 38.0ns ± 5% -13.48% (p=0.000 n=30+30)
Fixes #25077
Change-Id: Ie6e5bd86baf39ee8a4ca7cadcf56d934e047f957
Reviewed-on: https://go-review.googlesource.com/109358
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-26 18:22:28 +00:00
Carlos Eduardo Seo
ebb67d993a
cmd/compile, cmd/internal/obj/ppc64: make math.Round an intrinsic on ppc64x
...
This change implements math.Round as an intrinsic on ppc64x so it can be
done using a single instruction.
benchmark old ns/op new ns/op delta
BenchmarkRound-16 2.60 0.69 -73.46%
Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8
Reviewed-on: https://go-review.googlesource.com/109395
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-04-26 14:12:09 +00:00
Josh Bleecher Snyder
c5f0104daf
cmd/compile: use intrinsic for LeadingZeros8 on amd64
...
The previous change sped up the pure computation form of LeadingZeros8.
This places it somewhat close to the table lookup form.
Depending on something that varies from toolchain to toolchain
(alignment, perhaps?), the slowdown from ditching the table lookup
is either 20% or 5%.
This benchmark is the best case scenario for the table lookup:
It is in the L1 cache already.
I think we're close enough that we can switch to the computational version,
and trust that the memory effects and binary size savings will be worth it.
Code:
func f8(x uint8) { z = bits.LeadingZeros8(x) }
Before:
"".f8 STEXT nosplit size=34 args=0x8 locals=0x0
0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8
0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX
0x0005 00005 (x.go:7) MOVBLZX AL, AX
0x0008 00008 (x.go:7) LEAQ math/bits.len8tab(SB), CX
0x000f 00015 (x.go:7) MOVBLZX (CX)(AX*1), AX
0x0013 00019 (x.go:7) ADDQ $-8, AX
0x0017 00023 (x.go:7) NEGQ AX
0x001a 00026 (x.go:7) MOVQ AX, "".z(SB)
0x0021 00033 (x.go:7) RET
After:
"".f8 STEXT nosplit size=30 args=0x8 locals=0x0
0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8
0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX
0x0005 00005 (x.go:7) MOVBLZX AL, AX
0x0008 00008 (x.go:7) LEAL 1(AX)(AX*1), AX
0x000c 00012 (x.go:7) BSRL AX, AX
0x000f 00015 (x.go:7) ADDQ $-8, AX
0x0013 00019 (x.go:7) NEGQ AX
0x0016 00022 (x.go:7) MOVQ AX, "".z(SB)
0x001d 00029 (x.go:7) RET
Change-Id: Icc7db50a7820fb9a3da8a816d6b6940d7f8e193e
Reviewed-on: https://go-review.googlesource.com/108942
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:34:15 +00:00
Josh Bleecher Snyder
1d321ada73
cmd/compile: optimize LeadingZeros(16|32) on amd64
...
Introduce Len8 and Len16 ops and provide optimized lowerings for them.
amd64 only for this CL, although it wouldn't surprise me
if other architectures also admit of optimized lowerings.
Also use and optimize the Len32 lowering, along the same lines.
Leave Len8 unused for the moment; a subsequent CL will enable it.
For 16 and 32 bits, this leads to a speed-up.
name old time/op new time/op delta
LeadingZeros16-8 1.42ns ± 5% 1.23ns ± 5% -13.42% (p=0.000 n=20+20)
LeadingZeros32-8 1.25ns ± 5% 1.03ns ± 5% -17.63% (p=0.000 n=20+16)
Code:
func f16(x uint16) { z = bits.LeadingZeros16(x) }
func f32(x uint32) { z = bits.LeadingZeros32(x) }
Before:
"".f16 STEXT nosplit size=38 args=0x8 locals=0x0
0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8
0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX
0x0005 00005 (x.go:8) MOVWLZX AX, AX
0x0008 00008 (x.go:8) BSRQ AX, AX
0x000c 00012 (x.go:8) MOVQ $-1, CX
0x0013 00019 (x.go:8) CMOVQEQ CX, AX
0x0017 00023 (x.go:8) ADDQ $-15, AX
0x001b 00027 (x.go:8) NEGQ AX
0x001e 00030 (x.go:8) MOVQ AX, "".z(SB)
0x0025 00037 (x.go:8) RET
"".f32 STEXT nosplit size=34 args=0x8 locals=0x0
0x0000 00000 (x.go:9) TEXT "".f32(SB), NOSPLIT, $0-8
0x0000 00000 (x.go:9) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
0x0000 00000 (x.go:9) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (x.go:9) MOVL "".x+8(SP), AX
0x0004 00004 (x.go:9) BSRQ AX, AX
0x0008 00008 (x.go:9) MOVQ $-1, CX
0x000f 00015 (x.go:9) CMOVQEQ CX, AX
0x0013 00019 (x.go:9) ADDQ $-31, AX
0x0017 00023 (x.go:9) NEGQ AX
0x001a 00026 (x.go:9) MOVQ AX, "".z(SB)
0x0021 00033 (x.go:9) RET
After:
"".f16 STEXT nosplit size=30 args=0x8 locals=0x0
0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8
0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX
0x0005 00005 (x.go:8) MOVWLZX AX, AX
0x0008 00008 (x.go:8) LEAL 1(AX)(AX*1), AX
0x000c 00012 (x.go:8) BSRL AX, AX
0x000f 00015 (x.go:8) ADDQ $-16, AX
0x0013 00019 (x.go:8) NEGQ AX
0x0016 00022 (x.go:8) MOVQ AX, "".z(SB)
0x001d 00029 (x.go:8) RET
"".f32 STEXT nosplit size=28 args=0x8 locals=0x0
0x0000 00000 (x.go:9) TEXT "".f32(SB), NOSPLIT, $0-8
0x0000 00000 (x.go:9) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
0x0000 00000 (x.go:9) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (x.go:9) MOVL "".x+8(SP), AX
0x0004 00004 (x.go:9) LEAQ 1(AX)(AX*1), AX
0x0009 00009 (x.go:9) BSRQ AX, AX
0x000d 00013 (x.go:9) ADDQ $-32, AX
0x0011 00017 (x.go:9) NEGQ AX
0x0014 00020 (x.go:9) MOVQ AX, "".z(SB)
0x001b 00027 (x.go:9) RET
Change-Id: I6c93c173752a7bfdeab8be30777ae05a736e1f4b
Reviewed-on: https://go-review.googlesource.com/108941
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:34:04 +00:00
Josh Bleecher Snyder
54dbab5221
cmd/compile: optimize TrailingZeros(8|16) on amd64
...
Introduce Ctz8 and Ctz16 ops and provide optimized lowerings for them.
amd64 only for this CL, although it wouldn't surprise me
if other architectures also admit of optimized lowerings.
name old time/op new time/op delta
TrailingZeros8-8 1.33ns ± 6% 0.84ns ± 3% -36.90% (p=0.000 n=20+20)
TrailingZeros16-8 1.26ns ± 5% 0.84ns ± 5% -33.50% (p=0.000 n=20+18)
Code:
func f8(x uint8) { z = bits.TrailingZeros8(x) }
func f16(x uint16) { z = bits.TrailingZeros16(x) }
Before:
"".f8 STEXT nosplit size=34 args=0x8 locals=0x0
0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8
0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX
0x0005 00005 (x.go:7) MOVBLZX AL, AX
0x0008 00008 (x.go:7) BTSQ $8, AX
0x000d 00013 (x.go:7) BSFQ AX, AX
0x0011 00017 (x.go:7) MOVL $64, CX
0x0016 00022 (x.go:7) CMOVQEQ CX, AX
0x001a 00026 (x.go:7) MOVQ AX, "".z(SB)
0x0021 00033 (x.go:7) RET
"".f16 STEXT nosplit size=34 args=0x8 locals=0x0
0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8
0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX
0x0005 00005 (x.go:8) MOVWLZX AX, AX
0x0008 00008 (x.go:8) BTSQ $16, AX
0x000d 00013 (x.go:8) BSFQ AX, AX
0x0011 00017 (x.go:8) MOVL $64, CX
0x0016 00022 (x.go:8) CMOVQEQ CX, AX
0x001a 00026 (x.go:8) MOVQ AX, "".z(SB)
0x0021 00033 (x.go:8) RET
After:
"".f8 STEXT nosplit size=20 args=0x8 locals=0x0
0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8
0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX
0x0005 00005 (x.go:7) BTSL $8, AX
0x0009 00009 (x.go:7) BSFL AX, AX
0x000c 00012 (x.go:7) MOVQ AX, "".z(SB)
0x0013 00019 (x.go:7) RET
"".f16 STEXT nosplit size=20 args=0x8 locals=0x0
0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8
0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX
0x0005 00005 (x.go:8) BTSL $16, AX
0x0009 00009 (x.go:8) BSFL AX, AX
0x000c 00012 (x.go:8) MOVQ AX, "".z(SB)
0x0013 00019 (x.go:8) RET
Change-Id: I0551e357348de2b724737d569afd6ac9f5c3aa11
Reviewed-on: https://go-review.googlesource.com/108940
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:33:52 +00:00
Josh Bleecher Snyder
d292f77e95
cmd/compile: rewrite 2*x+c into LEAx1 on amd64
...
Rewrite x<<1+c into x+x+c, which can be expressed as a single LEAQ/LEAL.
Bit of a special case, but the single-instruction
LEA is both shorter and faster than SHL then ADD.
Triggers 293 times during make.bash.
Change-Id: I3f09c8e9a8f3859d1eeed336f095fc3ada79c2c1
Reviewed-on: https://go-review.googlesource.com/108938
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-23 22:40:10 +00:00
Ben Shi
34f5f8a580
cmd/compile: optimize ARM64 with register indexed load/store
...
ARM64 supports load/store instructions with a memory operand that
the address is calculated by base register + index register.
In this CL,
1. Some rules are added to the compile's ARM64 backend to emit
such efficient instructions.
2. A wrong rule of load combination is fixed.
The go1 benchmark does show improvement.
name old time/op new time/op delta
BinaryTree17-4 44.5s ± 2% 44.1s ± 1% -0.81% (p=0.000 n=28+29)
Fannkuch11-4 32.7s ± 3% 30.5s ± 0% -6.79% (p=0.000 n=30+26)
FmtFprintfEmpty-4 499ns ± 0% 506ns ± 5% +1.39% (p=0.003 n=25+30)
FmtFprintfString-4 1.07µs ± 0% 1.04µs ± 4% -3.17% (p=0.000 n=23+30)
FmtFprintfInt-4 1.15µs ± 4% 1.13µs ± 0% -1.55% (p=0.000 n=30+23)
FmtFprintfIntInt-4 1.77µs ± 4% 1.74µs ± 0% -1.71% (p=0.000 n=30+24)
FmtFprintfPrefixedInt-4 2.37µs ± 5% 2.12µs ± 0% -10.56% (p=0.000 n=30+23)
FmtFprintfFloat-4 3.03µs ± 1% 3.03µs ± 4% -0.13% (p=0.003 n=25+30)
FmtManyArgs-4 7.38µs ± 1% 7.43µs ± 4% +0.59% (p=0.003 n=25+30)
GobDecode-4 101ms ± 6% 95ms ± 5% -5.55% (p=0.000 n=30+30)
GobEncode-4 78.0ms ± 4% 78.8ms ± 6% +1.05% (p=0.000 n=30+30)
Gzip-4 4.25s ± 0% 4.27s ± 4% +0.45% (p=0.003 n=24+30)
Gunzip-4 428ms ± 1% 420ms ± 0% -1.88% (p=0.000 n=23+23)
HTTPClientServer-4 549µs ± 1% 541µs ± 1% -1.56% (p=0.000 n=29+29)
JSONEncode-4 194ms ± 0% 188ms ± 4% ~ (p=0.417 n=23+30)
JSONDecode-4 890ms ± 5% 831ms ± 0% -6.55% (p=0.000 n=30+23)
Mandelbrot200-4 47.3ms ± 2% 46.5ms ± 0% ~ (p=0.980 n=30+26)
GoParse-4 43.1ms ± 6% 43.8ms ± 6% +1.65% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 1.06µs ± 0% 1.07µs ± 3% ~ (p=0.092 n=23+30)
RegexpMatchEasy0_1K-4 5.53µs ± 0% 5.51µs ± 0% -0.24% (p=0.000 n=25+25)
RegexpMatchEasy1_32-4 1.02µs ± 3% 1.01µs ± 0% -1.27% (p=0.000 n=30+24)
RegexpMatchEasy1_1K-4 7.26µs ± 0% 7.33µs ± 0% +0.95% (p=0.000 n=23+26)
RegexpMatchMedium_32-4 1.84µs ± 7% 1.79µs ± 1% ~ (p=0.333 n=30+23)
RegexpMatchMedium_1K-4 553µs ± 0% 547µs ± 0% -1.14% (p=0.000 n=24+22)
RegexpMatchHard_32-4 30.8µs ± 1% 30.3µs ± 0% -1.40% (p=0.000 n=24+24)
RegexpMatchHard_1K-4 928µs ± 0% 929µs ± 5% +0.12% (p=0.013 n=23+30)
Revcomp-4 8.13s ± 4% 6.32s ± 1% -22.23% (p=0.000 n=30+23)
Template-4 899ms ± 6% 854ms ± 1% -5.01% (p=0.000 n=30+24)
TimeParse-4 4.66µs ± 4% 4.59µs ± 1% -1.57% (p=0.000 n=30+23)
TimeFormat-4 4.58µs ± 0% 4.61µs ± 0% +0.57% (p=0.000 n=26+24)
[Geo mean] 717µs 698µs -2.55%
name old speed new speed delta
GobDecode-4 7.63MB/s ± 6% 8.08MB/s ± 5% +5.88% (p=0.000 n=30+30)
GobEncode-4 9.85MB/s ± 4% 9.75MB/s ± 6% -1.04% (p=0.000 n=30+30)
Gzip-4 4.56MB/s ± 0% 4.55MB/s ± 4% -0.36% (p=0.003 n=24+30)
Gunzip-4 45.3MB/s ± 1% 46.2MB/s ± 0% +1.92% (p=0.000 n=23+23)
JSONEncode-4 10.0MB/s ± 0% 10.4MB/s ± 4% ~ (p=0.403 n=23+30)
JSONDecode-4 2.18MB/s ± 5% 2.33MB/s ± 0% +6.91% (p=0.000 n=30+23)
GoParse-4 1.34MB/s ± 5% 1.32MB/s ± 5% -1.66% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 30.2MB/s ± 0% 29.8MB/s ± 3% ~ (p=0.099 n=23+30)
RegexpMatchEasy0_1K-4 185MB/s ± 0% 186MB/s ± 0% +0.24% (p=0.000 n=25+25)
RegexpMatchEasy1_32-4 31.4MB/s ± 3% 31.8MB/s ± 0% +1.24% (p=0.000 n=30+24)
RegexpMatchEasy1_1K-4 141MB/s ± 0% 140MB/s ± 0% -0.94% (p=0.000 n=23+26)
RegexpMatchMedium_32-4 541kB/s ± 6% 560kB/s ± 0% +3.45% (p=0.000 n=30+23)
RegexpMatchMedium_1K-4 1.85MB/s ± 0% 1.87MB/s ± 0% +1.08% (p=0.000 n=24+23)
RegexpMatchHard_32-4 1.04MB/s ± 1% 1.06MB/s ± 1% +1.48% (p=0.000 n=24+24)
RegexpMatchHard_1K-4 1.10MB/s ± 0% 1.10MB/s ± 5% +0.15% (p=0.004 n=23+30)
Revcomp-4 31.3MB/s ± 4% 40.2MB/s ± 1% +28.52% (p=0.000 n=30+23)
Template-4 2.16MB/s ± 6% 2.27MB/s ± 1% +5.18% (p=0.000 n=30+24)
[Geo mean] 7.57MB/s 7.79MB/s +2.98%
fixes #24907
Change-Id: I94afd0e3f53d62a1cf5e452f3dd6daf61be21785
Reviewed-on: https://go-review.googlesource.com/107376
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-19 15:08:10 +00:00
Michael Munday
58cdecb9c8
cmd/compile: generate constants for NeqPtr, EqPtr and IsNonNil ops
...
If both inputs are constant offsets from the same pointer then we
can evaluate NeqPtr and EqPtr at compile time. Triggers a few times
during all.bash. Removes a conditional branch in the following
code:
copy(x[1:], x[:])
This branch was recently added as an optimization in CL 94596. We
now skip the memmove if the pointers are equal. However, in the
above code we know at compile time that they are never equal.
Also, when the offset is variable, check if the offset is zero
rather than if the pointers are equal. For example:
copy(x[a:], x[:])
This would now skip the copy if a == 0, rather than if x + a == x.
Finally I've also added a rule to make IsNonNil true for pointers
to values on the stack. The nil check elimination pass will catch
these anyway, but eliminating them here might eliminate branches
earlier.
Change-Id: If72f436fef0a96ad0f4e296d3a1f8b6c3e712085
Reviewed-on: https://go-review.googlesource.com/106635
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-16 20:43:57 +00:00
Ben Shi
cd65bbc01b
cmd/compile/internal/ssa: optimize 386's subtraction
...
The SUBL instruction can take a memory operand, and this CL
implements this optimization.
The go1 benchmark shows a little improvement.
name old time/op new time/op delta
BinaryTree17-4 3.27s ± 2% 3.29s ± 3% ~ (p=0.322 n=37+40)
Fannkuch11-4 3.49s ± 0% 3.53s ± 1% +1.21% (p=0.000 n=31+40)
FmtFprintfEmpty-4 46.2ns ± 3% 46.3ns ± 2% ~ (p=0.351 n=40+28)
FmtFprintfString-4 82.0ns ± 3% 81.5ns ± 2% -0.69% (p=0.002 n=40+30)
FmtFprintfInt-4 94.6ns ± 3% 94.6ns ± 6% ~ (p=0.913 n=39+37)
FmtFprintfIntInt-4 147ns ± 3% 150ns ± 2% +1.72% (p=0.000 n=40+25)
FmtFprintfPrefixedInt-4 186ns ± 3% 186ns ± 0% -0.33% (p=0.006 n=40+25)
FmtFprintfFloat-4 388ns ± 4% 388ns ± 4% ~ (p=0.162 n=40+40)
FmtManyArgs-4 612ns ± 3% 616ns ± 4% ~ (p=0.223 n=40+40)
GobDecode-4 7.35ms ± 5% 7.42ms ± 5% ~ (p=0.095 n=40+40)
GobEncode-4 7.21ms ± 8% 7.23ms ± 4% ~ (p=0.294 n=40+40)
Gzip-4 360ms ± 4% 359ms ± 4% ~ (p=0.097 n=40+40)
Gunzip-4 46.1ms ± 3% 45.6ms ± 3% -1.20% (p=0.000 n=40+40)
HTTPClientServer-4 64.0µs ± 2% 64.1µs ± 2% ~ (p=0.648 n=39+40)
JSONEncode-4 21.9ms ± 4% 22.1ms ± 5% ~ (p=0.086 n=40+40)
JSONDecode-4 67.9ms ± 4% 66.7ms ± 4% -1.63% (p=0.000 n=40+40)
Mandelbrot200-4 5.19ms ± 3% 5.17ms ± 3% ~ (p=0.881 n=40+40)
GoParse-4 3.34ms ± 3% 3.28ms ± 2% -1.78% (p=0.000 n=40+40)
RegexpMatchEasy0_32-4 101ns ± 5% 99ns ± 3% -2.40% (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4 851ns ± 1% 848ns ± 3% -0.36% (p=0.004 n=33+40)
RegexpMatchEasy1_32-4 109ns ± 5% 105ns ± 3% -3.53% (p=0.000 n=39+40)
RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 3% ~ (p=0.638 n=40+38)
RegexpMatchMedium_32-4 131ns ± 5% 127ns ± 4% -3.36% (p=0.000 n=38+40)
RegexpMatchMedium_1K-4 43.4µs ± 4% 43.2µs ± 3% -0.46% (p=0.008 n=40+40)
RegexpMatchHard_32-4 2.21µs ± 4% 2.23µs ± 1% +0.77% (p=0.014 n=40+28)
RegexpMatchHard_1K-4 67.6µs ± 4% 67.7µs ± 3% +0.11% (p=0.016 n=40+40)
Revcomp-4 1.86s ± 3% 1.77s ± 2% -4.81% (p=0.000 n=40+40)
Template-4 71.7ms ± 3% 71.6ms ± 4% ~ (p=0.200 n=40+40)
TimeParse-4 436ns ± 4% 433ns ± 3% ~ (p=0.358 n=40+40)
TimeFormat-4 413ns ± 4% 412ns ± 3% ~ (p=0.415 n=40+40)
[Geo mean] 63.9µs 63.6µs -0.49%
name old speed new speed delta
GobDecode-4 105MB/s ± 5% 104MB/s ± 5% ~ (p=0.096 n=40+40)
GobEncode-4 106MB/s ± 7% 106MB/s ± 3% ~ (p=0.385 n=39+40)
Gzip-4 54.0MB/s ± 4% 54.0MB/s ± 4% ~ (p=0.100 n=40+40)
Gunzip-4 421MB/s ± 3% 426MB/s ± 3% +1.21% (p=0.000 n=40+40)
JSONEncode-4 88.5MB/s ± 5% 88.0MB/s ± 5% ~ (p=0.083 n=40+40)
JSONDecode-4 28.6MB/s ± 4% 29.1MB/s ± 4% +1.65% (p=0.000 n=40+40)
GoParse-4 17.3MB/s ± 3% 17.7MB/s ± 2% +1.82% (p=0.000 n=40+40)
RegexpMatchEasy0_32-4 316MB/s ± 5% 323MB/s ± 4% +2.44% (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4 1.20GB/s ± 1% 1.21GB/s ± 3% +0.40% (p=0.004 n=33+40)
RegexpMatchEasy1_32-4 291MB/s ± 7% 302MB/s ± 4% +3.82% (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4 993MB/s ± 4% 990MB/s ± 3% ~ (p=0.623 n=40+38)
RegexpMatchMedium_32-4 7.61MB/s ± 5% 7.87MB/s ± 4% +3.36% (p=0.000 n=38+40)
RegexpMatchMedium_1K-4 23.6MB/s ± 4% 23.7MB/s ± 4% +0.46% (p=0.007 n=40+40)
RegexpMatchHard_32-4 14.5MB/s ± 4% 14.3MB/s ± 1% -0.79% (p=0.017 n=40+28)
RegexpMatchHard_1K-4 15.1MB/s ± 4% 15.1MB/s ± 3% -0.11% (p=0.015 n=40+40)
Revcomp-4 137MB/s ± 3% 144MB/s ± 3% +5.06% (p=0.000 n=40+40)
Template-4 27.1MB/s ± 3% 27.1MB/s ± 4% ~ (p=0.211 n=40+40)
[Geo mean] 78.9MB/s 79.7MB/s +1.01%
Change-Id: I638fa4fef85833e8605919d693f9570cc3cf7334
Reviewed-on: https://go-review.googlesource.com/107275
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-16 04:41:20 +00:00
Giovanni Bajo
e7b1d0a9cf
test: add missing copyright header
...
Change-Id: Ia64535492515f725fe3c4b59ea300363a0c4ce10
Reviewed-on: https://go-review.googlesource.com/107136
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-15 21:17:54 +00:00
Giovanni Bajo
284ba47b49
test: run codegen tests on all supported architecture variants
...
This CL makes the codegen testsuite automatically test all
architecture variants for architecture specified in tests. For
instance, if a test file specifies a "arm" test, it will be
automatically run on all GOARM variants (5,6,7), to increase
the coverage.
The CL also introduces a syntax to specify only a specific
variant (eg: "arm/7") in case the test makes sense only there.
The same syntax also allows to specify the operating system
in case it matters (eg: "plan9/386/sse2").
Fixes #24658
Change-Id: I2eba8b918f51bb6a77a8431a309f8b71af07ea22
Reviewed-on: https://go-review.googlesource.com/107315
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-15 20:02:43 +00:00
Giovanni Bajo
01aa1d7dbe
test: migrate plan9 tests to codegen
...
And remove it from asmtest. Next CL will remove the whole
asmtest infrastructure.
Change-Id: I5851bf7c617456d62a3c6cffacf70252df7b056b
Reviewed-on: https://go-review.googlesource.com/107335
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-15 20:02:30 +00:00
Alberto Donizetti
467eca6076
test/codegen: port last stack and memcombining tests
...
And delete them from asm_test.
Also delete an arm64 cmov test has been already ported to the new test
harness.
Change-Id: I4458721e1f512bc9ecbbe1c22a2c9c7109ad68fe
Reviewed-on: https://go-review.googlesource.com/106335
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-11 16:08:04 +00:00
Alberto Donizetti
188e2bf897
test/codegen: port arm64 BIC/EON/ORN and masking tests
...
And delete them from asm_test.
Change-Id: I24f421b87e8cb4770c887a6dfd58eacd0088947d
Reviewed-on: https://go-review.googlesource.com/106056
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-10 10:57:50 +00:00
Alberto Donizetti
d5ff631e6b
test/codegen: port last remaining misc bit/arithmetic tests
...
And delete them from asm_test.
Change-Id: I9a75efe9858ef9d7ac86065f860c2ae3f25b0941
Reviewed-on: https://go-review.googlesource.com/105597
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-04-10 07:58:35 +00:00
Michael Munday
b65122f99a
cmd/compile: optimize comparisons using load merging where available
...
Multi-byte comparison operations were used on amd64, arm64, i386
and s390x for comparisons with constant arrays, but only amd64 and
i386 for comparisons with string constants. This CL combines the
check for platform capability, since they have the same requirements,
and also enables both on ppc64le which also supports load merging.
Note that these optimizations currently use little endian byte order
which results in byte reversal instructions on s390x. This should
be fixed at some point.
Change-Id: Ie612d13359b50c77f4d7c6e73fea4a59fa11f322
Reviewed-on: https://go-review.googlesource.com/102558
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-09 21:16:47 +00:00
Alberto Donizetti
54c3f56ee0
test/codegen: port various mem-combining tests
...
And delete them from asm_test.
Change-Id: I0e33d58274951ab5acb67b0117b60ef617ea887a
Reviewed-on: https://go-review.googlesource.com/105735
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-04-09 12:00:06 +00:00
Alberto Donizetti
3e31eb6b84
test/codegen: port arm64 slice zeroing tests
...
Finish porting arm64 slice zeroing codegen tests; delete them from
asm_test.
Change-Id: Id2532df8ba1c340fa662a6b5238daa3de30548be
Reviewed-on: https://go-review.googlesource.com/105136
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-07 09:55:51 +00:00
Alberto Donizetti
f2abca90a2
test/codegen: port arm64 byte slice zeroing tests
...
And delete them from asm_test.
Change-Id: Id533130470da9176a401cb94972f626f43a62148
Reviewed-on: https://go-review.googlesource.com/103656
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-04 13:18:15 +00:00
Alberto Donizetti
3b0b8bcd68
test/codegen: port stack-related tests to codegen
...
And delete them from asm_test.
Change-Id: Idfe1249052d82d15b9c30b292c78656a0bf7b48d
Reviewed-on: https://go-review.googlesource.com/103315
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-30 08:08:06 +00:00
Alberto Donizetti
56eaf574a1
test/codegen: match 387 ops too for GOARCH=386
...
Change-Id: I99407e27e340689009af798989b33cef7cb92070
Reviewed-on: https://go-review.googlesource.com/103376
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-29 20:05:40 +00:00
Alberto Donizetti
a27cd4fd31
test/codegen: port tbz/tbnz arm64 tests
...
And delete them from asm_test.
Change-Id: I34fcf85ae8ce09cd146fe4ce6a0ae7616bd97e2d
Reviewed-on: https://go-review.googlesource.com/102296
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-24 09:35:53 +00:00
Giovanni Bajo
79112707bb
cmd/compile: add patterns for bit set/clear/complement on amd64
...
This patch completes implementation of BT(Q|L), and adds support
for BT(S|R|C)(Q|L).
Example of code changes from time.(*Time).addSec:
if t.wall&hasMonotonic != 0 {
0x1073465 488b08 MOVQ 0(AX), CX
0x1073468 4889ca MOVQ CX, DX
0x107346b 48c1e93f SHRQ $0x3f, CX
0x107346f 48c1e13f SHLQ $0x3f, CX
0x1073473 48f7c1ffffffff TESTQ $-0x1, CX
0x107347a 746b JE 0x10734e7
if t.wall&hasMonotonic != 0 {
0x1073435 488b08 MOVQ 0(AX), CX
0x1073438 480fbae13f BTQ $0x3f, CX
0x107343d 7363 JAE 0x10734a2
Another example:
t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
0x10734c8 4881e1ffffff3f ANDQ $0x3fffffff, CX
0x10734cf 48c1e61e SHLQ $0x1e, SI
0x10734d3 4809ce ORQ CX, SI
0x10734d6 48b90000000000000080 MOVQ $0x8000000000000000, CX
0x10734e0 4809f1 ORQ SI, CX
0x10734e3 488908 MOVQ CX, 0(AX)
t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
0x107348b 4881e2ffffff3f ANDQ $0x3fffffff, DX
0x1073492 48c1e61e SHLQ $0x1e, SI
0x1073496 4809f2 ORQ SI, DX
0x1073499 480fbaea3f BTSQ $0x3f, DX
0x107349e 488910 MOVQ DX, 0(AX)
Go1 benchmarks seem unaffected, and I would be surprised
otherwise:
name old time/op new time/op delta
BinaryTree17-4 2.64s ± 4% 2.56s ± 9% -2.92% (p=0.008 n=9+9)
Fannkuch11-4 2.90s ± 1% 2.95s ± 3% +1.76% (p=0.010 n=10+9)
FmtFprintfEmpty-4 35.3ns ± 1% 34.5ns ± 2% -2.34% (p=0.004 n=9+8)
FmtFprintfString-4 57.0ns ± 1% 58.4ns ± 5% +2.52% (p=0.029 n=9+10)
FmtFprintfInt-4 59.8ns ± 3% 59.8ns ± 6% ~ (p=0.565 n=10+10)
FmtFprintfIntInt-4 93.9ns ± 3% 91.2ns ± 5% -2.94% (p=0.014 n=10+9)
FmtFprintfPrefixedInt-4 107ns ± 6% 104ns ± 6% ~ (p=0.099 n=10+10)
FmtFprintfFloat-4 187ns ± 3% 188ns ± 3% ~ (p=0.505 n=10+9)
FmtManyArgs-4 410ns ± 1% 415ns ± 6% ~ (p=0.649 n=8+10)
GobDecode-4 5.30ms ± 3% 5.27ms ± 3% ~ (p=0.436 n=10+10)
GobEncode-4 4.62ms ± 5% 4.47ms ± 2% -3.24% (p=0.001 n=9+10)
Gzip-4 197ms ± 4% 193ms ± 3% ~ (p=0.123 n=10+10)
Gunzip-4 30.4ms ± 3% 30.1ms ± 3% ~ (p=0.481 n=10+10)
HTTPClientServer-4 76.3µs ± 1% 76.0µs ± 1% ~ (p=0.236 n=8+9)
JSONEncode-4 10.5ms ± 9% 10.3ms ± 3% ~ (p=0.280 n=10+10)
JSONDecode-4 42.3ms ±10% 41.3ms ± 2% ~ (p=0.053 n=9+10)
Mandelbrot200-4 3.80ms ± 2% 3.72ms ± 2% -2.15% (p=0.001 n=9+10)
GoParse-4 2.88ms ±10% 2.81ms ± 2% ~ (p=0.247 n=10+10)
RegexpMatchEasy0_32-4 69.5ns ± 4% 68.6ns ± 2% ~ (p=0.171 n=10+10)
RegexpMatchEasy0_1K-4 165ns ± 3% 162ns ± 3% ~ (p=0.137 n=10+10)
RegexpMatchEasy1_32-4 65.7ns ± 6% 64.4ns ± 2% -2.02% (p=0.037 n=10+10)
RegexpMatchEasy1_1K-4 278ns ± 2% 279ns ± 3% ~ (p=0.991 n=8+9)
RegexpMatchMedium_32-4 99.3ns ± 3% 98.5ns ± 4% ~ (p=0.457 n=10+9)
RegexpMatchMedium_1K-4 30.1µs ± 1% 30.4µs ± 2% ~ (p=0.173 n=8+10)
RegexpMatchHard_32-4 1.40µs ± 2% 1.41µs ± 4% ~ (p=0.565 n=10+10)
RegexpMatchHard_1K-4 42.5µs ± 1% 41.5µs ± 3% -2.13% (p=0.002 n=8+9)
Revcomp-4 332ms ± 4% 328ms ± 5% ~ (p=0.720 n=9+10)
Template-4 48.3ms ± 2% 49.6ms ± 3% +2.56% (p=0.002 n=8+10)
TimeParse-4 252ns ± 2% 249ns ± 3% ~ (p=0.116 n=9+10)
TimeFormat-4 262ns ± 4% 252ns ± 3% -4.01% (p=0.000 n=9+10)
name old speed new speed delta
GobDecode-4 145MB/s ± 3% 146MB/s ± 3% ~ (p=0.436 n=10+10)
GobEncode-4 166MB/s ± 5% 172MB/s ± 2% +3.28% (p=0.001 n=9+10)
Gzip-4 98.6MB/s ± 4% 100.4MB/s ± 3% ~ (p=0.123 n=10+10)
Gunzip-4 639MB/s ± 3% 645MB/s ± 3% ~ (p=0.481 n=10+10)
JSONEncode-4 185MB/s ± 8% 189MB/s ± 3% ~ (p=0.280 n=10+10)
JSONDecode-4 46.0MB/s ± 9% 47.0MB/s ± 2% +2.21% (p=0.046 n=9+10)
GoParse-4 20.1MB/s ± 9% 20.6MB/s ± 2% ~ (p=0.239 n=10+10)
RegexpMatchEasy0_32-4 460MB/s ± 4% 467MB/s ± 2% ~ (p=0.165 n=10+10)
RegexpMatchEasy0_1K-4 6.19GB/s ± 3% 6.28GB/s ± 3% ~ (p=0.165 n=10+10)
RegexpMatchEasy1_32-4 487MB/s ± 5% 497MB/s ± 2% +2.00% (p=0.043 n=10+10)
RegexpMatchEasy1_1K-4 3.67GB/s ± 2% 3.67GB/s ± 3% ~ (p=0.963 n=8+9)
RegexpMatchMedium_32-4 10.1MB/s ± 3% 10.1MB/s ± 4% ~ (p=0.435 n=10+9)
RegexpMatchMedium_1K-4 34.0MB/s ± 1% 33.7MB/s ± 2% ~ (p=0.173 n=8+10)
RegexpMatchHard_32-4 22.9MB/s ± 2% 22.7MB/s ± 4% ~ (p=0.565 n=10+10)
RegexpMatchHard_1K-4 24.0MB/s ± 3% 24.7MB/s ± 3% +2.64% (p=0.001 n=9+9)
Revcomp-4 766MB/s ± 4% 775MB/s ± 5% ~ (p=0.720 n=9+10)
Template-4 40.2MB/s ± 2% 39.2MB/s ± 3% -2.47% (p=0.002 n=8+10)
The rules match ~1800 times during all.bash.
Fixes #18943
Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4
Reviewed-on: https://go-review.googlesource.com/94766
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-24 02:38:50 +00:00
Alberto Donizetti
fc6280d4b0
test/codegen: port direct comparisons with memory tests
...
And remove them from asm_test.
Change-Id: I1ca29b40546d6de06f20bfd550ed8ff87f495454
Reviewed-on: https://go-review.googlesource.com/102115
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-22 17:20:09 +00:00
Alberto Donizetti
be371edd67
test/codegen: port comparisons tests to codegen
...
And delete them from asm_test.
Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab
Reviewed-on: https://go-review.googlesource.com/101595
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-20 19:38:06 +00:00
Vladimir Kuzmin
c12b185a6e
cmd/compile: avoid mapaccess at m[k]=append(m[k]..
...
Currently rvalue m[k] is transformed during walk into:
tmp1 := *mapaccess(m, k)
tmp2 := append(tmp1, ...)
*mapassign(m, k) = tmp2
However, this is suboptimal, as we could instead produce just:
tmp := mapassign(m, k)
*tmp := append(*tmp, ...)
Optimization is possible only if during Order it may tell that m[k] is
exactly the same at left and right part of assignment. It doesn't work:
1) m[f(k)] = append(m[f(k)], ...)
2) sink, m[k] = sink, append(m[k]...)
3) m[k] = append(..., m[k],...)
Benchmark:
name old time/op new time/op delta
MapAppendAssign/Int32/256-8 33.5ns ± 3% 22.4ns ±10% -33.24% (p=0.000 n=16+18)
MapAppendAssign/Int32/65536-8 68.2ns ± 6% 48.5ns ±29% -28.90% (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8 34.3ns ± 4% 23.3ns ± 5% -32.23% (p=0.000 n=17+18)
MapAppendAssign/Int64/65536-8 65.9ns ± 7% 61.2ns ±19% -7.06% (p=0.002 n=18+20)
MapAppendAssign/Str/256-8 116ns ±12% 79ns ±16% -31.70% (p=0.000 n=20+19)
MapAppendAssign/Str/65536-8 134ns ±15% 111ns ±45% -16.95% (p=0.000 n=19+20)
name old alloc/op new alloc/op delta
MapAppendAssign/Int32/256-8 47.0B ± 0% 46.0B ± 0% -2.13% (p=0.000 n=19+18)
MapAppendAssign/Int32/65536-8 27.0B ± 0% 20.7B ±30% -23.33% (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8 47.0B ± 0% 46.0B ± 0% -2.13% (p=0.000 n=20+17)
MapAppendAssign/Int64/65536-8 27.0B ± 0% 27.0B ± 0% ~ (all equal)
MapAppendAssign/Str/256-8 94.0B ± 0% 78.0B ± 0% -17.02% (p=0.000 n=20+16)
MapAppendAssign/Str/65536-8 54.0B ± 0% 54.0B ± 0% ~ (all equal)
Fixes #24364
Updates #5147
Change-Id: Id257d052b75b9a445b4885dc571bf06ce6f6b409
Reviewed-on: https://go-review.googlesource.com/100838
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-20 01:47:07 +00:00
Alberto Donizetti
5a4e09837c
test/codegen: port maps test to codegen
...
And delete them from asm_test.
Change-Id: I3cf0934706a640136cb0f646509174f8c1bf3363
Reviewed-on: https://go-review.googlesource.com/101395
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-19 13:39:34 +00:00
Alberto Donizetti
b61b1d2c57
test/codegen: port structs test to codegen
...
And delete them from asm_test.
Change-Id: Ia286239a3d8f3915f2ca25dbcb39f3354a4f8aea
Reviewed-on: https://go-review.googlesource.com/101138
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-18 16:53:53 +00:00
Alberto Donizetti
cceee685be
test/codegen: port floats tests to codegen
...
And delete them from asm_test.
Change-Id: Ibdaca3496eefc73c731b511ddb9636a1f3dff68c
Reviewed-on: https://go-review.googlesource.com/100915
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-15 18:05:59 +00:00
Giovanni Bajo
a35ec9a59e
cmd/compile: implement CMOV on amd64
...
This builds upon the branchelim pass, activating it for amd64 and
lowering CondSelect. Special care is made to FPU instructions for
NaN handling.
Benchmark results on Xeon E5630 (Westmere EP):
name old time/op new time/op delta
BinaryTree17-16 4.99s ± 9% 4.66s ± 2% ~ (p=0.095 n=5+5)
Fannkuch11-16 4.93s ± 3% 5.04s ± 2% ~ (p=0.548 n=5+5)
FmtFprintfEmpty-16 58.8ns ± 7% 61.4ns ±14% ~ (p=0.579 n=5+5)
FmtFprintfString-16 114ns ± 2% 114ns ± 4% ~ (p=0.603 n=5+5)
FmtFprintfInt-16 181ns ± 4% 125ns ± 3% -30.90% (p=0.008 n=5+5)
FmtFprintfIntInt-16 263ns ± 2% 217ns ± 2% -17.34% (p=0.008 n=5+5)
FmtFprintfPrefixedInt-16 230ns ± 1% 212ns ± 1% -7.99% (p=0.008 n=5+5)
FmtFprintfFloat-16 411ns ± 3% 344ns ± 5% -16.43% (p=0.008 n=5+5)
FmtManyArgs-16 828ns ± 4% 790ns ± 2% -4.59% (p=0.032 n=5+5)
GobDecode-16 10.9ms ± 4% 10.8ms ± 5% ~ (p=0.548 n=5+5)
GobEncode-16 9.52ms ± 5% 9.46ms ± 2% ~ (p=1.000 n=5+5)
Gzip-16 334ms ± 2% 337ms ± 2% ~ (p=0.548 n=5+5)
Gunzip-16 64.4ms ± 1% 65.0ms ± 1% +1.00% (p=0.008 n=5+5)
HTTPClientServer-16 156µs ± 3% 155µs ± 3% ~ (p=0.690 n=5+5)
JSONEncode-16 21.0ms ± 1% 21.8ms ± 0% +3.76% (p=0.016 n=5+4)
JSONDecode-16 95.1ms ± 0% 95.7ms ± 1% ~ (p=0.151 n=5+5)
Mandelbrot200-16 6.38ms ± 1% 6.42ms ± 1% ~ (p=0.095 n=5+5)
GoParse-16 5.47ms ± 2% 5.36ms ± 1% -1.95% (p=0.016 n=5+5)
RegexpMatchEasy0_32-16 111ns ± 1% 111ns ± 1% ~ (p=0.635 n=5+4)
RegexpMatchEasy0_1K-16 408ns ± 1% 411ns ± 2% ~ (p=0.087 n=5+5)
RegexpMatchEasy1_32-16 103ns ± 1% 104ns ± 1% ~ (p=0.484 n=5+5)
RegexpMatchEasy1_1K-16 659ns ± 2% 652ns ± 1% ~ (p=0.571 n=5+5)
RegexpMatchMedium_32-16 176ns ± 2% 174ns ± 1% ~ (p=0.476 n=5+5)
RegexpMatchMedium_1K-16 58.6µs ± 4% 57.7µs ± 4% ~ (p=0.548 n=5+5)
RegexpMatchHard_32-16 3.07µs ± 3% 3.04µs ± 4% ~ (p=0.421 n=5+5)
RegexpMatchHard_1K-16 89.2µs ± 1% 87.9µs ± 2% -1.52% (p=0.032 n=5+5)
Revcomp-16 575ms ± 0% 587ms ± 2% +2.12% (p=0.032 n=4+5)
Template-16 110ms ± 1% 107ms ± 3% -3.00% (p=0.032 n=5+5)
TimeParse-16 463ns ± 0% 462ns ± 0% ~ (p=0.810 n=5+4)
TimeFormat-16 538ns ± 0% 535ns ± 0% -0.63% (p=0.024 n=5+5)
name old speed new speed delta
GobDecode-16 70.7MB/s ± 4% 71.4MB/s ± 5% ~ (p=0.452 n=5+5)
GobEncode-16 80.7MB/s ± 5% 81.2MB/s ± 2% ~ (p=1.000 n=5+5)
Gzip-16 58.2MB/s ± 2% 57.7MB/s ± 2% ~ (p=0.452 n=5+5)
Gunzip-16 302MB/s ± 1% 299MB/s ± 1% -0.99% (p=0.008 n=5+5)
JSONEncode-16 92.4MB/s ± 1% 89.1MB/s ± 0% -3.63% (p=0.016 n=5+4)
JSONDecode-16 20.4MB/s ± 0% 20.3MB/s ± 1% ~ (p=0.135 n=5+5)
GoParse-16 10.6MB/s ± 2% 10.8MB/s ± 1% +2.00% (p=0.016 n=5+5)
RegexpMatchEasy0_32-16 286MB/s ± 1% 285MB/s ± 3% ~ (p=1.000 n=5+5)
RegexpMatchEasy0_1K-16 2.51GB/s ± 1% 2.49GB/s ± 2% ~ (p=0.095 n=5+5)
RegexpMatchEasy1_32-16 309MB/s ± 1% 307MB/s ± 1% ~ (p=0.548 n=5+5)
RegexpMatchEasy1_1K-16 1.55GB/s ± 2% 1.57GB/s ± 1% ~ (p=0.690 n=5+5)
RegexpMatchMedium_32-16 5.68MB/s ± 2% 5.73MB/s ± 1% ~ (p=0.579 n=5+5)
RegexpMatchMedium_1K-16 17.5MB/s ± 4% 17.8MB/s ± 4% ~ (p=0.500 n=5+5)
RegexpMatchHard_32-16 10.4MB/s ± 3% 10.5MB/s ± 4% ~ (p=0.460 n=5+5)
RegexpMatchHard_1K-16 11.5MB/s ± 1% 11.7MB/s ± 2% +1.57% (p=0.032 n=5+5)
Revcomp-16 442MB/s ± 0% 433MB/s ± 2% -2.05% (p=0.032 n=4+5)
Template-16 17.7MB/s ± 1% 18.2MB/s ± 3% +3.12% (p=0.032 n=5+5)
Change-Id: I6972e8f35f2b31f9a42ac473a6bf419a18022558
Reviewed-on: https://go-review.googlesource.com/100935
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-15 16:41:59 +00:00
Geoff Berry
e244a7a7d3
cmd/compile/internal/ssa: add patterns for arm64 bitfield opcodes
...
Add patterns to match common idioms for EXTR, BFI, BFXIL, SBFIZ, SBFX,
UBFIZ and UBFX opcodes.
go1 benchmarks results on Amberwing:
name old time/op new time/op delta
FmtManyArgs 786ns ± 2% 714ns ± 1% -9.20% (p=0.000 n=10+10)
Gzip 437ms ± 0% 402ms ± 0% -7.99% (p=0.000 n=10+10)
FmtFprintfIntInt 196ns ± 0% 182ns ± 0% -7.28% (p=0.000 n=10+9)
FmtFprintfPrefixedInt 207ns ± 0% 199ns ± 0% -3.86% (p=0.000 n=10+10)
FmtFprintfFloat 324ns ± 0% 316ns ± 0% -2.47% (p=0.000 n=10+8)
FmtFprintfInt 119ns ± 0% 117ns ± 0% -1.68% (p=0.000 n=10+9)
GobDecode 12.8ms ± 2% 12.6ms ± 1% -1.62% (p=0.002 n=10+10)
JSONDecode 94.4ms ± 1% 93.4ms ± 0% -1.10% (p=0.000 n=10+10)
RegexpMatchEasy0_32 247ns ± 0% 245ns ± 0% -0.65% (p=0.000 n=10+10)
RegexpMatchMedium_32 314ns ± 0% 312ns ± 0% -0.64% (p=0.000 n=10+10)
RegexpMatchEasy0_1K 541ns ± 0% 538ns ± 0% -0.55% (p=0.000 n=10+9)
TimeParse 450ns ± 1% 448ns ± 1% -0.42% (p=0.035 n=9+9)
RegexpMatchEasy1_32 244ns ± 0% 243ns ± 0% -0.41% (p=0.000 n=10+10)
GoParse 6.03ms ± 0% 6.00ms ± 0% -0.40% (p=0.002 n=10+10)
RegexpMatchEasy1_1K 779ns ± 0% 777ns ± 0% -0.26% (p=0.000 n=10+10)
RegexpMatchHard_32 2.75µs ± 0% 2.74µs ± 1% -0.06% (p=0.026 n=9+9)
BinaryTree17 11.7s ± 0% 11.6s ± 0% ~ (p=0.089 n=10+10)
HTTPClientServer 89.1µs ± 1% 89.5µs ± 2% ~ (p=0.436 n=10+10)
RegexpMatchHard_1K 78.9µs ± 0% 79.5µs ± 2% ~ (p=0.469 n=10+10)
FmtFprintfEmpty 58.5ns ± 0% 58.5ns ± 0% ~ (all equal)
GobEncode 12.0ms ± 1% 12.1ms ± 0% ~ (p=0.075 n=10+10)
Revcomp 669ms ± 0% 668ms ± 0% ~ (p=0.091 n=7+9)
Mandelbrot200 5.35ms ± 0% 5.36ms ± 0% +0.07% (p=0.000 n=9+9)
RegexpMatchMedium_1K 52.1µs ± 0% 52.1µs ± 0% +0.10% (p=0.000 n=9+9)
Fannkuch11 3.25s ± 0% 3.26s ± 0% +0.36% (p=0.000 n=9+10)
FmtFprintfString 114ns ± 1% 115ns ± 0% +0.52% (p=0.011 n=10+10)
JSONEncode 20.2ms ± 0% 20.3ms ± 0% +0.65% (p=0.000 n=10+10)
Template 91.3ms ± 0% 92.3ms ± 0% +1.08% (p=0.000 n=10+10)
TimeFormat 484ns ± 0% 495ns ± 1% +2.30% (p=0.000 n=9+10)
There are some opportunities to improve this change further by adding
patterns to match the "extended register" versions of ADD/SUB/CMP, but I
think that should be evaluated on its own. The regressions in Template
and TimeFormat would likely be recovered by this, as they seem to be due
to generating:
ubfiz x0, x0, #3 , #8
add x1, x2, x0
instead of
add x1, x2, x0, lsl #3
Change-Id: I5644a8d70ac7a98e784a377a2b76ab47a3415a4b
Reviewed-on: https://go-review.googlesource.com/88355
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-15 14:10:41 +00:00
Alberto Donizetti
ded9a1b372
test/codegen: port len/cap pow2 div tests to codegen
...
And delete them from asm_test.
Change-Id: I29c8d098a8893e6b669b6272a2f508985ac9d618
Reviewed-on: https://go-review.googlesource.com/100876
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-15 13:34:01 +00:00
Ilya Tocar
644d14ea0f
Revert "cmd/compile: implement CMOV on amd64"
...
This reverts commit 080187f4f72bd6594e3c2efc35cf51bf61378552.
It broke build of golang.org/x/exp/shiny/iconvg
See issue 24395 for details
Change-Id: Ifd6134f6214e6cee40bd3c63c32941d5fc96ae8b
Reviewed-on: https://go-review.googlesource.com/100755
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-14 21:21:23 +00:00
Alberto Donizetti
cd3aae9b81
test/codegen: port all small memmove tests to codegen
...
This change ports all the remaining tests checking that small memmoves
are replaced with MOVs to the new codegen test harness, and deletes
them from the asm_test file.
Change-Id: I01c94b441e27a5d61518035af62d62779dafeb56
Reviewed-on: https://go-review.googlesource.com/100476
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-14 15:57:07 +00:00
Alberto Donizetti
858042b8fd
test/codegen: add codegen tests for div
...
Change-Id: I6ce8981e85fd55ade6078b0946e54a9215d9deca
Reviewed-on: https://go-review.googlesource.com/100575
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-14 15:56:46 +00:00
isharipo
85a8d25d53
cmd/compile/internal/ssa: emit IMUL3{L/Q} for MUL{L/Q}const on x86
...
cmd/asm now supports three-operand form of IMUL,
so instead of using IMUL with resultInArg0, emit IMUL3 instruction.
This results in less redundant MOVs where SSA assigns
different registers to input[0] and dst arguments.
Note: these have exactly the same encoding when reg0=reg1:
IMUL3x $const, reg0, reg1
IMULx $const, reg
Two-operand IMULx is like a crippled IMUL3x, with dst fixed to input[0].
This is why we don't bother to generate IMULx for the case where
dst is the same as input[0].
Change-Id: I4becda475b3dffdd07b6fdf1c75bacc82af654e4
Reviewed-on: https://go-review.googlesource.com/99656
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-12 19:02:36 +00:00
Giovanni Bajo
080187f4f7
cmd/compile: implement CMOV on amd64
...
This builds upon the branchelim pass, activating it for amd64 and
lowering CondSelect. Special care is made to FPU instructions for
NaN handling.
Benchmark results on Xeon E5630 (Westmere EP):
name old time/op new time/op delta
BinaryTree17-16 4.99s ± 9% 4.66s ± 2% ~ (p=0.095 n=5+5)
Fannkuch11-16 4.93s ± 3% 5.04s ± 2% ~ (p=0.548 n=5+5)
FmtFprintfEmpty-16 58.8ns ± 7% 61.4ns ±14% ~ (p=0.579 n=5+5)
FmtFprintfString-16 114ns ± 2% 114ns ± 4% ~ (p=0.603 n=5+5)
FmtFprintfInt-16 181ns ± 4% 125ns ± 3% -30.90% (p=0.008 n=5+5)
FmtFprintfIntInt-16 263ns ± 2% 217ns ± 2% -17.34% (p=0.008 n=5+5)
FmtFprintfPrefixedInt-16 230ns ± 1% 212ns ± 1% -7.99% (p=0.008 n=5+5)
FmtFprintfFloat-16 411ns ± 3% 344ns ± 5% -16.43% (p=0.008 n=5+5)
FmtManyArgs-16 828ns ± 4% 790ns ± 2% -4.59% (p=0.032 n=5+5)
GobDecode-16 10.9ms ± 4% 10.8ms ± 5% ~ (p=0.548 n=5+5)
GobEncode-16 9.52ms ± 5% 9.46ms ± 2% ~ (p=1.000 n=5+5)
Gzip-16 334ms ± 2% 337ms ± 2% ~ (p=0.548 n=5+5)
Gunzip-16 64.4ms ± 1% 65.0ms ± 1% +1.00% (p=0.008 n=5+5)
HTTPClientServer-16 156µs ± 3% 155µs ± 3% ~ (p=0.690 n=5+5)
JSONEncode-16 21.0ms ± 1% 21.8ms ± 0% +3.76% (p=0.016 n=5+4)
JSONDecode-16 95.1ms ± 0% 95.7ms ± 1% ~ (p=0.151 n=5+5)
Mandelbrot200-16 6.38ms ± 1% 6.42ms ± 1% ~ (p=0.095 n=5+5)
GoParse-16 5.47ms ± 2% 5.36ms ± 1% -1.95% (p=0.016 n=5+5)
RegexpMatchEasy0_32-16 111ns ± 1% 111ns ± 1% ~ (p=0.635 n=5+4)
RegexpMatchEasy0_1K-16 408ns ± 1% 411ns ± 2% ~ (p=0.087 n=5+5)
RegexpMatchEasy1_32-16 103ns ± 1% 104ns ± 1% ~ (p=0.484 n=5+5)
RegexpMatchEasy1_1K-16 659ns ± 2% 652ns ± 1% ~ (p=0.571 n=5+5)
RegexpMatchMedium_32-16 176ns ± 2% 174ns ± 1% ~ (p=0.476 n=5+5)
RegexpMatchMedium_1K-16 58.6µs ± 4% 57.7µs ± 4% ~ (p=0.548 n=5+5)
RegexpMatchHard_32-16 3.07µs ± 3% 3.04µs ± 4% ~ (p=0.421 n=5+5)
RegexpMatchHard_1K-16 89.2µs ± 1% 87.9µs ± 2% -1.52% (p=0.032 n=5+5)
Revcomp-16 575ms ± 0% 587ms ± 2% +2.12% (p=0.032 n=4+5)
Template-16 110ms ± 1% 107ms ± 3% -3.00% (p=0.032 n=5+5)
TimeParse-16 463ns ± 0% 462ns ± 0% ~ (p=0.810 n=5+4)
TimeFormat-16 538ns ± 0% 535ns ± 0% -0.63% (p=0.024 n=5+5)
name old speed new speed delta
GobDecode-16 70.7MB/s ± 4% 71.4MB/s ± 5% ~ (p=0.452 n=5+5)
GobEncode-16 80.7MB/s ± 5% 81.2MB/s ± 2% ~ (p=1.000 n=5+5)
Gzip-16 58.2MB/s ± 2% 57.7MB/s ± 2% ~ (p=0.452 n=5+5)
Gunzip-16 302MB/s ± 1% 299MB/s ± 1% -0.99% (p=0.008 n=5+5)
JSONEncode-16 92.4MB/s ± 1% 89.1MB/s ± 0% -3.63% (p=0.016 n=5+4)
JSONDecode-16 20.4MB/s ± 0% 20.3MB/s ± 1% ~ (p=0.135 n=5+5)
GoParse-16 10.6MB/s ± 2% 10.8MB/s ± 1% +2.00% (p=0.016 n=5+5)
RegexpMatchEasy0_32-16 286MB/s ± 1% 285MB/s ± 3% ~ (p=1.000 n=5+5)
RegexpMatchEasy0_1K-16 2.51GB/s ± 1% 2.49GB/s ± 2% ~ (p=0.095 n=5+5)
RegexpMatchEasy1_32-16 309MB/s ± 1% 307MB/s ± 1% ~ (p=0.548 n=5+5)
RegexpMatchEasy1_1K-16 1.55GB/s ± 2% 1.57GB/s ± 1% ~ (p=0.690 n=5+5)
RegexpMatchMedium_32-16 5.68MB/s ± 2% 5.73MB/s ± 1% ~ (p=0.579 n=5+5)
RegexpMatchMedium_1K-16 17.5MB/s ± 4% 17.8MB/s ± 4% ~ (p=0.500 n=5+5)
RegexpMatchHard_32-16 10.4MB/s ± 3% 10.5MB/s ± 4% ~ (p=0.460 n=5+5)
RegexpMatchHard_1K-16 11.5MB/s ± 1% 11.7MB/s ± 2% +1.57% (p=0.032 n=5+5)
Revcomp-16 442MB/s ± 0% 433MB/s ± 2% -2.05% (p=0.032 n=4+5)
Template-16 17.7MB/s ± 1% 18.2MB/s ± 3% +3.12% (p=0.032 n=5+5)
Change-Id: Ic7cb7374d07da031e771bdcbfdd832fd1b17159c
Reviewed-on: https://go-review.googlesource.com/98695
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-03-12 18:01:33 +00:00
Giovanni Bajo
f7ac70a566
test: move rotate tests to top-level testsuite.
...
Remove old tests from asm_test.
Change-Id: Ib408ec7faa60068bddecf709b93ce308e0ef665a
Reviewed-on: https://go-review.googlesource.com/100075
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
2018-03-11 10:08:18 +00:00
Alberto Donizetti
8e427a3878
test/codegen: add README file for the codegen test harness
...
This change adds a README file inside the test/codegen directory,
explaining how to run the codegen tests and the syntax of the regexps
comments used to match assembly instructions.
Change-Id: Ica4eb3ffa9c6975371538cc8ae0ac3c1a3a03baf
Reviewed-on: https://go-review.googlesource.com/99156
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-09 18:38:53 +00:00
Alberto Donizetti
5f541b11aa
test/codegen: port MULs merging tests to codegen
...
And delete them from asm_go.
Change-Id: I0057cbd90ca55fa51c596e32406e190f3866f93e
Reviewed-on: https://go-review.googlesource.com/99815
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-09 17:01:56 +00:00
Alberto Donizetti
cde34780b7
test/codegen: port math/bits.RotateLeft tests to codegen
...
Only RotateLeft{64,32} were tested, and just for ppc64. This CL adds
tests for RotateLeft{64,32,16,8} on arm64 and amd64/386, for the cases
where the calls are actually instrinsified.
RotateLeft tests (the last ones for math/bits functions) are deleted
from asm_test.
This CL also adds a space between the "//" and the arch name in the
comments, to uniform this file to the style used in all the other
files.
Change-Id: Ifc2a27261d70bcc294b4ec64490d8367f62d2b89
Reviewed-on: https://go-review.googlesource.com/99596
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-09 10:53:38 +00:00
Alberto Donizetti
3772b2e1d5
test/codegen: port 2^n muls tests to codegen harness
...
And delete them from the asm_test.go file.
Change-Id: I124c8c352299646ec7db0968cdb0fe59a3b5d83d
Reviewed-on: https://go-review.googlesource.com/99475
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-08 16:30:14 +00:00
Alberto Donizetti
c028958393
test/codegen: fix issue with arm64 memmove codegen test
...
This recently added arm64 memmove codegen check:
func movesmall() {
// arm64:-"memmove"
x := [...]byte{1, 2, 3, 4, 5, 6, 7}
copy(x[1:], x[:])
}
is not correct, for two reasons:
1. regexps are matched from the start of the disasm line (excluding
line information). This mean that a negative -"memmove" check will
pass against a 'CALL runtime.memmove' line because the line does
not start with 'memmove' (its starts with CALL...).
The way to specify no 'memmove' match whatsoever on the line is
-".*memmove"
2. AFAIK comments on their own line are matched against the first
subsequent non-comment line. So the code above only verifies that
the x := ... line does not generate a memmove. The comment should
be moved near the copy() line, if it's that one we want to not
generate a memmove call.
The fact that the test above is not effective can be checked by
running `go run run.go -v codegen` in the toplevel test directory with
a go1.10 toolchain (that does not have the memmove-elision
optimization). The test will still pass (it shouldn't).
This change changes the regexp to -".*memmove" and moves it near the
line it needs to (not)match.
Change-Id: Ie01ef4d775e77d92dc8d8b7856b89b200f5e5ef2
Reviewed-on: https://go-review.googlesource.com/98977
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-07 16:41:24 +00:00