cybercyst/go - go - Gitea: Git with a cup of tea

mirror of https://github.com/golang/go.git synced 2025-05-24 17:01:26 +00:00

Author	SHA1	Message	Date
Ben Shi	2294e3ebd3	cmd/compile: combine similar code in amd64's assembly generator This CL combines similar code in amd64's assembly generator. The total size of pkg/linux_amd64/cmd/compile/ decreases about 4.5KB, while the generated amd64 code is not affected. Change-Id: I4cdbdd22bde8857aafdc29b47fa100a906fa1598 Reviewed-on: https://go-review.googlesource.com/c/140298 Run-TryBot: Ben Shi <powerman1st@163.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-10-06 08:50:33 +00:00
Ben Shi	c6bf9a8109	cmd/compile: optimize AMD64's bit wise operation Currently "arr[idx] \|= 0x80" is compiled to MOVLload->BTSL->MOVLstore. And this CL optimizes it to a single BTSLconstmodify. Other bit wise operations with a direct memory operand are also implemented. 1. The size of the executable bin/go decreases about 4KB, and the total size of pkg/linux_amd64 (excluding cmd/compile) decreases about 0.6KB. 2. There a little improvement in the go1 benchmark test (excluding noise). name old time/op new time/op delta BinaryTree17-4 2.66s ± 4% 2.66s ± 3% ~ (p=0.596 n=49+49) Fannkuch11-4 2.38s ± 2% 2.32s ± 2% -2.69% (p=0.000 n=50+50) FmtFprintfEmpty-4 42.7ns ± 4% 43.2ns ± 7% +1.31% (p=0.009 n=50+50) FmtFprintfString-4 71.0ns ± 5% 72.0ns ± 3% +1.33% (p=0.000 n=50+50) FmtFprintfInt-4 80.7ns ± 4% 80.6ns ± 3% ~ (p=0.931 n=50+50) FmtFprintfIntInt-4 125ns ± 3% 126ns ± 4% ~ (p=0.051 n=50+50) FmtFprintfPrefixedInt-4 158ns ± 1% 142ns ± 3% -9.84% (p=0.000 n=36+50) FmtFprintfFloat-4 215ns ± 4% 212ns ± 4% -1.23% (p=0.002 n=50+50) FmtManyArgs-4 519ns ± 3% 510ns ± 3% -1.77% (p=0.000 n=50+50) GobDecode-4 6.49ms ± 6% 6.52ms ± 5% ~ (p=0.866 n=50+50) GobEncode-4 5.93ms ± 8% 6.01ms ± 7% ~ (p=0.076 n=50+50) Gzip-4 222ms ± 4% 224ms ± 8% +0.80% (p=0.001 n=50+50) Gunzip-4 36.6ms ± 5% 36.4ms ± 4% ~ (p=0.093 n=50+50) HTTPClientServer-4 59.1µs ± 1% 58.9µs ± 2% -0.24% (p=0.039 n=49+48) JSONEncode-4 9.23ms ± 4% 9.21ms ± 5% ~ (p=0.244 n=50+50) JSONDecode-4 48.8ms ± 4% 48.7ms ± 4% ~ (p=0.653 n=50+50) Mandelbrot200-4 3.81ms ± 4% 3.80ms ± 3% ~ (p=0.834 n=50+50) GoParse-4 3.20ms ± 5% 3.19ms ± 5% ~ (p=0.494 n=50+50) RegexpMatchEasy0_32-4 78.1ns ± 2% 77.4ns ± 3% -0.86% (p=0.005 n=50+50) RegexpMatchEasy0_1K-4 233ns ± 3% 233ns ± 3% ~ (p=0.074 n=50+50) RegexpMatchEasy1_32-4 74.2ns ± 3% 73.4ns ± 3% -1.06% (p=0.000 n=50+50) RegexpMatchEasy1_1K-4 369ns ± 2% 364ns ± 4% -1.41% (p=0.000 n=36+50) RegexpMatchMedium_32-4 109ns ± 4% 107ns ± 3% -2.06% (p=0.001 n=50+50) RegexpMatchMedium_1K-4 31.5µs ± 3% 30.8µs ± 3% -2.20% (p=0.000 n=50+50) RegexpMatchHard_32-4 1.57µs ± 3% 1.56µs ± 2% -0.57% (p=0.016 n=50+50) RegexpMatchHard_1K-4 47.4µs ± 4% 47.0µs ± 3% -0.82% (p=0.008 n=50+50) Revcomp-4 414ms ± 7% 412ms ± 7% ~ (p=0.285 n=50+50) Template-4 64.3ms ± 4% 62.7ms ± 3% -2.44% (p=0.000 n=50+50) TimeParse-4 316ns ± 3% 313ns ± 3% ~ (p=0.122 n=50+50) TimeFormat-4 291ns ± 3% 293ns ± 3% +0.80% (p=0.001 n=50+50) [Geo mean] 46.5µs 46.2µs -0.81% name old speed new speed delta GobDecode-4 118MB/s ± 6% 118MB/s ± 5% ~ (p=0.863 n=50+50) GobEncode-4 130MB/s ± 9% 128MB/s ± 8% ~ (p=0.076 n=50+50) Gzip-4 87.4MB/s ± 4% 86.8MB/s ± 7% -0.78% (p=0.002 n=50+50) Gunzip-4 531MB/s ± 5% 533MB/s ± 4% ~ (p=0.093 n=50+50) JSONEncode-4 210MB/s ± 4% 211MB/s ± 5% ~ (p=0.247 n=50+50) JSONDecode-4 39.8MB/s ± 4% 39.9MB/s ± 4% ~ (p=0.654 n=50+50) GoParse-4 18.1MB/s ± 5% 18.2MB/s ± 5% ~ (p=0.493 n=50+50) RegexpMatchEasy0_32-4 410MB/s ± 2% 413MB/s ± 3% +0.86% (p=0.004 n=50+50) RegexpMatchEasy0_1K-4 4.39GB/s ± 3% 4.38GB/s ± 3% ~ (p=0.063 n=50+50) RegexpMatchEasy1_32-4 432MB/s ± 3% 436MB/s ± 3% +1.07% (p=0.000 n=50+50) RegexpMatchEasy1_1K-4 2.77GB/s ± 2% 2.81GB/s ± 4% +1.46% (p=0.000 n=36+50) RegexpMatchMedium_32-4 9.16MB/s ± 3% 9.35MB/s ± 4% +2.09% (p=0.001 n=50+50) RegexpMatchMedium_1K-4 32.5MB/s ± 3% 33.2MB/s ± 3% +2.25% (p=0.000 n=50+50) RegexpMatchHard_32-4 20.4MB/s ± 3% 20.5MB/s ± 2% +0.56% (p=0.017 n=50+50) RegexpMatchHard_1K-4 21.6MB/s ± 4% 21.8MB/s ± 3% +0.83% (p=0.008 n=50+50) Revcomp-4 613MB/s ± 4% 618MB/s ± 7% ~ (p=0.152 n=48+50) Template-4 30.2MB/s ± 4% 30.9MB/s ± 3% +2.49% (p=0.000 n=50+50) [Geo mean] 127MB/s 128MB/s +0.64% Change-Id: If405198283855d75697f66cf894b2bef458f620e Reviewed-on: https://go-review.googlesource.com/135422 Reviewed-by: Keith Randall <khr@golang.org>	2018-09-19 03:00:58 +00:00
Ben Shi	d17ac29158	cmd/compile: simplify AMD64's assembly generator AMD64's ADDQconstmodify/ADDLconstmodify have similar logic with other constmodify like operators, but seperated case statements. This CL simplify them with a fallthrough. Change-Id: Ia73ffeaddc5080182f68c06c9d9b48fe32a14e38 Reviewed-on: https://go-review.googlesource.com/135855 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-09-19 01:35:46 +00:00
Ben Shi	4545fea4fc	cmd/compile/internal/amd64: simplify assembly generator Merge two case-statements together, since they have similar logic. 1. That makes the assembly generator more clear. 2. The total size of cmd/compile decreases about 0.8KB. Change-Id: I0144a07152202ee7b21e323bcd5dea80a351a6e3 Reviewed-on: https://go-review.googlesource.com/134215 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-09-08 23:59:26 +00:00
Ben Shi	3bc34385fa	cmd/compile: introduce more read-modify-write operations for amd64 Add suport of read-modify-write for AND/SUB/AND/OR/XOR on amd64. 1. The total size of pkg/linux_amd64 decreases about 4KB, excluding cmd/compile. 2. The go1 benchmark shows a little improvement, excluding noise. name old time/op new time/op delta BinaryTree17-4 2.63s ± 3% 2.65s ± 4% +1.01% (p=0.037 n=35+35) Fannkuch11-4 2.33s ± 2% 2.39s ± 2% +2.49% (p=0.000 n=35+35) FmtFprintfEmpty-4 45.4ns ± 5% 40.8ns ± 6% -10.09% (p=0.000 n=35+35) FmtFprintfString-4 73.3ns ± 4% 70.9ns ± 3% -3.23% (p=0.000 n=30+35) FmtFprintfInt-4 79.9ns ± 4% 79.5ns ± 3% ~ (p=0.736 n=34+35) FmtFprintfIntInt-4 126ns ± 4% 125ns ± 4% ~ (p=0.083 n=35+35) FmtFprintfPrefixedInt-4 152ns ± 6% 152ns ± 3% ~ (p=0.855 n=34+35) FmtFprintfFloat-4 215ns ± 4% 213ns ± 4% ~ (p=0.066 n=35+35) FmtManyArgs-4 522ns ± 3% 506ns ± 3% -3.15% (p=0.000 n=35+35) GobDecode-4 6.45ms ± 8% 6.51ms ± 7% +0.96% (p=0.026 n=35+35) GobEncode-4 6.10ms ± 6% 6.02ms ± 8% ~ (p=0.160 n=35+35) Gzip-4 228ms ± 3% 221ms ± 3% -2.92% (p=0.000 n=35+35) Gunzip-4 37.5ms ± 4% 37.2ms ± 3% -0.78% (p=0.036 n=35+35) HTTPClientServer-4 58.7µs ± 2% 59.2µs ± 1% +0.80% (p=0.000 n=33+33) JSONEncode-4 12.0ms ± 3% 12.2ms ± 3% +1.84% (p=0.008 n=35+35) JSONDecode-4 57.0ms ± 4% 56.6ms ± 3% ~ (p=0.320 n=35+35) Mandelbrot200-4 3.82ms ± 3% 3.79ms ± 3% ~ (p=0.074 n=35+35) GoParse-4 3.21ms ± 5% 3.24ms ± 4% ~ (p=0.119 n=35+35) RegexpMatchEasy0_32-4 76.3ns ± 4% 75.4ns ± 4% -1.14% (p=0.014 n=34+33) RegexpMatchEasy0_1K-4 251ns ± 4% 254ns ± 3% +1.28% (p=0.016 n=35+35) RegexpMatchEasy1_32-4 69.6ns ± 3% 70.1ns ± 3% +0.82% (p=0.005 n=35+35) RegexpMatchEasy1_1K-4 367ns ± 4% 376ns ± 4% +2.47% (p=0.000 n=35+35) RegexpMatchMedium_32-4 108ns ± 5% 104ns ± 4% -3.18% (p=0.000 n=35+35) RegexpMatchMedium_1K-4 33.8µs ± 3% 32.7µs ± 3% -3.27% (p=0.000 n=35+35) RegexpMatchHard_32-4 1.55µs ± 3% 1.52µs ± 3% -1.64% (p=0.000 n=35+35) RegexpMatchHard_1K-4 46.6µs ± 3% 46.6µs ± 4% ~ (p=0.149 n=35+35) Revcomp-4 416ms ± 7% 412ms ± 6% -0.95% (p=0.033 n=33+35) Template-4 64.3ms ± 3% 62.4ms ± 7% -2.94% (p=0.000 n=35+35) TimeParse-4 320ns ± 2% 322ns ± 3% ~ (p=0.589 n=35+35) TimeFormat-4 300ns ± 3% 300ns ± 3% ~ (p=0.597 n=35+35) [Geo mean] 47.4µs 47.0µs -0.86% name old speed new speed delta GobDecode-4 119MB/s ± 7% 118MB/s ± 7% -0.96% (p=0.027 n=35+35) GobEncode-4 126MB/s ± 7% 127MB/s ± 6% ~ (p=0.157 n=34+34) Gzip-4 85.3MB/s ± 3% 87.9MB/s ± 3% +3.02% (p=0.000 n=35+35) Gunzip-4 518MB/s ± 4% 522MB/s ± 3% +0.79% (p=0.037 n=35+35) JSONEncode-4 162MB/s ± 3% 159MB/s ± 3% -1.81% (p=0.009 n=35+35) JSONDecode-4 34.1MB/s ± 4% 34.3MB/s ± 3% ~ (p=0.318 n=35+35) GoParse-4 18.0MB/s ± 5% 17.9MB/s ± 4% ~ (p=0.117 n=35+35) RegexpMatchEasy0_32-4 419MB/s ± 3% 425MB/s ± 4% +1.46% (p=0.003 n=32+33) RegexpMatchEasy0_1K-4 4.07GB/s ± 4% 4.02GB/s ± 3% -1.28% (p=0.014 n=35+35) RegexpMatchEasy1_32-4 460MB/s ± 3% 456MB/s ± 4% -0.82% (p=0.004 n=35+35) RegexpMatchEasy1_1K-4 2.79GB/s ± 4% 2.72GB/s ± 4% -2.39% (p=0.000 n=35+35) RegexpMatchMedium_32-4 9.23MB/s ± 4% 9.53MB/s ± 4% +3.16% (p=0.000 n=35+35) RegexpMatchMedium_1K-4 30.3MB/s ± 3% 31.3MB/s ± 3% +3.38% (p=0.000 n=35+35) RegexpMatchHard_32-4 20.7MB/s ± 3% 21.0MB/s ± 3% +1.67% (p=0.000 n=35+35) RegexpMatchHard_1K-4 22.0MB/s ± 3% 21.9MB/s ± 4% ~ (p=0.277 n=35+33) Revcomp-4 612MB/s ± 7% 618MB/s ± 6% +0.96% (p=0.034 n=33+35) Template-4 30.2MB/s ± 3% 31.1MB/s ± 6% +3.05% (p=0.000 n=35+35) [Geo mean] 123MB/s 124MB/s +0.64% Change-Id: Ia025da272e07d0069413824bfff3471b106d6280 Reviewed-on: https://go-review.googlesource.com/121535 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-08-24 23:38:25 +00:00
Ben Shi	705f3c74e6	cmd/compile: optimize AMD64 with DIVSSload and DIVSDload DIVSSload & DIVSDload directly operate on a memory operand. And binary size can be reduced by them, while the performance is not affected. The total size of pkg/linux_amd64 (excluding cmd/compile) decreases about 6KB. There is little regression in the go1 benchmark test (excluding noise). name old time/op new time/op delta BinaryTree17-4 2.63s ± 4% 2.62s ± 4% ~ (p=0.809 n=30+30) Fannkuch11-4 2.40s ± 2% 2.40s ± 2% ~ (p=0.109 n=30+30) FmtFprintfEmpty-4 43.1ns ± 4% 43.2ns ± 9% ~ (p=0.168 n=30+30) FmtFprintfString-4 73.6ns ± 4% 74.1ns ± 4% ~ (p=0.069 n=30+30) FmtFprintfInt-4 81.0ns ± 3% 81.4ns ± 5% ~ (p=0.350 n=30+30) FmtFprintfIntInt-4 127ns ± 4% 129ns ± 4% +0.99% (p=0.021 n=30+30) FmtFprintfPrefixedInt-4 156ns ± 4% 155ns ± 4% ~ (p=0.415 n=30+30) FmtFprintfFloat-4 219ns ± 4% 218ns ± 4% ~ (p=0.071 n=30+30) FmtManyArgs-4 522ns ± 3% 518ns ± 3% -0.68% (p=0.034 n=30+30) GobDecode-4 6.49ms ± 6% 6.52ms ± 6% ~ (p=0.832 n=30+30) GobEncode-4 6.10ms ± 9% 6.14ms ± 7% ~ (p=0.485 n=30+30) Gzip-4 227ms ± 1% 224ms ± 4% ~ (p=0.484 n=24+30) Gunzip-4 37.2ms ± 3% 36.8ms ± 4% ~ (p=0.889 n=30+30) HTTPClientServer-4 58.9µs ± 1% 58.7µs ± 2% -0.42% (p=0.003 n=28+28) JSONEncode-4 12.0ms ± 3% 12.0ms ± 4% ~ (p=0.523 n=30+30) JSONDecode-4 54.6ms ± 4% 54.5ms ± 4% ~ (p=0.708 n=30+30) Mandelbrot200-4 3.78ms ± 4% 3.81ms ± 3% +0.99% (p=0.016 n=30+30) GoParse-4 3.20ms ± 4% 3.20ms ± 5% ~ (p=0.994 n=30+30) RegexpMatchEasy0_32-4 77.0ns ± 4% 75.9ns ± 3% -1.39% (p=0.006 n=29+30) RegexpMatchEasy0_1K-4 255ns ± 4% 253ns ± 4% ~ (p=0.091 n=30+30) RegexpMatchEasy1_32-4 69.7ns ± 3% 70.3ns ± 4% ~ (p=0.120 n=30+30) RegexpMatchEasy1_1K-4 373ns ± 2% 378ns ± 3% +1.43% (p=0.000 n=21+26) RegexpMatchMedium_32-4 107ns ± 2% 108ns ± 4% +1.50% (p=0.012 n=22+30) RegexpMatchMedium_1K-4 34.0µs ± 1% 34.3µs ± 3% +1.08% (p=0.008 n=24+30) RegexpMatchHard_32-4 1.53µs ± 3% 1.54µs ± 3% ~ (p=0.234 n=30+30) RegexpMatchHard_1K-4 46.7µs ± 4% 47.0µs ± 4% ~ (p=0.420 n=30+30) Revcomp-4 411ms ± 7% 415ms ± 6% ~ (p=0.059 n=30+30) Template-4 65.5ms ± 5% 66.9ms ± 4% +2.21% (p=0.001 n=30+30) TimeParse-4 317ns ± 3% 311ns ± 3% -1.97% (p=0.000 n=30+30) TimeFormat-4 293ns ± 3% 294ns ± 3% ~ (p=0.243 n=30+30) [Geo mean] 47.4µs 47.5µs +0.17% name old speed new speed delta GobDecode-4 118MB/s ± 5% 118MB/s ± 6% ~ (p=0.832 n=30+30) GobEncode-4 125MB/s ± 7% 125MB/s ± 7% ~ (p=0.625 n=29+30) Gzip-4 85.3MB/s ± 1% 86.6MB/s ± 4% ~ (p=0.486 n=24+30) Gunzip-4 522MB/s ± 3% 527MB/s ± 4% ~ (p=0.889 n=30+30) JSONEncode-4 162MB/s ± 3% 162MB/s ± 4% ~ (p=0.520 n=30+30) JSONDecode-4 35.5MB/s ± 4% 35.6MB/s ± 4% ~ (p=0.701 n=30+30) GoParse-4 18.1MB/s ± 4% 18.1MB/s ± 4% ~ (p=0.891 n=29+30) RegexpMatchEasy0_32-4 416MB/s ± 4% 422MB/s ± 3% +1.43% (p=0.005 n=29+30) RegexpMatchEasy0_1K-4 4.01GB/s ± 4% 4.04GB/s ± 4% ~ (p=0.091 n=30+30) RegexpMatchEasy1_32-4 460MB/s ± 3% 456MB/s ± 5% ~ (p=0.123 n=30+30) RegexpMatchEasy1_1K-4 2.74GB/s ± 2% 2.70GB/s ± 3% -1.33% (p=0.000 n=22+26) RegexpMatchMedium_32-4 9.39MB/s ± 3% 9.19MB/s ± 4% -2.06% (p=0.001 n=28+30) RegexpMatchMedium_1K-4 30.1MB/s ± 1% 29.8MB/s ± 3% -1.04% (p=0.008 n=24+30) RegexpMatchHard_32-4 20.9MB/s ± 3% 20.8MB/s ± 3% ~ (p=0.234 n=30+30) RegexpMatchHard_1K-4 21.9MB/s ± 4% 21.8MB/s ± 4% ~ (p=0.420 n=30+30) Revcomp-4 619MB/s ± 7% 612MB/s ± 7% ~ (p=0.059 n=30+30) Template-4 29.6MB/s ± 4% 29.0MB/s ± 4% -2.16% (p=0.002 n=30+30) [Geo mean] 123MB/s 123MB/s -0.33% Change-Id: Ia59e077feae4f2824df79059daea4d0f678e3e4c Reviewed-on: https://go-review.googlesource.com/120275 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2018-08-22 02:47:49 +00:00
Ben Shi	b75c5c5992	cmd/compile: optimize AMD64 with more read-modify-write operations 6 more operations which do read-modify-write with a constant source operand are added. 1. The total size of pkg/linux_amd64 decreases about 3KB, excluding cmd/compile. 2. The go1 benckmark shows a slight improvement. name old time/op new time/op delta BinaryTree17-4 2.61s ± 4% 2.67s ± 2% +2.26% (p=0.000 n=30+29) Fannkuch11-4 2.39s ± 2% 2.32s ± 2% -2.67% (p=0.000 n=30+30) FmtFprintfEmpty-4 44.0ns ± 4% 41.7ns ± 4% -5.15% (p=0.000 n=30+30) FmtFprintfString-4 74.2ns ± 4% 72.3ns ± 4% -2.59% (p=0.000 n=30+30) FmtFprintfInt-4 81.7ns ± 3% 78.8ns ± 4% -3.54% (p=0.000 n=27+30) FmtFprintfIntInt-4 130ns ± 4% 124ns ± 5% -4.60% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 154ns ± 3% 152ns ± 3% -1.13% (p=0.012 n=30+30) FmtFprintfFloat-4 215ns ± 4% 212ns ± 5% -1.56% (p=0.002 n=30+30) FmtManyArgs-4 522ns ± 3% 512ns ± 3% -1.84% (p=0.001 n=30+30) GobDecode-4 6.42ms ± 5% 6.49ms ± 7% ~ (p=0.070 n=30+30) GobEncode-4 6.07ms ± 8% 5.98ms ± 8% ~ (p=0.150 n=30+30) Gzip-4 236ms ± 4% 223ms ± 4% -5.57% (p=0.000 n=30+30) Gunzip-4 37.4ms ± 3% 36.7ms ± 4% -2.03% (p=0.000 n=30+30) HTTPClientServer-4 58.7µs ± 1% 58.5µs ± 2% -0.37% (p=0.018 n=30+29) JSONEncode-4 12.0ms ± 4% 12.1ms ± 3% ~ (p=0.112 n=30+30) JSONDecode-4 54.5ms ± 3% 55.5ms ± 4% +1.80% (p=0.006 n=30+30) Mandelbrot200-4 3.78ms ± 4% 3.78ms ± 4% ~ (p=0.173 n=30+30) GoParse-4 3.16ms ± 5% 3.22ms ± 5% +1.75% (p=0.010 n=30+30) RegexpMatchEasy0_32-4 76.6ns ± 1% 75.9ns ± 3% ~ (p=0.672 n=25+30) RegexpMatchEasy0_1K-4 252ns ± 3% 253ns ± 3% +0.57% (p=0.027 n=30+30) RegexpMatchEasy1_32-4 69.8ns ± 4% 70.2ns ± 6% ~ (p=0.539 n=30+30) RegexpMatchEasy1_1K-4 374ns ± 3% 373ns ± 5% ~ (p=0.263 n=30+30) RegexpMatchMedium_32-4 107ns ± 4% 109ns ± 3% ~ (p=0.067 n=30+30) RegexpMatchMedium_1K-4 33.9µs ± 5% 34.1µs ± 4% ~ (p=0.297 n=30+30) RegexpMatchHard_32-4 1.54µs ± 3% 1.56µs ± 4% +1.43% (p=0.002 n=30+30) RegexpMatchHard_1K-4 46.6µs ± 3% 47.0µs ± 3% ~ (p=0.055 n=30+30) Revcomp-4 411ms ± 6% 407ms ± 6% ~ (p=0.219 n=30+30) Template-4 66.8ms ± 3% 64.8ms ± 5% -3.01% (p=0.000 n=30+30) TimeParse-4 312ns ± 2% 319ns ± 3% +2.50% (p=0.000 n=30+30) TimeFormat-4 296ns ± 5% 299ns ± 3% +0.93% (p=0.005 n=30+30) [Geo mean] 47.5µs 47.1µs -0.75% name old speed new speed delta GobDecode-4 120MB/s ± 5% 118MB/s ± 6% ~ (p=0.072 n=30+30) GobEncode-4 127MB/s ± 8% 129MB/s ± 8% ~ (p=0.150 n=30+30) Gzip-4 82.1MB/s ± 4% 87.0MB/s ± 4% +5.90% (p=0.000 n=30+30) Gunzip-4 519MB/s ± 4% 529MB/s ± 4% +2.07% (p=0.001 n=30+30) JSONEncode-4 162MB/s ± 4% 161MB/s ± 3% ~ (p=0.110 n=30+30) JSONDecode-4 35.6MB/s ± 3% 35.0MB/s ± 4% -1.77% (p=0.007 n=30+30) GoParse-4 18.3MB/s ± 4% 18.0MB/s ± 4% -1.72% (p=0.009 n=30+30) RegexpMatchEasy0_32-4 418MB/s ± 1% 422MB/s ± 3% ~ (p=0.645 n=25+30) RegexpMatchEasy0_1K-4 4.06GB/s ± 3% 4.04GB/s ± 3% -0.57% (p=0.033 n=30+30) RegexpMatchEasy1_32-4 459MB/s ± 4% 456MB/s ± 6% ~ (p=0.530 n=30+30) RegexpMatchEasy1_1K-4 2.73GB/s ± 3% 2.75GB/s ± 5% ~ (p=0.279 n=30+30) RegexpMatchMedium_32-4 9.28MB/s ± 5% 9.18MB/s ± 4% ~ (p=0.086 n=30+30) RegexpMatchMedium_1K-4 30.2MB/s ± 4% 30.0MB/s ± 4% ~ (p=0.300 n=30+30) RegexpMatchHard_32-4 20.8MB/s ± 3% 20.5MB/s ± 4% -1.41% (p=0.002 n=30+30) RegexpMatchHard_1K-4 22.0MB/s ± 3% 21.8MB/s ± 3% ~ (p=0.051 n=30+30) Revcomp-4 619MB/s ± 7% 625MB/s ± 7% ~ (p=0.219 n=30+30) Template-4 29.0MB/s ± 3% 29.9MB/s ± 4% +3.11% (p=0.000 n=30+30) [Geo mean] 123MB/s 123MB/s +0.28% Change-Id: I850652cfd53329c1af804b7f57f4393d8097bb0d Reviewed-on: https://go-review.googlesource.com/121135 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2018-08-20 14:18:39 +00:00
Martin Möhrmann	b40db51438	cmd/compile: split slow 3 operand LEA instructions into two LEAs go tool objdump ../bin/go \| grep "\.go\:" \| grep -c "LEA.0x.[(].[(]." Before: 1012 After: 20 Updates #21735 Benchmarks thanks to drchase@google.com Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz benchstat -geomean *.stdout \| grep -v pkg: name old time/op new time/op delta FastTest2KB-12 131.0ns ± 0% 131.0ns ± 0% ~ (all equal) BaseTest2KB-12 601.3ns ± 0% 601.3ns ± 0% ~ (p=0.942 n=45+41) Encoding4KBVerySparse-12 15.38µs ± 1% 15.41µs ± 0% +0.24% (p=0.000 n=44+43) Join_8-12 2.117s ± 2% 2.128s ± 2% +0.51% (p=0.007 n=48+48) HashimotoLight-12 1.663ms ± 1% 1.668ms ± 1% +0.35% (p=0.006 n=49+49) Sha3_224_MTU-12 4.843µs ± 0% 4.836µs ± 0% -0.14% (p=0.000 n=45+42) GenSharedKeyP256-12 73.74µs ± 0% 70.92µs ± 0% -3.82% (p=0.000 n=48+47) Run/10k/1-12 24.81s ± 0% 24.88s ± 0% +0.30% (p=0.000 n=48+47) Run/10k/16-12 4.621s ± 2% 4.625s ± 3% ~ (p=0.776 n=50+50) Dnrm2MediumPosInc-12 4.018µs ± 0% 4.019µs ± 0% ~ (p=0.060 n=50+48) DasumMediumUnitaryInc-12 855.3ns ± 0% 855.0ns ± 0% -0.03% (p=0.000 n=47+42) Dgeev/Circulant10-12 40.45µs ± 1% 41.11µs ± 1% +1.62% (p=0.000 n=45+49) Dgeev/Circulant100-12 10.42ms ± 2% 10.61ms ± 1% +1.83% (p=0.000 n=49+48) MulWorkspaceDense1000Hundredth-12 64.69ms ± 1% 64.63ms ± 1% ~ (p=0.718 n=48+48) ScaleVec10000Inc20-12 22.31µs ± 1% 22.29µs ± 1% ~ (p=0.424 n=50+50) ValidateVersionTildeFail-12 737.6ns ± 1% 736.0ns ± 1% -0.22% (p=0.000 n=49+49) StripHTML-12 2.846µs ± 0% 2.806µs ± 1% -1.40% (p=0.000 n=43+50) ReaderContains-12 6.073µs ± 0% 5.999µs ± 0% -1.22% (p=0.000 n=48+48) EncodeCodecFromInternalProtobuf-12 5.817µs ± 2% 5.555µs ± 2% -4.51% (p=0.000 n=47+47) TarjanSCCGnp_10_tenth-12 7.091µs ± 5% 7.132µs ± 7% ~ (p=0.361 n=50+50) TarjanSCCGnp_1000_half-12 82.25ms ± 3% 81.29ms ± 2% -1.16% (p=0.000 n=50+43) AStarUndirectedmallWorld_10_2_2_2_Heur-12 15.18µs ± 8% 15.11µs ± 7% ~ (p=0.511 n=50+49) LouvainDirectedMultiplex-12 20.92ms ± 1% 21.00ms ± 1% +0.36% (p=0.000 n=48+49) WalkAllBreadthFirstGnp_10_tenth-12 2.974µs ± 4% 2.964µs ± 5% ~ (p=0.504 n=50+50) WalkAllBreadthFirstGnp_1000_tenth-12 9.733ms ± 4% 9.741ms ± 4% ~ (p=0.774 n=48+50) TextMovementBetweenSegments-12 432.8µs ± 0% 433.2µs ± 1% ~ (p=0.128 n=50+50) Growth_MultiSegment-12 13.11ms ± 0% 13.19ms ± 1% +0.58% (p=0.000 n=44+46) AddingFields/Zap.Sugar-12 1.296µs ± 1% 1.310µs ± 2% +1.09% (p=0.000 n=43+43) AddingFields/apex/log-12 34.19µs ± 1% 34.31µs ± 1% +0.35% (p=0.000 n=45+45) AddingFields/inconshreveable/log15-12 30.08µs ± 2% 30.07µs ± 2% ~ (p=0.803 n=48+47) AddingFields/sirupsen/logrus-12 6.683µs ± 3% 6.735µs ± 3% +0.78% (p=0.000 n=43+42) [Geo mean] 143.5µs 143.3µs -0.16% Change-Id: I637203c75c837737f1febced75d5985703e51044 Reviewed-on: https://go-review.googlesource.com/114655 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2018-08-20 14:17:20 +00:00
Keith Randall	f32348669c	cmd/compile: rename memory-using operations Some *mem ops are loads, some are stores, some are modifications. Replace mem->load for the loads. Replace mem->store for the stores. Replace mem->modify for the load-modify-stores. The only semantic change in this CL is to mark ADD(Q\|L)constmodify (which used to be ADD(Q\|L)constmem) as both a read and a write, instead of just a write. This is arguably a bug fix, but the bug isn't triggerable at the moment, see CL 112157. Change-Id: Iccb45aea817b606adb2d712ff99b10ee28e4616a Reviewed-on: https://go-review.googlesource.com/112159 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-05-08 19:16:50 +00:00
Josh Bleecher Snyder	8bf4b7e673	cmd/compile: convert amd64 BSFL and BSRL from tuple to result only Change-Id: I220a459f67ecb310b6e9a526a1ff55527d421e70 Reviewed-on: https://go-review.googlesource.com/109416 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-26 19:29:17 +00:00
Josh Bleecher Snyder	22115859a5	cmd/compile: add amd64 LEAL{1,2,4,8} ops For future use in rewrite rules. Change-Id: Ic9875beb0dea6e0bbcbd4b75d99a53f4a9a7c3fd Reviewed-on: https://go-review.googlesource.com/101275 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-23 21:42:28 +00:00
Austin Clements	8871c930be	cmd/compile: don't lower OpConvert Currently, each architecture lowers OpConvert to an arch-specific OpXXXconvert. This is silly because OpConvert means the same thing on all architectures and is logically a no-op that exists only to keep track of conversions to and from unsafe.Pointer. Furthermore, lowering it makes it harder to recognize in other analyses, particularly liveness analysis. This CL eliminates the lowering of OpConvert, leaving it as the generic op until code generation time. The main complexity here is that we still need to register-allocate OpConvert operations. Currently, each arch's lowered OpConvert specifies all GP registers in its register mask. Ideally, OpConvert wouldn't affect value homing at all, and we could just copy the home of OpConvert's source, but this can potentially home an OpConvert in a LocalSlot, which neither regalloc nor stackalloc expect. Rather than try to disentangle this assumption from regalloc and stackalloc, we continue to register-allocate OpConvert, but teach regalloc that OpConvert can be allocated to any allocatable GP register. For #24543. Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6 Reviewed-on: https://go-review.googlesource.com/108496 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2018-04-20 18:46:39 +00:00
David Chase	ab48574bab	cmd/compile: use Block.Likely to put likely branch first When a neither of a conditional block's successors follows, the block must end with a conditional branch followed by a an unconditional branch. If the (conditional) branch is "unlikely", invert it and swap successors to make it likely instead. This doesn't matter to most benchmarks on amd64, but in one instance on amd64 it caused a 30% improvement, and it is otherwise harmless. The problematic loop is for i := 0; i < w; i++ { if pw[i] != 0 { return true } } compiled under GOEXPERIMENT=preemptibleloops This the very worst-case benchmark for that experiment. Also in this CL is a commoning up of heavily-repeated boilerplate, which made it much easier to see that the changes were applied correctly. In the future this should allow un-exporting of SSAGenState.Branches once the boilerplate-replacement is done everywhere. Change-Id: I0e5ded6eeb3ab1e3e0138e12d54c7e056bd99335 Reviewed-on: https://go-review.googlesource.com/104977 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-04-11 17:35:34 +00:00
Giovanni Bajo	79112707bb	cmd/compile: add patterns for bit set/clear/complement on amd64 This patch completes implementation of BT(Q\|L), and adds support for BT(S\|R\|C)(Q\|L). Example of code changes from time.(*Time).addSec: if t.wall&hasMonotonic != 0 { 0x1073465 488b08 MOVQ 0(AX), CX 0x1073468 4889ca MOVQ CX, DX 0x107346b 48c1e93f SHRQ $0x3f, CX 0x107346f 48c1e13f SHLQ $0x3f, CX 0x1073473 48f7c1ffffffff TESTQ $-0x1, CX 0x107347a 746b JE 0x10734e7 if t.wall&hasMonotonic != 0 { 0x1073435 488b08 MOVQ 0(AX), CX 0x1073438 480fbae13f BTQ $0x3f, CX 0x107343d 7363 JAE 0x10734a2 Another example: t.wall = t.wall&nsecMask \| uint64(dsec)<<nsecShift \| hasMonotonic 0x10734c8 4881e1ffffff3f ANDQ $0x3fffffff, CX 0x10734cf 48c1e61e SHLQ $0x1e, SI 0x10734d3 4809ce ORQ CX, SI 0x10734d6 48b90000000000000080 MOVQ $0x8000000000000000, CX 0x10734e0 4809f1 ORQ SI, CX 0x10734e3 488908 MOVQ CX, 0(AX) t.wall = t.wall&nsecMask \| uint64(dsec)<<nsecShift \| hasMonotonic 0x107348b 4881e2ffffff3f ANDQ $0x3fffffff, DX 0x1073492 48c1e61e SHLQ $0x1e, SI 0x1073496 4809f2 ORQ SI, DX 0x1073499 480fbaea3f BTSQ $0x3f, DX 0x107349e 488910 MOVQ DX, 0(AX) Go1 benchmarks seem unaffected, and I would be surprised otherwise: name old time/op new time/op delta BinaryTree17-4 2.64s ± 4% 2.56s ± 9% -2.92% (p=0.008 n=9+9) Fannkuch11-4 2.90s ± 1% 2.95s ± 3% +1.76% (p=0.010 n=10+9) FmtFprintfEmpty-4 35.3ns ± 1% 34.5ns ± 2% -2.34% (p=0.004 n=9+8) FmtFprintfString-4 57.0ns ± 1% 58.4ns ± 5% +2.52% (p=0.029 n=9+10) FmtFprintfInt-4 59.8ns ± 3% 59.8ns ± 6% ~ (p=0.565 n=10+10) FmtFprintfIntInt-4 93.9ns ± 3% 91.2ns ± 5% -2.94% (p=0.014 n=10+9) FmtFprintfPrefixedInt-4 107ns ± 6% 104ns ± 6% ~ (p=0.099 n=10+10) FmtFprintfFloat-4 187ns ± 3% 188ns ± 3% ~ (p=0.505 n=10+9) FmtManyArgs-4 410ns ± 1% 415ns ± 6% ~ (p=0.649 n=8+10) GobDecode-4 5.30ms ± 3% 5.27ms ± 3% ~ (p=0.436 n=10+10) GobEncode-4 4.62ms ± 5% 4.47ms ± 2% -3.24% (p=0.001 n=9+10) Gzip-4 197ms ± 4% 193ms ± 3% ~ (p=0.123 n=10+10) Gunzip-4 30.4ms ± 3% 30.1ms ± 3% ~ (p=0.481 n=10+10) HTTPClientServer-4 76.3µs ± 1% 76.0µs ± 1% ~ (p=0.236 n=8+9) JSONEncode-4 10.5ms ± 9% 10.3ms ± 3% ~ (p=0.280 n=10+10) JSONDecode-4 42.3ms ±10% 41.3ms ± 2% ~ (p=0.053 n=9+10) Mandelbrot200-4 3.80ms ± 2% 3.72ms ± 2% -2.15% (p=0.001 n=9+10) GoParse-4 2.88ms ±10% 2.81ms ± 2% ~ (p=0.247 n=10+10) RegexpMatchEasy0_32-4 69.5ns ± 4% 68.6ns ± 2% ~ (p=0.171 n=10+10) RegexpMatchEasy0_1K-4 165ns ± 3% 162ns ± 3% ~ (p=0.137 n=10+10) RegexpMatchEasy1_32-4 65.7ns ± 6% 64.4ns ± 2% -2.02% (p=0.037 n=10+10) RegexpMatchEasy1_1K-4 278ns ± 2% 279ns ± 3% ~ (p=0.991 n=8+9) RegexpMatchMedium_32-4 99.3ns ± 3% 98.5ns ± 4% ~ (p=0.457 n=10+9) RegexpMatchMedium_1K-4 30.1µs ± 1% 30.4µs ± 2% ~ (p=0.173 n=8+10) RegexpMatchHard_32-4 1.40µs ± 2% 1.41µs ± 4% ~ (p=0.565 n=10+10) RegexpMatchHard_1K-4 42.5µs ± 1% 41.5µs ± 3% -2.13% (p=0.002 n=8+9) Revcomp-4 332ms ± 4% 328ms ± 5% ~ (p=0.720 n=9+10) Template-4 48.3ms ± 2% 49.6ms ± 3% +2.56% (p=0.002 n=8+10) TimeParse-4 252ns ± 2% 249ns ± 3% ~ (p=0.116 n=9+10) TimeFormat-4 262ns ± 4% 252ns ± 3% -4.01% (p=0.000 n=9+10) name old speed new speed delta GobDecode-4 145MB/s ± 3% 146MB/s ± 3% ~ (p=0.436 n=10+10) GobEncode-4 166MB/s ± 5% 172MB/s ± 2% +3.28% (p=0.001 n=9+10) Gzip-4 98.6MB/s ± 4% 100.4MB/s ± 3% ~ (p=0.123 n=10+10) Gunzip-4 639MB/s ± 3% 645MB/s ± 3% ~ (p=0.481 n=10+10) JSONEncode-4 185MB/s ± 8% 189MB/s ± 3% ~ (p=0.280 n=10+10) JSONDecode-4 46.0MB/s ± 9% 47.0MB/s ± 2% +2.21% (p=0.046 n=9+10) GoParse-4 20.1MB/s ± 9% 20.6MB/s ± 2% ~ (p=0.239 n=10+10) RegexpMatchEasy0_32-4 460MB/s ± 4% 467MB/s ± 2% ~ (p=0.165 n=10+10) RegexpMatchEasy0_1K-4 6.19GB/s ± 3% 6.28GB/s ± 3% ~ (p=0.165 n=10+10) RegexpMatchEasy1_32-4 487MB/s ± 5% 497MB/s ± 2% +2.00% (p=0.043 n=10+10) RegexpMatchEasy1_1K-4 3.67GB/s ± 2% 3.67GB/s ± 3% ~ (p=0.963 n=8+9) RegexpMatchMedium_32-4 10.1MB/s ± 3% 10.1MB/s ± 4% ~ (p=0.435 n=10+9) RegexpMatchMedium_1K-4 34.0MB/s ± 1% 33.7MB/s ± 2% ~ (p=0.173 n=8+10) RegexpMatchHard_32-4 22.9MB/s ± 2% 22.7MB/s ± 4% ~ (p=0.565 n=10+10) RegexpMatchHard_1K-4 24.0MB/s ± 3% 24.7MB/s ± 3% +2.64% (p=0.001 n=9+9) Revcomp-4 766MB/s ± 4% 775MB/s ± 5% ~ (p=0.720 n=9+10) Template-4 40.2MB/s ± 2% 39.2MB/s ± 3% -2.47% (p=0.002 n=8+10) The rules match ~1800 times during all.bash. Fixes #18943 Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4 Reviewed-on: https://go-review.googlesource.com/94766 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-24 02:38:50 +00:00
Giovanni Bajo	a35ec9a59e	cmd/compile: implement CMOV on amd64 This builds upon the branchelim pass, activating it for amd64 and lowering CondSelect. Special care is made to FPU instructions for NaN handling. Benchmark results on Xeon E5630 (Westmere EP): name old time/op new time/op delta BinaryTree17-16 4.99s ± 9% 4.66s ± 2% ~ (p=0.095 n=5+5) Fannkuch11-16 4.93s ± 3% 5.04s ± 2% ~ (p=0.548 n=5+5) FmtFprintfEmpty-16 58.8ns ± 7% 61.4ns ±14% ~ (p=0.579 n=5+5) FmtFprintfString-16 114ns ± 2% 114ns ± 4% ~ (p=0.603 n=5+5) FmtFprintfInt-16 181ns ± 4% 125ns ± 3% -30.90% (p=0.008 n=5+5) FmtFprintfIntInt-16 263ns ± 2% 217ns ± 2% -17.34% (p=0.008 n=5+5) FmtFprintfPrefixedInt-16 230ns ± 1% 212ns ± 1% -7.99% (p=0.008 n=5+5) FmtFprintfFloat-16 411ns ± 3% 344ns ± 5% -16.43% (p=0.008 n=5+5) FmtManyArgs-16 828ns ± 4% 790ns ± 2% -4.59% (p=0.032 n=5+5) GobDecode-16 10.9ms ± 4% 10.8ms ± 5% ~ (p=0.548 n=5+5) GobEncode-16 9.52ms ± 5% 9.46ms ± 2% ~ (p=1.000 n=5+5) Gzip-16 334ms ± 2% 337ms ± 2% ~ (p=0.548 n=5+5) Gunzip-16 64.4ms ± 1% 65.0ms ± 1% +1.00% (p=0.008 n=5+5) HTTPClientServer-16 156µs ± 3% 155µs ± 3% ~ (p=0.690 n=5+5) JSONEncode-16 21.0ms ± 1% 21.8ms ± 0% +3.76% (p=0.016 n=5+4) JSONDecode-16 95.1ms ± 0% 95.7ms ± 1% ~ (p=0.151 n=5+5) Mandelbrot200-16 6.38ms ± 1% 6.42ms ± 1% ~ (p=0.095 n=5+5) GoParse-16 5.47ms ± 2% 5.36ms ± 1% -1.95% (p=0.016 n=5+5) RegexpMatchEasy0_32-16 111ns ± 1% 111ns ± 1% ~ (p=0.635 n=5+4) RegexpMatchEasy0_1K-16 408ns ± 1% 411ns ± 2% ~ (p=0.087 n=5+5) RegexpMatchEasy1_32-16 103ns ± 1% 104ns ± 1% ~ (p=0.484 n=5+5) RegexpMatchEasy1_1K-16 659ns ± 2% 652ns ± 1% ~ (p=0.571 n=5+5) RegexpMatchMedium_32-16 176ns ± 2% 174ns ± 1% ~ (p=0.476 n=5+5) RegexpMatchMedium_1K-16 58.6µs ± 4% 57.7µs ± 4% ~ (p=0.548 n=5+5) RegexpMatchHard_32-16 3.07µs ± 3% 3.04µs ± 4% ~ (p=0.421 n=5+5) RegexpMatchHard_1K-16 89.2µs ± 1% 87.9µs ± 2% -1.52% (p=0.032 n=5+5) Revcomp-16 575ms ± 0% 587ms ± 2% +2.12% (p=0.032 n=4+5) Template-16 110ms ± 1% 107ms ± 3% -3.00% (p=0.032 n=5+5) TimeParse-16 463ns ± 0% 462ns ± 0% ~ (p=0.810 n=5+4) TimeFormat-16 538ns ± 0% 535ns ± 0% -0.63% (p=0.024 n=5+5) name old speed new speed delta GobDecode-16 70.7MB/s ± 4% 71.4MB/s ± 5% ~ (p=0.452 n=5+5) GobEncode-16 80.7MB/s ± 5% 81.2MB/s ± 2% ~ (p=1.000 n=5+5) Gzip-16 58.2MB/s ± 2% 57.7MB/s ± 2% ~ (p=0.452 n=5+5) Gunzip-16 302MB/s ± 1% 299MB/s ± 1% -0.99% (p=0.008 n=5+5) JSONEncode-16 92.4MB/s ± 1% 89.1MB/s ± 0% -3.63% (p=0.016 n=5+4) JSONDecode-16 20.4MB/s ± 0% 20.3MB/s ± 1% ~ (p=0.135 n=5+5) GoParse-16 10.6MB/s ± 2% 10.8MB/s ± 1% +2.00% (p=0.016 n=5+5) RegexpMatchEasy0_32-16 286MB/s ± 1% 285MB/s ± 3% ~ (p=1.000 n=5+5) RegexpMatchEasy0_1K-16 2.51GB/s ± 1% 2.49GB/s ± 2% ~ (p=0.095 n=5+5) RegexpMatchEasy1_32-16 309MB/s ± 1% 307MB/s ± 1% ~ (p=0.548 n=5+5) RegexpMatchEasy1_1K-16 1.55GB/s ± 2% 1.57GB/s ± 1% ~ (p=0.690 n=5+5) RegexpMatchMedium_32-16 5.68MB/s ± 2% 5.73MB/s ± 1% ~ (p=0.579 n=5+5) RegexpMatchMedium_1K-16 17.5MB/s ± 4% 17.8MB/s ± 4% ~ (p=0.500 n=5+5) RegexpMatchHard_32-16 10.4MB/s ± 3% 10.5MB/s ± 4% ~ (p=0.460 n=5+5) RegexpMatchHard_1K-16 11.5MB/s ± 1% 11.7MB/s ± 2% +1.57% (p=0.032 n=5+5) Revcomp-16 442MB/s ± 0% 433MB/s ± 2% -2.05% (p=0.032 n=4+5) Template-16 17.7MB/s ± 1% 18.2MB/s ± 3% +3.12% (p=0.032 n=5+5) Change-Id: I6972e8f35f2b31f9a42ac473a6bf419a18022558 Reviewed-on: https://go-review.googlesource.com/100935 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-15 16:41:59 +00:00
Ilya Tocar	644d14ea0f	Revert "cmd/compile: implement CMOV on amd64" This reverts commit 080187f4f72bd6594e3c2efc35cf51bf61378552. It broke build of golang.org/x/exp/shiny/iconvg See issue 24395 for details Change-Id: Ifd6134f6214e6cee40bd3c63c32941d5fc96ae8b Reviewed-on: https://go-review.googlesource.com/100755 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-14 21:21:23 +00:00
Giovanni Bajo	eb44b8635c	cmd/compile: remove BTQconst rule This rule is meant for code optimization, but it makes other rules potentially more complex, as they need to cope with the fact that a 32-bit op (BTLconst) can appear everywhere a 64-bit rule maches. Move the optimization to opcode expansion instead. Tests will be added in following CL. Change-Id: Ica5ef291e7963c4af17c124d4a2869e6c8f7b0c7 Reviewed-on: https://go-review.googlesource.com/99995 Reviewed-by: Keith Randall <khr@golang.org>	2018-03-13 23:21:38 +00:00
isharipo	85a8d25d53	cmd/compile/internal/ssa: emit IMUL3{L/Q} for MUL{L/Q}const on x86 cmd/asm now supports three-operand form of IMUL, so instead of using IMUL with resultInArg0, emit IMUL3 instruction. This results in less redundant MOVs where SSA assigns different registers to input[0] and dst arguments. Note: these have exactly the same encoding when reg0=reg1: IMUL3x $const, reg0, reg1 IMULx $const, reg Two-operand IMULx is like a crippled IMUL3x, with dst fixed to input[0]. This is why we don't bother to generate IMULx for the case where dst is the same as input[0]. Change-Id: I4becda475b3dffdd07b6fdf1c75bacc82af654e4 Reviewed-on: https://go-review.googlesource.com/99656 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-12 19:02:36 +00:00
Giovanni Bajo	080187f4f7	cmd/compile: implement CMOV on amd64 This builds upon the branchelim pass, activating it for amd64 and lowering CondSelect. Special care is made to FPU instructions for NaN handling. Benchmark results on Xeon E5630 (Westmere EP): name old time/op new time/op delta BinaryTree17-16 4.99s ± 9% 4.66s ± 2% ~ (p=0.095 n=5+5) Fannkuch11-16 4.93s ± 3% 5.04s ± 2% ~ (p=0.548 n=5+5) FmtFprintfEmpty-16 58.8ns ± 7% 61.4ns ±14% ~ (p=0.579 n=5+5) FmtFprintfString-16 114ns ± 2% 114ns ± 4% ~ (p=0.603 n=5+5) FmtFprintfInt-16 181ns ± 4% 125ns ± 3% -30.90% (p=0.008 n=5+5) FmtFprintfIntInt-16 263ns ± 2% 217ns ± 2% -17.34% (p=0.008 n=5+5) FmtFprintfPrefixedInt-16 230ns ± 1% 212ns ± 1% -7.99% (p=0.008 n=5+5) FmtFprintfFloat-16 411ns ± 3% 344ns ± 5% -16.43% (p=0.008 n=5+5) FmtManyArgs-16 828ns ± 4% 790ns ± 2% -4.59% (p=0.032 n=5+5) GobDecode-16 10.9ms ± 4% 10.8ms ± 5% ~ (p=0.548 n=5+5) GobEncode-16 9.52ms ± 5% 9.46ms ± 2% ~ (p=1.000 n=5+5) Gzip-16 334ms ± 2% 337ms ± 2% ~ (p=0.548 n=5+5) Gunzip-16 64.4ms ± 1% 65.0ms ± 1% +1.00% (p=0.008 n=5+5) HTTPClientServer-16 156µs ± 3% 155µs ± 3% ~ (p=0.690 n=5+5) JSONEncode-16 21.0ms ± 1% 21.8ms ± 0% +3.76% (p=0.016 n=5+4) JSONDecode-16 95.1ms ± 0% 95.7ms ± 1% ~ (p=0.151 n=5+5) Mandelbrot200-16 6.38ms ± 1% 6.42ms ± 1% ~ (p=0.095 n=5+5) GoParse-16 5.47ms ± 2% 5.36ms ± 1% -1.95% (p=0.016 n=5+5) RegexpMatchEasy0_32-16 111ns ± 1% 111ns ± 1% ~ (p=0.635 n=5+4) RegexpMatchEasy0_1K-16 408ns ± 1% 411ns ± 2% ~ (p=0.087 n=5+5) RegexpMatchEasy1_32-16 103ns ± 1% 104ns ± 1% ~ (p=0.484 n=5+5) RegexpMatchEasy1_1K-16 659ns ± 2% 652ns ± 1% ~ (p=0.571 n=5+5) RegexpMatchMedium_32-16 176ns ± 2% 174ns ± 1% ~ (p=0.476 n=5+5) RegexpMatchMedium_1K-16 58.6µs ± 4% 57.7µs ± 4% ~ (p=0.548 n=5+5) RegexpMatchHard_32-16 3.07µs ± 3% 3.04µs ± 4% ~ (p=0.421 n=5+5) RegexpMatchHard_1K-16 89.2µs ± 1% 87.9µs ± 2% -1.52% (p=0.032 n=5+5) Revcomp-16 575ms ± 0% 587ms ± 2% +2.12% (p=0.032 n=4+5) Template-16 110ms ± 1% 107ms ± 3% -3.00% (p=0.032 n=5+5) TimeParse-16 463ns ± 0% 462ns ± 0% ~ (p=0.810 n=5+4) TimeFormat-16 538ns ± 0% 535ns ± 0% -0.63% (p=0.024 n=5+5) name old speed new speed delta GobDecode-16 70.7MB/s ± 4% 71.4MB/s ± 5% ~ (p=0.452 n=5+5) GobEncode-16 80.7MB/s ± 5% 81.2MB/s ± 2% ~ (p=1.000 n=5+5) Gzip-16 58.2MB/s ± 2% 57.7MB/s ± 2% ~ (p=0.452 n=5+5) Gunzip-16 302MB/s ± 1% 299MB/s ± 1% -0.99% (p=0.008 n=5+5) JSONEncode-16 92.4MB/s ± 1% 89.1MB/s ± 0% -3.63% (p=0.016 n=5+4) JSONDecode-16 20.4MB/s ± 0% 20.3MB/s ± 1% ~ (p=0.135 n=5+5) GoParse-16 10.6MB/s ± 2% 10.8MB/s ± 1% +2.00% (p=0.016 n=5+5) RegexpMatchEasy0_32-16 286MB/s ± 1% 285MB/s ± 3% ~ (p=1.000 n=5+5) RegexpMatchEasy0_1K-16 2.51GB/s ± 1% 2.49GB/s ± 2% ~ (p=0.095 n=5+5) RegexpMatchEasy1_32-16 309MB/s ± 1% 307MB/s ± 1% ~ (p=0.548 n=5+5) RegexpMatchEasy1_1K-16 1.55GB/s ± 2% 1.57GB/s ± 1% ~ (p=0.690 n=5+5) RegexpMatchMedium_32-16 5.68MB/s ± 2% 5.73MB/s ± 1% ~ (p=0.579 n=5+5) RegexpMatchMedium_1K-16 17.5MB/s ± 4% 17.8MB/s ± 4% ~ (p=0.500 n=5+5) RegexpMatchHard_32-16 10.4MB/s ± 3% 10.5MB/s ± 4% ~ (p=0.460 n=5+5) RegexpMatchHard_1K-16 11.5MB/s ± 1% 11.7MB/s ± 2% +1.57% (p=0.032 n=5+5) Revcomp-16 442MB/s ± 0% 433MB/s ± 2% -2.05% (p=0.032 n=4+5) Template-16 17.7MB/s ± 1% 18.2MB/s ± 3% +3.12% (p=0.032 n=5+5) Change-Id: Ic7cb7374d07da031e771bdcbfdd832fd1b17159c Reviewed-on: https://go-review.googlesource.com/98695 Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2018-03-12 18:01:33 +00:00
Keith Randall	4b00d3f4a2	cmd/compile: implement comparisons directly with memory Allow the compiler to generate code like CMPQ 16(AX), $7 It's tricky because it's difficult to spill such a comparison during flagalloc, because the same memory state might not be available at the restore locations. Solve this problem by decomposing the compare+load back into its parts if it needs to be spilled. The big win is that the write barrier test goes from: MOVL runtime.writeBarrier(SB), CX TESTL CX, CX JNE 60 to CMPL runtime.writeBarrier(SB), $0 JNE 59 It's one instruction and one byte smaller. Fixes #19485 Fixes #15245 Update #22460 Binaries are about 0.15% smaller. Change-Id: I4fd8d1111b6b9924d52f9a0901ca1b2e5cce0836 Reviewed-on: https://go-review.googlesource.com/86035 Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2018-02-26 23:49:44 +00:00
Ilya Tocar	f4d9c30901	cmd/compile/internal/amd64: use appropriate NEG for div Currently we generate NEGQ for DIV{Q,L,W}. By generating NEGL and NEGW, we will reduce code size, because NEGL doesn't require rex prefix. This also guarantees that upper 32 bits are zeroed, so we can revert CL 85736, and remove zero-extensions of DIVL results. Also adds test for redundant zero extend elimination. Fixes #23310 Change-Id: Ic58c3104c255a71371a06e09d10a975bbe5df587 Reviewed-on: https://go-review.googlesource.com/96815 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2018-02-26 20:09:21 +00:00
Ilya Tocar	de4edf3de7	cmd/compile/internal/amd64: update popcnt code generation Popcnt has false dependency on output register and generates MOVQ $0, reg to break it. But recently we switched MOVQ $0, reg encoding from xor reg, reg to actual mov $0, reg. This CL updates code generation for popcnt to use actual XOR. Change-Id: I4c1fc11e85758b53ba2679165fa55614ec54b27d Reviewed-on: https://go-review.googlesource.com/82516 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-02-14 19:56:57 +00:00
Austin Clements	9b331189c1	cmd/internal/obj/x86: adjust SP correctly for tail calls Currently, tail calls on x86 don't adjust the SP on return, so it's important that the compiler produce a zero-sized frame and disable the frame pointer. However, these constraints aren't necessary. For example, on other architectures it's generally necessary to restore the saved LR before a tail call, so obj simply makes this work. Likewise, on x86, there's no reason we can't simply make this work. Hence, this CL adjusts the compiler to use the same tail call convention for x86 that we use on LR machines by producing a RET with a target, rather than a JMP with a target. In fact, obj already understands this convention for x86 except that it's buggy with non-zero frame sizes. So we also fix this bug obj. As a result of these fixes, the compiler no longer needs to mark wrappers as NoFramePointer since it's now perfectly fine to save the frame pointer. In fact, this eliminates the only use of NoFramePointer in the compiler, which will enable further cleanups. This also fixes what is very nearly, but not quite, a code generation bug. NoFramePointer becomes obj.NOFRAME in the object file, which on ppc64 and s390x means to omit the saved LR. Hence, on these architectures, NoFramePointer (and NOFRAME) is only safe to set on leaf functions. However, on most architectures, wrappers aren't necessarily leaf functions because they may call DUFFZERO. We're saved on ppc64 and s390x only because the compiler doesn't have the rules to produce DUFFZERO calls on these architectures. Hence, this only works because the set of LR architectures that implement NOFRAME is disjoint from the set where the compiler produces DUFFZERO operations. (I discovered this whole mess when I attempted to add NOFRAME support to arm.) Change-Id: Icc589aeb86beacb850d0a6a80bd3024974a33947 Reviewed-on: https://go-review.googlesource.com/92035 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-02-12 21:41:19 +00:00
Ilya Tocar	2f1f607b21	cmd/compile: intrinsify math.RoundToEven on amd64 We already do this for floor/ceil, but RoundToEven was added later. Intrinsify it also. name old time/op new time/op delta RoundToEven-8 3.00ns ± 1% 0.68ns ± 2% -77.34% (p=0.000 n=10+10) Change-Id: Ib158cbceb436c6725b2d9353a526c5c4be19bcad Reviewed-on: https://go-review.googlesource.com/74852 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org>	2017-11-02 17:33:52 +00:00
Ilya Tocar	94484d8ed5	cmd/compile: intrinsify math.{Trunc/Ceil/Floor} on amd64 This significantly speed-ups Trunc. Ceil/Floor are using the same instruction, so do them too. name old time/op new time/op delta Floor-6 3.33ns ± 1% 3.22ns ± 0% -3.39% (p=0.000 n=10+10) Ceil-6 3.33ns ± 1% 3.22ns ± 0% -3.16% (p=0.000 n=10+7) Trunc-6 4.83ns ± 0% 3.22ns ± 0% -33.36% (p=0.000 n=6+8) Change-Id: If848790e458eedfe38a6a0407bb4f589c68ac254 Reviewed-on: https://go-review.googlesource.com/68630 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-31 19:30:54 +00:00
Austin Clements	7e343134d3	cmd/compile: compiler support for buffered write barrier This CL implements the compiler support for calling the buffered write barrier added by the previous CL. Since the buffered write barrier is only implemented on amd64 right now, this still supports the old, eager write barrier as well. There's little overhead to supporting both and this way a few tests in test/fixedbugs that expect to have liveness maps at write barrier calls can easily opt-in to the old, eager barrier. This significantly improves the performance of the write barrier: name old time/op new time/op delta WriteBarrier-12 73.5ns ±20% 19.2ns ±27% -73.90% (p=0.000 n=19+18) It also reduces the size of binaries because the write barrier call is more compact: name old object-bytes new object-bytes delta Template 398k ± 0% 393k ± 0% -1.14% (p=0.008 n=5+5) Unicode 208k ± 0% 206k ± 0% -1.00% (p=0.008 n=5+5) GoTypes 1.18M ± 0% 1.15M ± 0% -2.00% (p=0.008 n=5+5) Compiler 4.05M ± 0% 3.88M ± 0% -4.26% (p=0.008 n=5+5) SSA 8.25M ± 0% 8.11M ± 0% -1.59% (p=0.008 n=5+5) Flate 228k ± 0% 224k ± 0% -1.83% (p=0.008 n=5+5) GoParser 295k ± 0% 284k ± 0% -3.62% (p=0.008 n=5+5) Reflect 1.00M ± 0% 0.99M ± 0% -0.70% (p=0.008 n=5+5) Tar 339k ± 0% 333k ± 0% -1.67% (p=0.008 n=5+5) XML 404k ± 0% 395k ± 0% -2.10% (p=0.008 n=5+5) [Geo mean] 704k 690k -2.00% name old exe-bytes new exe-bytes delta HelloSize 1.05M ± 0% 1.04M ± 0% -1.55% (p=0.008 n=5+5) https://perf.golang.org/search?q=upload:20171027.1 (Amusingly, this also reduces compiler allocations by 0.75%, which, combined with the better write barrier, speeds up the compiler overall by 2.10%. See the perf link.) It slightly improves the performance of most of the go1 benchmarks and improves the performance of the x/benchmarks: name old time/op new time/op delta BinaryTree17-12 2.40s ± 1% 2.47s ± 1% +2.69% (p=0.000 n=19+19) Fannkuch11-12 2.95s ± 0% 2.95s ± 0% +0.21% (p=0.000 n=20+19) FmtFprintfEmpty-12 41.8ns ± 4% 41.4ns ± 2% -1.03% (p=0.014 n=20+20) FmtFprintfString-12 68.7ns ± 2% 67.5ns ± 1% -1.75% (p=0.000 n=20+17) FmtFprintfInt-12 79.0ns ± 3% 77.1ns ± 1% -2.40% (p=0.000 n=19+17) FmtFprintfIntInt-12 127ns ± 1% 123ns ± 3% -3.42% (p=0.000 n=20+20) FmtFprintfPrefixedInt-12 152ns ± 1% 150ns ± 1% -1.02% (p=0.000 n=18+17) FmtFprintfFloat-12 211ns ± 1% 209ns ± 0% -0.99% (p=0.000 n=20+16) FmtManyArgs-12 500ns ± 0% 496ns ± 0% -0.73% (p=0.000 n=17+20) GobDecode-12 6.44ms ± 1% 6.53ms ± 0% +1.28% (p=0.000 n=20+19) GobEncode-12 5.46ms ± 0% 5.46ms ± 1% ~ (p=0.550 n=19+20) Gzip-12 220ms ± 1% 216ms ± 0% -1.75% (p=0.000 n=19+19) Gunzip-12 38.8ms ± 0% 38.6ms ± 0% -0.30% (p=0.000 n=18+19) HTTPClientServer-12 79.0µs ± 1% 78.2µs ± 1% -1.01% (p=0.000 n=20+20) JSONEncode-12 11.9ms ± 0% 11.9ms ± 0% -0.29% (p=0.000 n=20+19) JSONDecode-12 52.6ms ± 0% 52.2ms ± 0% -0.68% (p=0.000 n=19+20) Mandelbrot200-12 3.69ms ± 0% 3.68ms ± 0% -0.36% (p=0.000 n=20+20) GoParse-12 3.13ms ± 1% 3.18ms ± 1% +1.67% (p=0.000 n=19+20) RegexpMatchEasy0_32-12 73.2ns ± 1% 72.3ns ± 1% -1.19% (p=0.000 n=19+18) RegexpMatchEasy0_1K-12 241ns ± 0% 239ns ± 0% -0.83% (p=0.000 n=17+16) RegexpMatchEasy1_32-12 68.6ns ± 1% 69.0ns ± 1% +0.47% (p=0.015 n=18+16) RegexpMatchEasy1_1K-12 364ns ± 0% 361ns ± 0% -0.67% (p=0.000 n=16+17) RegexpMatchMedium_32-12 104ns ± 1% 103ns ± 1% -0.79% (p=0.001 n=20+15) RegexpMatchMedium_1K-12 33.8µs ± 3% 34.0µs ± 2% ~ (p=0.267 n=20+19) RegexpMatchHard_32-12 1.64µs ± 1% 1.62µs ± 2% -1.25% (p=0.000 n=19+18) RegexpMatchHard_1K-12 49.2µs ± 0% 48.7µs ± 1% -0.93% (p=0.000 n=19+18) Revcomp-12 391ms ± 5% 396ms ± 7% ~ (p=0.154 n=19+19) Template-12 63.1ms ± 0% 59.5ms ± 0% -5.76% (p=0.000 n=18+19) TimeParse-12 307ns ± 0% 306ns ± 0% -0.39% (p=0.000 n=19+17) TimeFormat-12 325ns ± 0% 323ns ± 0% -0.50% (p=0.000 n=19+19) [Geo mean] 47.3µs 46.9µs -0.67% https://perf.golang.org/search?q=upload:20171026.1 name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.25ms ± 1% 2.20ms ± 1% -2.31% (p=0.000 n=18+18) HTTP-12 12.6µs ± 0% 12.6µs ± 0% -0.72% (p=0.000 n=18+17) JSON-12 11.0ms ± 0% 11.0ms ± 1% -0.68% (p=0.000 n=17+19) https://perf.golang.org/search?q=upload:20171026.2 Updates #14951. Updates #22460. Change-Id: Id4c0932890a1d41020071bec73b8522b1367d3e7 Reviewed-on: https://go-review.googlesource.com/73712 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-10-30 18:12:46 +00:00
Matthew Dempsky	a5868a47c6	cmd/internal/obj/x86: move MOV->XOR rewriting into compiler Fixes #20986. Change-Id: Ic3cf5c0ab260f259ecff7b92cfdf5f4ae432aef3 Reviewed-on: https://go-review.googlesource.com/73072 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2017-10-24 21:32:17 +00:00
Cherry Zhang	6f3e5e637c	cmd/compile: intrinsify runtime.getcallersp Add a compiler intrinsic for getcallersp. So we are able to get rid of the argument (not done in this CL). Change-Id: Ic38fda1c694f918328659ab44654198fb116668d Reviewed-on: https://go-review.googlesource.com/69350 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: David Chase <drchase@google.com>	2017-10-10 15:15:21 +00:00
Ilya Tocar	6b8a3c8889	cmd/compile/internal/amd64: add SETccmem Combine setcc and store of result into setcc that writes directly to memory. Triggers 200+ times in go tool. Fixes #21630 Change-Id: Iafa22607426f4120140c88fae4b9aecb46e0bba8 Reviewed-on: https://go-review.googlesource.com/67950 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-05 20:53:28 +00:00
David Chase	6cac100eef	cmd/compile: add intrinsic for reading caller's pc First step towards removing the mandatory argument for getcallerpc, which solves certain problems for the runtime. This might also slightly improve performance. Intrinsic enabled on 386, amd64, amd64p32, runtime asm implementation removed on those architectures. Now-superfluous argument remains in getcallerpc signature (for a future CL; non-386/amd64 asm funcs ignore it). Added getcallerpc to the "not a real function" test in dcl.go, that story is a little odd with respect to unexported functions but that is not this CL. Fixes #17327. Change-Id: I5df1ad91f27ee9ac1f0dd88fa48f1329d6306c3e Reviewed-on: https://go-review.googlesource.com/31851 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2017-09-22 18:37:03 +00:00
Ilya Tocar	a4e1a72f0a	cmd/compile/internal/amd64: add MOVLloadidx8 and MOVLstoreidx8 Currently we only use 1 and 4 as a scale for indexed 4-byte load. In code generated in #20711 we can use indexed load with scale=8, to improve performance: name old time/op new time/op delta GM-6 108µs ± 0% 95µs ± 0% -12.06% (p=0.000 n=10+10) So add new ops and combine loadidx1(shift 3..).. into loadidx8, same for stores. Change-Id: I5ed1c250ac40960e20606580cf9de221e75b72f1 Reviewed-on: https://go-review.googlesource.com/46134 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-09-01 21:25:25 +00:00
Daniel Martí	99da8730b0	all: remove some double spaces from comments Went mainly for the ones that make no sense, such as the ones mid-sentence or after commas. Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0 Reviewed-on: https://go-review.googlesource.com/57293 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-26 15:09:09 +00:00
Josh Bleecher Snyder	a45d6859cb	cmd/compile: enforce that MOVXconvert is a no-op on 386 and amd64 Follow-up to CL 58371. Change-Id: I3d2aaec84ee6db3ef1bd4fcfcaf46cc297c7176b Reviewed-on: https://go-review.googlesource.com/58610 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-25 19:39:46 +00:00
Keith Randall	fb05948d9e	cmd/compile,math: improve code generation for math.Abs Implement int reg <-> fp reg moves on amd64. If we see a load to int reg followed by an int->fp move, then we can just load to the fp reg instead. Same for stores. math.Abs is now: MOVQ "".x+8(SP), AX SHLQ $1, AX SHRQ $1, AX MOVQ AX, "".~r1+16(SP) math.Copysign is now: MOVQ "".x+8(SP), AX SHLQ $1, AX SHRQ $1, AX MOVQ "".y+16(SP), CX SHRQ $63, CX SHLQ $63, CX ORQ CX, AX MOVQ AX, "".~r2+24(SP) math.Float64bits is now: MOVSD "".x+8(SP), X0 MOVSD X0, "".~r1+16(SP) (it would be nicer to use a non-SSE reg for this, nothing is perfect) And due to the fix for #21440, the inlined version of these improve as well. name old time/op new time/op delta Abs 1.38ns ± 5% 0.89ns ±10% -35.54% (p=0.000 n=10+10) Copysign 1.56ns ± 7% 1.35ns ± 6% -13.77% (p=0.000 n=9+10) Fixes #13095 Change-Id: Ibd7f2792412a6668608780b0688a77062e1f1499 Reviewed-on: https://go-review.googlesource.com/58732 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2017-08-25 19:15:01 +00:00
Ilya Tocar	ac29f4d01c	cmd/compile/internal/amd64: add ADD[Q\|L]constmem We can add a constant to loaction in memory with 1 instruction, as opposed to load+add+store, so add a new op and relevent ssa rules. Triggers in e. g. encoding/json isValidNumber: NumberIsValid-6 36.4ns ± 0% 35.2ns ± 1% -3.32% (p=0.000 n=6+10) Shaves ~2.5 kb from go tool. Change-Id: I7ba576676c2522432360f77b290cecb9574a93c3 Reviewed-on: https://go-review.googlesource.com/54431 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-18 18:55:44 +00:00
Ilya Tocar	df70982825	cmd/compile/internal/ssa: use sse to zero on amd64 Use 16-byte stores instead of 8-byte stores to zero small blocks. Also switch to duffzero for 65+ bytes only, because for each duffzero call we also save/restore BP, so call requires 4 instructions and replacing it with 4 sse stores doesn't cause code-bloat. Also switch duffzero to use leaq, instead of addq to avoid clobbering flags. ClearFat8-6 0.54ns ± 0% 0.54ns ± 0% ~ (all equal) ClearFat12-6 1.07ns ± 0% 1.07ns ± 0% ~ (all equal) ClearFat16-6 1.07ns ± 0% 0.69ns ± 0% -35.51% (p=0.001 n=8+9) ClearFat24-6 1.61ns ± 1% 1.07ns ± 0% -33.33% (p=0.000 n=10+10) ClearFat32-6 2.14ns ± 0% 1.07ns ± 0% -50.00% (p=0.001 n=8+9) ClearFat40-6 2.67ns ± 1% 1.61ns ± 0% -39.72% (p=0.000 n=10+8) ClearFat48-6 3.75ns ± 0% 2.68ns ± 0% -28.59% (p=0.000 n=9+9) ClearFat56-6 4.29ns ± 0% 3.22ns ± 0% -25.10% (p=0.000 n=9+9) ClearFat64-6 4.30ns ± 0% 3.22ns ± 0% -25.15% (p=0.000 n=8+8) ClearFat128-6 7.50ns ± 1% 7.51ns ± 0% ~ (p=0.767 n=10+9) ClearFat256-6 13.9ns ± 1% 13.9ns ± 1% ~ (p=0.257 n=10+10) ClearFat512-6 26.8ns ± 0% 26.8ns ± 0% ~ (p=0.467 n=8+8) ClearFat1024-6 52.5ns ± 0% 52.5ns ± 0% ~ (p=1.000 n=8+8) Also shaves ~20kb from go tool: go_old 10384994 go_new 10364514 [-20480 bytes] section differences global text (code) = -20585 bytes (-0.532047%) read-only data = -302 bytes (-0.018101%) Total difference -20887 bytes (-0.348731%) Change-Id: I15854e87544545c1af24775df895e38e16e12694 Reviewed-on: https://go-review.googlesource.com/54410 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-16 15:52:27 +00:00
Josh Bleecher Snyder	46b88c9fbc	cmd/compile: change ssa.Type into types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of types.Type. The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-05-09 23:01:51 +00:00
Martin Möhrmann	f9bec9eb42	cmd/compile: use MOVL instead of MOVQ for small constants on amd64 The encoding of MOVL to a register is 2 bytes shorter than for MOVQ. The upper 32bit are automatically zeroed when MOVL to a register is used. Replaces 1657 MOVQ by MOVL in the go binary. Reduces go binary size by 4 kilobyte. name old time/op new time/op delta BinaryTree17 1.93s ± 0% 1.93s ± 0% -0.32% (p=0.000 n=9+9) Fannkuch11 2.66s ± 0% 2.48s ± 0% -6.60% (p=0.000 n=9+9) FmtFprintfEmpty 31.8ns ± 0% 31.6ns ± 0% -0.63% (p=0.000 n=10+10) FmtFprintfString 52.0ns ± 0% 51.9ns ± 0% -0.19% (p=0.000 n=10+10) FmtFprintfInt 55.6ns ± 0% 54.6ns ± 0% -1.80% (p=0.002 n=8+10) FmtFprintfIntInt 87.7ns ± 0% 84.8ns ± 0% -3.31% (p=0.000 n=9+9) FmtFprintfPrefixedInt 98.9ns ± 0% 102.0ns ± 0% +3.10% (p=0.000 n=10+10) FmtFprintfFloat 165ns ± 0% 164ns ± 0% -0.61% (p=0.000 n=10+10) FmtManyArgs 368ns ± 0% 361ns ± 0% -1.98% (p=0.000 n=8+10) GobDecode 4.53ms ± 0% 4.58ms ± 0% +1.08% (p=0.000 n=9+10) GobEncode 3.74ms ± 0% 3.73ms ± 0% -0.27% (p=0.000 n=10+10) Gzip 164ms ± 0% 163ms ± 0% -0.48% (p=0.000 n=10+10) Gunzip 26.7ms ± 0% 26.6ms ± 0% -0.13% (p=0.000 n=9+10) HTTPClientServer 30.4µs ± 1% 30.3µs ± 1% -0.41% (p=0.016 n=10+10) JSONEncode 10.9ms ± 0% 11.0ms ± 0% +0.70% (p=0.000 n=10+10) JSONDecode 36.8ms ± 0% 37.0ms ± 0% +0.59% (p=0.000 n=9+10) Mandelbrot200 3.20ms ± 0% 3.21ms ± 0% +0.44% (p=0.000 n=9+10) GoParse 2.35ms ± 0% 2.35ms ± 0% +0.26% (p=0.000 n=10+9) RegexpMatchEasy0_32 58.3ns ± 0% 58.4ns ± 0% +0.17% (p=0.000 n=10+10) RegexpMatchEasy0_1K 138ns ± 0% 142ns ± 0% +2.68% (p=0.000 n=10+10) RegexpMatchEasy1_32 55.1ns ± 0% 55.6ns ± 1% ~ (p=0.104 n=10+10) RegexpMatchEasy1_1K 242ns ± 0% 243ns ± 0% +0.41% (p=0.000 n=10+10) RegexpMatchMedium_32 87.4ns ± 0% 89.9ns ± 0% +2.86% (p=0.000 n=10+10) RegexpMatchMedium_1K 27.4µs ± 0% 27.4µs ± 0% +0.15% (p=0.000 n=10+10) RegexpMatchHard_32 1.30µs ± 0% 1.32µs ± 1% +1.91% (p=0.000 n=10+10) RegexpMatchHard_1K 39.0µs ± 0% 39.5µs ± 0% +1.38% (p=0.000 n=10+10) Revcomp 316ms ± 0% 319ms ± 0% +1.13% (p=0.000 n=9+8) Template 40.6ms ± 0% 40.6ms ± 0% ~ (p=0.123 n=10+10) TimeParse 224ns ± 0% 224ns ± 0% ~ (all equal) TimeFormat 230ns ± 0% 225ns ± 0% -2.17% (p=0.000 n=10+10) Change-Id: I32a099b65f9e6d4ad7288ed48546655c534757d8 Reviewed-on: https://go-review.googlesource.com/38630 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-05-01 20:59:58 +00:00
Josh Bleecher Snyder	dae5389d3d	Revert "cmd/compile: add Type.MustSize and Type.MustAlignment" This reverts commit 94d540a4b6bf68ec472bf4469037955e3133fcf7. Reason for revert: prefer something along the lines of CL 42018. Change-Id: I876fe32e98f37d8d725fe55e0fd0ea429c0198e0 Reviewed-on: https://go-review.googlesource.com/42022 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-04-28 01:24:13 +00:00
Josh Bleecher Snyder	94d540a4b6	cmd/compile: add Type.MustSize and Type.MustAlignment Type.Size and Type.Alignment are for the front end: They calculate size and alignment if needed. Type.MustSize and Type.MustAlignment are for the back end: They call Fatal if size and alignment are not already calculated. Most uses are of MustSize and MustAlignment, but that's because the back end is newer, and this API was added to support it. This CL was mostly generated with sed and selective reversion. The only mildly interesting bit is the change of the ssa.Type interface and the supporting ssa dummy types. Follow-up to review feedback on CL 41970. Passes toolstash-check. Change-Id: I0d9b9505e57453dae8fb6a236a07a7a02abd459e Reviewed-on: https://go-review.googlesource.com/42016 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-04-27 22:57:57 +00:00
Keith Randall	1e72bf6218	cmd/compile: experiment which clobbers all dead pointer fields The experiment "clobberdead" clobbers all pointer fields that the compiler thinks are dead, just before and after every safepoint. Useful for debugging the generation of live pointer bitmaps. Helped find the following issues: Update #15936 Update #16026 Update #16095 Update #18860 Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9 Reviewed-on: https://go-review.googlesource.com/23924 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-04-21 20:19:50 +00:00
Keith Randall	7e07e635f3	cmd/compile: implement non-constant rotates Makes math/bits.Rotate{Left,Right} fast on amd64. name old time/op new time/op delta RotateLeft-12 7.42ns ± 6% 5.45ns ± 6% -26.54% (p=0.000 n=9+10) RotateLeft8-12 4.77ns ± 5% 3.42ns ± 7% -28.25% (p=0.000 n=8+10) RotateLeft16-12 4.82ns ± 8% 3.40ns ± 7% -29.36% (p=0.000 n=10+10) RotateLeft32-12 4.87ns ± 7% 3.48ns ± 7% -28.51% (p=0.000 n=8+9) RotateLeft64-12 5.23ns ±10% 3.35ns ± 6% -35.97% (p=0.000 n=9+10) RotateRight-12 7.59ns ± 8% 5.71ns ± 1% -24.72% (p=0.000 n=10+8) RotateRight8-12 4.98ns ± 7% 3.36ns ± 9% -32.55% (p=0.000 n=10+10) RotateRight16-12 5.12ns ± 2% 3.45ns ± 5% -32.62% (p=0.000 n=10+10) RotateRight32-12 4.80ns ± 6% 3.42ns ±16% -28.68% (p=0.000 n=10+10) RotateRight64-12 4.78ns ± 6% 3.42ns ± 6% -28.50% (p=0.000 n=10+10) Update #18940 Change-Id: Ie79fb5581c489ed4d3b859314c5e669a134c119b Reviewed-on: https://go-review.googlesource.com/39711 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-04-17 23:19:45 +00:00
Keith Randall	5cadc91b3c	cmd/compile: intrinsics for math/bits.OnesCount Popcount instructions on amd64 are not guaranteed to be present, so we must guard their call. Rewrite rules can't generate control flow at the moment, so the intrinsifier needs to generate that code. name old time/op new time/op delta OnesCount-8 2.47ns ± 5% 1.04ns ± 2% -57.70% (p=0.000 n=10+10) OnesCount16-8 1.05ns ± 1% 0.78ns ± 0% -25.56% (p=0.000 n=9+8) OnesCount32-8 1.63ns ± 5% 1.04ns ± 2% -35.96% (p=0.000 n=10+10) OnesCount64-8 2.45ns ± 0% 1.04ns ± 1% -57.55% (p=0.000 n=6+10) Update #18616 Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673 Reviewed-on: https://go-review.googlesource.com/38320 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-04-04 02:40:11 +00:00
Keith Randall	214be5b302	cmd/compile: remove likely bits from generated assembly We don't need them any more since #15837 was fixed. Fixes #19718 Change-Id: I13e46c62b321b2c9265f44c977b63bfb23163ca2 Reviewed-on: https://go-review.googlesource.com/38664 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-03-26 04:40:20 +00:00
Matthew Dempsky	22ea7fc1a9	cmd/compile/internal/gc: make SSAGenFPJump a method of SSAGenState Change-Id: Ie22a08c93dfcfd4b336e7b158415448dd55b2c11 Reviewed-on: https://go-review.googlesource.com/38407 Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-03-22 17:46:08 +00:00
Josh Bleecher Snyder	0a94daa378	cmd/compile: funnel SSA Prog creation through SSAGenState Step one in eliminating Prog-related globals. Passes toolstash-check -all. Updates #15756 Change-Id: I3b777fb5a7716f2d9da3067fbd94c28ca894a465 Reviewed-on: https://go-review.googlesource.com/38450 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-03-22 17:18:40 +00:00
Matthew Dempsky	3e2f980e27	cmd/compile: eliminate direct uses of gc.Thearch in backends This CL changes the GOARCH.Init functions to take gc.Thearch as a parameter, which gc.Main supplies. Additionally, the x86 backend is refactored to decide within Init whether to use the 387 or SSE2 instruction generators, rather than for each individual SSA Value/Block. Passes toolstash-check -all. Change-Id: Ie6305a6cd6f6ab4e89ecbb3cbbaf5ffd57057a24 Reviewed-on: https://go-review.googlesource.com/38301 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-03-17 22:10:53 +00:00
Keith Randall	495b167919	cmd/compile: intrinsics for math/bits.{Len,LeadingZeros} name old time/op new time/op delta LeadingZeros-4 2.00ns ± 0% 1.34ns ± 1% -33.02% (p=0.000 n=8+10) LeadingZeros16-4 1.62ns ± 0% 1.57ns ± 0% -3.09% (p=0.001 n=8+9) LeadingZeros32-4 2.14ns ± 0% 1.48ns ± 0% -30.84% (p=0.002 n=8+10) LeadingZeros64-4 2.06ns ± 1% 1.33ns ± 0% -35.08% (p=0.000 n=8+8) 8-bit args is a special case - the Go code is really fast because it is just a single table lookup. So I've disabled that for now. Intrinsics were actually slower: LeadingZeros8-4 1.22ns ± 3% 1.58ns ± 1% +29.56% (p=0.000 n=10+10) Update #18616 Change-Id: Ia9c289b9ba59c583ea64060470315fd637e814cf Reviewed-on: https://go-review.googlesource.com/38311 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-03-16 22:53:49 +00:00
Matthew Dempsky	118b3fe7bb	cmd/compile/internal/gc: refactor ACALL Prog creation This abstracts creation of ACALL Progs into package gc. The main benefit of this today is we can refactor away a lot of common boilerplate code. Later, once liveness analysis happens on the SSA graph, this will also provide an easy insertion point for emitting the PCDATA Progs immediately before call instructions. Passes toolstash-check -all. Change-Id: Ia15108ace97201cd84314f1ca916dfeb4f09d61c Reviewed-on: https://go-review.googlesource.com/38081 Reviewed-by: Keith Randall <khr@golang.org>	2017-03-13 21:04:16 +00:00
Matthew Dempsky	08d8d5c986	cmd/compile/internal/ssa: replace {Defer,Go}Call with StaticCall Passes toolstash-check -all. Change-Id: Icf8b75364e4761a5e56567f503b2c1cb17382ed2 Reviewed-on: https://go-review.googlesource.com/38080 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-13 19:44:36 +00:00

1 2

93 Commits