93 Commits

Author SHA1 Message Date
Ben Shi
2294e3ebd3 cmd/compile: combine similar code in amd64's assembly generator
This CL combines similar code in amd64's assembly generator. The
total size of pkg/linux_amd64/cmd/compile/ decreases about 4.5KB,
while the generated amd64 code is not affected.

Change-Id: I4cdbdd22bde8857aafdc29b47fa100a906fa1598
Reviewed-on: https://go-review.googlesource.com/c/140298
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-06 08:50:33 +00:00
Ben Shi
c6bf9a8109 cmd/compile: optimize AMD64's bit wise operation
Currently "arr[idx] |= 0x80" is compiled to MOVLload->BTSL->MOVLstore.
And this CL optimizes it to a single BTSLconstmodify. Other bit wise
operations with a direct memory operand are also implemented.

1. The size of the executable bin/go decreases about 4KB, and the total size
of pkg/linux_amd64 (excluding cmd/compile) decreases about 0.6KB.

2. There a little improvement in the go1 benchmark test (excluding noise).
name                     old time/op    new time/op    delta
BinaryTree17-4              2.66s ± 4%     2.66s ± 3%    ~     (p=0.596 n=49+49)
Fannkuch11-4                2.38s ± 2%     2.32s ± 2%  -2.69%  (p=0.000 n=50+50)
FmtFprintfEmpty-4          42.7ns ± 4%    43.2ns ± 7%  +1.31%  (p=0.009 n=50+50)
FmtFprintfString-4         71.0ns ± 5%    72.0ns ± 3%  +1.33%  (p=0.000 n=50+50)
FmtFprintfInt-4            80.7ns ± 4%    80.6ns ± 3%    ~     (p=0.931 n=50+50)
FmtFprintfIntInt-4          125ns ± 3%     126ns ± 4%    ~     (p=0.051 n=50+50)
FmtFprintfPrefixedInt-4     158ns ± 1%     142ns ± 3%  -9.84%  (p=0.000 n=36+50)
FmtFprintfFloat-4           215ns ± 4%     212ns ± 4%  -1.23%  (p=0.002 n=50+50)
FmtManyArgs-4               519ns ± 3%     510ns ± 3%  -1.77%  (p=0.000 n=50+50)
GobDecode-4                6.49ms ± 6%    6.52ms ± 5%    ~     (p=0.866 n=50+50)
GobEncode-4                5.93ms ± 8%    6.01ms ± 7%    ~     (p=0.076 n=50+50)
Gzip-4                      222ms ± 4%     224ms ± 8%  +0.80%  (p=0.001 n=50+50)
Gunzip-4                   36.6ms ± 5%    36.4ms ± 4%    ~     (p=0.093 n=50+50)
HTTPClientServer-4         59.1µs ± 1%    58.9µs ± 2%  -0.24%  (p=0.039 n=49+48)
JSONEncode-4               9.23ms ± 4%    9.21ms ± 5%    ~     (p=0.244 n=50+50)
JSONDecode-4               48.8ms ± 4%    48.7ms ± 4%    ~     (p=0.653 n=50+50)
Mandelbrot200-4            3.81ms ± 4%    3.80ms ± 3%    ~     (p=0.834 n=50+50)
GoParse-4                  3.20ms ± 5%    3.19ms ± 5%    ~     (p=0.494 n=50+50)
RegexpMatchEasy0_32-4      78.1ns ± 2%    77.4ns ± 3%  -0.86%  (p=0.005 n=50+50)
RegexpMatchEasy0_1K-4       233ns ± 3%     233ns ± 3%    ~     (p=0.074 n=50+50)
RegexpMatchEasy1_32-4      74.2ns ± 3%    73.4ns ± 3%  -1.06%  (p=0.000 n=50+50)
RegexpMatchEasy1_1K-4       369ns ± 2%     364ns ± 4%  -1.41%  (p=0.000 n=36+50)
RegexpMatchMedium_32-4      109ns ± 4%     107ns ± 3%  -2.06%  (p=0.001 n=50+50)
RegexpMatchMedium_1K-4     31.5µs ± 3%    30.8µs ± 3%  -2.20%  (p=0.000 n=50+50)
RegexpMatchHard_32-4       1.57µs ± 3%    1.56µs ± 2%  -0.57%  (p=0.016 n=50+50)
RegexpMatchHard_1K-4       47.4µs ± 4%    47.0µs ± 3%  -0.82%  (p=0.008 n=50+50)
Revcomp-4                   414ms ± 7%     412ms ± 7%    ~     (p=0.285 n=50+50)
Template-4                 64.3ms ± 4%    62.7ms ± 3%  -2.44%  (p=0.000 n=50+50)
TimeParse-4                 316ns ± 3%     313ns ± 3%    ~     (p=0.122 n=50+50)
TimeFormat-4                291ns ± 3%     293ns ± 3%  +0.80%  (p=0.001 n=50+50)
[Geo mean]                 46.5µs         46.2µs       -0.81%

name                     old speed      new speed      delta
GobDecode-4               118MB/s ± 6%   118MB/s ± 5%    ~     (p=0.863 n=50+50)
GobEncode-4               130MB/s ± 9%   128MB/s ± 8%    ~     (p=0.076 n=50+50)
Gzip-4                   87.4MB/s ± 4%  86.8MB/s ± 7%  -0.78%  (p=0.002 n=50+50)
Gunzip-4                  531MB/s ± 5%   533MB/s ± 4%    ~     (p=0.093 n=50+50)
JSONEncode-4              210MB/s ± 4%   211MB/s ± 5%    ~     (p=0.247 n=50+50)
JSONDecode-4             39.8MB/s ± 4%  39.9MB/s ± 4%    ~     (p=0.654 n=50+50)
GoParse-4                18.1MB/s ± 5%  18.2MB/s ± 5%    ~     (p=0.493 n=50+50)
RegexpMatchEasy0_32-4     410MB/s ± 2%   413MB/s ± 3%  +0.86%  (p=0.004 n=50+50)
RegexpMatchEasy0_1K-4    4.39GB/s ± 3%  4.38GB/s ± 3%    ~     (p=0.063 n=50+50)
RegexpMatchEasy1_32-4     432MB/s ± 3%   436MB/s ± 3%  +1.07%  (p=0.000 n=50+50)
RegexpMatchEasy1_1K-4    2.77GB/s ± 2%  2.81GB/s ± 4%  +1.46%  (p=0.000 n=36+50)
RegexpMatchMedium_32-4   9.16MB/s ± 3%  9.35MB/s ± 4%  +2.09%  (p=0.001 n=50+50)
RegexpMatchMedium_1K-4   32.5MB/s ± 3%  33.2MB/s ± 3%  +2.25%  (p=0.000 n=50+50)
RegexpMatchHard_32-4     20.4MB/s ± 3%  20.5MB/s ± 2%  +0.56%  (p=0.017 n=50+50)
RegexpMatchHard_1K-4     21.6MB/s ± 4%  21.8MB/s ± 3%  +0.83%  (p=0.008 n=50+50)
Revcomp-4                 613MB/s ± 4%   618MB/s ± 7%    ~     (p=0.152 n=48+50)
Template-4               30.2MB/s ± 4%  30.9MB/s ± 3%  +2.49%  (p=0.000 n=50+50)
[Geo mean]                127MB/s        128MB/s       +0.64%

Change-Id: If405198283855d75697f66cf894b2bef458f620e
Reviewed-on: https://go-review.googlesource.com/135422
Reviewed-by: Keith Randall <khr@golang.org>
2018-09-19 03:00:58 +00:00
Ben Shi
d17ac29158 cmd/compile: simplify AMD64's assembly generator
AMD64's ADDQconstmodify/ADDLconstmodify have similar logic with
other constmodify like operators, but seperated case statements.
This CL simplify them with a fallthrough.

Change-Id: Ia73ffeaddc5080182f68c06c9d9b48fe32a14e38
Reviewed-on: https://go-review.googlesource.com/135855
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-09-19 01:35:46 +00:00
Ben Shi
4545fea4fc cmd/compile/internal/amd64: simplify assembly generator
Merge two case-statements together, since they have similar logic.

1. That makes the assembly generator more clear.
2. The total size of cmd/compile decreases about 0.8KB.

Change-Id: I0144a07152202ee7b21e323bcd5dea80a351a6e3
Reviewed-on: https://go-review.googlesource.com/134215
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-09-08 23:59:26 +00:00
Ben Shi
3bc34385fa cmd/compile: introduce more read-modify-write operations for amd64
Add suport of read-modify-write for AND/SUB/AND/OR/XOR on amd64.

1. The total size of pkg/linux_amd64 decreases about 4KB, excluding
cmd/compile.

2. The go1 benchmark shows a little improvement, excluding noise.

name                     old time/op    new time/op    delta
BinaryTree17-4              2.63s ± 3%     2.65s ± 4%   +1.01%  (p=0.037 n=35+35)
Fannkuch11-4                2.33s ± 2%     2.39s ± 2%   +2.49%  (p=0.000 n=35+35)
FmtFprintfEmpty-4          45.4ns ± 5%    40.8ns ± 6%  -10.09%  (p=0.000 n=35+35)
FmtFprintfString-4         73.3ns ± 4%    70.9ns ± 3%   -3.23%  (p=0.000 n=30+35)
FmtFprintfInt-4            79.9ns ± 4%    79.5ns ± 3%     ~     (p=0.736 n=34+35)
FmtFprintfIntInt-4          126ns ± 4%     125ns ± 4%     ~     (p=0.083 n=35+35)
FmtFprintfPrefixedInt-4     152ns ± 6%     152ns ± 3%     ~     (p=0.855 n=34+35)
FmtFprintfFloat-4           215ns ± 4%     213ns ± 4%     ~     (p=0.066 n=35+35)
FmtManyArgs-4               522ns ± 3%     506ns ± 3%   -3.15%  (p=0.000 n=35+35)
GobDecode-4                6.45ms ± 8%    6.51ms ± 7%   +0.96%  (p=0.026 n=35+35)
GobEncode-4                6.10ms ± 6%    6.02ms ± 8%     ~     (p=0.160 n=35+35)
Gzip-4                      228ms ± 3%     221ms ± 3%   -2.92%  (p=0.000 n=35+35)
Gunzip-4                   37.5ms ± 4%    37.2ms ± 3%   -0.78%  (p=0.036 n=35+35)
HTTPClientServer-4         58.7µs ± 2%    59.2µs ± 1%   +0.80%  (p=0.000 n=33+33)
JSONEncode-4               12.0ms ± 3%    12.2ms ± 3%   +1.84%  (p=0.008 n=35+35)
JSONDecode-4               57.0ms ± 4%    56.6ms ± 3%     ~     (p=0.320 n=35+35)
Mandelbrot200-4            3.82ms ± 3%    3.79ms ± 3%     ~     (p=0.074 n=35+35)
GoParse-4                  3.21ms ± 5%    3.24ms ± 4%     ~     (p=0.119 n=35+35)
RegexpMatchEasy0_32-4      76.3ns ± 4%    75.4ns ± 4%   -1.14%  (p=0.014 n=34+33)
RegexpMatchEasy0_1K-4       251ns ± 4%     254ns ± 3%   +1.28%  (p=0.016 n=35+35)
RegexpMatchEasy1_32-4      69.6ns ± 3%    70.1ns ± 3%   +0.82%  (p=0.005 n=35+35)
RegexpMatchEasy1_1K-4       367ns ± 4%     376ns ± 4%   +2.47%  (p=0.000 n=35+35)
RegexpMatchMedium_32-4      108ns ± 5%     104ns ± 4%   -3.18%  (p=0.000 n=35+35)
RegexpMatchMedium_1K-4     33.8µs ± 3%    32.7µs ± 3%   -3.27%  (p=0.000 n=35+35)
RegexpMatchHard_32-4       1.55µs ± 3%    1.52µs ± 3%   -1.64%  (p=0.000 n=35+35)
RegexpMatchHard_1K-4       46.6µs ± 3%    46.6µs ± 4%     ~     (p=0.149 n=35+35)
Revcomp-4                   416ms ± 7%     412ms ± 6%   -0.95%  (p=0.033 n=33+35)
Template-4                 64.3ms ± 3%    62.4ms ± 7%   -2.94%  (p=0.000 n=35+35)
TimeParse-4                 320ns ± 2%     322ns ± 3%     ~     (p=0.589 n=35+35)
TimeFormat-4                300ns ± 3%     300ns ± 3%     ~     (p=0.597 n=35+35)
[Geo mean]                 47.4µs         47.0µs        -0.86%

name                     old speed      new speed      delta
GobDecode-4               119MB/s ± 7%   118MB/s ± 7%   -0.96%  (p=0.027 n=35+35)
GobEncode-4               126MB/s ± 7%   127MB/s ± 6%     ~     (p=0.157 n=34+34)
Gzip-4                   85.3MB/s ± 3%  87.9MB/s ± 3%   +3.02%  (p=0.000 n=35+35)
Gunzip-4                  518MB/s ± 4%   522MB/s ± 3%   +0.79%  (p=0.037 n=35+35)
JSONEncode-4              162MB/s ± 3%   159MB/s ± 3%   -1.81%  (p=0.009 n=35+35)
JSONDecode-4             34.1MB/s ± 4%  34.3MB/s ± 3%     ~     (p=0.318 n=35+35)
GoParse-4                18.0MB/s ± 5%  17.9MB/s ± 4%     ~     (p=0.117 n=35+35)
RegexpMatchEasy0_32-4     419MB/s ± 3%   425MB/s ± 4%   +1.46%  (p=0.003 n=32+33)
RegexpMatchEasy0_1K-4    4.07GB/s ± 4%  4.02GB/s ± 3%   -1.28%  (p=0.014 n=35+35)
RegexpMatchEasy1_32-4     460MB/s ± 3%   456MB/s ± 4%   -0.82%  (p=0.004 n=35+35)
RegexpMatchEasy1_1K-4    2.79GB/s ± 4%  2.72GB/s ± 4%   -2.39%  (p=0.000 n=35+35)
RegexpMatchMedium_32-4   9.23MB/s ± 4%  9.53MB/s ± 4%   +3.16%  (p=0.000 n=35+35)
RegexpMatchMedium_1K-4   30.3MB/s ± 3%  31.3MB/s ± 3%   +3.38%  (p=0.000 n=35+35)
RegexpMatchHard_32-4     20.7MB/s ± 3%  21.0MB/s ± 3%   +1.67%  (p=0.000 n=35+35)
RegexpMatchHard_1K-4     22.0MB/s ± 3%  21.9MB/s ± 4%     ~     (p=0.277 n=35+33)
Revcomp-4                 612MB/s ± 7%   618MB/s ± 6%   +0.96%  (p=0.034 n=33+35)
Template-4               30.2MB/s ± 3%  31.1MB/s ± 6%   +3.05%  (p=0.000 n=35+35)
[Geo mean]                123MB/s        124MB/s        +0.64%

Change-Id: Ia025da272e07d0069413824bfff3471b106d6280
Reviewed-on: https://go-review.googlesource.com/121535
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-24 23:38:25 +00:00
Ben Shi
705f3c74e6 cmd/compile: optimize AMD64 with DIVSSload and DIVSDload
DIVSSload & DIVSDload directly operate on a memory operand. And
binary size can be reduced by them, while the performance is
not affected.

The total size of pkg/linux_amd64 (excluding cmd/compile) decreases
about 6KB.

There is little regression in the go1 benchmark test (excluding noise).
name                     old time/op    new time/op    delta
BinaryTree17-4              2.63s ± 4%     2.62s ± 4%    ~     (p=0.809 n=30+30)
Fannkuch11-4                2.40s ± 2%     2.40s ± 2%    ~     (p=0.109 n=30+30)
FmtFprintfEmpty-4          43.1ns ± 4%    43.2ns ± 9%    ~     (p=0.168 n=30+30)
FmtFprintfString-4         73.6ns ± 4%    74.1ns ± 4%    ~     (p=0.069 n=30+30)
FmtFprintfInt-4            81.0ns ± 3%    81.4ns ± 5%    ~     (p=0.350 n=30+30)
FmtFprintfIntInt-4          127ns ± 4%     129ns ± 4%  +0.99%  (p=0.021 n=30+30)
FmtFprintfPrefixedInt-4     156ns ± 4%     155ns ± 4%    ~     (p=0.415 n=30+30)
FmtFprintfFloat-4           219ns ± 4%     218ns ± 4%    ~     (p=0.071 n=30+30)
FmtManyArgs-4               522ns ± 3%     518ns ± 3%  -0.68%  (p=0.034 n=30+30)
GobDecode-4                6.49ms ± 6%    6.52ms ± 6%    ~     (p=0.832 n=30+30)
GobEncode-4                6.10ms ± 9%    6.14ms ± 7%    ~     (p=0.485 n=30+30)
Gzip-4                      227ms ± 1%     224ms ± 4%    ~     (p=0.484 n=24+30)
Gunzip-4                   37.2ms ± 3%    36.8ms ± 4%    ~     (p=0.889 n=30+30)
HTTPClientServer-4         58.9µs ± 1%    58.7µs ± 2%  -0.42%  (p=0.003 n=28+28)
JSONEncode-4               12.0ms ± 3%    12.0ms ± 4%    ~     (p=0.523 n=30+30)
JSONDecode-4               54.6ms ± 4%    54.5ms ± 4%    ~     (p=0.708 n=30+30)
Mandelbrot200-4            3.78ms ± 4%    3.81ms ± 3%  +0.99%  (p=0.016 n=30+30)
GoParse-4                  3.20ms ± 4%    3.20ms ± 5%    ~     (p=0.994 n=30+30)
RegexpMatchEasy0_32-4      77.0ns ± 4%    75.9ns ± 3%  -1.39%  (p=0.006 n=29+30)
RegexpMatchEasy0_1K-4       255ns ± 4%     253ns ± 4%    ~     (p=0.091 n=30+30)
RegexpMatchEasy1_32-4      69.7ns ± 3%    70.3ns ± 4%    ~     (p=0.120 n=30+30)
RegexpMatchEasy1_1K-4       373ns ± 2%     378ns ± 3%  +1.43%  (p=0.000 n=21+26)
RegexpMatchMedium_32-4      107ns ± 2%     108ns ± 4%  +1.50%  (p=0.012 n=22+30)
RegexpMatchMedium_1K-4     34.0µs ± 1%    34.3µs ± 3%  +1.08%  (p=0.008 n=24+30)
RegexpMatchHard_32-4       1.53µs ± 3%    1.54µs ± 3%    ~     (p=0.234 n=30+30)
RegexpMatchHard_1K-4       46.7µs ± 4%    47.0µs ± 4%    ~     (p=0.420 n=30+30)
Revcomp-4                   411ms ± 7%     415ms ± 6%    ~     (p=0.059 n=30+30)
Template-4                 65.5ms ± 5%    66.9ms ± 4%  +2.21%  (p=0.001 n=30+30)
TimeParse-4                 317ns ± 3%     311ns ± 3%  -1.97%  (p=0.000 n=30+30)
TimeFormat-4                293ns ± 3%     294ns ± 3%    ~     (p=0.243 n=30+30)
[Geo mean]                 47.4µs         47.5µs       +0.17%

name                     old speed      new speed      delta
GobDecode-4               118MB/s ± 5%   118MB/s ± 6%    ~     (p=0.832 n=30+30)
GobEncode-4               125MB/s ± 7%   125MB/s ± 7%    ~     (p=0.625 n=29+30)
Gzip-4                   85.3MB/s ± 1%  86.6MB/s ± 4%    ~     (p=0.486 n=24+30)
Gunzip-4                  522MB/s ± 3%   527MB/s ± 4%    ~     (p=0.889 n=30+30)
JSONEncode-4              162MB/s ± 3%   162MB/s ± 4%    ~     (p=0.520 n=30+30)
JSONDecode-4             35.5MB/s ± 4%  35.6MB/s ± 4%    ~     (p=0.701 n=30+30)
GoParse-4                18.1MB/s ± 4%  18.1MB/s ± 4%    ~     (p=0.891 n=29+30)
RegexpMatchEasy0_32-4     416MB/s ± 4%   422MB/s ± 3%  +1.43%  (p=0.005 n=29+30)
RegexpMatchEasy0_1K-4    4.01GB/s ± 4%  4.04GB/s ± 4%    ~     (p=0.091 n=30+30)
RegexpMatchEasy1_32-4     460MB/s ± 3%   456MB/s ± 5%    ~     (p=0.123 n=30+30)
RegexpMatchEasy1_1K-4    2.74GB/s ± 2%  2.70GB/s ± 3%  -1.33%  (p=0.000 n=22+26)
RegexpMatchMedium_32-4   9.39MB/s ± 3%  9.19MB/s ± 4%  -2.06%  (p=0.001 n=28+30)
RegexpMatchMedium_1K-4   30.1MB/s ± 1%  29.8MB/s ± 3%  -1.04%  (p=0.008 n=24+30)
RegexpMatchHard_32-4     20.9MB/s ± 3%  20.8MB/s ± 3%    ~     (p=0.234 n=30+30)
RegexpMatchHard_1K-4     21.9MB/s ± 4%  21.8MB/s ± 4%    ~     (p=0.420 n=30+30)
Revcomp-4                 619MB/s ± 7%   612MB/s ± 7%    ~     (p=0.059 n=30+30)
Template-4               29.6MB/s ± 4%  29.0MB/s ± 4%  -2.16%  (p=0.002 n=30+30)
[Geo mean]                123MB/s        123MB/s       -0.33%

Change-Id: Ia59e077feae4f2824df79059daea4d0f678e3e4c
Reviewed-on: https://go-review.googlesource.com/120275
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-08-22 02:47:49 +00:00
Ben Shi
b75c5c5992 cmd/compile: optimize AMD64 with more read-modify-write operations
6 more operations which do read-modify-write with a constant
source operand are added.

1. The total size of pkg/linux_amd64 decreases about 3KB, excluding
cmd/compile.

2. The go1 benckmark shows a slight improvement.
name                     old time/op    new time/op    delta
BinaryTree17-4              2.61s ± 4%     2.67s ± 2%  +2.26%  (p=0.000 n=30+29)
Fannkuch11-4                2.39s ± 2%     2.32s ± 2%  -2.67%  (p=0.000 n=30+30)
FmtFprintfEmpty-4          44.0ns ± 4%    41.7ns ± 4%  -5.15%  (p=0.000 n=30+30)
FmtFprintfString-4         74.2ns ± 4%    72.3ns ± 4%  -2.59%  (p=0.000 n=30+30)
FmtFprintfInt-4            81.7ns ± 3%    78.8ns ± 4%  -3.54%  (p=0.000 n=27+30)
FmtFprintfIntInt-4          130ns ± 4%     124ns ± 5%  -4.60%  (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4     154ns ± 3%     152ns ± 3%  -1.13%  (p=0.012 n=30+30)
FmtFprintfFloat-4           215ns ± 4%     212ns ± 5%  -1.56%  (p=0.002 n=30+30)
FmtManyArgs-4               522ns ± 3%     512ns ± 3%  -1.84%  (p=0.001 n=30+30)
GobDecode-4                6.42ms ± 5%    6.49ms ± 7%    ~     (p=0.070 n=30+30)
GobEncode-4                6.07ms ± 8%    5.98ms ± 8%    ~     (p=0.150 n=30+30)
Gzip-4                      236ms ± 4%     223ms ± 4%  -5.57%  (p=0.000 n=30+30)
Gunzip-4                   37.4ms ± 3%    36.7ms ± 4%  -2.03%  (p=0.000 n=30+30)
HTTPClientServer-4         58.7µs ± 1%    58.5µs ± 2%  -0.37%  (p=0.018 n=30+29)
JSONEncode-4               12.0ms ± 4%    12.1ms ± 3%    ~     (p=0.112 n=30+30)
JSONDecode-4               54.5ms ± 3%    55.5ms ± 4%  +1.80%  (p=0.006 n=30+30)
Mandelbrot200-4            3.78ms ± 4%    3.78ms ± 4%    ~     (p=0.173 n=30+30)
GoParse-4                  3.16ms ± 5%    3.22ms ± 5%  +1.75%  (p=0.010 n=30+30)
RegexpMatchEasy0_32-4      76.6ns ± 1%    75.9ns ± 3%    ~     (p=0.672 n=25+30)
RegexpMatchEasy0_1K-4       252ns ± 3%     253ns ± 3%  +0.57%  (p=0.027 n=30+30)
RegexpMatchEasy1_32-4      69.8ns ± 4%    70.2ns ± 6%    ~     (p=0.539 n=30+30)
RegexpMatchEasy1_1K-4       374ns ± 3%     373ns ± 5%    ~     (p=0.263 n=30+30)
RegexpMatchMedium_32-4      107ns ± 4%     109ns ± 3%    ~     (p=0.067 n=30+30)
RegexpMatchMedium_1K-4     33.9µs ± 5%    34.1µs ± 4%    ~     (p=0.297 n=30+30)
RegexpMatchHard_32-4       1.54µs ± 3%    1.56µs ± 4%  +1.43%  (p=0.002 n=30+30)
RegexpMatchHard_1K-4       46.6µs ± 3%    47.0µs ± 3%    ~     (p=0.055 n=30+30)
Revcomp-4                   411ms ± 6%     407ms ± 6%    ~     (p=0.219 n=30+30)
Template-4                 66.8ms ± 3%    64.8ms ± 5%  -3.01%  (p=0.000 n=30+30)
TimeParse-4                 312ns ± 2%     319ns ± 3%  +2.50%  (p=0.000 n=30+30)
TimeFormat-4                296ns ± 5%     299ns ± 3%  +0.93%  (p=0.005 n=30+30)
[Geo mean]                 47.5µs         47.1µs       -0.75%

name                     old speed      new speed      delta
GobDecode-4               120MB/s ± 5%   118MB/s ± 6%    ~     (p=0.072 n=30+30)
GobEncode-4               127MB/s ± 8%   129MB/s ± 8%    ~     (p=0.150 n=30+30)
Gzip-4                   82.1MB/s ± 4%  87.0MB/s ± 4%  +5.90%  (p=0.000 n=30+30)
Gunzip-4                  519MB/s ± 4%   529MB/s ± 4%  +2.07%  (p=0.001 n=30+30)
JSONEncode-4              162MB/s ± 4%   161MB/s ± 3%    ~     (p=0.110 n=30+30)
JSONDecode-4             35.6MB/s ± 3%  35.0MB/s ± 4%  -1.77%  (p=0.007 n=30+30)
GoParse-4                18.3MB/s ± 4%  18.0MB/s ± 4%  -1.72%  (p=0.009 n=30+30)
RegexpMatchEasy0_32-4     418MB/s ± 1%   422MB/s ± 3%    ~     (p=0.645 n=25+30)
RegexpMatchEasy0_1K-4    4.06GB/s ± 3%  4.04GB/s ± 3%  -0.57%  (p=0.033 n=30+30)
RegexpMatchEasy1_32-4     459MB/s ± 4%   456MB/s ± 6%    ~     (p=0.530 n=30+30)
RegexpMatchEasy1_1K-4    2.73GB/s ± 3%  2.75GB/s ± 5%    ~     (p=0.279 n=30+30)
RegexpMatchMedium_32-4   9.28MB/s ± 5%  9.18MB/s ± 4%    ~     (p=0.086 n=30+30)
RegexpMatchMedium_1K-4   30.2MB/s ± 4%  30.0MB/s ± 4%    ~     (p=0.300 n=30+30)
RegexpMatchHard_32-4     20.8MB/s ± 3%  20.5MB/s ± 4%  -1.41%  (p=0.002 n=30+30)
RegexpMatchHard_1K-4     22.0MB/s ± 3%  21.8MB/s ± 3%    ~     (p=0.051 n=30+30)
Revcomp-4                 619MB/s ± 7%   625MB/s ± 7%    ~     (p=0.219 n=30+30)
Template-4               29.0MB/s ± 3%  29.9MB/s ± 4%  +3.11%  (p=0.000 n=30+30)
[Geo mean]                123MB/s        123MB/s       +0.28%

Change-Id: I850652cfd53329c1af804b7f57f4393d8097bb0d
Reviewed-on: https://go-review.googlesource.com/121135
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-08-20 14:18:39 +00:00
Martin Möhrmann
b40db51438 cmd/compile: split slow 3 operand LEA instructions into two LEAs
go tool objdump ../bin/go | grep "\.go\:" | grep -c "LEA.*0x.*[(].*[(].*"
Before: 1012
After: 20

Updates #21735

Benchmarks thanks to drchase@google.com
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
benchstat -geomean *.stdout | grep -v pkg:
name                                       old time/op    new time/op    delta
FastTest2KB-12                              131.0ns ± 0%   131.0ns ± 0%    ~     (all equal)
BaseTest2KB-12                              601.3ns ± 0%   601.3ns ± 0%    ~     (p=0.942 n=45+41)
Encoding4KBVerySparse-12                    15.38µs ± 1%   15.41µs ± 0%  +0.24%  (p=0.000 n=44+43)
Join_8-12                                    2.117s ± 2%    2.128s ± 2%  +0.51%  (p=0.007 n=48+48)
HashimotoLight-12                           1.663ms ± 1%   1.668ms ± 1%  +0.35%  (p=0.006 n=49+49)
Sha3_224_MTU-12                             4.843µs ± 0%   4.836µs ± 0%  -0.14%  (p=0.000 n=45+42)
GenSharedKeyP256-12                         73.74µs ± 0%   70.92µs ± 0%  -3.82%  (p=0.000 n=48+47)
Run/10k/1-12                                 24.81s ± 0%    24.88s ± 0%  +0.30%  (p=0.000 n=48+47)
Run/10k/16-12                                4.621s ± 2%    4.625s ± 3%    ~     (p=0.776 n=50+50)
Dnrm2MediumPosInc-12                        4.018µs ± 0%   4.019µs ± 0%    ~     (p=0.060 n=50+48)
DasumMediumUnitaryInc-12                    855.3ns ± 0%   855.0ns ± 0%  -0.03%  (p=0.000 n=47+42)
Dgeev/Circulant10-12                        40.45µs ± 1%   41.11µs ± 1%  +1.62%  (p=0.000 n=45+49)
Dgeev/Circulant100-12                       10.42ms ± 2%   10.61ms ± 1%  +1.83%  (p=0.000 n=49+48)
MulWorkspaceDense1000Hundredth-12           64.69ms ± 1%   64.63ms ± 1%    ~     (p=0.718 n=48+48)
ScaleVec10000Inc20-12                       22.31µs ± 1%   22.29µs ± 1%    ~     (p=0.424 n=50+50)
ValidateVersionTildeFail-12                 737.6ns ± 1%   736.0ns ± 1%  -0.22%  (p=0.000 n=49+49)
StripHTML-12                                2.846µs ± 0%   2.806µs ± 1%  -1.40%  (p=0.000 n=43+50)
ReaderContains-12                           6.073µs ± 0%   5.999µs ± 0%  -1.22%  (p=0.000 n=48+48)
EncodeCodecFromInternalProtobuf-12          5.817µs ± 2%   5.555µs ± 2%  -4.51%  (p=0.000 n=47+47)
TarjanSCCGnp_10_tenth-12                    7.091µs ± 5%   7.132µs ± 7%    ~     (p=0.361 n=50+50)
TarjanSCCGnp_1000_half-12                   82.25ms ± 3%   81.29ms ± 2%  -1.16%  (p=0.000 n=50+43)
AStarUndirectedmallWorld_10_2_2_2_Heur-12   15.18µs ± 8%   15.11µs ± 7%    ~     (p=0.511 n=50+49)
LouvainDirectedMultiplex-12                 20.92ms ± 1%   21.00ms ± 1%  +0.36%  (p=0.000 n=48+49)
WalkAllBreadthFirstGnp_10_tenth-12          2.974µs ± 4%   2.964µs ± 5%    ~     (p=0.504 n=50+50)
WalkAllBreadthFirstGnp_1000_tenth-12        9.733ms ± 4%   9.741ms ± 4%    ~     (p=0.774 n=48+50)
TextMovementBetweenSegments-12              432.8µs ± 0%   433.2µs ± 1%    ~     (p=0.128 n=50+50)
Growth_MultiSegment-12                      13.11ms ± 0%   13.19ms ± 1%  +0.58%  (p=0.000 n=44+46)
AddingFields/Zap.Sugar-12                   1.296µs ± 1%   1.310µs ± 2%  +1.09%  (p=0.000 n=43+43)
AddingFields/apex/log-12                    34.19µs ± 1%   34.31µs ± 1%  +0.35%  (p=0.000 n=45+45)
AddingFields/inconshreveable/log15-12       30.08µs ± 2%   30.07µs ± 2%    ~     (p=0.803 n=48+47)
AddingFields/sirupsen/logrus-12             6.683µs ± 3%   6.735µs ± 3%  +0.78%  (p=0.000 n=43+42)

[Geo mean]                                  143.5µs        143.3µs       -0.16%

Change-Id: I637203c75c837737f1febced75d5985703e51044
Reviewed-on: https://go-review.googlesource.com/114655
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-08-20 14:17:20 +00:00
Keith Randall
f32348669c cmd/compile: rename memory-using operations
Some *mem ops are loads, some are stores, some are modifications.
Replace mem->load for the loads.
Replace mem->store for the stores.
Replace mem->modify for the load-modify-stores.

The only semantic change in this CL is to mark
ADD(Q|L)constmodify (which used to be ADD(Q|L)constmem) as
both a read and a write, instead of just a write. This is arguably
a bug fix, but the bug isn't triggerable at the moment, see CL 112157.

Change-Id: Iccb45aea817b606adb2d712ff99b10ee28e4616a
Reviewed-on: https://go-review.googlesource.com/112159
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-08 19:16:50 +00:00
Josh Bleecher Snyder
8bf4b7e673 cmd/compile: convert amd64 BSFL and BSRL from tuple to result only
Change-Id: I220a459f67ecb310b6e9a526a1ff55527d421e70
Reviewed-on: https://go-review.googlesource.com/109416
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-26 19:29:17 +00:00
Josh Bleecher Snyder
22115859a5 cmd/compile: add amd64 LEAL{1,2,4,8} ops
For future use in rewrite rules.

Change-Id: Ic9875beb0dea6e0bbcbd4b75d99a53f4a9a7c3fd
Reviewed-on: https://go-review.googlesource.com/101275
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-23 21:42:28 +00:00
Austin Clements
8871c930be cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.

This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.

The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.

For #24543.

Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 18:46:39 +00:00
David Chase
ab48574bab cmd/compile: use Block.Likely to put likely branch first
When a neither of a conditional block's successors follows,
the block must end with a conditional branch followed by a
an unconditional branch.  If the (conditional) branch is
"unlikely", invert it and swap successors to make it
likely instead.

This doesn't matter to most benchmarks on amd64, but in one
instance on amd64 it caused a 30% improvement, and it is
otherwise harmless.  The problematic loop is

		for i := 0; i < w; i++ {
			if pw[i] != 0 {
				return true
			}
		}

compiled under GOEXPERIMENT=preemptibleloops
This the very worst-case benchmark for that experiment.

Also in this CL is a commoning up of heavily-repeated
boilerplate, which made it much easier to see that the
changes were applied correctly.  In the future this should
allow un-exporting of SSAGenState.Branches once the
boilerplate-replacement is done everywhere.

Change-Id: I0e5ded6eeb3ab1e3e0138e12d54c7e056bd99335
Reviewed-on: https://go-review.googlesource.com/104977
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-11 17:35:34 +00:00
Giovanni Bajo
79112707bb cmd/compile: add patterns for bit set/clear/complement on amd64
This patch completes implementation of BT(Q|L), and adds support
for BT(S|R|C)(Q|L).

Example of code changes from time.(*Time).addSec:

        if t.wall&hasMonotonic != 0 {
  0x1073465               488b08                  MOVQ 0(AX), CX
  0x1073468               4889ca                  MOVQ CX, DX
  0x107346b               48c1e93f                SHRQ $0x3f, CX
  0x107346f               48c1e13f                SHLQ $0x3f, CX
  0x1073473               48f7c1ffffffff          TESTQ $-0x1, CX
  0x107347a               746b                    JE 0x10734e7

        if t.wall&hasMonotonic != 0 {
  0x1073435               488b08                  MOVQ 0(AX), CX
  0x1073438               480fbae13f              BTQ $0x3f, CX
  0x107343d               7363                    JAE 0x10734a2

Another example:

                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x10734c8               4881e1ffffff3f          ANDQ $0x3fffffff, CX
  0x10734cf               48c1e61e                SHLQ $0x1e, SI
  0x10734d3               4809ce                  ORQ CX, SI
  0x10734d6               48b90000000000000080    MOVQ $0x8000000000000000, CX
  0x10734e0               4809f1                  ORQ SI, CX
  0x10734e3               488908                  MOVQ CX, 0(AX)

                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x107348b		4881e2ffffff3f		ANDQ $0x3fffffff, DX
  0x1073492		48c1e61e		SHLQ $0x1e, SI
  0x1073496		4809f2			ORQ SI, DX
  0x1073499		480fbaea3f		BTSQ $0x3f, DX
  0x107349e		488910			MOVQ DX, 0(AX)

Go1 benchmarks seem unaffected, and I would be surprised
otherwise:

name                     old time/op    new time/op     delta
BinaryTree17-4              2.64s ± 4%      2.56s ± 9%  -2.92%  (p=0.008 n=9+9)
Fannkuch11-4                2.90s ± 1%      2.95s ± 3%  +1.76%  (p=0.010 n=10+9)
FmtFprintfEmpty-4          35.3ns ± 1%     34.5ns ± 2%  -2.34%  (p=0.004 n=9+8)
FmtFprintfString-4         57.0ns ± 1%     58.4ns ± 5%  +2.52%  (p=0.029 n=9+10)
FmtFprintfInt-4            59.8ns ± 3%     59.8ns ± 6%    ~     (p=0.565 n=10+10)
FmtFprintfIntInt-4         93.9ns ± 3%     91.2ns ± 5%  -2.94%  (p=0.014 n=10+9)
FmtFprintfPrefixedInt-4     107ns ± 6%      104ns ± 6%    ~     (p=0.099 n=10+10)
FmtFprintfFloat-4           187ns ± 3%      188ns ± 3%    ~     (p=0.505 n=10+9)
FmtManyArgs-4               410ns ± 1%      415ns ± 6%    ~     (p=0.649 n=8+10)
GobDecode-4                5.30ms ± 3%     5.27ms ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4                4.62ms ± 5%     4.47ms ± 2%  -3.24%  (p=0.001 n=9+10)
Gzip-4                      197ms ± 4%      193ms ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                   30.4ms ± 3%     30.1ms ± 3%    ~     (p=0.481 n=10+10)
HTTPClientServer-4         76.3µs ± 1%     76.0µs ± 1%    ~     (p=0.236 n=8+9)
JSONEncode-4               10.5ms ± 9%     10.3ms ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4               42.3ms ±10%     41.3ms ± 2%    ~     (p=0.053 n=9+10)
Mandelbrot200-4            3.80ms ± 2%     3.72ms ± 2%  -2.15%  (p=0.001 n=9+10)
GoParse-4                  2.88ms ±10%     2.81ms ± 2%    ~     (p=0.247 n=10+10)
RegexpMatchEasy0_32-4      69.5ns ± 4%     68.6ns ± 2%    ~     (p=0.171 n=10+10)
RegexpMatchEasy0_1K-4       165ns ± 3%      162ns ± 3%    ~     (p=0.137 n=10+10)
RegexpMatchEasy1_32-4      65.7ns ± 6%     64.4ns ± 2%  -2.02%  (p=0.037 n=10+10)
RegexpMatchEasy1_1K-4       278ns ± 2%      279ns ± 3%    ~     (p=0.991 n=8+9)
RegexpMatchMedium_32-4     99.3ns ± 3%     98.5ns ± 4%    ~     (p=0.457 n=10+9)
RegexpMatchMedium_1K-4     30.1µs ± 1%     30.4µs ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4       1.40µs ± 2%     1.41µs ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4       42.5µs ± 1%     41.5µs ± 3%  -2.13%  (p=0.002 n=8+9)
Revcomp-4                   332ms ± 4%      328ms ± 5%    ~     (p=0.720 n=9+10)
Template-4                 48.3ms ± 2%     49.6ms ± 3%  +2.56%  (p=0.002 n=8+10)
TimeParse-4                 252ns ± 2%      249ns ± 3%    ~     (p=0.116 n=9+10)
TimeFormat-4                262ns ± 4%      252ns ± 3%  -4.01%  (p=0.000 n=9+10)

name                     old speed      new speed       delta
GobDecode-4               145MB/s ± 3%    146MB/s ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4               166MB/s ± 5%    172MB/s ± 2%  +3.28%  (p=0.001 n=9+10)
Gzip-4                   98.6MB/s ± 4%  100.4MB/s ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                  639MB/s ± 3%    645MB/s ± 3%    ~     (p=0.481 n=10+10)
JSONEncode-4              185MB/s ± 8%    189MB/s ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4             46.0MB/s ± 9%   47.0MB/s ± 2%  +2.21%  (p=0.046 n=9+10)
GoParse-4                20.1MB/s ± 9%   20.6MB/s ± 2%    ~     (p=0.239 n=10+10)
RegexpMatchEasy0_32-4     460MB/s ± 4%    467MB/s ± 2%    ~     (p=0.165 n=10+10)
RegexpMatchEasy0_1K-4    6.19GB/s ± 3%   6.28GB/s ± 3%    ~     (p=0.165 n=10+10)
RegexpMatchEasy1_32-4     487MB/s ± 5%    497MB/s ± 2%  +2.00%  (p=0.043 n=10+10)
RegexpMatchEasy1_1K-4    3.67GB/s ± 2%   3.67GB/s ± 3%    ~     (p=0.963 n=8+9)
RegexpMatchMedium_32-4   10.1MB/s ± 3%   10.1MB/s ± 4%    ~     (p=0.435 n=10+9)
RegexpMatchMedium_1K-4   34.0MB/s ± 1%   33.7MB/s ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4     22.9MB/s ± 2%   22.7MB/s ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4     24.0MB/s ± 3%   24.7MB/s ± 3%  +2.64%  (p=0.001 n=9+9)
Revcomp-4                 766MB/s ± 4%    775MB/s ± 5%    ~     (p=0.720 n=9+10)
Template-4               40.2MB/s ± 2%   39.2MB/s ± 3%  -2.47%  (p=0.002 n=8+10)

The rules match ~1800 times during all.bash.

Fixes #18943

Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4
Reviewed-on: https://go-review.googlesource.com/94766
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-24 02:38:50 +00:00
Giovanni Bajo
a35ec9a59e cmd/compile: implement CMOV on amd64
This builds upon the branchelim pass, activating it for amd64 and
lowering CondSelect. Special care is made to FPU instructions for
NaN handling.

Benchmark results on Xeon E5630 (Westmere EP):

name                      old time/op    new time/op    delta
BinaryTree17-16              4.99s ± 9%     4.66s ± 2%     ~     (p=0.095 n=5+5)
Fannkuch11-16                4.93s ± 3%     5.04s ± 2%     ~     (p=0.548 n=5+5)
FmtFprintfEmpty-16          58.8ns ± 7%    61.4ns ±14%     ~     (p=0.579 n=5+5)
FmtFprintfString-16          114ns ± 2%     114ns ± 4%     ~     (p=0.603 n=5+5)
FmtFprintfInt-16             181ns ± 4%     125ns ± 3%  -30.90%  (p=0.008 n=5+5)
FmtFprintfIntInt-16          263ns ± 2%     217ns ± 2%  -17.34%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-16     230ns ± 1%     212ns ± 1%   -7.99%  (p=0.008 n=5+5)
FmtFprintfFloat-16           411ns ± 3%     344ns ± 5%  -16.43%  (p=0.008 n=5+5)
FmtManyArgs-16               828ns ± 4%     790ns ± 2%   -4.59%  (p=0.032 n=5+5)
GobDecode-16                10.9ms ± 4%    10.8ms ± 5%     ~     (p=0.548 n=5+5)
GobEncode-16                9.52ms ± 5%    9.46ms ± 2%     ~     (p=1.000 n=5+5)
Gzip-16                      334ms ± 2%     337ms ± 2%     ~     (p=0.548 n=5+5)
Gunzip-16                   64.4ms ± 1%    65.0ms ± 1%   +1.00%  (p=0.008 n=5+5)
HTTPClientServer-16          156µs ± 3%     155µs ± 3%     ~     (p=0.690 n=5+5)
JSONEncode-16               21.0ms ± 1%    21.8ms ± 0%   +3.76%  (p=0.016 n=5+4)
JSONDecode-16               95.1ms ± 0%    95.7ms ± 1%     ~     (p=0.151 n=5+5)
Mandelbrot200-16            6.38ms ± 1%    6.42ms ± 1%     ~     (p=0.095 n=5+5)
GoParse-16                  5.47ms ± 2%    5.36ms ± 1%   -1.95%  (p=0.016 n=5+5)
RegexpMatchEasy0_32-16       111ns ± 1%     111ns ± 1%     ~     (p=0.635 n=5+4)
RegexpMatchEasy0_1K-16       408ns ± 1%     411ns ± 2%     ~     (p=0.087 n=5+5)
RegexpMatchEasy1_32-16       103ns ± 1%     104ns ± 1%     ~     (p=0.484 n=5+5)
RegexpMatchEasy1_1K-16       659ns ± 2%     652ns ± 1%     ~     (p=0.571 n=5+5)
RegexpMatchMedium_32-16      176ns ± 2%     174ns ± 1%     ~     (p=0.476 n=5+5)
RegexpMatchMedium_1K-16     58.6µs ± 4%    57.7µs ± 4%     ~     (p=0.548 n=5+5)
RegexpMatchHard_32-16       3.07µs ± 3%    3.04µs ± 4%     ~     (p=0.421 n=5+5)
RegexpMatchHard_1K-16       89.2µs ± 1%    87.9µs ± 2%   -1.52%  (p=0.032 n=5+5)
Revcomp-16                   575ms ± 0%     587ms ± 2%   +2.12%  (p=0.032 n=4+5)
Template-16                  110ms ± 1%     107ms ± 3%   -3.00%  (p=0.032 n=5+5)
TimeParse-16                 463ns ± 0%     462ns ± 0%     ~     (p=0.810 n=5+4)
TimeFormat-16                538ns ± 0%     535ns ± 0%   -0.63%  (p=0.024 n=5+5)

name                      old speed      new speed      delta
GobDecode-16              70.7MB/s ± 4%  71.4MB/s ± 5%     ~     (p=0.452 n=5+5)
GobEncode-16              80.7MB/s ± 5%  81.2MB/s ± 2%     ~     (p=1.000 n=5+5)
Gzip-16                   58.2MB/s ± 2%  57.7MB/s ± 2%     ~     (p=0.452 n=5+5)
Gunzip-16                  302MB/s ± 1%   299MB/s ± 1%   -0.99%  (p=0.008 n=5+5)
JSONEncode-16             92.4MB/s ± 1%  89.1MB/s ± 0%   -3.63%  (p=0.016 n=5+4)
JSONDecode-16             20.4MB/s ± 0%  20.3MB/s ± 1%     ~     (p=0.135 n=5+5)
GoParse-16                10.6MB/s ± 2%  10.8MB/s ± 1%   +2.00%  (p=0.016 n=5+5)
RegexpMatchEasy0_32-16     286MB/s ± 1%   285MB/s ± 3%     ~     (p=1.000 n=5+5)
RegexpMatchEasy0_1K-16    2.51GB/s ± 1%  2.49GB/s ± 2%     ~     (p=0.095 n=5+5)
RegexpMatchEasy1_32-16     309MB/s ± 1%   307MB/s ± 1%     ~     (p=0.548 n=5+5)
RegexpMatchEasy1_1K-16    1.55GB/s ± 2%  1.57GB/s ± 1%     ~     (p=0.690 n=5+5)
RegexpMatchMedium_32-16   5.68MB/s ± 2%  5.73MB/s ± 1%     ~     (p=0.579 n=5+5)
RegexpMatchMedium_1K-16   17.5MB/s ± 4%  17.8MB/s ± 4%     ~     (p=0.500 n=5+5)
RegexpMatchHard_32-16     10.4MB/s ± 3%  10.5MB/s ± 4%     ~     (p=0.460 n=5+5)
RegexpMatchHard_1K-16     11.5MB/s ± 1%  11.7MB/s ± 2%   +1.57%  (p=0.032 n=5+5)
Revcomp-16                 442MB/s ± 0%   433MB/s ± 2%   -2.05%  (p=0.032 n=4+5)
Template-16               17.7MB/s ± 1%  18.2MB/s ± 3%   +3.12%  (p=0.032 n=5+5)

Change-Id: I6972e8f35f2b31f9a42ac473a6bf419a18022558
Reviewed-on: https://go-review.googlesource.com/100935
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-15 16:41:59 +00:00
Ilya Tocar
644d14ea0f Revert "cmd/compile: implement CMOV on amd64"
This reverts commit 080187f4f72bd6594e3c2efc35cf51bf61378552.

It broke build of golang.org/x/exp/shiny/iconvg
See issue 24395 for details

Change-Id: Ifd6134f6214e6cee40bd3c63c32941d5fc96ae8b
Reviewed-on: https://go-review.googlesource.com/100755
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-14 21:21:23 +00:00
Giovanni Bajo
eb44b8635c cmd/compile: remove BTQconst rule
This rule is meant for code optimization, but it makes other rules
potentially more complex, as they need to cope with the fact that
a 32-bit op (BTLconst) can appear everywhere a 64-bit rule maches.

Move the optimization to opcode expansion instead. Tests will be
added in following CL.

Change-Id: Ica5ef291e7963c4af17c124d4a2869e6c8f7b0c7
Reviewed-on: https://go-review.googlesource.com/99995
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-13 23:21:38 +00:00
isharipo
85a8d25d53 cmd/compile/internal/ssa: emit IMUL3{L/Q} for MUL{L/Q}const on x86
cmd/asm now supports three-operand form of IMUL,
so instead of using IMUL with resultInArg0, emit IMUL3 instruction.

This results in less redundant MOVs where SSA assigns
different registers to input[0] and dst arguments.

Note: these have exactly the same encoding when reg0=reg1:
      IMUL3x $const, reg0, reg1
      IMULx $const, reg
Two-operand IMULx is like a crippled IMUL3x, with dst fixed to input[0].
This is why we don't bother to generate IMULx for the case where
dst is the same as input[0].

Change-Id: I4becda475b3dffdd07b6fdf1c75bacc82af654e4
Reviewed-on: https://go-review.googlesource.com/99656
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-12 19:02:36 +00:00
Giovanni Bajo
080187f4f7 cmd/compile: implement CMOV on amd64
This builds upon the branchelim pass, activating it for amd64 and
lowering CondSelect. Special care is made to FPU instructions for
NaN handling.

Benchmark results on Xeon E5630 (Westmere EP):

name                      old time/op    new time/op    delta
BinaryTree17-16              4.99s ± 9%     4.66s ± 2%     ~     (p=0.095 n=5+5)
Fannkuch11-16                4.93s ± 3%     5.04s ± 2%     ~     (p=0.548 n=5+5)
FmtFprintfEmpty-16          58.8ns ± 7%    61.4ns ±14%     ~     (p=0.579 n=5+5)
FmtFprintfString-16          114ns ± 2%     114ns ± 4%     ~     (p=0.603 n=5+5)
FmtFprintfInt-16             181ns ± 4%     125ns ± 3%  -30.90%  (p=0.008 n=5+5)
FmtFprintfIntInt-16          263ns ± 2%     217ns ± 2%  -17.34%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-16     230ns ± 1%     212ns ± 1%   -7.99%  (p=0.008 n=5+5)
FmtFprintfFloat-16           411ns ± 3%     344ns ± 5%  -16.43%  (p=0.008 n=5+5)
FmtManyArgs-16               828ns ± 4%     790ns ± 2%   -4.59%  (p=0.032 n=5+5)
GobDecode-16                10.9ms ± 4%    10.8ms ± 5%     ~     (p=0.548 n=5+5)
GobEncode-16                9.52ms ± 5%    9.46ms ± 2%     ~     (p=1.000 n=5+5)
Gzip-16                      334ms ± 2%     337ms ± 2%     ~     (p=0.548 n=5+5)
Gunzip-16                   64.4ms ± 1%    65.0ms ± 1%   +1.00%  (p=0.008 n=5+5)
HTTPClientServer-16          156µs ± 3%     155µs ± 3%     ~     (p=0.690 n=5+5)
JSONEncode-16               21.0ms ± 1%    21.8ms ± 0%   +3.76%  (p=0.016 n=5+4)
JSONDecode-16               95.1ms ± 0%    95.7ms ± 1%     ~     (p=0.151 n=5+5)
Mandelbrot200-16            6.38ms ± 1%    6.42ms ± 1%     ~     (p=0.095 n=5+5)
GoParse-16                  5.47ms ± 2%    5.36ms ± 1%   -1.95%  (p=0.016 n=5+5)
RegexpMatchEasy0_32-16       111ns ± 1%     111ns ± 1%     ~     (p=0.635 n=5+4)
RegexpMatchEasy0_1K-16       408ns ± 1%     411ns ± 2%     ~     (p=0.087 n=5+5)
RegexpMatchEasy1_32-16       103ns ± 1%     104ns ± 1%     ~     (p=0.484 n=5+5)
RegexpMatchEasy1_1K-16       659ns ± 2%     652ns ± 1%     ~     (p=0.571 n=5+5)
RegexpMatchMedium_32-16      176ns ± 2%     174ns ± 1%     ~     (p=0.476 n=5+5)
RegexpMatchMedium_1K-16     58.6µs ± 4%    57.7µs ± 4%     ~     (p=0.548 n=5+5)
RegexpMatchHard_32-16       3.07µs ± 3%    3.04µs ± 4%     ~     (p=0.421 n=5+5)
RegexpMatchHard_1K-16       89.2µs ± 1%    87.9µs ± 2%   -1.52%  (p=0.032 n=5+5)
Revcomp-16                   575ms ± 0%     587ms ± 2%   +2.12%  (p=0.032 n=4+5)
Template-16                  110ms ± 1%     107ms ± 3%   -3.00%  (p=0.032 n=5+5)
TimeParse-16                 463ns ± 0%     462ns ± 0%     ~     (p=0.810 n=5+4)
TimeFormat-16                538ns ± 0%     535ns ± 0%   -0.63%  (p=0.024 n=5+5)

name                      old speed      new speed      delta
GobDecode-16              70.7MB/s ± 4%  71.4MB/s ± 5%     ~     (p=0.452 n=5+5)
GobEncode-16              80.7MB/s ± 5%  81.2MB/s ± 2%     ~     (p=1.000 n=5+5)
Gzip-16                   58.2MB/s ± 2%  57.7MB/s ± 2%     ~     (p=0.452 n=5+5)
Gunzip-16                  302MB/s ± 1%   299MB/s ± 1%   -0.99%  (p=0.008 n=5+5)
JSONEncode-16             92.4MB/s ± 1%  89.1MB/s ± 0%   -3.63%  (p=0.016 n=5+4)
JSONDecode-16             20.4MB/s ± 0%  20.3MB/s ± 1%     ~     (p=0.135 n=5+5)
GoParse-16                10.6MB/s ± 2%  10.8MB/s ± 1%   +2.00%  (p=0.016 n=5+5)
RegexpMatchEasy0_32-16     286MB/s ± 1%   285MB/s ± 3%     ~     (p=1.000 n=5+5)
RegexpMatchEasy0_1K-16    2.51GB/s ± 1%  2.49GB/s ± 2%     ~     (p=0.095 n=5+5)
RegexpMatchEasy1_32-16     309MB/s ± 1%   307MB/s ± 1%     ~     (p=0.548 n=5+5)
RegexpMatchEasy1_1K-16    1.55GB/s ± 2%  1.57GB/s ± 1%     ~     (p=0.690 n=5+5)
RegexpMatchMedium_32-16   5.68MB/s ± 2%  5.73MB/s ± 1%     ~     (p=0.579 n=5+5)
RegexpMatchMedium_1K-16   17.5MB/s ± 4%  17.8MB/s ± 4%     ~     (p=0.500 n=5+5)
RegexpMatchHard_32-16     10.4MB/s ± 3%  10.5MB/s ± 4%     ~     (p=0.460 n=5+5)
RegexpMatchHard_1K-16     11.5MB/s ± 1%  11.7MB/s ± 2%   +1.57%  (p=0.032 n=5+5)
Revcomp-16                 442MB/s ± 0%   433MB/s ± 2%   -2.05%  (p=0.032 n=4+5)
Template-16               17.7MB/s ± 1%  18.2MB/s ± 3%   +3.12%  (p=0.032 n=5+5)

Change-Id: Ic7cb7374d07da031e771bdcbfdd832fd1b17159c
Reviewed-on: https://go-review.googlesource.com/98695
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-03-12 18:01:33 +00:00
Keith Randall
4b00d3f4a2 cmd/compile: implement comparisons directly with memory
Allow the compiler to generate code like CMPQ 16(AX), $7

It's tricky because it's difficult to spill such a comparison during
flagalloc, because the same memory state might not be available at
the restore locations.

Solve this problem by decomposing the compare+load back into its parts
if it needs to be spilled.

The big win is that the write barrier test goes from:

MOVL	runtime.writeBarrier(SB), CX
TESTL	CX, CX
JNE	60

to

CMPL	runtime.writeBarrier(SB), $0
JNE	59

It's one instruction and one byte smaller.

Fixes #19485
Fixes #15245
Update #22460

Binaries are about 0.15% smaller.

Change-Id: I4fd8d1111b6b9924d52f9a0901ca1b2e5cce0836
Reviewed-on: https://go-review.googlesource.com/86035
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-02-26 23:49:44 +00:00
Ilya Tocar
f4d9c30901 cmd/compile/internal/amd64: use appropriate NEG for div
Currently we generate NEGQ for DIV{Q,L,W}. By generating NEGL and NEGW,
we will reduce code size, because NEGL doesn't require rex prefix.
This also guarantees that upper 32 bits are zeroed, so we can revert CL 85736,
and remove zero-extensions of DIVL results.
Also adds test for redundant zero extend elimination.

Fixes #23310

Change-Id: Ic58c3104c255a71371a06e09d10a975bbe5df587
Reviewed-on: https://go-review.googlesource.com/96815
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-26 20:09:21 +00:00
Ilya Tocar
de4edf3de7 cmd/compile/internal/amd64: update popcnt code generation
Popcnt has false dependency on output register and generates
MOVQ $0, reg to break it. But recently we switched MOVQ $0, reg
encoding from xor reg, reg  to actual mov $0, reg. This CL updates
code generation for popcnt to use actual XOR.

Change-Id: I4c1fc11e85758b53ba2679165fa55614ec54b27d
Reviewed-on: https://go-review.googlesource.com/82516
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-14 19:56:57 +00:00
Austin Clements
9b331189c1 cmd/internal/obj/x86: adjust SP correctly for tail calls
Currently, tail calls on x86 don't adjust the SP on return, so it's
important that the compiler produce a zero-sized frame and disable the
frame pointer. However, these constraints aren't necessary. For
example, on other architectures it's generally necessary to restore
the saved LR before a tail call, so obj simply makes this work.
Likewise, on x86, there's no reason we can't simply make this work.

Hence, this CL adjusts the compiler to use the same tail call
convention for x86 that we use on LR machines by producing a RET with
a target, rather than a JMP with a target. In fact, obj already
understands this convention for x86 except that it's buggy with
non-zero frame sizes. So we also fix this bug obj. As a result of
these fixes, the compiler no longer needs to mark wrappers as
NoFramePointer since it's now perfectly fine to save the frame
pointer.

In fact, this eliminates the only use of NoFramePointer in the
compiler, which will enable further cleanups.

This also fixes what is very nearly, but not quite, a code generation
bug. NoFramePointer becomes obj.NOFRAME in the object file, which on
ppc64 and s390x means to omit the saved LR. Hence, on these
architectures, NoFramePointer (and NOFRAME) is only safe to set on
leaf functions. However, on *most* architectures, wrappers aren't
necessarily leaf functions because they may call DUFFZERO. We're saved
on ppc64 and s390x only because the compiler doesn't have the rules to
produce DUFFZERO calls on these architectures. Hence, this only works
because the set of LR architectures that implement NOFRAME is disjoint
from the set where the compiler produces DUFFZERO operations. (I
discovered this whole mess when I attempted to add NOFRAME support to
arm.)

Change-Id: Icc589aeb86beacb850d0a6a80bd3024974a33947
Reviewed-on: https://go-review.googlesource.com/92035
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-12 21:41:19 +00:00
Ilya Tocar
2f1f607b21 cmd/compile: intrinsify math.RoundToEven on amd64
We already do this for floor/ceil, but RoundToEven was added later.
Intrinsify it also.

name           old time/op  new time/op  delta
RoundToEven-8  3.00ns ± 1%  0.68ns ± 2%  -77.34%  (p=0.000 n=10+10)

Change-Id: Ib158cbceb436c6725b2d9353a526c5c4be19bcad
Reviewed-on: https://go-review.googlesource.com/74852
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-11-02 17:33:52 +00:00
Ilya Tocar
94484d8ed5 cmd/compile: intrinsify math.{Trunc/Ceil/Floor} on amd64
This significantly speed-ups Trunc.
Ceil/Floor are using the same instruction, so do them too.

name     old time/op  new time/op  delta
Floor-6  3.33ns ± 1%  3.22ns ± 0%   -3.39%  (p=0.000 n=10+10)
Ceil-6   3.33ns ± 1%  3.22ns ± 0%   -3.16%  (p=0.000 n=10+7)
Trunc-6  4.83ns ± 0%  3.22ns ± 0%  -33.36%  (p=0.000 n=6+8)

Change-Id: If848790e458eedfe38a6a0407bb4f589c68ac254
Reviewed-on: https://go-review.googlesource.com/68630
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-31 19:30:54 +00:00
Austin Clements
7e343134d3 cmd/compile: compiler support for buffered write barrier
This CL implements the compiler support for calling the buffered write
barrier added by the previous CL.

Since the buffered write barrier is only implemented on amd64 right
now, this still supports the old, eager write barrier as well. There's
little overhead to supporting both and this way a few tests in
test/fixedbugs that expect to have liveness maps at write barrier
calls can easily opt-in to the old, eager barrier.

This significantly improves the performance of the write barrier:

name             old time/op  new time/op  delta
WriteBarrier-12  73.5ns ±20%  19.2ns ±27%  -73.90%  (p=0.000 n=19+18)

It also reduces the size of binaries because the write barrier call is
more compact:

name        old object-bytes  new object-bytes  delta
Template           398k ± 0%         393k ± 0%  -1.14%  (p=0.008 n=5+5)
Unicode            208k ± 0%         206k ± 0%  -1.00%  (p=0.008 n=5+5)
GoTypes           1.18M ± 0%        1.15M ± 0%  -2.00%  (p=0.008 n=5+5)
Compiler          4.05M ± 0%        3.88M ± 0%  -4.26%  (p=0.008 n=5+5)
SSA               8.25M ± 0%        8.11M ± 0%  -1.59%  (p=0.008 n=5+5)
Flate              228k ± 0%         224k ± 0%  -1.83%  (p=0.008 n=5+5)
GoParser           295k ± 0%         284k ± 0%  -3.62%  (p=0.008 n=5+5)
Reflect           1.00M ± 0%        0.99M ± 0%  -0.70%  (p=0.008 n=5+5)
Tar                339k ± 0%         333k ± 0%  -1.67%  (p=0.008 n=5+5)
XML                404k ± 0%         395k ± 0%  -2.10%  (p=0.008 n=5+5)
[Geo mean]         704k              690k       -2.00%

name        old exe-bytes     new exe-bytes     delta
HelloSize         1.05M ± 0%        1.04M ± 0%  -1.55%  (p=0.008 n=5+5)

https://perf.golang.org/search?q=upload:20171027.1

(Amusingly, this also reduces compiler allocations by 0.75%, which,
combined with the better write barrier, speeds up the compiler overall
by 2.10%. See the perf link.)

It slightly improves the performance of most of the go1 benchmarks and
improves the performance of the x/benchmarks:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.40s ± 1%     2.47s ± 1%  +2.69%  (p=0.000 n=19+19)
Fannkuch11-12                2.95s ± 0%     2.95s ± 0%  +0.21%  (p=0.000 n=20+19)
FmtFprintfEmpty-12          41.8ns ± 4%    41.4ns ± 2%  -1.03%  (p=0.014 n=20+20)
FmtFprintfString-12         68.7ns ± 2%    67.5ns ± 1%  -1.75%  (p=0.000 n=20+17)
FmtFprintfInt-12            79.0ns ± 3%    77.1ns ± 1%  -2.40%  (p=0.000 n=19+17)
FmtFprintfIntInt-12          127ns ± 1%     123ns ± 3%  -3.42%  (p=0.000 n=20+20)
FmtFprintfPrefixedInt-12     152ns ± 1%     150ns ± 1%  -1.02%  (p=0.000 n=18+17)
FmtFprintfFloat-12           211ns ± 1%     209ns ± 0%  -0.99%  (p=0.000 n=20+16)
FmtManyArgs-12               500ns ± 0%     496ns ± 0%  -0.73%  (p=0.000 n=17+20)
GobDecode-12                6.44ms ± 1%    6.53ms ± 0%  +1.28%  (p=0.000 n=20+19)
GobEncode-12                5.46ms ± 0%    5.46ms ± 1%    ~     (p=0.550 n=19+20)
Gzip-12                      220ms ± 1%     216ms ± 0%  -1.75%  (p=0.000 n=19+19)
Gunzip-12                   38.8ms ± 0%    38.6ms ± 0%  -0.30%  (p=0.000 n=18+19)
HTTPClientServer-12         79.0µs ± 1%    78.2µs ± 1%  -1.01%  (p=0.000 n=20+20)
JSONEncode-12               11.9ms ± 0%    11.9ms ± 0%  -0.29%  (p=0.000 n=20+19)
JSONDecode-12               52.6ms ± 0%    52.2ms ± 0%  -0.68%  (p=0.000 n=19+20)
Mandelbrot200-12            3.69ms ± 0%    3.68ms ± 0%  -0.36%  (p=0.000 n=20+20)
GoParse-12                  3.13ms ± 1%    3.18ms ± 1%  +1.67%  (p=0.000 n=19+20)
RegexpMatchEasy0_32-12      73.2ns ± 1%    72.3ns ± 1%  -1.19%  (p=0.000 n=19+18)
RegexpMatchEasy0_1K-12       241ns ± 0%     239ns ± 0%  -0.83%  (p=0.000 n=17+16)
RegexpMatchEasy1_32-12      68.6ns ± 1%    69.0ns ± 1%  +0.47%  (p=0.015 n=18+16)
RegexpMatchEasy1_1K-12       364ns ± 0%     361ns ± 0%  -0.67%  (p=0.000 n=16+17)
RegexpMatchMedium_32-12      104ns ± 1%     103ns ± 1%  -0.79%  (p=0.001 n=20+15)
RegexpMatchMedium_1K-12     33.8µs ± 3%    34.0µs ± 2%    ~     (p=0.267 n=20+19)
RegexpMatchHard_32-12       1.64µs ± 1%    1.62µs ± 2%  -1.25%  (p=0.000 n=19+18)
RegexpMatchHard_1K-12       49.2µs ± 0%    48.7µs ± 1%  -0.93%  (p=0.000 n=19+18)
Revcomp-12                   391ms ± 5%     396ms ± 7%    ~     (p=0.154 n=19+19)
Template-12                 63.1ms ± 0%    59.5ms ± 0%  -5.76%  (p=0.000 n=18+19)
TimeParse-12                 307ns ± 0%     306ns ± 0%  -0.39%  (p=0.000 n=19+17)
TimeFormat-12                325ns ± 0%     323ns ± 0%  -0.50%  (p=0.000 n=19+19)
[Geo mean]                  47.3µs         46.9µs       -0.67%

https://perf.golang.org/search?q=upload:20171026.1

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.25ms ± 1%  2.20ms ± 1%  -2.31%  (p=0.000 n=18+18)
HTTP-12                    12.6µs ± 0%  12.6µs ± 0%  -0.72%  (p=0.000 n=18+17)
JSON-12                    11.0ms ± 0%  11.0ms ± 1%  -0.68%  (p=0.000 n=17+19)

https://perf.golang.org/search?q=upload:20171026.2

Updates #14951.
Updates #22460.

Change-Id: Id4c0932890a1d41020071bec73b8522b1367d3e7
Reviewed-on: https://go-review.googlesource.com/73712
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-30 18:12:46 +00:00
Matthew Dempsky
a5868a47c6 cmd/internal/obj/x86: move MOV->XOR rewriting into compiler
Fixes #20986.

Change-Id: Ic3cf5c0ab260f259ecff7b92cfdf5f4ae432aef3
Reviewed-on: https://go-review.googlesource.com/73072
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-10-24 21:32:17 +00:00
Cherry Zhang
6f3e5e637c cmd/compile: intrinsify runtime.getcallersp
Add a compiler intrinsic for getcallersp. So we are able to get
rid of the argument (not done in this CL).

Change-Id: Ic38fda1c694f918328659ab44654198fb116668d
Reviewed-on: https://go-review.googlesource.com/69350
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
2017-10-10 15:15:21 +00:00
Ilya Tocar
6b8a3c8889 cmd/compile/internal/amd64: add SETccmem
Combine setcc and store of result into setcc that writes directly to memory.
Triggers 200+ times in go tool.

Fixes #21630

Change-Id: Iafa22607426f4120140c88fae4b9aecb46e0bba8
Reviewed-on: https://go-review.googlesource.com/67950
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-05 20:53:28 +00:00
David Chase
6cac100eef cmd/compile: add intrinsic for reading caller's pc
First step towards removing the mandatory argument for
getcallerpc, which solves certain problems for the runtime.
This might also slightly improve performance.

Intrinsic enabled on 386, amd64, amd64p32,
runtime asm implementation removed on those architectures.

Now-superfluous argument remains in getcallerpc signature
(for a future CL; non-386/amd64 asm funcs ignore it).

Added getcallerpc to the "not a real function" test
in dcl.go, that story is a little odd with respect to
unexported functions but that is not this CL.

Fixes #17327.

Change-Id: I5df1ad91f27ee9ac1f0dd88fa48f1329d6306c3e
Reviewed-on: https://go-review.googlesource.com/31851
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-09-22 18:37:03 +00:00
Ilya Tocar
a4e1a72f0a cmd/compile/internal/amd64: add MOVLloadidx8 and MOVLstoreidx8
Currently we only use 1 and 4 as a scale for indexed 4-byte load.
In code generated in #20711 we can use indexed load with scale=8,
to improve performance:

name  old time/op  new time/op  delta
GM-6   108µs ± 0%    95µs ± 0%  -12.06%  (p=0.000 n=10+10)

So add new ops and combine loadidx1(shift 3..).. into loadidx8,
same for stores.

Change-Id: I5ed1c250ac40960e20606580cf9de221e75b72f1
Reviewed-on: https://go-review.googlesource.com/46134
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-09-01 21:25:25 +00:00
Daniel Martí
99da8730b0 all: remove some double spaces from comments
Went mainly for the ones that make no sense, such as the ones
mid-sentence or after commas.

Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0
Reviewed-on: https://go-review.googlesource.com/57293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-26 15:09:09 +00:00
Josh Bleecher Snyder
a45d6859cb cmd/compile: enforce that MOVXconvert is a no-op on 386 and amd64
Follow-up to CL 58371.

Change-Id: I3d2aaec84ee6db3ef1bd4fcfcaf46cc297c7176b
Reviewed-on: https://go-review.googlesource.com/58610
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-25 19:39:46 +00:00
Keith Randall
fb05948d9e cmd/compile,math: improve code generation for math.Abs
Implement int reg <-> fp reg moves on amd64.
If we see a load to int reg followed by an int->fp move, then we can just
load to the fp reg instead.  Same for stores.

math.Abs is now:

MOVQ	"".x+8(SP), AX
SHLQ	$1, AX
SHRQ	$1, AX
MOVQ	AX, "".~r1+16(SP)

math.Copysign is now:

MOVQ	"".x+8(SP), AX
SHLQ	$1, AX
SHRQ	$1, AX
MOVQ	"".y+16(SP), CX
SHRQ	$63, CX
SHLQ	$63, CX
ORQ	CX, AX
MOVQ	AX, "".~r2+24(SP)

math.Float64bits is now:

MOVSD	"".x+8(SP), X0
MOVSD	X0, "".~r1+16(SP)
(it would be nicer to use a non-SSE reg for this, nothing is perfect)

And due to the fix for #21440, the inlined version of these improve as well.

name      old time/op  new time/op  delta
Abs       1.38ns ± 5%  0.89ns ±10%  -35.54%  (p=0.000 n=10+10)
Copysign  1.56ns ± 7%  1.35ns ± 6%  -13.77%  (p=0.000 n=9+10)

Fixes #13095

Change-Id: Ibd7f2792412a6668608780b0688a77062e1f1499
Reviewed-on: https://go-review.googlesource.com/58732
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-08-25 19:15:01 +00:00
Ilya Tocar
ac29f4d01c cmd/compile/internal/amd64: add ADD[Q|L]constmem
We can add a constant to loaction in memory with 1 instruction,
as opposed to load+add+store, so add a new op and relevent ssa rules.
Triggers in e. g. encoding/json isValidNumber:
NumberIsValid-6          36.4ns ± 0%    35.2ns ± 1%  -3.32%  (p=0.000 n=6+10)
Shaves ~2.5 kb from go tool.

Change-Id: I7ba576676c2522432360f77b290cecb9574a93c3
Reviewed-on: https://go-review.googlesource.com/54431
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-18 18:55:44 +00:00
Ilya Tocar
df70982825 cmd/compile/internal/ssa: use sse to zero on amd64
Use 16-byte stores instead of 8-byte stores to zero small blocks.
Also switch to duffzero for 65+ bytes only, because for each
duffzero call we also save/restore BP, so call requires 4 instructions
and replacing it with 4 sse stores doesn't cause code-bloat.
Also switch duffzero to use leaq, instead of addq to avoid clobbering flags.

ClearFat8-6     0.54ns ± 0%  0.54ns ± 0%     ~     (all equal)
ClearFat12-6    1.07ns ± 0%  1.07ns ± 0%     ~     (all equal)
ClearFat16-6    1.07ns ± 0%  0.69ns ± 0%  -35.51%  (p=0.001 n=8+9)
ClearFat24-6    1.61ns ± 1%  1.07ns ± 0%  -33.33%  (p=0.000 n=10+10)
ClearFat32-6    2.14ns ± 0%  1.07ns ± 0%  -50.00%  (p=0.001 n=8+9)
ClearFat40-6    2.67ns ± 1%  1.61ns ± 0%  -39.72%  (p=0.000 n=10+8)
ClearFat48-6    3.75ns ± 0%  2.68ns ± 0%  -28.59%  (p=0.000 n=9+9)
ClearFat56-6    4.29ns ± 0%  3.22ns ± 0%  -25.10%  (p=0.000 n=9+9)
ClearFat64-6    4.30ns ± 0%  3.22ns ± 0%  -25.15%  (p=0.000 n=8+8)
ClearFat128-6   7.50ns ± 1%  7.51ns ± 0%     ~     (p=0.767 n=10+9)
ClearFat256-6   13.9ns ± 1%  13.9ns ± 1%     ~     (p=0.257 n=10+10)
ClearFat512-6   26.8ns ± 0%  26.8ns ± 0%     ~     (p=0.467 n=8+8)
ClearFat1024-6  52.5ns ± 0%  52.5ns ± 0%     ~     (p=1.000 n=8+8)

Also shaves ~20kb from go tool:

go_old 10384994
go_new 10364514 [-20480 bytes]

section differences
global text (code) = -20585 bytes (-0.532047%)
read-only data = -302 bytes (-0.018101%)
Total difference -20887 bytes (-0.348731%)

Change-Id: I15854e87544545c1af24775df895e38e16e12694
Reviewed-on: https://go-review.googlesource.com/54410
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-16 15:52:27 +00:00
Josh Bleecher Snyder
46b88c9fbc cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.

In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.

Though this is a big CL, most of the changes are
mechanical and uninteresting.

Interesting bits:

* Add new singleton globals to package types for the special
  SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
  and TTUPLE, for SSA tuple types.
  ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
  to package types.
* We had picked the name "types" in our rules for the handy
  list of types provided by ssa.Config. That conflicted with
  the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
  types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
  and probably also some other duplicated Type methods
  designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
  and they were not particularly careful about types in general.
  Of necessity, this CL switches them to use *types.Type;
  it does not make them more type-accurate.
  Unfortunately, using types.Type means initializing a bit
  of the types universe.
  This is prime for refactoring and improvement.

This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.

name        old alloc/op      new alloc/op      delta
Template         37.9MB ± 0%       37.7MB ± 0%  -0.57%  (p=0.000 n=10+8)
Unicode          28.9MB ± 0%       28.7MB ± 0%  -0.52%  (p=0.000 n=10+10)
GoTypes           110MB ± 0%        109MB ± 0%  -0.88%  (p=0.000 n=10+10)
Flate            24.7MB ± 0%       24.6MB ± 0%  -0.66%  (p=0.000 n=10+10)
GoParser         31.1MB ± 0%       30.9MB ± 0%  -0.61%  (p=0.000 n=10+9)
Reflect          73.9MB ± 0%       73.4MB ± 0%  -0.62%  (p=0.000 n=10+8)
Tar              25.8MB ± 0%       25.6MB ± 0%  -0.77%  (p=0.000 n=9+10)
XML              41.2MB ± 0%       40.9MB ± 0%  -0.80%  (p=0.000 n=10+10)
[Geo mean]       40.5MB            40.3MB       -0.68%

name        old allocs/op     new allocs/op     delta
Template           385k ± 0%         386k ± 0%    ~     (p=0.356 n=10+9)
Unicode            343k ± 1%         344k ± 0%    ~     (p=0.481 n=10+10)
GoTypes           1.16M ± 0%        1.16M ± 0%  -0.16%  (p=0.004 n=10+10)
Flate              238k ± 1%         238k ± 1%    ~     (p=0.853 n=10+10)
GoParser           320k ± 0%         320k ± 0%    ~     (p=0.720 n=10+9)
Reflect            957k ± 0%         957k ± 0%    ~     (p=0.460 n=10+8)
Tar                252k ± 0%         252k ± 0%    ~     (p=0.133 n=9+10)
XML                400k ± 0%         400k ± 0%    ~     (p=0.796 n=10+10)
[Geo mean]         428k              428k       -0.01%


Removing all the interface calls helps non-trivially with CPU, though.

name        old time/op       new time/op       delta
Template          178ms ± 4%        173ms ± 3%  -2.90%  (p=0.000 n=94+96)
Unicode          85.0ms ± 4%       83.9ms ± 4%  -1.23%  (p=0.000 n=96+96)
GoTypes           543ms ± 3%        528ms ± 3%  -2.73%  (p=0.000 n=98+96)
Flate             116ms ± 3%        113ms ± 4%  -2.34%  (p=0.000 n=96+99)
GoParser          144ms ± 3%        140ms ± 4%  -2.80%  (p=0.000 n=99+97)
Reflect           344ms ± 3%        334ms ± 4%  -3.02%  (p=0.000 n=100+99)
Tar               106ms ± 5%        103ms ± 4%  -3.30%  (p=0.000 n=98+94)
XML               198ms ± 5%        192ms ± 4%  -2.88%  (p=0.000 n=92+95)
[Geo mean]        178ms             173ms       -2.65%

name        old user-time/op  new user-time/op  delta
Template          229ms ± 5%        224ms ± 5%  -2.36%  (p=0.000 n=95+99)
Unicode           107ms ± 6%        106ms ± 5%  -1.13%  (p=0.001 n=93+95)
GoTypes           696ms ± 4%        679ms ± 4%  -2.45%  (p=0.000 n=97+99)
Flate             137ms ± 4%        134ms ± 5%  -2.66%  (p=0.000 n=99+96)
GoParser          176ms ± 5%        172ms ± 8%  -2.27%  (p=0.000 n=98+100)
Reflect           430ms ± 6%        411ms ± 5%  -4.46%  (p=0.000 n=100+92)
Tar               128ms ±13%        123ms ±13%  -4.21%  (p=0.000 n=100+100)
XML               239ms ± 6%        233ms ± 6%  -2.50%  (p=0.000 n=95+97)
[Geo mean]        220ms             213ms       -2.76%


Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-09 23:01:51 +00:00
Martin Möhrmann
f9bec9eb42 cmd/compile: use MOVL instead of MOVQ for small constants on amd64
The encoding of MOVL to a register is 2 bytes shorter than for MOVQ.
The upper 32bit are automatically zeroed when MOVL to a register is used.

Replaces 1657 MOVQ by MOVL in the go binary.
Reduces go binary size by 4 kilobyte.

name                   old time/op    new time/op    delta
BinaryTree17              1.93s ± 0%     1.93s ± 0%  -0.32%  (p=0.000 n=9+9)
Fannkuch11                2.66s ± 0%     2.48s ± 0%  -6.60%  (p=0.000 n=9+9)
FmtFprintfEmpty          31.8ns ± 0%    31.6ns ± 0%  -0.63%  (p=0.000 n=10+10)
FmtFprintfString         52.0ns ± 0%    51.9ns ± 0%  -0.19%  (p=0.000 n=10+10)
FmtFprintfInt            55.6ns ± 0%    54.6ns ± 0%  -1.80%  (p=0.002 n=8+10)
FmtFprintfIntInt         87.7ns ± 0%    84.8ns ± 0%  -3.31%  (p=0.000 n=9+9)
FmtFprintfPrefixedInt    98.9ns ± 0%   102.0ns ± 0%  +3.10%  (p=0.000 n=10+10)
FmtFprintfFloat           165ns ± 0%     164ns ± 0%  -0.61%  (p=0.000 n=10+10)
FmtManyArgs               368ns ± 0%     361ns ± 0%  -1.98%  (p=0.000 n=8+10)
GobDecode                4.53ms ± 0%    4.58ms ± 0%  +1.08%  (p=0.000 n=9+10)
GobEncode                3.74ms ± 0%    3.73ms ± 0%  -0.27%  (p=0.000 n=10+10)
Gzip                      164ms ± 0%     163ms ± 0%  -0.48%  (p=0.000 n=10+10)
Gunzip                   26.7ms ± 0%    26.6ms ± 0%  -0.13%  (p=0.000 n=9+10)
HTTPClientServer         30.4µs ± 1%    30.3µs ± 1%  -0.41%  (p=0.016 n=10+10)
JSONEncode               10.9ms ± 0%    11.0ms ± 0%  +0.70%  (p=0.000 n=10+10)
JSONDecode               36.8ms ± 0%    37.0ms ± 0%  +0.59%  (p=0.000 n=9+10)
Mandelbrot200            3.20ms ± 0%    3.21ms ± 0%  +0.44%  (p=0.000 n=9+10)
GoParse                  2.35ms ± 0%    2.35ms ± 0%  +0.26%  (p=0.000 n=10+9)
RegexpMatchEasy0_32      58.3ns ± 0%    58.4ns ± 0%  +0.17%  (p=0.000 n=10+10)
RegexpMatchEasy0_1K       138ns ± 0%     142ns ± 0%  +2.68%  (p=0.000 n=10+10)
RegexpMatchEasy1_32      55.1ns ± 0%    55.6ns ± 1%    ~     (p=0.104 n=10+10)
RegexpMatchEasy1_1K       242ns ± 0%     243ns ± 0%  +0.41%  (p=0.000 n=10+10)
RegexpMatchMedium_32     87.4ns ± 0%    89.9ns ± 0%  +2.86%  (p=0.000 n=10+10)
RegexpMatchMedium_1K     27.4µs ± 0%    27.4µs ± 0%  +0.15%  (p=0.000 n=10+10)
RegexpMatchHard_32       1.30µs ± 0%    1.32µs ± 1%  +1.91%  (p=0.000 n=10+10)
RegexpMatchHard_1K       39.0µs ± 0%    39.5µs ± 0%  +1.38%  (p=0.000 n=10+10)
Revcomp                   316ms ± 0%     319ms ± 0%  +1.13%  (p=0.000 n=9+8)
Template                 40.6ms ± 0%    40.6ms ± 0%    ~     (p=0.123 n=10+10)
TimeParse                 224ns ± 0%     224ns ± 0%    ~     (all equal)
TimeFormat                230ns ± 0%     225ns ± 0%  -2.17%  (p=0.000 n=10+10)

Change-Id: I32a099b65f9e6d4ad7288ed48546655c534757d8
Reviewed-on: https://go-review.googlesource.com/38630
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-01 20:59:58 +00:00
Josh Bleecher Snyder
dae5389d3d Revert "cmd/compile: add Type.MustSize and Type.MustAlignment"
This reverts commit 94d540a4b6bf68ec472bf4469037955e3133fcf7.

Reason for revert: prefer something along the lines of CL 42018.

Change-Id: I876fe32e98f37d8d725fe55e0fd0ea429c0198e0
Reviewed-on: https://go-review.googlesource.com/42022
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-28 01:24:13 +00:00
Josh Bleecher Snyder
94d540a4b6 cmd/compile: add Type.MustSize and Type.MustAlignment
Type.Size and Type.Alignment are for the front end:
They calculate size and alignment if needed.

Type.MustSize and Type.MustAlignment are for the back end:
They call Fatal if size and alignment are not already calculated.

Most uses are of MustSize and MustAlignment,
but that's because the back end is newer,
and this API was added to support it.

This CL was mostly generated with sed and selective reversion.
The only mildly interesting bit is the change of the ssa.Type interface
and the supporting ssa dummy types.

Follow-up to review feedback on CL 41970.

Passes toolstash-check.

Change-Id: I0d9b9505e57453dae8fb6a236a07a7a02abd459e
Reviewed-on: https://go-review.googlesource.com/42016
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-27 22:57:57 +00:00
Keith Randall
1e72bf6218 cmd/compile: experiment which clobbers all dead pointer fields
The experiment "clobberdead" clobbers all pointer fields that the
compiler thinks are dead, just before and after every safepoint.
Useful for debugging the generation of live pointer bitmaps.

Helped find the following issues:
Update #15936
Update #16026
Update #16095
Update #18860

Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9
Reviewed-on: https://go-review.googlesource.com/23924
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-21 20:19:50 +00:00
Keith Randall
7e07e635f3 cmd/compile: implement non-constant rotates
Makes math/bits.Rotate{Left,Right} fast on amd64.

name              old time/op  new time/op  delta
RotateLeft-12     7.42ns ± 6%  5.45ns ± 6%  -26.54%   (p=0.000 n=9+10)
RotateLeft8-12    4.77ns ± 5%  3.42ns ± 7%  -28.25%   (p=0.000 n=8+10)
RotateLeft16-12   4.82ns ± 8%  3.40ns ± 7%  -29.36%  (p=0.000 n=10+10)
RotateLeft32-12   4.87ns ± 7%  3.48ns ± 7%  -28.51%    (p=0.000 n=8+9)
RotateLeft64-12   5.23ns ±10%  3.35ns ± 6%  -35.97%   (p=0.000 n=9+10)
RotateRight-12    7.59ns ± 8%  5.71ns ± 1%  -24.72%   (p=0.000 n=10+8)
RotateRight8-12   4.98ns ± 7%  3.36ns ± 9%  -32.55%  (p=0.000 n=10+10)
RotateRight16-12  5.12ns ± 2%  3.45ns ± 5%  -32.62%  (p=0.000 n=10+10)
RotateRight32-12  4.80ns ± 6%  3.42ns ±16%  -28.68%  (p=0.000 n=10+10)
RotateRight64-12  4.78ns ± 6%  3.42ns ± 6%  -28.50%  (p=0.000 n=10+10)

Update #18940

Change-Id: Ie79fb5581c489ed4d3b859314c5e669a134c119b
Reviewed-on: https://go-review.googlesource.com/39711
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-17 23:19:45 +00:00
Keith Randall
5cadc91b3c cmd/compile: intrinsics for math/bits.OnesCount
Popcount instructions on amd64 are not guaranteed to be
present, so we must guard their call.  Rewrite rules can't
generate control flow at the moment, so the intrinsifier
needs to generate that code.

name           old time/op  new time/op  delta
OnesCount-8    2.47ns ± 5%  1.04ns ± 2%  -57.70%  (p=0.000 n=10+10)
OnesCount16-8  1.05ns ± 1%  0.78ns ± 0%  -25.56%    (p=0.000 n=9+8)
OnesCount32-8  1.63ns ± 5%  1.04ns ± 2%  -35.96%  (p=0.000 n=10+10)
OnesCount64-8  2.45ns ± 0%  1.04ns ± 1%  -57.55%   (p=0.000 n=6+10)

Update #18616

Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673
Reviewed-on: https://go-review.googlesource.com/38320
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-04-04 02:40:11 +00:00
Keith Randall
214be5b302 cmd/compile: remove likely bits from generated assembly
We don't need them any more since #15837 was fixed.

Fixes #19718

Change-Id: I13e46c62b321b2c9265f44c977b63bfb23163ca2
Reviewed-on: https://go-review.googlesource.com/38664
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-26 04:40:20 +00:00
Matthew Dempsky
22ea7fc1a9 cmd/compile/internal/gc: make SSAGenFPJump a method of SSAGenState
Change-Id: Ie22a08c93dfcfd4b336e7b158415448dd55b2c11
Reviewed-on: https://go-review.googlesource.com/38407
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-22 17:46:08 +00:00
Josh Bleecher Snyder
0a94daa378 cmd/compile: funnel SSA Prog creation through SSAGenState
Step one in eliminating Prog-related globals.

Passes toolstash-check -all.

Updates #15756

Change-Id: I3b777fb5a7716f2d9da3067fbd94c28ca894a465
Reviewed-on: https://go-review.googlesource.com/38450
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-22 17:18:40 +00:00
Matthew Dempsky
3e2f980e27 cmd/compile: eliminate direct uses of gc.Thearch in backends
This CL changes the GOARCH.Init functions to take gc.Thearch as a
parameter, which gc.Main supplies.

Additionally, the x86 backend is refactored to decide within Init
whether to use the 387 or SSE2 instruction generators, rather than for
each individual SSA Value/Block.

Passes toolstash-check -all.

Change-Id: Ie6305a6cd6f6ab4e89ecbb3cbbaf5ffd57057a24
Reviewed-on: https://go-review.googlesource.com/38301
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-17 22:10:53 +00:00
Keith Randall
495b167919 cmd/compile: intrinsics for math/bits.{Len,LeadingZeros}
name              old time/op  new time/op  delta
LeadingZeros-4    2.00ns ± 0%  1.34ns ± 1%  -33.02%  (p=0.000 n=8+10)
LeadingZeros16-4  1.62ns ± 0%  1.57ns ± 0%   -3.09%  (p=0.001 n=8+9)
LeadingZeros32-4  2.14ns ± 0%  1.48ns ± 0%  -30.84%  (p=0.002 n=8+10)
LeadingZeros64-4  2.06ns ± 1%  1.33ns ± 0%  -35.08%  (p=0.000 n=8+8)

8-bit args is a special case - the Go code is really fast because
it is just a single table lookup.  So I've disabled that for now.
Intrinsics were actually slower:
LeadingZeros8-4   1.22ns ± 3%  1.58ns ± 1%  +29.56%  (p=0.000 n=10+10)

Update #18616

Change-Id: Ia9c289b9ba59c583ea64060470315fd637e814cf
Reviewed-on: https://go-review.googlesource.com/38311
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-16 22:53:49 +00:00
Matthew Dempsky
118b3fe7bb cmd/compile/internal/gc: refactor ACALL Prog creation
This abstracts creation of ACALL Progs into package gc. The main
benefit of this today is we can refactor away a lot of common
boilerplate code.

Later, once liveness analysis happens on the SSA graph, this will also
provide an easy insertion point for emitting the PCDATA Progs
immediately before call instructions.

Passes toolstash-check -all.

Change-Id: Ia15108ace97201cd84314f1ca916dfeb4f09d61c
Reviewed-on: https://go-review.googlesource.com/38081
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 21:04:16 +00:00
Matthew Dempsky
08d8d5c986 cmd/compile/internal/ssa: replace {Defer,Go}Call with StaticCall
Passes toolstash-check -all.

Change-Id: Icf8b75364e4761a5e56567f503b2c1cb17382ed2
Reviewed-on: https://go-review.googlesource.com/38080
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 19:44:36 +00:00