145 Commits

Author SHA1 Message Date
Josh Bleecher Snyder
3d6647d6f8 cmd/compile: improve regalloc live values debug printing
Before:

live values at end of each block
  b1: v3 v2 v7 avoid=0
  b2: v3 v13 avoid=81
  b3: v19[AX] v3 avoid=81
  b6: avoid=0
  b7: avoid=0
  b5: avoid=0
  b4: v3 v18 avoid=81

After:

live values at end of each block
  b1: v3 v2 v7
  b2: v3 v13 avoid=AX DI
  b3: v19[AX] v3 avoid=AX DI
  b6:
  b7:
  b5:
  b4: v3 v18 avoid=AX DI

Change-Id: Ibec5c76a16151832b8d49a21c640699fdc9a9d28
Reviewed-on: https://go-review.googlesource.com/109000
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-24 17:45:48 +00:00
Austin Clements
8871c930be cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.

This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.

The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.

For #24543.

Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 18:46:39 +00:00
David Chase
b9a365681f cmd/compile: adjust is-statement on Pos's to improve debugging
Stores to auto tmp variables can be hoisted to places
where the line numbers make debugging look "jumpy".
Turning those instructions into ones with is_stmt = 0 in
the DWARF (accomplished by marking ssa nodes with NotStmt)
makes debugging look better while still attributing the
instructions with the correct line number.

The same is true for certain register allocator spills and
reloads.

Change-Id: I97a394eb522d4911cc40b4bf5bf76d3d7221f6c0
Reviewed-on: https://go-review.googlesource.com/98415
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-04 22:14:59 +00:00
Josh Bleecher Snyder
014a9048d4 cmd/compile: prefer to evict a rematerializable register
This resolves a long-standing regalloc TODO:
If you must evict a register, choose to evict a register
containing a rematerializable value, since that value
won't need to be spilled.

Provides very minor performance and size improvements.

name                     old time/op    new time/op    delta
BinaryTree17-8              2.20s ± 3%     2.18s ± 2%  -0.77%  (p=0.000 n=45+49)
Fannkuch11-8                2.14s ± 2%     2.15s ± 2%  +0.73%  (p=0.000 n=43+44)
FmtFprintfEmpty-8          30.6ns ± 4%    30.2ns ± 3%  -1.14%  (p=0.000 n=50+48)
FmtFprintfString-8         54.5ns ± 6%    53.6ns ± 5%  -1.64%  (p=0.001 n=50+48)
FmtFprintfInt-8            58.0ns ± 7%    57.6ns ± 4%    ~     (p=0.220 n=50+50)
FmtFprintfIntInt-8         85.3ns ± 2%    84.8ns ± 3%  -0.62%  (p=0.001 n=44+47)
FmtFprintfPrefixedInt-8    93.9ns ± 6%    93.6ns ± 5%    ~     (p=0.706 n=50+48)
FmtFprintfFloat-8           178ns ± 4%     177ns ± 4%    ~     (p=0.107 n=49+50)
FmtManyArgs-8               376ns ± 4%     374ns ± 3%  -0.58%  (p=0.013 n=45+50)
GobDecode-8                4.77ms ± 2%    4.76ms ± 3%    ~     (p=0.059 n=47+46)
GobEncode-8                4.04ms ± 2%    3.99ms ± 3%  -1.13%  (p=0.000 n=49+49)
Gzip-8                      177ms ± 2%     180ms ± 3%  +1.43%  (p=0.000 n=48+48)
Gunzip-8                   28.5ms ± 6%    28.3ms ± 5%    ~     (p=0.104 n=50+49)
HTTPClientServer-8         72.1µs ± 1%    72.0µs ± 1%  -0.15%  (p=0.042 n=48+42)
JSONEncode-8               9.81ms ± 5%   10.03ms ± 6%  +2.29%  (p=0.000 n=50+49)
JSONDecode-8               39.2ms ± 3%    39.3ms ± 2%    ~     (p=0.095 n=49+49)
Mandelbrot200-8            3.48ms ± 2%    3.46ms ± 2%  -0.80%  (p=0.000 n=47+48)
GoParse-8                  2.54ms ± 3%    2.51ms ± 3%  -1.35%  (p=0.000 n=49+49)
RegexpMatchEasy0_32-8      66.0ns ± 7%    65.7ns ± 8%    ~     (p=0.331 n=50+50)
RegexpMatchEasy0_1K-8       155ns ± 4%     154ns ± 4%    ~     (p=0.986 n=49+50)
RegexpMatchEasy1_32-8      62.6ns ± 8%    62.2ns ± 5%    ~     (p=0.395 n=50+49)
RegexpMatchEasy1_1K-8       260ns ± 5%     255ns ± 3%  -1.92%  (p=0.000 n=49+49)
RegexpMatchMedium_32-8     92.9ns ± 2%    91.8ns ± 2%  -1.25%  (p=0.000 n=46+48)
RegexpMatchMedium_1K-8     27.7µs ± 3%    27.0µs ± 2%  -2.59%  (p=0.000 n=49+49)
RegexpMatchHard_32-8       1.23µs ± 4%    1.21µs ± 2%  -2.16%  (p=0.000 n=49+44)
RegexpMatchHard_1K-8       36.4µs ± 2%    35.7µs ± 2%  -1.87%  (p=0.000 n=48+49)
Revcomp-8                   274ms ± 2%     276ms ± 3%  +0.70%  (p=0.034 n=45+48)
Template-8                 45.1ms ± 8%    45.1ms ± 8%    ~     (p=0.643 n=50+50)
TimeParse-8                 223ns ± 2%     223ns ± 2%    ~     (p=0.401 n=47+47)
TimeFormat-8                245ns ± 2%     246ns ± 3%    ~     (p=0.758 n=49+50)
[Geo mean]                 36.5µs         36.3µs       -0.54%


name        old object-bytes  new object-bytes  delta
Template          480kB ± 0%        480kB ± 0%    ~     (all equal)
Unicode           214kB ± 0%        214kB ± 0%    ~     (all equal)
GoTypes          1.54MB ± 0%       1.54MB ± 0%  -0.03%  (p=0.008 n=5+5)
Compiler         5.75MB ± 0%       5.75MB ± 0%    ~     (all equal)
SSA              14.6MB ± 0%       14.6MB ± 0%  -0.01%  (p=0.008 n=5+5)
Flate             300kB ± 0%        300kB ± 0%  -0.01%  (p=0.008 n=5+5)
GoParser          366kB ± 0%        366kB ± 0%    ~     (all equal)
Reflect          1.20MB ± 0%       1.20MB ± 0%    ~     (all equal)
Tar               413kB ± 0%        413kB ± 0%    ~     (all equal)
XML               529kB ± 0%        528kB ± 0%  -0.13%  (p=0.008 n=5+5)
[Geo mean]        909kB             909kB       -0.02%


Change-Id: I46d37a55197683a98913f35801dc2b0d609653c8
Reviewed-on: https://go-review.googlesource.com/103240
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-29 18:47:18 +00:00
Josh Bleecher Snyder
dafca7de0f cmd/compile: prefer rematerialization to copying
Fixes #24132


name                     old time/op    new time/op    delta
BinaryTree17-8              2.18s ± 2%     2.15s ± 2%  -1.28%  (p=0.000 n=25+26)
Fannkuch11-8                2.16s ± 3%     2.13s ± 3%  -1.54%  (p=0.000 n=27+30)
FmtFprintfEmpty-8          29.9ns ± 3%    29.6ns ± 3%  -1.08%  (p=0.001 n=29+26)
FmtFprintfString-8         53.6ns ± 2%    54.0ns ± 4%    ~     (p=0.193 n=28+29)
FmtFprintfInt-8            56.8ns ± 3%    57.0ns ± 3%    ~     (p=0.330 n=29+29)
FmtFprintfIntInt-8         85.3ns ± 2%    85.8ns ± 3%  +0.56%  (p=0.042 n=30+29)
FmtFprintfPrefixedInt-8    94.1ns ± 5%    99.0ns ± 8%  +5.20%  (p=0.000 n=27+30)
FmtFprintfFloat-8           183ns ± 4%     182ns ± 3%    ~     (p=0.619 n=30+26)
FmtManyArgs-8               369ns ± 2%     369ns ± 2%    ~     (p=0.748 n=27+29)
GobDecode-8                4.78ms ± 2%    4.75ms ± 1%    ~     (p=0.051 n=28+27)
GobEncode-8                4.06ms ± 3%    4.07ms ± 3%    ~     (p=0.781 n=29+30)
Gzip-8                      178ms ± 2%     177ms ± 2%    ~     (p=0.171 n=29+30)
Gunzip-8                   28.2ms ± 7%    28.0ms ± 4%    ~     (p=0.155 n=30+30)
HTTPClientServer-8         71.5µs ± 3%    71.3µs ± 1%    ~     (p=0.913 n=25+27)
JSONEncode-8               9.71ms ± 5%    9.86ms ± 4%  +1.55%  (p=0.015 n=28+30)
JSONDecode-8               38.8ms ± 2%    39.3ms ± 2%  +1.41%  (p=0.000 n=28+29)
Mandelbrot200-8            3.47ms ± 6%    3.44ms ± 3%    ~     (p=0.183 n=28+28)
GoParse-8                  2.55ms ± 2%    2.54ms ± 3%  -0.58%  (p=0.003 n=27+29)
RegexpMatchEasy0_32-8      66.0ns ± 5%    65.3ns ± 4%    ~     (p=0.124 n=30+30)
RegexpMatchEasy0_1K-8       152ns ± 2%     152ns ± 3%    ~     (p=0.881 n=30+30)
RegexpMatchEasy1_32-8      62.9ns ± 9%    62.7ns ± 7%    ~     (p=0.717 n=30+30)
RegexpMatchEasy1_1K-8       263ns ± 3%     263ns ± 4%    ~     (p=0.909 n=30+29)
RegexpMatchMedium_32-8     93.4ns ± 3%    89.3ns ± 2%  -4.32%  (p=0.000 n=29+29)
RegexpMatchMedium_1K-8     27.5µs ± 3%    27.1µs ± 2%  -1.46%  (p=0.000 n=30+27)
RegexpMatchHard_32-8       1.33µs ± 3%    1.31µs ± 3%  -1.50%  (p=0.000 n=27+28)
RegexpMatchHard_1K-8       39.4µs ± 2%    39.1µs ± 2%  -0.54%  (p=0.027 n=28+28)
Revcomp-8                   274ms ± 4%     276ms ± 2%  +0.67%  (p=0.048 n=29+28)
Template-8                 45.1ms ± 5%    44.6ms ± 7%  -1.22%  (p=0.029 n=30+29)
TimeParse-8                 227ns ± 3%     224ns ± 3%  -1.25%  (p=0.000 n=28+27)
TimeFormat-8                248ns ± 3%     245ns ± 3%  -1.33%  (p=0.002 n=30+29)
[Geo mean]                 36.6µs         36.5µs       -0.32%


Change-Id: I24083f0013506b77e2d9da99c40ae2f67803285e
Reviewed-on: https://go-review.googlesource.com/101076
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-28 18:43:49 +00:00
Alberto Donizetti
377a2cb2d2 cmd/compile: reduce allocations in regAllocState.regalloc
name      old time/op       new time/op       delta
Template        281ms ± 2%        282ms ± 3%    ~     (p=0.428 n=19+20)
Unicode         138ms ± 6%        138ms ± 7%    ~     (p=0.813 n=19+20)
GoTypes         901ms ± 2%        895ms ± 2%    ~     (p=0.050 n=19+20)
Compiler        4.25s ± 1%        4.23s ± 1%  -0.31%  (p=0.031 n=19+18)
SSA             9.77s ± 1%        9.78s ± 1%    ~     (p=0.512 n=20+20)
Flate           187ms ± 3%        187ms ± 4%    ~     (p=0.687 n=20+19)
GoParser        224ms ± 4%        222ms ± 3%    ~     (p=0.301 n=20+20)
Reflect         576ms ± 2%        576ms ± 2%    ~     (p=0.620 n=20+20)
Tar             262ms ± 3%        263ms ± 3%    ~     (p=0.599 n=19+18)
XML             322ms ± 4%        322ms ± 2%    ~     (p=0.512 n=20+20)

name      old user-time/op  new user-time/op  delta
Template        403ms ± 3%        399ms ± 5%    ~     (p=0.149 n=17+20)
Unicode         217ms ±12%        217ms ± 9%    ~     (p=0.883 n=20+20)
GoTypes         1.24s ± 3%        1.24s ± 3%    ~     (p=0.718 n=20+20)
Compiler        5.90s ± 3%        5.84s ± 5%    ~     (p=0.217 n=18+20)
SSA             14.0s ± 6%        14.1s ± 5%    ~     (p=0.235 n=19+20)
Flate           253ms ± 6%        254ms ± 5%    ~     (p=0.749 n=20+19)
GoParser        309ms ± 7%        307ms ± 5%    ~     (p=0.398 n=20+20)
Reflect         772ms ± 3%        771ms ± 3%    ~     (p=0.901 n=20+19)
Tar             368ms ± 5%        369ms ± 8%    ~     (p=0.429 n=20+20)
XML             435ms ± 5%        434ms ± 5%    ~     (p=0.841 n=20+20)

name      old alloc/op      new alloc/op      delta
Template       39.0MB ± 0%       38.9MB ± 0%  -0.21%  (p=0.000 n=20+19)
Unicode        29.0MB ± 0%       29.0MB ± 0%  -0.03%  (p=0.000 n=20+20)
GoTypes         116MB ± 0%        115MB ± 0%  -0.33%  (p=0.000 n=20+20)
Compiler        498MB ± 0%        496MB ± 0%  -0.37%  (p=0.000 n=19+20)
SSA            1.41GB ± 0%       1.40GB ± 0%  -0.24%  (p=0.000 n=20+20)
Flate          25.0MB ± 0%       25.0MB ± 0%  -0.22%  (p=0.000 n=20+19)
GoParser       31.0MB ± 0%       30.9MB ± 0%  -0.23%  (p=0.000 n=20+17)
Reflect        77.1MB ± 0%       77.0MB ± 0%  -0.12%  (p=0.000 n=20+20)
Tar            39.7MB ± 0%       39.6MB ± 0%  -0.17%  (p=0.000 n=20+20)
XML            44.9MB ± 0%       44.8MB ± 0%  -0.29%  (p=0.000 n=20+20)

name      old allocs/op     new allocs/op     delta
Template         386k ± 0%         385k ± 0%  -0.28%  (p=0.000 n=20+20)
Unicode          337k ± 0%         336k ± 0%  -0.07%  (p=0.000 n=20+20)
GoTypes         1.20M ± 0%        1.20M ± 0%  -0.41%  (p=0.000 n=20+20)
Compiler        4.71M ± 0%        4.68M ± 0%  -0.52%  (p=0.000 n=20+20)
SSA             11.7M ± 0%        11.6M ± 0%  -0.31%  (p=0.000 n=20+19)
Flate            238k ± 0%         237k ± 0%  -0.28%  (p=0.000 n=18+20)
GoParser         320k ± 0%         319k ± 0%  -0.34%  (p=0.000 n=20+19)
Reflect          961k ± 0%         959k ± 0%  -0.12%  (p=0.000 n=20+20)
Tar              397k ± 0%         396k ± 0%  -0.23%  (p=0.000 n=20+20)
XML              419k ± 0%         417k ± 0%  -0.39%  (p=0.000 n=20+19)

Change-Id: Ic7ec3614808d9892c1cab3991b996b7a3b8eff21
Reviewed-on: https://go-review.googlesource.com/102676
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-27 18:03:39 +00:00
Alberto Donizetti
2ba98f1ae9 cmd/compile: avoid some allocations in regalloc
Compilebench:
name      old time/op       new time/op       delta
Template        283ms ± 3%        281ms ± 4%    ~     (p=0.242 n=20+20)
Unicode         137ms ± 6%        135ms ± 6%    ~     (p=0.194 n=20+19)
GoTypes         890ms ± 2%        883ms ± 1%  -0.74%  (p=0.001 n=19+19)
Compiler        4.21s ± 2%        4.20s ± 2%  -0.40%  (p=0.033 n=20+19)
SSA             9.86s ± 2%        9.68s ± 1%  -1.80%  (p=0.000 n=20+19)
Flate           185ms ± 5%        185ms ± 7%    ~     (p=0.429 n=20+20)
GoParser        222ms ± 3%        222ms ± 4%    ~     (p=0.588 n=19+20)
Reflect         572ms ± 2%        570ms ± 3%    ~     (p=0.113 n=19+20)
Tar             263ms ± 4%        259ms ± 2%  -1.41%  (p=0.013 n=20+20)
XML             321ms ± 2%        321ms ± 4%    ~     (p=0.835 n=20+19)

name      old user-time/op  new user-time/op  delta
Template        400ms ± 5%        405ms ± 5%    ~     (p=0.096 n=20+20)
Unicode         217ms ± 8%        213ms ± 8%    ~     (p=0.242 n=20+20)
GoTypes         1.23s ± 3%        1.22s ± 3%    ~     (p=0.923 n=19+20)
Compiler        5.76s ± 6%        5.81s ± 2%    ~     (p=0.687 n=20+19)
SSA             14.2s ± 4%        14.0s ± 4%    ~     (p=0.121 n=20+20)
Flate           248ms ± 7%        251ms ±10%    ~     (p=0.369 n=20+20)
GoParser        308ms ± 5%        305ms ± 6%    ~     (p=0.336 n=19+20)
Reflect         771ms ± 2%        766ms ± 2%    ~     (p=0.113 n=20+19)
Tar             370ms ± 5%        362ms ± 7%  -2.06%  (p=0.036 n=19+20)
XML             435ms ± 4%        432ms ± 5%    ~     (p=0.369 n=20+20)

name      old alloc/op      new alloc/op      delta
Template       39.5MB ± 0%       39.4MB ± 0%  -0.20%  (p=0.000 n=20+20)
Unicode        29.1MB ± 0%       29.1MB ± 0%    ~     (p=0.064 n=20+20)
GoTypes         117MB ± 0%        117MB ± 0%  -0.17%  (p=0.000 n=20+20)
Compiler        503MB ± 0%        502MB ± 0%  -0.15%  (p=0.000 n=19+19)
SSA            1.42GB ± 0%       1.42GB ± 0%  -0.16%  (p=0.000 n=20+20)
Flate          25.3MB ± 0%       25.3MB ± 0%  -0.19%  (p=0.000 n=20+20)
GoParser       31.4MB ± 0%       31.3MB ± 0%  -0.14%  (p=0.000 n=20+18)
Reflect        78.1MB ± 0%       77.9MB ± 0%  -0.34%  (p=0.000 n=20+19)
Tar            40.1MB ± 0%       40.0MB ± 0%  -0.17%  (p=0.000 n=20+20)
XML            45.3MB ± 0%       45.2MB ± 0%  -0.13%  (p=0.000 n=20+20)

name      old allocs/op     new allocs/op     delta
Template         393k ± 0%         392k ± 0%  -0.21%  (p=0.000 n=20+19)
Unicode          337k ± 0%         337k ± 0%  -0.02%  (p=0.000 n=20+20)
GoTypes         1.22M ± 0%        1.22M ± 0%  -0.21%  (p=0.000 n=20+20)
Compiler        4.77M ± 0%        4.76M ± 0%  -0.16%  (p=0.000 n=20+20)
SSA             11.8M ± 0%        11.8M ± 0%  -0.12%  (p=0.000 n=20+20)
Flate            242k ± 0%         241k ± 0%  -0.20%  (p=0.000 n=20+20)
GoParser         324k ± 0%         324k ± 0%  -0.14%  (p=0.000 n=20+20)
Reflect          985k ± 0%         981k ± 0%  -0.38%  (p=0.000 n=20+20)
Tar              403k ± 0%         402k ± 0%  -0.19%  (p=0.000 n=20+20)
XML              424k ± 0%         424k ± 0%  -0.16%  (p=0.000 n=19+20)

Change-Id: I131e382b64cd6db11a9263a477d45d80c180c499
Reviewed-on: https://go-review.googlesource.com/102421
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-25 18:27:49 +00:00
Ilya Tocar
983dcf70ba cmd/compile/internal/ssa: update regalloc in loops
Currently we don't lift spill out of loop if loop contains call.
However often we have code like this:

for .. {
    if hard_case {
	call()
    }
    // simple case, without call
}

So instead of checking for any call, check for unavoidable call.
For #22698 cases I see:
mime/quotedprintable/Writer-6                   10.9µs ± 4%      9.2µs ± 3%   -15.02%  (p=0.000 n=8+8)
And:
compress/flate/Encode/Twain/Huffman/1e4-6       99.4µs ± 6%     90.9µs ± 0%    -8.57%  (p=0.000 n=8+8)
compress/flate/Encode/Twain/Huffman/1e5-6       760µs ± 1%      725µs ± 1%     -4.56%  (p=0.000 n=8+8)
compress/flate/Encode/Twain/Huffman/1e6-6       7.55ms ± 0%      7.24ms ± 0%     -4.07%  (p=0.000 n=8+7)

There are no significant changes on go1 benchmarks.
But for cases with runtime arch checks, where we call generic version on old hardware,
there are respectable performance gains:
math/RoundToEven-6                             1.43ns ± 0%     1.25ns ± 0%   -12.59%  (p=0.001 n=7+7)
math/bits/OnesCount64-6                        1.60ns ± 1%     1.42ns ± 1%   -11.32%  (p=0.000 n=8+8)

Also on some runtime benchmarks loops have less loads and higher performance:
runtime/RuneIterate/range1/ASCII-6             15.6ns ± 1%     13.9ns ± 1%   -10.74%  (p=0.000 n=7+8)
runtime/ArrayEqual-6                           3.22ns ± 0%     2.86ns ± 2%   -11.06%  (p=0.000 n=7+8)

Fixes #22698
Updates #22234

Change-Id: I0ae2f19787d07a9026f064366dedbe601bf7257a
Reviewed-on: https://go-review.googlesource.com/84055
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-03-20 21:02:39 +00:00
Daniel Martí
cd2cb6e3f5 cmd/compile: cache sparse maps across ssa passes
This is done for sparse sets already, but it was missing for sparse
maps. Only affects deadstore and regalloc, as they're the only ones that
use sparse maps.

name                 old time/op    new time/op    delta
DSEPass-4               247µs ± 0%     216µs ± 0%  -12.75%  (p=0.008 n=5+5)
DSEPassBlock-4         3.05ms ± 1%    2.87ms ± 1%   -6.02%  (p=0.002 n=6+6)
CSEPass-4              2.30ms ± 0%    2.32ms ± 0%   +0.53%  (p=0.026 n=6+6)
CSEPassBlock-4         23.8ms ± 0%    23.8ms ± 0%     ~     (p=0.931 n=6+5)
DeadcodePass-4         51.7µs ± 1%    51.5µs ± 2%     ~     (p=0.429 n=5+6)
DeadcodePassBlock-4     734µs ± 1%     742µs ± 3%     ~     (p=0.394 n=6+6)
MultiPass-4             152µs ± 0%     149µs ± 2%     ~     (p=0.082 n=5+6)
MultiPassBlock-4       2.67ms ± 1%    2.41ms ± 2%   -9.77%  (p=0.008 n=5+5)

name                 old alloc/op   new alloc/op   delta
DSEPass-4              41.2kB ± 0%     0.1kB ± 0%  -99.68%  (p=0.002 n=6+6)
DSEPassBlock-4          560kB ± 0%       4kB ± 0%  -99.34%  (p=0.026 n=5+6)
CSEPass-4               189kB ± 0%     189kB ± 0%     ~     (all equal)
CSEPassBlock-4         3.10MB ± 0%    3.10MB ± 0%     ~     (p=0.444 n=5+5)
DeadcodePass-4         10.5kB ± 0%    10.5kB ± 0%     ~     (all equal)
DeadcodePassBlock-4     164kB ± 0%     164kB ± 0%     ~     (all equal)
MultiPass-4             240kB ± 0%     199kB ± 0%  -17.06%  (p=0.002 n=6+6)
MultiPassBlock-4       3.60MB ± 0%    2.99MB ± 0%  -17.06%  (p=0.002 n=6+6)

name                 old allocs/op  new allocs/op  delta
DSEPass-4                8.00 ± 0%      4.00 ± 0%  -50.00%  (p=0.002 n=6+6)
DSEPassBlock-4            240 ± 0%       120 ± 0%  -50.00%  (p=0.002 n=6+6)
CSEPass-4                9.00 ± 0%      9.00 ± 0%     ~     (all equal)
CSEPassBlock-4          1.35k ± 0%     1.35k ± 0%     ~     (all equal)
DeadcodePass-4           3.00 ± 0%      3.00 ± 0%     ~     (all equal)
DeadcodePassBlock-4      9.00 ± 0%      9.00 ± 0%     ~     (all equal)
MultiPass-4              11.0 ± 0%      10.0 ± 0%   -9.09%  (p=0.002 n=6+6)
MultiPassBlock-4          165 ± 0%       150 ± 0%   -9.09%  (p=0.002 n=6+6)

Change-Id: I43860687c88f33605eb1415f36473c5cfe8fde4a
Reviewed-on: https://go-review.googlesource.com/98449
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-03-15 17:24:39 +00:00
David Chase
c18ff18465 cmd/compile: decouple emitted block order from regalloc block order
While tinkering with different block orders for the preemptible
loop experiment, crashed the register allocator with a "bad"
one (these exist).  Realized that one knob was controlling
two things (register allocation and branch patterns) and
decided that life would be simpler if the two orders were
independent.

Ran some experiments and determined that we have probably,
mostly, been optimizing for register allocation effects, not
branch effects.  Bad block orders for register allocation are
somewhat costly.

This will also allow separate experimentation with perhaps-
better block orders for register allocation.

Change-Id: I6ecf2f24cca178b6f8acc0d3c4caaef043c11ed9
Reviewed-on: https://go-review.googlesource.com/47314
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-22 03:02:34 +00:00
Heschi Kreinick
2075a9323d cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.

RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.

The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.

Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.

Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.

TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.

Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-02-14 18:29:19 +00:00
Keith Randall
0153a4130d cmd/compile: fix runtime.KeepAlive
KeepAlive needs to introduce a use of the spill of the
value it is keeping alive.  Without that, we don't guarantee
that the spill dominates the KeepAlive.

This bug was probably introduced with the code to move spills
down to the dominator of the restores, instead of always spilling
just after the value itself (CL 34822).

Fixes #22458.

Change-Id: I94955a21960448ffdacc4df775fe1213967b1d4c
Reviewed-on: https://go-review.googlesource.com/74210
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2017-10-30 19:55:02 +00:00
David Chase
ca360c3992 cmd/compile: better XPos for rematerialized values and JMPs
This attempts to choose better values for values that are
rematerialized (uses the XPos of the consumer, not the
original) and for unconditional branches (uses the last
assigned XPos in the block).

The JMP branches seem to sometimes end up with a PC in the
destination block, I think because of register movement
or rematerialization that gets placed in predecessor blocks.
This may be acceptable because (eyeball-empirically) that is
often the line number of the target block, so the line number
flow is correct.

Added proper test, that checks both -N -l and regular compilation.
The test is also capable (for gdb, delve soon) of tracking
variable printing based on comments in the source code.

There's substantial room for improvement in debugger behavior.

Updates #21098.

Change-Id: I13abd48a39141583b85576a015f561065819afd0
Reviewed-on: https://go-review.googlesource.com/50610
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-07 22:12:36 +00:00
Daniel Martí
ded2c65db3 cmd/compile: simplify a few bits of the code
Remove an unused type, a few redundant returns and replace a few slice
append loops with a single append.

Change-Id: If07248180bae5631b5b152c6051d9635889997d5
Reviewed-on: https://go-review.googlesource.com/66851
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2017-09-28 20:40:17 +00:00
Lynn Boger
b74b43de68 cmd/compile: request r12 for indirect calls on ppc64le
On ppc64le, functions compiled with -shared expect r12 to
hold the function's address for indirect calls. Previously
this was enforced by generating a move instruction if the
address wasn't already in r12. This change avoids that extra
move by requesting r12 in the CALL ops that do indirect calls.

As a result of adding support for plugins on ppc64le, it was
discovered that there would be more cases where this extra
move was needed, so this seemed like a better solution.

Updates #20756

Change-Id: I6770885a46990f78c6d2902a715dcdaa822192a1
Reviewed-on: https://go-review.googlesource.com/62890
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-09-11 18:02:33 +00:00
Daniel Martí
99da8730b0 all: remove some double spaces from comments
Went mainly for the ones that make no sense, such as the ones
mid-sentence or after commas.

Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0
Reviewed-on: https://go-review.googlesource.com/57293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-26 15:09:09 +00:00
Keith Randall
770d8d8207 cmd/compile: free value earlier in nilcheck
When we remove a nil check, add it back to the free Value pool immediately.

Fixes #18732

Change-Id: I8d644faabbfb52157d3f2d071150ff0342ac28dc
Reviewed-on: https://go-review.googlesource.com/58810
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-08-25 06:01:26 +00:00
Keith Randall
bf4d8d3d05 cmd/compile: rename SSA Register.Name to Register.String
Just to get rid of lots of .Name() stutter in printf calls.

Change-Id: I86cf00b3f7b2172387a1c6a7f189c1897fab6300
Reviewed-on: https://go-review.googlesource.com/56630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-08-17 21:53:08 +00:00
Heschi Kreinick
4c54a047c6 [dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.

After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.

There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:

  type myint int
  var a int
  b := myint(int)
and
  b := (*uint64)(unsafe.Pointer(a))

don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.

Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.

A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.

TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.

go build -toolexec 'toolstash -cmp' -a std succeeds.

With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after:  06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean  /tmp/220352263 /tmp/621364410
completed   15 of   15, estimated time remaining 0s (eta 3:52PM)
name        old time/op       new time/op       delta
Template          199ms ± 3%        198ms ± 2%     ~     (p=0.400 n=15+14)
Unicode          96.6ms ± 5%       96.4ms ± 5%     ~     (p=0.838 n=15+15)
GoTypes           653ms ± 2%        647ms ± 2%     ~     (p=0.102 n=15+14)
Flate             133ms ± 6%        129ms ± 3%   -2.62%  (p=0.041 n=15+15)
GoParser          164ms ± 5%        159ms ± 3%   -3.05%  (p=0.000 n=15+15)
Reflect           428ms ± 4%        422ms ± 3%     ~     (p=0.156 n=15+13)
Tar               123ms ±10%        124ms ± 8%     ~     (p=0.461 n=15+15)
XML               228ms ± 3%        224ms ± 3%   -1.57%  (p=0.045 n=15+15)
[Geo mean]        206ms             377ms       +82.86%

name        old user-time/op  new user-time/op  delta
Template          292ms ±10%        301ms ±12%     ~     (p=0.189 n=15+15)
Unicode           166ms ±37%        158ms ±14%     ~     (p=0.418 n=15+14)
GoTypes           962ms ± 6%        963ms ± 7%     ~     (p=0.976 n=15+15)
Flate             207ms ±19%        200ms ±14%     ~     (p=0.345 n=14+15)
GoParser          246ms ±22%        240ms ±15%     ~     (p=0.587 n=15+15)
Reflect           611ms ±13%        587ms ±14%     ~     (p=0.085 n=15+13)
Tar               211ms ±12%        217ms ±14%     ~     (p=0.355 n=14+15)
XML               335ms ±15%        320ms ±18%     ~     (p=0.169 n=15+15)
[Geo mean]        317ms             583ms       +83.72%

name        old alloc/op      new alloc/op      delta
Template         40.2MB ± 0%       40.2MB ± 0%   -0.15%  (p=0.000 n=14+15)
Unicode          29.2MB ± 0%       29.3MB ± 0%     ~     (p=0.624 n=15+15)
GoTypes           114MB ± 0%        114MB ± 0%   -0.15%  (p=0.000 n=15+14)
Flate            25.7MB ± 0%       25.6MB ± 0%   -0.18%  (p=0.000 n=13+15)
GoParser         32.2MB ± 0%       32.2MB ± 0%   -0.14%  (p=0.003 n=15+15)
Reflect          77.8MB ± 0%       77.9MB ± 0%     ~     (p=0.061 n=15+15)
Tar              27.1MB ± 0%       27.0MB ± 0%   -0.11%  (p=0.029 n=15+15)
XML              42.7MB ± 0%       42.5MB ± 0%   -0.29%  (p=0.000 n=15+15)
[Geo mean]       42.1MB            75.0MB       +78.05%

name        old allocs/op     new allocs/op     delta
Template           402k ± 1%         398k ± 0%   -0.91%  (p=0.000 n=15+15)
Unicode            344k ± 1%         344k ± 0%     ~     (p=0.715 n=15+14)
GoTypes           1.18M ± 0%        1.17M ± 0%   -0.91%  (p=0.000 n=15+14)
Flate              243k ± 0%         240k ± 1%   -1.05%  (p=0.000 n=13+15)
GoParser           327k ± 1%         324k ± 1%   -0.96%  (p=0.000 n=15+15)
Reflect            984k ± 1%         982k ± 0%     ~     (p=0.050 n=15+15)
Tar                261k ± 1%         259k ± 1%   -0.77%  (p=0.000 n=15+15)
XML                411k ± 0%         404k ± 1%   -1.55%  (p=0.000 n=15+15)
[Geo mean]         439k              755k       +72.01%

name        old text-bytes    new text-bytes    delta
HelloSize         694kB ± 0%        694kB ± 0%   -0.00%  (p=0.000 n=15+15)

name        old data-bytes    new data-bytes    delta
HelloSize        5.55kB ± 0%       5.55kB ± 0%     ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize         133kB ± 0%        133kB ± 0%     ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.04MB ± 0%       1.04MB ± 0%     ~     (all equal)

Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-27 20:19:44 +00:00
Heschi Kreinick
2d57d94ac3 [dev.debug] cmd/compile: track variable decomposition in LocalSlot
When the compiler decomposes a user variable, track its origin so that
it can be recomposed during DWARF generation.

Change-Id: Ia71c7f8e7f4d65f0652f1c97b0dda5d9cad41936
Reviewed-on: https://go-review.googlesource.com/50878
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-26 18:39:39 +00:00
David Chase
00263a8968 cmd/compile: reduce debugger-worsening line number churn
Reuse block head or preceding instruction's line number for
register allocator's spill, fill, copy, rematerialization
instructionsl; and also for phi, and for no-src-pos
instructions.  Assembler creates same line number tables
for copy-predecessor-line and for no-src-pos,
but copy-predecessor produces better-looking assembly
language output with -S and with GOSSAFUNC, and does not
require changes to tests of existing assembly language.

Split "copyInto" into two cases, one for register allocation,
one for otherwise.  This caused the test score line change
count to increase by one, which may reflect legitimately
useful information preserved.  Without any special treatment
for copyInto, the change count increases by 21 more, from
51 to 72 (i.e., quite a lot).

There is a test; using two naive "scores" for line number
churn, the old numbering is 2x or 4x worse.

Fixes #18902.

Change-Id: I0a0a69659d30ee4e5d10116a0dd2b8c5df8457b1
Reviewed-on: https://go-review.googlesource.com/36207
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-10 17:16:44 +00:00
Josh Bleecher Snyder
46b88c9fbc cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.

In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.

Though this is a big CL, most of the changes are
mechanical and uninteresting.

Interesting bits:

* Add new singleton globals to package types for the special
  SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
  and TTUPLE, for SSA tuple types.
  ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
  to package types.
* We had picked the name "types" in our rules for the handy
  list of types provided by ssa.Config. That conflicted with
  the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
  types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
  and probably also some other duplicated Type methods
  designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
  and they were not particularly careful about types in general.
  Of necessity, this CL switches them to use *types.Type;
  it does not make them more type-accurate.
  Unfortunately, using types.Type means initializing a bit
  of the types universe.
  This is prime for refactoring and improvement.

This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.

name        old alloc/op      new alloc/op      delta
Template         37.9MB ± 0%       37.7MB ± 0%  -0.57%  (p=0.000 n=10+8)
Unicode          28.9MB ± 0%       28.7MB ± 0%  -0.52%  (p=0.000 n=10+10)
GoTypes           110MB ± 0%        109MB ± 0%  -0.88%  (p=0.000 n=10+10)
Flate            24.7MB ± 0%       24.6MB ± 0%  -0.66%  (p=0.000 n=10+10)
GoParser         31.1MB ± 0%       30.9MB ± 0%  -0.61%  (p=0.000 n=10+9)
Reflect          73.9MB ± 0%       73.4MB ± 0%  -0.62%  (p=0.000 n=10+8)
Tar              25.8MB ± 0%       25.6MB ± 0%  -0.77%  (p=0.000 n=9+10)
XML              41.2MB ± 0%       40.9MB ± 0%  -0.80%  (p=0.000 n=10+10)
[Geo mean]       40.5MB            40.3MB       -0.68%

name        old allocs/op     new allocs/op     delta
Template           385k ± 0%         386k ± 0%    ~     (p=0.356 n=10+9)
Unicode            343k ± 1%         344k ± 0%    ~     (p=0.481 n=10+10)
GoTypes           1.16M ± 0%        1.16M ± 0%  -0.16%  (p=0.004 n=10+10)
Flate              238k ± 1%         238k ± 1%    ~     (p=0.853 n=10+10)
GoParser           320k ± 0%         320k ± 0%    ~     (p=0.720 n=10+9)
Reflect            957k ± 0%         957k ± 0%    ~     (p=0.460 n=10+8)
Tar                252k ± 0%         252k ± 0%    ~     (p=0.133 n=9+10)
XML                400k ± 0%         400k ± 0%    ~     (p=0.796 n=10+10)
[Geo mean]         428k              428k       -0.01%


Removing all the interface calls helps non-trivially with CPU, though.

name        old time/op       new time/op       delta
Template          178ms ± 4%        173ms ± 3%  -2.90%  (p=0.000 n=94+96)
Unicode          85.0ms ± 4%       83.9ms ± 4%  -1.23%  (p=0.000 n=96+96)
GoTypes           543ms ± 3%        528ms ± 3%  -2.73%  (p=0.000 n=98+96)
Flate             116ms ± 3%        113ms ± 4%  -2.34%  (p=0.000 n=96+99)
GoParser          144ms ± 3%        140ms ± 4%  -2.80%  (p=0.000 n=99+97)
Reflect           344ms ± 3%        334ms ± 4%  -3.02%  (p=0.000 n=100+99)
Tar               106ms ± 5%        103ms ± 4%  -3.30%  (p=0.000 n=98+94)
XML               198ms ± 5%        192ms ± 4%  -2.88%  (p=0.000 n=92+95)
[Geo mean]        178ms             173ms       -2.65%

name        old user-time/op  new user-time/op  delta
Template          229ms ± 5%        224ms ± 5%  -2.36%  (p=0.000 n=95+99)
Unicode           107ms ± 6%        106ms ± 5%  -1.13%  (p=0.001 n=93+95)
GoTypes           696ms ± 4%        679ms ± 4%  -2.45%  (p=0.000 n=97+99)
Flate             137ms ± 4%        134ms ± 5%  -2.66%  (p=0.000 n=99+96)
GoParser          176ms ± 5%        172ms ± 8%  -2.27%  (p=0.000 n=98+100)
Reflect           430ms ± 6%        411ms ± 5%  -4.46%  (p=0.000 n=100+92)
Tar               128ms ±13%        123ms ±13%  -4.21%  (p=0.000 n=100+100)
XML               239ms ± 6%        233ms ± 6%  -2.50%  (p=0.000 n=95+97)
[Geo mean]        220ms             213ms       -2.76%


Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-09 23:01:51 +00:00
Matthew Dempsky
1e3570ac86 cmd/internal/objabi: extract shared functionality from obj
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing
the assembler backends no longer requires reinstalling cmd/link or
cmd/addr2line.

There's also now one canonical definition of the object file format in
cmd/internal/objabi/doc.go, with a warning to update all three
implementations.

objabi is still something of a grab bag of unrelated code (e.g., flag
and environment variable handling probably belong in a separate "tool"
package), but this is still progress.

Fixes #15165.
Fixes #20026.

Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-19 00:00:09 +00:00
Cherry Zhang
8a5175df35 cmd/compile: improve startRegs calculation
In register allocation, we calculate what values are used in
and after the current block. If a value is used only after a
function call, since registers are clobbered in call, we don't
need to mark the value live at the entrance of the block.
Before this CL it is considered live, and unnecessary copy or
load may be generated when resolving merge edge.

Fixes #14761.

On AMD64:
name                      old time/op    new time/op    delta
BinaryTree17-12              2.84s ± 1%     2.81s ± 1%   -1.06%  (p=0.000 n=10+9)
Fannkuch11-12                3.61s ± 0%     3.55s ± 1%   -1.77%  (p=0.000 n=10+9)
FmtFprintfEmpty-12          50.4ns ± 4%    50.0ns ± 1%     ~     (p=0.785 n=9+8)
FmtFprintfString-12         80.0ns ± 3%    78.2ns ± 3%   -2.35%  (p=0.004 n=10+9)
FmtFprintfInt-12            81.3ns ± 4%    81.8ns ± 2%     ~     (p=0.159 n=10+10)
FmtFprintfIntInt-12          120ns ± 4%     118ns ± 2%     ~     (p=0.218 n=10+10)
FmtFprintfPrefixedInt-12     152ns ± 3%     155ns ± 2%   +2.11%  (p=0.026 n=10+10)
FmtFprintfFloat-12           240ns ± 1%     238ns ± 1%   -0.79%  (p=0.005 n=9+9)
FmtManyArgs-12               504ns ± 1%     510ns ± 1%   +1.14%  (p=0.000 n=8+9)
GobDecode-12                7.00ms ± 1%    6.99ms ± 0%     ~     (p=0.497 n=9+10)
GobEncode-12                5.47ms ± 1%    5.48ms ± 1%     ~     (p=0.218 n=10+10)
Gzip-12                      258ms ± 2%     256ms ± 1%   -0.96%  (p=0.043 n=10+9)
Gunzip-12                   38.6ms ± 0%    38.3ms ± 0%   -0.64%  (p=0.000 n=9+8)
HTTPClientServer-12         90.4µs ± 3%    87.2µs ±11%     ~     (p=0.053 n=9+10)
JSONEncode-12               15.6ms ± 0%    15.6ms ± 1%     ~     (p=0.077 n=9+9)
JSONDecode-12               55.1ms ± 1%    54.6ms ± 1%   -0.85%  (p=0.010 n=10+9)
Mandelbrot200-12            4.49ms ± 0%    4.47ms ± 0%   -0.25%  (p=0.000 n=10+8)
GoParse-12                  3.38ms ± 0%    3.37ms ± 1%     ~     (p=0.315 n=8+10)
RegexpMatchEasy0_32-12      82.5ns ± 4%    82.0ns ± 0%     ~     (p=0.164 n=10+8)
RegexpMatchEasy0_1K-12       203ns ± 1%     202ns ± 1%   -0.85%  (p=0.000 n=9+10)
RegexpMatchEasy1_32-12      82.3ns ± 1%    81.1ns ± 0%   -1.39%  (p=0.000 n=10+8)
RegexpMatchEasy1_1K-12       357ns ± 1%     357ns ± 1%     ~     (p=0.697 n=8+9)
RegexpMatchMedium_32-12      125ns ± 2%     126ns ± 2%     ~     (p=0.197 n=10+10)
RegexpMatchMedium_1K-12     39.6µs ± 3%    39.6µs ± 1%     ~     (p=0.971 n=10+10)
RegexpMatchHard_32-12       1.99µs ± 2%    1.99µs ± 4%     ~     (p=0.891 n=10+9)
RegexpMatchHard_1K-12       60.1µs ± 3%    60.4µs ± 3%     ~     (p=0.684 n=10+10)
Revcomp-12                   531ms ± 6%     441ms ± 0%  -16.94%  (p=0.000 n=10+9)
Template-12                 58.9ms ± 1%    58.7ms ± 1%     ~     (p=0.315 n=10+10)
TimeParse-12                 319ns ± 1%     320ns ± 4%     ~     (p=0.215 n=9+9)
TimeFormat-12                345ns ± 0%     333ns ± 1%   -3.36%  (p=0.000 n=9+10)
[Geo mean]                  52.2µs         51.6µs        -1.13%

On ARM64:
name                     old time/op    new time/op    delta
BinaryTree17-8              8.53s ± 0%     8.36s ± 0%   -1.89%  (p=0.000 n=10+10)
Fannkuch11-8                6.15s ± 0%     6.10s ± 0%   -0.67%  (p=0.000 n=10+10)
FmtFprintfEmpty-8           117ns ± 0%     117ns ± 0%     ~     (all equal)
FmtFprintfString-8          192ns ± 0%     192ns ± 0%     ~     (all equal)
FmtFprintfInt-8             198ns ± 0%     198ns ± 0%     ~     (p=0.211 n=10+10)
FmtFprintfIntInt-8          289ns ± 0%     291ns ± 0%   +0.59%  (p=0.000 n=7+10)
FmtFprintfPrefixedInt-8     320ns ± 2%     317ns ± 0%     ~     (p=0.431 n=10+8)
FmtFprintfFloat-8           538ns ± 0%     538ns ± 0%     ~     (all equal)
FmtManyArgs-8              1.17µs ± 1%    1.18µs ± 1%     ~     (p=0.063 n=10+10)
GobDecode-8                17.0ms ± 1%    17.2ms ± 1%   +0.83%  (p=0.000 n=10+10)
GobEncode-8                14.2ms ± 0%    14.1ms ± 1%   -0.78%  (p=0.001 n=9+10)
Gzip-8                      806ms ± 0%     797ms ± 0%   -1.12%  (p=0.000 n=6+9)
Gunzip-8                    131ms ± 0%     130ms ± 0%   -0.51%  (p=0.000 n=10+9)
HTTPClientServer-8          206µs ± 9%     212µs ± 2%     ~     (p=0.829 n=10+8)
JSONEncode-8               40.1ms ± 0%    40.1ms ± 0%     ~     (p=0.136 n=9+9)
JSONDecode-8                157ms ± 0%     151ms ± 0%   -3.32%  (p=0.000 n=9+9)
Mandelbrot200-8            10.1ms ± 0%    10.1ms ± 0%   -0.05%  (p=0.000 n=9+8)
GoParse-8                  8.43ms ± 0%    8.43ms ± 0%     ~     (p=0.912 n=10+10)
RegexpMatchEasy0_32-8       228ns ± 1%     227ns ± 0%   -0.26%  (p=0.026 n=10+9)
RegexpMatchEasy0_1K-8      1.92µs ± 0%    1.63µs ± 0%  -15.18%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8       258ns ± 1%     250ns ± 0%   -2.83%  (p=0.000 n=10+10)
RegexpMatchEasy1_1K-8      2.39µs ± 0%    2.13µs ± 0%  -10.94%  (p=0.000 n=9+9)
RegexpMatchMedium_32-8      352ns ± 0%     351ns ± 0%   -0.29%  (p=0.004 n=9+10)
RegexpMatchMedium_1K-8      104µs ± 0%     105µs ± 0%   +0.58%  (p=0.000 n=8+9)
RegexpMatchHard_32-8       5.84µs ± 0%    5.82µs ± 0%   -0.27%  (p=0.000 n=9+10)
RegexpMatchHard_1K-8        177µs ± 0%     177µs ± 0%   -0.07%  (p=0.000 n=9+9)
Revcomp-8                   1.57s ± 1%     1.50s ± 1%   -4.60%  (p=0.000 n=9+10)
Template-8                  157ms ± 1%     153ms ± 1%   -2.28%  (p=0.000 n=10+9)
TimeParse-8                 779ns ± 1%     770ns ± 1%   -1.18%  (p=0.013 n=10+10)
TimeFormat-8                823ns ± 2%     826ns ± 1%     ~     (p=0.324 n=10+9)
[Geo mean]                  144µs          142µs        -1.45%

Reduce cmd/go text size by 0.5%.

Change-Id: I9288ff983c4a7cf03fc0cb35b9b1750828013117
Reviewed-on: https://go-review.googlesource.com/38457
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-29 19:45:32 +00:00
Josh Bleecher Snyder
34975095d0 cmd/compile: provide pos and curfn to temp
Concurrent compilation requires providing an
explicit position and curfn to temp.
This implementation of tempAt temporarily
continues to use the globals lineno and Curfn,
so as not to collide with mdempsky's
work for #19683 eliminating the Curfn dependency
from func nod.

Updates #15756
Updates #19683

Change-Id: Ib3149ca4b0740e9f6eea44babc6f34cdd63028a9
Reviewed-on: https://go-review.googlesource.com/38592
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-25 00:09:21 +00:00
Keith Randall
27bc723b51 cmd/compile: initialize loop depths
Regalloc uses loop depths - make sure they are initialized!

Test to make sure we aren't pushing spills into loops.

This fixes a generated-code performance bug introduced with
the better spill placement change:
https://go-review.googlesource.com/c/34822/

Update #19595

Change-Id: Ib9f0da6fb588503518847d7aab51e569fd3fa61e
Reviewed-on: https://go-review.googlesource.com/38434
Reviewed-by: David Chase <drchase@google.com>
2017-03-23 16:41:22 +00:00
Josh Bleecher Snyder
aea3aff669 cmd/compile: separate ssa.Frontend and ssa.TypeSource
Prior to this CL, the ssa.Frontend field was responsible
for providing types to the backend during compilation.
However, the types needed by the backend are few and static.
It makes more sense to use a struct for them
and to hang that struct off the ssa.Config,
which is the correct home for readonly data.
Now that Types is a struct, we can clean up the names a bit as well.

This has the added benefit of allowing early construction
of all types needed by the backend.
This will be useful for concurrent backend compilation.

Passes toolstash-check -all. No compiler performance change.

Updates #15756

Change-Id: I021658c8cf2836d6a22bbc20cc828ac38c7da08a
Reviewed-on: https://go-review.googlesource.com/38336
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-19 00:21:23 +00:00
Josh Bleecher Snyder
2cdb7f118a cmd/compile: move Frontend field from ssa.Config to ssa.Func
Suggested by mdempsky in CL 38232.
This allows us to use the Frontend field
to associate frontend state and information
with a function.
See the following CL in the series for examples.

This is a giant CL, but it is almost entirely routine refactoring.

The ssa test API is starting to feel a bit unwieldy.
I will clean it up separately, once the dust has settled.

Passes toolstash -cmp.

Updates #15756

Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf
Reviewed-on: https://go-review.googlesource.com/38327
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 23:18:57 +00:00
Josh Bleecher Snyder
a5e3cac895 cmd/compile: rearrange fields between ssa.Func, ssa.Cache, and ssa.Config
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill
the roles laid out for them in CL 38160.

The only non-trivial change in this CL is how cached
values and blocks get IDs. Prior to this CL, their IDs were
assigned as part of resetting the cache, and only modified
IDs were reset. This required knowing how many values and
blocks were modified, which required a tight coupling between
ssa.Func and ssa.Config. To eliminate that coupling,
we now zero values and blocks during reset,
and assign their IDs when they are used.
Since unused values and blocks have ID == 0,
we can efficiently find the last used value/block,
to avoid zeroing everything.
Bulk zeroing is efficient, but not efficient enough
to obviate the need to avoid zeroing everything every time.
As a happy side-effect, ssa.Func.Free is no longer necessary.

DebugHashMatch and friends now belong in func.go.
They have been left in place for clarity and review.
I will move them in a subsequent CL.

Passes toolstash -cmp. No compiler performance impact.
No change in 'go test cmd/compile/internal/ssa' execution time.

Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098
Reviewed-on: https://go-review.googlesource.com/38167
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-17 05:21:42 +00:00
David Chase
886e9e6065 cmd/compile: put spills in better places
Previously we always issued a spill right after the op
that was being spilled.  This CL pushes spills father away
from the generator, hopefully pushing them into unlikely branches.
For example:

  x = ...
  if unlikely {
    call ...
  }
  ... use x ...

Used to compile to

  x = ...
  spill x
  if unlikely {
    call ...
    restore x
  }

It now compiles to

  x = ...
  if unlikely {
    spill x
    call ...
    restore x
  }

This is particularly useful for code which appends, as the only
call is an unlikely call to growslice.  It also helps for the
spills needed around write barrier calls.

The basic algorithm is walk down the dominator tree following a
path where the block still dominates all of the restores.  We're
looking for a block that:
 1) dominates all restores
 2) has the value being spilled in a register
 3) has a loop depth no deeper than the value being spilled

The walking-down code is iterative.  I was forced to limit it to
searching 100 blocks so it doesn't become O(n^2).  Maybe one day
we'll find a better way.

I had to delete most of David's code which pushed spills out of loops.
I suspect this CL subsumes most of the cases that his code handled.

Generally positive performance improvements, but hard to tell for sure
with all the noise.  (compilebench times are unchanged.)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.91s ±15%     2.80s ±12%    ~     (p=0.063 n=10+10)
Fannkuch11-12                3.47s ± 0%     3.30s ± 4%  -4.91%   (p=0.000 n=9+10)
FmtFprintfEmpty-12          48.0ns ± 1%    47.4ns ± 1%  -1.32%    (p=0.002 n=9+9)
FmtFprintfString-12         85.6ns ±11%    79.4ns ± 3%  -7.27%  (p=0.005 n=10+10)
FmtFprintfInt-12            91.8ns ±10%    85.9ns ± 4%    ~      (p=0.203 n=10+9)
FmtFprintfIntInt-12          135ns ±13%     127ns ± 1%  -5.72%   (p=0.025 n=10+9)
FmtFprintfPrefixedInt-12     167ns ± 1%     168ns ± 2%    ~      (p=0.580 n=9+10)
FmtFprintfFloat-12           249ns ±11%     230ns ± 1%  -7.32%  (p=0.000 n=10+10)
FmtManyArgs-12               504ns ± 7%     506ns ± 1%    ~       (p=0.198 n=9+9)
GobDecode-12                6.95ms ± 1%    7.04ms ± 1%  +1.37%  (p=0.001 n=10+10)
GobEncode-12                6.32ms ±13%    6.04ms ± 1%    ~     (p=0.063 n=10+10)
Gzip-12                      233ms ± 1%     235ms ± 0%  +1.01%   (p=0.000 n=10+9)
Gunzip-12                   40.1ms ± 1%    39.6ms ± 0%  -1.12%   (p=0.000 n=10+8)
HTTPClientServer-12          227µs ± 9%     221µs ± 5%    ~       (p=0.114 n=9+8)
JSONEncode-12               16.1ms ± 2%    15.8ms ± 1%  -2.09%    (p=0.002 n=9+8)
JSONDecode-12               61.8ms ±11%    57.9ms ± 1%  -6.30%   (p=0.000 n=10+9)
Mandelbrot200-12            4.30ms ± 3%    4.28ms ± 1%    ~      (p=0.203 n=10+8)
GoParse-12                  3.18ms ± 2%    3.18ms ± 2%    ~     (p=0.579 n=10+10)
RegexpMatchEasy0_32-12      76.7ns ± 1%    77.5ns ± 1%  +0.92%    (p=0.002 n=9+8)
RegexpMatchEasy0_1K-12       239ns ± 3%     239ns ± 1%    ~     (p=0.204 n=10+10)
RegexpMatchEasy1_32-12      71.4ns ± 1%    70.6ns ± 0%  -1.15%   (p=0.000 n=10+9)
RegexpMatchEasy1_1K-12       383ns ± 2%     390ns ±10%    ~       (p=0.181 n=8+9)
RegexpMatchMedium_32-12      114ns ± 0%     113ns ± 1%  -0.88%    (p=0.000 n=9+8)
RegexpMatchMedium_1K-12     36.3µs ± 1%    36.8µs ± 1%  +1.59%   (p=0.000 n=10+8)
RegexpMatchHard_32-12       1.90µs ± 1%    1.90µs ± 1%    ~     (p=0.341 n=10+10)
RegexpMatchHard_1K-12       59.4µs ±11%    57.8µs ± 1%    ~      (p=0.968 n=10+9)
Revcomp-12                   461ms ± 1%     462ms ± 1%    ~       (p=1.000 n=9+9)
Template-12                 67.5ms ± 1%    66.3ms ± 1%  -1.77%   (p=0.000 n=10+8)
TimeParse-12                 314ns ± 3%     309ns ± 0%  -1.56%    (p=0.000 n=9+8)
TimeFormat-12                340ns ± 2%     331ns ± 1%  -2.79%  (p=0.000 n=10+10)

The go binary is 0.2% larger.  Not really sure why the size
would change.

Change-Id: Ia5116e53a3aeb025ef350ffc51c14ae5cc17871c
Reviewed-on: https://go-review.googlesource.com/34822
Reviewed-by: David Chase <drchase@google.com>
2017-03-15 02:09:25 +00:00
Cherry Zhang
8a44c8efae cmd/compile: don't spill rematerializeable value when resolving merge edges
Fixes #19515.

Change-Id: I4bcce152cef52d00fbb5ab4daf72a6e742bae27c
Reviewed-on: https://go-review.googlesource.com/38158
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-14 22:55:52 +00:00
Josh Bleecher Snyder
57546d67ec cmd/compile: add reusable []Location to ssa.Config
name       old time/op      new time/op      delta
Template        218ms ± 3%       214ms ± 3%  -1.70%  (p=0.000 n=30+30)
Unicode         100ms ± 3%       100ms ± 4%    ~     (p=0.614 n=29+30)
GoTypes         657ms ± 1%       660ms ± 3%  +0.46%  (p=0.046 n=29+30)
Compiler        2.80s ± 2%       2.80s ± 1%    ~     (p=0.451 n=28+29)
Flate           131ms ± 2%       132ms ± 4%    ~     (p=1.000 n=29+29)
GoParser        159ms ± 3%       160ms ± 5%    ~     (p=0.341 n=28+30)
Reflect         406ms ± 3%       408ms ± 4%    ~     (p=0.511 n=28+30)
Tar             118ms ± 4%       118ms ± 4%    ~     (p=0.827 n=29+30)
XML             222ms ± 6%       222ms ± 3%    ~     (p=0.532 n=30+30)

name       old user-ns/op   new user-ns/op   delta
Template   274user-ms ± 3%  272user-ms ± 3%  -0.87%  (p=0.015 n=29+30)
Unicode    140user-ms ± 4%  140user-ms ± 3%    ~     (p=0.735 n=29+30)
GoTypes    890user-ms ± 1%  897user-ms ± 2%  +0.88%  (p=0.002 n=29+30)
Compiler   3.88user-s ± 2%  3.89user-s ± 1%    ~     (p=0.132 n=30+29)
Flate      168user-ms ± 2%  157user-ms ± 4%  -6.21%  (p=0.000 n=25+28)
GoParser   211user-ms ± 2%  213user-ms ± 5%    ~     (p=0.086 n=28+30)
Reflect    539user-ms ± 2%  541user-ms ± 3%    ~     (p=0.267 n=27+29)
Tar        156user-ms ± 7%  155user-ms ± 5%    ~     (p=0.708 n=30+30)
XML        291user-ms ± 5%  294user-ms ± 3%  +0.83%  (p=0.029 n=29+30)

name       old alloc/op     new alloc/op     delta
Template       40.7MB ± 0%      39.4MB ± 0%  -3.26%  (p=0.000 n=29+26)
Unicode        30.8MB ± 0%      30.7MB ± 0%  -0.40%  (p=0.000 n=28+30)
GoTypes         123MB ± 0%       119MB ± 0%  -3.47%  (p=0.000 n=30+29)
Compiler        472MB ± 0%       455MB ± 0%  -3.60%  (p=0.000 n=30+30)
Flate          26.5MB ± 0%      25.6MB ± 0%  -3.21%  (p=0.000 n=28+30)
GoParser       32.3MB ± 0%      31.4MB ± 0%  -2.98%  (p=0.000 n=29+30)
Reflect        84.4MB ± 0%      82.1MB ± 0%  -2.83%  (p=0.000 n=30+30)
Tar            27.3MB ± 0%      26.5MB ± 0%  -2.70%  (p=0.000 n=29+29)
XML            44.6MB ± 0%      43.1MB ± 0%  -3.49%  (p=0.000 n=30+30)

name       old allocs/op    new allocs/op    delta
Template         401k ± 1%        399k ± 0%  -0.35%  (p=0.000 n=30+28)
Unicode          331k ± 0%        331k ± 1%    ~     (p=0.907 n=28+30)
GoTypes         1.24M ± 0%       1.23M ± 0%  -0.43%  (p=0.000 n=30+30)
Compiler        4.26M ± 0%       4.25M ± 0%  -0.34%  (p=0.000 n=29+30)
Flate            252k ± 1%        251k ± 1%  -0.41%  (p=0.000 n=30+30)
GoParser         325k ± 1%        324k ± 1%  -0.31%  (p=0.000 n=27+30)
Reflect         1.06M ± 0%       1.05M ± 0%  -0.69%  (p=0.000 n=30+30)
Tar              266k ± 1%        265k ± 1%  -0.51%  (p=0.000 n=29+30)
XML              416k ± 1%        415k ± 1%  -0.36%  (p=0.002 n=30+30)

Change-Id: I8f784001324df83b2764c44f0e83a540e5beab34
Reviewed-on: https://go-review.googlesource.com/36212
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-02 22:39:32 +00:00
Russ Cox
47ce87877b all: merge dev.inline into master
Change-Id: I7715581a04e513dcda9918e853fa6b1ddc703770
2017-02-01 09:47:23 -05:00
Robert Griesemer
472c792e0a [dev.inline] cmd/internal/src: introduce compact source position representation
XPos is a compact (8 instead of 16 bytes on a 64bit machine) source
position representation. There is a 1:1 correspondence between each
XPos and each regular Pos, translated via a global table.

In some sense this brings back the LineHist, though positions can
track line and column information; there is a O(1) translation
between the representations (no binary search), and the translation
is factored out.

The size increase with the prior change is brought down again and
the compiler speed is in line with the master repo (measured on
the same "quiet" machine as for prior change):

name       old time/op     new time/op     delta
Template       256ms ± 1%      262ms ± 2%    ~             (p=0.063 n=5+4)
Unicode        132ms ± 1%      135ms ± 2%    ~             (p=0.063 n=5+4)
GoTypes        891ms ± 1%      871ms ± 1%  -2.28%          (p=0.016 n=5+4)
Compiler       3.84s ± 2%      3.89s ± 2%    ~             (p=0.413 n=5+4)
MakeBash       47.1s ± 1%      46.2s ± 2%    ~             (p=0.095 n=5+5)

name       old user-ns/op  new user-ns/op  delta
Template        309M ± 1%       314M ± 2%    ~             (p=0.111 n=5+4)
Unicode         165M ± 1%       172M ± 9%    ~             (p=0.151 n=5+5)
GoTypes        1.14G ± 2%      1.12G ± 1%    ~             (p=0.063 n=5+4)
Compiler       5.00G ± 1%      4.96G ± 1%    ~             (p=0.286 n=5+4)

Change-Id: Icc570cc60ab014d8d9af6976f1f961ab8828cc47
Reviewed-on: https://go-review.googlesource.com/34506
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-09 22:43:22 +00:00
shawnps
067bab00a8 all: fix misspellings
Change-Id: I429637ca91f7db4144f17621de851a548dc1ce76
Reviewed-on: https://go-review.googlesource.com/34923
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-07 16:53:25 +00:00
Robert Griesemer
eaca0e0529 [dev.inline] cmd/internal/src: introduce NoPos and use it instead Pos{}
Using a variable instead of a composite literal makes
the code independent of implementation changes of Pos.

Per David Lazar's suggestion.

Change-Id: I336967ac12a027c51a728a58ac6207cb5119af4a
Reviewed-on: https://go-review.googlesource.com/34148
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 00:35:07 +00:00
Robert Griesemer
c10499b539 [dev.inline] cmd/compile/internal/ssa: another round of renames from line -> pos (cleanup)
Mostly mechanical renames. Make variable names consistent with use.

Change-Id: Iaa89d31deab11eca6e784595b58e779ad525c8a3
Reviewed-on: https://go-review.googlesource.com/34146
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-12-08 23:10:30 +00:00
Robert Griesemer
cfd17f51c8 [dev.inline] cmd/compile/internal/ssa: rename various fields from Line to Pos
This is a mostly mechanical rename followed by manual fixes where necessary.

Change-Id: Ie5c670b133db978f15dc03e50dc2da0c80fc8842
Reviewed-on: https://go-review.googlesource.com/34137
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:36:52 +00:00
Robert Griesemer
82d0caea2c [dev.inline] cmd/internal/src: make Pos implementation abstract
Adjust cmd/compile accordingly.

This will make it easier to replace the underlying implementation.

Change-Id: I33645850bb18c839b24785b6222a9e028617addb
Reviewed-on: https://go-review.googlesource.com/34133
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:31:28 +00:00
Robert Griesemer
24597c080b [dev.inline] cmd/compile: introduce cmd/internal/src.Pos type for line numbers
This is a step toward chosing a different position representation.
By introducing an explicit type, it will be easier to make the
transition step-wise while ensuring everything keeps running.

This has been reviewed via https://go-review.googlesource.com/#/c/34025/.

Change-Id: Ibceddcd62d8f346321ac3250e3940e9c436ed684
Reviewed-on: https://go-review.googlesource.com/34132
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:26:25 +00:00
Cherry Zhang
348275cda6 cmd/compile: make a copy of Phi input if it is still live
Register of Phi input is allocated to the Phi. So if the Phi
input is still live after Phi, we may need to use a spill. In
this case, copy the Phi input to a spare register to avoid a
spill.

Originally targeted the code in issue #16187, and this CL
indeed removes the spill, but doesn't seem to help on benchmark
result. It may help in general, though.

On AMD64:
name                      old time/op    new time/op    delta
BinaryTree17-12              2.79s ± 1%     2.76s ± 0%  -1.33%  (p=0.000 n=10+10)
Fannkuch11-12                3.02s ± 0%     3.14s ± 0%  +3.99%  (p=0.000 n=10+10)
FmtFprintfEmpty-12          51.2ns ± 0%    51.4ns ± 3%    ~      (p=0.368 n=8+10)
FmtFprintfString-12          145ns ± 0%     144ns ± 0%  -0.69%    (p=0.000 n=6+9)
FmtFprintfInt-12             127ns ± 1%     124ns ± 1%  -2.79%   (p=0.000 n=10+9)
FmtFprintfIntInt-12          186ns ± 0%     184ns ± 0%  -1.34%   (p=0.000 n=10+9)
FmtFprintfPrefixedInt-12     196ns ± 0%     194ns ± 0%  -0.97%    (p=0.000 n=9+9)
FmtFprintfFloat-12           293ns ± 2%     287ns ± 0%  -2.00%   (p=0.000 n=10+9)
FmtManyArgs-12               847ns ± 1%     829ns ± 0%  -2.17%   (p=0.000 n=10+7)
GobDecode-12                7.17ms ± 0%    7.18ms ± 0%    ~     (p=0.123 n=10+10)
GobEncode-12                6.08ms ± 1%    6.08ms ± 0%    ~      (p=0.497 n=10+9)
Gzip-12                      277ms ± 1%     275ms ± 1%  -0.47%   (p=0.028 n=10+9)
Gunzip-12                   39.1ms ± 2%    38.2ms ± 1%  -2.20%   (p=0.000 n=10+9)
HTTPClientServer-12         90.9µs ± 4%    87.7µs ± 2%  -3.51%   (p=0.001 n=9+10)
JSONEncode-12               17.3ms ± 1%    16.5ms ± 0%  -5.02%    (p=0.000 n=9+9)
JSONDecode-12               54.6ms ± 1%    54.1ms ± 0%  -0.99%    (p=0.000 n=9+9)
Mandelbrot200-12            4.45ms ± 0%    4.45ms ± 0%  -0.02%    (p=0.006 n=8+9)
GoParse-12                  3.44ms ± 0%    3.48ms ± 1%  +0.95%  (p=0.000 n=10+10)
RegexpMatchEasy0_32-12      84.9ns ± 0%    85.0ns ± 0%    ~       (p=0.241 n=8+8)
RegexpMatchEasy0_1K-12       867ns ± 3%     915ns ±11%  +5.55%  (p=0.037 n=10+10)
RegexpMatchEasy1_32-12      82.7ns ± 5%    83.9ns ± 4%    ~      (p=0.161 n=9+10)
RegexpMatchEasy1_1K-12       361ns ± 1%     363ns ± 0%    ~      (p=0.098 n=10+8)
RegexpMatchMedium_32-12      126ns ± 0%     126ns ± 1%    ~      (p=0.549 n=8+10)
RegexpMatchMedium_1K-12     38.8µs ± 0%    39.1µs ± 0%  +0.67%    (p=0.000 n=9+8)
RegexpMatchHard_32-12       1.95µs ± 0%    1.96µs ± 0%  +0.43%    (p=0.000 n=9+9)
RegexpMatchHard_1K-12       59.0µs ± 0%    59.1µs ± 0%  +0.27%   (p=0.000 n=10+9)
Revcomp-12                   436ms ± 1%     431ms ± 1%  -1.19%  (p=0.005 n=10+10)
Template-12                 56.7ms ± 1%    57.1ms ± 1%  +0.71%   (p=0.001 n=10+9)
TimeParse-12                 312ns ± 0%     310ns ± 0%  -0.80%   (p=0.000 n=10+9)
TimeFormat-12                336ns ± 0%     332ns ± 0%  -1.19%    (p=0.000 n=8+7)
[Geo mean]                  59.2µs         58.9µs       -0.42%

On PPC64:
name                     old time/op    new time/op    delta
BinaryTree17-2              4.67s ± 2%     4.71s ± 1%    ~     (p=0.421 n=5+5)
Fannkuch11-2                3.92s ± 1%     3.94s ± 0%  +0.46%  (p=0.032 n=5+5)
FmtFprintfEmpty-2           122ns ± 0%     120ns ± 2%  -1.80%  (p=0.016 n=4+5)
FmtFprintfString-2          305ns ± 1%     299ns ± 1%  -1.84%  (p=0.008 n=5+5)
FmtFprintfInt-2             243ns ± 0%     241ns ± 1%  -0.66%  (p=0.016 n=4+5)
FmtFprintfIntInt-2          361ns ± 1%     356ns ± 1%  -1.49%  (p=0.016 n=5+5)
FmtFprintfPrefixedInt-2     355ns ± 1%     357ns ± 1%    ~     (p=0.333 n=5+5)
FmtFprintfFloat-2           502ns ± 2%     498ns ± 1%    ~     (p=0.151 n=5+5)
FmtManyArgs-2              1.55µs ± 2%    1.59µs ± 1%  +2.52%  (p=0.008 n=5+5)
GobDecode-2                13.0ms ± 1%    13.0ms ± 1%    ~     (p=0.841 n=5+5)
GobEncode-2                11.8ms ± 1%    11.8ms ± 1%    ~     (p=0.690 n=5+5)
Gzip-2                      499ms ± 1%     503ms ± 0%    ~     (p=0.421 n=5+5)
Gunzip-2                   86.5ms ± 0%    86.4ms ± 1%    ~     (p=0.841 n=5+5)
HTTPClientServer-2         68.2µs ± 2%    69.6µs ± 3%    ~     (p=0.151 n=5+5)
JSONEncode-2               39.0ms ± 1%    37.2ms ± 1%  -4.65%  (p=0.008 n=5+5)
JSONDecode-2                122ms ± 1%     126ms ± 1%  +2.63%  (p=0.008 n=5+5)
Mandelbrot200-2            6.08ms ± 1%    5.89ms ± 1%  -3.06%  (p=0.008 n=5+5)
GoParse-2                  5.95ms ± 2%    5.98ms ± 1%    ~     (p=0.421 n=5+5)
RegexpMatchEasy0_32-2       331ns ± 1%     328ns ± 1%    ~     (p=0.056 n=5+5)
RegexpMatchEasy0_1K-2      1.45µs ± 0%    1.47µs ± 0%  +1.13%  (p=0.008 n=5+5)
RegexpMatchEasy1_32-2       359ns ± 0%     353ns ± 0%  -1.84%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K-2      1.79µs ± 0%    1.81µs ± 1%  +1.16%  (p=0.008 n=5+5)
RegexpMatchMedium_32-2      420ns ± 2%     413ns ± 0%  -1.72%  (p=0.008 n=5+5)
RegexpMatchMedium_1K-2     70.2µs ± 1%    69.5µs ± 1%  -1.09%  (p=0.032 n=5+5)
RegexpMatchHard_32-2       3.87µs ± 1%    3.65µs ± 0%  -5.86%  (p=0.008 n=5+5)
RegexpMatchHard_1K-2        111µs ± 0%     105µs ± 0%  -5.49%  (p=0.016 n=5+4)
Revcomp-2                   1.00s ± 1%     1.01s ± 2%    ~     (p=0.151 n=5+5)
Template-2                  113ms ± 1%     113ms ± 2%    ~     (p=0.841 n=5+5)
TimeParse-2                 555ns ± 0%     550ns ± 1%  -0.87%  (p=0.032 n=5+5)
TimeFormat-2                736ns ± 1%     704ns ± 1%  -4.35%  (p=0.008 n=5+5)
[Geo mean]                  120µs          119µs       -0.77%

Reduce "spilled value remains" by 0.6% in cmd/go on AMD64.

Change-Id: If655df343b0f30d1a49ab1ab644f10c698b96f3e
Reviewed-on: https://go-review.googlesource.com/32442
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-11-18 13:56:23 +00:00
Cherry Zhang
f9238a76ff cmd/compile: make LR allocatable in non-leaf functions on ARM
The mechanism is initially introduced (and reviewed) in CL 30597
on S390X.

Reduce number of "spilled value remains" by 0.4% in cmd/go.

Disabled on ARMv5 because LR is clobbered almost everywhere with
inserted softfloat calls.

Change-Id: I2934737ce2455909647ed2118fe2bd6f0aa5ac52
Reviewed-on: https://go-review.googlesource.com/32178
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-10-28 14:25:33 +00:00
Michael Munday
15817e409b cmd/compile: make link register allocatable in non-leaf functions
We save and restore the link register in non-leaf functions because
it is clobbered by CALLs. It is therefore available for general
purpose use.

Only enabled on s390x currently. The RC4 benchmarks in particular
benefit from the extra register:

name     old speed     new speed     delta
RC4_128  243MB/s ± 2%  341MB/s ± 2%  +40.46%  (p=0.008 n=5+5)
RC4_1K   267MB/s ± 0%  359MB/s ± 1%  +34.32%  (p=0.008 n=5+5)
RC4_8K   271MB/s ± 0%  362MB/s ± 0%  +33.61%  (p=0.008 n=5+5)

Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f
Reviewed-on: https://go-review.googlesource.com/30597
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-11 18:52:35 +00:00
Keith Randall
93d5f43a29 cmd/compile: do regalloc check only when checkEnabled
No point doing this check all the time.

Fixes #15621

Change-Id: I1966c061986fe98fe9ebe146d6b9738c13cef724
Reviewed-on: https://go-review.googlesource.com/30670
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-07 17:33:15 +00:00
Keith Randall
1bddd2ee6a cmd/compile: don't shuffle rematerializeable values around
Better to just rematerialize them when needed instead of
cross-register spilling or other techniques for keeping them in
registers.

This helps for amd64 code that does 1 << x. It is better to do
  loop:
    MOVQ $1, AX  // materialize arg to SLLQ
    SLLQ CX, AX
    ...
    goto loop
than to do
  MOVQ $1, AX    // materialize outsize of loop
  loop:
    MOVQ AX, DX  // save value that's about to be clobbered
    SLLQ CX, AX
    MOVQ DX, AX  // move it back to the correct register
    goto loop

Update #16092

Change-Id: If7ac290208f513061ebb0736e8a79dcb0ba338c0
Reviewed-on: https://go-review.googlesource.com/30471
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-06 02:46:43 +00:00
Matthew Dempsky
6dc356a76a cmd/compile/internal/ssa: erase register copies deterministically
Fixes #17288.

Change-Id: I2ddd01d14667d5c6a2e19bd70489da8d9869d308
Reviewed-on: https://go-review.googlesource.com/30072
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-29 22:22:36 +00:00
Cherry Zhang
f876fb9bae cmd/compile: move value around before kick it out of register
When allocating registers, before kicking out the existing value,
copy it to a spare register if there is one. So later use of this
value can be found in register instead of reload from spill. This
is very helpful for instructions of which the input and/or output
can only be in specific registers, e.g. DIV on x86, MUL/DIV on
MIPS. May also be helpful in general.

For "go build -a cmd/go" on AMD64, reduce "spilled value remains"
by 1% (not including args, which almost certainly remain).

For the code in issue #16061 on AMD64:
MaxRem-12   111µs ± 1%    94µs ± 0%  -15.38%  (p=0.008 n=5+5)

Go1 benchmark on AMD64:
BinaryTree17-12              2.32s ± 2%     2.30s ± 1%    ~     (p=0.421 n=5+5)
Fannkuch11-12                2.52s ± 0%     2.44s ± 0%  -3.44%  (p=0.008 n=5+5)
FmtFprintfEmpty-12          39.9ns ± 3%    39.8ns ± 0%    ~     (p=0.635 n=5+4)
FmtFprintfString-12          114ns ± 1%     113ns ± 1%    ~     (p=0.905 n=5+5)
FmtFprintfInt-12             102ns ± 6%      98ns ± 1%    ~     (p=0.087 n=5+5)
FmtFprintfIntInt-12          146ns ± 5%     147ns ± 1%    ~     (p=0.238 n=5+5)
FmtFprintfPrefixedInt-12     155ns ± 2%     151ns ± 1%  -2.58%  (p=0.008 n=5+5)
FmtFprintfFloat-12           231ns ± 1%     232ns ± 1%    ~     (p=0.286 n=5+5)
FmtManyArgs-12               657ns ± 1%     649ns ± 0%  -1.31%  (p=0.008 n=5+5)
GobDecode-12                6.35ms ± 0%    6.29ms ± 1%    ~     (p=0.056 n=5+5)
GobEncode-12                5.38ms ± 1%    5.45ms ± 1%    ~     (p=0.056 n=5+5)
Gzip-12                      209ms ± 0%     209ms ± 1%    ~     (p=0.690 n=5+5)
Gunzip-12                   31.2ms ± 1%    31.1ms ± 1%    ~     (p=0.548 n=5+5)
HTTPClientServer-12          123µs ± 4%     130µs ± 8%    ~     (p=0.151 n=5+5)
JSONEncode-12               14.0ms ± 1%    14.0ms ± 1%    ~     (p=0.421 n=5+5)
JSONDecode-12               41.2ms ± 1%    41.1ms ± 2%    ~     (p=0.421 n=5+5)
Mandelbrot200-12            3.96ms ± 1%    3.98ms ± 0%    ~     (p=0.421 n=5+5)
GoParse-12                  2.88ms ± 1%    2.88ms ± 1%    ~     (p=0.841 n=5+5)
RegexpMatchEasy0_32-12      68.0ns ± 3%    66.6ns ± 1%  -2.00%  (p=0.024 n=5+5)
RegexpMatchEasy0_1K-12       728ns ± 8%     682ns ± 1%  -6.26%  (p=0.008 n=5+5)
RegexpMatchEasy1_32-12      66.8ns ± 2%    66.0ns ± 1%    ~     (p=0.302 n=5+5)
RegexpMatchEasy1_1K-12       291ns ± 2%     288ns ± 1%    ~     (p=0.111 n=5+5)
RegexpMatchMedium_32-12      103ns ± 2%     100ns ± 0%  -2.53%  (p=0.016 n=5+4)
RegexpMatchMedium_1K-12     31.9µs ± 1%    31.3µs ± 0%  -1.75%  (p=0.008 n=5+5)
RegexpMatchHard_32-12       1.59µs ± 2%    1.59µs ± 1%    ~     (p=0.548 n=5+5)
RegexpMatchHard_1K-12       48.3µs ± 2%    47.7µs ± 1%    ~     (p=0.222 n=5+5)
Revcomp-12                   340ms ± 1%     338ms ± 1%    ~     (p=0.421 n=5+5)
Template-12                 46.3ms ± 1%    46.5ms ± 1%    ~     (p=0.690 n=5+5)
TimeParse-12                 252ns ± 1%     247ns ± 0%  -1.91%  (p=0.000 n=5+4)
TimeFormat-12                277ns ± 1%     267ns ± 0%  -3.82%  (p=0.008 n=5+5)
[Geo mean]                  48.8µs         48.3µs       -0.93%

It has very little effect on binary size and compiler speed.
compilebench:
Template       230ms ±10%      231ms ± 8%    ~             (p=0.546 n=9+9)
Unicode        123ms ± 6%      124ms ± 9%    ~           (p=0.481 n=10+10)
GoTypes        742ms ± 6%      755ms ± 3%    ~           (p=0.123 n=10+10)
Compiler       3.10s ± 3%      3.08s ± 1%    ~           (p=0.631 n=10+10)

Fixes #16061.

Change-Id: Id99cdc7a182ee10a704fa0f04e8e0d0809b2ac56
Reviewed-on: https://go-review.googlesource.com/29732
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-27 15:44:58 +00:00
David Chase
cddddbc623 cmd/compile: use ISEL, cleanup use of zero & extensions
Abandoned earlier efforts to expose zero register,
but left it in numbering to decrease squirrelyness of
register allocator.

ISELrelOp used in code generation of bool := x relOp y.
Some patterns added to better elide zero case and
some sign extension.

Updates: #17109

Change-Id: Ida7839f0023ca8f0ffddc0545f0ac269e65b05d9
Reviewed-on: https://go-review.googlesource.com/29380
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-09-22 17:36:39 +00:00
Keith Randall
75ce89c20d cmd/compile: cache CFG-dependent computations
We compute a lot of stuff based off the CFG: postorder traversal,
dominators, dominator tree, loop nest.  Multiple phases use this
information and we end up recomputing some of it.  Add a cache
for this information so if the CFG hasn't changed, we can reuse
the previous computation.

Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2
Reviewed-on: https://go-review.googlesource.com/29356
Reviewed-by: David Chase <drchase@google.com>
2016-09-19 16:00:13 +00:00
Keith Randall
833ed7c431 cmd/compile: reorganize SSA register numbering
Teach SSA about the cmd/internal/obj/$ARCH register numbering.
It can then return that numbering when requested.  Each architecture
now does not need to know anything about the internal SSA numbering
of registers.

Change-Id: I34472a2736227c15482e60994eebcdd2723fa52d
Reviewed-on: https://go-review.googlesource.com/29249
Reviewed-by: David Chase <drchase@google.com>
2016-09-16 19:01:55 +00:00