33 Commits

Author SHA1 Message Date
Keith Randall
db9341a024 cmd/compile: update WBLoads during deadcode
When we deadcode-remove a block which is a write barrier test,
remove that block from the list of write barrier test blocks.

Fixes #25516

Change-Id: I1efe732d5476003eab4ad6bf67d0340d7874ff0c
Reviewed-on: https://go-review.googlesource.com/115037
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-29 17:45:36 +00:00
David Chase
c2c1822b12 cmd/compile: assign and preserve statement boundaries.
A new pass run after ssa building (before any other
optimization) identifies the "first" ssa node for each
statement. Other "noise" nodes are tagged as being never
appropriate for a statement boundary (e.g., VarKill, VarDef,
Phi).

Rewrite, deadcode, cse, and nilcheck are modified to move
the statement boundaries forward whenever possible if a
boundary-tagged ssa value is removed; never-boundary nodes
are ignored in this search (some operations involving
constants are also tagged as never-boundary and also ignored
because they are likely to be moved or removed during
optimization).

Code generation treats all nodes except those explicitly
marked as statement boundaries as "not statement" nodes,
and floats statement boundaries to the beginning of each
same-line run of instructions found within a basic block.

Line number html conversion was modified to make statement
boundary nodes a bit more obvious by prepending a "+".

The code in fuse.go that glued together the value slices
of two blocks produced a result that depended on the
former capacities (not lengths) of the two slices.  This
causes differences in the 386 bootstrap, and also can
sometimes put values into an order that does a worse job
of preserving statement boundaries when values are removed.

Portions of two delve tests that had caught problems were
incorporated into ssa/debug_test.go.  There are some
opportunities to do better with optimized code, but the
next-ing is not lying or overly jumpy.

Over 4 CLs, compilebench geomean measured binary size
increase of 3.5% and compile user time increase of 3.8%
(this is after optimization to reuse a sparse map instead
of creating multiple maps.)

This CL worsens the optimized-debugging experience with
Delve; we need to work with the delve team so that
they can use the is_stmt marks that we're emitting now.

The reference output changes from time to time depending
on other changes in the compiler, sometimes better,
sometimes worse.

This CL now includes a test ensuring that 99+% of the lines
in the Go command itself (a handy optimized binary) include
is_stmt markers.

Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-14 14:09:49 +00:00
Alberto Donizetti
9357bb9eba cmd/compile: stack-allocate worklist in ReachableBlocks
Stack-allocate a local worklist in the deadcode pass. A size of 64 for
the pre-allocation is enough for >99% of the ReachableBlocks call in
a typical package.

name      old time/op       new time/op       delta
Template        281ms ± 3%        278ms ± 2%  -1.03%  (p=0.049 n=20+20)
Unicode         135ms ± 6%        134ms ± 6%    ~     (p=0.273 n=18+17)
GoTypes         882ms ± 3%        880ms ± 2%    ~     (p=0.925 n=20+20)
Compiler        4.01s ± 1%        4.02s ± 2%    ~     (p=0.640 n=20+20)
SSA             9.61s ± 1%        9.75s ± 1%  +1.39%  (p=0.000 n=20+19)
Flate           186ms ± 5%        185ms ± 7%    ~     (p=0.758 n=20+20)
GoParser        219ms ± 5%        218ms ± 4%    ~     (p=0.149 n=20+20)
Reflect         568ms ± 4%        562ms ± 1%    ~     (p=0.154 n=19+19)
Tar             258ms ± 2%        257ms ± 3%    ~     (p=0.428 n=19+20)
XML             316ms ± 2%        317ms ± 3%    ~     (p=0.901 n=20+19)

name      old user-time/op  new user-time/op  delta
Template        398ms ± 6%        388ms ± 6%  -2.55%  (p=0.007 n=20+20)
Unicode         217ms ± 5%        213ms ± 6%  -1.90%  (p=0.036 n=17+20)
GoTypes         1.21s ± 3%        1.20s ± 3%  -0.89%  (p=0.022 n=19+20)
Compiler        5.56s ± 3%        5.53s ± 5%    ~     (p=0.779 n=20+20)
SSA             13.9s ± 5%        14.0s ± 4%    ~     (p=0.529 n=20+20)
Flate           248ms ±10%        252ms ± 4%    ~     (p=0.409 n=20+18)
GoParser        305ms ± 4%        299ms ± 5%  -1.87%  (p=0.007 n=19+20)
Reflect         754ms ± 2%        747ms ± 3%    ~     (p=0.107 n=20+19)
Tar             360ms ± 5%        362ms ± 3%    ~     (p=0.534 n=20+18)
XML             425ms ± 6%        429ms ± 4%    ~     (p=0.496 n=20+19)

name      old alloc/op      new alloc/op      delta
Template       38.8MB ± 0%       38.7MB ± 0%  -0.15%  (p=0.000 n=20+20)
Unicode        29.1MB ± 0%       29.1MB ± 0%  -0.03%  (p=0.000 n=20+20)
GoTypes         115MB ± 0%        115MB ± 0%  -0.13%  (p=0.000 n=20+20)
Compiler        491MB ± 0%        490MB ± 0%  -0.15%  (p=0.000 n=18+19)
SSA            1.40GB ± 0%       1.40GB ± 0%  -0.16%  (p=0.000 n=20+20)
Flate          24.9MB ± 0%       24.8MB ± 0%  -0.17%  (p=0.000 n=20+20)
GoParser       30.7MB ± 0%       30.6MB ± 0%  -0.16%  (p=0.000 n=20+20)
Reflect        77.1MB ± 0%       77.0MB ± 0%  -0.11%  (p=0.000 n=19+20)
Tar            39.0MB ± 0%       39.0MB ± 0%  -0.14%  (p=0.000 n=20+20)
XML            44.6MB ± 0%       44.5MB ± 0%  -0.13%  (p=0.000 n=17+19)

name      old allocs/op     new allocs/op     delta
Template         379k ± 0%         378k ± 0%  -0.45%  (p=0.000 n=20+17)
Unicode          336k ± 0%         336k ± 0%  -0.08%  (p=0.000 n=20+20)
GoTypes         1.18M ± 0%        1.17M ± 0%  -0.37%  (p=0.000 n=20+20)
Compiler        4.58M ± 0%        4.56M ± 0%  -0.38%  (p=0.000 n=20+20)
SSA             11.4M ± 0%        11.4M ± 0%  -0.39%  (p=0.000 n=20+20)
Flate            233k ± 0%         232k ± 0%  -0.51%  (p=0.000 n=20+20)
GoParser         313k ± 0%         312k ± 0%  -0.48%  (p=0.000 n=19+20)
Reflect          946k ± 0%         943k ± 0%  -0.31%  (p=0.000 n=20+20)
Tar              388k ± 0%         387k ± 0%  -0.40%  (p=0.000 n=20+20)
XML              411k ± 0%         409k ± 0%  -0.35%  (p=0.000 n=17+20)

Change-Id: Iaec0b9471ded61be5eb3c9d1074e804672307644
Reviewed-on: https://go-review.googlesource.com/104675
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-04-05 07:41:56 +00:00
Alberto Donizetti
b63b0f2b75 cmd/compile: allocate less in regalloc's liveValues
Instrumenting the compiler shows that, at the end of liveValues, the
values of the workList's cap are distributed as:

  cap	 freq
  1  	 0.006
  2  	 0.002
  4  	 0.237
  8  	 0.272
  16	 0.254
  32	 0.141
  64	 0.062
  128	 0.02
  256	 0.005
  512	 0.001
  1024	 0.0

Since the initial workList slice allocation is always on the stack
(as the variable does not escape), we can aggressively pre-allocate a
big backing array at (almost) no cost. This will save several
allocations in liveValues calls that end up having a large workList,
with no performance penalties for calls that have a small workList.

name      old time/op       new time/op       delta
Template        284ms ± 3%        282ms ± 3%    ~     (p=0.201 n=20+20)
Unicode         138ms ± 7%        138ms ± 7%    ~     (p=0.718 n=20+20)
GoTypes         905ms ± 2%        895ms ± 1%  -1.10%  (p=0.003 n=19+18)
Compiler        4.26s ± 1%        4.25s ± 1%  -0.38%  (p=0.038 n=20+19)
SSA             9.85s ± 2%        9.80s ± 1%    ~     (p=0.061 n=20+19)
Flate           187ms ± 6%        186ms ± 5%    ~     (p=0.289 n=20+20)
GoParser        227ms ± 3%        225ms ± 3%    ~     (p=0.072 n=20+20)
Reflect         578ms ± 2%        575ms ± 2%    ~     (p=0.059 n=18+20)
Tar             263ms ± 2%        265ms ± 3%    ~     (p=0.224 n=19+20)
XML             323ms ± 3%        325ms ± 2%    ~     (p=0.127 n=20+20)

name      old user-time/op  new user-time/op  delta
Template        406ms ± 6%        404ms ± 4%    ~     (p=0.314 n=20+20)
Unicode         220ms ± 6%        215ms ±11%    ~     (p=0.077 n=18+20)
GoTypes         1.25s ± 3%        1.24s ± 4%    ~     (p=0.461 n=20+20)
Compiler        5.95s ± 2%        5.84s ± 5%  -1.93%  (p=0.007 n=20+20)
SSA             14.4s ± 4%        14.2s ± 4%    ~     (p=0.108 n=20+20)
Flate           257ms ± 6%        252ms ± 9%    ~     (p=0.063 n=20+20)
GoParser        317ms ± 5%        312ms ± 6%  -1.85%  (p=0.049 n=20+20)
Reflect         779ms ± 2%        774ms ± 3%    ~     (p=0.253 n=20+20)
Tar             371ms ± 4%        374ms ± 4%    ~     (p=0.327 n=20+20)
XML             440ms ± 5%        442ms ± 5%    ~     (p=0.678 n=20+20)

name      old alloc/op      new alloc/op      delta
Template       39.4MB ± 0%       39.0MB ± 0%  -0.96%  (p=0.000 n=20+20)
Unicode        29.1MB ± 0%       29.0MB ± 0%  -0.13%  (p=0.000 n=20+20)
GoTypes         117MB ± 0%        116MB ± 0%  -0.88%  (p=0.000 n=20+20)
Compiler        502MB ± 0%        498MB ± 0%  -0.77%  (p=0.000 n=19+20)
SSA            1.42GB ± 0%       1.40GB ± 0%  -0.80%  (p=0.000 n=20+20)
Flate          25.3MB ± 0%       25.0MB ± 0%  -1.10%  (p=0.000 n=20+19)
GoParser       31.3MB ± 0%       31.0MB ± 0%  -1.05%  (p=0.000 n=20+20)
Reflect        77.9MB ± 0%       77.1MB ± 0%  -1.03%  (p=0.000 n=20+20)
Tar            40.0MB ± 0%       39.7MB ± 0%  -0.80%  (p=0.000 n=20+20)
XML            45.2MB ± 0%       44.9MB ± 0%  -0.72%  (p=0.000 n=20+20)

name      old allocs/op     new allocs/op     delta
Template         392k ± 0%         386k ± 0%  -1.44%  (p=0.000 n=20+20)
Unicode          337k ± 0%         337k ± 0%  -0.22%  (p=0.000 n=20+20)
GoTypes         1.22M ± 0%        1.20M ± 0%  -1.33%  (p=0.000 n=20+20)
Compiler        4.76M ± 0%        4.71M ± 0%  -1.12%  (p=0.000 n=20+20)
SSA             11.8M ± 0%        11.7M ± 0%  -1.00%  (p=0.000 n=20+20)
Flate            241k ± 0%         238k ± 0%  -1.49%  (p=0.000 n=20+20)
GoParser         324k ± 0%         320k ± 0%  -1.17%  (p=0.000 n=20+20)
Reflect          981k ± 0%         961k ± 0%  -2.11%  (p=0.000 n=20+20)
Tar              402k ± 0%         397k ± 0%  -1.29%  (p=0.000 n=20+20)
XML              424k ± 0%         419k ± 0%  -1.10%  (p=0.000 n=19+20)

Change-Id: If46667ae98eee2d47a615cad05e18df0629d8388
Reviewed-on: https://go-review.googlesource.com/102495
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-27 08:16:50 +00:00
philhofer
2d0172c3a7 cmd/compile/internal/ssa: emit csel on arm64
Introduce a new SSA pass to generate CondSelect intstrutions,
and add CondSelect lowering rules for arm64.

In order to make the CSEL instruction easier to optimize,
and to simplify the introduction of CSNEG, CSINC, and CSINV
in the future, modify the CSEL instruction to accept a condition
code in the aux field.

Notably, this change makes the go1 Gzip benchmark
more than 10% faster.

Benchmarks on a Cavium ThunderX:

name                      old time/op    new time/op    delta
BinaryTree17-96              15.9s ± 6%     16.0s ± 4%     ~     (p=0.968 n=10+9)
Fannkuch11-96                7.17s ± 0%     7.00s ± 0%   -2.43%  (p=0.000 n=8+9)
FmtFprintfEmpty-96           208ns ± 1%     207ns ± 0%     ~     (p=0.152 n=10+8)
FmtFprintfString-96          379ns ± 0%     375ns ± 0%   -0.95%  (p=0.000 n=10+9)
FmtFprintfInt-96             385ns ± 0%     383ns ± 0%   -0.52%  (p=0.000 n=9+10)
FmtFprintfIntInt-96          591ns ± 0%     586ns ± 0%   -0.85%  (p=0.006 n=7+9)
FmtFprintfPrefixedInt-96     656ns ± 0%     667ns ± 0%   +1.71%  (p=0.000 n=10+10)
FmtFprintfFloat-96           967ns ± 0%     984ns ± 0%   +1.78%  (p=0.000 n=10+10)
FmtManyArgs-96              2.35µs ± 0%    2.25µs ± 0%   -4.63%  (p=0.000 n=9+8)
GobDecode-96                31.0ms ± 0%    30.8ms ± 0%   -0.36%  (p=0.006 n=9+9)
GobEncode-96                24.4ms ± 0%    24.5ms ± 0%   +0.30%  (p=0.000 n=9+9)
Gzip-96                      1.60s ± 0%     1.43s ± 0%  -10.58%  (p=0.000 n=9+10)
Gunzip-96                    167ms ± 0%     169ms ± 0%   +0.83%  (p=0.000 n=8+9)
HTTPClientServer-96          311µs ± 1%     308µs ± 0%   -0.75%  (p=0.000 n=10+10)
JSONEncode-96               65.0ms ± 0%    64.8ms ± 0%   -0.25%  (p=0.000 n=9+8)
JSONDecode-96                262ms ± 1%     261ms ± 1%     ~     (p=0.579 n=10+10)
Mandelbrot200-96            18.0ms ± 0%    18.1ms ± 0%   +0.17%  (p=0.000 n=8+10)
GoParse-96                  14.0ms ± 0%    14.1ms ± 1%   +0.42%  (p=0.003 n=9+10)
RegexpMatchEasy0_32-96       644ns ± 2%     645ns ± 2%     ~     (p=0.836 n=10+10)
RegexpMatchEasy0_1K-96      3.70µs ± 0%    3.49µs ± 0%   -5.58%  (p=0.000 n=10+10)
RegexpMatchEasy1_32-96       662ns ± 2%     657ns ± 2%     ~     (p=0.137 n=10+10)
RegexpMatchEasy1_1K-96      4.47µs ± 0%    4.31µs ± 0%   -3.48%  (p=0.000 n=10+10)
RegexpMatchMedium_32-96      844ns ± 2%     849ns ± 1%     ~     (p=0.208 n=10+10)
RegexpMatchMedium_1K-96      179µs ± 0%     182µs ± 0%   +1.20%  (p=0.000 n=10+10)
RegexpMatchHard_32-96       10.0µs ± 0%    10.1µs ± 0%   +0.48%  (p=0.000 n=10+9)
RegexpMatchHard_1K-96        297µs ± 0%     297µs ± 0%   -0.14%  (p=0.000 n=10+10)
Revcomp-96                   3.08s ± 0%     3.13s ± 0%   +1.56%  (p=0.000 n=9+9)
Template-96                  276ms ± 2%     275ms ± 1%     ~     (p=0.393 n=10+10)
TimeParse-96                1.37µs ± 0%    1.36µs ± 0%   -0.53%  (p=0.000 n=10+7)
TimeFormat-96               1.40µs ± 0%    1.42µs ± 0%   +0.97%  (p=0.000 n=10+10)
[Geo mean]                   264µs          262µs        -0.77%

Change-Id: Ie54eee4b3092af53e6da3baa6d1755098f57f3a2
Reviewed-on: https://go-review.googlesource.com/55670
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-20 06:00:54 +00:00
Josh Bleecher Snyder
e00e57d67c cmd/compile: ignore all unreachable values during simple phi insertion
Simple phi insertion already had a heuristic to check
for dead blocks, namely having no predecessors.
When we stopped generating code for dead blocks,
we eliminated some values contained in more subtle
dead blocks, which confused phi insertion.
Compensate by beefing up the reachability check.

Fixes #19678

Change-Id: I0081e4a46f7ce2f69b131a34a0553874a0cb373e
Reviewed-on: https://go-review.googlesource.com/38602
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-24 18:00:15 +00:00
David Chase
11b283092a cmd/compile: add opcode flag hasSideEffects for do-not-remove
Added a flag to generic and various architectures' atomic
operations that are judged to have observable side effects
and thus cannot be dead-code-eliminated.

Test requires GOMAXPROCS > 1 without preemption in loop.

Fixes #19182.

Change-Id: Id2230031abd2cca0bbb32fd68fc8a58fb912070f
Reviewed-on: https://go-review.googlesource.com/37333
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-02-22 15:15:47 +00:00
Keith Randall
3134ab3c2d cmd/compile: redo nil checks
Get rid of BlockCheck. Josh goaded me into it, and I went
down a rabbithole making it happen.

NilCheck now panics if the pointer is nil and returns void, as before.
BlockCheck is gone, and NilCheck is no longer a Control value for
any block. It just exists (and deadcode knows not to throw it away).

I rewrote the nilcheckelim pass to handle this case.  In particular,
there can now be multiple NilCheck ops per block.

I moved all of the arch-dependent nil check elimination done as
part of ssaGenValue into its own proper pass, so we don't have to
duplicate that code for every architecture.

Making the arch-dependent nil check its own pass means I needed
to add a bunch of flags to the opcode table so I could write
the code without arch-dependent ops everywhere.

Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
Reviewed-on: https://go-review.googlesource.com/29120
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-15 02:42:13 +00:00
Keith Randall
c345a3913f cmd/compile: get rid of BlockCall
No need for it, we can treat calls as (mostly) normal values
that take a memory and return a memory.

Lowers the number of basic blocks needed to represent a function.
"go test -c net/http" uses 27% fewer basic blocks.
Probably doesn't affect generated code much, but should help
various passes whose running time and/or space depends on
the number of basic blocks.

Fixes #15631

Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6
Reviewed-on: https://go-review.googlesource.com/28950
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-09-12 23:27:02 +00:00
Keith Randall
4fa050024f cmd/compile: enable constant-time CFG editing
Provide indexes along with block pointers for Preds
and Succs arrays.  This allows us to splice edges in
and out of those arrays in constant time.

Fixes worst-case O(n^2) behavior in deadcode and fuse.

benchmark                     old ns/op      new ns/op     delta
BenchmarkFuse1-8              2065           2057          -0.39%
BenchmarkFuse10-8             9408           9073          -3.56%
BenchmarkFuse100-8            105238         76277         -27.52%
BenchmarkFuse1000-8           3982562        1026750       -74.22%
BenchmarkFuse10000-8          301220329      12824005      -95.74%
BenchmarkDeadCode1-8          1588           1566          -1.39%
BenchmarkDeadCode10-8         4333           4250          -1.92%
BenchmarkDeadCode100-8        32031          32574         +1.70%
BenchmarkDeadCode1000-8       590407         468275        -20.69%
BenchmarkDeadCode10000-8      17822890       5000818       -71.94%
BenchmarkDeadCode100000-8     1388706640     78021127      -94.38%
BenchmarkDeadCode200000-8     5372518479     168598762     -96.86%

Change-Id: Iccabdbb9343fd1c921ba07bbf673330a1c36ee17
Reviewed-on: https://go-review.googlesource.com/22589
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-05 15:58:59 +00:00
Keith Randall
56e0ecc5ea cmd/compile: keep value use counts in SSA
Keep track of how many uses each Value has.  Each appearance in
Value.Args and in Block.Control counts once.

The number of uses of a value is generically useful to
constrain rewrite rules.  For instance, we might want to
prevent merging index operations into loads if the same
index expression is used lots of times.

But I have one use in particular for which the use count is required.
We must make sure we don't combine ops with loads if the load has
more than one use.  Otherwise, we may split a single load
into multiple loads and that breaks perceived behavior in
the presence of races.  In particular, the load of m.state
in sync/mutex.go:Lock can't be done twice.  (I have a separate
CL which triggers the mutex failure.  This CL has a test which
demonstrates a similar failure.)

Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e
Reviewed-on: https://go-review.googlesource.com/20790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-17 04:20:02 +00:00
Brad Fitzpatrick
5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
Alexandru Moșoi
40f2b57e0b [dev.ssa] cmd/compile/internal/ssa: eliminate phis during deadcode removal
While investigating the differences between 19710 (remove
tautological controls) and 12960 (bounds and nil propagation)
I observed that part of the wins of 19710 come from missed
opportunities for deadcode elimination due to phis.
See for example runtime.stackcacherelease. 19710 happens much
later than 12960 and has more chances to eliminate bounds.

Size of pkg/tool/linux_amd64/* excluding compile:

-this -12960 95882248
+this -12960 95880120
-this +12960 95581512
+this +12960 95555224

This change saves about 25k.

Change-Id: Id2f4e55fc92b71595842ce493c3ed527d424fe0e
Reviewed-on: https://go-review.googlesource.com/19728
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-23 18:52:15 +00:00
Todd Neal
f962f33035 [dev.ssa] cmd/compile: reuse sparse sets across compiler passes
Cache sparse sets in the function so they can be reused by subsequent
compiler passes.

benchmark                        old ns/op     new ns/op     delta
BenchmarkDSEPass-8               206945        180022        -13.01%
BenchmarkDSEPassBlock-8          5286103       2614054       -50.55%
BenchmarkCSEPass-8               1790277       1790655       +0.02%
BenchmarkCSEPassBlock-8          18083588      18112771      +0.16%
BenchmarkDeadcodePass-8          59837         41375         -30.85%
BenchmarkDeadcodePassBlock-8     1651575       511169        -69.05%
BenchmarkMultiPass-8             531529        427506        -19.57%
BenchmarkMultiPassBlock-8        7033496       4487814       -36.19%

benchmark                        old allocs     new allocs     delta
BenchmarkDSEPass-8               11             4              -63.64%
BenchmarkDSEPassBlock-8          599            120            -79.97%
BenchmarkCSEPass-8               18             18             +0.00%
BenchmarkCSEPassBlock-8          2700           2700           +0.00%
BenchmarkDeadcodePass-8          4              3              -25.00%
BenchmarkDeadcodePassBlock-8     30             9              -70.00%
BenchmarkMultiPass-8             24             20             -16.67%
BenchmarkMultiPassBlock-8        1800           1000           -44.44%

benchmark                        old bytes     new bytes     delta
BenchmarkDSEPass-8               221367        142           -99.94%
BenchmarkDSEPassBlock-8          3695207       3846          -99.90%
BenchmarkCSEPass-8               303328        303328        +0.00%
BenchmarkCSEPassBlock-8          5006400       5006400       +0.00%
BenchmarkDeadcodePass-8          84232         10506         -87.53%
BenchmarkDeadcodePassBlock-8     1274940       163680        -87.16%
BenchmarkMultiPass-8             608674        313834        -48.44%
BenchmarkMultiPassBlock-8        9906001       5003450       -49.49%

Change-Id: Ib1fa58c7f494b374d1a4bb9cffbc2c48377b59d3
Reviewed-on: https://go-review.googlesource.com/19100
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-01-30 13:57:39 +00:00
Keith Randall
056c09bb88 [dev.ssa] cmd/compile: add backing store buffers for block.{Preds,Succs,Values}
Speeds up compilation by 6%.

Change-Id: Ibaad95710323ddbe13c1b0351843fe43a48d776e
Reviewed-on: https://go-review.googlesource.com/19080
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-01-29 01:01:39 +00:00
Keith Randall
2f57d0fe02 [dev.ssa] cmd/compile: preallocate small-numbered values and blocks
Speeds up the compiler ~5%.

Change-Id: Ia5cf0bcd58701fd14018ec77d01f03d5c7d6385b
Reviewed-on: https://go-review.googlesource.com/19060
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-01-28 22:52:42 +00:00
Keith Randall
3425295e91 [dev.ssa] cmd/compile: clean up comparisons
Add new constant-flags opcodes.  These can be generated from
comparisons that we know the result of, like x&31 < 32.

Constant-fold the constant-flags opcodes into all flag users.

Reorder some CMPxconst args so they read in the comparison direction.

Reorg deadcode removal a bit - it needs to remove the OpCopy ops it
generates when strength-reducing Phi ops.  So it needs to splice out all
the dead blocks and do a copy elimination before it computes live
values.

Change-Id: Ie922602033592ad8212efe4345394973d3b94d9f
Reviewed-on: https://go-review.googlesource.com/18267
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-13 18:42:00 +00:00
Keith Randall
02f4d0a130 [dev.ssa] cmd/compile: start arguments as spilled
Declare a function's arguments as having already been
spilled so their use just requires a restore.

Allow spill locations to be portions of larger objects the stack.
Required to load portions of compound input arguments.

Rename the memory input to InputMem.  Use Arg for the
pre-spilled argument values.

Change-Id: I8fe2a03ffbba1022d98bfae2052b376b96d32dda
Reviewed-on: https://go-review.googlesource.com/16536
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2015-11-03 17:29:40 +00:00
Keith Randall
c24681ae2e [dev.ssa] cmd/compile: remember names of values
For debugging, spill values to named variables instead of autotmp_
variables if possible.  We do this by keeping a name -> value map
for each function, keep it up-to-date during deadcode elim, and use
it to override spill decisions in stackalloc.

It might even make stack frames a bit smaller, as it makes it easy
to identify a set of spills which are likely not to interfere.

This just works for one-word variables for now.  Strings/slices
will be a separate CL.

Change-Id: Ie89eba8cab16bcd41b311c479ec46dd7e64cdb67
Reviewed-on: https://go-review.googlesource.com/16336
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2015-10-28 17:00:31 +00:00
Keith Randall
186cf1b9ba [dev.ssa] cmd/compile/internal/ssa: handle dead code a different way
Instead of trying to delete dead code as soon as we find it, just
mark it as dead using a PlainAndDead block kind.  The deadcode pass
will do the real removal.

This way is somewhat more efficient because we don't need to mess
with successor and predecessor lists of all the dead blocks.

Fixes #12347

Change-Id: Ia42d6b5f9cdb3215a51737b3eb117c00bd439b13
Reviewed-on: https://go-review.googlesource.com/14033
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-08-29 17:21:02 +00:00
Josh Bleecher Snyder
7393c24877 [dev.ssa] cmd/compile: everything is live and reachable after regalloc
This CL makes function printing and HTML generation
accurate after regalloc.

Prior to this CL, text and HTML function outputs
showed live values and blocks as dead.

Change-Id: I70669cd8641af841447fc5d2ecbd754b281356f0
Reviewed-on: https://go-review.googlesource.com/13812
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-21 20:00:51 +00:00
Keith Randall
0b46b42943 [dev.ssa] cmd/compile/internal/ssa: New register allocator
Implement a global (whole function) register allocator.
This replaces the local (per basic block) register allocator.

Clobbering of registers by instructions is handled properly.
A separate change will add the correct clobbers to all the instructions.

Change-Id: I38ce4dc7dccb8303c1c0e0295fe70247b0a3f2ea
Reviewed-on: https://go-review.googlesource.com/13622
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Todd Neal <todd@tneal.org>
2015-08-17 21:06:30 +00:00
Josh Bleecher Snyder
35fb514596 [dev.ssa] cmd/compile: add HTML SSA printer
This is an initial implementation.
There are many rough edges and TODOs,
which will hopefully be polished out
with use.

Fixes #12071.

Change-Id: I1d6fd5a343063b5200623bceef2c2cfcc885794e
Reviewed-on: https://go-review.googlesource.com/13472
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-13 21:56:06 +00:00
Keith Randall
cfd8dfaa10 [dev.ssa] cmd/compile/internal/ssa: more checks on ssa structure
Make sure all referenced Blocks and Values are really there.
Fix deadcode to generate SSA graphs that pass this new test.

Change-Id: Ib002ce20e33490eb8c919bd189d209f769d61517
Reviewed-on: https://go-review.googlesource.com/13147
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-08-06 17:33:19 +00:00
Keith Randall
accf9b5951 [dev.ssa] cmd/compile/internal/ssa: comment why replacing phi with copy is ok
Change-Id: I3e2e8862f2fde4349923016b97e8330b0d494e0e
Reviewed-on: https://go-review.googlesource.com/12092
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-12 04:39:44 +00:00
Keith Randall
050ce4390a [dev.ssa] cmd/compile/internal/ssa: Phi inputs from dead blocks are not live
Fixes #11676

Change-Id: I941f951633c89bb1454ce6d1d1b4124d46a7d9dd
Reviewed-on: https://go-review.googlesource.com/12091
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-11 22:45:14 +00:00
Josh Bleecher Snyder
9b048527db [dev.ssa] cmd/compile/ssa: handle nested dead blocks
removePredecessor can change which blocks are live.
However, it cannot remove dead blocks from the function's
slice of blocks because removePredecessor may have been
called from within a function doing a walk of the blocks.

CL 11879 did not handle this correctly and broke the build.

To fix this, mark the block as dead but leave its actual
removal for a deadcode pass. Blocks that are dead must have
no successors, predecessors, values, or control values,
so they will generally be ignored by other passes.
To be safe, we add a deadcode pass after the opt pass,
which is the only other pass that calls removePredecessor.

Two alternatives that I considered and discarded:

(1) Make all call sites aware of the fact that removePrecessor
might make arbitrary changes to the list of blocks. This
will needlessly complicate callers.

(2) Handle the things that can go wrong in practice when
we encounter a dead-but-not-removed block. CL 11930 takes
this approach (and the tests are stolen from that CL).
However, this is just patching over the problem.

Change-Id: Icf0687b0a8148ce5e96b2988b668804411b05bd8
Reviewed-on: https://go-review.googlesource.com/12004
Reviewed-by: Todd Neal <todd@tneal.org>
Reviewed-by: Michael Matloob <michaelmatloob@gmail.com>
2015-07-11 00:08:50 +00:00
Josh Bleecher Snyder
7d10a2c04a [dev.ssa] cmd/compile/ssa: implement constant booleans
The removal of if false { ... } blocks in the opt
pass exposed that removePredecessor needed
to do more cleaning, on pain of failing later
consistency checks.

Change-Id: I45d4ff7e1f7f1486fdd99f867867ce6ea006a288
Reviewed-on: https://go-review.googlesource.com/11879
Reviewed-by: Keith Randall <khr@golang.org>
2015-07-06 21:54:10 +00:00
Josh Bleecher Snyder
37ddc270ca [dev.ssa] cmd/compile/ssa: add -f suffix to logging methods
Requested in CL 11380.

Change-Id: Icf0d23fb8d383c76272401e363cc9b2169d11403
Reviewed-on: https://go-review.googlesource.com/11450
Reviewed-by: Alan Donovan <adonovan@google.com>
2015-06-24 21:48:26 +00:00
Josh Bleecher Snyder
8c6abfeacb [dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors
The SSA implementation logs for three purposes:

	* debug logging
	* fatal errors
	* unimplemented features

Separating these three uses lets us attempt an SSA
implementation for all functions, not just
_ssa functions. This turns the entire standard
library into a compilation test, and makes it
easy to figure out things like
"how much coverage does SSA have now" and
"what should we do next to get more coverage?".

Functions called _ssa are still special.
They log profusely by default and
the output of the SSA implementation
is used. For all other functions,
logging is off, and the implementation
is built and discarded, due to lack of
support for the runtime.

While we're here, fix a few minor bugs and
add some extra Unimplementeds to allow
all.bash to pass.

As of now, SSA handles 20.79% of the functions
in the standard library (689 of 3314).

The top missing features are:

 10.03%  2597 SSA unimplemented: zero for type error not implemented
  7.79%  2016 SSA unimplemented: addr: bad op DOTPTR
  7.33%  1898 SSA unimplemented: unhandled expr EQ
  6.10%  1579 SSA unimplemented: unhandled expr OROR
  4.91%  1271 SSA unimplemented: unhandled expr NE
  4.49%  1163 SSA unimplemented: unhandled expr LROT
  4.00%  1036 SSA unimplemented: unhandled expr LEN
  3.56%   923 SSA unimplemented: unhandled stmt CALLFUNC
  2.37%   615 SSA unimplemented: zero for type []byte not implemented
  1.90%   492 SSA unimplemented: unhandled stmt CALLMETH
  1.74%   450 SSA unimplemented: unhandled expr CALLINTER
  1.74%   450 SSA unimplemented: unhandled expr DOT
  1.71%   444 SSA unimplemented: unhandled expr ANDAND
  1.65%   426 SSA unimplemented: unhandled expr CLOSUREVAR
  1.54%   400 SSA unimplemented: unhandled expr CALLMETH
  1.51%   390 SSA unimplemented: unhandled stmt SWITCH
  1.47%   380 SSA unimplemented: unhandled expr CONV
  1.33%   345 SSA unimplemented: addr: bad op *
  1.30%   336 SSA unimplemented: unhandled OLITERAL 6

Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7
Reviewed-on: https://go-review.googlesource.com/11160
Reviewed-by: Keith Randall <khr@golang.org>
2015-06-21 02:56:36 +00:00
Josh Bleecher Snyder
e00d60901a [dev.ssa] cmd/compile/internal/ssa: minor fixes
* Improve some docs and logging.
* Set correct type and len for indexing into strings.

Fixes #11029.

Change-Id: Ib22c45908e41ba3752010d2f5759e37e3921a48e
Reviewed-on: https://go-review.googlesource.com/10635
Reviewed-by: Keith Randall <khr@golang.org>
2015-06-04 23:13:45 +00:00
Keith Randall
a9cec30fdc [dev.ssa] cmd/compile/internal/ssa: Implement block rewriting rules
Change-Id: I47e5349e34fc18118c4d35bf433f875b958cc3e5
Reviewed-on: https://go-review.googlesource.com/10495
Reviewed-by: Alan Donovan <adonovan@google.com>
2015-05-30 06:08:26 +00:00
Keith Randall
067e8dfd82 [dev.ssa] Merge remote-tracking branch 'origin/master' into mergebranch
Semi-regular merge of tip to dev.ssa.

Complicated a bit by the move of cmd/internal/* to cmd/compile/internal/*.

Change-Id: I1c66d3c29bb95cce4a53c5a3476373aa5245303d
2015-05-28 13:51:18 -07:00