133 Commits

Author SHA1 Message Date
David Chase
dd1d9b36c6 [dev.ssa] cmd/compile: PPC64, add cmp->bool, some shifts, hmul
Includes hmul (all widths)
compare for boolean result and simplifications
shift operations plus changes/additions for implementation
(ORN, ADDME, ADDC)

Also fixed a backwards-operand CMP.

Change-Id: Id723c4e25125c38e0d9ab9ec9448176b75f4cdb4
Reviewed-on: https://go-review.googlesource.com/25410
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-04 18:17:52 +00:00
Cherry Zhang
83208504fe [dev.ssa] cmd/compile: add more on ARM64 SSA
Support the following:
- Shifts. ARM64 machine instructions only use lowest 6 bits of the
  shift (i.e. mod 64). Use conditional selection instruction to
  ensure Go semantics.
- Zero/Move. Alignment is ensured.
- Hmul, Avg64u, Sqrt.
- reserve R18 (platform register in ARM64 ABI) and R29 (frame pointer
  in ARM64 ABI).

Everything compiles, all.bash passed (with non-SSA test disabled).

Change-Id: Ia8ed58dae5cbc001946f0b889357b258655078b1
Reviewed-on: https://go-review.googlesource.com/25290
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-27 16:37:23 +00:00
Keith Randall
25e0a367da [dev.ssa] cmd/compile: clean up tuple types and selects
Make tuple types and their SelectX ops fully generic.
These ops no longer need to be lowered.
Regalloc understands them and their tuple-generating arguments.
We can now have opcodes returning arbitrary pairs of results.
(And it would be easy to move to >2 results if needed.)

Update arm implementation to the new standard.
Implement just enough in 386 port to do 64-bit add.

Change-Id: I370ed5aacce219c82e1954c61d1f63af76c16f79
Reviewed-on: https://go-review.googlesource.com/24976
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-07-18 16:11:36 +00:00
Cherry Zhang
6b6de15d32 [dev.ssa] cmd/compile: support NaCl in SSA for ARM
NaCl code runs in sandbox and there are restrictions for its
instruction uses
(https://developer.chrome.com/native-client/reference/sandbox_internals/arm-32-bit-sandbox).

Like the legacy backend, on NaCl,
- don't use R9, which is used as NaCl's "thread pointer".
- don't use Duff's device.
- don't use indexed load/stores.
- the assembler rewrites DIV/MOD to runtime calls, which on NaCl
  clobbers R12, so R12 is marked as clobbered for DIV/MOD.
- other restrictions are satisfied by the assembler.

Enable SSA specific tests on nacl/arm, and disable non-SSA ones.

Updates #15365.

Change-Id: I9262693ec6756b89ca29d3ae4e52a96fe5403b02
Reviewed-on: https://go-review.googlesource.com/24859
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-07-16 03:13:45 +00:00
Cherry Zhang
7bd88a651d [dev.ssa] cmd/compile: don't sink spills that satisfy merge edges in SSA
If a spill is used to satisfy a merge edge (in shuffle), don't sink
it out of loop.

This is found in the following code (on ARM) where there is a stack
Phi (v268) inside a loop (b36 -> ... -> b47 -> b38 -> b36).

(before shuffle)
  b36: <- b34 b38
    ...
    v268 = Phi <int> v410 v360 : autotmp_198[int]
    ...
    ... -> b47
  b47: <- b44
    ...
    v360 = ... : R6
    v230 = StoreReg <int> v360 : autotmp_198[int]
    v261 = CMPconst <flags> [0] v360
    EQ v261 -> b49 b38 (unlikely)
  b38: <- b47
    ...
    Plain -> b36

During shuffle, v230 (as spill of v360) is found to satisfy v268, but
it didn't record its use in shuffle, and v230 is sunk out of the loop
(to b49), which leads to bad value in v268.

This seems never happened on AMD64 (in make.bash), until 4 registers
are removed.

Change-Id: I01dfc28ae461e853b36977c58bcfc0669e556660
Reviewed-on: https://go-review.googlesource.com/24858
Reviewed-by: David Chase <drchase@google.com>
2016-07-15 18:20:17 +00:00
Josh Bleecher Snyder
a2beee000b [dev.ssa] cmd/compile: improve special register error checking
Provide better diagnostic messages.

Use an int for numRegs comparisons,
to avoid asking whether a uint8 is > 255.

Change-Id: I33ae193ce292b24b369865abda3902c3207d7d3f
Reviewed-on: https://go-review.googlesource.com/24135
Reviewed-by: Keith Randall <khr@golang.org>
2016-06-16 14:34:01 +00:00
Cherry Zhang
93b8aab5c9 [dev.ssa] cmd/compile: handle GetG on ARM
Use hardware g register (R10) for GetG, allow g to appear at LHS of
some ops.

Progress on SSA backend for ARM. Now everything compiles and runs.

Updates #15365.

Change-Id: Icdf93585579faa86cc29b1e17ab7c90f0119fc4e
Reviewed-on: https://go-review.googlesource.com/23952
Reviewed-by: David Chase <drchase@google.com>
2016-06-15 15:36:35 +00:00
Cherry Zhang
fa54bf16e0 [dev.ssa] cmd/compile: fix a few bugs for SSA for ARM
- 64x signed right shift was wrong for shift larger than 0x80000000.
- for Lsh-followed-by-Rsh, the intermediate value should be full int
  width, so when it is spilled MOVW should be used.
- use RET for RetJmp, so the assembler can take case of restoring LR
  for non-leaf case.
- reserve R9 in dynlink mode. R9 is used for GOT by the assembler.

Progress on SSA backend for ARM. Still not complete.

Updates #15365.

Change-Id: I3caca256b92ff7cf96469da2feaf4868a592efc5
Reviewed-on: https://go-review.googlesource.com/23793
Reviewed-by: David Chase <drchase@google.com>
2016-06-08 20:37:31 +00:00
Cherry Zhang
90883091ff [dev.ssa] cmd/compile: clean up hardcoded regmasks in ssa/regalloc.go
Auto-generate register masks and load them through Config.

Passed toolstash -cmp on AMD64.

Tests phi_ssa.go and regalloc_ssa.go in cmd/compile/internal/gc/testdata
passed on ARM.

Updates #15365.

Change-Id: I393924d68067f2dbb13dab82e569fb452c986593
Reviewed-on: https://go-review.googlesource.com/23292
Reviewed-by: David Chase <drchase@google.com>
2016-06-02 13:01:44 +00:00
David Chase
31e13c83c2 [dev.ssa] Merge branch 'master' into dev.ssa
Change-Id: Iabc80b6e0734efbd234d998271e110d2eaad41dd
2016-05-27 15:19:33 -04:00
Russ Cox
7fdec6216c build: enable framepointer mode by default
This has a minor performance cost, but far less than is being gained by SSA.
As an experiment, enable it during the Go 1.7 beta.
Having frame pointers on by default makes Linux's perf, Intel VTune,
and other profilers much more useful, because it lets them gather a
stack trace efficiently on profiling events.
(It doesn't help us that much, since when we walk the stack we usually
need to look up PC-specific information as well.)

Fixes #15840.

Change-Id: I4efd38412a0de4a9c87b1b6e5d11c301e63f1a2a
Reviewed-on: https://go-review.googlesource.com/23451
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-26 19:02:00 +00:00
Cherry Zhang
ccaed50c7b [dev.ssa] cmd/compile: handle boolean values for SSA on ARM
Fix hardcoded flag register mask in ssa/flagalloc.go by auto-generating
the mask.

Also fix a mistake (in previous CL) about conditional branches.

Progress on SSA backend for ARM. Still not complete. Now "container/ring"
package compiles and tests passed.

Updates #15365.

Change-Id: Id7c8805c30dbb8107baedb485ed0f71f59ed6ea8
Reviewed-on: https://go-review.googlesource.com/23093
Reviewed-by: Keith Randall <khr@golang.org>
2016-05-19 02:48:36 +00:00
Keith Randall
075880a8e8 cmd/compile: fix build
Run live vars test only on ssa builds.
We can't just drop KeepAlive ops during regalloc.  We need
to replace them with copies.

Change-Id: Ib4b3b1381415db88fdc2165fc0a9541b73ad9759
Reviewed-on: https://go-review.googlesource.com/23225
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-05-18 20:44:00 +00:00
Keith Randall
3572c6418b cmd/compile: keep pointer input arguments live throughout function
Introduce a KeepAlive op which makes sure that its argument is kept
live until the KeepAlive.  Use KeepAlive to mark pointer input
arguments as live after each function call and at each return.

We do this change only for pointer arguments.  Those are the
critical ones to handle because they might have finalizers.
Doing compound arguments (slices, structs, ...) is more complicated
because we would need to track field liveness individually (we do
that for auto variables now, but inputs requires extra trickery).

Turn off the automatic marking of args as live.  That way, when args
are explicitly nulled, plive will know that the original argument is
dead.

The KeepAlive op will be the eventual implementation of
runtime.KeepAlive.

Fixes #15277

Change-Id: I5f223e65d99c9f8342c03fbb1512c4d363e903e5
Reviewed-on: https://go-review.googlesource.com/22365
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-05-18 19:25:27 +00:00
David Chase
6b99fb5bea cmd/compile: use sparse algorithm for phis in large program
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences and inserts phi functions where
there are.

Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)

Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs.  This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.

Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.

Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:

	 #blocks	#vars	    product
 max	6171	28180	173,898,780
99.9%	1641	 6548	 10,401,878
  99%	 463	 1909	    873,721
  95%	 152	  639	     95,235
  90%	  84	  359	     30,021

The old algorithm is indeed usually fastest, for 99%ile
values of usually.

The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.

With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
   github.com/gogo/protobuf/test/theproto3/combos/...
...
real	4m35.200s
user	13m16.644s
sys	0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles

With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
   github.com/gogo/protobuf/test/theproto3/combos/...
...
real	10m36.569s
user	25m52.286s
sys	4m3.696s
and pprof reports 8.3GB allocated in the same larger profile

With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).

Fixes #15537.

Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-16 21:08:05 +00:00
David Chase
3c09001917 cmd/compile: correct sparseSet probes in regalloc to avoid index error
In regalloc, a sparse map is preallocated for later use by
spill-in-loop sinking.  However, variables (spills) are added
during register allocation before spill sinking, and a map
query involving any of these new variables will index out of
bounds in the map.

To fix:
1) fix the queries to use s.orig[v.ID].ID instead, to ensure
proper indexing.  Note that s.orig will be nil for values
that are not eligible for spilling (like memory and flags).

2) add a test.

Fixes #15585.

Change-Id: I8f2caa93b132a0f2a9161d2178320d5550583075
Reviewed-on: https://go-review.googlesource.com/22911
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-09 18:35:44 +00:00
Keith Randall
4fa050024f cmd/compile: enable constant-time CFG editing
Provide indexes along with block pointers for Preds
and Succs arrays.  This allows us to splice edges in
and out of those arrays in constant time.

Fixes worst-case O(n^2) behavior in deadcode and fuse.

benchmark                     old ns/op      new ns/op     delta
BenchmarkFuse1-8              2065           2057          -0.39%
BenchmarkFuse10-8             9408           9073          -3.56%
BenchmarkFuse100-8            105238         76277         -27.52%
BenchmarkFuse1000-8           3982562        1026750       -74.22%
BenchmarkFuse10000-8          301220329      12824005      -95.74%
BenchmarkDeadCode1-8          1588           1566          -1.39%
BenchmarkDeadCode10-8         4333           4250          -1.92%
BenchmarkDeadCode100-8        32031          32574         +1.70%
BenchmarkDeadCode1000-8       590407         468275        -20.69%
BenchmarkDeadCode10000-8      17822890       5000818       -71.94%
BenchmarkDeadCode100000-8     1388706640     78021127      -94.38%
BenchmarkDeadCode200000-8     5372518479     168598762     -96.86%

Change-Id: Iccabdbb9343fd1c921ba07bbf673330a1c36ee17
Reviewed-on: https://go-review.googlesource.com/22589
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-05 15:58:59 +00:00
Keith Randall
ade0eb2f06 cmd/compile: fix reslice
:= is the wrong thing here.  The new variable masks the old
variable so we allocate the slice afresh each time around the loop.

Change-Id: I759c30e1bfa88f40decca6dd7d1e051e14ca0844
Reviewed-on: https://go-review.googlesource.com/22679
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-05-01 22:11:45 +00:00
Keith Randall
8ad8d7d87e cmd/compile: Use pre-regalloc value ID in lateSpillUse
The cached copy's ID is sometimes outside the bounds of the orig array.

There's no reason to start at the cached copy and work backwards
to the original value. We already have the original value ID at
all the callsites.

Fixes noopt build

Change-Id: I313508a1917e838a87e8cc83b2ef3c2e4a8db304
Reviewed-on: https://go-review.googlesource.com/22355
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 21:25:50 +00:00
Keith Randall
b57ac33331 cmd/compile: forward-looking desired register biasing
Improve forward-looking desired register calculations.
It is now inter-block and handles a bunch more cases.

Fixes #14504
Fixes #14828
Fixes #15254

Change-Id: Ic240fa0ec6a779d80f577f55c8a6c4ac8c1a940a
Reviewed-on: https://go-review.googlesource.com/22160
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-04-20 15:31:42 +00:00
David Chase
6b0b3f86d6 cmd/compile: fix use of original spill name after sinking
This is a fix for the ssacheck builder
http://build.golang.org/log/baa00f70c34e41186051cfe90568de3d91f115d7
after CL 21307 for sinking spills down loop exits
https://go-review.googlesource.com/#/c/21037/

The fix is to reuse (move) the original spill, thus preserving
the definition of the variable and its use count. Original and
copy both use the same stack slot, but ssacheck needs to see
a definition for the variable itself.

Fixes #15279.

Change-Id: I286285490193dc211b312d64dbc5a54867730bd6
Reviewed-on: https://go-review.googlesource.com/21995
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-14 18:24:54 +00:00
David Chase
6b85a45edc cmd/compile: move spills to loop exits when easy.
For call-free inner loops.

Revised statistics:
  85 inner loop spills sunk
 341 inner loop spills remaining
1162 inner loop spills that were candidates for sinking
     ended up completely register allocated
 119 inner loop spills could have been sunk were used in
     "shuffling" at the bottom of the loop.
   1 inner loop spill not sunk because the register assigned
     changed between def and exit,

 Understanding how to make an inner loop definition not be
 a candidate for from-memory shuffling (to force the shuffle
 code to choose some other value) should pick up some of the
 119 other spills disqualified for this reason.

 Modified the stats printing based on feedback from Austin.

Change-Id: If3fb9b5d5a028f42ccc36c4e3d9e0da39db5ca60
Reviewed-on: https://go-review.googlesource.com/21037
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-13 15:59:42 +00:00
Keith Randall
0004f34cef cmd/compile: regalloc enforces 2-address instructions
Instead of being a hint, resultInArg0 is now enforced by regalloc.
This allows us to delete all the code from amd64/ssa.go which
deals with converting from a semantically three-address instruction
into some copies plus a two-address instruction.

Change-Id: Id4f39a80be4b678718bfd42a229f9094ab6ecd7c
Reviewed-on: https://go-review.googlesource.com/21816
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-04-10 23:20:38 +00:00
David Chase
c3b3e7b4ef cmd/compile: insert instrumentation more carefully in racewalk
Be more careful about inserting instrumentation in racewalk.
If the node being instrumented is an OAS, and it has a non-
empty Ninit, then append instrumentation to the Ninit list
rather than letting it be inserted before the OAS (and the
compilation of its init list).  This deals with the case that
the Ninit list defines a variable used in the RHS of the OAS.

Fixes #15091.

Change-Id: Iac91696d9104d07f0bf1bd3499bbf56b2e1ef073
Reviewed-on: https://go-review.googlesource.com/21771
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: David Chase <drchase@google.com>
2016-04-08 21:06:39 +00:00
Dave Cheney
b9feb91f32 cmd/compile: minor cleanups
Some minor scoping cleanups found by a very old version of grind.

Change-Id: I1d373817586445fc87e38305929097b652696fdd
Reviewed-on: https://go-review.googlesource.com/21064
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-24 11:18:04 +00:00
Keith Randall
4c9a470d46 cmd/compile: start on ARM port
Start working on arm port.  Gets close to correct
code for fibonacci:
    func fib(n int) int {
        if n < 2 {
            return n
        }
        return fib(n-1) + fib(n-2)
    }

Still a lot to do, but this is a good starting point.

Cleaned up some arch-specific dependencies in regalloc.

Change-Id: I4301c6c31a8402168e50dcfee8bcf7aee73ea9d5
Reviewed-on: https://go-review.googlesource.com/21000
Reviewed-by: David Chase <drchase@google.com>
2016-03-23 17:46:05 +00:00
David Chase
815c9a7f28 cmd/compile: use loop information in regalloc
This seems to help the problem reported in #14606; this
change seems to produce about a 4% improvement (mostly
for the 128-8192 shards).

Fixes #14789.

Change-Id: I1bd52c82d4ca81d9d5e9ab371fdfc860d7e8af50
Reviewed-on: https://go-review.googlesource.com/20660
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-18 01:23:29 +00:00
Keith Randall
56e0ecc5ea cmd/compile: keep value use counts in SSA
Keep track of how many uses each Value has.  Each appearance in
Value.Args and in Block.Control counts once.

The number of uses of a value is generically useful to
constrain rewrite rules.  For instance, we might want to
prevent merging index operations into loads if the same
index expression is used lots of times.

But I have one use in particular for which the use count is required.
We must make sure we don't combine ops with loads if the load has
more than one use.  Otherwise, we may split a single load
into multiple loads and that breaks perceived behavior in
the presence of races.  In particular, the load of m.state
in sync/mutex.go:Lock can't be done twice.  (I have a separate
CL which triggers the mutex failure.  This CL has a test which
demonstrates a similar failure.)

Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e
Reviewed-on: https://go-review.googlesource.com/20790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-17 04:20:02 +00:00
Todd Neal
763afe13b9 cmd/compile: change logging of spills for regalloc to Warnl format
Change-Id: I01c000ff3f6dc6b0ed691e289eeef0fa61500337
Reviewed-on: https://go-review.googlesource.com/20744
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-16 02:50:13 +00:00
Keith Randall
369f4f5de5 cmd/compile: regalloc of two address instructions
x86 has a lot of instructions that require the output to be in the same
register as one of the inputs.  When allocating the output register,
allocate the same register as the input if it is available.

Improves the performance of golang.org/x/crypto/sha3 by
10% (from 6% slower than 1.6 to 4% faster).

Fixes #14745

Change-Id: I4d81785240c9368e4dc75107b45c959d200df8e6
Reviewed-on: https://go-review.googlesource.com/20488
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-03-11 04:13:07 +00:00
Todd Neal
6b3d4a5353 cmd/compile: modify regalloc/stackalloc to use the cmd line debug args
Change the existing flags from compile time consts to be configurable
from the command line.

Change-Id: I4aab4bf3dfcbdd8e2b5a2ff51af95c2543967769
Reviewed-on: https://go-review.googlesource.com/20560
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-11 01:35:12 +00:00
Keith Randall
ddc6b64444 cmd/compile: fix defer/deferreturn
Make sure we do any just-before-return cleanup on all paths out of a
function, including when recovering.  Each exit path should include
deferreturn (if there are any defers) and then the exit
code (e.g. copying heap-escaping return values back to the stack).

Introduce a Defer SSA block type which has two outgoing edges - one the
fallthrough edge (the defer was queued successfully) and one which
immediately returns (the defer had a successful recover() call and
normal execution should resume at the return point).

Fixes #14725

Change-Id: Iad035c9fd25ef8b7a74dafbd7461cf04833d981f
Reviewed-on: https://go-review.googlesource.com/20486
Reviewed-by: David Chase <drchase@google.com>
2016-03-10 22:33:49 +00:00
Keith Randall
686fbdb3b0 cmd/compile: make compilation deterministic, fixes toolstash
Make sure we don't depend on map iterator order.

Fixes #14600

Change-Id: Iac0e0c8689f3ace7a4dc8e2127e2fd3c8545bd29
Reviewed-on: https://go-review.googlesource.com/20158
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-03 18:03:45 +00:00
Keith Randall
c03ed491fe cmd/compile: load some live values into registers before loop
If we're about to enter a loop, load values which are live
and will soon be used in the loop into registers.

name                     old time/op    new time/op    delta
BinaryTree17-8              2.80s ± 4%     2.62s ± 2%   -6.43%          (p=0.008 n=5+5)
Fannkuch11-8                2.45s ± 2%     2.14s ± 1%  -12.43%          (p=0.008 n=5+5)
FmtFprintfEmpty-8          49.0ns ± 1%    48.4ns ± 1%   -1.35%          (p=0.032 n=5+5)
FmtFprintfString-8          160ns ± 1%     153ns ± 0%   -4.63%          (p=0.008 n=5+5)
FmtFprintfInt-8             152ns ± 0%     150ns ± 0%   -1.57%          (p=0.000 n=5+4)
FmtFprintfIntInt-8          252ns ± 2%     244ns ± 1%   -3.02%          (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8     223ns ± 0%     223ns ± 0%     ~     (all samples are equal)
FmtFprintfFloat-8           293ns ± 2%     291ns ± 2%     ~             (p=0.389 n=5+5)
FmtManyArgs-8               956ns ± 0%     936ns ± 0%   -2.05%          (p=0.008 n=5+5)
GobDecode-8                7.18ms ± 0%    7.11ms ± 0%   -1.02%          (p=0.008 n=5+5)
GobEncode-8                6.12ms ± 3%    6.07ms ± 1%     ~             (p=0.690 n=5+5)
Gzip-8                      284ms ± 1%     284ms ± 0%     ~             (p=1.000 n=5+5)
Gunzip-8                   40.8ms ± 1%    40.6ms ± 1%     ~             (p=0.310 n=5+5)
HTTPClientServer-8         69.8µs ± 1%    72.2µs ± 4%     ~             (p=0.056 n=5+5)
JSONEncode-8               16.1ms ± 2%    16.2ms ± 1%     ~             (p=0.151 n=5+5)
JSONDecode-8               54.9ms ± 0%    57.0ms ± 1%   +3.79%          (p=0.008 n=5+5)
Mandelbrot200-8            4.35ms ± 0%    4.39ms ± 0%   +0.85%          (p=0.008 n=5+5)
GoParse-8                  3.56ms ± 1%    3.42ms ± 1%   -4.03%          (p=0.008 n=5+5)
RegexpMatchEasy0_32-8      75.6ns ± 1%    75.0ns ± 0%   -0.83%          (p=0.016 n=5+4)
RegexpMatchEasy0_1K-8       250ns ± 0%     252ns ± 1%   +0.80%          (p=0.016 n=4+5)
RegexpMatchEasy1_32-8      75.0ns ± 0%    75.4ns ± 2%     ~             (p=0.206 n=5+5)
RegexpMatchEasy1_1K-8       401ns ± 0%     398ns ± 1%     ~             (p=0.056 n=5+5)
RegexpMatchMedium_32-8      119ns ± 0%     118ns ± 0%   -0.84%          (p=0.008 n=5+5)
RegexpMatchMedium_1K-8     36.6µs ± 0%    36.9µs ± 0%   +0.91%          (p=0.008 n=5+5)
RegexpMatchHard_32-8       1.95µs ± 1%    1.92µs ± 0%   -1.23%          (p=0.032 n=5+5)
RegexpMatchHard_1K-8       58.3µs ± 1%    58.1µs ± 1%     ~             (p=0.548 n=5+5)
Revcomp-8                   425ms ± 1%     389ms ± 1%   -8.39%          (p=0.008 n=5+5)
Template-8                 65.5ms ± 1%    63.6ms ± 1%   -2.86%          (p=0.008 n=5+5)
TimeParse-8                 363ns ± 0%     354ns ± 1%   -2.59%          (p=0.008 n=5+5)
TimeFormat-8                363ns ± 0%     364ns ± 1%     ~             (p=0.159 n=5+5)

Fixes #14511

Change-Id: I1b79d2545271fa90d5b04712cc25573bdc94f2ce
Reviewed-on: https://go-review.googlesource.com/20151
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-03-03 17:39:43 +00:00
Brad Fitzpatrick
5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
David Chase
288817b05a [dev.ssa] cmd/compile: reduce line number churn in generated code
In regalloc, make LoadReg instructions use the line number
of their *use*, not their *source*.  This reduces the
tendency of debugger stepping to "jump around" the program.

Change-Id: I59e2eeac4dca9168d8af3a93effbc5bdacac2881
Reviewed-on: https://go-review.googlesource.com/19836
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-24 16:57:36 +00:00
David Chase
b86cafc7dc [dev.ssa] cmd/compile: memory allocation tweaks to regalloc and dom
Spotted a minor source of excess allocation in the register
allocator.  Rearranged the dominator tree code to pull its
scratch memory from a reused buffer attached to Config.

Change-Id: I6da6e7b112f7d3eb1fd00c58faa8214cdea44e38
Reviewed-on: https://go-review.googlesource.com/19450
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-22 15:32:58 +00:00
Keith Randall
16b1fce921 [dev.ssa] cmd/compile: add aux typing, flags to ops
Add the aux type to opcodes.
Add rematerializeable as a flag.

Change-Id: I906e19281498f3ee51bb136299bf26e13a54b2ec
Reviewed-on: https://go-review.googlesource.com/19088
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-02 02:55:13 +00:00
David Chase
c87a62f32b [dev.ssa] cmd/compile: reducing alloc footprint of dominator calc
Converted working slices of pointer into slices of pointer
index.  Half the size (on 64-bit machine) and no pointers
to trace if GC occurs while they're live.

TODO - could expose slice mapping ID->*Block; some dom
clients also construct these.

Minor optimization in regalloc that cuts allocation count.

Minor optimization in compile.go that cuts calls to Sprintf.

Change-Id: I28f0bfed422b7344af333dc52ea272441e28e463
Reviewed-on: https://go-review.googlesource.com/19104
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-02 02:20:25 +00:00
Todd Neal
f962f33035 [dev.ssa] cmd/compile: reuse sparse sets across compiler passes
Cache sparse sets in the function so they can be reused by subsequent
compiler passes.

benchmark                        old ns/op     new ns/op     delta
BenchmarkDSEPass-8               206945        180022        -13.01%
BenchmarkDSEPassBlock-8          5286103       2614054       -50.55%
BenchmarkCSEPass-8               1790277       1790655       +0.02%
BenchmarkCSEPassBlock-8          18083588      18112771      +0.16%
BenchmarkDeadcodePass-8          59837         41375         -30.85%
BenchmarkDeadcodePassBlock-8     1651575       511169        -69.05%
BenchmarkMultiPass-8             531529        427506        -19.57%
BenchmarkMultiPassBlock-8        7033496       4487814       -36.19%

benchmark                        old allocs     new allocs     delta
BenchmarkDSEPass-8               11             4              -63.64%
BenchmarkDSEPassBlock-8          599            120            -79.97%
BenchmarkCSEPass-8               18             18             +0.00%
BenchmarkCSEPassBlock-8          2700           2700           +0.00%
BenchmarkDeadcodePass-8          4              3              -25.00%
BenchmarkDeadcodePassBlock-8     30             9              -70.00%
BenchmarkMultiPass-8             24             20             -16.67%
BenchmarkMultiPassBlock-8        1800           1000           -44.44%

benchmark                        old bytes     new bytes     delta
BenchmarkDSEPass-8               221367        142           -99.94%
BenchmarkDSEPassBlock-8          3695207       3846          -99.90%
BenchmarkCSEPass-8               303328        303328        +0.00%
BenchmarkCSEPassBlock-8          5006400       5006400       +0.00%
BenchmarkDeadcodePass-8          84232         10506         -87.53%
BenchmarkDeadcodePassBlock-8     1274940       163680        -87.16%
BenchmarkMultiPass-8             608674        313834        -48.44%
BenchmarkMultiPassBlock-8        9906001       5003450       -49.49%

Change-Id: Ib1fa58c7f494b374d1a4bb9cffbc2c48377b59d3
Reviewed-on: https://go-review.googlesource.com/19100
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-01-30 13:57:39 +00:00
Keith Randall
9d6e605cf7 [dev.ssa] cmd/compile: simple forward-looking register allocation tweak
For each value that needs to be in a fixed register at the end of the
block, and try to pick that fixed register when the instruction
generating that value is scheduled (or restored from a spill).

Just used for end-of-block register requirements for now.
Fixed-register instruction requirements (e.g. shift in ecx) can be
added later.  Also two-instruction constraints (input reg == output
reg) might be recorded in a similar manner.

Change-Id: I59916e2e7f73657bb4fc3e3b65389749d7a23fa8
Reviewed-on: https://go-review.googlesource.com/18774
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-29 17:32:38 +00:00
Keith Randall
8a961aee28 [dev.ssa] cmd/compile: fix -N build
The OpSB hack didn't quite work.  We need to really
CSE these ops to make regalloc happy.

Change-Id: I9f4d7bfb0929407c84ee60c9e25ff0c0fbea84af
Reviewed-on: https://go-review.googlesource.com/19083
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-29 17:27:12 +00:00
Keith Randall
2f57d0fe02 [dev.ssa] cmd/compile: preallocate small-numbered values and blocks
Speeds up the compiler ~5%.

Change-Id: Ia5cf0bcd58701fd14018ec77d01f03d5c7d6385b
Reviewed-on: https://go-review.googlesource.com/19060
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-01-28 22:52:42 +00:00
Keith Randall
6a96a2fe5a [dev.ssa] cmd/compile: make cse faster
It is one of the slowest compiler phases right now, and we
run two of them.

Instead of using a map to make the initial partition, use a sort.
It is much less memory intensive.

Do a few optimizations to avoid work for size-1 equivalence classes.

Implement -N.

Change-Id: I1d2d85d3771abc918db4dd7cc30b0b2d854b15e1
Reviewed-on: https://go-review.googlesource.com/19024
Reviewed-by: David Chase <drchase@google.com>
2016-01-28 20:59:20 +00:00
Keith Randall
7b773946c0 [dev.ssa] cmd/compile: disable xor clearing when flags must be preserved
The x86 backend automatically rewrites MOV $0, AX to
XOR AX, AX.  That rewrite isn't ok when the flags register
is live across the MOV.  Keep track of which moves care
about preserving flags, then disable this rewrite for them.

On x86, Prog.Mark was being used to hold the length of the
instruction.  We already store that in Prog.Isize, so no
need to store it in Prog.Mark also.  This frees up Prog.Mark
to hold a bitmask on x86 just like all the other architectures.

Update #12405

Change-Id: Ibad8a8f41fc6222bec1e4904221887d3cc3ca029
Reviewed-on: https://go-review.googlesource.com/18861
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-26 17:40:22 +00:00
Keith Randall
9094e3ada2 [dev.ssa] cmd/compile: fix spill sizes
In code that does:

    var x, z int32
    var y int64
    z = phi(x, int32(y))

We silently drop the int32 cast because truncation is a no-op.
The phi operation needs to make sure it uses the size of the
phi, not the size of its arguments, when generating spills.

Change-Id: I1f7baf44f019256977a46fdd3dad1972be209042
Reviewed-on: https://go-review.googlesource.com/18390
Reviewed-by: David Chase <drchase@google.com>
2016-01-13 18:39:15 +00:00
Keith Randall
d7ad7b9efe [dev.ssa] cmd/compile: zero register masks for each edge
Forgot to reset these masks before each merge edge is processed.

Change-Id: I2f593189b63f50a1cd12b2dd4645ca7b9614f1f3
Reviewed-on: https://go-review.googlesource.com/18223
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
2016-01-05 20:20:18 +00:00
Keith Randall
7d9f1067d1 [dev.ssa] cmd/compile: better register allocator
Reorder how register & stack allocation is done.  We used to allocate
registers, then fix up merge edges, then allocate stack slots.  This
lead to lots of unnecessary copies on merge edges:

v2 = LoadReg v1
v3 = StoreReg v2

If v1 and v3 are allocated to the same stack slot, then this code is
unnecessary.  But at regalloc time we didn't know the homes of v1 and
v3.

To fix this problem, allocate all the stack slots before fixing up the
merge edges.  That way, we know what stack slots values use so we know
what copies are required.

Use a good technique for shuffling values around on merge edges.

Improves performance of the go1 TimeParse benchmark by ~12%

Change-Id: I731f43e4ff1a7e0dc4cd4aa428fcdb97812b86fa
Reviewed-on: https://go-review.googlesource.com/17915
Reviewed-by: David Chase <drchase@google.com>
2015-12-21 23:12:05 +00:00
Keith Randall
c140df0326 [dev.ssa] cmd/compile: allocate the flag register in a separate pass
Spilling/restoring flag values is a pain to do during regalloc.
Instead, allocate the flag register in a separate pass.  Regalloc then
operates normally on any flag recomputation instructions.

Change-Id: Ia1c3d9e6eff678861193093c0b48a00f90e4156b
Reviewed-on: https://go-review.googlesource.com/17694
Reviewed-by: David Chase <drchase@google.com>
2015-12-11 21:08:15 +00:00
Keith Randall
75102afce7 [dev.ssa] cmd/compile: better register allocation
Use a more precise computation of next use.  It properly
detects lifetime holes and deallocates values during those holes.
It also uses a more precise version of distance to next use which
affects which values get spilled.

Change-Id: I49eb3ebe2d2cb64842ecdaa7fb4f3792f8afb90b
Reviewed-on: https://go-review.googlesource.com/16760
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2015-11-12 15:31:11 +00:00