133 Commits

Author SHA1 Message Date
David Chase
ca360c3992 cmd/compile: better XPos for rematerialized values and JMPs
This attempts to choose better values for values that are
rematerialized (uses the XPos of the consumer, not the
original) and for unconditional branches (uses the last
assigned XPos in the block).

The JMP branches seem to sometimes end up with a PC in the
destination block, I think because of register movement
or rematerialization that gets placed in predecessor blocks.
This may be acceptable because (eyeball-empirically) that is
often the line number of the target block, so the line number
flow is correct.

Added proper test, that checks both -N -l and regular compilation.
The test is also capable (for gdb, delve soon) of tracking
variable printing based on comments in the source code.

There's substantial room for improvement in debugger behavior.

Updates #21098.

Change-Id: I13abd48a39141583b85576a015f561065819afd0
Reviewed-on: https://go-review.googlesource.com/50610
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-07 22:12:36 +00:00
Daniel Martí
ded2c65db3 cmd/compile: simplify a few bits of the code
Remove an unused type, a few redundant returns and replace a few slice
append loops with a single append.

Change-Id: If07248180bae5631b5b152c6051d9635889997d5
Reviewed-on: https://go-review.googlesource.com/66851
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2017-09-28 20:40:17 +00:00
Lynn Boger
b74b43de68 cmd/compile: request r12 for indirect calls on ppc64le
On ppc64le, functions compiled with -shared expect r12 to
hold the function's address for indirect calls. Previously
this was enforced by generating a move instruction if the
address wasn't already in r12. This change avoids that extra
move by requesting r12 in the CALL ops that do indirect calls.

As a result of adding support for plugins on ppc64le, it was
discovered that there would be more cases where this extra
move was needed, so this seemed like a better solution.

Updates #20756

Change-Id: I6770885a46990f78c6d2902a715dcdaa822192a1
Reviewed-on: https://go-review.googlesource.com/62890
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-09-11 18:02:33 +00:00
Daniel Martí
99da8730b0 all: remove some double spaces from comments
Went mainly for the ones that make no sense, such as the ones
mid-sentence or after commas.

Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0
Reviewed-on: https://go-review.googlesource.com/57293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-26 15:09:09 +00:00
Keith Randall
770d8d8207 cmd/compile: free value earlier in nilcheck
When we remove a nil check, add it back to the free Value pool immediately.

Fixes #18732

Change-Id: I8d644faabbfb52157d3f2d071150ff0342ac28dc
Reviewed-on: https://go-review.googlesource.com/58810
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-08-25 06:01:26 +00:00
Keith Randall
bf4d8d3d05 cmd/compile: rename SSA Register.Name to Register.String
Just to get rid of lots of .Name() stutter in printf calls.

Change-Id: I86cf00b3f7b2172387a1c6a7f189c1897fab6300
Reviewed-on: https://go-review.googlesource.com/56630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-08-17 21:53:08 +00:00
Heschi Kreinick
4c54a047c6 [dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.

After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.

There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:

  type myint int
  var a int
  b := myint(int)
and
  b := (*uint64)(unsafe.Pointer(a))

don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.

Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.

A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.

TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.

go build -toolexec 'toolstash -cmp' -a std succeeds.

With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after:  06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean  /tmp/220352263 /tmp/621364410
completed   15 of   15, estimated time remaining 0s (eta 3:52PM)
name        old time/op       new time/op       delta
Template          199ms ± 3%        198ms ± 2%     ~     (p=0.400 n=15+14)
Unicode          96.6ms ± 5%       96.4ms ± 5%     ~     (p=0.838 n=15+15)
GoTypes           653ms ± 2%        647ms ± 2%     ~     (p=0.102 n=15+14)
Flate             133ms ± 6%        129ms ± 3%   -2.62%  (p=0.041 n=15+15)
GoParser          164ms ± 5%        159ms ± 3%   -3.05%  (p=0.000 n=15+15)
Reflect           428ms ± 4%        422ms ± 3%     ~     (p=0.156 n=15+13)
Tar               123ms ±10%        124ms ± 8%     ~     (p=0.461 n=15+15)
XML               228ms ± 3%        224ms ± 3%   -1.57%  (p=0.045 n=15+15)
[Geo mean]        206ms             377ms       +82.86%

name        old user-time/op  new user-time/op  delta
Template          292ms ±10%        301ms ±12%     ~     (p=0.189 n=15+15)
Unicode           166ms ±37%        158ms ±14%     ~     (p=0.418 n=15+14)
GoTypes           962ms ± 6%        963ms ± 7%     ~     (p=0.976 n=15+15)
Flate             207ms ±19%        200ms ±14%     ~     (p=0.345 n=14+15)
GoParser          246ms ±22%        240ms ±15%     ~     (p=0.587 n=15+15)
Reflect           611ms ±13%        587ms ±14%     ~     (p=0.085 n=15+13)
Tar               211ms ±12%        217ms ±14%     ~     (p=0.355 n=14+15)
XML               335ms ±15%        320ms ±18%     ~     (p=0.169 n=15+15)
[Geo mean]        317ms             583ms       +83.72%

name        old alloc/op      new alloc/op      delta
Template         40.2MB ± 0%       40.2MB ± 0%   -0.15%  (p=0.000 n=14+15)
Unicode          29.2MB ± 0%       29.3MB ± 0%     ~     (p=0.624 n=15+15)
GoTypes           114MB ± 0%        114MB ± 0%   -0.15%  (p=0.000 n=15+14)
Flate            25.7MB ± 0%       25.6MB ± 0%   -0.18%  (p=0.000 n=13+15)
GoParser         32.2MB ± 0%       32.2MB ± 0%   -0.14%  (p=0.003 n=15+15)
Reflect          77.8MB ± 0%       77.9MB ± 0%     ~     (p=0.061 n=15+15)
Tar              27.1MB ± 0%       27.0MB ± 0%   -0.11%  (p=0.029 n=15+15)
XML              42.7MB ± 0%       42.5MB ± 0%   -0.29%  (p=0.000 n=15+15)
[Geo mean]       42.1MB            75.0MB       +78.05%

name        old allocs/op     new allocs/op     delta
Template           402k ± 1%         398k ± 0%   -0.91%  (p=0.000 n=15+15)
Unicode            344k ± 1%         344k ± 0%     ~     (p=0.715 n=15+14)
GoTypes           1.18M ± 0%        1.17M ± 0%   -0.91%  (p=0.000 n=15+14)
Flate              243k ± 0%         240k ± 1%   -1.05%  (p=0.000 n=13+15)
GoParser           327k ± 1%         324k ± 1%   -0.96%  (p=0.000 n=15+15)
Reflect            984k ± 1%         982k ± 0%     ~     (p=0.050 n=15+15)
Tar                261k ± 1%         259k ± 1%   -0.77%  (p=0.000 n=15+15)
XML                411k ± 0%         404k ± 1%   -1.55%  (p=0.000 n=15+15)
[Geo mean]         439k              755k       +72.01%

name        old text-bytes    new text-bytes    delta
HelloSize         694kB ± 0%        694kB ± 0%   -0.00%  (p=0.000 n=15+15)

name        old data-bytes    new data-bytes    delta
HelloSize        5.55kB ± 0%       5.55kB ± 0%     ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize         133kB ± 0%        133kB ± 0%     ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.04MB ± 0%       1.04MB ± 0%     ~     (all equal)

Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-27 20:19:44 +00:00
Heschi Kreinick
2d57d94ac3 [dev.debug] cmd/compile: track variable decomposition in LocalSlot
When the compiler decomposes a user variable, track its origin so that
it can be recomposed during DWARF generation.

Change-Id: Ia71c7f8e7f4d65f0652f1c97b0dda5d9cad41936
Reviewed-on: https://go-review.googlesource.com/50878
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-26 18:39:39 +00:00
David Chase
00263a8968 cmd/compile: reduce debugger-worsening line number churn
Reuse block head or preceding instruction's line number for
register allocator's spill, fill, copy, rematerialization
instructionsl; and also for phi, and for no-src-pos
instructions.  Assembler creates same line number tables
for copy-predecessor-line and for no-src-pos,
but copy-predecessor produces better-looking assembly
language output with -S and with GOSSAFUNC, and does not
require changes to tests of existing assembly language.

Split "copyInto" into two cases, one for register allocation,
one for otherwise.  This caused the test score line change
count to increase by one, which may reflect legitimately
useful information preserved.  Without any special treatment
for copyInto, the change count increases by 21 more, from
51 to 72 (i.e., quite a lot).

There is a test; using two naive "scores" for line number
churn, the old numbering is 2x or 4x worse.

Fixes #18902.

Change-Id: I0a0a69659d30ee4e5d10116a0dd2b8c5df8457b1
Reviewed-on: https://go-review.googlesource.com/36207
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-10 17:16:44 +00:00
Josh Bleecher Snyder
46b88c9fbc cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.

In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.

Though this is a big CL, most of the changes are
mechanical and uninteresting.

Interesting bits:

* Add new singleton globals to package types for the special
  SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
  and TTUPLE, for SSA tuple types.
  ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
  to package types.
* We had picked the name "types" in our rules for the handy
  list of types provided by ssa.Config. That conflicted with
  the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
  types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
  and probably also some other duplicated Type methods
  designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
  and they were not particularly careful about types in general.
  Of necessity, this CL switches them to use *types.Type;
  it does not make them more type-accurate.
  Unfortunately, using types.Type means initializing a bit
  of the types universe.
  This is prime for refactoring and improvement.

This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.

name        old alloc/op      new alloc/op      delta
Template         37.9MB ± 0%       37.7MB ± 0%  -0.57%  (p=0.000 n=10+8)
Unicode          28.9MB ± 0%       28.7MB ± 0%  -0.52%  (p=0.000 n=10+10)
GoTypes           110MB ± 0%        109MB ± 0%  -0.88%  (p=0.000 n=10+10)
Flate            24.7MB ± 0%       24.6MB ± 0%  -0.66%  (p=0.000 n=10+10)
GoParser         31.1MB ± 0%       30.9MB ± 0%  -0.61%  (p=0.000 n=10+9)
Reflect          73.9MB ± 0%       73.4MB ± 0%  -0.62%  (p=0.000 n=10+8)
Tar              25.8MB ± 0%       25.6MB ± 0%  -0.77%  (p=0.000 n=9+10)
XML              41.2MB ± 0%       40.9MB ± 0%  -0.80%  (p=0.000 n=10+10)
[Geo mean]       40.5MB            40.3MB       -0.68%

name        old allocs/op     new allocs/op     delta
Template           385k ± 0%         386k ± 0%    ~     (p=0.356 n=10+9)
Unicode            343k ± 1%         344k ± 0%    ~     (p=0.481 n=10+10)
GoTypes           1.16M ± 0%        1.16M ± 0%  -0.16%  (p=0.004 n=10+10)
Flate              238k ± 1%         238k ± 1%    ~     (p=0.853 n=10+10)
GoParser           320k ± 0%         320k ± 0%    ~     (p=0.720 n=10+9)
Reflect            957k ± 0%         957k ± 0%    ~     (p=0.460 n=10+8)
Tar                252k ± 0%         252k ± 0%    ~     (p=0.133 n=9+10)
XML                400k ± 0%         400k ± 0%    ~     (p=0.796 n=10+10)
[Geo mean]         428k              428k       -0.01%


Removing all the interface calls helps non-trivially with CPU, though.

name        old time/op       new time/op       delta
Template          178ms ± 4%        173ms ± 3%  -2.90%  (p=0.000 n=94+96)
Unicode          85.0ms ± 4%       83.9ms ± 4%  -1.23%  (p=0.000 n=96+96)
GoTypes           543ms ± 3%        528ms ± 3%  -2.73%  (p=0.000 n=98+96)
Flate             116ms ± 3%        113ms ± 4%  -2.34%  (p=0.000 n=96+99)
GoParser          144ms ± 3%        140ms ± 4%  -2.80%  (p=0.000 n=99+97)
Reflect           344ms ± 3%        334ms ± 4%  -3.02%  (p=0.000 n=100+99)
Tar               106ms ± 5%        103ms ± 4%  -3.30%  (p=0.000 n=98+94)
XML               198ms ± 5%        192ms ± 4%  -2.88%  (p=0.000 n=92+95)
[Geo mean]        178ms             173ms       -2.65%

name        old user-time/op  new user-time/op  delta
Template          229ms ± 5%        224ms ± 5%  -2.36%  (p=0.000 n=95+99)
Unicode           107ms ± 6%        106ms ± 5%  -1.13%  (p=0.001 n=93+95)
GoTypes           696ms ± 4%        679ms ± 4%  -2.45%  (p=0.000 n=97+99)
Flate             137ms ± 4%        134ms ± 5%  -2.66%  (p=0.000 n=99+96)
GoParser          176ms ± 5%        172ms ± 8%  -2.27%  (p=0.000 n=98+100)
Reflect           430ms ± 6%        411ms ± 5%  -4.46%  (p=0.000 n=100+92)
Tar               128ms ±13%        123ms ±13%  -4.21%  (p=0.000 n=100+100)
XML               239ms ± 6%        233ms ± 6%  -2.50%  (p=0.000 n=95+97)
[Geo mean]        220ms             213ms       -2.76%


Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-09 23:01:51 +00:00
Matthew Dempsky
1e3570ac86 cmd/internal/objabi: extract shared functionality from obj
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing
the assembler backends no longer requires reinstalling cmd/link or
cmd/addr2line.

There's also now one canonical definition of the object file format in
cmd/internal/objabi/doc.go, with a warning to update all three
implementations.

objabi is still something of a grab bag of unrelated code (e.g., flag
and environment variable handling probably belong in a separate "tool"
package), but this is still progress.

Fixes #15165.
Fixes #20026.

Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-19 00:00:09 +00:00
Cherry Zhang
8a5175df35 cmd/compile: improve startRegs calculation
In register allocation, we calculate what values are used in
and after the current block. If a value is used only after a
function call, since registers are clobbered in call, we don't
need to mark the value live at the entrance of the block.
Before this CL it is considered live, and unnecessary copy or
load may be generated when resolving merge edge.

Fixes #14761.

On AMD64:
name                      old time/op    new time/op    delta
BinaryTree17-12              2.84s ± 1%     2.81s ± 1%   -1.06%  (p=0.000 n=10+9)
Fannkuch11-12                3.61s ± 0%     3.55s ± 1%   -1.77%  (p=0.000 n=10+9)
FmtFprintfEmpty-12          50.4ns ± 4%    50.0ns ± 1%     ~     (p=0.785 n=9+8)
FmtFprintfString-12         80.0ns ± 3%    78.2ns ± 3%   -2.35%  (p=0.004 n=10+9)
FmtFprintfInt-12            81.3ns ± 4%    81.8ns ± 2%     ~     (p=0.159 n=10+10)
FmtFprintfIntInt-12          120ns ± 4%     118ns ± 2%     ~     (p=0.218 n=10+10)
FmtFprintfPrefixedInt-12     152ns ± 3%     155ns ± 2%   +2.11%  (p=0.026 n=10+10)
FmtFprintfFloat-12           240ns ± 1%     238ns ± 1%   -0.79%  (p=0.005 n=9+9)
FmtManyArgs-12               504ns ± 1%     510ns ± 1%   +1.14%  (p=0.000 n=8+9)
GobDecode-12                7.00ms ± 1%    6.99ms ± 0%     ~     (p=0.497 n=9+10)
GobEncode-12                5.47ms ± 1%    5.48ms ± 1%     ~     (p=0.218 n=10+10)
Gzip-12                      258ms ± 2%     256ms ± 1%   -0.96%  (p=0.043 n=10+9)
Gunzip-12                   38.6ms ± 0%    38.3ms ± 0%   -0.64%  (p=0.000 n=9+8)
HTTPClientServer-12         90.4µs ± 3%    87.2µs ±11%     ~     (p=0.053 n=9+10)
JSONEncode-12               15.6ms ± 0%    15.6ms ± 1%     ~     (p=0.077 n=9+9)
JSONDecode-12               55.1ms ± 1%    54.6ms ± 1%   -0.85%  (p=0.010 n=10+9)
Mandelbrot200-12            4.49ms ± 0%    4.47ms ± 0%   -0.25%  (p=0.000 n=10+8)
GoParse-12                  3.38ms ± 0%    3.37ms ± 1%     ~     (p=0.315 n=8+10)
RegexpMatchEasy0_32-12      82.5ns ± 4%    82.0ns ± 0%     ~     (p=0.164 n=10+8)
RegexpMatchEasy0_1K-12       203ns ± 1%     202ns ± 1%   -0.85%  (p=0.000 n=9+10)
RegexpMatchEasy1_32-12      82.3ns ± 1%    81.1ns ± 0%   -1.39%  (p=0.000 n=10+8)
RegexpMatchEasy1_1K-12       357ns ± 1%     357ns ± 1%     ~     (p=0.697 n=8+9)
RegexpMatchMedium_32-12      125ns ± 2%     126ns ± 2%     ~     (p=0.197 n=10+10)
RegexpMatchMedium_1K-12     39.6µs ± 3%    39.6µs ± 1%     ~     (p=0.971 n=10+10)
RegexpMatchHard_32-12       1.99µs ± 2%    1.99µs ± 4%     ~     (p=0.891 n=10+9)
RegexpMatchHard_1K-12       60.1µs ± 3%    60.4µs ± 3%     ~     (p=0.684 n=10+10)
Revcomp-12                   531ms ± 6%     441ms ± 0%  -16.94%  (p=0.000 n=10+9)
Template-12                 58.9ms ± 1%    58.7ms ± 1%     ~     (p=0.315 n=10+10)
TimeParse-12                 319ns ± 1%     320ns ± 4%     ~     (p=0.215 n=9+9)
TimeFormat-12                345ns ± 0%     333ns ± 1%   -3.36%  (p=0.000 n=9+10)
[Geo mean]                  52.2µs         51.6µs        -1.13%

On ARM64:
name                     old time/op    new time/op    delta
BinaryTree17-8              8.53s ± 0%     8.36s ± 0%   -1.89%  (p=0.000 n=10+10)
Fannkuch11-8                6.15s ± 0%     6.10s ± 0%   -0.67%  (p=0.000 n=10+10)
FmtFprintfEmpty-8           117ns ± 0%     117ns ± 0%     ~     (all equal)
FmtFprintfString-8          192ns ± 0%     192ns ± 0%     ~     (all equal)
FmtFprintfInt-8             198ns ± 0%     198ns ± 0%     ~     (p=0.211 n=10+10)
FmtFprintfIntInt-8          289ns ± 0%     291ns ± 0%   +0.59%  (p=0.000 n=7+10)
FmtFprintfPrefixedInt-8     320ns ± 2%     317ns ± 0%     ~     (p=0.431 n=10+8)
FmtFprintfFloat-8           538ns ± 0%     538ns ± 0%     ~     (all equal)
FmtManyArgs-8              1.17µs ± 1%    1.18µs ± 1%     ~     (p=0.063 n=10+10)
GobDecode-8                17.0ms ± 1%    17.2ms ± 1%   +0.83%  (p=0.000 n=10+10)
GobEncode-8                14.2ms ± 0%    14.1ms ± 1%   -0.78%  (p=0.001 n=9+10)
Gzip-8                      806ms ± 0%     797ms ± 0%   -1.12%  (p=0.000 n=6+9)
Gunzip-8                    131ms ± 0%     130ms ± 0%   -0.51%  (p=0.000 n=10+9)
HTTPClientServer-8          206µs ± 9%     212µs ± 2%     ~     (p=0.829 n=10+8)
JSONEncode-8               40.1ms ± 0%    40.1ms ± 0%     ~     (p=0.136 n=9+9)
JSONDecode-8                157ms ± 0%     151ms ± 0%   -3.32%  (p=0.000 n=9+9)
Mandelbrot200-8            10.1ms ± 0%    10.1ms ± 0%   -0.05%  (p=0.000 n=9+8)
GoParse-8                  8.43ms ± 0%    8.43ms ± 0%     ~     (p=0.912 n=10+10)
RegexpMatchEasy0_32-8       228ns ± 1%     227ns ± 0%   -0.26%  (p=0.026 n=10+9)
RegexpMatchEasy0_1K-8      1.92µs ± 0%    1.63µs ± 0%  -15.18%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8       258ns ± 1%     250ns ± 0%   -2.83%  (p=0.000 n=10+10)
RegexpMatchEasy1_1K-8      2.39µs ± 0%    2.13µs ± 0%  -10.94%  (p=0.000 n=9+9)
RegexpMatchMedium_32-8      352ns ± 0%     351ns ± 0%   -0.29%  (p=0.004 n=9+10)
RegexpMatchMedium_1K-8      104µs ± 0%     105µs ± 0%   +0.58%  (p=0.000 n=8+9)
RegexpMatchHard_32-8       5.84µs ± 0%    5.82µs ± 0%   -0.27%  (p=0.000 n=9+10)
RegexpMatchHard_1K-8        177µs ± 0%     177µs ± 0%   -0.07%  (p=0.000 n=9+9)
Revcomp-8                   1.57s ± 1%     1.50s ± 1%   -4.60%  (p=0.000 n=9+10)
Template-8                  157ms ± 1%     153ms ± 1%   -2.28%  (p=0.000 n=10+9)
TimeParse-8                 779ns ± 1%     770ns ± 1%   -1.18%  (p=0.013 n=10+10)
TimeFormat-8                823ns ± 2%     826ns ± 1%     ~     (p=0.324 n=10+9)
[Geo mean]                  144µs          142µs        -1.45%

Reduce cmd/go text size by 0.5%.

Change-Id: I9288ff983c4a7cf03fc0cb35b9b1750828013117
Reviewed-on: https://go-review.googlesource.com/38457
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-29 19:45:32 +00:00
Josh Bleecher Snyder
34975095d0 cmd/compile: provide pos and curfn to temp
Concurrent compilation requires providing an
explicit position and curfn to temp.
This implementation of tempAt temporarily
continues to use the globals lineno and Curfn,
so as not to collide with mdempsky's
work for #19683 eliminating the Curfn dependency
from func nod.

Updates #15756
Updates #19683

Change-Id: Ib3149ca4b0740e9f6eea44babc6f34cdd63028a9
Reviewed-on: https://go-review.googlesource.com/38592
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-25 00:09:21 +00:00
Keith Randall
27bc723b51 cmd/compile: initialize loop depths
Regalloc uses loop depths - make sure they are initialized!

Test to make sure we aren't pushing spills into loops.

This fixes a generated-code performance bug introduced with
the better spill placement change:
https://go-review.googlesource.com/c/34822/

Update #19595

Change-Id: Ib9f0da6fb588503518847d7aab51e569fd3fa61e
Reviewed-on: https://go-review.googlesource.com/38434
Reviewed-by: David Chase <drchase@google.com>
2017-03-23 16:41:22 +00:00
Josh Bleecher Snyder
aea3aff669 cmd/compile: separate ssa.Frontend and ssa.TypeSource
Prior to this CL, the ssa.Frontend field was responsible
for providing types to the backend during compilation.
However, the types needed by the backend are few and static.
It makes more sense to use a struct for them
and to hang that struct off the ssa.Config,
which is the correct home for readonly data.
Now that Types is a struct, we can clean up the names a bit as well.

This has the added benefit of allowing early construction
of all types needed by the backend.
This will be useful for concurrent backend compilation.

Passes toolstash-check -all. No compiler performance change.

Updates #15756

Change-Id: I021658c8cf2836d6a22bbc20cc828ac38c7da08a
Reviewed-on: https://go-review.googlesource.com/38336
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-19 00:21:23 +00:00
Josh Bleecher Snyder
2cdb7f118a cmd/compile: move Frontend field from ssa.Config to ssa.Func
Suggested by mdempsky in CL 38232.
This allows us to use the Frontend field
to associate frontend state and information
with a function.
See the following CL in the series for examples.

This is a giant CL, but it is almost entirely routine refactoring.

The ssa test API is starting to feel a bit unwieldy.
I will clean it up separately, once the dust has settled.

Passes toolstash -cmp.

Updates #15756

Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf
Reviewed-on: https://go-review.googlesource.com/38327
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 23:18:57 +00:00
Josh Bleecher Snyder
a5e3cac895 cmd/compile: rearrange fields between ssa.Func, ssa.Cache, and ssa.Config
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill
the roles laid out for them in CL 38160.

The only non-trivial change in this CL is how cached
values and blocks get IDs. Prior to this CL, their IDs were
assigned as part of resetting the cache, and only modified
IDs were reset. This required knowing how many values and
blocks were modified, which required a tight coupling between
ssa.Func and ssa.Config. To eliminate that coupling,
we now zero values and blocks during reset,
and assign their IDs when they are used.
Since unused values and blocks have ID == 0,
we can efficiently find the last used value/block,
to avoid zeroing everything.
Bulk zeroing is efficient, but not efficient enough
to obviate the need to avoid zeroing everything every time.
As a happy side-effect, ssa.Func.Free is no longer necessary.

DebugHashMatch and friends now belong in func.go.
They have been left in place for clarity and review.
I will move them in a subsequent CL.

Passes toolstash -cmp. No compiler performance impact.
No change in 'go test cmd/compile/internal/ssa' execution time.

Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098
Reviewed-on: https://go-review.googlesource.com/38167
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-17 05:21:42 +00:00
David Chase
886e9e6065 cmd/compile: put spills in better places
Previously we always issued a spill right after the op
that was being spilled.  This CL pushes spills father away
from the generator, hopefully pushing them into unlikely branches.
For example:

  x = ...
  if unlikely {
    call ...
  }
  ... use x ...

Used to compile to

  x = ...
  spill x
  if unlikely {
    call ...
    restore x
  }

It now compiles to

  x = ...
  if unlikely {
    spill x
    call ...
    restore x
  }

This is particularly useful for code which appends, as the only
call is an unlikely call to growslice.  It also helps for the
spills needed around write barrier calls.

The basic algorithm is walk down the dominator tree following a
path where the block still dominates all of the restores.  We're
looking for a block that:
 1) dominates all restores
 2) has the value being spilled in a register
 3) has a loop depth no deeper than the value being spilled

The walking-down code is iterative.  I was forced to limit it to
searching 100 blocks so it doesn't become O(n^2).  Maybe one day
we'll find a better way.

I had to delete most of David's code which pushed spills out of loops.
I suspect this CL subsumes most of the cases that his code handled.

Generally positive performance improvements, but hard to tell for sure
with all the noise.  (compilebench times are unchanged.)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.91s ±15%     2.80s ±12%    ~     (p=0.063 n=10+10)
Fannkuch11-12                3.47s ± 0%     3.30s ± 4%  -4.91%   (p=0.000 n=9+10)
FmtFprintfEmpty-12          48.0ns ± 1%    47.4ns ± 1%  -1.32%    (p=0.002 n=9+9)
FmtFprintfString-12         85.6ns ±11%    79.4ns ± 3%  -7.27%  (p=0.005 n=10+10)
FmtFprintfInt-12            91.8ns ±10%    85.9ns ± 4%    ~      (p=0.203 n=10+9)
FmtFprintfIntInt-12          135ns ±13%     127ns ± 1%  -5.72%   (p=0.025 n=10+9)
FmtFprintfPrefixedInt-12     167ns ± 1%     168ns ± 2%    ~      (p=0.580 n=9+10)
FmtFprintfFloat-12           249ns ±11%     230ns ± 1%  -7.32%  (p=0.000 n=10+10)
FmtManyArgs-12               504ns ± 7%     506ns ± 1%    ~       (p=0.198 n=9+9)
GobDecode-12                6.95ms ± 1%    7.04ms ± 1%  +1.37%  (p=0.001 n=10+10)
GobEncode-12                6.32ms ±13%    6.04ms ± 1%    ~     (p=0.063 n=10+10)
Gzip-12                      233ms ± 1%     235ms ± 0%  +1.01%   (p=0.000 n=10+9)
Gunzip-12                   40.1ms ± 1%    39.6ms ± 0%  -1.12%   (p=0.000 n=10+8)
HTTPClientServer-12          227µs ± 9%     221µs ± 5%    ~       (p=0.114 n=9+8)
JSONEncode-12               16.1ms ± 2%    15.8ms ± 1%  -2.09%    (p=0.002 n=9+8)
JSONDecode-12               61.8ms ±11%    57.9ms ± 1%  -6.30%   (p=0.000 n=10+9)
Mandelbrot200-12            4.30ms ± 3%    4.28ms ± 1%    ~      (p=0.203 n=10+8)
GoParse-12                  3.18ms ± 2%    3.18ms ± 2%    ~     (p=0.579 n=10+10)
RegexpMatchEasy0_32-12      76.7ns ± 1%    77.5ns ± 1%  +0.92%    (p=0.002 n=9+8)
RegexpMatchEasy0_1K-12       239ns ± 3%     239ns ± 1%    ~     (p=0.204 n=10+10)
RegexpMatchEasy1_32-12      71.4ns ± 1%    70.6ns ± 0%  -1.15%   (p=0.000 n=10+9)
RegexpMatchEasy1_1K-12       383ns ± 2%     390ns ±10%    ~       (p=0.181 n=8+9)
RegexpMatchMedium_32-12      114ns ± 0%     113ns ± 1%  -0.88%    (p=0.000 n=9+8)
RegexpMatchMedium_1K-12     36.3µs ± 1%    36.8µs ± 1%  +1.59%   (p=0.000 n=10+8)
RegexpMatchHard_32-12       1.90µs ± 1%    1.90µs ± 1%    ~     (p=0.341 n=10+10)
RegexpMatchHard_1K-12       59.4µs ±11%    57.8µs ± 1%    ~      (p=0.968 n=10+9)
Revcomp-12                   461ms ± 1%     462ms ± 1%    ~       (p=1.000 n=9+9)
Template-12                 67.5ms ± 1%    66.3ms ± 1%  -1.77%   (p=0.000 n=10+8)
TimeParse-12                 314ns ± 3%     309ns ± 0%  -1.56%    (p=0.000 n=9+8)
TimeFormat-12                340ns ± 2%     331ns ± 1%  -2.79%  (p=0.000 n=10+10)

The go binary is 0.2% larger.  Not really sure why the size
would change.

Change-Id: Ia5116e53a3aeb025ef350ffc51c14ae5cc17871c
Reviewed-on: https://go-review.googlesource.com/34822
Reviewed-by: David Chase <drchase@google.com>
2017-03-15 02:09:25 +00:00
Cherry Zhang
8a44c8efae cmd/compile: don't spill rematerializeable value when resolving merge edges
Fixes #19515.

Change-Id: I4bcce152cef52d00fbb5ab4daf72a6e742bae27c
Reviewed-on: https://go-review.googlesource.com/38158
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-14 22:55:52 +00:00
Josh Bleecher Snyder
57546d67ec cmd/compile: add reusable []Location to ssa.Config
name       old time/op      new time/op      delta
Template        218ms ± 3%       214ms ± 3%  -1.70%  (p=0.000 n=30+30)
Unicode         100ms ± 3%       100ms ± 4%    ~     (p=0.614 n=29+30)
GoTypes         657ms ± 1%       660ms ± 3%  +0.46%  (p=0.046 n=29+30)
Compiler        2.80s ± 2%       2.80s ± 1%    ~     (p=0.451 n=28+29)
Flate           131ms ± 2%       132ms ± 4%    ~     (p=1.000 n=29+29)
GoParser        159ms ± 3%       160ms ± 5%    ~     (p=0.341 n=28+30)
Reflect         406ms ± 3%       408ms ± 4%    ~     (p=0.511 n=28+30)
Tar             118ms ± 4%       118ms ± 4%    ~     (p=0.827 n=29+30)
XML             222ms ± 6%       222ms ± 3%    ~     (p=0.532 n=30+30)

name       old user-ns/op   new user-ns/op   delta
Template   274user-ms ± 3%  272user-ms ± 3%  -0.87%  (p=0.015 n=29+30)
Unicode    140user-ms ± 4%  140user-ms ± 3%    ~     (p=0.735 n=29+30)
GoTypes    890user-ms ± 1%  897user-ms ± 2%  +0.88%  (p=0.002 n=29+30)
Compiler   3.88user-s ± 2%  3.89user-s ± 1%    ~     (p=0.132 n=30+29)
Flate      168user-ms ± 2%  157user-ms ± 4%  -6.21%  (p=0.000 n=25+28)
GoParser   211user-ms ± 2%  213user-ms ± 5%    ~     (p=0.086 n=28+30)
Reflect    539user-ms ± 2%  541user-ms ± 3%    ~     (p=0.267 n=27+29)
Tar        156user-ms ± 7%  155user-ms ± 5%    ~     (p=0.708 n=30+30)
XML        291user-ms ± 5%  294user-ms ± 3%  +0.83%  (p=0.029 n=29+30)

name       old alloc/op     new alloc/op     delta
Template       40.7MB ± 0%      39.4MB ± 0%  -3.26%  (p=0.000 n=29+26)
Unicode        30.8MB ± 0%      30.7MB ± 0%  -0.40%  (p=0.000 n=28+30)
GoTypes         123MB ± 0%       119MB ± 0%  -3.47%  (p=0.000 n=30+29)
Compiler        472MB ± 0%       455MB ± 0%  -3.60%  (p=0.000 n=30+30)
Flate          26.5MB ± 0%      25.6MB ± 0%  -3.21%  (p=0.000 n=28+30)
GoParser       32.3MB ± 0%      31.4MB ± 0%  -2.98%  (p=0.000 n=29+30)
Reflect        84.4MB ± 0%      82.1MB ± 0%  -2.83%  (p=0.000 n=30+30)
Tar            27.3MB ± 0%      26.5MB ± 0%  -2.70%  (p=0.000 n=29+29)
XML            44.6MB ± 0%      43.1MB ± 0%  -3.49%  (p=0.000 n=30+30)

name       old allocs/op    new allocs/op    delta
Template         401k ± 1%        399k ± 0%  -0.35%  (p=0.000 n=30+28)
Unicode          331k ± 0%        331k ± 1%    ~     (p=0.907 n=28+30)
GoTypes         1.24M ± 0%       1.23M ± 0%  -0.43%  (p=0.000 n=30+30)
Compiler        4.26M ± 0%       4.25M ± 0%  -0.34%  (p=0.000 n=29+30)
Flate            252k ± 1%        251k ± 1%  -0.41%  (p=0.000 n=30+30)
GoParser         325k ± 1%        324k ± 1%  -0.31%  (p=0.000 n=27+30)
Reflect         1.06M ± 0%       1.05M ± 0%  -0.69%  (p=0.000 n=30+30)
Tar              266k ± 1%        265k ± 1%  -0.51%  (p=0.000 n=29+30)
XML              416k ± 1%        415k ± 1%  -0.36%  (p=0.002 n=30+30)

Change-Id: I8f784001324df83b2764c44f0e83a540e5beab34
Reviewed-on: https://go-review.googlesource.com/36212
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-02 22:39:32 +00:00
Russ Cox
47ce87877b all: merge dev.inline into master
Change-Id: I7715581a04e513dcda9918e853fa6b1ddc703770
2017-02-01 09:47:23 -05:00
Robert Griesemer
472c792e0a [dev.inline] cmd/internal/src: introduce compact source position representation
XPos is a compact (8 instead of 16 bytes on a 64bit machine) source
position representation. There is a 1:1 correspondence between each
XPos and each regular Pos, translated via a global table.

In some sense this brings back the LineHist, though positions can
track line and column information; there is a O(1) translation
between the representations (no binary search), and the translation
is factored out.

The size increase with the prior change is brought down again and
the compiler speed is in line with the master repo (measured on
the same "quiet" machine as for prior change):

name       old time/op     new time/op     delta
Template       256ms ± 1%      262ms ± 2%    ~             (p=0.063 n=5+4)
Unicode        132ms ± 1%      135ms ± 2%    ~             (p=0.063 n=5+4)
GoTypes        891ms ± 1%      871ms ± 1%  -2.28%          (p=0.016 n=5+4)
Compiler       3.84s ± 2%      3.89s ± 2%    ~             (p=0.413 n=5+4)
MakeBash       47.1s ± 1%      46.2s ± 2%    ~             (p=0.095 n=5+5)

name       old user-ns/op  new user-ns/op  delta
Template        309M ± 1%       314M ± 2%    ~             (p=0.111 n=5+4)
Unicode         165M ± 1%       172M ± 9%    ~             (p=0.151 n=5+5)
GoTypes        1.14G ± 2%      1.12G ± 1%    ~             (p=0.063 n=5+4)
Compiler       5.00G ± 1%      4.96G ± 1%    ~             (p=0.286 n=5+4)

Change-Id: Icc570cc60ab014d8d9af6976f1f961ab8828cc47
Reviewed-on: https://go-review.googlesource.com/34506
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-09 22:43:22 +00:00
shawnps
067bab00a8 all: fix misspellings
Change-Id: I429637ca91f7db4144f17621de851a548dc1ce76
Reviewed-on: https://go-review.googlesource.com/34923
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-07 16:53:25 +00:00
Robert Griesemer
eaca0e0529 [dev.inline] cmd/internal/src: introduce NoPos and use it instead Pos{}
Using a variable instead of a composite literal makes
the code independent of implementation changes of Pos.

Per David Lazar's suggestion.

Change-Id: I336967ac12a027c51a728a58ac6207cb5119af4a
Reviewed-on: https://go-review.googlesource.com/34148
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 00:35:07 +00:00
Robert Griesemer
c10499b539 [dev.inline] cmd/compile/internal/ssa: another round of renames from line -> pos (cleanup)
Mostly mechanical renames. Make variable names consistent with use.

Change-Id: Iaa89d31deab11eca6e784595b58e779ad525c8a3
Reviewed-on: https://go-review.googlesource.com/34146
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-12-08 23:10:30 +00:00
Robert Griesemer
cfd17f51c8 [dev.inline] cmd/compile/internal/ssa: rename various fields from Line to Pos
This is a mostly mechanical rename followed by manual fixes where necessary.

Change-Id: Ie5c670b133db978f15dc03e50dc2da0c80fc8842
Reviewed-on: https://go-review.googlesource.com/34137
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:36:52 +00:00
Robert Griesemer
82d0caea2c [dev.inline] cmd/internal/src: make Pos implementation abstract
Adjust cmd/compile accordingly.

This will make it easier to replace the underlying implementation.

Change-Id: I33645850bb18c839b24785b6222a9e028617addb
Reviewed-on: https://go-review.googlesource.com/34133
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:31:28 +00:00
Robert Griesemer
24597c080b [dev.inline] cmd/compile: introduce cmd/internal/src.Pos type for line numbers
This is a step toward chosing a different position representation.
By introducing an explicit type, it will be easier to make the
transition step-wise while ensuring everything keeps running.

This has been reviewed via https://go-review.googlesource.com/#/c/34025/.

Change-Id: Ibceddcd62d8f346321ac3250e3940e9c436ed684
Reviewed-on: https://go-review.googlesource.com/34132
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:26:25 +00:00
Cherry Zhang
348275cda6 cmd/compile: make a copy of Phi input if it is still live
Register of Phi input is allocated to the Phi. So if the Phi
input is still live after Phi, we may need to use a spill. In
this case, copy the Phi input to a spare register to avoid a
spill.

Originally targeted the code in issue #16187, and this CL
indeed removes the spill, but doesn't seem to help on benchmark
result. It may help in general, though.

On AMD64:
name                      old time/op    new time/op    delta
BinaryTree17-12              2.79s ± 1%     2.76s ± 0%  -1.33%  (p=0.000 n=10+10)
Fannkuch11-12                3.02s ± 0%     3.14s ± 0%  +3.99%  (p=0.000 n=10+10)
FmtFprintfEmpty-12          51.2ns ± 0%    51.4ns ± 3%    ~      (p=0.368 n=8+10)
FmtFprintfString-12          145ns ± 0%     144ns ± 0%  -0.69%    (p=0.000 n=6+9)
FmtFprintfInt-12             127ns ± 1%     124ns ± 1%  -2.79%   (p=0.000 n=10+9)
FmtFprintfIntInt-12          186ns ± 0%     184ns ± 0%  -1.34%   (p=0.000 n=10+9)
FmtFprintfPrefixedInt-12     196ns ± 0%     194ns ± 0%  -0.97%    (p=0.000 n=9+9)
FmtFprintfFloat-12           293ns ± 2%     287ns ± 0%  -2.00%   (p=0.000 n=10+9)
FmtManyArgs-12               847ns ± 1%     829ns ± 0%  -2.17%   (p=0.000 n=10+7)
GobDecode-12                7.17ms ± 0%    7.18ms ± 0%    ~     (p=0.123 n=10+10)
GobEncode-12                6.08ms ± 1%    6.08ms ± 0%    ~      (p=0.497 n=10+9)
Gzip-12                      277ms ± 1%     275ms ± 1%  -0.47%   (p=0.028 n=10+9)
Gunzip-12                   39.1ms ± 2%    38.2ms ± 1%  -2.20%   (p=0.000 n=10+9)
HTTPClientServer-12         90.9µs ± 4%    87.7µs ± 2%  -3.51%   (p=0.001 n=9+10)
JSONEncode-12               17.3ms ± 1%    16.5ms ± 0%  -5.02%    (p=0.000 n=9+9)
JSONDecode-12               54.6ms ± 1%    54.1ms ± 0%  -0.99%    (p=0.000 n=9+9)
Mandelbrot200-12            4.45ms ± 0%    4.45ms ± 0%  -0.02%    (p=0.006 n=8+9)
GoParse-12                  3.44ms ± 0%    3.48ms ± 1%  +0.95%  (p=0.000 n=10+10)
RegexpMatchEasy0_32-12      84.9ns ± 0%    85.0ns ± 0%    ~       (p=0.241 n=8+8)
RegexpMatchEasy0_1K-12       867ns ± 3%     915ns ±11%  +5.55%  (p=0.037 n=10+10)
RegexpMatchEasy1_32-12      82.7ns ± 5%    83.9ns ± 4%    ~      (p=0.161 n=9+10)
RegexpMatchEasy1_1K-12       361ns ± 1%     363ns ± 0%    ~      (p=0.098 n=10+8)
RegexpMatchMedium_32-12      126ns ± 0%     126ns ± 1%    ~      (p=0.549 n=8+10)
RegexpMatchMedium_1K-12     38.8µs ± 0%    39.1µs ± 0%  +0.67%    (p=0.000 n=9+8)
RegexpMatchHard_32-12       1.95µs ± 0%    1.96µs ± 0%  +0.43%    (p=0.000 n=9+9)
RegexpMatchHard_1K-12       59.0µs ± 0%    59.1µs ± 0%  +0.27%   (p=0.000 n=10+9)
Revcomp-12                   436ms ± 1%     431ms ± 1%  -1.19%  (p=0.005 n=10+10)
Template-12                 56.7ms ± 1%    57.1ms ± 1%  +0.71%   (p=0.001 n=10+9)
TimeParse-12                 312ns ± 0%     310ns ± 0%  -0.80%   (p=0.000 n=10+9)
TimeFormat-12                336ns ± 0%     332ns ± 0%  -1.19%    (p=0.000 n=8+7)
[Geo mean]                  59.2µs         58.9µs       -0.42%

On PPC64:
name                     old time/op    new time/op    delta
BinaryTree17-2              4.67s ± 2%     4.71s ± 1%    ~     (p=0.421 n=5+5)
Fannkuch11-2                3.92s ± 1%     3.94s ± 0%  +0.46%  (p=0.032 n=5+5)
FmtFprintfEmpty-2           122ns ± 0%     120ns ± 2%  -1.80%  (p=0.016 n=4+5)
FmtFprintfString-2          305ns ± 1%     299ns ± 1%  -1.84%  (p=0.008 n=5+5)
FmtFprintfInt-2             243ns ± 0%     241ns ± 1%  -0.66%  (p=0.016 n=4+5)
FmtFprintfIntInt-2          361ns ± 1%     356ns ± 1%  -1.49%  (p=0.016 n=5+5)
FmtFprintfPrefixedInt-2     355ns ± 1%     357ns ± 1%    ~     (p=0.333 n=5+5)
FmtFprintfFloat-2           502ns ± 2%     498ns ± 1%    ~     (p=0.151 n=5+5)
FmtManyArgs-2              1.55µs ± 2%    1.59µs ± 1%  +2.52%  (p=0.008 n=5+5)
GobDecode-2                13.0ms ± 1%    13.0ms ± 1%    ~     (p=0.841 n=5+5)
GobEncode-2                11.8ms ± 1%    11.8ms ± 1%    ~     (p=0.690 n=5+5)
Gzip-2                      499ms ± 1%     503ms ± 0%    ~     (p=0.421 n=5+5)
Gunzip-2                   86.5ms ± 0%    86.4ms ± 1%    ~     (p=0.841 n=5+5)
HTTPClientServer-2         68.2µs ± 2%    69.6µs ± 3%    ~     (p=0.151 n=5+5)
JSONEncode-2               39.0ms ± 1%    37.2ms ± 1%  -4.65%  (p=0.008 n=5+5)
JSONDecode-2                122ms ± 1%     126ms ± 1%  +2.63%  (p=0.008 n=5+5)
Mandelbrot200-2            6.08ms ± 1%    5.89ms ± 1%  -3.06%  (p=0.008 n=5+5)
GoParse-2                  5.95ms ± 2%    5.98ms ± 1%    ~     (p=0.421 n=5+5)
RegexpMatchEasy0_32-2       331ns ± 1%     328ns ± 1%    ~     (p=0.056 n=5+5)
RegexpMatchEasy0_1K-2      1.45µs ± 0%    1.47µs ± 0%  +1.13%  (p=0.008 n=5+5)
RegexpMatchEasy1_32-2       359ns ± 0%     353ns ± 0%  -1.84%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K-2      1.79µs ± 0%    1.81µs ± 1%  +1.16%  (p=0.008 n=5+5)
RegexpMatchMedium_32-2      420ns ± 2%     413ns ± 0%  -1.72%  (p=0.008 n=5+5)
RegexpMatchMedium_1K-2     70.2µs ± 1%    69.5µs ± 1%  -1.09%  (p=0.032 n=5+5)
RegexpMatchHard_32-2       3.87µs ± 1%    3.65µs ± 0%  -5.86%  (p=0.008 n=5+5)
RegexpMatchHard_1K-2        111µs ± 0%     105µs ± 0%  -5.49%  (p=0.016 n=5+4)
Revcomp-2                   1.00s ± 1%     1.01s ± 2%    ~     (p=0.151 n=5+5)
Template-2                  113ms ± 1%     113ms ± 2%    ~     (p=0.841 n=5+5)
TimeParse-2                 555ns ± 0%     550ns ± 1%  -0.87%  (p=0.032 n=5+5)
TimeFormat-2                736ns ± 1%     704ns ± 1%  -4.35%  (p=0.008 n=5+5)
[Geo mean]                  120µs          119µs       -0.77%

Reduce "spilled value remains" by 0.6% in cmd/go on AMD64.

Change-Id: If655df343b0f30d1a49ab1ab644f10c698b96f3e
Reviewed-on: https://go-review.googlesource.com/32442
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-11-18 13:56:23 +00:00
Cherry Zhang
f9238a76ff cmd/compile: make LR allocatable in non-leaf functions on ARM
The mechanism is initially introduced (and reviewed) in CL 30597
on S390X.

Reduce number of "spilled value remains" by 0.4% in cmd/go.

Disabled on ARMv5 because LR is clobbered almost everywhere with
inserted softfloat calls.

Change-Id: I2934737ce2455909647ed2118fe2bd6f0aa5ac52
Reviewed-on: https://go-review.googlesource.com/32178
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-10-28 14:25:33 +00:00
Michael Munday
15817e409b cmd/compile: make link register allocatable in non-leaf functions
We save and restore the link register in non-leaf functions because
it is clobbered by CALLs. It is therefore available for general
purpose use.

Only enabled on s390x currently. The RC4 benchmarks in particular
benefit from the extra register:

name     old speed     new speed     delta
RC4_128  243MB/s ± 2%  341MB/s ± 2%  +40.46%  (p=0.008 n=5+5)
RC4_1K   267MB/s ± 0%  359MB/s ± 1%  +34.32%  (p=0.008 n=5+5)
RC4_8K   271MB/s ± 0%  362MB/s ± 0%  +33.61%  (p=0.008 n=5+5)

Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f
Reviewed-on: https://go-review.googlesource.com/30597
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-11 18:52:35 +00:00
Keith Randall
93d5f43a29 cmd/compile: do regalloc check only when checkEnabled
No point doing this check all the time.

Fixes #15621

Change-Id: I1966c061986fe98fe9ebe146d6b9738c13cef724
Reviewed-on: https://go-review.googlesource.com/30670
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-07 17:33:15 +00:00
Keith Randall
1bddd2ee6a cmd/compile: don't shuffle rematerializeable values around
Better to just rematerialize them when needed instead of
cross-register spilling or other techniques for keeping them in
registers.

This helps for amd64 code that does 1 << x. It is better to do
  loop:
    MOVQ $1, AX  // materialize arg to SLLQ
    SLLQ CX, AX
    ...
    goto loop
than to do
  MOVQ $1, AX    // materialize outsize of loop
  loop:
    MOVQ AX, DX  // save value that's about to be clobbered
    SLLQ CX, AX
    MOVQ DX, AX  // move it back to the correct register
    goto loop

Update #16092

Change-Id: If7ac290208f513061ebb0736e8a79dcb0ba338c0
Reviewed-on: https://go-review.googlesource.com/30471
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-06 02:46:43 +00:00
Matthew Dempsky
6dc356a76a cmd/compile/internal/ssa: erase register copies deterministically
Fixes #17288.

Change-Id: I2ddd01d14667d5c6a2e19bd70489da8d9869d308
Reviewed-on: https://go-review.googlesource.com/30072
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-29 22:22:36 +00:00
Cherry Zhang
f876fb9bae cmd/compile: move value around before kick it out of register
When allocating registers, before kicking out the existing value,
copy it to a spare register if there is one. So later use of this
value can be found in register instead of reload from spill. This
is very helpful for instructions of which the input and/or output
can only be in specific registers, e.g. DIV on x86, MUL/DIV on
MIPS. May also be helpful in general.

For "go build -a cmd/go" on AMD64, reduce "spilled value remains"
by 1% (not including args, which almost certainly remain).

For the code in issue #16061 on AMD64:
MaxRem-12   111µs ± 1%    94µs ± 0%  -15.38%  (p=0.008 n=5+5)

Go1 benchmark on AMD64:
BinaryTree17-12              2.32s ± 2%     2.30s ± 1%    ~     (p=0.421 n=5+5)
Fannkuch11-12                2.52s ± 0%     2.44s ± 0%  -3.44%  (p=0.008 n=5+5)
FmtFprintfEmpty-12          39.9ns ± 3%    39.8ns ± 0%    ~     (p=0.635 n=5+4)
FmtFprintfString-12          114ns ± 1%     113ns ± 1%    ~     (p=0.905 n=5+5)
FmtFprintfInt-12             102ns ± 6%      98ns ± 1%    ~     (p=0.087 n=5+5)
FmtFprintfIntInt-12          146ns ± 5%     147ns ± 1%    ~     (p=0.238 n=5+5)
FmtFprintfPrefixedInt-12     155ns ± 2%     151ns ± 1%  -2.58%  (p=0.008 n=5+5)
FmtFprintfFloat-12           231ns ± 1%     232ns ± 1%    ~     (p=0.286 n=5+5)
FmtManyArgs-12               657ns ± 1%     649ns ± 0%  -1.31%  (p=0.008 n=5+5)
GobDecode-12                6.35ms ± 0%    6.29ms ± 1%    ~     (p=0.056 n=5+5)
GobEncode-12                5.38ms ± 1%    5.45ms ± 1%    ~     (p=0.056 n=5+5)
Gzip-12                      209ms ± 0%     209ms ± 1%    ~     (p=0.690 n=5+5)
Gunzip-12                   31.2ms ± 1%    31.1ms ± 1%    ~     (p=0.548 n=5+5)
HTTPClientServer-12          123µs ± 4%     130µs ± 8%    ~     (p=0.151 n=5+5)
JSONEncode-12               14.0ms ± 1%    14.0ms ± 1%    ~     (p=0.421 n=5+5)
JSONDecode-12               41.2ms ± 1%    41.1ms ± 2%    ~     (p=0.421 n=5+5)
Mandelbrot200-12            3.96ms ± 1%    3.98ms ± 0%    ~     (p=0.421 n=5+5)
GoParse-12                  2.88ms ± 1%    2.88ms ± 1%    ~     (p=0.841 n=5+5)
RegexpMatchEasy0_32-12      68.0ns ± 3%    66.6ns ± 1%  -2.00%  (p=0.024 n=5+5)
RegexpMatchEasy0_1K-12       728ns ± 8%     682ns ± 1%  -6.26%  (p=0.008 n=5+5)
RegexpMatchEasy1_32-12      66.8ns ± 2%    66.0ns ± 1%    ~     (p=0.302 n=5+5)
RegexpMatchEasy1_1K-12       291ns ± 2%     288ns ± 1%    ~     (p=0.111 n=5+5)
RegexpMatchMedium_32-12      103ns ± 2%     100ns ± 0%  -2.53%  (p=0.016 n=5+4)
RegexpMatchMedium_1K-12     31.9µs ± 1%    31.3µs ± 0%  -1.75%  (p=0.008 n=5+5)
RegexpMatchHard_32-12       1.59µs ± 2%    1.59µs ± 1%    ~     (p=0.548 n=5+5)
RegexpMatchHard_1K-12       48.3µs ± 2%    47.7µs ± 1%    ~     (p=0.222 n=5+5)
Revcomp-12                   340ms ± 1%     338ms ± 1%    ~     (p=0.421 n=5+5)
Template-12                 46.3ms ± 1%    46.5ms ± 1%    ~     (p=0.690 n=5+5)
TimeParse-12                 252ns ± 1%     247ns ± 0%  -1.91%  (p=0.000 n=5+4)
TimeFormat-12                277ns ± 1%     267ns ± 0%  -3.82%  (p=0.008 n=5+5)
[Geo mean]                  48.8µs         48.3µs       -0.93%

It has very little effect on binary size and compiler speed.
compilebench:
Template       230ms ±10%      231ms ± 8%    ~             (p=0.546 n=9+9)
Unicode        123ms ± 6%      124ms ± 9%    ~           (p=0.481 n=10+10)
GoTypes        742ms ± 6%      755ms ± 3%    ~           (p=0.123 n=10+10)
Compiler       3.10s ± 3%      3.08s ± 1%    ~           (p=0.631 n=10+10)

Fixes #16061.

Change-Id: Id99cdc7a182ee10a704fa0f04e8e0d0809b2ac56
Reviewed-on: https://go-review.googlesource.com/29732
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-27 15:44:58 +00:00
David Chase
cddddbc623 cmd/compile: use ISEL, cleanup use of zero & extensions
Abandoned earlier efforts to expose zero register,
but left it in numbering to decrease squirrelyness of
register allocator.

ISELrelOp used in code generation of bool := x relOp y.
Some patterns added to better elide zero case and
some sign extension.

Updates: #17109

Change-Id: Ida7839f0023ca8f0ffddc0545f0ac269e65b05d9
Reviewed-on: https://go-review.googlesource.com/29380
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-09-22 17:36:39 +00:00
Keith Randall
75ce89c20d cmd/compile: cache CFG-dependent computations
We compute a lot of stuff based off the CFG: postorder traversal,
dominators, dominator tree, loop nest.  Multiple phases use this
information and we end up recomputing some of it.  Add a cache
for this information so if the CFG hasn't changed, we can reuse
the previous computation.

Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2
Reviewed-on: https://go-review.googlesource.com/29356
Reviewed-by: David Chase <drchase@google.com>
2016-09-19 16:00:13 +00:00
Keith Randall
833ed7c431 cmd/compile: reorganize SSA register numbering
Teach SSA about the cmd/internal/obj/$ARCH register numbering.
It can then return that numbering when requested.  Each architecture
now does not need to know anything about the internal SSA numbering
of registers.

Change-Id: I34472a2736227c15482e60994eebcdd2723fa52d
Reviewed-on: https://go-review.googlesource.com/29249
Reviewed-by: David Chase <drchase@google.com>
2016-09-16 19:01:55 +00:00
Cherry Zhang
46ba59025f cmd/compile: label LoadReg with line number of the use
A tentative fix of #16380. It adds "line" everywhere...

This also reduces binary size slightly (cmd/go on ARM as an example):

			before		after
total binary size	8068097		8018945 (-0.6%)
.gopclntab		1195341		1179929 (-1.3%)
.debug_line		 689692		 652017 (-5.5%)

Change-Id: Ibda657c6999783c5bac180cbbba487006dbf0ed7
Reviewed-on: https://go-review.googlesource.com/25082
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-16 15:38:28 +00:00
Keith Randall
167e381f40 cmd/compile: make ssa compilation unconditional
Rip out the code that allows SSA to be used conditionally.

No longer exists:
 ssa=0 flag
 GOSSAHASH
 GOSSAPKG
 SSATEST

GOSSAFUNC now only controls the printing of the IR/html.

Still need to rip out all of the old backend.  It should no longer be
callable after this CL.

Update #16357

Change-Id: Ib30cc18fba6ca52232c41689ba610b0a94aa74f5
Reviewed-on: https://go-review.googlesource.com/29155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-09-14 17:38:04 +00:00
Michael Munday
6ec993adc3 cmd/compile: add SSA backend for s390x and enable by default
The new SSA backend modifies the ABI slightly: R0 is now a usable
general purpose register.

Fixes #16677.

Change-Id: I367435ce921e0c7e79e021c80cf8ef5d1d1466cf
Reviewed-on: https://go-review.googlesource.com/28978
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-13 19:39:38 +00:00
Keith Randall
c345a3913f cmd/compile: get rid of BlockCall
No need for it, we can treat calls as (mostly) normal values
that take a memory and return a memory.

Lowers the number of basic blocks needed to represent a function.
"go test -c net/http" uses 27% fewer basic blocks.
Probably doesn't affect generated code much, but should help
various passes whose running time and/or space depends on
the number of basic blocks.

Fixes #15631

Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6
Reviewed-on: https://go-review.googlesource.com/28950
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-09-12 23:27:02 +00:00
Cherry Zhang
4354ffd38b cmd/compile: intrinsify Ctz, Bswap, and some atomics on ARM64
Change-Id: Ia5bf72b70e6f6522d6fb8cd050e78f862d37b5ae
Reviewed-on: https://go-review.googlesource.com/27936
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-08 19:45:25 +00:00
Keith Randall
320ddcf834 cmd/compile: inline atomics from runtime/internal/atomic on amd64
Inline atomic reads and writes on amd64.  There's no reason
to pay the overhead of a call for these.

To keep atomic loads from being reordered, we make them
return a <value,memory> tuple.

Change the meaning of resultInArg0 for tuple-generating ops
to mean the first part of the result tuple, not the second.
This means we can always put the store part of the tuple last,
matching how arguments are laid out.  This requires reordering
the outputs of add32carry and sub32carry and their descendents
in various architectures.

benchmark                    old ns/op     new ns/op     delta
BenchmarkAtomicLoad64-8      2.09          0.26          -87.56%
BenchmarkAtomicStore64-8     7.54          5.72          -24.14%

TBD (in a different CL): Cas, Or8, ...

Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207
Reviewed-on: https://go-review.googlesource.com/27641
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-25 20:09:04 +00:00
Cherry Zhang
e71e1fe87e cmd/compile: get MIPS64 SSA working
- implement *, /, %, shifts, Zero, Move.
- fix mistakes in comparison.
- fix floating point rounding.
- handle RetJmp in assembler (which was not handled, as a consequence
  Duff's device was disabled in the old backend.)

all.bash now passes with SSA on.

Updates #16359.

Change-Id: Ia14eed0ed1176b5d800592080c8f53dded7fe73f
Reviewed-on: https://go-review.googlesource.com/27592
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-25 12:53:36 +00:00
David Chase
5b9ff11c3d cmd/compile: ppc64le working, not optimized enough
This time with the cherry-pick from the proper patch of
the old CL.

Stack size increased.
Corrected NaN-comparison glitches.
Marked g register as clobbered by calls.
Fixed shared libraries.

live_ssa.go still disabled because of differences.
Presumably turning on more optimization will fix
both the stack size and the live_ssa.go glitches.

Enhanced debugging output for shared libs test.

Rebased onto master.

Updates #16010.

Change-Id: I40864faf1ef32c118fb141b7ef8e854498e6b2c4
Reviewed-on: https://go-review.googlesource.com/27159
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-18 16:34:47 +00:00
Keith Randall
c069bc4996 [dev.ssa] cmd/compile: implement GO386=387
Last part of the 386 SSA port.

Modify the x86 backend to simulate SSE registers and
instructions with 387 registers and instructions.
The simulation isn't terribly performant, but it works,
and the old implementation wasn't very performant either.
Leaving to people who care about 387 to optimize if they want.

Turn on SSA backend for 386 by default.

Fixes #16358

Change-Id: I678fb59132620b2c47e993c1c10c4c21135f70c0
Reviewed-on: https://go-review.googlesource.com/25271
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-10 17:41:01 +00:00
Keith Randall
2cbdd55d64 [dev.ssa] cmd/compile: fix PIC for SSA-generated code
Access to globals requires a 2-instruction sequence on PIC 386.

    MOVL foo(SB), AX

is translated by the obj package into:

    CALL getPCofNextInstructionInTempRegister(SB)
    MOVL (&foo-&thisInstruction)(tmpReg), AX

The call returns the PC of the next instruction in a register.
The next instruction then offsets from that register to get the
address required.  The tricky part is the allocation of the
temp register.  The legacy compiler always used CX, and forbid
the register allocator from allocating CX when in PIC mode.
We can't easily do that in SSA because CX is actually a required
register for shift instructions. (I think the old backend got away
with this because the register allocator never uses CX, only
codegen knows that shifts must use CX.)

Instead, we allow the temp register to be anything.  When the
destination of the MOV (or LEA) is an integer register, we can
use that register.  Otherwise, we make sure to compile the
operation using an LEA to reference the global.  So

    MOVL AX, foo(SB)

is never generated directly.  Instead, SSA generates:

    LEAL foo(SB), DX
    MOVL AX, (DX)

which is then rewritten by the obj package to:

    CALL getPcInDX(SB)
    LEAL (&foo-&thisInstruction)(DX), AX
    MOVL AX, (DX)

So this CL modifies the obj package to use different thunks
to materialize the pc into different registers.  We use the
registers that regalloc chose so that SSA can still allocate
the full set of registers.

Change-Id: Ie095644f7164a026c62e95baf9d18a8bcaed0bba
Reviewed-on: https://go-review.googlesource.com/25442
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-09 15:50:07 +00:00
Keith Randall
69a755b602 [dev.ssa] cmd/compile: port SSA backend to amd64p32
It's not a new backend, just a PtrSize==4 modification
of the existing AMD64 backend.

Change-Id: Icc63521a5cf4ebb379f7430ef3f070894c09afda
Reviewed-on: https://go-review.googlesource.com/25586
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-09 15:48:26 +00:00
Cherry Zhang
0484052358 [dev.ssa] cmd/compile: remove flags from regMask
Reg allocator skips flag-typed values. Flag allocator uses the type
and whether the op has "clobberFlags" set.

Tested on AMD64, ARM, ARM64, 386. Passed 'toolstash -cmp' on AMD64.
PPC64 is coded blindly.

Change-Id: Ib1cc27efecef6a1bb27f7d7ed035a582660d244f
Reviewed-on: https://go-review.googlesource.com/25480
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-07 03:08:03 +00:00