646 Commits

Author SHA1 Message Date
Daniel Martí
79dbc1cc7b cmd/compile: replace classnames with Class.String
Since the slice of names is almost exactly the same as what stringer is
already generating for us.

Passes toolstash -cmp on std cmd.

Change-Id: I3f1e95efc690c0108236689e721627f00f79a461
Reviewed-on: https://go-review.googlesource.com/77190
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2017-11-12 13:25:35 +00:00
Daniel Martí
3231d4e4ef cmd/compile: replace opnames with stringer
Now possible, since stringer just got the -trimprefix flag added.

While at it, simplify a few Op stringifications since we can now use %v,
and no longer have to worry about o<len(opnames).

Passes toolstash -cmp on std cmd.

Fixes #15462.

Change-Id: Icdcde0b0a5eb165d18488918175024da274f782b
Reviewed-on: https://go-review.googlesource.com/76790
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Dave Cheney <dave@cheney.net>
2017-11-10 10:56:22 +00:00
Jeff R. Allen
d7ac9bb992 cmd/compile: do not write slices/strings > 2g
The linker will refuse to work on objects larger than
2e9 bytes (see issue #9862 for why).

With this change, the compiler gives a useful error
message explaining this, instead of leaving it to the
linker to give a cryptic message later.

Fixes #1700.

Change-Id: I3933ce08ef846721ece7405bdba81dff644cb004
Reviewed-on: https://go-review.googlesource.com/74330
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-09 18:50:22 +00:00
David Chase
a042221cdb cmd/compile: adjust Pos setting for "empty" blocks
Plain blocks that contain only uninteresting instructions
(that do not have reliable Pos information themselves)
need to have their Pos left unset so that they can
inherit it from their successors.  The "uninteresting"
test was not properly applied and not properly defined.
OpFwdRef does not appear in the ssa.html debugging output,
but at the time of the test these instructions did appear,
and it needs to be part of the test.

Fixes #22365.

Change-Id: I99e5b271acd8f6bcfe0f72395f905c7744ea9a02
Reviewed-on: https://go-review.googlesource.com/74252
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-11-08 22:39:49 +00:00
Austin Clements
3a446d8652 cmd/compile: []T where T is go:notinheap does not need write barriers
Currently, assigning a []T where T is a go:notinheap type generates an
unnecessary write barrier for storing the slice pointer.

This fixes this by teaching HasHeapPointer that this type does not
have a heap pointer, and tweaking the lowering of slice assignments so
the pointer store retains the correct type rather than simply lowering
it to a *uint8 store.

Change-Id: I8bf7c66e64a7fefdd14f2bd0de8a5a3596340bab
Reviewed-on: https://go-review.googlesource.com/76027
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-06 21:07:57 +00:00
David Chase
d58d90152b cmd/compile: adjust locationlist lifetimes
A statement like

  foo = bar + qux

might compile to

  AX := AX + BX

resulting in a regkill for AX before this instruction.
The buggy behavior is to kill AX "at" this instruction,
before it has executed.  (Code generation of no-instruction
values like RegKills applies their effects at the
next actual instruction emitted).

However, bar is still associated with AX until after the
instruction executes, so the effect of the regkill must
occur at the boundary between this instruction and the
next.  Similarly, the new value bound to AX is not visible
until this instruction executes (and in the case of values
that require multiple instructions in code generation, until
all of them have executed).

The ranges are adjusted so that a value's start occurs
at the next following instruction after its evaluation,
and the end occurs after (execution of) the first
instruction following the end of the lifetime as a value.

(Notice the asymmetry; the entire value must be finished
before it is visible, but execution of a single instruction
invalidates.  However, the value *is* visible before that
next instruction executes).

The test was adjusted to make it insensitive to the result
numbering for variables printed by gdb, since that is not
relevant to the test and makes the differences introduced
by small changes larger than necessary/useful.

The test was also improved to present variable probes
more intuitively, and also to allow explicit indication
of "this variable was optimized out"

Change-Id: I39453eead8399e6bb05ebd957289b112d1100c0e
Reviewed-on: https://go-review.googlesource.com/74090
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-05 18:32:53 +00:00
Ilya Tocar
2f1f607b21 cmd/compile: intrinsify math.RoundToEven on amd64
We already do this for floor/ceil, but RoundToEven was added later.
Intrinsify it also.

name           old time/op  new time/op  delta
RoundToEven-8  3.00ns ± 1%  0.68ns ± 2%  -77.34%  (p=0.000 n=10+10)

Change-Id: Ib158cbceb436c6725b2d9353a526c5c4be19bcad
Reviewed-on: https://go-review.googlesource.com/74852
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-11-02 17:33:52 +00:00
Martin Möhrmann
fbfc2031a6 cmd/compile: specialize map creation for small hint sizes
Handle make(map[any]any) and make(map[any]any, hint) where
hint <= BUCKETSIZE special to allow for faster map initialization
and to improve binary size by using runtime calls with fewer arguments.

Given hint is smaller or equal to BUCKETSIZE in which case
overLoadFactor(hint, 0)  is false and no buckets would be allocated by makemap:
* If hmap needs to be allocated on the stack then only hmap's hash0
  field needs to be initialized and no call to makemap is needed.
* If hmap needs to be allocated on the heap then a new special
  makehmap function will allocate hmap and intialize hmap's
  hash0 field.

Reduces size of the godoc by ~36kb.

AMD64
name         old time/op    new time/op    delta
NewEmptyMap    16.6ns ± 2%     5.5ns ± 2%  -66.72%  (p=0.000 n=10+10)
NewSmallMap    64.8ns ± 1%    56.5ns ± 1%  -12.75%  (p=0.000 n=9+10)

Updates #6853

Change-Id: I624e90da6775afaa061178e95db8aca674f44e9b
Reviewed-on: https://go-review.googlesource.com/61190
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-11-02 17:03:45 +00:00
Ilya Tocar
94484d8ed5 cmd/compile: intrinsify math.{Trunc/Ceil/Floor} on amd64
This significantly speed-ups Trunc.
Ceil/Floor are using the same instruction, so do them too.

name     old time/op  new time/op  delta
Floor-6  3.33ns ± 1%  3.22ns ± 0%   -3.39%  (p=0.000 n=10+10)
Ceil-6   3.33ns ± 1%  3.22ns ± 0%   -3.16%  (p=0.000 n=10+7)
Trunc-6  4.83ns ± 0%  3.22ns ± 0%  -33.36%  (p=0.000 n=6+8)

Change-Id: If848790e458eedfe38a6a0407bb4f589c68ac254
Reviewed-on: https://go-review.googlesource.com/68630
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-31 19:30:54 +00:00
Michael Munday
4745604bcb cmd/compile: intrinsify math.RoundToEven on s390x
The new RoundToEven function can be implemented as a single FIDBR
instruction on s390x.

name         old time/op  new time/op  delta
RoundToEven  5.32ns ± 1%  0.86ns ± 1%  -83.86%  (p=0.000 n=10+10)

Change-Id: Iaf597e57a0d1085961701e3c75ff4f6f6dcebb5f
Reviewed-on: https://go-review.googlesource.com/74350
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-31 18:04:27 +00:00
Austin Clements
7e343134d3 cmd/compile: compiler support for buffered write barrier
This CL implements the compiler support for calling the buffered write
barrier added by the previous CL.

Since the buffered write barrier is only implemented on amd64 right
now, this still supports the old, eager write barrier as well. There's
little overhead to supporting both and this way a few tests in
test/fixedbugs that expect to have liveness maps at write barrier
calls can easily opt-in to the old, eager barrier.

This significantly improves the performance of the write barrier:

name             old time/op  new time/op  delta
WriteBarrier-12  73.5ns ±20%  19.2ns ±27%  -73.90%  (p=0.000 n=19+18)

It also reduces the size of binaries because the write barrier call is
more compact:

name        old object-bytes  new object-bytes  delta
Template           398k ± 0%         393k ± 0%  -1.14%  (p=0.008 n=5+5)
Unicode            208k ± 0%         206k ± 0%  -1.00%  (p=0.008 n=5+5)
GoTypes           1.18M ± 0%        1.15M ± 0%  -2.00%  (p=0.008 n=5+5)
Compiler          4.05M ± 0%        3.88M ± 0%  -4.26%  (p=0.008 n=5+5)
SSA               8.25M ± 0%        8.11M ± 0%  -1.59%  (p=0.008 n=5+5)
Flate              228k ± 0%         224k ± 0%  -1.83%  (p=0.008 n=5+5)
GoParser           295k ± 0%         284k ± 0%  -3.62%  (p=0.008 n=5+5)
Reflect           1.00M ± 0%        0.99M ± 0%  -0.70%  (p=0.008 n=5+5)
Tar                339k ± 0%         333k ± 0%  -1.67%  (p=0.008 n=5+5)
XML                404k ± 0%         395k ± 0%  -2.10%  (p=0.008 n=5+5)
[Geo mean]         704k              690k       -2.00%

name        old exe-bytes     new exe-bytes     delta
HelloSize         1.05M ± 0%        1.04M ± 0%  -1.55%  (p=0.008 n=5+5)

https://perf.golang.org/search?q=upload:20171027.1

(Amusingly, this also reduces compiler allocations by 0.75%, which,
combined with the better write barrier, speeds up the compiler overall
by 2.10%. See the perf link.)

It slightly improves the performance of most of the go1 benchmarks and
improves the performance of the x/benchmarks:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.40s ± 1%     2.47s ± 1%  +2.69%  (p=0.000 n=19+19)
Fannkuch11-12                2.95s ± 0%     2.95s ± 0%  +0.21%  (p=0.000 n=20+19)
FmtFprintfEmpty-12          41.8ns ± 4%    41.4ns ± 2%  -1.03%  (p=0.014 n=20+20)
FmtFprintfString-12         68.7ns ± 2%    67.5ns ± 1%  -1.75%  (p=0.000 n=20+17)
FmtFprintfInt-12            79.0ns ± 3%    77.1ns ± 1%  -2.40%  (p=0.000 n=19+17)
FmtFprintfIntInt-12          127ns ± 1%     123ns ± 3%  -3.42%  (p=0.000 n=20+20)
FmtFprintfPrefixedInt-12     152ns ± 1%     150ns ± 1%  -1.02%  (p=0.000 n=18+17)
FmtFprintfFloat-12           211ns ± 1%     209ns ± 0%  -0.99%  (p=0.000 n=20+16)
FmtManyArgs-12               500ns ± 0%     496ns ± 0%  -0.73%  (p=0.000 n=17+20)
GobDecode-12                6.44ms ± 1%    6.53ms ± 0%  +1.28%  (p=0.000 n=20+19)
GobEncode-12                5.46ms ± 0%    5.46ms ± 1%    ~     (p=0.550 n=19+20)
Gzip-12                      220ms ± 1%     216ms ± 0%  -1.75%  (p=0.000 n=19+19)
Gunzip-12                   38.8ms ± 0%    38.6ms ± 0%  -0.30%  (p=0.000 n=18+19)
HTTPClientServer-12         79.0µs ± 1%    78.2µs ± 1%  -1.01%  (p=0.000 n=20+20)
JSONEncode-12               11.9ms ± 0%    11.9ms ± 0%  -0.29%  (p=0.000 n=20+19)
JSONDecode-12               52.6ms ± 0%    52.2ms ± 0%  -0.68%  (p=0.000 n=19+20)
Mandelbrot200-12            3.69ms ± 0%    3.68ms ± 0%  -0.36%  (p=0.000 n=20+20)
GoParse-12                  3.13ms ± 1%    3.18ms ± 1%  +1.67%  (p=0.000 n=19+20)
RegexpMatchEasy0_32-12      73.2ns ± 1%    72.3ns ± 1%  -1.19%  (p=0.000 n=19+18)
RegexpMatchEasy0_1K-12       241ns ± 0%     239ns ± 0%  -0.83%  (p=0.000 n=17+16)
RegexpMatchEasy1_32-12      68.6ns ± 1%    69.0ns ± 1%  +0.47%  (p=0.015 n=18+16)
RegexpMatchEasy1_1K-12       364ns ± 0%     361ns ± 0%  -0.67%  (p=0.000 n=16+17)
RegexpMatchMedium_32-12      104ns ± 1%     103ns ± 1%  -0.79%  (p=0.001 n=20+15)
RegexpMatchMedium_1K-12     33.8µs ± 3%    34.0µs ± 2%    ~     (p=0.267 n=20+19)
RegexpMatchHard_32-12       1.64µs ± 1%    1.62µs ± 2%  -1.25%  (p=0.000 n=19+18)
RegexpMatchHard_1K-12       49.2µs ± 0%    48.7µs ± 1%  -0.93%  (p=0.000 n=19+18)
Revcomp-12                   391ms ± 5%     396ms ± 7%    ~     (p=0.154 n=19+19)
Template-12                 63.1ms ± 0%    59.5ms ± 0%  -5.76%  (p=0.000 n=18+19)
TimeParse-12                 307ns ± 0%     306ns ± 0%  -0.39%  (p=0.000 n=19+17)
TimeFormat-12                325ns ± 0%     323ns ± 0%  -0.50%  (p=0.000 n=19+19)
[Geo mean]                  47.3µs         46.9µs       -0.67%

https://perf.golang.org/search?q=upload:20171026.1

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.25ms ± 1%  2.20ms ± 1%  -2.31%  (p=0.000 n=18+18)
HTTP-12                    12.6µs ± 0%  12.6µs ± 0%  -0.72%  (p=0.000 n=18+17)
JSON-12                    11.0ms ± 0%  11.0ms ± 1%  -0.68%  (p=0.000 n=17+19)

https://perf.golang.org/search?q=upload:20171026.2

Updates #14951.
Updates #22460.

Change-Id: Id4c0932890a1d41020071bec73b8522b1367d3e7
Reviewed-on: https://go-review.googlesource.com/73712
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-30 18:12:46 +00:00
Lynn Boger
4d0151ede5 cmd/compile,cmd/internal/obj/ppc64: make math.Abs,math.Copysign instrinsics on ppc64x
This adds support for math Abs, Copysign to be instrinsics on ppc64x.

New instruction FCPSGN is added to generate fcpsgn. Some new
rules are added to improve the int<->float conversions that are
generated mainly due to the Float64bits and Float64frombits in
the math package. PPC64.rules is also modified as suggested
in the review for CL 63290.

Improvements:
benchmark                           old ns/op     new ns/op     delta
BenchmarkAbs-16                   1.12          0.69          -38.39%
BenchmarkCopysign-16              1.30          0.93          -28.46%
BenchmarkNextafter32-16           9.34          8.05          -13.81%
BenchmarkFrexp-16                 8.81          7.60          -13.73%

Others that used Copysign also saw smaller improvements.

I attempted to make this work using rules since that
seems to be preferred, but due to the use of Float64bits and
Float64frombits in these functions, several rules had to be added and
even then not all cases were matched. Using rules became too
complicated and seemed too fragile for these.

Updates #21390

Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995
Reviewed-on: https://go-review.googlesource.com/67130
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-30 13:56:39 +00:00
Austin Clements
afbe646ab4 cmd/compile: report typedslicecopy write barriers
Most write barrier calls are inserted by SSA, but copy and append are
lowered to runtime.typedslicecopy during walk. Fix these to set
Func.WBPos and emit the "write barrier" warning, as done for the write
barriers inserted by SSA. As part of this, we refactor setting WBPos
and emitting this warning into the frontend so it can be shared by
both walk and SSA.

Change-Id: I5fe9997d9bdb55e03e01dd58aee28908c35f606b
Reviewed-on: https://go-review.googlesource.com/73411
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-10-29 20:21:43 +00:00
Austin Clements
5a4b6bce37 cmd/compile: improve coverage of nowritebarrierrec check
The current go:nowritebarrierrec checker has two problems that limit
its coverage:

1. It doesn't understand that systemstack calls its argument, which
means there are several cases where we fail to detect prohibited write
barriers.

2. It only observes calls in the AST, so calls constructed during
lowering by SSA aren't followed.

This CL completely rewrites this checker to address these issues.

The current checker runs entirely after walk and uses visitBottomUp,
which introduces several problems for checking across systemstack.
First, visitBottomUp itself doesn't understand systemstack calls, so
the callee may be ordered after the caller, causing the checker to
fail to propagate constraints. Second, many systemstack calls are
passed a closure, which is quite difficult to resolve back to the
function definition after transformclosure and walk have run. Third,
visitBottomUp works exclusively on the AST, so it can't observe calls
created by SSA.

To address these problems, this commit splits the check into two
phases and rewrites it to use a call graph generated during SSA
lowering. The first phase runs before transformclosure/walk and simply
records systemstack arguments when they're easy to get. Then, it
modifies genssa to record static call edges at the point where we're
lowering to Progs (which is the latest point at which position
information is conveniently available). Finally, the second phase runs
after all functions have been lowered and uses a direct BFS walk of
the call graph (combining systemstack calls with static calls) to find
prohibited write barriers and construct nice error messages.

Fixes #22384.
For #22460.

Change-Id: I39668f7f2366ab3c1ab1a71eaf25484d25349540
Reviewed-on: https://go-review.googlesource.com/72773
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-10-29 19:36:44 +00:00
Daniel Martí
006bc57095 cmd/compile: clean up various bits of code
* replace a copy of IsMethod with a call of it.
* a few more switches where they simplify the code.
* prefer composite literals over "n := new(...); n.x = y; ...".
* use defers to get rid of three goto labels.
* rewrite updateHasCall into two funcs to remove gotos.

Passes toolstash-check on std cmd.

Change-Id: Icb5442a89a87319ef4b640bbc5faebf41b193ef1
Reviewed-on: https://go-review.googlesource.com/72070
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-10-22 15:50:50 +00:00
Daniel Martí
bb45bc27b5 cmd/compile: make more use of value switches
Use them to replace if/else chains with at least three comparisons,
where the code becomes clearly simpler.

Passes toolstash -cmp on std cmd.

Change-Id: Ic98aa3905944ddcab5aef5f9d9ba376853263d94
Reviewed-on: https://go-review.googlesource.com/70934
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-10-16 19:59:24 +00:00
Matthew Dempsky
53bbddd527 cmd/compile: intrinsify runtime/internal/sys.Ctz{32,64} on ppc64
These functions are identical to math/bits.TrailingZeros{32,64}, which
are already intrinsified on ppc64.

Change-Id: If7ee57e7afe53154874f4b66bacdb6237806128a
Reviewed-on: https://go-review.googlesource.com/70350
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-12 20:01:16 +00:00
David Chase
edcf2d0cd6 cmd/compile: add line numbers to values & blocks in ssa.html
In order to improve the line numbering for debuggers,
it's necessary to trace lines through compilation.
This makes it (much) easier to follow.

The format of the last column of the ssa.html output was
also changed to reduce the spamminess of the file name,
which is usually the same and makes it far harder to read
instructions and line numbers, and to make it wider and also
able to break words when wrapping (long path names still
can push off the end otherwise; side-to-side scrolling was
tried but was more annoying than the occasional wrapped
line).

Sample output now, where [...] is elision for sake of making
the CL character-counter happy -- and the (##) line numbers
are rendered in italics and a smaller font (11 point) under
control of a CSS class "line-number".

genssa
      # /Users/drchase/[...]/ssa/testdata/hist.go
      00000 (35) TEXT	"".main(SB)
      00001 (35) FUNCDATA	$0, gclocals·7be4bb[...]1e8b(SB)
      00002 (35) FUNCDATA	$1, gclocals·9ab98a[...]4568(SB)
v920  00003 (36) LEAQ	""..autotmp_31-640(SP), DI
v858  00004 (36) XORPS	X0, X0
v6    00005 (36) LEAQ	-48(DI), DI
v6    00006 (36) DUFFZERO	$277
v576  00007 (36) LEAQ	""..autotmp_31-640(SP), AX
v10   00008 (36) TESTB	AX, (AX)
b1    00009 (36) JMP	10

and from an earlier phase:

b18: ← b17
v242 (47) = Copy <mem> v238
v243 (47) = VarKill <mem> {.autotmp_16} v242
v244 (48) = Addr <**bufio.Scanner> {scanner} v2
v245 (48) = Load <*bufio.Scanner> v244 v243
[...]
v279 (49) = Store <mem> {int64} v277 v276 v278
v280 (49) = Addr <*error> {.autotmp_18} v2
v281 (49) = Load <error> v280 v279
v282 (49) = Addr <*error> {err} v2
v283 (49) = VarDef <mem> {err} v279
v284 (49) = Store <mem> {error} v282 v281 v283
v285 (47) = VarKill <mem> {.autotmp_18} v284
v286 (47) = VarKill <mem> {.autotmp_17} v285
v287 (50) = Addr <*error> {err} v2
v288 (50) = Load <error> v287 v286
v289 (50) = NeqInter <bool> v288 v51
If v289 → b21 b22 (line 50)

Change-Id: I3f46310918f965761f59e6f03ea53067237c28a8
Reviewed-on: https://go-review.googlesource.com/69591
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-11 21:46:20 +00:00
Keith Randall
e130dcf051 cmd/compile: abort earlier if stack frame too large
If the stack frame is too large, abort immediately.
We used to generate code first, then abort.
In issue 22200, generating code raised a panic
so we got an ICE instead of an error message.

Change the max frame size to 1GB (from 2GB).
Stack frames between 1.1GB and 2GB didn't used to work anyway,
the pcln table generation would have failed and generated an ICE.

Fixes #22200

Change-Id: I1d918ab27ba6ebf5c87ec65d1bccf973f8c8541e
Reviewed-on: https://go-review.googlesource.com/69810
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-11 18:24:13 +00:00
Cherry Zhang
d63de28711 cmd/compile: intrinsify atomics on MIPS64
Change-Id: Ica65b7a52af9558a05d0a0e1dff0f9ec838f4117
Reviewed-on: https://go-review.googlesource.com/68830
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-10 19:43:38 +00:00
Cherry Zhang
6f3e5e637c cmd/compile: intrinsify runtime.getcallersp
Add a compiler intrinsic for getcallersp. So we are able to get
rid of the argument (not done in this CL).

Change-Id: Ic38fda1c694f918328659ab44654198fb116668d
Reviewed-on: https://go-review.googlesource.com/69350
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
2017-10-10 15:15:21 +00:00
David Chase
ca360c3992 cmd/compile: better XPos for rematerialized values and JMPs
This attempts to choose better values for values that are
rematerialized (uses the XPos of the consumer, not the
original) and for unconditional branches (uses the last
assigned XPos in the block).

The JMP branches seem to sometimes end up with a PC in the
destination block, I think because of register movement
or rematerialization that gets placed in predecessor blocks.
This may be acceptable because (eyeball-empirically) that is
often the line number of the target block, so the line number
flow is correct.

Added proper test, that checks both -N -l and regular compilation.
The test is also capable (for gdb, delve soon) of tracking
variable printing based on comments in the source code.

There's substantial room for improvement in debugger behavior.

Updates #21098.

Change-Id: I13abd48a39141583b85576a015f561065819afd0
Reviewed-on: https://go-review.googlesource.com/50610
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-07 22:12:36 +00:00
Matthew Dempsky
31a3b719a0 cmd/compile: cleanup genwrapper slightly
ORETJMP doesn't need an ONAME if we just set the target method on Sym
instead of Left. Conveniently, this is where fmt.go was looking for it
anyway.

Change the iface parameter and global compiling_wrappers to bool.

Passes toolstash-check.

Change-Id: I5333f8bcb4e06bf8161808041125eb95c439aafe
Reviewed-on: https://go-review.googlesource.com/68252
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-10-05 22:37:16 +00:00
Keith Randall
41eabc0fc7 cmd/compile: fix merge rules for panic calls
Use entire inlining call stack to decide whether two panic calls
can be merged. We used to merge panic calls when only the leaf
line numbers matched, but that leads to places higher up the call
stack being merged incorrectly.

Fixes #22083

Change-Id: Ia41400a80de4b6ecf3e5089abce0c42b65e9b38a
Reviewed-on: https://go-review.googlesource.com/67632
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-10-03 09:27:37 +00:00
Heschi Kreinick
6bbe1bc940 cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.

Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.

Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.

There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.

Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-28 20:30:12 +00:00
David Chase
6cac100eef cmd/compile: add intrinsic for reading caller's pc
First step towards removing the mandatory argument for
getcallerpc, which solves certain problems for the runtime.
This might also slightly improve performance.

Intrinsic enabled on 386, amd64, amd64p32,
runtime asm implementation removed on those architectures.

Now-superfluous argument remains in getcallerpc signature
(for a future CL; non-386/amd64 asm funcs ignore it).

Added getcallerpc to the "not a real function" test
in dcl.go, that story is a little odd with respect to
unexported functions but that is not this CL.

Fixes #17327.

Change-Id: I5df1ad91f27ee9ac1f0dd88fa48f1329d6306c3e
Reviewed-on: https://go-review.googlesource.com/31851
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-09-22 18:37:03 +00:00
Michael Munday
7582494e06 cmd/compile: add s390x intrinsics for Ceil, Floor, Round and Trunc
Ceil, Floor and Trunc are pre-existing intrinsics. Round is a new
function and has been added as an intrinsic in this CL. All of the
functions can be implemented as a single 'LOAD FP INTEGER'
instruction, FIDBR, on s390x.

name   old time/op  new time/op  delta
Ceil   2.34ns ± 0%  0.85ns ± 0%  -63.74%  (p=0.000 n=5+4)
Floor  2.33ns ± 0%  0.85ns ± 1%  -63.35%  (p=0.008 n=5+5)
Round  4.23ns ± 0%  0.85ns ± 0%  -79.89%  (p=0.000 n=5+4)
Trunc  2.35ns ± 0%  0.85ns ± 0%  -63.83%  (p=0.029 n=4+4)

Change-Id: Idee7ba24a2899d12bf9afee4eedd6b4aaad3c510
Reviewed-on: https://go-review.googlesource.com/63890
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-09-20 10:01:35 +00:00
Keith Randall
1787ced894 cmd/compile: remove Symbol wrappers from Aux fields
We used to have {Arg,Auto,Extern}Symbol structs with which we wrapped
a *gc.Node or *obj.LSym before storing them in the Aux field
of an ssa.Value.  This let the SSA part of the compiler distinguish
between autos and args, for example.  We no longer need the wrappers
as we can query the underlying objects directly.

There was also some sloppy usage, where VarDef had a *gc.Node
directly in its Aux field, whereas the use of that variable had
that *gc.Node wrapped in an AutoSymbol. Thus the Aux fields didn't
match (using ==) when they probably should.
This sloppy usage cleanup is the only thing in the CL that changes the
generated code - we can get rid of some more unused auto variables if
the matching happens reliably.

Removing this wrapper also lets us get rid of the varsyms cache
(which was used to prevent wrapping the same *gc.Node twice).

Change-Id: I0dedf8f82f84bfee413d310342b777316bd1d478
Reviewed-on: https://go-review.googlesource.com/64452
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-19 22:03:10 +00:00
Daniel Martí
71c9454f99 cmd/compile: remove some redundant types in decls
As per golint's suggestions.

Change-Id: Ie0c6ad9aa5dc69966a279562a341c7b095c47ede
Reviewed-on: https://go-review.googlesource.com/64192
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-17 09:51:38 +00:00
Lynn Boger
fa3fe2e3c6 cmd/compile, math/bits: add rotate rules to PPC64.rules
This adds rules to match the code in math/bits RotateLeft,
RotateLeft32, and RotateLef64 to allow them to be inlined.

The rules are complicated because the code in these function
use different types, and the non-const version of these
shifts generate Mask and Carry instructions that become
subexpressions during the match process.

Also adds a testcase to asm_test.go.

Improvement in math/bits:

BenchmarkRotateLeft-16       1.57     1.32      -15.92%
BenchmarkRotateLeft32-16     1.60     1.37      -14.37%
BenchmarkRotateLeft64-16     1.57     1.32      -15.92%

Updates #21390

Change-Id: Ib6f17669ecc9cab54f18d690be27e2225ca654a4
Reviewed-on: https://go-review.googlesource.com/59932
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-09-11 20:44:22 +00:00
Daniel Martí
99da8730b0 all: remove some double spaces from comments
Went mainly for the ones that make no sense, such as the ones
mid-sentence or after commas.

Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0
Reviewed-on: https://go-review.googlesource.com/57293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-26 15:09:09 +00:00
Michael Munday
744ebfde04 cmd/compile: eliminate stores to unread auto variables
This is a crude compiler pass to eliminate stores to auto variables
that are only ever written to.

Eliminates an unnecessary store to x from the following code:

func f() int {
	var x := 1
	return *(&x)
}

Fixes #19765.

Change-Id: If2c63a8ae67b8c590b6e0cc98a9610939a3eeffa
Reviewed-on: https://go-review.googlesource.com/38746
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-24 16:53:56 +00:00
Keith Randall
bf4d8d3d05 cmd/compile: rename SSA Register.Name to Register.String
Just to get rid of lots of .Name() stutter in printf calls.

Change-Id: I86cf00b3f7b2172387a1c6a7f189c1897fab6300
Reviewed-on: https://go-review.googlesource.com/56630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-08-17 21:53:08 +00:00
Daniel Martí
3366f51544 cmd/compile: tweaks to unindent some code
Prioritized the chunks of code with 8 or more levels of indentation.
Basically early breaks/returns and joining nested ifs.

Change-Id: I6817df1303226acf2eb904a29f2db720e4f7427a
Reviewed-on: https://go-review.googlesource.com/55630
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-08-17 07:57:19 +00:00
Keith Randall
04d6f982ae runtime: remove link field from itab
We don't use it any more, remove it.

Change-Id: I76ce1a4c2e7048fdd13a37d3718b5abf39ed9d26
Reviewed-on: https://go-review.googlesource.com/44474
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-08-15 01:52:35 +00:00
Cholerae Hu
57bf6aca71 runtime, cmd/compile: add intrinsic getclosureptr
Intrinsic enabled on all architectures,
runtime asm implementation removed on all architectures.

Fixes #21258

Change-Id: I2cb86d460b497c2f287a5b3df5c37fdb231c23a7
Reviewed-on: https://go-review.googlesource.com/53411
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2017-08-11 18:11:22 +00:00
Gerrit Code Review
385cd6681b Merge "Merge remote-tracking branch 'origin/dev.debug' into master" 2017-08-11 17:47:15 +00:00
Lynn Boger
0f19e24da7 cmd/compile: intrinsics for trunc, floor, ceil on ppc64x
This implements trunc, floor, and ceil in the math package
as intrinsics on ppc64x.  Significant improvement mainly due
to avoiding call overhead of args and return value.

BenchmarkCeil-16                    5.95          0.69          -88.40%
BenchmarkFloor-16                   5.95          0.69          -88.40%
BenchmarkTrunc-16                   5.82          0.69          -88.14%

Updates #21390

Change-Id: I951e182694f6e0c431da79c577272b81fb0ebad0
Reviewed-on: https://go-review.googlesource.com/54654
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
2017-08-11 16:35:49 +00:00
Austin Clements
6f6a9398e2 Merge remote-tracking branch 'origin/dev.debug' into master
Change-Id: I85df2745af666b533f4f6f1d06f7c8e137590b5b
2017-08-11 12:17:43 -04:00
Josh Bleecher Snyder
3b87defe4e cmd/compile: unexport gc.Sysfunc
Updates #21352

Change-Id: If21342f30be32e25840b4072b932a6d4257b420d
Reviewed-on: https://go-review.googlesource.com/54091
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Avelino <t@avelino.xxx>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-08-11 00:27:35 +00:00
Josh Bleecher Snyder
2fe53d8d55 cmd/compile: remove gc.Sysfunc calls from 387 backend
gc.Sysfunc must not be called concurrently.
We set up runtime routines used by the backend
prior to doing any backend compilation.
I missed the 387 ones; fix that.

Sysfunc should have been unexported during 1.9.
I will rectify that in a subsequent CL.

Fixes #21352

Change-Id: I8386eaa1e05879c25c672b9c9fc693c938e9aeb6
Reviewed-on: https://go-review.googlesource.com/54090
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Avelino <t@avelino.xxx>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-09 00:19:58 +00:00
Heschi Kreinick
4c54a047c6 [dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.

After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.

There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:

  type myint int
  var a int
  b := myint(int)
and
  b := (*uint64)(unsafe.Pointer(a))

don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.

Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.

A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.

TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.

go build -toolexec 'toolstash -cmp' -a std succeeds.

With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after:  06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean  /tmp/220352263 /tmp/621364410
completed   15 of   15, estimated time remaining 0s (eta 3:52PM)
name        old time/op       new time/op       delta
Template          199ms ± 3%        198ms ± 2%     ~     (p=0.400 n=15+14)
Unicode          96.6ms ± 5%       96.4ms ± 5%     ~     (p=0.838 n=15+15)
GoTypes           653ms ± 2%        647ms ± 2%     ~     (p=0.102 n=15+14)
Flate             133ms ± 6%        129ms ± 3%   -2.62%  (p=0.041 n=15+15)
GoParser          164ms ± 5%        159ms ± 3%   -3.05%  (p=0.000 n=15+15)
Reflect           428ms ± 4%        422ms ± 3%     ~     (p=0.156 n=15+13)
Tar               123ms ±10%        124ms ± 8%     ~     (p=0.461 n=15+15)
XML               228ms ± 3%        224ms ± 3%   -1.57%  (p=0.045 n=15+15)
[Geo mean]        206ms             377ms       +82.86%

name        old user-time/op  new user-time/op  delta
Template          292ms ±10%        301ms ±12%     ~     (p=0.189 n=15+15)
Unicode           166ms ±37%        158ms ±14%     ~     (p=0.418 n=15+14)
GoTypes           962ms ± 6%        963ms ± 7%     ~     (p=0.976 n=15+15)
Flate             207ms ±19%        200ms ±14%     ~     (p=0.345 n=14+15)
GoParser          246ms ±22%        240ms ±15%     ~     (p=0.587 n=15+15)
Reflect           611ms ±13%        587ms ±14%     ~     (p=0.085 n=15+13)
Tar               211ms ±12%        217ms ±14%     ~     (p=0.355 n=14+15)
XML               335ms ±15%        320ms ±18%     ~     (p=0.169 n=15+15)
[Geo mean]        317ms             583ms       +83.72%

name        old alloc/op      new alloc/op      delta
Template         40.2MB ± 0%       40.2MB ± 0%   -0.15%  (p=0.000 n=14+15)
Unicode          29.2MB ± 0%       29.3MB ± 0%     ~     (p=0.624 n=15+15)
GoTypes           114MB ± 0%        114MB ± 0%   -0.15%  (p=0.000 n=15+14)
Flate            25.7MB ± 0%       25.6MB ± 0%   -0.18%  (p=0.000 n=13+15)
GoParser         32.2MB ± 0%       32.2MB ± 0%   -0.14%  (p=0.003 n=15+15)
Reflect          77.8MB ± 0%       77.9MB ± 0%     ~     (p=0.061 n=15+15)
Tar              27.1MB ± 0%       27.0MB ± 0%   -0.11%  (p=0.029 n=15+15)
XML              42.7MB ± 0%       42.5MB ± 0%   -0.29%  (p=0.000 n=15+15)
[Geo mean]       42.1MB            75.0MB       +78.05%

name        old allocs/op     new allocs/op     delta
Template           402k ± 1%         398k ± 0%   -0.91%  (p=0.000 n=15+15)
Unicode            344k ± 1%         344k ± 0%     ~     (p=0.715 n=15+14)
GoTypes           1.18M ± 0%        1.17M ± 0%   -0.91%  (p=0.000 n=15+14)
Flate              243k ± 0%         240k ± 1%   -1.05%  (p=0.000 n=13+15)
GoParser           327k ± 1%         324k ± 1%   -0.96%  (p=0.000 n=15+15)
Reflect            984k ± 1%         982k ± 0%     ~     (p=0.050 n=15+15)
Tar                261k ± 1%         259k ± 1%   -0.77%  (p=0.000 n=15+15)
XML                411k ± 0%         404k ± 1%   -1.55%  (p=0.000 n=15+15)
[Geo mean]         439k              755k       +72.01%

name        old text-bytes    new text-bytes    delta
HelloSize         694kB ± 0%        694kB ± 0%   -0.00%  (p=0.000 n=15+15)

name        old data-bytes    new data-bytes    delta
HelloSize        5.55kB ± 0%       5.55kB ± 0%     ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize         133kB ± 0%        133kB ± 0%     ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.04MB ± 0%       1.04MB ± 0%     ~     (all equal)

Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-27 20:19:44 +00:00
Heschi Kreinick
2d57d94ac3 [dev.debug] cmd/compile: track variable decomposition in LocalSlot
When the compiler decomposes a user variable, track its origin so that
it can be recomposed during DWARF generation.

Change-Id: Ia71c7f8e7f4d65f0652f1c97b0dda5d9cad41936
Reviewed-on: https://go-review.googlesource.com/50878
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-26 18:39:39 +00:00
Heschi Kreinick
c1c08a13e7 [dev.debug] cmd/compile: rename some locals in genssa
When we start tracking the mapping from Value to Prog, valueProgs will
be confusing. Disambiguate.

Change-Id: Ib3b302fedb7eb0ff1bde789d70a11656d82f0897
Reviewed-on: https://go-review.googlesource.com/50876
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-25 19:33:26 +00:00
Josh Bleecher Snyder
7cd6310014 cmd/compile: don't generate liveness maps when the stack is too large
Fixes #20529

Change-Id: I3cb0c037b1737fbc3fa3b1b61ed8a42cfaf8e10d
Reviewed-on: https://go-review.googlesource.com/44344
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-30 22:39:29 +00:00
David Chase
27da3ba5af cmd/compile: don't attach lines to SB, SP, similar constants
Attaching positions to SB, SP, initial mem can result in
less-good line-numbering when compiled for debugging.
This "fix" also removes source position from a zero-valued
struct (but not from its fields) and from a zero-length
array constant.

This may be a general problem for constants in entry blocks.

Fixes #20367.

Change-Id: I7e9df3341be2e2f60f127d35bb31e43cdcfce9a1
Reviewed-on: https://go-review.googlesource.com/43531
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-15 20:39:40 +00:00
Josh Bleecher Snyder
e5bb5e397d cmd/compile: restore panic deduplication
The switch to detailed position information broke
the removal of duplicate panics on the same line.
Restore it.

Neutral compiler performance impact:

name        old alloc/op      new alloc/op      delta
Template         38.8MB ± 0%       38.8MB ± 0%    ~     (p=0.690 n=5+5)
Unicode          28.7MB ± 0%       28.7MB ± 0%  +0.13%  (p=0.032 n=5+5)
GoTypes           109MB ± 0%        109MB ± 0%    ~     (p=1.000 n=5+5)
Compiler          457MB ± 0%        457MB ± 0%    ~     (p=0.151 n=5+5)
SSA              1.09GB ± 0%       1.10GB ± 0%  +0.17%  (p=0.008 n=5+5)
Flate            24.6MB ± 0%       24.5MB ± 0%  -0.35%  (p=0.008 n=5+5)
GoParser         30.9MB ± 0%       31.0MB ± 0%    ~     (p=0.421 n=5+5)
Reflect          73.4MB ± 0%       73.4MB ± 0%    ~     (p=0.056 n=5+5)
Tar              25.6MB ± 0%       25.5MB ± 0%  -0.61%  (p=0.008 n=5+5)
XML              40.9MB ± 0%       40.9MB ± 0%    ~     (p=0.841 n=5+5)
[Geo mean]       71.6MB            71.6MB       -0.07%

name        old allocs/op     new allocs/op     delta
Template           394k ± 0%         395k ± 1%    ~     (p=0.151 n=5+5)
Unicode            343k ± 0%         344k ± 0%  +0.38%  (p=0.032 n=5+5)
GoTypes           1.16M ± 0%        1.16M ± 0%    ~     (p=1.000 n=5+5)
Compiler          4.41M ± 0%        4.42M ± 0%    ~     (p=0.151 n=5+5)
SSA               9.79M ± 0%        9.79M ± 0%    ~     (p=0.690 n=5+5)
Flate              238k ± 1%         238k ± 0%    ~     (p=0.151 n=5+5)
GoParser           321k ± 0%         321k ± 1%    ~     (p=0.548 n=5+5)
Reflect            958k ± 0%         957k ± 0%    ~     (p=0.841 n=5+5)
Tar                252k ± 0%         252k ± 1%    ~     (p=0.151 n=5+5)
XML                401k ± 0%         400k ± 0%    ~     (p=1.000 n=5+5)
[Geo mean]         741k              742k       +0.08%


Reduces object files a little bit:

name        old object-bytes  new object-bytes  delta
Template           386k ± 0%         386k ± 0%  -0.04%  (p=0.008 n=5+5)
Unicode            202k ± 0%         202k ± 0%    ~     (all equal)
GoTypes           1.16M ± 0%        1.16M ± 0%  -0.04%  (p=0.008 n=5+5)
Compiler          3.91M ± 0%        3.91M ± 0%  -0.08%  (p=0.008 n=5+5)
SSA               7.91M ± 0%        7.91M ± 0%  -0.04%  (p=0.008 n=5+5)
Flate              228k ± 0%         227k ± 0%  -0.28%  (p=0.008 n=5+5)
GoParser           283k ± 0%         283k ± 0%  -0.01%  (p=0.008 n=5+5)
Reflect            952k ± 0%         951k ± 0%  -0.03%  (p=0.008 n=5+5)
Tar                188k ± 0%         187k ± 0%  -0.09%  (p=0.008 n=5+5)
XML                406k ± 0%         406k ± 0%  -0.04%  (p=0.008 n=5+5)
[Geo mean]         648k              648k       -0.06%


This was discovered in the context for the Fannkuch benchmark.
It shrinks the number of panicindex calls in that function
from 13 back to 9, their 1.8.1 level.

It shrinks the function text a bit, from 829 to 801 bytes.
It slows down execution a little, presumably due to alignment (?).

name          old time/op  new time/op  delta
Fannkuch11-8   2.68s ± 2%   2.74s ± 1%  +2.09%  (p=0.000 n=19+20)

After this CL, 1.8.1 and tip are identical:

name          old time/op  new time/op  delta
Fannkuch11-8   2.74s ± 2%   2.74s ± 1%   ~     (p=0.301 n=20+20)

Fixes #20332

Change-Id: I2aeacc3e8cf2ac1ff10f36c572a27856f4f8f7c9
Reviewed-on: https://go-review.googlesource.com/43291
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-11 19:34:11 +00:00
David Chase
00263a8968 cmd/compile: reduce debugger-worsening line number churn
Reuse block head or preceding instruction's line number for
register allocator's spill, fill, copy, rematerialization
instructionsl; and also for phi, and for no-src-pos
instructions.  Assembler creates same line number tables
for copy-predecessor-line and for no-src-pos,
but copy-predecessor produces better-looking assembly
language output with -S and with GOSSAFUNC, and does not
require changes to tests of existing assembly language.

Split "copyInto" into two cases, one for register allocation,
one for otherwise.  This caused the test score line change
count to increase by one, which may reflect legitimately
useful information preserved.  Without any special treatment
for copyInto, the change count increases by 21 more, from
51 to 72 (i.e., quite a lot).

There is a test; using two naive "scores" for line number
churn, the old numbering is 2x or 4x worse.

Fixes #18902.

Change-Id: I0a0a69659d30ee4e5d10116a0dd2b8c5df8457b1
Reviewed-on: https://go-review.googlesource.com/36207
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-10 17:16:44 +00:00
Lynn Boger
8304d10763 cmd/compile: ppc64x intrinsics for math/bits
This adds math/bits intrinsics for OnesCount, Len, TrailingZeros on
ppc64x.

benchmark                       old ns/op     new ns/op     delta
BenchmarkLeadingZeros-16        4.26          1.71          -59.86%
BenchmarkLeadingZeros16-16      3.04          1.83          -39.80%
BenchmarkLeadingZeros32-16      3.31          1.82          -45.02%
BenchmarkLeadingZeros64-16      3.69          1.71          -53.66%
BenchmarkTrailingZeros-16       2.55          1.62          -36.47%
BenchmarkTrailingZeros32-16     2.55          1.77          -30.59%
BenchmarkTrailingZeros64-16     2.78          1.62          -41.73%
BenchmarkOnesCount-16           3.19          0.93          -70.85%
BenchmarkOnesCount32-16         2.55          1.18          -53.73%
BenchmarkOnesCount64-16         3.22          0.93          -71.12%

Update #18616

I also made a change to bits_test.go because when debugging some failures
the output was not quite providing the right argument information.

Change-Id: Ia58d31d1777cf4582a4505f85b11a1202ca07d3e
Reviewed-on: https://go-review.googlesource.com/41630
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-10 12:10:56 +00:00
Josh Bleecher Snyder
46b88c9fbc cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.

In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.

Though this is a big CL, most of the changes are
mechanical and uninteresting.

Interesting bits:

* Add new singleton globals to package types for the special
  SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
  and TTUPLE, for SSA tuple types.
  ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
  to package types.
* We had picked the name "types" in our rules for the handy
  list of types provided by ssa.Config. That conflicted with
  the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
  types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
  and probably also some other duplicated Type methods
  designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
  and they were not particularly careful about types in general.
  Of necessity, this CL switches them to use *types.Type;
  it does not make them more type-accurate.
  Unfortunately, using types.Type means initializing a bit
  of the types universe.
  This is prime for refactoring and improvement.

This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.

name        old alloc/op      new alloc/op      delta
Template         37.9MB ± 0%       37.7MB ± 0%  -0.57%  (p=0.000 n=10+8)
Unicode          28.9MB ± 0%       28.7MB ± 0%  -0.52%  (p=0.000 n=10+10)
GoTypes           110MB ± 0%        109MB ± 0%  -0.88%  (p=0.000 n=10+10)
Flate            24.7MB ± 0%       24.6MB ± 0%  -0.66%  (p=0.000 n=10+10)
GoParser         31.1MB ± 0%       30.9MB ± 0%  -0.61%  (p=0.000 n=10+9)
Reflect          73.9MB ± 0%       73.4MB ± 0%  -0.62%  (p=0.000 n=10+8)
Tar              25.8MB ± 0%       25.6MB ± 0%  -0.77%  (p=0.000 n=9+10)
XML              41.2MB ± 0%       40.9MB ± 0%  -0.80%  (p=0.000 n=10+10)
[Geo mean]       40.5MB            40.3MB       -0.68%

name        old allocs/op     new allocs/op     delta
Template           385k ± 0%         386k ± 0%    ~     (p=0.356 n=10+9)
Unicode            343k ± 1%         344k ± 0%    ~     (p=0.481 n=10+10)
GoTypes           1.16M ± 0%        1.16M ± 0%  -0.16%  (p=0.004 n=10+10)
Flate              238k ± 1%         238k ± 1%    ~     (p=0.853 n=10+10)
GoParser           320k ± 0%         320k ± 0%    ~     (p=0.720 n=10+9)
Reflect            957k ± 0%         957k ± 0%    ~     (p=0.460 n=10+8)
Tar                252k ± 0%         252k ± 0%    ~     (p=0.133 n=9+10)
XML                400k ± 0%         400k ± 0%    ~     (p=0.796 n=10+10)
[Geo mean]         428k              428k       -0.01%


Removing all the interface calls helps non-trivially with CPU, though.

name        old time/op       new time/op       delta
Template          178ms ± 4%        173ms ± 3%  -2.90%  (p=0.000 n=94+96)
Unicode          85.0ms ± 4%       83.9ms ± 4%  -1.23%  (p=0.000 n=96+96)
GoTypes           543ms ± 3%        528ms ± 3%  -2.73%  (p=0.000 n=98+96)
Flate             116ms ± 3%        113ms ± 4%  -2.34%  (p=0.000 n=96+99)
GoParser          144ms ± 3%        140ms ± 4%  -2.80%  (p=0.000 n=99+97)
Reflect           344ms ± 3%        334ms ± 4%  -3.02%  (p=0.000 n=100+99)
Tar               106ms ± 5%        103ms ± 4%  -3.30%  (p=0.000 n=98+94)
XML               198ms ± 5%        192ms ± 4%  -2.88%  (p=0.000 n=92+95)
[Geo mean]        178ms             173ms       -2.65%

name        old user-time/op  new user-time/op  delta
Template          229ms ± 5%        224ms ± 5%  -2.36%  (p=0.000 n=95+99)
Unicode           107ms ± 6%        106ms ± 5%  -1.13%  (p=0.001 n=93+95)
GoTypes           696ms ± 4%        679ms ± 4%  -2.45%  (p=0.000 n=97+99)
Flate             137ms ± 4%        134ms ± 5%  -2.66%  (p=0.000 n=99+96)
GoParser          176ms ± 5%        172ms ± 8%  -2.27%  (p=0.000 n=98+100)
Reflect           430ms ± 6%        411ms ± 5%  -4.46%  (p=0.000 n=100+92)
Tar               128ms ±13%        123ms ±13%  -4.21%  (p=0.000 n=100+100)
XML               239ms ± 6%        233ms ± 6%  -2.50%  (p=0.000 n=95+97)
[Geo mean]        220ms             213ms       -2.76%


Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-09 23:01:51 +00:00