779 Commits

Author SHA1 Message Date
Austin Clements
4a4e05b0b1 cmd/compile,runtime/internal/atomic: add Load8
Change-Id: Id52a5730cf9207ee7ccebac4ef12791dc5720e7c
Reviewed-on: https://go-review.googlesource.com/c/go/+/172283
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-05-03 19:25:37 +00:00
Michael Munday
2c1b5130aa cmd/compile: add math/bits.{Add,Sub}64 intrinsics on s390x
This CL adds intrinsics for the 64-bit addition and subtraction
functions in math/bits. These intrinsics use the condition code
to propagate the carry or borrow bit.

To make the carry chains more efficient I've removed the
'clobberFlags' property from most of the load and store
operations. Originally these ops did clobber flags when using
offsets that didn't fit in a signed 20-bit integer, however
that is no longer true.

As with other platforms the intrinsics are faster when executed
in a chain rather than a loop because currently we need to spill
and restore the carry bit between each loop iteration. We may
be able to reduce the need to do this on s390x (e.g. by using
compare-and-branch instructions that do not clobber flags) in the
future.

name           old time/op  new time/op  delta
Add64          1.21ns ± 2%  2.03ns ± 2%  +67.18%  (p=0.000 n=7+10)
Add64multiple  2.98ns ± 3%  1.03ns ± 0%  -65.39%  (p=0.000 n=10+9)
Sub64          1.23ns ± 4%  2.03ns ± 1%  +64.85%  (p=0.000 n=10+10)
Sub64multiple  3.73ns ± 4%  1.04ns ± 1%  -72.28%  (p=0.000 n=10+8)

Change-Id: I913bbd5e19e6b95bef52f5bc4f14d6fe40119083
Reviewed-on: https://go-review.googlesource.com/c/go/+/174303
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-05-03 10:41:15 +00:00
Keith Randall
e7d08b6fe6 cmd/compile: fix line numbers for index panics
In the statement x = a[i], the index panic should appear to come from
the line number of the '['. Previous to this CL we sometimes used the
line number of the '=' instead.

Fixes #29504

Change-Id: Ie718fd303c1ac2aee33e88d52c9ba9bcf220dea1
Reviewed-on: https://go-review.googlesource.com/c/go/+/174617
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-04-30 21:30:30 +00:00
Carlos Eduardo Seo
50ad09418e cmd/compile: intrinsify math/bits.Add64 for ppc64x
This change creates an intrinsic for Add64 for ppc64x and adds a
testcase for it.

name               old time/op  new time/op  delta
Add64-160          1.90ns ±40%  2.29ns ± 0%     ~     (p=0.119 n=5+5)
Add64multiple-160  6.69ns ± 2%  2.45ns ± 4%  -63.47%  (p=0.016 n=4+5)

Change-Id: I9abe6fb023fdf62eea3c9b46a1820f60bb0a7f97
Reviewed-on: https://go-review.googlesource.com/c/go/+/173758
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2019-04-28 23:51:04 +00:00
Keith Randall
fd788a86b6 cmd/compile: always mark atColumn1 results as statements
In 31618, we end up comparing the is-stmt-ness of positions
to repurpose real instructions as inline marks. If the is-stmt-ness
doesn't match, we end up not being able to remove the inline mark.

Always use statement-full positions to do the matching, so we
always find a match if there is one.

Also always use positions that are statements for inline marks.

Fixes #31618

Change-Id: Idaf39bdb32fa45238d5cd52973cadf4504f947d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/173324
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-04-23 17:39:11 +00:00
erifan01
f8f265b9cf cmd/compile: intrinsify math/bits.Sub64 for arm64
This CL instrinsifies Sub64 with arm64 instruction sequence NEGS, SBCS,
NGC and NEG, and optimzes the case of borrowing chains.

Benchmarks:
name              old time/op       new time/op       delta
Sub-64            2.500000ns +- 0%  2.048000ns +- 1%  -18.08%  (p=0.000 n=10+10)
Sub32-64          2.500000ns +- 0%  2.500000ns +- 0%     ~     (all equal)
Sub64-64          2.500000ns +- 0%  2.080000ns +- 0%  -16.80%  (p=0.000 n=10+7)
Sub64multiple-64  7.090000ns +- 0%  2.090000ns +- 0%  -70.52%  (p=0.000 n=10+10)

Change-Id: I3d2664e009a9635e13b55d2c4567c7b34c2c0655
Reviewed-on: https://go-review.googlesource.com/c/go/+/159018
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-04-22 14:40:20 +00:00
Josh Bleecher Snyder
5781df421e all: s/cancelation/cancellation/
Though there is variation in the spelling of canceled,
cancellation is always spelled with a double l.

Reference: https://www.grammarly.com/blog/canceled-vs-cancelled/

Change-Id: I240f1a297776c8e27e74f3eca566d2bc4c856f2f
Reviewed-on: https://go-review.googlesource.com/c/go/+/170060
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-16 20:27:15 +00:00
Josh Bleecher Snyder
336f951b07 cmd/compile: add ORESULT, remove OINDREGSP
This change is mostly cosmetic.

OINDREGSP was used only for reading the results of a function call.
In recognition of that fact, rename it to ORESULT.
Along the way, trim down our handling of it to the bare minimum,
and rely on the increased clarity of ORESULT to inline nodarg.

Passes toolstash-check.

Change-Id: I25b177df4ea54a8e94b1698d044c297b7e453c64
Reviewed-on: https://go-review.googlesource.com/c/go/+/170705
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2019-04-08 21:33:15 +00:00
Keith Randall
ad6c691542 cmd/compile: remove AUNDEF opcode
This opcode was only used to mark unreachable code for plive to use.
plive now uses the SSA representation, so it knows locations are
unreachable because they are ends of Exit blocks. It doesn't need
these opcodes any more.

These opcodes actually used space in the binary, 2 bytes per undef
on x86 and more for other archs.

Makes the amd64 go binary 0.2% smaller.

Change-Id: I64c84c35db7c7949617a3a5830f09c8e5fcd2620
Reviewed-on: https://go-review.googlesource.com/c/go/+/171058
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-07 01:15:28 +00:00
Michael Munday
726a9398f7 cmd/compile/internal/gc: minor cleanup of slicing
Tidy the code up a little bit to move variable definitions closer
to uses, prefer early return to else branches and some other minor
tweaks.

I'd like to make some more changes to this code in the near future
and this CL should make those changes cleaner.

Change-Id: Ie7d7f2e4bb1e670347941e255c9cdc1703282db5
Reviewed-on: https://go-review.googlesource.com/c/go/+/170120
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2019-04-01 21:16:31 +00:00
Josh Bleecher Snyder
5ee1d5d39f cmd/compile: minor cleanup
Use constants that are easier to read.

Change-Id: I11fd6363b3bd283a4cc7c9908c2327123c64dcf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/169723
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-28 23:16:26 +00:00
David Chase
591193b01f cmd/compile: enhance debug_test for infinite loops
ssa/debug_test.go already had a step limit; this exposes
it to individual tests, and it is then set low for the
infinite loop tests.

That however is not enough; in an infinite loop debuggers
see an unchanging line number, and therefore keep trying
until they see a different one.  To do this, the concept
of a "bogus" line number is introduced, and on output
single-instruction infinite loops are detected and a
hardware nop with correct line number is inserted into
the loop; the branch itself receives a bogus line number.

This breaks up the endless stream of same line number and
causes both gdb and delve to not hang; Delve complains
about the incorrect line number while gdb does
a sort of odd step-to-nowhere that then steps back
to the loop.  Since repeats are suppressed in the reference
file, a single line is shown there.

(The wrong line number mentioned in previous message
was an artifact of debug_test.go, not Delve, and is now
fixed.)

The bogus line number exposed in Delve is less than
wonderful, but compared to hanging, it is better.

Fixes #30664.

Change-Id: I30c927cf8869a84c6c9b84033ee44d7044aab552
Reviewed-on: https://go-review.googlesource.com/c/go/+/168477
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-27 21:04:34 +00:00
Keith Randall
2034fbab5b cmd/compile: use existing instructions instead of nops for inline marks
Instead of always inserting a nop to use as the target of an inline
mark, see if we can instead find an instruction we're issuing anyway
with the correct line number, and use that instruction. That way, we
don't need to issue a nop.

Makes cmd/go 0.3% smaller.

Update #29571

Change-Id: If6cfc93ab3352ec2c6e0878f8074a3bf0786b2f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/158021
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2019-03-25 16:49:29 +00:00
Josh Bleecher Snyder
23b476a3c8 cmd/compile: port callnew to ssa conversion
This is part of a general effort to shrink walk.
In an ideal world, we'd have an SSA op for allocation,
but we don't yet have a good mechanism for introducing
function calling during SSA compilation.
In the meantime, SSA conversion is a better place for it.

This also makes it easier to introduce new optimizations;
instead of doing the typecheck walk dance,
we can simply write what we want the backend to do.

I introduced a new opcode in this change because:

(a) It avoids a class of bugs involving correctly detecting
    whether this ONEW is a "before walk" ONEW or an "after walk" ONEW.
    It also means that using ONEW or ONEWOBJ in the wrong context
    will generally result in a faster failure.
(b) Opcodes are cheap.
(c) It provides a better place to put documentation.

This change also is also marginally more performant:

name        old alloc/op      new alloc/op      delta
Template         39.1MB ± 0%       39.0MB ± 0%  -0.14%  (p=0.008 n=5+5)
Unicode          28.4MB ± 0%       28.4MB ± 0%    ~     (p=0.421 n=5+5)
GoTypes           132MB ± 0%        132MB ± 0%  -0.23%  (p=0.008 n=5+5)
Compiler          608MB ± 0%        607MB ± 0%  -0.25%  (p=0.008 n=5+5)
SSA              2.04GB ± 0%       2.04GB ± 0%  -0.01%  (p=0.008 n=5+5)
Flate            24.4MB ± 0%       24.3MB ± 0%  -0.13%  (p=0.008 n=5+5)
GoParser         29.3MB ± 0%       29.1MB ± 0%  -0.54%  (p=0.008 n=5+5)
Reflect          84.8MB ± 0%       84.7MB ± 0%  -0.21%  (p=0.008 n=5+5)
Tar              36.7MB ± 0%       36.6MB ± 0%  -0.10%  (p=0.008 n=5+5)
XML              48.7MB ± 0%       48.6MB ± 0%  -0.24%  (p=0.008 n=5+5)
[Geo mean]       85.0MB            84.8MB       -0.19%

name        old allocs/op     new allocs/op     delta
Template           383k ± 0%         382k ± 0%  -0.26%  (p=0.008 n=5+5)
Unicode            341k ± 0%         341k ± 0%    ~     (p=0.579 n=5+5)
GoTypes           1.37M ± 0%        1.36M ± 0%  -0.39%  (p=0.008 n=5+5)
Compiler          5.59M ± 0%        5.56M ± 0%  -0.49%  (p=0.008 n=5+5)
SSA               16.9M ± 0%        16.9M ± 0%  -0.03%  (p=0.008 n=5+5)
Flate              238k ± 0%         238k ± 0%  -0.23%  (p=0.008 n=5+5)
GoParser           306k ± 0%         303k ± 0%  -0.93%  (p=0.008 n=5+5)
Reflect            990k ± 0%         987k ± 0%  -0.33%  (p=0.008 n=5+5)
Tar                356k ± 0%         355k ± 0%  -0.20%  (p=0.008 n=5+5)
XML                444k ± 0%         442k ± 0%  -0.45%  (p=0.008 n=5+5)
[Geo mean]         848k              845k       -0.33%

Change-Id: I2c36003a7cbf71b53857b7de734852b698f49310
Reviewed-on: https://go-review.googlesource.com/c/go/+/167957
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2019-03-20 19:38:50 +00:00
erifan01
5714c91b53 cmd/compile: intrinsify math/bits.Add64 for arm64
This CL instrinsifies Add64 with arm64 instruction sequence ADDS, ADCS
and ADC, and optimzes the case of carry chains.The CL also changes the
test code so that the intrinsic implementation can be tested.

Benchmarks:
name               old time/op       new time/op       delta
Add-224            2.500000ns +- 0%  2.090000ns +- 4%  -16.40%  (p=0.000 n=9+10)
Add32-224          2.500000ns +- 0%  2.500000ns +- 0%     ~     (all equal)
Add64-224          2.500000ns +- 0%  1.577778ns +- 2%  -36.89%  (p=0.000 n=10+9)
Add64multiple-224  6.000000ns +- 0%  2.000000ns +- 0%  -66.67%  (p=0.000 n=10+10)

Change-Id: I6ee91c9a85c16cc72ade5fd94868c579f16c7615
Reviewed-on: https://go-review.googlesource.com/c/go/+/159017
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-20 05:39:49 +00:00
Keith Randall
2c423f063b cmd/compile,runtime: provide index information on bounds check failure
A few examples (for accessing a slice of length 3):

   s[-1]    runtime error: index out of range [-1]
   s[3]     runtime error: index out of range [3] with length 3
   s[-1:0]  runtime error: slice bounds out of range [-1:]
   s[3:0]   runtime error: slice bounds out of range [3:0]
   s[3:-1]  runtime error: slice bounds out of range [:-1]
   s[3:4]   runtime error: slice bounds out of range [:4] with capacity 3
   s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3

Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.

An exhaustive set of examples is in issue30116[u].out in the CL.

The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.

Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.

Fixes #30116

Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-03-18 17:33:38 +00:00
Tobias Klauser
156c830bea cmd/compile: eliminate unnecessary type conversions in TrailingZeros(16|8) for arm
This follows CL 156999 which did the same for arm64.

name               old time/op  new time/op  delta
TrailingZeros-4    7.30ns ± 1%  7.30ns ± 0%     ~     (p=0.413 n=9+9)
TrailingZeros8-4   8.32ns ± 0%  7.17ns ± 0%  -13.77%  (p=0.000 n=10+9)
TrailingZeros16-4  8.30ns ± 0%  7.18ns ± 0%  -13.50%  (p=0.000 n=9+10)
TrailingZeros32-4  6.46ns ± 1%  6.47ns ± 1%     ~     (p=0.325 n=10+10)
TrailingZeros64-4  16.3ns ± 0%  16.2ns ± 0%   -0.61%  (p=0.000 n=7+10)

Change-Id: I7e9e1abf7e30d811aa474d272b2824ec7cbbaa98
Reviewed-on: https://go-review.googlesource.com/c/go/+/167797
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-15 18:37:22 +00:00
Richard Musiol
5ee1b84959 math, math/bits: add intrinsics for wasm
This commit adds compiler intrinsics for the packages math and
math/bits on the wasm architecture for better performance.

benchmark                        old ns/op     new ns/op     delta
BenchmarkCeil                    8.31          3.21          -61.37%
BenchmarkCopysign                5.24          3.88          -25.95%
BenchmarkAbs                     5.42          3.34          -38.38%
BenchmarkFloor                   8.29          3.18          -61.64%
BenchmarkRoundToEven             9.76          3.26          -66.60%
BenchmarkSqrtLatency             8.13          4.88          -39.98%
BenchmarkSqrtPrime               5246          3535          -32.62%
BenchmarkTrunc                   8.29          3.15          -62.00%
BenchmarkLeadingZeros            13.0          4.23          -67.46%
BenchmarkLeadingZeros8           4.65          4.42          -4.95%
BenchmarkLeadingZeros16          7.60          4.38          -42.37%
BenchmarkLeadingZeros32          10.7          4.48          -58.13%
BenchmarkLeadingZeros64          12.9          4.31          -66.59%
BenchmarkTrailingZeros           6.52          4.04          -38.04%
BenchmarkTrailingZeros8          4.57          4.14          -9.41%
BenchmarkTrailingZeros16         6.69          4.16          -37.82%
BenchmarkTrailingZeros32         6.97          4.23          -39.31%
BenchmarkTrailingZeros64         6.59          4.00          -39.30%
BenchmarkOnesCount               7.93          3.30          -58.39%
BenchmarkOnesCount8              3.56          3.19          -10.39%
BenchmarkOnesCount16             4.85          3.19          -34.23%
BenchmarkOnesCount32             7.27          3.19          -56.12%
BenchmarkOnesCount64             8.08          3.28          -59.41%
BenchmarkRotateLeft              4.88          3.80          -22.13%
BenchmarkRotateLeft64            5.03          3.63          -27.83%

Change-Id: Ic1e0c2984878be8defb6eb7eb6ee63765c793222
Reviewed-on: https://go-review.googlesource.com/c/go/+/165177
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-14 19:46:19 +00:00
Carlos Eduardo Seo
f2a18b1456 cmd/compile: make math/bits.RotateLeft{32,64} intrinsics on ppc64x
Extends CL 132435 to ppc64x. ppc64x has 32- and 64-bit variable
rotate left instructions.

name             old time/op  new time/op  delta
RotateLeft32-16  1.39ns ± 0%  1.37ns ± 0%  -1.44%  (p=0.008 n=5+5)
RotateLeft64-16  1.35ns ± 0%  1.32ns ± 0%  -2.22%  (p=0.008 n=5+5)

Updates #17566

Change-Id: I567f634ff90d0691db45df0a25c99fcdfe10ca00
Reviewed-on: https://go-review.googlesource.com/c/go/+/163760
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2019-03-14 14:13:34 +00:00
David Chase
82af9e6749 cmd/compile: move statement marks from jumps to targets
When a jump at the end of a block is about to be marked as
a statement, if the first real instruction in the target
block is also a statement for the same line, remove the
mark from the jump.

This is a first effort at a minimal-harm heuristic.
A better heuristic might skip over any "not-statement"
values preceding a definitely marked value.

Fixes #29443.

Change-Id: Ibd52783713b4936e0c2dfda8d708bf186f33b00a
Reviewed-on: https://go-review.googlesource.com/c/go/+/159977
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-13 14:40:36 +00:00
Keith Randall
83a33d3855 cmd/compile: reverse order of slice bounds checks
Turns out this makes the fix for 28797 unnecessary, because this order
ensures that the RHS of IsSliceInBounds ops are always nonnegative.

The real reason for this change is that it also makes dealing with
<0 values easier for reporting values in bounds check panics (issue #30116).

Makes cmd/go negligibly smaller.

Update #28797

Change-Id: I1f25ba6d2b3b3d4a72df3105828aa0a4b629ce85
Reviewed-on: https://go-review.googlesource.com/c/go/+/166377
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-09 00:52:45 +00:00
Peter Waller
7afd58d458 cmd/compile/internal/ssa: set OFOR bBody.Pos to AST Pos
Assign SSA OFOR's bBody.Pos to AST (*Node).Pos as it is created.

An empty for loop has no other information which may be used to give
correct position information in the resulting executable. Such a for
loop may compile to a single `JMP *self` and it is important that the
location of this is in the right place.

Fixes #30167.

Change-Id: Iec44f0281c462c33fac6b7b8ccfc2ef37434c247
Reviewed-on: https://go-review.googlesource.com/c/go/+/163019
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-03-07 22:27:52 +00:00
erifan01
4e2b0dda8c cmd/compile: eliminate unnecessary type conversions in TrailingZeros(16|8) for arm64
This CL eliminates unnecessary type conversion operations: OpZeroExt16to64 and OpZeroExt8to64.
If the input argrument is a nonzero value, then ORconst operation can also be eliminated.

Benchmarks:

name               old time/op  new time/op  delta
TrailingZeros-8    2.75ns ± 0%  2.75ns ± 0%     ~     (all equal)
TrailingZeros8-8   3.49ns ± 1%  2.93ns ± 0%  -16.00%  (p=0.000 n=10+10)
TrailingZeros16-8  3.49ns ± 1%  2.93ns ± 0%  -16.05%  (p=0.000 n=9+10)
TrailingZeros32-8  2.67ns ± 1%  2.68ns ± 1%     ~     (p=0.468 n=10+10)
TrailingZeros64-8  2.67ns ± 1%  2.65ns ± 0%   -0.62%  (p=0.022 n=10+9)

code:

func f16(x uint) { z = bits.TrailingZeros16(uint16(x)) }

Before:

"".f16 STEXT size=48 args=0x8 locals=0x0 leaf
        0x0000 00000 (test.go:7)        TEXT    "".f16(SB), LEAF|NOFRAME|ABIInternal, $0-8
        0x0000 00000 (test.go:7)        FUNCDATA        ZR, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (test.go:7)        FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (test.go:7)        FUNCDATA        $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (test.go:7)        PCDATA  $2, ZR
        0x0000 00000 (test.go:7)        PCDATA  ZR, ZR
        0x0000 00000 (test.go:7)        MOVD    "".x(FP), R0
        0x0004 00004 (test.go:7)        MOVHU   R0, R0
        0x0008 00008 (test.go:7)        ORR     $65536, R0, R0
        0x000c 00012 (test.go:7)        RBIT    R0, R0
        0x0010 00016 (test.go:7)        CLZ     R0, R0
        0x0014 00020 (test.go:7)        MOVD    R0, "".z(SB)
        0x0020 00032 (test.go:7)        RET     (R30)

This line of code is unnecessary:
        0x0004 00004 (test.go:7)        MOVHU   R0, R0

After:

"".f16 STEXT size=32 args=0x8 locals=0x0 leaf
        0x0000 00000 (test.go:7)        TEXT    "".f16(SB), LEAF|NOFRAME|ABIInternal, $0-8
        0x0000 00000 (test.go:7)        FUNCDATA        ZR, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (test.go:7)        FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (test.go:7)        FUNCDATA        $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (test.go:7)        PCDATA  $2, ZR
        0x0000 00000 (test.go:7)        PCDATA  ZR, ZR
        0x0000 00000 (test.go:7)        MOVD    "".x(FP), R0
        0x0004 00004 (test.go:7)        ORR     $65536, R0, R0
        0x0008 00008 (test.go:7)        RBITW   R0, R0
        0x000c 00012 (test.go:7)        CLZW    R0, R0
        0x0010 00016 (test.go:7)        MOVD    R0, "".z(SB)
        0x001c 00028 (test.go:7)        RET     (R30)

The situation of TrailingZeros8 is similar to TrailingZeros16.

Change-Id: I473bdca06be8460a0be87abbae6fe640017e4c9d
Reviewed-on: https://go-review.googlesource.com/c/go/+/156999
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-07 14:24:56 +00:00
Keith Randall
4a9064ef41 cmd/compile: fix ordering for short-circuiting ops
Make sure the side effects inside short-circuited operations (&& and ||)
happen correctly.

Before this CL, we attached the side effects to the node itself using
exprInPlace. That caused other side effects in sibling expressions
to get reordered with respect to the short circuit side effect.

Instead, rewrite a && b like:

r := a
if r {
  r = b
}

That code we can keep correctly ordered with respect to other
side-effects extracted from part of a big expression.

exprInPlace seems generally unsafe. But this was the only case where
exprInPlace is called not at the top level of an expression, so I
don't think the other uses can actually trigger an issue (there can't
be a sibling expression). TODO: maybe those cases don't need "in
place", and we can retire that function generally.

This CL needed a small tweak to the SSA generation of OIF so that the
short circuit optimization still triggers. The short circuit optimization
looks for triangle but not diamonds, so don't bother allocating a block
if it will be empty.

Go 1 benchmarks are in the noise.

Fixes #30566

Change-Id: I19c04296bea63cbd6ad05f87a63b005029123610
Reviewed-on: https://go-review.googlesource.com/c/go/+/165617
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-03-06 20:04:07 +00:00
erifan01
dd91269b7c cmd/compile: optimize math/bits Len32 intrinsic on arm64
Arm64 has a 32-bit CLZ instruction CLZW, which can be used for intrinsic Len32.
Function LeadingZeros32 calls Len32, with this change, the assembly code of
LeadingZeros32 becomes more concise.

Go code:

func f32(x uint32) { z = bits.LeadingZeros32(x) }

Before:

"".f32 STEXT size=32 args=0x8 locals=0x0 leaf
        0x0000 00000 (test.go:7)        TEXT    "".f32(SB), LEAF|NOFRAME|ABIInternal, $0-8
        0x0004 00004 (test.go:7)        MOVWU   "".x(FP), R0
        0x0008 00008 ($GOROOT/src/math/bits/bits.go:30) CLZ     R0, R0
        0x000c 00012 ($GOROOT/src/math/bits/bits.go:30) SUB     $32, R0, R0
        0x0010 00016 (test.go:7)        MOVD    R0, "".z(SB)
        0x001c 00028 (test.go:7)        RET     (R30)

After:

"".f32 STEXT size=32 args=0x8 locals=0x0 leaf
        0x0000 00000 (test.go:7)        TEXT    "".f32(SB), LEAF|NOFRAME|ABIInternal, $0-8
        0x0004 00004 (test.go:7)        MOVWU   "".x(FP), R0
        0x0008 00008 ($GOROOT/src/math/bits/bits.go:30) CLZW    R0, R0
        0x000c 00012 (test.go:7)        MOVD    R0, "".z(SB)
        0x0018 00024 (test.go:7)        RET     (R30)

Benchmarks:
name              old time/op  new time/op  delta
LeadingZeros-8    2.53ns ± 0%  2.55ns ± 0%   +0.67%  (p=0.000 n=10+10)
LeadingZeros8-8   3.56ns ± 0%  3.56ns ± 0%     ~     (all equal)
LeadingZeros16-8  3.55ns ± 0%  3.56ns ± 0%     ~     (p=0.465 n=10+10)
LeadingZeros32-8  3.55ns ± 0%  2.96ns ± 0%  -16.71%  (p=0.000 n=10+7)
LeadingZeros64-8  2.53ns ± 0%  2.54ns ± 0%     ~     (p=0.059 n=8+10)

Change-Id: Ie5666bb82909e341060e02ffd4e86c0e5d67e90a
Reviewed-on: https://go-review.googlesource.com/c/157000
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-02-27 16:09:33 +00:00
Keith Randall
f495f549ac cmd/compile: don't bother compiling functions named "_"
They can't be used, so we don't need code generated for them. We just
need to report errors in their bodies.

The compiler currently has a bunch of special cases sprinkled about
for "_" functions, because we never generate a linker symbol for them.
Instead, abort compilation earlier so we never reach any of that
special-case code.

Fixes #29870

Change-Id: I3530c9c353deabcf75ce9072c0b740e992349ee5
Reviewed-on: https://go-review.googlesource.com/c/158845
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2019-02-26 20:56:24 +00:00
Lynn Boger
2d3474043c cmd/compile: call ginsnop, not ginsnop2 on ppc64le for mid-stack inlining tracebacks
A recent change to fix stacktraces for inlined functions
introduced a regression on ppc64le when compiling position
independent code. That happened because ginsnop2 was called for
the purpose of inserting a NOP to identify the location of
the inlined function, when ginsnop should have been used.
ginsnop2 is intended to be used before deferreturn to ensure
r2 is properly restored when compiling position independent code.
In some cases the location where r2 is loaded from might not be
initialized. If that happens and r2 is used to generate an address,
the result is likely a SEGV.

This fixes that problem.

Fixes #30283

Change-Id: If70ef27fc65ef31969712422306ac3a57adbd5b6
Reviewed-on: https://go-review.googlesource.com/c/163337
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Andrew Bonventre <andybons@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-02-25 18:08:46 +00:00
Keith Randall
585c9e8412 cmd/compile: implement shifts by signed amounts
Allow shifts by signed amounts. Panic if the shift amount is negative.

TODO: We end up doing two compares per shift, see Ian's comment
https://github.com/golang/go/issues/19113#issuecomment-443241799 that
we could do it with a single comparison in the normal case.

The prove pass mostly handles this code well. For instance, it removes the
<0 check for cases like this:
    if s >= 0 { _ = x << s }
    _ = x << len(a)

This case isn't handled well yet:
    _ = x << (y & 0xf)
I'll do followon CLs for unhandled cases as needed.

Update #19113

R=go1.13

Change-Id: I839a5933d94b54ab04deb9dd5149f32c51c90fa1
Reviewed-on: https://go-review.googlesource.com/c/158719
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2019-02-15 23:13:09 +00:00
Austin Clements
a2e79571a9 cmd/compile: separate data and function LSyms
Currently, obj.Ctxt's symbol table does not distinguish between ABI0
and ABIInternal symbols. This is *almost* okay, since a given symbol
name in the final object file is only going to belong to one ABI or
the other, but it requires that the compiler mark a Sym as being a
function symbol before it retrieves its LSym. If it retrieves the LSym
first, that LSym will be created as ABI0, and later marking the Sym as
a function symbol won't change the LSym's ABI.

Marking a Sym as a function symbol before looking up its LSym sounds
easy, except Syms have a dual purpose: they are used just as interned
strings (every function, variable, parameter, etc with the same
textual name shares a Sym), and *also* to store state for whatever
package global has that name. As a result, it's easy to slip up and
look up an LSym when a Sym is serving as the name of a local variable,
and then later mark it as a function when it's serving as the global
with the name.

In general, we were careful to avoid this, but #29610 demonstrates one
case where we messed up. Because of on-demand importing from indexed
export data, it's possible to compile a method wrapper for a type
imported from another package before importing an init function from
that package. If the argument of the method is named "init", the
"init" LSym will be created as a data symbol when compiling the
wrapper, before it gets marked as a function symbol.

To fix this, we separate obj.Ctxt's symbol tables for ABI0 and
ABIInternal symbols. This way, the compiler will simply get a
different LSym once the Sym takes on its package-global meaning as a
function.

This fixes the above ordering issue, and means we no longer need to go
out of our way to create the "init" function early and mark it as a
function symbol.

Fixes #29610.
Updates #27539.

Change-Id: Id9458b40017893d46ef9e4a3f9b47fc49e1ce8df
Reviewed-on: https://go-review.googlesource.com/c/157017
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-01-11 00:45:49 +00:00
Keith Randall
69c2c56453 cmd/compile,runtime: redo mid-stack inlining tracebacks
Work involved in getting a stack trace is divided between
runtime.Callers and runtime.CallersFrames.

Before this CL, runtime.Callers returns a pc per runtime frame.
runtime.CallersFrames is responsible for expanding a runtime frame
into potentially multiple user frames.

After this CL, runtime.Callers returns a pc per user frame.
runtime.CallersFrames just maps those to user frame info.

Entries in the result of runtime.Callers are now pcs
of the calls (or of the inline marks), not of the instruction
just after the call.

Fixes #29007
Fixes #28640
Update #26320

Change-Id: I1c9567596ff73dc73271311005097a9188c3406f
Reviewed-on: https://go-review.googlesource.com/c/152537
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-12-28 20:55:36 +00:00
Martin Möhrmann
38e7177c94 cmd/compile: fix length overflow when appending elements to a slice
Instead of testing len(slice)+numNewElements > cap(slice) use
uint(len(slice)+numNewElements) > uint(cap(slice)) to test
if a slice needs to be grown in an append operation.

This prevents a possible overflow when len(slice) is near the maximum
int value and the addition of a constant number of new elements
makes it overflow and wrap around to a negative number which is
smaller than the capacity of the slice.

Appending a slice to a slice with append(s1, s2...) already used
a uint comparison to test slice capacity and therefore was not
vulnerable to the same overflow issue.

Fixes: #29190

Change-Id: I41733895838b4f80a44f827bf900ce931d8be5ca
Reviewed-on: https://go-review.googlesource.com/c/154037
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-12-14 05:48:18 +00:00
David Chase
ea6259d5e9 cmd/compile: check for negative upper bound to IsSliceInBounds
IsSliceInBounds(x, y) asserts that y is not negative, but
there were cases where this is not true.  Change code
generation to ensure that this is true when it's not obviously
true.  Prove phase cleans a few of these out.

With this change the compiler text section is 0.06% larger,
that is, not very much.  Benchmarking still TBD, may need
to wait for access to a benchmarking box (next week).

Also corrected run.go to handle '?' in -update_errors output.

Fixes #28797.

Change-Id: Ia8af90bc50a91ae6e934ef973def8d3f398fac7b
Reviewed-on: https://go-review.googlesource.com/c/152477
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-12-07 23:04:58 +00:00
David Chase
2b58ca6e3d cmd/compile: begin OpArg and OpPhi location lists at block start
For the entry block, make the "first instruction" be truly
the first instruction.  This allows printing of incoming
parameters with Delve.

Also be sure Phis are marked as being at the start of their
block.  This is observed to move location list pointers,
and where moved, they become correct.

Leading zero-width instructions include LoweredGetClosurePtr.
Because this instruction is actually architecture-specific,
and it is now tested for in 3 different places, also created
Op.isLoweredGetClosurePtr() to reduce future surprises.

Change-Id: Ic043b7265835cf1790382a74334b5714ae4060af
Reviewed-on: https://go-review.googlesource.com/c/145179
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-11-29 14:00:26 +00:00
Brian Kessler
319787a528 cmd/compile: intrinsify math/bits.Div on amd64
Note that the intrinsic implementation panics separately for overflow and
divide by zero, which matches the behavior of the pure go implementation.
There is a modest performance improvement after intrinsic implementation.

name     old time/op  new time/op  delta
Div-4    53.0ns ± 1%  47.0ns ± 0%  -11.28%  (p=0.008 n=5+5)
Div32-4  18.4ns ± 0%  18.5ns ± 1%     ~     (p=0.444 n=5+5)
Div64-4  53.3ns ± 0%  47.5ns ± 4%  -10.77%  (p=0.008 n=5+5)

Updates #28273

Change-Id: Ic1688ecc0964acace2e91bf44ef16f5fb6b6bc82
Reviewed-on: https://go-review.googlesource.com/c/144378
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-27 05:04:25 +00:00
Clément Chigot
9fe9853ae5 cmd/compile: fix nilcheck for AIX
This commit adapts compile tool to create correct nilchecks for AIX.

AIX allows to load a nil pointer. Therefore, the default nilcheck
which issues a load must be replaced by a CMP instruction followed by a
store at 0x0 if the value is nil. The store will trigger a SIGSEGV as on
others OS.

The nilcheck algorithm must be adapted to do not remove nilcheck if it's
only a read. Stores are detected with v.Type.IsMemory().

Tests related to nilptr must be adapted to the previous changements.
nilptr.go cannot be used as it's because the AIX address space starts at
1<<32.

Change-Id: I9f5aaf0b7e185d736a9b119c0ed2fe4e5bd1e7af
Reviewed-on: https://go-review.googlesource.com/c/144538
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-26 14:13:53 +00:00
Yury Smolsky
3068fcfa0d cmd/compile: add control flow graphs to ssa.html
This CL adds CFGs to ssa.html.
It execs dot to generate SVG,
which then gets inlined into the html.
Some standard naming and javascript hacks
enable integration with the rest of ssa.html.
Clicking on blocks highlights the relevant
part of the CFG, and vice versa.

Sample output and screenshots can be seen in #28177.

CFGs can be turned on with the suffix mask:
:*            - dump CFG for every phase
:lower        - just the lower phase
:lower-layout - lower through layout
:w,x-y        - phases w and x through y

Calling dot after every pass is noticeably slow,
instead use the range of phases.

Dead blocks are not displayed on CFG.

User can zoom and pan individual CFG
when the automatic adjustment has failed.

Dot-related errors are reported
without bringing down the process.

Fixes #28177

Change-Id: Id52c42d86c4559ca737288aa10561b67a119c63d
Reviewed-on: https://go-review.googlesource.com/c/142517
Run-TryBot: Yury Smolsky <yury@smolsky.by>
Reviewed-by: David Chase <drchase@google.com>
2018-11-21 10:22:43 +00:00
Josh Bleecher Snyder
bc43889566 cmd/compile: bulk rename
This change does a bulk rename of several identifiers in the compiler.
See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/
for context and for discussion of these particular renames.

Commands run to generate this change:

gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO
gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT
gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG
gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF
gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR
gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP
gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR
gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES
gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP
gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES

gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt
gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr
gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee
gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK
gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign
gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit

Not altered: parameters and local variables (mostly in typecheck.go) named top,
which should probably now be called ctx (and which should probably have a named type).
Also not altered: Field called Top in gc.Func.

gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD
gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD
gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD
gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD
gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD
gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD

Not altered: function gc.hasddd, params and local variables called isddd
Also not altered: fmt.go prints nodes using "isddd(%v)".

cd cmd/compile/internal/gc; go generate

I then manually found impacted comments using exact string match
and fixed them up by hand. The comment changes were trivial.

Passes toolstash-check.

Fixes #27167. If this experiment is deemed a success,
we will open a new tracking issue for renames to do
at the end of the 1.13 cycles.

Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8
Reviewed-on: https://go-review.googlesource.com/c/150140
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-19 00:02:53 +00:00
Martin Möhrmann
75798e8ada runtime: make processor capability variable naming platform specific
The current support_XXX variables are specific for the
amd64 and 386 platforms.

Prefix processor capability variables by architecture to have a
consistent naming scheme and avoid reuse of the existing
variables for new platforms.

This also aligns naming of runtime variables closer with internal/cpu
processor capability variable names.

Change-Id: I3eabb29a03874678851376185d3a62e73c1aff1d
Reviewed-on: https://go-review.googlesource.com/c/91435
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-14 20:30:31 +00:00
Austin Clements
5cf2b4c2d3 cmd/compile: fix race on initializing Sym symFunc flag
SSA lowering can create PFUNC ONAME nodes when compiling method calls.
Since we generally initialize the node's Sym to a func when we set its
class to PFUNC, we did this here, too. Unfortunately, since SSA
compilation is concurrent, this can cause a race if two function
compilations try to initialize the same symbol.

Luckily, we don't need to do this at all, since we're actually just
wrapping an ONAME node around an existing Sym that's already marked as
a function symbol.

Fixes the linux-amd64-racecompile builder, which was broken by CL
147158.

Updates #27539.

Change-Id: I8ddfce6e66a08ce53998c5bfa6f5a423c1ffc1eb
Reviewed-on: https://go-review.googlesource.com/c/149158
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-11-12 22:38:02 +00:00
Austin Clements
685aca45dc cmd/compile, cmd/link: separate stable and internal ABIs
This implements compiler and linker support for separating the
function calling ABI into two ABIs: a stable and an internal ABI. At
the moment, the two ABIs are identical, but we'll be able to evolve
the internal ABI without breaking existing assembly code that depends
on the stable ABI for calling to and from Go.

The Go compiler generates internal ABI symbols for all Go functions.
It uses the symabis information produced by the assembler to create
ABI wrappers whenever it encounters a body-less Go function that's
defined in assembly or a Go function that's referenced from assembly.

Since the two ABIs are currently identical, for the moment this is
implemented using "ABI alias" symbols, which are just forwarding
references to the native ABI symbol for a function. This way there's
no actual code involved in the ABI wrapper, which is good because
we're not deriving any benefit from it right now. Once the ABIs
diverge, we can eliminate ABI aliases.

The linker represents these different ABIs internally as different
versions of the same symbol. This way, the linker keeps us honest,
since every symbol definition and reference also specifies its
version. The linker is responsible for resolving ABI aliases.

Fixes #27539.

Change-Id: I197c52ec9f8fc435db8f7a4259029b20f6d65e95
Reviewed-on: https://go-review.googlesource.com/c/147160
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-11-12 20:46:55 +00:00
Austin Clements
16e6cd9a4d cmd/compile: mark function Syms
In order to mark the obj.LSyms produced by the compiler with the
correct ABI, we need to know which types.Syms refer to function
symbols. This CL adds a flag to types.Syms to mark symbols for
functions, and sets this flag everywhere we create a PFUNC-class node,
and in the one place where we directly create function symbols without
always wrapping them in a PFUNC node (methodSym).

We'll use this information to construct obj.LSyms with correct ABI
information.

For #27539.

Change-Id: Ie3ac8bf3da013e449e78f6ca85546a055f275463
Reviewed-on: https://go-review.googlesource.com/c/147158
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-12 20:46:50 +00:00
Austin Clements
15265ec421 cmd/compile: avoid duplicate GC bitmap symbols
Currently, liveness produces a distinct obj.LSym for each GC bitmap
for each function. These are then named by content hash and only
ultimately deduplicated by WriteObjFile.

For various reasons (see next commit), we want to remove this
deduplication behavior from WriteObjFile. Furthermore, it's
inefficient to produce these duplicate symbols in the first place.

GC bitmaps are the only source of duplicate symbols in the compiler.
This commit eliminates these duplicate symbols by declaring them in
the Ctxt symbol hash just like every other obj.LSym. As a result, all
GC bitmaps with the same content now refer to the same obj.LSym.

The next commit will remove deduplication from WriteObjFile.

For #27539.

Change-Id: I4f15e3d99530122cdf473b7a838c69ef5f79db59
Reviewed-on: https://go-review.googlesource.com/c/146557
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-11-03 15:12:34 +00:00
Martin Möhrmann
020a18c545 cmd/compile: move slice construction to callers of makeslice
Only return a pointer p to the new slices backing array from makeslice.
Makeslice callers then construct sliceheader{p, len, cap} explictly
instead of makeslice returning the slice.

Reduces go binary size by ~0.2%.
Removes 92 (~3.5%) panicindex calls from go binary.

Change-Id: I29b7c3b5fe8b9dcec96e2c43730575071cfe8a94
Reviewed-on: https://go-review.googlesource.com/c/141822
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-10-29 19:23:00 +00:00
Matthew Dempsky
2dda040f19 cmd/compile/internal/gc: represent labels as bare Syms
Avoids allocating an ONAME for OLABEL, OGOTO, and named OBREAK and
OCONTINUE nodes.

Passes toolstash-check.

Change-Id: I359142cd48e8987b5bf29ac100752f8c497261c1
Reviewed-on: https://go-review.googlesource.com/c/145200
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-10-27 07:32:06 +00:00
Matthew Dempsky
c68e3bcb03 cmd/compile: remove -f flag
This is supposed to print out function stack frames, but it's been
broken since golang.org/cl/38593, and no one has noticed.

Change-Id: Iad428a9097d452b878b1f8c5df22afd6f671ac2e
Reviewed-on: https://go-review.googlesource.com/c/145199
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-10-27 02:10:07 +00:00
Keith Randall
7a634034c8 cmd/compile: fix Mul->Mul64 intrinsic alias
The alias declaration needs to come after the function it is aliasing.

It isn't a big deal in this case, as bits.Mul inlines and has as its
body bits.Mul64, so the desired code gets generated regardless.
The alias should only have an effect on inlining cost estimates
(for functions that call bits.Mul).

Change-Id: I0d814899ce7049a0fb36e8ce1ad5ababbaf6265f
Reviewed-on: https://go-review.googlesource.com/c/144597
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-10-25 23:12:46 +00:00
Keith Randall
dd789550a7 cmd/compile: intrinsify math/bits.Sub on amd64
name             old time/op  new time/op  delta
Sub-8            1.12ns ± 1%  1.17ns ± 1%   +5.20%          (p=0.008 n=5+5)
Sub32-8          1.11ns ± 0%  1.11ns ± 0%     ~     (all samples are equal)
Sub64-8          1.12ns ± 0%  1.18ns ± 1%   +5.00%          (p=0.016 n=4+5)
Sub64multiple-8  4.10ns ± 1%  0.86ns ± 1%  -78.93%          (p=0.008 n=5+5)

Fixes #28273

Change-Id: Ibcb6f2fd32d987c3bcbae4f4cd9d335a3de98548
Reviewed-on: https://go-review.googlesource.com/c/144258
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-25 19:47:27 +00:00
Keith Randall
899f3a2892 cmd/compile: intrinsify math/bits.Add on amd64
name             old time/op  new time/op  delta
Add-8            1.11ns ± 0%  1.18ns ± 0%   +6.31%  (p=0.029 n=4+4)
Add32-8          1.02ns ± 0%  1.02ns ± 1%     ~     (p=0.333 n=4+5)
Add64-8          1.11ns ± 1%  1.17ns ± 0%   +5.79%  (p=0.008 n=5+5)
Add64multiple-8  4.35ns ± 1%  0.86ns ± 0%  -80.22%  (p=0.000 n=5+4)

The individual ops are a bit slower (but still very fast).
Using the ops in carry chains is very fast.

Update #28273

Change-Id: Id975f76df2b930abf0e412911d327b6c5b1befe5
Reviewed-on: https://go-review.googlesource.com/c/144257
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-25 19:47:00 +00:00
Carlos Eduardo Seo
5c472132bf cmd/compile, runtime: add new lightweight atomics for ppc64x
This change creates the infrastructure for new lightweight atomics
primitives in runtime/internal/atomic:

- LoadAcq, for load-acquire
- StoreRel, for store-release
- CasRel, for Compare-and-Swap-release

and implements them for ppc64x. There is visible performance improvement
in producer-consumer scenarios, like BenchmarkChanProdCons*:

benchmark                           old ns/op     new ns/op     delta
BenchmarkChanProdCons0-48           2034          2034          +0.00%
BenchmarkChanProdCons10-48          1798          1608          -10.57%
BenchmarkChanProdCons100-48         1596          1585          -0.69%
BenchmarkChanProdConsWork0-48       2084          2046          -1.82%
BenchmarkChanProdConsWork10-48      1829          1668          -8.80%
BenchmarkChanProdConsWork100-48     1650          1650          +0.00%

Fixes #21348

Change-Id: I1f6ce377e4a0fe4bd7f5f775e8036f50070ad8db
Reviewed-on: https://go-review.googlesource.com/c/142277
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-10-23 18:10:38 +00:00
Carlos Eduardo Seo
1e8ecefcd5 cmd/compile: intrinsify math/big.mulWW on ppc64x
This change implements mulWW as an intrinsic for ppc64x. Performance
numbers below:

name                            old time/op    new time/op    delta
QuoRem                            4.54µs ±45%    3.22µs ± 0%  -29.22%  (p=0.029 n=4+4)
ModSqrt225_Tonelli                 765µs ± 3%     757µs ± 0%   -1.02%  (p=0.029 n=4+4)
ModSqrt225_3Mod4                   231µs ± 0%     231µs ± 0%   -0.10%  (p=0.029 n=4+4)
ModSqrt231_Tonelli                 789µs ± 0%     788µs ± 0%   -0.14%  (p=0.029 n=4+4)
ModSqrt231_5Mod8                   267µs ± 0%     267µs ± 0%   -0.13%  (p=0.029 n=4+4)
Sqrt                              49.5µs ±17%    45.3µs ± 0%   -8.48%  (p=0.029 n=4+4)
IntSqr/1                          32.2ns ±22%    24.2ns ± 0%  -24.79%  (p=0.029 n=4+4)
IntSqr/2                          60.6ns ± 0%    60.9ns ± 0%   +0.50%  (p=0.029 n=4+4)
IntSqr/3                          82.8ns ± 0%    83.3ns ± 0%   +0.51%  (p=0.029 n=4+4)
IntSqr/5                           122ns ± 0%     121ns ± 0%   -1.22%  (p=0.029 n=4+4)
IntSqr/8                           227ns ± 0%     226ns ± 0%   -0.44%  (p=0.029 n=4+4)
IntSqr/10                          300ns ± 0%     298ns ± 0%   -0.67%  (p=0.029 n=4+4)
IntSqr/20                         1.02µs ± 0%    0.89µs ± 0%  -13.08%  (p=0.029 n=4+4)
IntSqr/30                         1.73µs ± 0%    1.51µs ± 0%  -12.73%  (p=0.029 n=4+4)
IntSqr/50                         3.69µs ± 1%    3.29µs ± 0%  -10.70%  (p=0.029 n=4+4)
IntSqr/80                         7.64µs ± 0%    7.04µs ± 0%   -7.91%  (p=0.029 n=4+4)
IntSqr/100                        11.1µs ± 0%    10.3µs ± 0%   -7.04%  (p=0.029 n=4+4)
IntSqr/200                        37.9µs ± 0%    36.4µs ± 0%   -4.13%  (p=0.029 n=4+4)
IntSqr/300                        69.4µs ± 0%    66.0µs ± 0%   -4.94%  (p=0.029 n=4+4)
IntSqr/500                         174µs ± 0%     168µs ± 0%   -3.10%  (p=0.029 n=4+4)
IntSqr/800                         347µs ± 0%     333µs ± 0%   -4.06%  (p=0.029 n=4+4)
IntSqr/1000                        524µs ± 0%     507µs ± 0%   -3.21%  (p=0.029 n=4+4)

Change-Id: If067452f5b6579ad3a2e9daa76a7ffe6fceae1bb
Reviewed-on: https://go-review.googlesource.com/c/143217
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-10-22 13:03:28 +00:00