779 Commits

Author SHA1 Message Date
Cuong Manh Le
7e1028a9ff cmd/compile: avoid range over copy of array
Passes toostash-check.

Slightly reduce compiler binary size:

file    before    after     Δ       %
compile 21087288  21070776  -16512  -0.078%
total   131847020 131830508 -16512  -0.013%

file                      before    after     Δ       %
cmd/compile/internal/gc.a 9007472   8999640   -7832   -0.087%
total                     127117794 127109962 -7832   -0.006%

Change-Id: I4aadd68d0a7545770598bed9d3a4d05899b67b52
Reviewed-on: https://go-review.googlesource.com/c/go/+/205777
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-06 23:24:28 +00:00
Josh Bleecher Snyder
7b0b6c2f7e cmd/compile: simplify converted SSA form for 'if false'
The goal here is to make it easier for a human to
examine the SSA when a function contains lots of dead code.

No significant compiler metric or generated code differences.

Change-Id: I81915fa4639bc8820cc9a5e45e526687d0d1f57a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221791
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-03 18:42:30 +00:00
Meng Zhuo
ab7ecea0c8 cmd/compile: add intrinsics for runtime/internal/math on MIPS64x
name              old time/op  new time/op  delta
MulUintptr/small  8.42ns ± 0%  5.93ns ± 0%  -29.66%  (p=0.000 n=9+10)
MulUintptr/large  11.1ns ± 0%   7.4ns ± 0%  -33.17%  (p=0.000 n=10+9)

Change-Id: I6659a886389660461fc2c90bd248243f6e7c29d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/210897
Run-TryBot: Meng Zhuo <mengzhuo1203@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-02 16:12:20 +00:00
Ruixin(Peter) Bao
2962c96c9f cmd/compile: lower float to uint conversions on s390x
Add rules for lowering float <-> unsigned int on s390x.

During compilation,
Cvt64Uto64F rule triggers around 80 times,
Cvt64Fto64U rule triggers around 20 times,
Cvt64Uto32F rule triggers around 5 times.

Change-Id: If4c9d128b9132fce8c0bea9abc09cb43a5df7989
Reviewed-on: https://go-review.googlesource.com/c/go/+/209177
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-02-29 21:37:47 +00:00
Michael Munday
cb74dcc172 cmd/compile: remove Greater* and Geq* generic integer ops
The generic Greater and Geq ops can always be replaced with the Less and
Leq ops. This CL therefore removes them. This simplifies the compiler since
it reduces the number of operations that need handling in both code and in
rewrite rules. This will be especially true when adding control flow
optimizations such as the integer-in-range optimizations in CL 165998.

Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 13:11:53 +00:00
Joel Sing
2e4f490b31 cmd/compile,cmd/link: fix and re-enable open-coded defers on riscv64
The R_CALLRISCV relocation marker is on the JALR instruction, however the actual
relocation is currently two instructions previous for the AUIPC+ADDI sequence.
Adjust the platform dependent offset accordingly and re-enable open-coded defers.

Fixes #36786.

Change-Id: I71597c193c447930fbe94ce44b7355e89ae877bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/216797
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-01-29 16:34:44 +00:00
Joel Sing
a858d15f11 cmd/compile: disable open-coded defers on riscv64
Open-coded defers are currently broken on riscv64 - disable them for the
time being. All of the standard package tests now pass on linux/riscv64.

Updates issue #27532 and #36786

Change-Id: I20fc25ce91dfad48be32409ba5c64ca9a6acef1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/216517
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-01-28 02:40:44 +00:00
Joel Sing
98d2717499 cmd/compile: implement compiler for riscv64
Based on riscv-go port.

Updates #27532

Change-Id: Ia329daa243db63ff334053b8807ea96b97ce3acf
Reviewed-on: https://go-review.googlesource.com/c/go/+/204631
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-01-18 14:41:40 +00:00
Cherry Zhang
a037582eff cmd/compile: mark empty block preemptible
Currently, a block's control instruction gets the liveness info
of the last Value in the block. However, for an empty block, the
control instruction gets the invalid liveness info and therefore
not preemptible. One example is empty infinite loop, which has
only a control instruction. The control instruction being non-
preemptible makes the whole loop non-preemptible.

Fix this by using a different, preemptible liveness info for
empty block's control. We can choose an arbitrary preemptible
liveness info, as at run time we don't really use the liveness
map at that instruction.

As before, if the last Value in the block is non-preemptible, so
is the block control. For example, the conditional branch in the
write barrier test block is still non-preemptible.

Also, only update liveness info if we are actually emitting
instructions. So zero-width Values' liveness info (which are
always invalid) won't affect the block control's liveness info.
For example, if the last Values in a block is a tuple-generating
operation and a Select, the block control instruction is still
preemptible.

Fixes #35923.

Change-Id: Ic5225f3254b07e4955f7905329b544515907642b
Reviewed-on: https://go-review.googlesource.com/c/go/+/209659
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2019-12-06 01:11:02 +00:00
David Chase
0e02cfb369 cmd/compile: try harder to not use an empty src.XPos for a bogus line
The fix for #35652 did not guarantee that it was using a non-empty
src position to replace an empty one.  The new code checks again
and falls back to a more certain position.  (The input in question
compiles to a single empty infinite loop, and none of the actual instructions
had any source position at all.  That is a bug, but given the pathology
of this input, not one worth dealing with this late in the release cycle,
if ever.)

Literally:

00000 (5) TEXT "".f(SB), ABIInternal
00001 (5) PCDATA $0, $-2
00002 (5) PCDATA $1, $-2
00003 (5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
00004 (5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
00005 (5) FUNCDATA $2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
b2
00006 (?) XCHGL AX, AX
b6
00007 (+1048575) JMP 6
00008 (?) END

TODO: Add runtime.InfiniteLoop(), replace infinite loops with a call to
that, and use an eco-friendly runtime.gopark instead.  (This was Cherry's
excellent idea.)

Updates #35652
Fixes #35695

Change-Id: I4b9a841142ee4df0f6b10863cfa0721a7e13b437
Reviewed-on: https://go-review.googlesource.com/c/go/+/207964
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-22 03:06:22 +00:00
David Chase
9bba63bbbe cmd/compile: make a better bogus line for empty infinite loops
The old recipe for making an infinite loop not be infinite
in the debugger could create an instruction (Prog) with a
line number not tied to any file (index == 0).  This caused
downstream failures in DWARF processing.

So don't do that.  Also adds a test, also adds a check+panic
to ensure that the next time this happens the error is less
mystifying.

Fixes #35652

Change-Id: I04f30bc94fdc4aef20dd9130561303ff84fd945e
Reviewed-on: https://go-review.googlesource.com/c/go/+/207613
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-19 00:38:53 +00:00
Michael Munday
b3885dbc93 cmd/compile, runtime: intrinsify atomic And8 and Or8 on s390x
Intrinsify these functions to match other platforms. Update the
sequence of instructions used in the assembly implementations to
match the intrinsics.

Also, add a micro benchmark so we can more easily measure the
performance of these two functions:

name            old time/op  new time/op  delta
And8-8          5.33ns ± 7%  2.55ns ± 8%  -52.12%  (p=0.000 n=20+20)
And8Parallel-8  7.39ns ± 5%  3.74ns ± 4%  -49.34%  (p=0.000 n=20+20)
Or8-8           4.84ns ±15%  2.64ns ±11%  -45.50%  (p=0.000 n=20+20)
Or8Parallel-8   7.27ns ± 3%  3.84ns ± 4%  -47.10%  (p=0.000 n=19+20)

By using a 'rotate then xor selected bits' instruction combined with
either a 'load and and' or a 'load and or' instruction we can
implement And8 and Or8 with far fewer instructions. Replacing
'compare and swap' with atomic instructions may also improve
performance when there is contention.

Change-Id: I28bb8032052b73ae8ccdf6e4c612d2877085fa01
Reviewed-on: https://go-review.googlesource.com/c/go/+/204277
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-11 15:23:59 +00:00
DQNEO
f07059d949 cmd/compile: rename sizeof_Array and array_* to slice_*
Renames variables sizeof_Array and other array_* variables
that were actually intended for slices and not arrays.

Change-Id: I391b95880cc77cabb8472efe694b7dd19545f31a
Reviewed-on: https://go-review.googlesource.com/c/go/+/180919
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-11-11 12:40:04 +00:00
David Chase
cd53fddabb cmd/compile: add framework for logging optimizer (non)actions to LSP
This is intended to allow IDEs to note where the optimizer
was not able to improve users' code.  There may be other
applications for this, for example in studying effectiveness
of optimizer changes more quickly than running benchmarks,
or in verifying that code changes did not accidentally disable
optimizations in performance-critical code.

Logging of nilcheck (bad) for amd64 is implemented as
proof-of-concept.  In general, the intent is that optimizations
that didn't happen are what will be logged, because that is
believed to be what IDE users want.

Added flag -json=version,dest

Check that version=0.  (Future compilers will support a
few recent versions, I hope that version is always <=3.)

Dest is expected to be one of:

/path (or \path in Windows)
  will create directory /path and fill it w/ json files
file://path
  will create directory path, intended either for
     I:\dont\know\enough\about\windows\paths
     trustme_I_know_what_I_am_doing_probably_testing

Not passing an absolute path name usually leads to
json splattered all over source directories,
or failure when those directories are not writeable.
If you want a foot-gun, you have to ask for it.

The JSON output is directed to subdirectories of dest,
where each subdirectory is net/url.PathEscape of the
package name, and each for each foo.go in the package,
net/url.PathEscape(foo).json is created.  The first line
of foo.json contains version and context information,
and subsequent lines contains LSP-conforming JSON
describing the missing optimizations.

Change-Id: Ib83176a53a8c177ee9081aefc5ae05604ccad8a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/204338
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-10 17:11:34 +00:00
David Chase
a0262b201f cmd/compile: intrinsify functions added to runtime/internal/sys
This restores intrinsic status to functions copied from math/bits
into runtime/internal/sys, as an aid to runtime performance.

Updates #35112.

Change-Id: I41a7d87cf00f1e64d82aa95c5b1000bc128de820
Reviewed-on: https://go-review.googlesource.com/c/go/+/206200
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-09 05:51:04 +00:00
Russ Cox
543c6d2e0d math, cmd/compile: rename Fma to FMA
This API was added for #25819, where it was discussed as math.FMA.
The commit adding it used math.Fma, presumably for consistency
with the rest of the unusual names in package math
(Sincos, Acosh, Erfcinv, Float32bits, etc).

I believe that using an idiomatic Go name is more important here
than consistency with these other names, most of which are historical
baggage from C's standard library.

Early additions like Float32frombits happened before "uppercase for export"
(so they were originally like "float32frombits") and they were not properly
reconsidered when we uppercased the symbols to export them.
That's a mistake we live with.

The names of functions we have added since then, and even a few
that were legacy, are more properly Go-cased, such as IsNaN, IsInf,
and RoundToEven, rather than Isnan, Isinf, and Roundtoeven.
And also constants like MaxFloat32.

For new API, we should keep using proper Go-cased symbols
instead of minimally-upper-cased-C symbols.

So math.FMA, not math.Fma.

This API has not yet been released, so this change does not break
the compatibility promise.

This CL also modifies cmd/compile, since the compiler knows
the name of the function. I could have stopped at changing the
string constants, but it seemed to make more sense to use a
consistent casing everywhere.

Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205317
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-07 14:51:06 +00:00
Dan Scales
cc47b0d2cd cmd/compile: handle some missing cases of non-SSAable values for args of open-coded defers
In my experimentation, I had found that most non-SSAable expressions were
converted to autotmp variables during AST evaluation. However, this was not true
generally, as witnessed by issue #35213, which has a non-SSAable field reference
of a struct that is not converted to an autotmp. So, I fixed openDeferSave() to
handle non-SSAable nodes more generally, and make sure that these non-SSAable
expressions are not evaluated more than once (which could incorrectly repeat side
effects).

Fixes #35213

Change-Id: I8043d5576b455e94163599e930ca0275e550d594
Reviewed-on: https://go-review.googlesource.com/c/go/+/203888
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-29 19:58:24 +00:00
Austin Clements
97592b3c14 cmd/compile: intrinsics for runtime/internal/atomic.Store8
For #10958, #24543, but makes sense on its own.

Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/203284
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-29 03:18:55 +00:00
Dan Scales
be64a19d99 cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go binary without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
2019-10-24 13:54:11 +00:00
smasher164
03fb1f607b cmd/compile: don't use FMA on plan9
CL 137156 introduces an intrinsic on AMD64 that executes vfmadd231sd
when feature detection is successful. However, because floating-point
isn't allowed in note handler, the builder disables SSE instructions,
and fails when attempting to execute this instruction. This change
disables FMA on plan9 to immediately use the software fallback.

Fixes #35063.

Change-Id: I87d8f0995bd2f15013d203e618938f5079c9eed2
Reviewed-on: https://go-review.googlesource.com/c/go/+/202617
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-22 19:36:42 +00:00
smasher164
58b031949b cmd/compile: add fma intrinsic for arm
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.

Updates #25819.

Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 17:42:47 +00:00
smasher164
7a6da218b1 cmd/compile: add fma intrinsic for amd64
To permit ssa-level optimization, this change introduces an amd64 intrinsic
that generates the VFMADD231SD instruction for the fused-multiply-add
operation on systems that support it. System support is detected via
cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic
("Fma") to VFMADD231SD.

The benchmark compares the software implementation (old) with the intrinsic
(new).

name   old time/op  new time/op  delta
Fma-4  27.2ns ± 1%   1.0ns ± 9%  -96.48%  (p=0.008 n=5+5)

Updates #25819.

Change-Id: I966655e5f96817a5d06dff5942418a3915b09584
Reviewed-on: https://go-review.googlesource.com/c/go/+/137156
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 16:42:10 +00:00
smasher164
33425ab8db cmd/compile: introduce generic ssa intrinsic for fused-multiply-add
In order to make math.FMA a compiler intrinsic for ISAs like ARM64,
PPC64[le], and S390X, a generic 3-argument opcode "Fma" is provided and
rewritten as

    ARM64: (Fma x y z) -> (FMADDD z x y)
    PPC64: (Fma x y z) -> (FMADD x y z)
    S390X: (Fma x y z) -> (FMADD z x y)

Updates #25819.

Change-Id: Ie5bc628311e6feeb28ddf9adaa6e702c8c291efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/131959
Run-TryBot: Akhil Indurti <aindurti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 16:24:15 +00:00
Bryan C. Mills
b76e6f8825 Revert "cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata"
This reverts CL 190098.

Reason for revert: broke several builders.

Change-Id: I69161352f9ded02537d8815f259c4d391edd9220
Reviewed-on: https://go-review.googlesource.com/c/go/+/201519
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Dan Scales <danscales@google.com>
2019-10-16 20:59:53 +00:00
Dan Scales
dad616375f cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go cmd without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements <austin@google.com>
2019-10-16 18:27:16 +00:00
Cherry Zhang
c4817f5d4f cmd/compile: on Wasm and AIX, let deferred nil function panic at invocation
The Go spec requires

	If a deferred function value evaluates to nil, execution
	panics when the function is invoked, not when the "defer"
	statement is executed.

On Wasm and AIX, currently we actually emit a nil check at the
point of defer statement, which will make it panic too early.
This CL fixes this.

Also, on Wasm, now the nil function will be passed through
deferreturn to jmpdefer, which does an explicit nil check and
calls sigpanic if it is nil. This sigpanic, being called from
assembly, is ABI0. So change the assembler backend to also
handle sigpanic in ABI0.

Fixes #34926.
Updates #8047.

Change-Id: I28489a571cee36d2aef041f917b8cfdc31d557d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/201297
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-16 00:05:37 +00:00
Meng Zhuo
50f1157760 cmd/compile: add math/bits.Mul64 intrinsic on mips64x
Benchmark:
name   old time/op  new time/op  delta
Mul    36.0ns ± 1%   2.8ns ± 0%  -92.31%  (p=0.000 n=10+10)
Mul32  4.37ns ± 0%  4.37ns ± 0%     ~     (p=0.429 n=6+10)
Mul64  36.4ns ± 0%   2.8ns ± 0%  -92.37%  (p=0.000 n=10+9)

Change-Id: Ic4f4e5958adbf24999abcee721d0180b5413fca7
Reviewed-on: https://go-review.googlesource.com/c/go/+/200582
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-14 21:23:34 +00:00
Matthew Dempsky
06b12e660c cmd/compile: move some ONAME-specific flags from Node to Name
The IsClosureVar, IsOutputParamHeapAddr, Assigned, Addrtaken,
InlFormal, and InlLocal flags are only interesting for ONAME nodes, so
it's better to set these flags on Name.flags instead of Node.flags.

Two caveats though:

1. Previously, we would set Assigned and Addrtaken on the entire
expression tree involved in an assignment or addressing operation.
However, the rest of the compiler only actually cares about knowing
whether the underlying ONAME (if any) was assigned/addressed.

2. This actually requires bumping Name.flags from bitset8 to bitset16,
whereas it doesn't allow shrinking Node.flags any. However, Name has
some trailing padding bytes, so expanding Name.flags doesn't cost any
memory.

Passes toolstash-check.

Change-Id: I7775d713566a38d5b9723360b1659b79391744c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/200898
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-14 18:57:11 +00:00
Matthew Dempsky
46be01f4e0 cmd/compile: remove Addable flag
This flag is supposed to indicate whether the expression is
"addressable"; but in practice, we infer this from other
attributes about the expression (e.g., n.Op and n.Class()).

Passes toolstash-check.

Change-Id: I19352ca07ab5646e232d98e8a7c1c9aec822ddd0
Reviewed-on: https://go-review.googlesource.com/c/go/+/200897
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-13 01:48:30 +00:00
David Chase
c450ace12c cmd/compile: remove statement marks from secondary calls
Calls are code-generated in an alternate path that inherits
its positions from values, not from *SSAGenState.  The
default position on *SSAGenState was marked as not-a-statement,
but this was not applied to the value itself, leading to
spurious "is statement" marks in the output (convention:
after code generation in the compiler, everything is either
definitely a statement or definitely not a statement, nothing
is in the undetermined state).

This CL causes a 35 statement regression in ssa/stmtlines_test.
This is down from the earlier 150 because of all the other
CLs preceding this one that deal with the root causes of the
missing lines (repeated lines on nested calls hid missing lines).

This also removes some line repeats from ssa/debug_test.

Change-Id: Ie9a507bd5447e906b35bbd098e3295211df2ae01
Reviewed-on: https://go-review.googlesource.com/c/go/+/188018
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
2019-10-04 20:41:52 +00:00
Ruixin(Peter) Bao
ac2ceba01a cmd/compile/internal/gc: intrinsify mulWW on s390x
SSA rule have already been added previously to intrisinfy Mul/Mul64 on s390x. In this CL,
we want to let mulWW use that SSA rule as well. Also removed an extra line for formatting.

Benchmarks:
QuoRem-18                            3.59µs ±15%    2.94µs ± 3%  -18.06%  (p=0.000 n=8+8)
ModSqrt225_Tonelli-18                 806µs ± 0%     800µs ± 0%   -0.85%  (p=0.000 n=7+8)
ModSqrt225_3Mod4-18                   245µs ± 1%     243µs ± 0%   -0.81%  (p=0.001 n=8+8)
ModSqrt231_Tonelli-18                 837µs ± 0%     834µs ± 1%   -0.36%  (p=0.028 n=8+8)
ModSqrt231_5Mod8-18                   282µs ± 0%     280µs ± 0%   -0.76%  (p=0.000 n=8+8)
Sqrt-18                              45.8µs ± 2%    38.6µs ± 0%  -15.63%  (p=0.000 n=8+8)
IntSqr/1-18                          19.1ns ± 0%    13.1ns ± 0%  -31.41%  (p=0.000 n=8+8)
IntSqr/2-18                          48.3ns ± 2%    48.2ns ± 0%     ~     (p=0.094 n=8+8)
IntSqr/3-18                          70.5ns ± 1%    70.7ns ± 0%     ~     (p=0.428 n=8+8)
IntSqr/5-18                           119ns ± 1%     118ns ± 0%   -1.02%  (p=0.000 n=7+8)
IntSqr/8-18                           215ns ± 1%     215ns ± 0%     ~     (p=0.320 n=8+7)
IntSqr/10-18                          302ns ± 1%     301ns ± 0%     ~     (p=0.148 n=8+7)
IntSqr/20-18                          952ns ± 1%     807ns ± 0%  -15.28%  (p=0.000 n=8+8)
IntSqr/30-18                         1.74µs ± 0%    1.53µs ± 0%  -11.93%  (p=0.000 n=8+8)
IntSqr/50-18                         3.91µs ± 0%    3.57µs ± 0%   -8.64%  (p=0.000 n=7+8)
IntSqr/80-18                         8.66µs ± 1%    8.11µs ± 0%   -6.39%  (p=0.000 n=8+8)
IntSqr/100-18                        12.8µs ± 0%    12.2µs ± 0%   -5.19%  (p=0.000 n=8+8)
IntSqr/200-18                        46.0µs ± 0%    44.5µs ± 0%   -3.06%  (p=0.000 n=8+8)
IntSqr/300-18                        81.4µs ± 0%    78.4µs ± 0%   -3.71%  (p=0.000 n=7+8)
IntSqr/500-18                         212µs ± 1%     206µs ± 0%   -2.66%  (p=0.000 n=8+8)
IntSqr/800-18                         419µs ± 1%     406µs ± 0%   -3.07%  (p=0.000 n=8+8)
IntSqr/1000-18                        635µs ± 0%     621µs ± 0%   -2.13%  (p=0.000 n=8+8)

Change-Id: Ib097857186932b902601ab087cbeff3fc9555c3e
Reviewed-on: https://go-review.googlesource.com/c/go/+/197639
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-02 11:08:43 +00:00
Mohit Verma
c729116332 cmd/compile: use Node.Right for OAS2* nodes (cleanup)
This CL changes cmd/compile to use Node.Right instead of
Node.Rlist for OAS2FUNC/OAS2RECV/OAS2MAPR/OAS2DOTTYPE nodes.
Fixes #32293

Change-Id: I4c9d9100be2d98d15e016797f934f64d385f5faa
Reviewed-on: https://go-review.googlesource.com/c/go/+/197817
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2019-09-28 05:04:49 +00:00
Cuong Manh Le
75da700d0a cmd/compile: consistently use strlit to access constants string values
Passes toolstash-check.

Change-Id: Ieaef20b7649787727b69469f93ffc942022bc079
Reviewed-on: https://go-review.googlesource.com/c/go/+/195198
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2019-09-16 11:41:20 +00:00
Ruixin Bao
98aa97806b cmd/compile: add math/bits.Mul64 intrinsic on s390x
This change adds an intrinsic for Mul64 on s390x. To achieve that,
a new assembly instruction, MLGR, is introduced in s390x/asmz.go. This assembly
instruction directly uses an existing instruction on Z and supports multiplication
of two 64 bit unsigned integer and stores the result in two separate registers.

In this case, we require the multiplcand to be stored in register R3 and
the output result (the high and low 64 bit of the product) to be stored in
R2 and R3 respectively.

A test case is also added.

Benchmark:
name      old time/op  new time/op  delta
Mul-18    11.1ns ± 0%   1.4ns ± 0%  -87.39%  (p=0.002 n=8+10)
Mul32-18  2.07ns ± 0%  2.07ns ± 0%     ~     (all equal)
Mul64-18  11.1ns ± 1%   1.4ns ± 0%  -87.42%  (p=0.000 n=10+10)

Change-Id: Ieca6ad1f61fff9a48a31d50bbd3f3c6d9e6675c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/194572
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-13 09:04:48 +00:00
Ainar Garipov
0efbd10157 all: fix typos
Use the following (suboptimal) script to obtain a list of possible
typos:

  #!/usr/bin/env sh

  set -x

  git ls-files |\
    grep -e '\.\(c\|cc\|go\)$' |\
    xargs -n 1\
    awk\
    '/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
    hunspell -d en_US -l |\
    grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
    grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
    sort -f |\
    uniq -c |\
    awk '$1 == 1 { print $2; }'

Then, go through the results manually and fix the most obvious typos in
the non-vendored code.

Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-09-08 17:28:20 +00:00
Matthew Dempsky
581526ce96 cmd/compile: rewrite untyped constant conversion logic
This CL detangles the hairy mess that was convlit+defaultlit. In
particular, it makes the following changes:

1. convlit1 now follows the standard typecheck behavior of setting
"n.Type = nil" if there's an error. Notably, this means for a lot of
test cases, we now avoid reporting useless follow-on error messages.
For example, after reporting that "1 << s + 1.0" has an invalid shift,
we no longer also report that it can't be assigned to string.

2. Previously, assignconvfn had some extra logic for trying to
suppress errors from convlit/defaultlit so that it could provide its
own errors with better context information. Instead, this extra
context information is now passed down into convlit1 directly.

3. Relatedly, this CL also removes redundant calls to defaultlit prior
to assignconv. As a consequence, when an expression doesn't make sense
for a particular assignment (e.g., assigning an untyped string to an
integer), the error messages now say "untyped string" instead of just
"string". This is more consistent with go/types behavior.

4. defaultlit2 is now smarter about only trying to convert pairs of
untyped constants when it's likely to succeed. This allows us to
report better error messages for things like 3+"x"; instead of "cannot
convert 3 to string" we now report "mismatched types untyped number
and untyped string".

Passes toolstash-check.

Change-Id: I26822a02dc35855bd0ac774907b1cf5737e91882
Reviewed-on: https://go-review.googlesource.com/c/go/+/187657
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-09-06 23:15:48 +00:00
Cuong Manh Le
d2f958d8d1 cmd/compile: extend ssa.go to handle 1-element array and 1-field struct
Assinging to 1-element array/1-field struct variable is considered clobbering
the whole variable. By emitting OpVarDef in this case, liveness analysis
can now know the variable is redefined.

Also, the isfat is not necessary anymore, and will be removed in follow up CL.

Fixes #33916

Change-Id: Iece0d90b05273f333d59d6ee5b12ee7dc71908c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/192979
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2019-09-03 19:33:04 +00:00
Brian Kessler
b003afe4fe cmd/compile: intrinsify RotateLeft32 on wasm
wasm has 32-bit versions of all integer operations. This change
lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits
call.  Benchmarking on amd64 under node.js this is ~25% faster.

node v10.15.3/amd64
name          old time/op  new time/op  delta
RotateLeft    8.37ns ± 1%  8.28ns ± 0%   -1.05%  (p=0.029 n=4+4)
RotateLeft8   11.9ns ± 1%  11.8ns ± 0%     ~     (p=0.167 n=5+5)
RotateLeft16  11.8ns ± 0%  11.8ns ± 0%     ~     (all equal)
RotateLeft32  11.9ns ± 1%   8.7ns ± 0%  -26.32%  (p=0.008 n=5+5)
RotateLeft64  8.31ns ± 1%  8.43ns ± 2%     ~     (p=0.063 n=5+5)

Updates #31265

Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46
Reviewed-on: https://go-review.googlesource.com/c/go/+/182359
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
2019-08-31 17:03:04 +00:00
Ben Shi
8d5197d818 cmd/compile: optimize 386's math.bits.TrailingZeros16
This CL reverts CL 192097 and fixes the issue in CL 189277.

Change-Id: Icd271262e1f5019a8e01c91f91c12c1261eeb02b
Reviewed-on: https://go-review.googlesource.com/c/go/+/192519
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-08-30 17:37:00 +00:00
Ben Shi
3cfd003a8a cmd/compile: optimize ARM's math.bits.RotateLeft32
This CL optimizes math.bits.RotateLeft32 to inline
"MOVW Rx@>Ry, Rd" on ARM.

The benchmark results of math/bits show some improvements.
name               old time/op  new time/op  delta
RotateLeft-4       9.42ns ± 0%  6.91ns ± 0%  -26.66%  (p=0.000 n=40+33)
RotateLeft8-4      8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+31)
RotateLeft16-4     8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+32)
RotateLeft32-4     8.16ns ± 0%  7.54ns ± 0%   -7.68%  (p=0.000 n=40+40)
RotateLeft64-4     15.7ns ± 0%  15.7ns ± 0%     ~     (all equal)

updates #31265

Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28 15:41:58 +00:00
Ben Shi
c683ab8128 cmd/compile: optimize ARM's math.Abs
This CL optimizes math.Abs to an inline ABSD instruction on ARM.

The benchmark results of src/math/ show big improvements.
name                   old time/op  new time/op  delta
Acos-4                  181ns ± 0%   182ns ± 0%   +0.30%  (p=0.000 n=40+40)
Acosh-4                 202ns ± 0%   202ns ± 0%     ~     (all equal)
Asin-4                  163ns ± 0%   163ns ± 0%     ~     (all equal)
Asinh-4                 242ns ± 0%   242ns ± 0%     ~     (all equal)
Atan-4                  120ns ± 0%   121ns ± 0%   +0.83%  (p=0.000 n=40+40)
Atanh-4                 202ns ± 0%   202ns ± 0%     ~     (all equal)
Atan2-4                 173ns ± 0%   173ns ± 0%     ~     (all equal)
Cbrt-4                 1.06µs ± 0%  1.06µs ± 0%   +0.09%  (p=0.000 n=39+37)
Ceil-4                 72.9ns ± 0%  72.8ns ± 0%     ~     (p=0.237 n=40+40)
Copysign-4             13.2ns ± 0%  13.2ns ± 0%     ~     (all equal)
Cos-4                   193ns ± 0%   183ns ± 0%   -5.18%  (p=0.000 n=40+40)
Cosh-4                  254ns ± 0%   239ns ± 0%   -5.91%  (p=0.000 n=40+40)
Erf-4                   112ns ± 0%   112ns ± 0%     ~     (all equal)
Erfc-4                  117ns ± 0%   117ns ± 0%     ~     (all equal)
Erfinv-4                127ns ± 0%   127ns ± 1%     ~     (p=0.492 n=40+40)
Erfcinv-4               128ns ± 0%   128ns ± 0%     ~     (all equal)
Exp-4                   212ns ± 0%   206ns ± 0%   -3.05%  (p=0.000 n=40+40)
ExpGo-4                 216ns ± 0%   209ns ± 0%   -3.24%  (p=0.000 n=40+40)
Expm1-4                 142ns ± 0%   142ns ± 0%     ~     (all equal)
Exp2-4                  191ns ± 0%   184ns ± 0%   -3.45%  (p=0.000 n=40+40)
Exp2Go-4                194ns ± 0%   187ns ± 0%   -3.61%  (p=0.000 n=40+40)
Abs-4                  14.4ns ± 0%   6.3ns ± 0%  -56.39%  (p=0.000 n=38+39)
Dim-4                  12.6ns ± 0%  12.6ns ± 0%     ~     (all equal)
Floor-4                49.6ns ± 0%  49.6ns ± 0%     ~     (all equal)
Max-4                  27.6ns ± 0%  27.6ns ± 0%     ~     (all equal)
Min-4                  27.0ns ± 0%  27.0ns ± 0%     ~     (all equal)
Mod-4                   349ns ± 0%   305ns ± 1%  -12.55%  (p=0.000 n=33+40)
Frexp-4                54.0ns ± 0%  47.1ns ± 0%  -12.78%  (p=0.000 n=38+38)
Gamma-4                 242ns ± 0%   234ns ± 0%   -3.16%  (p=0.000 n=36+40)
Hypot-4                84.8ns ± 0%  67.8ns ± 0%  -20.05%  (p=0.000 n=31+35)
HypotGo-4              88.5ns ± 0%  71.6ns ± 0%  -19.12%  (p=0.000 n=40+38)
Ilogb-4                45.8ns ± 0%  38.9ns ± 0%  -15.12%  (p=0.000 n=40+32)
J0-4                    821ns ± 0%   802ns ± 0%   -2.33%  (p=0.000 n=33+40)
J1-4                    816ns ± 0%   807ns ± 0%   -1.05%  (p=0.000 n=40+29)
Jn-4                   1.67µs ± 0%  1.65µs ± 0%   -1.45%  (p=0.000 n=40+39)
Ldexp-4                61.5ns ± 0%  54.6ns ± 0%  -11.27%  (p=0.000 n=40+32)
Lgamma-4                188ns ± 0%   188ns ± 0%     ~     (all equal)
Log-4                   154ns ± 0%   147ns ± 0%   -4.78%  (p=0.000 n=40+40)
Logb-4                 50.9ns ± 0%  42.7ns ± 0%  -16.11%  (p=0.000 n=34+39)
Log1p-4                 160ns ± 0%   159ns ± 0%     ~     (p=0.828 n=40+40)
Log10-4                 173ns ± 0%   166ns ± 0%   -4.05%  (p=0.000 n=40+40)
Log2-4                 65.3ns ± 0%  58.4ns ± 0%  -10.57%  (p=0.000 n=37+37)
Modf-4                 36.4ns ± 0%  36.4ns ± 0%     ~     (all equal)
Nextafter32-4          36.4ns ± 0%  36.4ns ± 0%     ~     (all equal)
Nextafter64-4          32.7ns ± 0%  32.6ns ± 0%     ~     (p=0.375 n=40+40)
PowInt-4                300ns ± 0%   277ns ± 0%   -7.78%  (p=0.000 n=40+40)
PowFrac-4               676ns ± 0%   635ns ± 0%   -6.00%  (p=0.000 n=40+35)
Pow10Pos-4             17.6ns ± 0%  17.6ns ± 0%     ~     (all equal)
Pow10Neg-4             22.0ns ± 0%  22.0ns ± 0%     ~     (all equal)
Round-4                30.1ns ± 0%  30.1ns ± 0%     ~     (all equal)
RoundToEven-4          38.9ns ± 0%  38.9ns ± 0%     ~     (all equal)
Remainder-4             291ns ± 0%   263ns ± 0%   -9.62%  (p=0.000 n=40+40)
Signbit-4              11.3ns ± 0%  11.3ns ± 0%     ~     (all equal)
Sin-4                   185ns ± 0%   185ns ± 0%     ~     (all equal)
Sincos-4                230ns ± 0%   230ns ± 0%     ~     (all equal)
Sinh-4                  253ns ± 0%   246ns ± 0%   -2.77%  (p=0.000 n=39+39)
SqrtIndirect-4         41.4ns ± 0%  41.4ns ± 0%     ~     (all equal)
SqrtLatency-4          13.8ns ± 0%  13.8ns ± 0%     ~     (all equal)
SqrtIndirectLatency-4  37.0ns ± 0%  37.0ns ± 0%     ~     (p=0.632 n=40+40)
SqrtGoLatency-4         911ns ± 0%   911ns ± 0%   +0.08%  (p=0.000 n=40+40)
SqrtPrime-4            13.2µs ± 0%  13.2µs ± 0%   +0.01%  (p=0.038 n=38+40)
Tan-4                   205ns ± 0%   205ns ± 0%     ~     (all equal)
Tanh-4                  264ns ± 0%   247ns ± 0%   -6.44%  (p=0.000 n=39+32)
Trunc-4                45.2ns ± 0%  45.2ns ± 0%     ~     (all equal)
Y0-4                    796ns ± 0%   792ns ± 0%   -0.55%  (p=0.000 n=35+40)
Y1-4                    804ns ± 0%   797ns ± 0%   -0.82%  (p=0.000 n=24+40)
Yn-4                   1.64µs ± 0%  1.62µs ± 0%   -1.27%  (p=0.000 n=40+39)
Float64bits-4          8.16ns ± 0%  8.16ns ± 0%   +0.04%  (p=0.000 n=35+40)
Float64frombits-4      10.7ns ± 0%  10.7ns ± 0%     ~     (all equal)
Float32bits-4          7.53ns ± 0%  7.53ns ± 0%     ~     (p=0.760 n=40+40)
Float32frombits-4      6.91ns ± 0%  6.91ns ± 0%   -0.04%  (p=0.002 n=32+38)
[Geo mean]              111ns        106ns        -3.98%

Change-Id: I54f4fd7f5160db020b430b556bde59cc0fdb996d
Reviewed-on: https://go-review.googlesource.com/c/go/+/188678
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28 15:41:28 +00:00
Bryan C. Mills
372b0eed17 Revert "cmd/compile: optimize 386's math.bits.TrailingZeros16"
This reverts CL 189277.

Reason for revert: broke 32-bit builders.

Updates #33902

Change-Id: Ie5f180d0371a90e5057ed578c334372e5fc3a286
Reviewed-on: https://go-review.googlesource.com/c/go/+/192097
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2019-08-28 12:57:59 +00:00
Ben Shi
22355d6cd2 cmd/compile: optimize 386's math.bits.TrailingZeros16
This CL optimizes math.bits.TrailingZeros16 on 386 with
a pair of BSFL and ORL instrcutions.

The case TrailingZeros16-4 of the benchmark test in
math/bits shows big improvement.
name               old time/op  new time/op  delta
TrailingZeros16-4  1.55ns ± 1%  0.87ns ± 1%  -43.87%  (p=0.000 n=50+49)

Change-Id: Ia899975b0e46f45dcd20223b713ed632bc32740b
Reviewed-on: https://go-review.googlesource.com/c/go/+/189277
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-08-28 02:29:54 +00:00
LE Manh Cuong
1a432f27d5 cmd/compile: eliminate usage of global Fatalf in ssa.go
state and ssafn both have their own Fatalf, so use them instead of
global Fatalf.

Updates #19683

Change-Id: Ie02a961d4285ab0a3f3b8d889a5b498d926ed567
Reviewed-on: https://go-review.googlesource.com/c/go/+/188539
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 17:05:15 +00:00
Keith Randall
8f296f59de Revert "Revert "cmd/compile,runtime: allocate defer records on the stack""
This reverts CL 180761

Reason for revert: Reinstate the stack-allocated defer CL.

There was nothing wrong with the CL proper, but stack allocation of defers exposed two other issues.

Issue #32477: Fix has been submitted as CL 181258.
Issue #32498: Possible fix is CL 181377 (not submitted yet).

Change-Id: I32b3365d5026600069291b068bbba6cb15295eb3
Reviewed-on: https://go-review.googlesource.com/c/go/+/181378
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-06-10 16:19:39 +00:00
Michael Munday
ac8dbe7747 cmd/compile, runtime: make atomic loads/stores sequentially consistent on s390x
The z/Architecture does not guarantee that a load following a store
will not be reordered with that store, unless they access the same
address. Therefore if we want to ensure the sequential consistency
of atomic loads and stores we need to perform serialization
operations after atomic stores.

We do not need to serialize in the runtime when using StoreRel[ease]
and LoadAcq[uire]. The z/Architecture already provides sufficient
ordering guarantees for these operations.

name              old time/op  new time/op  delta
AtomicLoad64-16   0.51ns ± 0%  0.51ns ± 0%     ~     (all equal)
AtomicStore64-16  0.51ns ± 0%  0.60ns ± 9%  +16.47%  (p=0.000 n=17+20)
AtomicLoad-16     0.51ns ± 0%  0.51ns ± 0%     ~     (all equal)
AtomicStore-16    0.51ns ± 0%  0.60ns ± 9%  +16.50%  (p=0.000 n=18+20)

Fixes #32428.

Change-Id: I88d19a4010c46070e4fff4b41587efe4c628d4d9
Reviewed-on: https://go-review.googlesource.com/c/go/+/180439
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-06-06 16:15:43 +00:00
Keith Randall
49200e3f3e Revert "cmd/compile,runtime: allocate defer records on the stack"
This reverts commit fff4f599fe1c21e411a99de5c9b3777d06ce0ce6.

Reason for revert: Seems to still have issues around GC.

Fixes #32452

Change-Id: Ibe7af629f9ad6a3d5312acd7b066123f484da7f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/180761
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2019-06-05 19:50:09 +00:00
Keith Randall
fff4f599fe cmd/compile,runtime: allocate defer records on the stack
When a defer is executed at most once in a function body,
we can allocate the defer record for it on the stack instead
of on the heap.

This should make defers like this (which are very common) faster.

This optimization applies to 363 out of the 370 static defer sites
in the cmd/go binary.

name     old time/op  new time/op  delta
Defer-4  52.2ns ± 5%  36.2ns ± 3%  -30.70%  (p=0.000 n=10+10)

Fixes #6980
Update #14939

Change-Id: I697109dd7aeef9e97a9eeba2ef65ff53d3ee1004
Reviewed-on: https://go-review.googlesource.com/c/go/+/171758
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-06-04 17:35:20 +00:00
Cherry Zhang
c10db03cbe cmd/compile: make sure build works when intrinsics are disabled
Some runtime functions, like getcallerpc/sp, don't have Go or
assembly implementations and have to be intrinsified. Make sure
they are, even if intrinsics are disabled.

This makes "go build -gcflags=all=-d=ssa/intrinsics/off hello.go"
work.

Change-Id: I77caaed7715d3ca7ffef68a3cdc9357f095c6b9f
Reviewed-on: https://go-review.googlesource.com/c/go/+/179897
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-05-31 21:27:59 +00:00
LE Manh Cuong
d0aca5759e cmd/compile: fix doc typo in ssa.go
Change-Id: Ie299a5eca6f6a7c5a37c00ff0de7ce322450375b
Reviewed-on: https://go-review.googlesource.com/c/go/+/178123
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-05-21 14:19:32 +00:00