802 Commits

Author SHA1 Message Date
Xiangdong Ji
e8f5a33191 cmd/compile: fix incorrect rewriting to if condition
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub calculation.

Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.

Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag, in the following categories:

  Block-Op        Meaning                   ARM condition codes
  1. LTnoov        less than                 MI
  2. GEnoov        greater than or equal     PL
  3. LEnoov        less than or equal        MI || EQ
  4. GTnoov        greater than              NE && PL

The backend generates two consecutive branch instructions for 'LEnoov'
and 'GTnoov' to model their expected behavior. A slight change to 'gc'
and amd64/386 backends is made to unify the code generation.

Add a test 'TestCondRewrite' as justification. It covers 32 incorrect rules
identified on arm64; more might be needed on other arches, like 32-bit arm.

Add two benchmarks profiling the aforementioned categories 1&2 and categories
3&4 separately. We expect the first two categories to show a performance
improvement and the second two to show no visible regression compared with
the non-optimized version.

This change also updates TestFormats to support using %#x.

Examples showing where the issue comes from:
  1: 'if x + 3 < 0' might be converted to:
  before:
    CMN $3, R0
    BGE <else branch> // wrong branch is taken if 'x+3' overflows
  after:
    CMN $3, R0
    BPL <else branch>

  2: 'if y - 3 > 0' might be converted to:
  before:
    CMP $3, R0
    BLE <else branch> // wrong branch is taken if 'y-3' underflows
  after:
    CMP $3, R0
    BMI <else branch>
    BEQ <else branch>
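
The first example above can be reproduced in plain Go by modeling the N and V flags by hand. This is only a sketch: cmnFlags, thenTakenOld, thenTakenNew and want are hypothetical names invented for illustration, not compiler code, and the flag model covers just the 64-bit addition that CMN performs.

```go
package main

import (
	"fmt"
	"math"
)

// cmnFlags models the N (negative) and V (signed overflow) flags that
// "CMN $c, Rx" would set, i.e. the flags of the 64-bit addition x + c.
func cmnFlags(x, c int64) (n, v bool) {
	sum := x + c // Go defines signed overflow as wrapping
	n = sum < 0
	// Overflow: both operands have a sign that differs from the sum's sign.
	v = (x^sum)&(c^sum) < 0
	return n, v
}

// thenTakenOld models "CMN $3, R0; BGE else". BGE jumps when N == V,
// so the 'then' branch runs only when N != V.
func thenTakenOld(x int64) bool {
	n, v := cmnFlags(x, 3)
	return n != v
}

// thenTakenNew models "CMN $3, R0; BPL else". BPL jumps when N is clear,
// so the 'then' branch runs only when N is set.
func thenTakenNew(x int64) bool {
	n, _ := cmnFlags(x, 3)
	return n
}

// want is the Go-level condition 'x + 3 < 0' with wrapping arithmetic.
func want(x int64) bool { return x+3 < 0 }

func main() {
	for _, x := range []int64{-10, 0, math.MaxInt64} {
		fmt.Printf("x=%d want=%t old=%t new=%t\n",
			x, want(x), thenTakenOld(x), thenTakenNew(x))
	}
}
```

For inputs that do not overflow, both branch sequences agree with the Go condition. For x = math.MaxInt64 the addition wraps, V is set, and only the BPL form matches.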

Benchmark data from different kinds of arm64 servers. 'old' is the non-optimized
version (not the parent commit); the optimized version generally outperforms it.

S1:
name                    old time/op  new time/op  delta
CondRewrite/SoloJump  13.6ns ± 0%  12.9ns ± 0%  -5.15%  (p=0.000 n=10+10)
CondRewrite/CombJump  13.8ns ± 1%  12.9ns ± 0%  -6.32%  (p=0.000 n=10+10)

S2:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  11.6ns ± 0%  10.9ns ± 0%  -6.03%  (p=0.000 n=10+10)
CondRewrite/CombJump  11.4ns ± 0%  10.8ns ± 1%  -5.53%  (p=0.000 n=10+10)

S3:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  7.36ns ± 0%  7.50ns ± 0%  +1.79%  (p=0.000 n=9+10)
CondRewrite/CombJump  7.35ns ± 0%  7.75ns ± 0%  +5.51%  (p=0.000 n=8+9)

S4:
name                      old time/op  new time/op  delta
CondRewrite/SoloJump-224  11.5ns ± 1%  10.9ns ± 0%  -4.97%  (p=0.000 n=10+10)
CondRewrite/CombJump-224  11.9ns ± 0%  11.5ns ± 0%  -2.95%  (p=0.000 n=10+10)

S5:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  10.0ns ± 0%  10.0ns ± 0%  -0.45%  (p=0.000 n=9+10)
CondRewrite/CombJump  9.93ns ± 0%  9.77ns ± 0%  -1.53%  (p=0.000 n=10+9)

Go1 perf. data:

name                     old time/op    new time/op    delta
BinaryTree17              6.29s ± 1%     6.30s ± 1%    ~     (p=1.000 n=5+5)
Fannkuch11                5.40s ± 0%     5.40s ± 0%    ~     (p=0.841 n=5+5)
FmtFprintfEmpty          97.9ns ± 0%    98.9ns ± 3%    ~     (p=0.937 n=4+5)
FmtFprintfString          171ns ± 3%     171ns ± 2%    ~     (p=0.754 n=5+5)
FmtFprintfInt             212ns ± 0%     217ns ± 6%  +2.55%  (p=0.008 n=5+5)
FmtFprintfIntInt          296ns ± 1%     297ns ± 2%    ~     (p=0.516 n=5+5)
FmtFprintfPrefixedInt     371ns ± 2%     374ns ± 7%    ~     (p=1.000 n=5+5)
FmtFprintfFloat           435ns ± 1%     439ns ± 2%    ~     (p=0.056 n=5+5)
FmtManyArgs              1.37µs ± 1%    1.36µs ± 1%    ~     (p=0.730 n=5+5)
GobDecode                14.6ms ± 4%    14.4ms ± 4%    ~     (p=0.690 n=5+5)
GobEncode                11.8ms ±20%    11.6ms ±15%    ~     (p=1.000 n=5+5)
Gzip                      507ms ± 0%     491ms ± 0%  -3.22%  (p=0.008 n=5+5)
Gunzip                   73.8ms ± 0%    73.9ms ± 0%    ~     (p=0.690 n=5+5)
HTTPClientServer          116µs ± 0%     116µs ± 0%    ~     (p=0.686 n=4+4)
JSONEncode               21.8ms ± 1%    21.6ms ± 2%    ~     (p=0.151 n=5+5)
JSONDecode                104ms ± 1%     103ms ± 1%  -1.08%  (p=0.016 n=5+5)
Mandelbrot200            9.53ms ± 0%    9.53ms ± 0%    ~     (p=0.421 n=5+5)
GoParse                  7.55ms ± 1%    7.51ms ± 1%    ~     (p=0.151 n=5+5)
RegexpMatchEasy0_32       158ns ± 0%     158ns ± 0%    ~     (all equal)
RegexpMatchEasy0_1K       606ns ± 1%     608ns ± 3%    ~     (p=0.937 n=5+5)
RegexpMatchEasy1_32       143ns ± 0%     144ns ± 1%    ~     (p=0.095 n=5+4)
RegexpMatchEasy1_1K       927ns ± 2%     944ns ± 2%    ~     (p=0.056 n=5+5)
RegexpMatchMedium_32     16.0ns ± 0%    16.0ns ± 0%    ~     (all equal)
RegexpMatchMedium_1K     69.3µs ± 2%    69.7µs ± 0%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32       3.73µs ± 0%    3.73µs ± 1%    ~     (p=0.984 n=5+5)
RegexpMatchHard_1K        111µs ± 1%     110µs ± 0%    ~     (p=0.151 n=5+5)
Revcomp                   1.91s ±47%     1.77s ±68%    ~     (p=1.000 n=5+5)
Template                  138ms ± 1%     138ms ± 1%    ~     (p=1.000 n=5+5)
TimeParse                 787ns ± 2%     785ns ± 1%    ~     (p=0.540 n=5+5)
TimeFormat                729ns ± 1%     726ns ± 1%    ~     (p=0.151 n=5+5)

Updates #38740
Change-Id: I06c604874acdc1e63e66452dadee5df053045222
Reviewed-on: https://go-review.googlesource.com/c/go/+/233097
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2020-05-29 15:39:54 +00:00
Austin Clements
601bc41da2 cmd/compile: don't emit stack maps for write barrier calls
These are necessarily deeply non-preemptible, so there's no point in
emitting stack maps for them. We already mark them as unsafe points,
so this only affects the runtime, since user code does not emit stack
maps at unsafe points. SSAGenState.PrepareCall also excludes them when
it's sanity checking call stack maps.

Right now this only drops a handful of unnecessary stack maps from the
runtime, but we're about to start emitting stack maps only at calls
for user code, too. At that point, this will matter much more.

For #36365.

Change-Id: Ib3abfedfddc8e724d933a064fa4d573500627990
Reviewed-on: https://go-review.googlesource.com/c/go/+/230542
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-29 21:29:18 +00:00
Austin Clements
faafdf5115 cmd/compile: fix unsafe-points with stack maps
The compiler currently conflates whether a Value has a stack map with
whether it's an unsafe point. For the most part, unsafe-points don't
have stack maps, so this is mostly fine, but call instructions can be
both an unsafe-point *and* have a stack map. For example, none of the
instructions in a nosplit function should be preemptible, but calls
must still have stack maps in case the called function grows the stack
or gets preempted.

Currently, the compiler can't distinguish this case, so calls in
nosplit functions are marked as safe-points just because they have
stack maps. This is particularly problematic if a nosplit function
calls another nosplit function, since this can introduce a preemption
point where there should be none.

We realized this was a problem for split-stack prologues a while back,
and CL 207349 changed the encoding of unsafe-points to use the
register map index instead of the stack map index so we could record
both a stack map and an unsafe-point at the same instruction. But this
was never extended into the compiler.

This CL fixes this problem in the compiler. We make LivenessIndex
slightly more abstract by separating unsafe-point marks from stack and
register map indexes. We map this to the PCDATA encoding later when
producing Progs. This isn't enough to fix the whole problem for
nosplit functions, because obj still adds prologues and marks those as
preemptible, but it's a step in the right direction.

I checked this CL by comparing maps before and after this change in
the runtime and net/http. In net/http, unsafe-points match exactly; at
anything that isn't an unsafe-point, both the stack and register maps
are unchanged by this CL. In the runtime, at every point that was a
safe-point before this change, the stack maps agree (and mostly the
runtime doesn't have register maps at all now). In both, all CALLs
(except write barrier calls) have stack maps.

For #36365.

Change-Id: I066628938b02e78be5c81a6614295bcf7cc566c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/230541
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-29 21:29:17 +00:00
Josh Bleecher Snyder
f8ff12d480 cmd/compile: use dereference boundedness hint in ssa.addr
Follow-up to (and similar to) CL 228885.
Triggers a handful of times in std+cmd.

Change-Id: Ie04057ca3974ef9eef669335e326a5ed4b7472cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/228999
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-20 17:27:11 +00:00
Josh Bleecher Snyder
4e550bdacd cmd/compile: simplify state.addr
OADDR nodes can't be bounded.
All calls to state.addr thus pass false.
Remove the argument.

Passes toolstash-check.

Change-Id: I9a3fcf37f63b2b5094e043d39ab3b857b5090e91
Reviewed-on: https://go-review.googlesource.com/c/go/+/228788
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-04-20 16:38:31 +00:00
Josh Bleecher Snyder
e8518731be cmd/compile: use dereference boundedness hint during ssa conversion
This has a minor positive effect on generated code,
particularly code using type switches.

Change-Id: I7269769ab0d861ef6fc9e6d7809ffc3573c68340
Reviewed-on: https://go-review.googlesource.com/c/go/+/228885
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-04-20 16:36:22 +00:00
Matthew Dempsky
843453d09e cmd/compile: fix misassumption about n.Left.Bounded()
n.Bounded() is overloaded for multiple meanings based on n.Op. We
can't safely use n.Left.Bounded() without checking n.Left.Op.

Change-Id: I71fe4faa24798dfe3a5705fa3419a35ef93b0ce2
Reviewed-on: https://go-review.googlesource.com/c/go/+/228677
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-04-17 01:07:31 +00:00
Josh Bleecher Snyder
2db4cc38a0 cmd/compile: improve generated code for concrete cases in type switches
Consider

switch x := x.(type) {
case int:
  // int stmts
case error:
  // error stmts
}

Prior to this change, we lowered this roughly as:

if x, ok := x.(int); ok {
  // int stmts
} else if x, ok := x.(error); ok {
  // error stmts
}

x, ok := x.(error) is implemented with a call to runtime.assertE2I2 or runtime.assertI2I2.

x, ok := x.(int) generates inline code that checks whether x has type int,
and populates x and ok as appropriate. We then immediately branch again on ok.
The shortcircuit pass in the SSA backend is designed to recognize situations
like this, in which we are immediately branching on a bool value
that we just calculated with a branch.

However, the shortcircuit pass has limitations when the intermediate state has phis.
In this case, the phi value is x (the int).
CL 222923 improved the situation, but many cases are still unhandled.
I have further improvements in progress, which is how I found this particular problem,
but they are expensive, and may or may not see the light of day.

In the common case of a lone concrete type in a type switch case,
it is easier and cheaper to simply lower a different way, roughly:

if _, ok := x.(int); ok {
  x := x.(int)
  // int stmts
}

Instead of using a type assertion, though, we extract the value of x
from the interface directly.

This removes the need to track x (the int) across the branch on ok,
which removes the phi, which lets the shortcircuit pass do its job.
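
As a source-level sketch of the two lowerings (the compiler works on its own IR, not on Go source), the functions below should behave identically. The names parseErr, viaSwitch, oldLowering and newLowering are made up for illustration. The asserted value is bound to v rather than x, because the rough pseudocode above would not compile as written: the x declared in the first if is still in scope in the else branch.

```go
package main

import "fmt"

// parseErr is a small error type used only for this illustration.
type parseErr string

func (e parseErr) Error() string { return string(e) }

// viaSwitch is the source-level type switch.
func viaSwitch(x interface{}) string {
	switch v := x.(type) {
	case int:
		return fmt.Sprintf("int %d", v)
	case error:
		return "error " + v.Error()
	}
	return "other"
}

// oldLowering mirrors the earlier shape: the int value and ok are both
// produced by the assertion and then carried across the branch on ok.
func oldLowering(x interface{}) string {
	if v, ok := x.(int); ok {
		return fmt.Sprintf("int %d", v)
	} else if v, ok := x.(error); ok {
		return "error " + v.Error()
	}
	return "other"
}

// newLowering mirrors the new shape for a lone concrete type: test the
// type first, then extract the value inside the branch.
func newLowering(x interface{}) string {
	if _, ok := x.(int); ok {
		v := x.(int) // stands in for extracting the value directly from the interface
		return fmt.Sprintf("int %d", v)
	} else if v, ok := x.(error); ok {
		return "error " + v.Error()
	}
	return "other"
}

func main() {
	for _, x := range []interface{}{7, parseErr("boom"), "s"} {
		fmt.Println(viaSwitch(x), "|", oldLowering(x), "|", newLowering(x))
	}
}
```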

Benchmarks for encoding/binary show improvements, as well as some
wild swings on the super fast benchmarks (alignment effects?):

name                      old time/op    new time/op    delta
ReadSlice1000Int32s-8       5.25µs ± 2%    4.87µs ± 3%   -7.11%  (p=0.000 n=44+49)
ReadStruct-8                 451ns ± 2%     417ns ± 2%   -7.39%  (p=0.000 n=45+46)
WriteStruct-8                412ns ± 2%     405ns ± 3%   -1.58%  (p=0.000 n=46+48)
ReadInts-8                   296ns ± 8%     275ns ± 3%   -7.23%  (p=0.000 n=48+50)
WriteInts-8                  324ns ± 1%     318ns ± 2%   -1.67%  (p=0.000 n=44+49)
WriteSlice1000Int32s-8      5.21µs ± 2%    4.92µs ± 1%   -5.67%  (p=0.000 n=46+44)
PutUint16-8                 0.58ns ± 2%    0.59ns ± 2%   +0.63%  (p=0.000 n=49+49)
PutUint32-8                 0.87ns ± 1%    0.58ns ± 1%  -33.10%  (p=0.000 n=46+44)
PutUint64-8                 0.66ns ± 2%    0.87ns ± 2%  +33.07%  (p=0.000 n=47+48)
LittleEndianPutUint16-8     0.86ns ± 2%    0.87ns ± 2%   +0.55%  (p=0.003 n=47+50)
LittleEndianPutUint32-8     0.87ns ± 1%    0.87ns ± 1%     ~     (p=0.547 n=45+47)
LittleEndianPutUint64-8     0.87ns ± 2%    0.87ns ± 1%     ~     (p=0.451 n=46+47)
ReadFloats-8                79.8ns ± 5%    75.9ns ± 2%   -4.83%  (p=0.000 n=50+47)
WriteFloats-8               89.3ns ± 1%    88.9ns ± 1%   -0.48%  (p=0.000 n=46+44)
ReadSlice1000Float32s-8     5.51µs ± 1%    4.87µs ± 2%  -11.74%  (p=0.000 n=47+46)
WriteSlice1000Float32s-8    5.51µs ± 1%    4.93µs ± 1%  -10.60%  (p=0.000 n=48+47)
PutUvarint32-8              25.9ns ± 2%    24.0ns ± 2%   -7.02%  (p=0.000 n=48+50)
PutUvarint64-8              75.1ns ± 1%    61.5ns ± 2%  -18.12%  (p=0.000 n=45+47)
[Geo mean]                  57.3ns         54.3ns        -5.33%

Despite the rarity of type switches, this generates noticeably smaller binaries.

file      before    after     Δ       %
addr2line 4413296   4409200   -4096   -0.093%
api       5982648   5962168   -20480  -0.342%
cgo       4854168   4833688   -20480  -0.422%
compile   19694784  19682560  -12224  -0.062%
cover     5278008   5265720   -12288  -0.233%
doc       4694824   4682536   -12288  -0.262%
fix       3411336   3394952   -16384  -0.480%
link      6721496   6717400   -4096   -0.061%
nm        4371152   4358864   -12288  -0.281%
objdump   4760960   4752768   -8192   -0.172%
pprof     14810820  14790340  -20480  -0.138%
trace     11681076  11668788  -12288  -0.105%
vet       8285464   8244504   -40960  -0.494%
total     115824120 115627576 -196544 -0.170%

Compiler performance is marginally improved (note that go/types has many type switches):

name        old alloc/op      new alloc/op      delta
Template         35.0MB ± 0%       35.0MB ± 0%  +0.09%  (p=0.008 n=5+5)
Unicode          28.5MB ± 0%       28.5MB ± 0%    ~     (p=0.548 n=5+5)
GoTypes           114MB ± 0%        114MB ± 0%  -0.76%  (p=0.008 n=5+5)
Compiler          541MB ± 0%        541MB ± 0%  -0.03%  (p=0.008 n=5+5)
SSA              1.17GB ± 0%       1.17GB ± 0%    ~     (p=0.841 n=5+5)
Flate            21.9MB ± 0%       21.9MB ± 0%    ~     (p=0.421 n=5+5)
GoParser         26.9MB ± 0%       26.9MB ± 0%    ~     (p=0.222 n=5+5)
Reflect          74.6MB ± 0%       74.6MB ± 0%    ~     (p=1.000 n=5+5)
Tar              32.9MB ± 0%       32.8MB ± 0%    ~     (p=0.056 n=5+5)
XML              42.4MB ± 0%       42.1MB ± 0%  -0.77%  (p=0.008 n=5+5)
[Geo mean]       73.2MB            73.1MB       -0.15%

name        old allocs/op     new allocs/op     delta
Template           377k ± 0%         377k ± 0%  +0.06%  (p=0.008 n=5+5)
Unicode            354k ± 0%         354k ± 0%    ~     (p=0.095 n=5+5)
GoTypes           1.31M ± 0%        1.30M ± 0%  -0.73%  (p=0.008 n=5+5)
Compiler          5.44M ± 0%        5.44M ± 0%  -0.04%  (p=0.008 n=5+5)
SSA               11.7M ± 0%        11.7M ± 0%    ~     (p=1.000 n=5+5)
Flate              239k ± 0%         239k ± 0%    ~     (p=1.000 n=5+5)
GoParser           302k ± 0%         302k ± 0%  -0.04%  (p=0.008 n=5+5)
Reflect            977k ± 0%         977k ± 0%    ~     (p=0.690 n=5+5)
Tar                346k ± 0%         346k ± 0%    ~     (p=0.889 n=5+5)
XML                431k ± 0%         430k ± 0%  -0.25%  (p=0.008 n=5+5)
[Geo mean]         806k              806k       -0.10%

For packages with many type switches, this considerably shrinks function text size.
Some examples:

file                                                           before   after    Δ       %
encoding/binary.s                                              30726    29504    -1222   -3.977%
go/printer.s                                                   77597    76005    -1592   -2.052%
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                 65704    63318    -2386   -3.631%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8047     7714     -333    -4.138%

Text size regressions are rare.

Change-Id: Ic10982bbb04876250eaa5bfee97990141ae5fc28
Reviewed-on: https://go-review.googlesource.com/c/go/+/228106
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-14 17:34:31 +00:00
Joel Sing
1eb66be1b9 cmd/compile: enable Sqrt as a compiler intrinsic on riscv64
Change-Id: I829a02ced9aa73b45079e67194186116b39504b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/227805
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-12 16:51:25 +00:00
Keith Randall
ea7126fe14 cmd/compile: use a Sym type instead of interface{} for symbolic offsets
Will help with strongly typed rewrite rules.

Change-Id: Ifbf316a49f4081322b3b8f13bc962713437d9aba
Reviewed-on: https://go-review.googlesource.com/c/go/+/227785
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2020-04-10 16:24:46 +00:00
Keith Randall
28157b3292 cmd/compile: start implementing strongly typed aux and auxint fields
Right now the Aux and AuxInt fields of ssa.Values are typed as
interface{} and int64, respectively. Each rule that uses these values
must cast them to the type they actually are (*obj.LSym, or int32, or
ValAndOff, etc.), use them, and then cast them back to interface{} or
int64.

We know for each opcode what the types of the Aux and AuxInt fields
should be. So let's modify the rule generator to declare the types to
be what we know they should be, autoconverting to and from the generic
types for us. That way we can make the rules more type safe.
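
A rough sketch of the difference, written as ordinary Go rather than generated rewrite code. Everything here is hypothetical: value is a cut-down stand-in for ssa.Value, and the helper names are chosen only for illustration of the kind of conversion a rule generator could emit.

```go
package main

import "fmt"

// value is a cut-down, hypothetical stand-in for ssa.Value.
type value struct {
	Aux    interface{}
	AuxInt int64
}

// Hypothetical conversion helpers between the generic int64 storage
// and the type the opcode is known to use.
func auxIntToInt32(i int64) int32 { return int32(i) }
func int32ToAuxInt(i int32) int64 { return int64(i) }

// addOffsetUntyped is how an untyped rule body looks: it casts the
// generic field to the real type by hand and casts back afterwards.
func addOffsetUntyped(v *value, delta int64) {
	off := int32(v.AuxInt)
	v.AuxInt = int64(off + int32(delta))
}

// addOffsetTyped is the strongly typed shape: the rule logic only sees
// int32, and the conversions happen once at the boundary.
func addOffsetTyped(v *value, delta int32) {
	off := auxIntToInt32(v.AuxInt)
	v.AuxInt = int32ToAuxInt(off + delta)
}

func main() {
	a := &value{AuxInt: 8}
	b := &value{AuxInt: 8}
	addOffsetUntyped(a, 4)
	addOffsetTyped(b, 4)
	fmt.Println(a.AuxInt, b.AuxInt)
}
```

With the typed form, passing a value of the wrong type for delta is a compile error instead of a silent cast.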

It's difficult to make a single CL for this, so I've coopted the "=>"
token to indicate a rule that is strongly typed. "->" rules are
processed as before. That will let us migrate a few rules at a time in
separate CLs.  Hopefully we can reach a state where all rules are
strongly typed and we can drop the distinction.

This CL changes just a few rules to get a feel for what this
transition would look like.

I've decided not to put explicit types in the rules. I think it
makes the rules somewhat clearer, but definitely more verbose.
In particular, the passthrough rules that don't modify the fields
in question are verbose for no real reason.

Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d
Reviewed-on: https://go-review.googlesource.com/c/go/+/190197
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-04-09 21:18:55 +00:00
Josh Bleecher Snyder
376472ddb7 cmd/compile: clean up slice and string offsets/sizes
Minor cleanup:

* Modernize comments.
* Change from int to int64 to avoid conversions.
* Use idiomatic names.

Passes toolstash-check.

Change-Id: I93560c81926c0f4e00f33129cb4846b53bea99e6
Reviewed-on: https://go-review.googlesource.com/c/go/+/227548
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-04-09 01:16:43 +00:00
Josh Bleecher Snyder
b6feb03b24 cmd/compile,runtime: pass only ptr and len to some runtime calls
Some runtime calls accept a slice, but only use ptr and len.
This change modifies most such routines to accept only ptr and len.

After this change, the only runtime calls that accept an unnecessary
cap arg are concatstrings and slicerunetostring.
Neither is particularly common, and both are complicated to modify.

Negligible compiler performance impact. Shrinks binaries a little.
There are only a few regressions; the one I investigated was
due to register allocation fluctuation.

Passes 'go test -race std cmd', modulo #38265 and #38266.
Wow, does that take a long time to run.

Updates #36890

file      before    after     Δ       %       
compile   19655024  19655152  +128    +0.001% 
cover     5244840   5236648   -8192   -0.156% 
dist      3662376   3658280   -4096   -0.112% 
link      6680056   6675960   -4096   -0.061% 
pprof     14789844  14777556  -12288  -0.083% 
test2json 2824744   2820648   -4096   -0.145% 
trace     11647876  11639684  -8192   -0.070% 
vet       8260472   8256376   -4096   -0.050% 
total     115163736 115118808 -44928  -0.039% 

Change-Id: Idb29fa6a81d6a82bfd3b65740b98cf3275ca0a78
Reviewed-on: https://go-review.googlesource.com/c/go/+/227163
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-08 22:19:53 +00:00
Michael Munday
bfd569fcb0 cmd/compile: delete the floating point Greater and Geq ops
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.
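
The identity can be checked at the Go source level by swapping the operands. The helper names greater and geq below are made up for this sketch. NaN is included because floating point comparisons involving NaN are false in either direction, so swapping operands stays correct there too.

```go
package main

import (
	"fmt"
	"math"
)

// greater expresses a > b using only a less-than comparison.
func greater(a, b float64) bool { return b < a }

// geq expresses a >= b using only a less-than-or-equal comparison.
func geq(a, b float64) bool { return b <= a }

func main() {
	vals := []float64{-1, 0, 1, math.Inf(1), math.NaN()}
	for _, a := range vals {
		for _, b := range vals {
			if greater(a, b) != (a > b) || geq(a, b) != (a >= b) {
				fmt.Println("mismatch for", a, b)
				return
			}
		}
	}
	fmt.Println("all comparisons agree")
}
```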

Fixes #37316.

Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-07 19:55:05 +00:00
Bradford Lamson-Scribner
6736b2fdb2 cmd/compile: refactor around HTMLWriter removing logger in favor of Func
Replace HTMLWriter's Logger field with a *Func. Implement Fatalf method
for HTMLWriter which gets the Frontend() from the Func and calls down
into its Fatalf method, passing the msg and args along. Replace
remaining calls to the old Logger with calls to logging methods on
the Func.

Change-Id: I966342ef9997396f3416fb152fa52d60080ebecb
Reviewed-on: https://go-review.googlesource.com/c/go/+/227277
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-05 20:54:32 +00:00
Josh Bleecher Snyder
fff7509d47 cmd/compile: add intrinsic HasCPUFeature for checking cpu features
Before using some CPU instructions, we must check for their presence.
We use global variables in the runtime package to record features.

Prior to this CL, we issued a regular memory load for these features.
The downside to this is that, because it is a regular memory load,
it cannot be hoisted out of loops or otherwise reordered with other loads.

This CL introduces a new intrinsic just for checking cpu features.
It still ends up resulting in a memory load, but that memory load can
now be floated to the entry block and rematerialized as needed.
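
To picture what hoisting buys, here is the transformation done by hand at the source level. This is an analogy only: the compiler performs it on SSA values, and hasFMA is a hypothetical package variable standing in for the runtime's feature flags.

```go
package main

import (
	"fmt"
	"math"
)

// hasFMA is a hypothetical stand-in for a CPU feature flag that is
// recorded in a global variable.
var hasFMA = true

// sumSquaresPerIter reads the flag inside the loop, which is what a
// regular memory load that cannot be reordered amounts to.
func sumSquaresPerIter(xs []float64) float64 {
	acc := 0.0
	for _, x := range xs {
		if hasFMA {
			acc = math.FMA(x, x, acc)
		} else {
			acc += x * x
		}
	}
	return acc
}

// sumSquaresHoisted reads the flag once before the loop, which is the
// effect of floating the load to the entry block.
func sumSquaresHoisted(xs []float64) float64 {
	useFMA := hasFMA
	acc := 0.0
	for _, x := range xs {
		if useFMA {
			acc = math.FMA(x, x, acc)
		} else {
			acc += x * x
		}
	}
	return acc
}

func main() {
	xs := []float64{1, 2, 3}
	fmt.Println(sumSquaresPerIter(xs), sumSquaresHoisted(xs))
}
```

The hand-hoisted version is only equivalent if the flag does not change while the loop runs, which holds for CPU features detected at startup.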

One downside is that the regular load could be combined with the comparison
into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE.
(It is possible that MOVBQZX+TESTQ+NE would be better.)

This CL does only amd64. It is easy to extend to other architectures.

For the benchmark in #36196, on my machine, this offers a mild speedup.

name      old time/op  new time/op  delta
FMA-8     1.39ns ± 6%  1.29ns ± 9%  -7.19%  (p=0.000 n=97+96)
NonFMA-8  2.03ns ±11%  2.04ns ±12%    ~     (p=0.618 n=99+98)

Updates #15808
Updates #36196

Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/212360
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2020-04-04 01:01:04 +00:00
Josh Bleecher Snyder
8114242359 cmd/compile, runtime: use more registers for amd64 write barrier calls
The compiler-inserted write barrier calls use a special ABI
for speed and to minimize the binary size impact.

runtime.gcWriteBarrier takes its args in DI and AX.
This change adds gcWriteBarrier wrapper functions,
varying only in the register used for the second argument.
(Allowing variation in the first argument doesn't offer improvements,
which is convenient, as it avoids quadratic API growth.)
This reduces the number of register copies.

The goal is reduced binary size via reduced register pressure/copies.

One downside to this change is that when the write barrier is on,
we may bounce through several different write barrier wrappers,
which is bad for the instruction cache.

Package runtime write barrier benchmarks for this change:

name                old time/op  new time/op  delta
WriteBarrier-8      16.6ns ± 6%  15.6ns ± 6%  -5.73%  (p=0.000 n=97+99)
BulkWriteBarrier-8  4.37ns ± 7%  4.22ns ± 8%  -3.45%  (p=0.000 n=96+99)

However, I don't particularly trust these numbers.
I ran runtime.BenchmarkWriteBarrier multiple times as I rebased
this change, and noticed that the results have high variance
depending on the parent change, perhaps due to alignment.

This change was stress tested with GOGC=1 GODEBUG=gccheckmark=1 go test std.

This change reduces binary sizes:

file      before    after     Δ       %
addr2line 4308720   4296688   -12032  -0.279%
api       5965592   5945368   -20224  -0.339%
asm       5148088   5025464   -122624 -2.382%
buildid   2848760   2844904   -3856   -0.135%
cgo       4828968   4812840   -16128  -0.334%
compile   19754720  19529744  -224976 -1.139%
cover     5256840   5236600   -20240  -0.385%
dist      3670312   3658264   -12048  -0.328%
doc       4669608   4657576   -12032  -0.258%
fix       3377976   3365944   -12032  -0.356%
link      6614888   6586472   -28416  -0.430%
nm        4258368   4254528   -3840   -0.090%
objdump   4656336   4644304   -12032  -0.258%
pack      2295176   2295432   +256    +0.011%
pprof     14762356  14709364  -52992  -0.359%
test2json 2824456   2820600   -3856   -0.137%
trace     11684404  11643700  -40704  -0.348%
vet       8284760   8252248   -32512  -0.392%
total     115210328 114580040 -630288 -0.547%

This change improves compiler performance:

name        old time/op       new time/op       delta
Template          208ms ± 3%        207ms ± 3%  -0.40%  (p=0.030 n=43+44)
Unicode          80.2ms ± 3%       81.3ms ± 3%  +1.25%  (p=0.000 n=41+44)
GoTypes           699ms ± 3%        694ms ± 2%  -0.71%  (p=0.016 n=42+37)
Compiler          3.26s ± 2%        3.23s ± 2%  -0.86%  (p=0.000 n=43+45)
SSA               6.97s ± 1%        6.93s ± 1%  -0.63%  (p=0.000 n=43+45)
Flate             134ms ± 3%        133ms ± 2%    ~     (p=0.139 n=45+42)
GoParser          165ms ± 2%        164ms ± 1%  -0.79%  (p=0.000 n=45+40)
Reflect           434ms ± 4%        435ms ± 4%    ~     (p=0.937 n=44+44)
Tar               181ms ± 2%        181ms ± 2%    ~     (p=0.702 n=43+45)
XML               244ms ± 2%        244ms ± 2%    ~     (p=0.237 n=45+44)
[Geo mean]        403ms             402ms       -0.29%

name        old user-time/op  new user-time/op  delta
Template          271ms ± 2%        268ms ± 1%  -1.40%  (p=0.000 n=42+42)
Unicode           117ms ± 3%        116ms ± 5%    ~     (p=0.066 n=45+45)
GoTypes           948ms ± 2%        936ms ± 2%  -1.30%  (p=0.000 n=41+40)
Compiler          4.26s ± 1%        4.21s ± 2%  -1.25%  (p=0.000 n=37+45)
SSA               9.52s ± 2%        9.41s ± 1%  -1.18%  (p=0.000 n=44+45)
Flate             167ms ± 2%        165ms ± 2%  -1.15%  (p=0.000 n=44+41)
GoParser          201ms ± 2%        198ms ± 1%  -1.40%  (p=0.000 n=43+43)
Reflect           563ms ± 8%        560ms ± 7%    ~     (p=0.206 n=45+44)
Tar               224ms ± 2%        222ms ± 2%  -0.81%  (p=0.000 n=45+45)
XML               308ms ± 2%        304ms ± 1%  -1.17%  (p=0.000 n=42+43)
[Geo mean]        525ms             519ms       -1.08%

name        old alloc/op      new alloc/op      delta
Template         36.3MB ± 0%       36.3MB ± 0%    ~     (p=0.421 n=5+5)
Unicode          28.4MB ± 0%       28.3MB ± 0%    ~     (p=0.056 n=5+5)
GoTypes           121MB ± 0%        121MB ± 0%  -0.14%  (p=0.008 n=5+5)
Compiler          567MB ± 0%        567MB ± 0%  -0.06%  (p=0.016 n=4+5)
SSA              1.26GB ± 0%       1.26GB ± 0%  -0.07%  (p=0.008 n=5+5)
Flate            22.9MB ± 0%       22.8MB ± 0%    ~     (p=0.310 n=5+5)
GoParser         28.0MB ± 0%       27.9MB ± 0%  -0.09%  (p=0.008 n=5+5)
Reflect          78.4MB ± 0%       78.4MB ± 0%  -0.03%  (p=0.008 n=5+5)
Tar              34.2MB ± 0%       34.2MB ± 0%  -0.05%  (p=0.008 n=5+5)
XML              44.4MB ± 0%       44.4MB ± 0%  -0.04%  (p=0.016 n=5+5)
[Geo mean]       76.4MB            76.3MB       -0.05%

name        old allocs/op     new allocs/op     delta
Template           356k ± 0%         356k ± 0%  -0.13%  (p=0.008 n=5+5)
Unicode            326k ± 0%         326k ± 0%  -0.07%  (p=0.008 n=5+5)
GoTypes           1.24M ± 0%        1.24M ± 0%  -0.24%  (p=0.008 n=5+5)
Compiler          5.30M ± 0%        5.28M ± 0%  -0.34%  (p=0.008 n=5+5)
SSA               11.9M ± 0%        11.9M ± 0%  -0.16%  (p=0.008 n=5+5)
Flate              226k ± 0%         225k ± 0%  -0.12%  (p=0.008 n=5+5)
GoParser           287k ± 0%         286k ± 0%  -0.29%  (p=0.008 n=5+5)
Reflect            930k ± 0%         929k ± 0%  -0.05%  (p=0.008 n=5+5)
Tar                332k ± 0%         331k ± 0%  -0.12%  (p=0.008 n=5+5)
XML                411k ± 0%         411k ± 0%  -0.12%  (p=0.008 n=5+5)
[Geo mean]         771k              770k       -0.16%

For some packages, this change significantly reduces the size of executable text.
Examples:

file                                   before   after    Δ       %
cmd/internal/obj/arm.s                 68658    66855    -1803   -2.626%
cmd/internal/obj/mips.s                57486    56272    -1214   -2.112%
cmd/internal/obj/arm64.s               152107   147163   -4944   -3.250%
cmd/internal/obj/ppc64.s               125544   120456   -5088   -4.053%
cmd/vendor/golang.org/x/tools/go/cfg.s 31699    30742    -957    -3.019%

Full listing:

file                                                                     before   after    Δ       %
container/ring.s                                                         1890     1870     -20     -1.058%
container/list.s                                                         5366     5390     +24     +0.447%
internal/cpu.s                                                           3298     3295     -3      -0.091%
internal/testlog.s                                                       1507     1501     -6      -0.398%
image/color.s                                                            8281     8248     -33     -0.399%
runtime.s                                                                480970   480075   -895    -0.186%
sync.s                                                                   16497    16408    -89     -0.539%
internal/singleflight.s                                                  2591     2577     -14     -0.540%
math/rand.s                                                              10456    10438    -18     -0.172%
cmd/go/internal/par.s                                                    2801     2790     -11     -0.393%
internal/reflectlite.s                                                   28477    28417    -60     -0.211%
errors.s                                                                 2750     2736     -14     -0.509%
internal/oserror.s                                                       446      434      -12     -2.691%
sort.s                                                                   17061    17046    -15     -0.088%
io.s                                                                     17063    16999    -64     -0.375%
vendor/golang.org/x/crypto/hkdf.s                                        1962     1936     -26     -1.325%
text/tabwriter.s                                                         9617     9574     -43     -0.447%
hash/crc64.s                                                             3414     3408     -6      -0.176%
hash/crc32.s                                                             6657     6651     -6      -0.090%
bytes.s                                                                  31932    31863    -69     -0.216%
strconv.s                                                                53158    52799    -359    -0.675%
strings.s                                                                42829    42665    -164    -0.383%
encoding/ascii85.s                                                       4833     4791     -42     -0.869%
vendor/golang.org/x/text/transform.s                                     16810    16724    -86     -0.512%
path.s                                                                   6848     6845     -3      -0.044%
encoding/base32.s                                                        9658     9592     -66     -0.683%
bufio.s                                                                  23051    22908    -143    -0.620%
compress/bzip2.s                                                         11773    11764    -9      -0.076%
image.s                                                                  37565    37502    -63     -0.168%
syscall.s                                                                82359    82279    -80     -0.097%
regexp/syntax.s                                                          83573    82930    -643    -0.769%
image/jpeg.s                                                             36535    36490    -45     -0.123%
regexp.s                                                                 64396    64214    -182    -0.283%
time.s                                                                   82724    82622    -102    -0.123%
plugin.s                                                                 6539     6536     -3      -0.046%
context.s                                                                10959    10865    -94     -0.858%
internal/poll.s                                                          24286    24270    -16     -0.066%
reflect.s                                                                168304   167927   -377    -0.224%
internal/fmtsort.s                                                       7416     7376     -40     -0.539%
os.s                                                                     52465    51787    -678    -1.292%
cmd/go/internal/lockedfile/internal/filelock.s                           2326     2317     -9      -0.387%
os/signal.s                                                              4657     4648     -9      -0.193%
runtime/debug.s                                                          6040     5998     -42     -0.695%
encoding/binary.s                                                        30838    30801    -37     -0.120%
vendor/golang.org/x/net/route.s                                          23694    23491    -203    -0.857%
path/filepath.s                                                          17895    17889    -6      -0.034%
cmd/vendor/golang.org/x/sys/unix.s                                       78125    78109    -16     -0.020%
io/ioutil.s                                                              6999     6996     -3      -0.043%
encoding/base64.s                                                        12094    12007    -87     -0.719%
crypto/cipher.s                                                          20466    20372    -94     -0.459%
cmd/go/internal/robustio.s                                               2672     2669     -3      -0.112%
encoding/pem.s                                                           9302     9286     -16     -0.172%
internal/obscuretestdata.s                                               1719     1695     -24     -1.396%
crypto/aes.s                                                             11014    11002    -12     -0.109%
os/exec.s                                                                29388    29231    -157    -0.534%
cmd/internal/browser.s                                                   2266     2260     -6      -0.265%
internal/goroot.s                                                        4601     4592     -9      -0.196%
vendor/golang.org/x/crypto/chacha20poly1305.s                            8945     8942     -3      -0.034%
cmd/vendor/golang.org/x/crypto/ssh/terminal.s                            27226    27195    -31     -0.114%
index/suffixarray.s                                                      36431    36411    -20     -0.055%
fmt.s                                                                    77017    76709    -308    -0.400%
encoding/hex.s                                                           6241     6154     -87     -1.394%
compress/lzw.s                                                           7133     7069     -64     -0.897%
database/sql/driver.s                                                    18888    18877    -11     -0.058%
net/url.s                                                                29838    29739    -99     -0.332%
debug/plan9obj.s                                                         8329     8279     -50     -0.600%
encoding/csv.s                                                           12986    12902    -84     -0.647%
debug/gosym.s                                                            25403    25330    -73     -0.287%
compress/flate.s                                                         51192    50970    -222    -0.434%
vendor/golang.org/x/net/dns/dnsmessage.s                                 86769    86208    -561    -0.647%
compress/gzip.s                                                          9791     9758     -33     -0.337%
compress/zlib.s                                                          7310     7277     -33     -0.451%
archive/zip.s                                                            42356    42166    -190    -0.449%
debug/dwarf.s                                                            108259   107730   -529    -0.489%
encoding/json.s                                                          106378   105910   -468    -0.440%
os/user.s                                                                14751    14724    -27     -0.183%
database/sql.s                                                           99011    98404    -607    -0.613%
log.s                                                                    9466     9423     -43     -0.454%
debug/pe.s                                                               31272    31182    -90     -0.288%
debug/macho.s                                                            32764    32608    -156    -0.476%
encoding/gob.s                                                           136976   136517   -459    -0.335%
vendor/golang.org/x/text/unicode/bidi.s                                  27318    27276    -42     -0.154%
archive/tar.s                                                            71416    70975    -441    -0.618%
vendor/golang.org/x/net/http2/hpack.s                                    23892    23848    -44     -0.184%
vendor/golang.org/x/text/secure/bidirule.s                               3354     3351     -3      -0.089%
mime/quotedprintable.s                                                   5960     5925     -35     -0.587%
net/http/internal.s                                                      5874     5853     -21     -0.358%
math/big.s                                                               184147   183692   -455    -0.247%
debug/elf.s                                                              63775    63567    -208    -0.326%
mime.s                                                                   39802    39709    -93     -0.234%
encoding/xml.s                                                           111038   110713   -325    -0.293%
crypto/dsa.s                                                             6044     6029     -15     -0.248%
go/token.s                                                               12139    12077    -62     -0.511%
crypto/rand.s                                                            6889     6866     -23     -0.334%
go/scanner.s                                                             19030    19008    -22     -0.116%
flag.s                                                                   22320    22236    -84     -0.376%
vendor/golang.org/x/text/unicode/norm.s                                  66652    66391    -261    -0.392%
crypto/rsa.s                                                             31671    31650    -21     -0.066%
crypto/elliptic.s                                                        51553    51403    -150    -0.291%
internal/xcoff.s                                                         22950    22822    -128    -0.558%
go/constant.s                                                            43750    43689    -61     -0.139%
encoding/asn1.s                                                          57086    57035    -51     -0.089%
runtime/trace.s                                                          2609     2603     -6      -0.230%
crypto/x509/pkix.s                                                       10458    10471    +13     +0.124%
image/gif.s                                                              27544    27385    -159    -0.577%
vendor/golang.org/x/net/idna.s                                           24558    24502    -56     -0.228%
image/png.s                                                              42775    42685    -90     -0.210%
vendor/golang.org/x/crypto/cryptobyte.s                                  33616    33493    -123    -0.366%
go/ast.s                                                                 80684    80449    -235    -0.291%
net/internal/socktest.s                                                  16571    16535    -36     -0.217%
crypto/ecdsa.s                                                           11948    11936    -12     -0.100%
text/template/parse.s                                                    95138    94002    -1136   -1.194%
runtime/pprof.s                                                          59702    59639    -63     -0.106%
testing.s                                                                68427    68088    -339    -0.495%
internal/testenv.s                                                       5620     5596     -24     -0.427%
testing/internal/testdeps.s                                              3312     3294     -18     -0.543%
internal/trace.s                                                         78473    78239    -234    -0.298%
testing/iotest.s                                                         4968     4908     -60     -1.208%
os/signal/internal/pty.s                                                 3011     2990     -21     -0.697%
testing/quick.s                                                          12179    12125    -54     -0.443%
cmd/internal/bio.s                                                       9286     9274     -12     -0.129%
cmd/internal/src.s                                                       17684    17663    -21     -0.119%
cmd/internal/goobj2.s                                                    12588    12558    -30     -0.238%
cmd/internal/objabi.s                                                    16408    16390    -18     -0.110%
go/printer.s                                                             77417    77308    -109    -0.141%
go/parser.s                                                              80045    79113    -932    -1.164%
go/format.s                                                              5434     5419     -15     -0.276%
cmd/internal/goobj.s                                                     26146    25954    -192    -0.734%
runtime/pprof/internal/profile.s                                         102518   102178   -340    -0.332%
text/template.s                                                          95343    94935    -408    -0.428%
cmd/internal/dwarf.s                                                     31718    31572    -146    -0.460%
cmd/vendor/golang.org/x/arch/arm/armasm.s                                45240    45151    -89     -0.197%
internal/lazytemplate.s                                                  1470     1457     -13     -0.884%
cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s                            37253    37220    -33     -0.089%
cmd/asm/internal/flags.s                                                 2593     2590     -3      -0.116%
cmd/asm/internal/lex.s                                                   25068    24921    -147    -0.586%
cmd/internal/buildid.s                                                   18536    18263    -273    -1.473%
cmd/vendor/golang.org/x/arch/x86/x86asm.s                                80209    80105    -104    -0.130%
go/doc.s                                                                 75140    74585    -555    -0.739%
cmd/internal/edit.s                                                      3893     3899     +6      +0.154%
html/template.s                                                          89377    88809    -568    -0.636%
cmd/vendor/golang.org/x/arch/arm64/arm64asm.s                            117998   117824   -174    -0.147%
cmd/internal/obj.s                                                       115015   114290   -725    -0.630%
go/build.s                                                               69379    68862    -517    -0.745%
cmd/internal/objfile.s                                                   48106    47982    -124    -0.258%
cmd/cover.s                                                              46239    46113    -126    -0.272%
cmd/addr2line.s                                                          2845     2833     -12     -0.422%
cmd/internal/obj/arm.s                                                   68658    66855    -1803   -2.626%
cmd/internal/obj/mips.s                                                  57486    56272    -1214   -2.112%
cmd/internal/obj/riscv.s                                                 63834    63006    -828    -1.297%
cmd/compile/internal/syntax.s                                            146582   145456   -1126   -0.768%
cmd/internal/obj/wasm.s                                                  44117    44066    -51     -0.116%
cmd/cgo.s                                                                242645   241653   -992    -0.409%
cmd/internal/obj/arm64.s                                                 152107   147163   -4944   -3.250%
net.s                                                                    295972   292010   -3962   -1.339%
go/types.s                                                               321371   319432   -1939   -0.603%
vendor/golang.org/x/net/http/httpproxy.s                                 9450     9423     -27     -0.286%
net/textproto.s                                                          19455    19406    -49     -0.252%
cmd/internal/obj/ppc64.s                                                 125544   120456   -5088   -4.053%
go/internal/srcimporter.s                                                6475     6409     -66     -1.019%
log/syslog.s                                                             8017     7929     -88     -1.098%
cmd/compile/internal/logopt.s                                            10183    10162    -21     -0.206%
net/mail.s                                                               24085    23948    -137    -0.569%
mime/multipart.s                                                         21527    21420    -107    -0.497%
cmd/internal/obj/s390x.s                                                 127610   127757   +147    +0.115%
go/internal/gcimporter.s                                                 34913    34548    -365    -1.045%
vendor/golang.org/x/net/nettest.s                                        28103    28016    -87     -0.310%
cmd/go/internal/cfg.s                                                    9967     9916     -51     -0.512%
cmd/api.s                                                                39703    39603    -100    -0.252%
go/internal/gccgoimporter.s                                              56470    56120    -350    -0.620%
go/importer.s                                                            2077     2056     -21     -1.011%
cmd/compile/internal/types.s                                             48202    47282    -920    -1.909%
cmd/go/internal/str.s                                                    4341     4320     -21     -0.484%
cmd/internal/obj/x86.s                                                   89440    88625    -815    -0.911%
cmd/go/internal/base.s                                                   12667    12580    -87     -0.687%
cmd/go/internal/cache.s                                                  30754    30571    -183    -0.595%
cmd/doc.s                                                                62976    62755    -221    -0.351%
cmd/go/internal/search.s                                                 20114    19993    -121    -0.602%
cmd/vendor/golang.org/x/xerrors.s                                        17923    17855    -68     -0.379%
cmd/go/internal/lockedfile.s                                             16451    16415    -36     -0.219%
cmd/vendor/golang.org/x/mod/sumdb/note.s                                 18200    18150    -50     -0.275%
cmd/vendor/golang.org/x/mod/module.s                                     17869    17851    -18     -0.101%
cmd/asm/internal/arch.s                                                  37533    37482    -51     -0.136%
cmd/fix.s                                                                87728    87492    -236    -0.269%
cmd/vendor/golang.org/x/mod/sumdb/tlog.s                                 36394    36367    -27     -0.074%
cmd/vendor/golang.org/x/mod/sumdb/dirhash.s                              4990     4963     -27     -0.541%
cmd/go/internal/imports.s                                                16499    16469    -30     -0.182%
cmd/vendor/golang.org/x/mod/zip.s                                        18816    18745    -71     -0.377%
cmd/go/internal/cmdflag.s                                                5126     5123     -3      -0.059%
cmd/internal/test2json.s                                                 9540     9452     -88     -0.922%
cmd/go/internal/tool.s                                                   3629     3623     -6      -0.165%
cmd/go/internal/version.s                                                11232    11220    -12     -0.107%
cmd/go/internal/mvs.s                                                    25383    25179    -204    -0.804%
cmd/nm.s                                                                 5815     5803     -12     -0.206%
cmd/dist.s                                                               210146   209140   -1006   -0.479%
cmd/asm/internal/asm.s                                                   68655    68549    -106    -0.154%
cmd/vendor/golang.org/x/mod/modfile.s                                    72974    72510    -464    -0.636%
cmd/go/internal/load.s                                                   107548   106861   -687    -0.639%
cmd/link/internal/sym.s                                                  18708    18581    -127    -0.679%
cmd/asm.s                                                                3367     3343     -24     -0.713%
cmd/gofmt.s                                                              30795    30698    -97     -0.315%
cmd/link/internal/objfile.s                                              21828    21630    -198    -0.907%
cmd/pack.s                                                               14878    14869    -9      -0.060%
cmd/vendor/github.com/google/pprof/internal/elfexec.s                    6788     6782     -6      -0.088%
cmd/test2json.s                                                          1647     1641     -6      -0.364%
cmd/link/internal/loader.s                                               48677    48483    -194    -0.399%
cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s       16783    16773    -10     -0.060%
cmd/link/internal/loadelf.s                                              35464    35126    -338    -0.953%
cmd/link/internal/loadmacho.s                                            29438    29180    -258    -0.876%
cmd/link/internal/loadpe.s                                               16440    16371    -69     -0.420%
cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2106     2100     -6      -0.285%
cmd/link/internal/loadxcoff.s                                            11711    11615    -96     -0.820%
cmd/vendor/golang.org/x/tools/go/analysis/internal/facts.s               14954    14883    -71     -0.475%
cmd/vendor/golang.org/x/tools/go/ast/inspector.s                         5394     5374     -20     -0.371%
cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s               37029    36822    -207    -0.559%
cmd/vendor/golang.org/x/tools/go/analysis/passes/inspect.s               340      337      -3      -0.882%
cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s               9919     9858     -61     -0.615%
cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s                 6705     6690     -15     -0.224%
cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s              9783     9741     -42     -0.429%
cmd/vendor/golang.org/x/tools/go/cfg.s                                   31699    30742    -957    -3.019%
cmd/vendor/golang.org/x/tools/go/analysis/passes/ifaceassert.s           2768     2762     -6      -0.217%
cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s           3031     2998     -33     -1.089%
cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s                 4382     4376     -6      -0.137%
cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s            8654     8642     -12     -0.139%
cmd/vendor/golang.org/x/tools/go/analysis/passes/stringintconv.s         3458     3446     -12     -0.347%
cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s             8011     7995     -16     -0.200%
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s                 6205     6193     -12     -0.193%
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                           66183    65861    -322    -0.487%
cmd/vendor/github.com/google/pprof/profile.s                             150844   150261   -583    -0.386%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s           8057     8054     -3      -0.037%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s          3670     3667     -3      -0.082%
cmd/vendor/github.com/google/pprof/internal/measurement.s                10464    10440    -24     -0.229%
cmd/vendor/golang.org/x/tools/go/types/typeutil.s                        12319    12274    -45     -0.365%
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s                  13503    13342    -161    -1.192%
cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s              5261     5218     -43     -0.817%
cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s              1462     1459     -3      -0.205%
cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s            9594     9582     -12     -0.125%
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s                34397    34338    -59     -0.172%
cmd/vendor/github.com/google/pprof/internal/graph.s                      53225    52936    -289    -0.543%
cmd/vendor/github.com/ianlancetaylor/demangle.s                          177450   175329   -2121   -1.195%
crypto/x509.s                                                            147892   147388   -504    -0.341%
cmd/go/internal/work.s                                                   306465   304950   -1515   -0.494%
cmd/go/internal/run.s                                                    4664     4657     -7      -0.150%
crypto/tls.s                                                             313130   311833   -1297   -0.414%
net/http/httptrace.s                                                     3979     3905     -74     -1.860%
net/smtp.s                                                               14413    14344    -69     -0.479%
cmd/link/internal/ld.s                                                   545343   542279   -3064   -0.562%
cmd/link/internal/mips.s                                                 6218     6215     -3      -0.048%
cmd/link/internal/mips64.s                                               6108     6103     -5      -0.082%
cmd/link/internal/amd64.s                                                18154    18112    -42     -0.231%
cmd/link/internal/arm64.s                                                22527    22494    -33     -0.146%
cmd/link/internal/arm.s                                                  22574    22494    -80     -0.354%
cmd/link/internal/s390x.s                                                20779    20746    -33     -0.159%
cmd/link/internal/wasm.s                                                 16531    16493    -38     -0.230%
cmd/link/internal/x86.s                                                  18906    18849    -57     -0.301%
cmd/link/internal/ppc64.s                                                26856    26778    -78     -0.290%
net/http.s                                                               559101   556513   -2588   -0.463%
net/http/cookiejar.s                                                     15912    15885    -27     -0.170%
expvar.s                                                                 9531     9525     -6      -0.063%
net/http/httptest.s                                                      16616    16475    -141    -0.849%
net/http/cgi.s                                                           23624    23458    -166    -0.703%
cmd/go/internal/web.s                                                    16546    16489    -57     -0.344%
cmd/vendor/golang.org/x/mod/sumdb.s                                      33197    33117    -80     -0.241%
net/http/fcgi.s                                                          19266    19169    -97     -0.503%
net/http/httputil.s                                                      39875    39728    -147    -0.369%
cmd/vendor/github.com/google/pprof/internal/symbolz.s                    5888     5867     -21     -0.357%
net/rpc.s                                                                34154    34003    -151    -0.442%
cmd/vendor/github.com/google/pprof/internal/transport.s                  2746     2716     -30     -1.092%
cmd/vendor/github.com/google/pprof/internal/binutils.s                   35999    35875    -124    -0.344%
net/rpc/jsonrpc.s                                                        6637     6598     -39     -0.588%
cmd/vendor/github.com/google/pprof/internal/symbolizer.s                 11533    11458    -75     -0.650%
cmd/go/internal/get.s                                                    62921    62803    -118    -0.188%
cmd/vendor/github.com/google/pprof/internal/report.s                     80364    80058    -306    -0.381%
cmd/go/internal/modfetch/codehost.s                                      89680    89066    -614    -0.685%
cmd/trace.s                                                              117171   116701   -470    -0.401%
cmd/vendor/github.com/google/pprof/internal/driver.s                     144268   143297   -971    -0.673%
cmd/go/internal/modfetch.s                                               126299   125860   -439    -0.348%
cmd/vendor/github.com/google/pprof/driver.s                              9042     9000     -42     -0.464%
cmd/go/internal/modconv.s                                                17947    17889    -58     -0.323%
cmd/pprof.s                                                              12399    12326    -73     -0.589%
cmd/go/internal/modload.s                                                151182   150389   -793    -0.525%
cmd/go/internal/generate.s                                               11738    11636    -102    -0.869%
cmd/go/internal/help.s                                                   6571     6531     -40     -0.609%
cmd/go/internal/clean.s                                                  11174    11142    -32     -0.286%
cmd/go/internal/vet.s                                                    7897     7867     -30     -0.380%
cmd/go/internal/envcmd.s                                                 22176    22095    -81     -0.365%
cmd/go/internal/list.s                                                   15216    15067    -149    -0.979%
cmd/go/internal/modget.s                                                 38698    38519    -179    -0.463%
cmd/go/internal/modcmd.s                                                 46674    46441    -233    -0.499%
cmd/go/internal/test.s                                                   64664    64456    -208    -0.322%
cmd/go.s                                                                 6730     6703     -27     -0.401%
cmd/compile/internal/ssa.s                                               3592565  3582500  -10065  -0.280%
cmd/compile/internal/gc.s                                                1549123  1537123  -12000  -0.775%
cmd/compile/internal/riscv64.s                                           14579    14483    -96     -0.658%
cmd/compile/internal/mips.s                                              20578    20419    -159    -0.773%
cmd/compile/internal/ppc64.s                                             25524    25359    -165    -0.646%
cmd/compile/internal/mips64.s                                            19795    19636    -159    -0.803%
cmd/compile/internal/wasm.s                                              13329    13290    -39     -0.293%
cmd/compile/internal/s390x.s                                             28097    27892    -205    -0.730%
cmd/compile/internal/arm.s                                               31489    31321    -168    -0.534%
cmd/compile/internal/arm64.s                                             29803    29590    -213    -0.715%
cmd/compile/internal/amd64.s                                             32961    33221    +260    +0.789%
cmd/compile/internal/x86.s                                               31029    30878    -151    -0.487%
total                                                                    18534966 18440341 -94625  -0.511%
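
The Δ and % columns above appear to be derived as after minus before, and that difference as a percentage of the before size. A minimal Go sketch of that arithmetic, using three rows copied from the listing (the helper name sizeDelta is hypothetical and not part of the change):

```go
package main

import "fmt"

// sizeDelta is a hypothetical helper: it returns the absolute change in
// bytes and the change relative to the "before" size, in percent.
func sizeDelta(before, after int) (delta int, pct float64) {
	delta = after - before
	pct = 100 * float64(delta) / float64(before)
	return delta, pct
}

func main() {
	// Rows copied from the listing above.
	rows := []struct {
		file          string
		before, after int
	}{
		{"cmd/internal/obj/arm.s", 68658, 66855},
		{"cmd/internal/obj/ppc64.s", 125544, 120456},
		{"total", 18534966, 18440341},
	}
	for _, r := range rows {
		d, p := sizeDelta(r.before, r.after)
		fmt.Printf("%-26s %9d %9d %+7d %+.3f%%\n", r.file, r.before, r.after, d, p)
	}
}
```

Dividing by the before size (rather than the after size) is what reproduces the percentages shown in the table.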

Change-Id: I830d37364f14f0297800adc42c99f60a74c51aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/226367
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-31 21:26:33 +00:00
Joel Sing
efb0ac4ce6 cmd/compile: provide Add/Cas/Exchange atomic intrinsics on riscv64
Provide Add32, Add64, Cas32, Cas64, Exchange32 and Exchange64 atomic
intrinsics on riscv64.

Updates #36765

Change-Id: I9a3b7d2ce3d49f699171fd76a0fed891d149a6bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/223559
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-25 00:06:40 +00:00
Joel Sing
ade988623e cmd/compile: provide Load32/Load64/Store32/Store64 atomic intrinsics on riscv64
Updates #36765

Change-Id: Id5ce5c5f60112e4f4cf9eec1b1ec120994934950
Reviewed-on: https://go-review.googlesource.com/c/go/+/223558
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-24 14:21:50 +00:00
Michael Anthony Knyszek
8c30971da6 cmd/compile: panic if trying to alias an intrinsic with no definitions
Currently if we try to alias an intrinsic which hasn't been defined for
any architecture (such as by accidentally creating the alias before the
intrinsic is created with addF), then we'll just silently not apply any
intrinsics to those aliases.

Catch this particular case by panicking in alias if we try to apply the
alias and it did nothing.
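
The failure mode can be sketched with a toy registry. This is a hypothetical model, not the compiler's real data structures: the map layout, the string stand-ins for SSA builders, the panic text and the helper aliasPanics are assumptions made for illustration. Only the names addF and alias come from the message above, and the package and function names passed in are placeholders.

```go
package main

import "fmt"

// key identifies a function on one architecture (hypothetical layout).
type key struct{ arch, pkg, fn string }

// intrinsics maps a function to a stand-in for its SSA builder.
var intrinsics = map[key]string{}

// addF registers an intrinsic for each listed architecture.
func addF(pkg, fn, builder string, archs ...string) {
	for _, a := range archs {
		intrinsics[key{a, pkg, fn}] = builder
	}
}

// alias copies the definitions of pkg2.fn2 to pkg.fn. It panics when
// nothing was copied, for example because it ran before addF.
func alias(pkg, fn, pkg2, fn2 string, archs ...string) {
	aliased := false
	for _, a := range archs {
		if b, ok := intrinsics[key{a, pkg2, fn2}]; ok {
			intrinsics[key{a, pkg, fn}] = b
			aliased = true
		}
	}
	if !aliased {
		panic(fmt.Sprintf("attempted to alias undefined intrinsic: %s.%s", pkg2, fn2))
	}
}

// aliasPanics reports whether alias panics for the given arguments.
func aliasPanics(pkg, fn, pkg2, fn2 string, archs ...string) (panicked bool) {
	defer func() { panicked = recover() != nil }()
	alias(pkg, fn, pkg2, fn2, archs...)
	return false
}

func main() {
	// Alias declared too early: no definition exists yet, so alias panics.
	fmt.Println(aliasPanics("runtime/internal/sys", "Len64", "math/bits", "Len64", "amd64"))
	addF("math/bits", "Len64", "len64-builder", "amd64")
	// Declared after addF, the alias picks up the definition.
	fmt.Println(aliasPanics("runtime/internal/sys", "Len64", "math/bits", "Len64", "amd64"))
}
```

Without the panic, the first call would return quietly and the alias would simply never take effect, which is the silent behaviour the message describes.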

Change-Id: I98e75fc3f7206b08fc9267cedb8db3e109ec4f5d
Reviewed-on: https://go-review.googlesource.com/c/go/+/224637
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-23 16:01:20 +00:00
Michael Anthony Knyszek
830ee4792e cmd/compile: declare runtime bit func aliases after math/bits intrinsics
Currently runtime/internal/sys bit-manipulation functions are aliased to
math/bits functions, which are intrinsified. Unfortunately these aliases
are declared before the intrinsified versions are generated, resulting
in the generic version of the code being copied over.

This change moves the aliases for bit operations in runtime/internal/sys
after the addF calls to generate those intrinsics in SSA, so that the
intrinsified SSA representation of those functions actually gets copied
over.

This should improve the overall performance of the runtime (especially
the page allocator) since these bit operations will actually be
intrinsified now.

Change-Id: I4377da13f9a7bb6aee608e50df0297148bf8f806
Reviewed-on: https://go-review.googlesource.com/c/go/+/224437
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-23 15:52:57 +00:00
Joel Sing
2e918c3aab cmd/compile: provide Load8/Store8 atomic intrinsics on riscv64
Updates #36765

Change-Id: Ieeb6bbc54e4841a1348ad50e80342ec4bc675e07
Reviewed-on: https://go-review.googlesource.com/c/go/+/223557
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-17 06:38:32 +00:00
Russ Cox
877ef86bec cmd/compile: add spectre mitigation mode enabled by -spectre
This commit adds a new cmd/compile flag -spectre,
which accepts a comma-separated list of possible
Spectre mitigations to apply, or the empty string (none),
or "all". The only known mitigation right now is "index",
which uses conditional moves to ensure that x86-64 CPUs
do not speculate past index bounds checks.

Speculating past index bounds checks may be problematic
on systems running privileged servers that accept requests
from untrusted users who can execute their own programs
on the same machine. (And some more constraints that
make it even more unlikely in practice.)

The cases this protects against are analogous to the ones
Microsoft explains in the "Array out of bounds load/store feeding ..."
sections here:
https://docs.microsoft.com/en-us/cpp/security/developer-guidance-speculative-execution?view=vs-2019#array-out-of-bounds-load-feeding-an-indirect-branch
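
As a source-level illustration of the idea behind the "index" mitigation, the sketch below masks an index so that an out-of-range value becomes 0 without relying on the outcome of a branch. It is only an analogy: the real mitigation is applied by the compiler to generated code using conditional moves, and the helpers maskIndex and load are hypothetical names. Nothing here guarantees what machine code the Go compiler emits for this source.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maskIndex returns i when 0 <= i < n and 0 otherwise, computed
// arithmetically rather than by branching on the comparison.
func maskIndex(i, n int) int {
	// borrow is 1 exactly when uint(i) < uint(n). A negative i converts
	// to a very large unsigned value, so it yields borrow == 0.
	_, borrow := bits.Sub(uint(i), uint(n), 0)
	mask := -int(borrow) // all ones when in bounds, zero otherwise
	return i & mask
}

// load returns s[i], or 0 when i is out of range. The masked index
// stays inside the slice even if the check above were bypassed.
func load(s []byte, i int) byte {
	if uint(i) >= uint(len(s)) {
		return 0
	}
	return s[maskIndex(i, len(s))]
}

func main() {
	s := []byte{10, 20, 30}
	fmt.Println(load(s, 1), load(s, 7), load(s, -1))
}
```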

Change-Id: Ib7532d7e12466b17e04c4e2075c2a456dc98f610
Reviewed-on: https://go-review.googlesource.com/c/go/+/222660
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-13 19:05:46 +00:00
Cuong Manh Le
7e1028a9ff cmd/compile: avoid range over copy of array
Passes toolstash-check.

Slightly reduce compiler binary size:

file    before    after     Δ       %
compile 21087288  21070776  -16512  -0.078%
total   131847020 131830508 -16512  -0.013%

file                      before    after     Δ       %
cmd/compile/internal/gc.a 9007472   8999640   -7832   -0.087%
total                     127117794 127109962 -7832   -0.006%
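
For context, the language rule behind the title can be shown in a few lines. When both iteration variables are used, ranging over an array value iterates over a copy of the array, while ranging over a pointer to the array does not. The functions below are illustrative only and are not taken from the change itself.

```go
package main

import "fmt"

// sumByValue ranges over the array value. The range expression is
// evaluated once into a copy, so the write to a[2] is not observed.
func sumByValue(a [3]int) int {
	s := 0
	for _, v := range a {
		a[2] = 100
		s += v
	}
	return s
}

// sumByPointer ranges over a pointer to the array, so no copy is made
// and the write to a[2] is visible to later iterations.
func sumByPointer(a [3]int) int {
	s := 0
	for _, v := range &a {
		a[2] = 100
		s += v
	}
	return s
}

func main() {
	a := [3]int{1, 2, 3}
	fmt.Println(sumByValue(a), sumByPointer(a))
}
```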

Change-Id: I4aadd68d0a7545770598bed9d3a4d05899b67b52
Reviewed-on: https://go-review.googlesource.com/c/go/+/205777
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-06 23:24:28 +00:00
Josh Bleecher Snyder
7b0b6c2f7e cmd/compile: simplify converted SSA form for 'if false'
The goal here is to make it easier for a human to
examine the SSA when a function contains lots of dead code.

No significant compiler metric or generated code differences.

Change-Id: I81915fa4639bc8820cc9a5e45e526687d0d1f57a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221791
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-03 18:42:30 +00:00
Meng Zhuo
ab7ecea0c8 cmd/compile: add intrinsics for runtime/internal/math on MIPS64x
name              old time/op  new time/op  delta
MulUintptr/small  8.42ns ± 0%  5.93ns ± 0%  -29.66%  (p=0.000 n=9+10)
MulUintptr/large  11.1ns ± 0%   7.4ns ± 0%  -33.17%  (p=0.000 n=10+9)
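
The benchmarked operation is an overflow-checked multiplication. A minimal sketch of that operation in portable Go follows; mulUintptr is a hypothetical stand-in written with math/bits, not the runtime's own source, and it assumes uint and uintptr have the same width, as they do on current Go ports.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulUintptr returns a*b and reports whether the product overflowed.
// Assumption: uint and uintptr have the same size on the target.
func mulUintptr(a, b uintptr) (uintptr, bool) {
	hi, lo := bits.Mul(uint(a), uint(b))
	return uintptr(lo), hi != 0
}

func main() {
	fmt.Println(mulUintptr(3, 4))
	fmt.Println(mulUintptr(^uintptr(0), 2))
}
```

An intrinsic lets the compiler compute the high and low halves of the product with the hardware multiply instructions instead of a generic routine.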

Change-Id: I6659a886389660461fc2c90bd248243f6e7c29d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/210897
Run-TryBot: Meng Zhuo <mengzhuo1203@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-02 16:12:20 +00:00
Ruixin(Peter) Bao
2962c96c9f cmd/compile: lower float to uint conversions on s390x
Add rules for lowering float <-> unsigned int on s390x.

During compilation,
Cvt64Uto64F rule triggers around 80 times,
Cvt64Fto64U rule triggers around 20 times,
Cvt64Uto32F rule triggers around 5 times.

Change-Id: If4c9d128b9132fce8c0bea9abc09cb43a5df7989
Reviewed-on: https://go-review.googlesource.com/c/go/+/209177
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-02-29 21:37:47 +00:00
Michael Munday
cb74dcc172 cmd/compile: remove Greater* and Geq* generic integer ops
The generic Greater and Geq ops can always be replaced with the Less and
Leq ops. This CL therefore removes them. This simplifies the compiler since
it reduces the number of operations that need handling in both code and in
rewrite rules. This will be especially true when adding control flow
optimizations such as the integer-in-range optimizations in CL 165998.
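
The equivalence this relies on is just an operand swap: x > y holds exactly when y < x, and x >= y exactly when y <= x. A small sketch, with hypothetical function names standing in for the ops:

```go
package main

import "fmt"

func less(x, y int) bool { return x < y }
func leq(x, y int) bool  { return x <= y }

// greater and geq are expressed purely through less and leq with the
// operands swapped, so no separate "greater" operation is needed.
func greater(x, y int) bool { return less(y, x) }
func geq(x, y int) bool     { return leq(y, x) }

func main() {
	for x := -2; x <= 2; x++ {
		for y := -2; y <= 2; y++ {
			if greater(x, y) != (x > y) || geq(x, y) != (x >= y) {
				panic("rewrite changed the result")
			}
		}
	}
	fmt.Println("Greater/Geq agree with swapped Less/Leq")
}
```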

Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 13:11:53 +00:00
Joel Sing
2e4f490b31 cmd/compile,cmd/link: fix and re-enable open-coded defers on riscv64
The R_CALLRISCV relocation marker is on the JALR instruction; however, the actual
relocation is currently two instructions earlier, for the AUIPC+ADDI sequence.
Adjust the platform dependent offset accordingly and re-enable open-coded defers.

Fixes #36786.

Change-Id: I71597c193c447930fbe94ce44b7355e89ae877bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/216797
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-01-29 16:34:44 +00:00
Joel Sing
a858d15f11 cmd/compile: disable open-coded defers on riscv64
Open-coded defers are currently broken on riscv64 - disable them for the
time being. All of the standard package tests now pass on linux/riscv64.

Updates issue #27532 and #36786

Change-Id: I20fc25ce91dfad48be32409ba5c64ca9a6acef1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/216517
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-01-28 02:40:44 +00:00
Joel Sing
98d2717499 cmd/compile: implement compiler for riscv64
Based on riscv-go port.

Updates #27532

Change-Id: Ia329daa243db63ff334053b8807ea96b97ce3acf
Reviewed-on: https://go-review.googlesource.com/c/go/+/204631
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-01-18 14:41:40 +00:00
Cherry Zhang
a037582eff cmd/compile: mark empty block preemptible
Currently, a block's control instruction gets the liveness info
of the last Value in the block. However, for an empty block, the
control instruction gets invalid liveness info and is therefore
not preemptible. One example is an empty infinite loop, which has
only a control instruction. The control instruction being
non-preemptible makes the whole loop non-preemptible.

Fix this by using a different, preemptible liveness info for an
empty block's control. We can choose an arbitrary preemptible
liveness info, as at run time we don't really use the liveness
map at that instruction.

As before, if the last Value in the block is non-preemptible, so
is the block control. For example, the conditional branch in the
write barrier test block is still non-preemptible.

Also, only update liveness info if we are actually emitting
instructions. So zero-width Values' liveness info (which is
always invalid) won't affect the block control's liveness info.
For example, if the last Values in a block are a tuple-generating
operation and a Select, the block control instruction is still
preemptible.

Fixes #35923.

Change-Id: Ic5225f3254b07e4955f7905329b544515907642b
Reviewed-on: https://go-review.googlesource.com/c/go/+/209659
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2019-12-06 01:11:02 +00:00
David Chase
0e02cfb369 cmd/compile: try harder to not use an empty src.XPos for a bogus line
The fix for #35652 did not guarantee that it was using a non-empty
src position to replace an empty one.  The new code checks again
and falls back to a more certain position.  (The input in question
compiles to a single empty infinite loop, and none of the actual instructions
had any source position at all.  That is a bug, but given the pathology
of this input, not one worth dealing with this late in the release cycle,
if ever.)

Literally:

00000 (5) TEXT "".f(SB), ABIInternal
00001 (5) PCDATA $0, $-2
00002 (5) PCDATA $1, $-2
00003 (5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
00004 (5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
00005 (5) FUNCDATA $2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
b2
00006 (?) XCHGL AX, AX
b6
00007 (+1048575) JMP 6
00008 (?) END

TODO: Add runtime.InfiniteLoop(), replace infinite loops with a call to
that, and use an eco-friendly runtime.gopark instead.  (This was Cherry's
excellent idea.)

Updates #35652
Fixes #35695

Change-Id: I4b9a841142ee4df0f6b10863cfa0721a7e13b437
Reviewed-on: https://go-review.googlesource.com/c/go/+/207964
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-22 03:06:22 +00:00
David Chase
9bba63bbbe cmd/compile: make a better bogus line for empty infinite loops
The old recipe for making an infinite loop not be infinite
in the debugger could create an instruction (Prog) with a
line number not tied to any file (index == 0).  This caused
downstream failures in DWARF processing.

So don't do that.  Also adds a test, also adds a check+panic
to ensure that the next time this happens the error is less
mystifying.

Fixes #35652

Change-Id: I04f30bc94fdc4aef20dd9130561303ff84fd945e
Reviewed-on: https://go-review.googlesource.com/c/go/+/207613
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-19 00:38:53 +00:00
Michael Munday
b3885dbc93 cmd/compile, runtime: intrinsify atomic And8 and Or8 on s390x
Intrinsify these functions to match other platforms. Update the
sequence of instructions used in the assembly implementations to
match the intrinsics.

Also, add a micro benchmark so we can more easily measure the
performance of these two functions:

name            old time/op  new time/op  delta
And8-8          5.33ns ± 7%  2.55ns ± 8%  -52.12%  (p=0.000 n=20+20)
And8Parallel-8  7.39ns ± 5%  3.74ns ± 4%  -49.34%  (p=0.000 n=20+20)
Or8-8           4.84ns ±15%  2.64ns ±11%  -45.50%  (p=0.000 n=20+20)
Or8Parallel-8   7.27ns ± 3%  3.84ns ± 4%  -47.10%  (p=0.000 n=19+20)

By using a 'rotate then xor selected bits' instruction combined with
either a 'load and and' or a 'load and or' instruction we can
implement And8 and Or8 with far fewer instructions. Replacing
'compare and swap' with atomic instructions may also improve
performance when there is contention.

Change-Id: I28bb8032052b73ae8ccdf6e4c612d2877085fa01
Reviewed-on: https://go-review.googlesource.com/c/go/+/204277
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-11 15:23:59 +00:00
DQNEO
f07059d949 cmd/compile: rename sizeof_Array and array_* to slice_*
Renames variables sizeof_Array and other array_* variables
that were actually intended for slices and not arrays.

Change-Id: I391b95880cc77cabb8472efe694b7dd19545f31a
Reviewed-on: https://go-review.googlesource.com/c/go/+/180919
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-11-11 12:40:04 +00:00
David Chase
cd53fddabb cmd/compile: add framework for logging optimizer (non)actions to LSP
This is intended to allow IDEs to note where the optimizer
was not able to improve users' code.  There may be other
applications for this, for example in studying effectiveness
of optimizer changes more quickly than running benchmarks,
or in verifying that code changes did not accidentally disable
optimizations in performance-critical code.

Logging of nilcheck (bad) for amd64 is implemented as
proof-of-concept.  In general, the intent is that optimizations
that didn't happen are what will be logged, because that is
believed to be what IDE users want.

Added flag -json=version,dest

Check that version=0.  (Future compilers will support a
few recent versions, I hope that version is always <=3.)

Dest is expected to be one of:

/path (or \path in Windows)
  will create directory /path and fill it w/ json files
file://path
  will create directory path, intended either for
     I:\dont\know\enough\about\windows\paths
     trustme_I_know_what_I_am_doing_probably_testing

Not passing an absolute path name usually leads to
json splattered all over source directories,
or failure when those directories are not writeable.
If you want a foot-gun, you have to ask for it.

The JSON output is directed to subdirectories of dest,
where each subdirectory is net/url.PathEscape of the
package name, and for each foo.go in the package,
net/url.PathEscape(foo).json is created.  The first line
of foo.json contains version and context information,
and subsequent lines contain LSP-conforming JSON
describing the missing optimizations.
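
The directory layout described above can be sketched as follows. The helper jsonPath and the sample arguments are hypothetical; only the use of net/url.PathEscape for the package name and the file base name comes from the description.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
	"strings"
)

// jsonPath returns where the log for goFile in package pkg would land
// under dest, following the layout in the commit message.
// Hypothetical helper, not the compiler's implementation.
func jsonPath(dest, pkg, goFile string) string {
	base := strings.TrimSuffix(goFile, ".go")
	return filepath.Join(dest, url.PathEscape(pkg), url.PathEscape(base)+".json")
}

func main() {
	fmt.Println(jsonPath("/tmp/out", "cmd/compile/internal/ssa", "rewrite.go"))
}
```

Because PathEscape encodes the slashes in the import path, each package maps to a single directory directly under dest rather than to a nested tree.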

Change-Id: Ib83176a53a8c177ee9081aefc5ae05604ccad8a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/204338
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-10 17:11:34 +00:00
David Chase
a0262b201f cmd/compile: intrinsify functions added to runtime/internal/sys
This restores intrinsic status to functions copied from math/bits
into runtime/internal/sys, as an aid to runtime performance.

Updates #35112.

Change-Id: I41a7d87cf00f1e64d82aa95c5b1000bc128de820
Reviewed-on: https://go-review.googlesource.com/c/go/+/206200
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-09 05:51:04 +00:00
Russ Cox
543c6d2e0d math, cmd/compile: rename Fma to FMA
This API was added for #25819, where it was discussed as math.FMA.
The commit adding it used math.Fma, presumably for consistency
with the rest of the unusual names in package math
(Sincos, Acosh, Erfcinv, Float32bits, etc).

I believe that using an idiomatic Go name is more important here
than consistency with these other names, most of which are historical
baggage from C's standard library.

Early additions like Float32frombits happened before "uppercase for export"
(so they were originally like "float32frombits") and they were not properly
reconsidered when we uppercased the symbols to export them.
That's a mistake we live with.

The names of functions we have added since then, and even a few
that were legacy, are more properly Go-cased, such as IsNaN, IsInf,
and RoundToEven, rather than Isnan, Isinf, and Roundtoeven.
And also constants like MaxFloat32.

For new API, we should keep using proper Go-cased symbols
instead of minimally-upper-cased-C symbols.

So math.FMA, not math.Fma.

This API has not yet been released, so this change does not break
the compatibility promise.

This CL also modifies cmd/compile, since the compiler knows
the name of the function. I could have stopped at changing the
string constants, but it seemed to make more sense to use a
consistent casing everywhere.

Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205317
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-07 14:51:06 +00:00
Dan Scales
cc47b0d2cd cmd/compile: handle some missing cases of non-SSAable values for args of open-coded defers
In my experimentation, I had found that most non-SSAable expressions were
converted to autotmp variables during AST evaluation. However, this was not true
generally, as witnessed by issue #35213, which has a non-SSAable field reference
of a struct that is not converted to an autotmp. So, I fixed openDeferSave() to
handle non-SSAable nodes more generally, and make sure that these non-SSAable
expressions are not evaluated more than once (which could incorrectly repeat side
effects).
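
The property being protected is ordinary defer semantics: the arguments of a deferred call are evaluated exactly once, when the defer statement executes. The sketch below checks that from the outside. The types big and counter are hypothetical, and whether a given expression counts as SSAable is a compiler-internal decision this example does not try to reproduce.

```go
package main

import "fmt"

// big has a large array field so that values of this type are bulky.
type big struct {
	pad [64]int
	n   int
}

// counter records how many times next has been called.
type counter struct{ calls int }

// next has a side effect, so a repeated evaluation would be visible.
func (c *counter) next() big {
	c.calls++
	return big{n: c.calls}
}

// run defers a call whose argument is a field of a call result.
func run(c *counter) (seen int) {
	defer func(v int) { seen = v }(c.next().n)
	c.calls += 10 // later changes must not affect the saved argument
	return 0
}

func main() {
	c := &counter{}
	fmt.Println(run(c), c.calls)
}
```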

Fixes #35213

Change-Id: I8043d5576b455e94163599e930ca0275e550d594
Reviewed-on: https://go-review.googlesource.com/c/go/+/203888
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-29 19:58:24 +00:00
Austin Clements
97592b3c14 cmd/compile: intrinsics for runtime/internal/atomic.Store8
For #10958, #24543, but makes sense on its own.

Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/203284
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-29 03:18:55 +00:00
Dan Scales
be64a19d99 cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
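
The scheme in the paragraph above can be modelled by hand. This is a hypothetical simulation of what the generated code does for a function with two defer statements, written as ordinary Go: the names process, arg0 and arg1 are assumptions, and the real compiler output also handles panics through funcdata, which is not modelled here.

```go
package main

import "fmt"

// process models a function with two conditional defer statements.
func process(takeFirst, takeSecond bool) []string {
	var log []string
	var deferBits uint8   // one bit per defer statement
	var arg0, arg1 string // stand-ins for the autotmp stack slots

	if takeFirst {
		// Models "defer record(arg0)": save the argument, set the bit.
		arg0 = "first"
		deferBits |= 1 << 0
	}
	if takeSecond {
		// Models "defer record(arg1)".
		arg1 = "second"
		deferBits |= 1 << 1
	}
	log = append(log, "body")

	// Inline exit code: test the bits in reverse order and make the
	// calls for the defer statements that were actually reached.
	if deferBits&(1<<1) != 0 {
		log = append(log, "deferred "+arg1)
	}
	if deferBits&(1<<0) != 0 {
		log = append(log, "deferred "+arg0)
	}
	return log
}

func main() {
	fmt.Println(process(true, true))
	fmt.Println(process(false, true))
}
```

A defer inside a loop could run an unbounded number of times, so a fixed set of bits and slots cannot represent it, which is why such functions fall back to the other defer kinds.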

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go binary without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
2019-10-24 13:54:11 +00:00
smasher164
03fb1f607b cmd/compile: don't use FMA on plan9
CL 137156 introduces an intrinsic on AMD64 that executes vfmadd231sd
when feature detection is successful. However, because floating-point
isn't allowed in a note handler, the builder disables SSE instructions
and fails when attempting to execute this instruction. This change
disables FMA on plan9 to immediately use the software fallback.

Fixes #35063.

Change-Id: I87d8f0995bd2f15013d203e618938f5079c9eed2
Reviewed-on: https://go-review.googlesource.com/c/go/+/202617
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-22 19:36:42 +00:00
smasher164
58b031949b cmd/compile: add fma intrinsic for arm
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.

Updates #25819.

Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 17:42:47 +00:00
smasher164
7a6da218b1 cmd/compile: add fma intrinsic for amd64
To permit ssa-level optimization, this change introduces an amd64 intrinsic
that generates the VFMADD231SD instruction for the fused-multiply-add
operation on systems that support it. System support is detected via
cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic
("Fma") to VFMADD231SD.

The benchmark compares the software implementation (old) with the intrinsic
(new).

name   old time/op  new time/op  delta
Fma-4  27.2ns ± 1%   1.0ns ± 9%  -96.48%  (p=0.008 n=5+5)

Updates #25819.

Change-Id: I966655e5f96817a5d06dff5942418a3915b09584
Reviewed-on: https://go-review.googlesource.com/c/go/+/137156
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 16:42:10 +00:00
smasher164
33425ab8db cmd/compile: introduce generic ssa intrinsic for fused-multiply-add
In order to make math.FMA a compiler intrinsic for ISAs like ARM64,
PPC64[le], and S390X, a generic 3-argument opcode "Fma" is provided and
rewritten as

    ARM64: (Fma x y z) -> (FMADDD z x y)
    PPC64: (Fma x y z) -> (FMADD x y z)
    S390X: (Fma x y z) -> (FMADD z x y)
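
From the user's side, the operation being intrinsified is math.FMA(x, y, z), which computes x*y + z with a single rounding. The sketch below contrasts it with a product that is rounded first. The helper names and the chosen operands are illustrative; the explicit float64 conversion is there because the Go spec otherwise permits the compiler to fuse x*y + z on its own.

```go
package main

import (
	"fmt"
	"math"
)

// z cancels the rounded product of x*x for x = 1 + 2^-52.
const z = -(1 + 0x1p-51)

// separate rounds the product to float64 before adding z.
func separate(x, y, z float64) float64 { return float64(x*y) + z }

// fused computes x*y + z with one rounding at the end.
func fused(x, y, z float64) float64 { return math.FMA(x, y, z) }

// demo returns both results for x = 1 + 2^-52.
func demo() (sep, fus float64) {
	x := math.Nextafter(1, 2) // smallest float64 above 1
	return separate(x, x, z), fused(x, x, z)
}

func main() {
	sep, fus := demo()
	// The exact product is 1 + 2^-51 + 2^-104. Rounding it first
	// discards the 2^-104 term, while the fused form keeps it.
	fmt.Println(sep, fus)
}
```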

Updates #25819.

Change-Id: Ie5bc628311e6feeb28ddf9adaa6e702c8c291efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/131959
Run-TryBot: Akhil Indurti <aindurti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 16:24:15 +00:00
Bryan C. Mills
b76e6f8825 Revert "cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata"
This reverts CL 190098.

Reason for revert: broke several builders.

Change-Id: I69161352f9ded02537d8815f259c4d391edd9220
Reviewed-on: https://go-review.googlesource.com/c/go/+/201519
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Dan Scales <danscales@google.com>
2019-10-16 20:59:53 +00:00
Dan Scales
dad616375f cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go cmd without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements <austin@google.com>
2019-10-16 18:27:16 +00:00
Cherry Zhang
c4817f5d4f cmd/compile: on Wasm and AIX, let deferred nil function panic at invocation
The Go spec requires

	If a deferred function value evaluates to nil, execution
	panics when the function is invoked, not when the "defer"
	statement is executed.

On Wasm and AIX, currently we actually emit a nil check at the
point of defer statement, which will make it panic too early.
This CL fixes this.
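
The rule quoted from the spec can be observed directly. In this minimal sketch (the function run and the trace strings are illustrative), the defer statement with a nil function value executes without panicking, and the panic only happens when the deferred call is made at function exit.

```go
package main

import "fmt"

// run defers a nil function value and records how far execution got.
func run() (trace []string) {
	defer func() {
		if r := recover(); r != nil {
			trace = append(trace, "panic at exit")
		}
	}()
	var f func()
	defer f() // must not panic here, even though f is nil
	trace = append(trace, "after defer statement")
	return trace
}

func main() {
	fmt.Println(run())
}
```

With a nil check emitted at the defer statement, the "after defer statement" entry would never be recorded.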

Also, on Wasm, now the nil function will be passed through
deferreturn to jmpdefer, which does an explicit nil check and
calls sigpanic if it is nil. This sigpanic, being called from
assembly, is ABI0. So change the assembler backend to also
handle sigpanic in ABI0.

Fixes #34926.
Updates #8047.

Change-Id: I28489a571cee36d2aef041f917b8cfdc31d557d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/201297
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-16 00:05:37 +00:00
Meng Zhuo
50f1157760 cmd/compile: add math/bits.Mul64 intrinsic on mips64x
Benchmark:
name   old time/op  new time/op  delta
Mul    36.0ns ± 1%   2.8ns ± 0%  -92.31%  (p=0.000 n=10+10)
Mul32  4.37ns ± 0%  4.37ns ± 0%     ~     (p=0.429 n=6+10)
Mul64  36.4ns ± 0%   2.8ns ± 0%  -92.37%  (p=0.000 n=10+9)
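
For reference, bits.Mul64 returns the full 128-bit product of two uint64 values as a high and a low word. A short usage sketch (the wrapper mul128 is only for illustration):

```go
package main

import (
	"fmt"
	"math/bits"
)

// mul128 returns the 128-bit product of x and y as (hi, lo).
func mul128(x, y uint64) (hi, lo uint64) {
	return bits.Mul64(x, y)
}

func main() {
	// 2^63 * 4 = 2^65, which needs the high word.
	fmt.Println(mul128(1<<63, 4))
	// (2^64-1)^2 = 2^128 - 2^65 + 1.
	fmt.Println(mul128(^uint64(0), ^uint64(0)))
}
```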

Change-Id: Ic4f4e5958adbf24999abcee721d0180b5413fca7
Reviewed-on: https://go-review.googlesource.com/c/go/+/200582
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-14 21:23:34 +00:00