802 Commits

Author SHA1 Message Date
Keith Randall
585c9e8412 cmd/compile: implement shifts by signed amounts
Allow shifts by signed amounts. Panic if the shift amount is negative.

TODO: We end up doing two compares per shift, see Ian's comment
https://github.com/golang/go/issues/19113#issuecomment-443241799 that
we could do it with a single comparison in the normal case.

The prove pass mostly handles this code well. For instance, it removes the
<0 check for cases like this:
    if s >= 0 { _ = x << s }
    _ = x << len(a)

This case isn't handled well yet:
    _ = x << (y & 0xf)
I'll do followon CLs for unhandled cases as needed.

Update #19113

R=go1.13

Change-Id: I839a5933d94b54ab04deb9dd5149f32c51c90fa1
Reviewed-on: https://go-review.googlesource.com/c/158719
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2019-02-15 23:13:09 +00:00
Austin Clements
a2e79571a9 cmd/compile: separate data and function LSyms
Currently, obj.Ctxt's symbol table does not distinguish between ABI0
and ABIInternal symbols. This is *almost* okay, since a given symbol
name in the final object file is only going to belong to one ABI or
the other, but it requires that the compiler mark a Sym as being a
function symbol before it retrieves its LSym. If it retrieves the LSym
first, that LSym will be created as ABI0, and later marking the Sym as
a function symbol won't change the LSym's ABI.

Marking a Sym as a function symbol before looking up its LSym sounds
easy, except Syms have a dual purpose: they are used just as interned
strings (every function, variable, parameter, etc with the same
textual name shares a Sym), and *also* to store state for whatever
package global has that name. As a result, it's easy to slip up and
look up an LSym when a Sym is serving as the name of a local variable,
and then later mark it as a function when it's serving as the global
with the name.

In general, we were careful to avoid this, but #29610 demonstrates one
case where we messed up. Because of on-demand importing from indexed
export data, it's possible to compile a method wrapper for a type
imported from another package before importing an init function from
that package. If the argument of the method is named "init", the
"init" LSym will be created as a data symbol when compiling the
wrapper, before it gets marked as a function symbol.

To fix this, we separate obj.Ctxt's symbol tables for ABI0 and
ABIInternal symbols. This way, the compiler will simply get a
different LSym once the Sym takes on its package-global meaning as a
function.

This fixes the above ordering issue, and means we no longer need to go
out of our way to create the "init" function early and mark it as a
function symbol.

Fixes #29610.
Updates #27539.

Change-Id: Id9458b40017893d46ef9e4a3f9b47fc49e1ce8df
Reviewed-on: https://go-review.googlesource.com/c/157017
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-01-11 00:45:49 +00:00
Keith Randall
69c2c56453 cmd/compile,runtime: redo mid-stack inlining tracebacks
Work involved in getting a stack trace is divided between
runtime.Callers and runtime.CallersFrames.

Before this CL, runtime.Callers returns a pc per runtime frame.
runtime.CallersFrames is responsible for expanding a runtime frame
into potentially multiple user frames.

After this CL, runtime.Callers returns a pc per user frame.
runtime.CallersFrames just maps those to user frame info.

Entries in the result of runtime.Callers are now pcs
of the calls (or of the inline marks), not of the instruction
just after the call.

Fixes #29007
Fixes #28640
Update #26320

Change-Id: I1c9567596ff73dc73271311005097a9188c3406f
Reviewed-on: https://go-review.googlesource.com/c/152537
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-12-28 20:55:36 +00:00
Martin Möhrmann
38e7177c94 cmd/compile: fix length overflow when appending elements to a slice
Instead of testing len(slice)+numNewElements > cap(slice) use
uint(len(slice)+numNewElements) > uint(cap(slice)) to test
if a slice needs to be grown in an append operation.

This prevents a possible overflow when len(slice) is near the maximum
int value and the addition of a constant number of new elements
makes it overflow and wrap around to a negative number which is
smaller than the capacity of the slice.

Appending a slice to a slice with append(s1, s2...) already used
a uint comparison to test slice capacity and therefore was not
vulnerable to the same overflow issue.

Fixes: #29190

Change-Id: I41733895838b4f80a44f827bf900ce931d8be5ca
Reviewed-on: https://go-review.googlesource.com/c/154037
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-12-14 05:48:18 +00:00
David Chase
ea6259d5e9 cmd/compile: check for negative upper bound to IsSliceInBounds
IsSliceInBounds(x, y) asserts that y is not negative, but
there were cases where this is not true.  Change code
generation to ensure that this is true when it's not obviously
true.  Prove phase cleans a few of these out.

With this change the compiler text section is 0.06% larger,
that is, not very much.  Benchmarking still TBD, may need
to wait for access to a benchmarking box (next week).

Also corrected run.go to handle '?' in -update_errors output.

Fixes #28797.

Change-Id: Ia8af90bc50a91ae6e934ef973def8d3f398fac7b
Reviewed-on: https://go-review.googlesource.com/c/152477
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-12-07 23:04:58 +00:00
David Chase
2b58ca6e3d cmd/compile: begin OpArg and OpPhi location lists at block start
For the entry block, make the "first instruction" be truly
the first instruction.  This allows printing of incoming
parameters with Delve.

Also be sure Phis are marked as being at the start of their
block.  This is observed to move location list pointers,
and where moved, they become correct.

Leading zero-width instructions include LoweredGetClosurePtr.
Because this instruction is actually architecture-specific,
and it is now tested for in 3 different places, also created
Op.isLoweredGetClosurePtr() to reduce future surprises.

Change-Id: Ic043b7265835cf1790382a74334b5714ae4060af
Reviewed-on: https://go-review.googlesource.com/c/145179
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-11-29 14:00:26 +00:00
Brian Kessler
319787a528 cmd/compile: intrinsify math/bits.Div on amd64
Note that the intrinsic implementation panics separately for overflow and
divide by zero, which matches the behavior of the pure go implementation.
There is a modest performance improvement after intrinsic implementation.

name     old time/op  new time/op  delta
Div-4    53.0ns ± 1%  47.0ns ± 0%  -11.28%  (p=0.008 n=5+5)
Div32-4  18.4ns ± 0%  18.5ns ± 1%     ~     (p=0.444 n=5+5)
Div64-4  53.3ns ± 0%  47.5ns ± 4%  -10.77%  (p=0.008 n=5+5)

Updates #28273

Change-Id: Ic1688ecc0964acace2e91bf44ef16f5fb6b6bc82
Reviewed-on: https://go-review.googlesource.com/c/144378
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-27 05:04:25 +00:00
Clément Chigot
9fe9853ae5 cmd/compile: fix nilcheck for AIX
This commit adapts compile tool to create correct nilchecks for AIX.

AIX allows to load a nil pointer. Therefore, the default nilcheck
which issues a load must be replaced by a CMP instruction followed by a
store at 0x0 if the value is nil. The store will trigger a SIGSEGV as on
others OS.

The nilcheck algorithm must be adapted to do not remove nilcheck if it's
only a read. Stores are detected with v.Type.IsMemory().

Tests related to nilptr must be adapted to the previous changements.
nilptr.go cannot be used as it's because the AIX address space starts at
1<<32.

Change-Id: I9f5aaf0b7e185d736a9b119c0ed2fe4e5bd1e7af
Reviewed-on: https://go-review.googlesource.com/c/144538
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-26 14:13:53 +00:00
Yury Smolsky
3068fcfa0d cmd/compile: add control flow graphs to ssa.html
This CL adds CFGs to ssa.html.
It execs dot to generate SVG,
which then gets inlined into the html.
Some standard naming and javascript hacks
enable integration with the rest of ssa.html.
Clicking on blocks highlights the relevant
part of the CFG, and vice versa.

Sample output and screenshots can be seen in #28177.

CFGs can be turned on with the suffix mask:
:*            - dump CFG for every phase
:lower        - just the lower phase
:lower-layout - lower through layout
:w,x-y        - phases w and x through y

Calling dot after every pass is noticeably slow,
instead use the range of phases.

Dead blocks are not displayed on CFG.

User can zoom and pan individual CFG
when the automatic adjustment has failed.

Dot-related errors are reported
without bringing down the process.

Fixes #28177

Change-Id: Id52c42d86c4559ca737288aa10561b67a119c63d
Reviewed-on: https://go-review.googlesource.com/c/142517
Run-TryBot: Yury Smolsky <yury@smolsky.by>
Reviewed-by: David Chase <drchase@google.com>
2018-11-21 10:22:43 +00:00
Josh Bleecher Snyder
bc43889566 cmd/compile: bulk rename
This change does a bulk rename of several identifiers in the compiler.
See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/
for context and for discussion of these particular renames.

Commands run to generate this change:

gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO
gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT
gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG
gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF
gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR
gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP
gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR
gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES
gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP
gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES

gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt
gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr
gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee
gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK
gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign
gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit

Not altered: parameters and local variables (mostly in typecheck.go) named top,
which should probably now be called ctx (and which should probably have a named type).
Also not altered: Field called Top in gc.Func.

gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD
gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD
gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD
gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD
gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD
gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD

Not altered: function gc.hasddd, params and local variables called isddd
Also not altered: fmt.go prints nodes using "isddd(%v)".

cd cmd/compile/internal/gc; go generate

I then manually found impacted comments using exact string match
and fixed them up by hand. The comment changes were trivial.

Passes toolstash-check.

Fixes #27167. If this experiment is deemed a success,
we will open a new tracking issue for renames to do
at the end of the 1.13 cycles.

Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8
Reviewed-on: https://go-review.googlesource.com/c/150140
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-19 00:02:53 +00:00
Martin Möhrmann
75798e8ada runtime: make processor capability variable naming platform specific
The current support_XXX variables are specific for the
amd64 and 386 platforms.

Prefix processor capability variables by architecture to have a
consistent naming scheme and avoid reuse of the existing
variables for new platforms.

This also aligns naming of runtime variables closer with internal/cpu
processor capability variable names.

Change-Id: I3eabb29a03874678851376185d3a62e73c1aff1d
Reviewed-on: https://go-review.googlesource.com/c/91435
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-14 20:30:31 +00:00
Austin Clements
5cf2b4c2d3 cmd/compile: fix race on initializing Sym symFunc flag
SSA lowering can create PFUNC ONAME nodes when compiling method calls.
Since we generally initialize the node's Sym to a func when we set its
class to PFUNC, we did this here, too. Unfortunately, since SSA
compilation is concurrent, this can cause a race if two function
compilations try to initialize the same symbol.

Luckily, we don't need to do this at all, since we're actually just
wrapping an ONAME node around an existing Sym that's already marked as
a function symbol.

Fixes the linux-amd64-racecompile builder, which was broken by CL
147158.

Updates #27539.

Change-Id: I8ddfce6e66a08ce53998c5bfa6f5a423c1ffc1eb
Reviewed-on: https://go-review.googlesource.com/c/149158
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-11-12 22:38:02 +00:00
Austin Clements
685aca45dc cmd/compile, cmd/link: separate stable and internal ABIs
This implements compiler and linker support for separating the
function calling ABI into two ABIs: a stable and an internal ABI. At
the moment, the two ABIs are identical, but we'll be able to evolve
the internal ABI without breaking existing assembly code that depends
on the stable ABI for calling to and from Go.

The Go compiler generates internal ABI symbols for all Go functions.
It uses the symabis information produced by the assembler to create
ABI wrappers whenever it encounters a body-less Go function that's
defined in assembly or a Go function that's referenced from assembly.

Since the two ABIs are currently identical, for the moment this is
implemented using "ABI alias" symbols, which are just forwarding
references to the native ABI symbol for a function. This way there's
no actual code involved in the ABI wrapper, which is good because
we're not deriving any benefit from it right now. Once the ABIs
diverge, we can eliminate ABI aliases.

The linker represents these different ABIs internally as different
versions of the same symbol. This way, the linker keeps us honest,
since every symbol definition and reference also specifies its
version. The linker is responsible for resolving ABI aliases.

Fixes #27539.

Change-Id: I197c52ec9f8fc435db8f7a4259029b20f6d65e95
Reviewed-on: https://go-review.googlesource.com/c/147160
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-11-12 20:46:55 +00:00
Austin Clements
16e6cd9a4d cmd/compile: mark function Syms
In order to mark the obj.LSyms produced by the compiler with the
correct ABI, we need to know which types.Syms refer to function
symbols. This CL adds a flag to types.Syms to mark symbols for
functions, and sets this flag everywhere we create a PFUNC-class node,
and in the one place where we directly create function symbols without
always wrapping them in a PFUNC node (methodSym).

We'll use this information to construct obj.LSyms with correct ABI
information.

For #27539.

Change-Id: Ie3ac8bf3da013e449e78f6ca85546a055f275463
Reviewed-on: https://go-review.googlesource.com/c/147158
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-12 20:46:50 +00:00
Austin Clements
15265ec421 cmd/compile: avoid duplicate GC bitmap symbols
Currently, liveness produces a distinct obj.LSym for each GC bitmap
for each function. These are then named by content hash and only
ultimately deduplicated by WriteObjFile.

For various reasons (see next commit), we want to remove this
deduplication behavior from WriteObjFile. Furthermore, it's
inefficient to produce these duplicate symbols in the first place.

GC bitmaps are the only source of duplicate symbols in the compiler.
This commit eliminates these duplicate symbols by declaring them in
the Ctxt symbol hash just like every other obj.LSym. As a result, all
GC bitmaps with the same content now refer to the same obj.LSym.

The next commit will remove deduplication from WriteObjFile.

For #27539.

Change-Id: I4f15e3d99530122cdf473b7a838c69ef5f79db59
Reviewed-on: https://go-review.googlesource.com/c/146557
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-11-03 15:12:34 +00:00
Martin Möhrmann
020a18c545 cmd/compile: move slice construction to callers of makeslice
Only return a pointer p to the new slices backing array from makeslice.
Makeslice callers then construct sliceheader{p, len, cap} explictly
instead of makeslice returning the slice.

Reduces go binary size by ~0.2%.
Removes 92 (~3.5%) panicindex calls from go binary.

Change-Id: I29b7c3b5fe8b9dcec96e2c43730575071cfe8a94
Reviewed-on: https://go-review.googlesource.com/c/141822
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-10-29 19:23:00 +00:00
Matthew Dempsky
2dda040f19 cmd/compile/internal/gc: represent labels as bare Syms
Avoids allocating an ONAME for OLABEL, OGOTO, and named OBREAK and
OCONTINUE nodes.

Passes toolstash-check.

Change-Id: I359142cd48e8987b5bf29ac100752f8c497261c1
Reviewed-on: https://go-review.googlesource.com/c/145200
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-10-27 07:32:06 +00:00
Matthew Dempsky
c68e3bcb03 cmd/compile: remove -f flag
This is supposed to print out function stack frames, but it's been
broken since golang.org/cl/38593, and no one has noticed.

Change-Id: Iad428a9097d452b878b1f8c5df22afd6f671ac2e
Reviewed-on: https://go-review.googlesource.com/c/145199
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-10-27 02:10:07 +00:00
Keith Randall
7a634034c8 cmd/compile: fix Mul->Mul64 intrinsic alias
The alias declaration needs to come after the function it is aliasing.

It isn't a big deal in this case, as bits.Mul inlines and has as its
body bits.Mul64, so the desired code gets generated regardless.
The alias should only have an effect on inlining cost estimates
(for functions that call bits.Mul).

Change-Id: I0d814899ce7049a0fb36e8ce1ad5ababbaf6265f
Reviewed-on: https://go-review.googlesource.com/c/144597
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-10-25 23:12:46 +00:00
Keith Randall
dd789550a7 cmd/compile: intrinsify math/bits.Sub on amd64
name             old time/op  new time/op  delta
Sub-8            1.12ns ± 1%  1.17ns ± 1%   +5.20%          (p=0.008 n=5+5)
Sub32-8          1.11ns ± 0%  1.11ns ± 0%     ~     (all samples are equal)
Sub64-8          1.12ns ± 0%  1.18ns ± 1%   +5.00%          (p=0.016 n=4+5)
Sub64multiple-8  4.10ns ± 1%  0.86ns ± 1%  -78.93%          (p=0.008 n=5+5)

Fixes #28273

Change-Id: Ibcb6f2fd32d987c3bcbae4f4cd9d335a3de98548
Reviewed-on: https://go-review.googlesource.com/c/144258
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-25 19:47:27 +00:00
Keith Randall
899f3a2892 cmd/compile: intrinsify math/bits.Add on amd64
name             old time/op  new time/op  delta
Add-8            1.11ns ± 0%  1.18ns ± 0%   +6.31%  (p=0.029 n=4+4)
Add32-8          1.02ns ± 0%  1.02ns ± 1%     ~     (p=0.333 n=4+5)
Add64-8          1.11ns ± 1%  1.17ns ± 0%   +5.79%  (p=0.008 n=5+5)
Add64multiple-8  4.35ns ± 1%  0.86ns ± 0%  -80.22%  (p=0.000 n=5+4)

The individual ops are a bit slower (but still very fast).
Using the ops in carry chains is very fast.

Update #28273

Change-Id: Id975f76df2b930abf0e412911d327b6c5b1befe5
Reviewed-on: https://go-review.googlesource.com/c/144257
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-25 19:47:00 +00:00
Carlos Eduardo Seo
5c472132bf cmd/compile, runtime: add new lightweight atomics for ppc64x
This change creates the infrastructure for new lightweight atomics
primitives in runtime/internal/atomic:

- LoadAcq, for load-acquire
- StoreRel, for store-release
- CasRel, for Compare-and-Swap-release

and implements them for ppc64x. There is visible performance improvement
in producer-consumer scenarios, like BenchmarkChanProdCons*:

benchmark                           old ns/op     new ns/op     delta
BenchmarkChanProdCons0-48           2034          2034          +0.00%
BenchmarkChanProdCons10-48          1798          1608          -10.57%
BenchmarkChanProdCons100-48         1596          1585          -0.69%
BenchmarkChanProdConsWork0-48       2084          2046          -1.82%
BenchmarkChanProdConsWork10-48      1829          1668          -8.80%
BenchmarkChanProdConsWork100-48     1650          1650          +0.00%

Fixes #21348

Change-Id: I1f6ce377e4a0fe4bd7f5f775e8036f50070ad8db
Reviewed-on: https://go-review.googlesource.com/c/142277
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-10-23 18:10:38 +00:00
Carlos Eduardo Seo
1e8ecefcd5 cmd/compile: intrinsify math/big.mulWW on ppc64x
This change implements mulWW as an intrinsic for ppc64x. Performance
numbers below:

name                            old time/op    new time/op    delta
QuoRem                            4.54µs ±45%    3.22µs ± 0%  -29.22%  (p=0.029 n=4+4)
ModSqrt225_Tonelli                 765µs ± 3%     757µs ± 0%   -1.02%  (p=0.029 n=4+4)
ModSqrt225_3Mod4                   231µs ± 0%     231µs ± 0%   -0.10%  (p=0.029 n=4+4)
ModSqrt231_Tonelli                 789µs ± 0%     788µs ± 0%   -0.14%  (p=0.029 n=4+4)
ModSqrt231_5Mod8                   267µs ± 0%     267µs ± 0%   -0.13%  (p=0.029 n=4+4)
Sqrt                              49.5µs ±17%    45.3µs ± 0%   -8.48%  (p=0.029 n=4+4)
IntSqr/1                          32.2ns ±22%    24.2ns ± 0%  -24.79%  (p=0.029 n=4+4)
IntSqr/2                          60.6ns ± 0%    60.9ns ± 0%   +0.50%  (p=0.029 n=4+4)
IntSqr/3                          82.8ns ± 0%    83.3ns ± 0%   +0.51%  (p=0.029 n=4+4)
IntSqr/5                           122ns ± 0%     121ns ± 0%   -1.22%  (p=0.029 n=4+4)
IntSqr/8                           227ns ± 0%     226ns ± 0%   -0.44%  (p=0.029 n=4+4)
IntSqr/10                          300ns ± 0%     298ns ± 0%   -0.67%  (p=0.029 n=4+4)
IntSqr/20                         1.02µs ± 0%    0.89µs ± 0%  -13.08%  (p=0.029 n=4+4)
IntSqr/30                         1.73µs ± 0%    1.51µs ± 0%  -12.73%  (p=0.029 n=4+4)
IntSqr/50                         3.69µs ± 1%    3.29µs ± 0%  -10.70%  (p=0.029 n=4+4)
IntSqr/80                         7.64µs ± 0%    7.04µs ± 0%   -7.91%  (p=0.029 n=4+4)
IntSqr/100                        11.1µs ± 0%    10.3µs ± 0%   -7.04%  (p=0.029 n=4+4)
IntSqr/200                        37.9µs ± 0%    36.4µs ± 0%   -4.13%  (p=0.029 n=4+4)
IntSqr/300                        69.4µs ± 0%    66.0µs ± 0%   -4.94%  (p=0.029 n=4+4)
IntSqr/500                         174µs ± 0%     168µs ± 0%   -3.10%  (p=0.029 n=4+4)
IntSqr/800                         347µs ± 0%     333µs ± 0%   -4.06%  (p=0.029 n=4+4)
IntSqr/1000                        524µs ± 0%     507µs ± 0%   -3.21%  (p=0.029 n=4+4)

Change-Id: If067452f5b6579ad3a2e9daa76a7ffe6fceae1bb
Reviewed-on: https://go-review.googlesource.com/c/143217
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-10-22 13:03:28 +00:00
Josh Bleecher Snyder
2578ac54eb cmd/compile: move argument stack construction to SSA generation
The goal of this change is to move work from walk to SSA,
and simplify things along the way.

This is hard to accomplish cleanly with small incremental changes,
so this large commit message aims to provide a roadmap to the diff.

High level description:

Prior to this change, walk was responsible for constructing (most of) the stack for function calls.
ascompatte gathered variadic arguments into a slice.
It also rewrote n.List from a list of arguments to a list of assignments to stack slots.
ascompatte was called multiple times to handle the receiver in a method call.
reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack.
adjustargs then made extra stack space for go/defer args as needed.

Node to SSA construction evaluated all the statements in n.List,
and issued the function call, assuming that the stack was correctly constructed.
Intrinsic calls had to dig around inside n.List to extract the arguments,
since intrinsics don't use the stack to make function calls.

This change moves stack construction to the SSA construction phase.
ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did.
It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries.
It does not, however, make any assignments to stack slots.
Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List.
(It would be better to use Ninit instead of List; future work.)
During SSA construction, after doing all the temporary assignments in n.List,
the function arguments are assigned to stack slots by
constructing the appropriate SSA Value, using (*state).storeArg.
SSA construction also now handles adjustments for go/defer args.
This change also simplifies intrinsic calls, since we no longer need to undo walk's work.

Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely.

Generated code differences:

There were a few optimizations applied along the way, the old way.
f(g()) was rewritten to do a block copy of function results to function arguments.
And reorder1 avoided introducing the final "save the stack" temporary in n.List.

The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed.

SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary.
The exception was when the temporary's type was not SSA-able;
in that case, we got a Move into an autotmp and then an immediate Move onto the stack,
with the autotmp never read or used again.
This change introduces a new rewrite rule to detect such pointless double Moves
and collapse them into a single Move.
This is actually more powerful than the original optimization,
since the original optimization relied on the imprecise Node.HasCall calculation.

The other significant difference in the generated code is that the stack is now constructed
completely in SP-offset order. Prior to this change, the stack was constructed somewhat
haphazardly: first the final argument that Node.HasCall deemed to require a temporary,
then other arguments, then the method receiver, then the defer/go args.
SP-offset is probably a good default order. See future work.

There are a few minor object file size changes as a result of this change.
I investigated some regressions in early versions of this change.

One regression (in archive/tar) was the addition of a single CMPQ instruction,
which would be eliminated were this TODO from flagalloc to be done:
	// TODO: Remove original instructions if they are never used.

One regression (in text/template) was an ADDQconstmodify that is now
a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change
in the order in which arguments are written. The argument change
order can also now be luckier, so this appears to be a wash.

All in all, though there will be minor winners and losers,
this change appears to be performance neutral.

Future work:

Move loading the result of function calls to SSA construction; eliminate OINDREGSP.

Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass.
Among other benefits, this would make it easier to transition to a new calling convention.
This would require rethinking the handling of stack conflicts and is non-trivial.

Figure out some clean way to indicate that stack construction Stores/Moves
do not alias each other, so that subsequent passes may do things like
CSE+tighten shared stack setup, do DSE using non-first Stores, etc.
This would allow us to eliminate the minor text/template regression.

Possibly make assignments to stack slots not treated as statements by DWARF.

Compiler benchmarks:

name        old time/op       new time/op       delta
Template          182ms ± 2%        179ms ± 2%  -1.69%  (p=0.000 n=47+48)
Unicode          86.3ms ± 5%       85.1ms ± 4%  -1.36%  (p=0.001 n=50+50)
GoTypes           646ms ± 1%        642ms ± 1%  -0.63%  (p=0.000 n=49+48)
Compiler          2.89s ± 1%        2.86s ± 2%  -1.36%  (p=0.000 n=48+50)
SSA               8.47s ± 1%        8.37s ± 2%  -1.22%  (p=0.000 n=47+50)
Flate             122ms ± 2%        121ms ± 2%  -0.66%  (p=0.000 n=47+45)
GoParser          147ms ± 2%        146ms ± 2%  -0.53%  (p=0.006 n=46+49)
Reflect           406ms ± 2%        403ms ± 2%  -0.76%  (p=0.000 n=48+43)
Tar               162ms ± 3%        162ms ± 4%    ~     (p=0.191 n=46+50)
XML               223ms ± 2%        222ms ± 2%  -0.37%  (p=0.031 n=45+49)
[Geo mean]        382ms             378ms       -0.89%

name        old user-time/op  new user-time/op  delta
Template          219ms ± 3%        216ms ± 3%  -1.56%  (p=0.000 n=50+48)
Unicode           109ms ± 6%        109ms ± 5%    ~     (p=0.190 n=50+49)
GoTypes           836ms ± 2%        828ms ± 2%  -0.96%  (p=0.000 n=49+48)
Compiler          3.87s ± 2%        3.80s ± 1%  -1.81%  (p=0.000 n=49+46)
SSA               12.0s ± 1%        11.8s ± 1%  -2.01%  (p=0.000 n=48+50)
Flate             142ms ± 3%        141ms ± 3%  -0.85%  (p=0.003 n=50+48)
GoParser          178ms ± 4%        175ms ± 4%  -1.66%  (p=0.000 n=48+46)
Reflect           520ms ± 2%        512ms ± 2%  -1.44%  (p=0.000 n=45+48)
Tar               200ms ± 3%        198ms ± 4%  -0.61%  (p=0.037 n=47+50)
XML               277ms ± 3%        275ms ± 3%  -0.85%  (p=0.000 n=49+48)
[Geo mean]        482ms             476ms       -1.23%

name        old alloc/op      new alloc/op      delta
Template         36.1MB ± 0%       35.3MB ± 0%  -2.18%  (p=0.008 n=5+5)
Unicode          29.8MB ± 0%       29.3MB ± 0%  -1.58%  (p=0.008 n=5+5)
GoTypes           125MB ± 0%        123MB ± 0%  -2.13%  (p=0.008 n=5+5)
Compiler          531MB ± 0%        513MB ± 0%  -3.40%  (p=0.008 n=5+5)
SSA              2.00GB ± 0%       1.93GB ± 0%  -3.34%  (p=0.008 n=5+5)
Flate            24.5MB ± 0%       24.3MB ± 0%  -1.18%  (p=0.008 n=5+5)
GoParser         29.4MB ± 0%       28.7MB ± 0%  -2.34%  (p=0.008 n=5+5)
Reflect          87.1MB ± 0%       86.0MB ± 0%  -1.33%  (p=0.008 n=5+5)
Tar              35.3MB ± 0%       34.8MB ± 0%  -1.44%  (p=0.008 n=5+5)
XML              47.9MB ± 0%       47.1MB ± 0%  -1.86%  (p=0.008 n=5+5)
[Geo mean]       82.8MB            81.1MB       -2.08%

name        old allocs/op     new allocs/op     delta
Template           352k ± 0%         347k ± 0%  -1.32%  (p=0.008 n=5+5)
Unicode            342k ± 0%         339k ± 0%  -0.66%  (p=0.008 n=5+5)
GoTypes           1.29M ± 0%        1.27M ± 0%  -1.30%  (p=0.008 n=5+5)
Compiler          4.98M ± 0%        4.87M ± 0%  -2.14%  (p=0.008 n=5+5)
SSA               15.7M ± 0%        15.2M ± 0%  -2.86%  (p=0.008 n=5+5)
Flate              233k ± 0%         231k ± 0%  -0.83%  (p=0.008 n=5+5)
GoParser           296k ± 0%         291k ± 0%  -1.54%  (p=0.016 n=5+4)
Reflect           1.05M ± 0%        1.04M ± 0%  -0.65%  (p=0.008 n=5+5)
Tar                343k ± 0%         339k ± 0%  -0.97%  (p=0.008 n=5+5)
XML                432k ± 0%         426k ± 0%  -1.19%  (p=0.008 n=5+5)
[Geo mean]         815k              804k       -1.35%

name        old object-bytes  new object-bytes  delta
Template          505kB ± 0%        505kB ± 0%  -0.01%  (p=0.008 n=5+5)
Unicode           224kB ± 0%        224kB ± 0%    ~     (all equal)
GoTypes          1.82MB ± 0%       1.83MB ± 0%  +0.06%  (p=0.008 n=5+5)
Flate             324kB ± 0%        324kB ± 0%  +0.00%  (p=0.008 n=5+5)
GoParser          402kB ± 0%        402kB ± 0%  +0.04%  (p=0.008 n=5+5)
Reflect          1.39MB ± 0%       1.39MB ± 0%  -0.01%  (p=0.008 n=5+5)
Tar               449kB ± 0%        449kB ± 0%  -0.02%  (p=0.008 n=5+5)
XML               598kB ± 0%        597kB ± 0%  -0.05%  (p=0.008 n=5+5)

Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143
Reviewed-on: https://go-review.googlesource.com/c/114797
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-10-19 21:23:16 +00:00
David Chase
fa31093ec4 cmd/compile: attach slots to incoming params for better debugging
This change attaches a slots to the OpArg values for
incoming params, and this in turn causes location lists
to be generated for params, and that yields better
debugging, in delve and sometimes in gdb.

The parameter lifetimes could start earlier; they are in
fact defined on entry, not at the point where the OpArg is
finally mentioned.  (that will be addressed in another CL)

Change-Id: Icca891e118291d260c35a14acd5bc92bb82d9e9f
Reviewed-on: https://go-review.googlesource.com/c/141697
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-18 20:04:31 +00:00
Martin Möhrmann
a1ca4893ff cmd/compile: add intrinsics for runtime/internal/math on 386 and amd64
Add generic, 386 and amd64 specific ops and SSA rules for multiplication
with overflow and branching based on overflow flags. Use these to intrinsify
runtime/internal/math.MulUinptr.

On amd64
  mul, overflow := math.MulUintptr(a, b)
  if overflow {
is lowered to two instructions:
  MULQ SI
  JO 0x10ee35c

No codegen tests as codegen can not currently test unexported internal runtime
functions.

amd64:
name              old time/op  new time/op  delta
MulUintptr/small  1.16ns ± 5%  0.88ns ± 6%  -24.36%  (p=0.000 n=19+20)
MulUintptr/large  10.7ns ± 1%   1.1ns ± 1%  -89.28%  (p=0.000 n=17+19)

Change-Id: If60739a86f820e5044d677276c21df90d3c7a86a
Reviewed-on: https://go-review.googlesource.com/c/141820
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-15 19:04:09 +00:00
Martin Möhrmann
9f66b41bee cmd/compile: avoid implicit bounds checks after explicit checks for append
The generated code for the append builtin already checks if the appended
to slice is large enough and calls growslice if that is not the case.
Trust that this ensures the slice is large enough and avoid the
implicit bounds check when slicing the slice to its new size.

Removes 365 panicslice calls (-14%) from the go binary which
reduces the binary size by ~12kbyte.

Change-Id: I1b88418675ff409bc0b956853c9e95241274d5a6
Reviewed-on: https://go-review.googlesource.com/c/119315
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-15 18:23:03 +00:00
David Chase
69c5830c2b cmd/compile: repair display of values & blocks in prog column
This restores the printing of vXX and bYY in the left-hand
edge of the last column of ssa.html, where the generated
progs appear.

Change-Id: I81ab9b2fa5ae28e6e5de1b77665cfbed8d14e000
Reviewed-on: https://go-review.googlesource.com/c/141277
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Yury Smolsky <yury@smolsky.by>
2018-10-11 15:29:00 +00:00
Carlos Eduardo Seo
23578f9d00 cmd/compile: intrinsify TrailingZeros16, OnesCount{8,16} for ppc64x
This change implements TrailingZeros16, OnesCount8 and OnesCount16
as intrinsics for ppc64x.

benchmark                       old ns/op     new ns/op     delta
BenchmarkTrailingZeros16-40     2.16          1.61          -25.46%

benchmark                   old ns/op     new ns/op     delta
BenchmarkOnesCount-40       0.71          0.71          +0.00%
BenchmarkOnesCount8-40      0.93          0.69          -25.81%
BenchmarkOnesCount16-40     1.54          0.75          -51.30%
BenchmarkOnesCount32-40     0.75          0.74          -1.33%
BenchmarkOnesCount64-40     0.71          0.71          +0.00%

Change-Id: I010fa9c0ef596a09362870d81193c633e70da637
Reviewed-on: https://go-review.googlesource.com/c/139137
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-10-11 13:21:50 +00:00
Matthew Dempsky
62e5215a2a cmd/compile: merge TPTR32 and TPTR64 as TPTR
Change-Id: I0490098a7235458c5aede1135426a9f19f8584a7
Reviewed-on: https://go-review.googlesource.com/c/76312
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-10-04 04:08:08 +00:00
Matthew Dempsky
5e8beed149 cmd/compile: remove pointer arithmetic
Change-Id: Ie4bab0b74d5a4e1aecd8501a48176b2e9a3d8c42
Reviewed-on: https://go-review.googlesource.com/c/76311
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-10-04 04:07:53 +00:00
Keith Randall
9a8372f8bd cmd/compile,runtime: remove ambiguously live logic
The previous CL introduced stack objects. This CL removes the old
ambiguously live liveness analysis. After this CL we're relying
on stack objects exclusively.

Update a bunch of liveness tests to reflect the new world.

Fixes #22350

Change-Id: I739b26e015882231011ce6bc1a7f426049e59f31
Reviewed-on: https://go-review.googlesource.com/c/134156
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-03 19:54:16 +00:00
Keith Randall
cbafcc55e8 cmd/compile,runtime: implement stack objects
Rework how the compiler+runtime handles stack-allocated variables
whose address is taken.

Direct references to such variables work as before. References through
pointers, however, use a new mechanism. The new mechanism is more
precise than the old "ambiguously live" mechanism. It computes liveness
at runtime based on the actual references among objects on the stack.

Each function records all of its address-taken objects in a FUNCDATA.
These are called "stack objects". The runtime then uses that
information while scanning a stack to find all of the stack objects on
a stack. It then does a mark phase on the stack objects, using all the
pointers found on the stack (and ancillary structures, like defer
records) as the root set. Only stack objects which are found to be
live during this mark phase will be scanned and thus retain any heap
objects they point to.

A subsequent CL will remove all the "ambiguously live" logic from
the compiler, so that the stack object tracing will be required.
For this CL, the stack tracing is all redundant with the current
ambiguously live logic.

Update #22350

Change-Id: Ide19f1f71a5b6ec8c4d54f8f66f0e9a98344772f
Reviewed-on: https://go-review.googlesource.com/c/134155
Reviewed-by: Austin Clements <austin@google.com>
2018-10-03 19:52:49 +00:00
Carlos Eduardo Seo
9aed4cc395 cmd/compile: instrinsify math/bits.Mul on ppc64x
Add SSA rules to intrinsify Mul/Mul64 on ppc64x.

benchmark             old ns/op     new ns/op     delta
BenchmarkMul-40       8.80          0.93          -89.43%
BenchmarkMul32-40     1.39          1.39          +0.00%
BenchmarkMul64-40     5.39          0.93          -82.75%

Updates #24813

Change-Id: I6e95bfbe976a2278bd17799df184a7fbc0e57829
Reviewed-on: https://go-review.googlesource.com/138917
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-10-02 18:56:06 +00:00
Brian Kessler
9eb53ab9bc cmd/compile: intrinsify math/bits.Mul
Add SSA rules to intrinsify Mul/Mul64 (AMD64 and ARM64).
SSA rules for other functions and architectures are left as a future
optimization.  Benchmark results on AMD64/ARM64 before and after SSA
implementation are below.

amd64
name     old time/op  new time/op  delta
Add-4    1.78ns ± 0%  1.85ns ±12%     ~     (p=0.397 n=4+5)
Add32-4  1.71ns ± 1%  1.70ns ± 0%     ~     (p=0.683 n=5+5)
Add64-4  1.80ns ± 2%  1.77ns ± 0%   -1.22%  (p=0.048 n=5+5)
Sub-4    1.78ns ± 0%  1.78ns ± 0%     ~     (all equal)
Sub32-4  1.78ns ± 1%  1.78ns ± 0%     ~     (p=1.000 n=5+5)
Sub64-4  1.78ns ± 1%  1.78ns ± 0%     ~     (p=0.968 n=5+4)
Mul-4    11.5ns ± 1%   1.8ns ± 2%  -84.39%  (p=0.008 n=5+5)
Mul32-4  1.39ns ± 0%  1.38ns ± 3%     ~     (p=0.175 n=5+5)
Mul64-4  6.85ns ± 1%  1.78ns ± 1%  -73.97%  (p=0.008 n=5+5)
Div-4    57.1ns ± 1%  56.7ns ± 0%     ~     (p=0.087 n=5+5)
Div32-4  18.0ns ± 0%  18.0ns ± 0%     ~     (all equal)
Div64-4  56.4ns ±10%  53.6ns ± 1%     ~     (p=0.071 n=5+5)

arm64
name      old time/op  new time/op  delta
Add-96    5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Add32-96  5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Add64-96  5.52ns ± 0%  5.51ns ± 0%     ~     (p=0.444 n=5+5)
Sub-96    5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Sub32-96  5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Sub64-96  5.51ns ± 0%  5.51ns ± 0%     ~     (all equal)
Mul-96    34.6ns ± 0%   5.0ns ± 0%  -85.52%  (p=0.008 n=5+5)
Mul32-96  4.51ns ± 0%  4.51ns ± 0%     ~     (all equal)
Mul64-96  21.1ns ± 0%   5.0ns ± 0%  -76.26%  (p=0.008 n=5+5)
Div-96    64.7ns ± 0%  64.7ns ± 0%     ~     (all equal)
Div32-96  17.0ns ± 0%  17.0ns ± 0%     ~     (all equal)
Div64-96  53.1ns ± 0%  53.1ns ± 0%     ~     (all equal)

Updates #24813

Change-Id: I9bda6d2102f65cae3d436a2087b47ed8bafeb068
Reviewed-on: https://go-review.googlesource.com/129415
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-09-26 20:35:57 +00:00
erifan01
8149db4f64 cmd/compile: intrinsify math.RoundToEven and math.Abs on arm64
math.RoundToEven can be done by one arm64 instruction FRINTND, intrinsify it to improve performance.
The current pure Go implementation of the function Abs is translated into five instructions on arm64:
str, ldr, and, str, ldr. The intrinsic implementation requires only one instruction, so in terms of
performance, intrinsify it is worthwhile.

Benchmarks:
name           old time/op  new time/op  delta
Abs-8          3.50ns ± 0%  1.50ns ± 0%  -57.14%  (p=0.000 n=10+10)
RoundToEven-8  9.26ns ± 0%  1.50ns ± 0%  -83.80%  (p=0.000 n=10+10)

Change-Id: I9456b26ab282b544dfac0154fc86f17aed96ac3d
Reviewed-on: https://go-review.googlesource.com/116535
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-13 14:52:51 +00:00
erifan01
204cc14bdd cmd/compile: implement non-constant rotates using ROR on arm64
Add some rules to match the Go code like:
	y &= 63
	x << y | x >> (64-y)
or
	y &= 63
	x >> y | x << (64-y)
as a ROR instruction. Make math/bits.RotateLeft faster on arm64.

Extends CL 132435 to arm64.

Benchmarks of math/bits.RotateLeftxxN:
name            old time/op       new time/op       delta
RotateLeft-8    3.548750ns +- 1%  2.003750ns +- 0%  -43.54%  (p=0.000 n=8+8)
RotateLeft8-8   3.925000ns +- 0%  3.925000ns +- 0%     ~     (p=1.000 n=8+8)
RotateLeft16-8  3.925000ns +- 0%  3.927500ns +- 0%     ~     (p=0.608 n=8+8)
RotateLeft32-8  3.925000ns +- 0%  2.002500ns +- 0%  -48.98%  (p=0.000 n=8+8)
RotateLeft64-8  3.536250ns +- 0%  2.003750ns +- 0%  -43.34%  (p=0.000 n=8+8)

Change-Id: I77622cd7f39b917427e060647321f5513973232c
Reviewed-on: https://go-review.googlesource.com/122542
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-07 14:52:02 +00:00
Michael Munday
f94de9c9fb cmd/compile: make math/bits.RotateLeft{32,64} intrinsics on s390x
Extends CL 132435 to s390x. s390x has 32- and 64-bit variable
rotate left instructions.

Change-Id: Ic4f1ebb0e0543207ed2fc8c119e0163b428138a5
Reviewed-on: https://go-review.googlesource.com/133035
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-05 08:29:02 +00:00
Michael Munday
6f9b94ab66 cmd/compile: implement OnesCount{8,16,32,64} intrinsics on s390x
This CL implements the math/bits.OnesCount{8,16,32,64} functions
as intrinsics on s390x using the 'population count' (popcnt)
instruction. This instruction was released as the 'population-count'
facility which uses the same facility bit (45) as the
'distinct-operands' facility which is a pre-requisite for Go on
s390x. We can therefore use it without a feature check.

The s390x popcnt instruction treats a 64 bit register as a vector
of 8 bytes, summing the number of ones in each byte individually.
It then writes the results to the corresponding bytes in the
output register. Therefore to implement OnesCount{16,32,64} we
need to sum the individual byte counts using some extra
instructions. To do this efficiently I've added some additional
pseudo operations to the s390x SSA backend.

Unlike other architectures the new instruction sequence is faster
for OnesCount8, so that is implemented using the intrinsic.

name         old time/op  new time/op  delta
OnesCount    3.21ns ± 1%  1.35ns ± 0%  -58.00%  (p=0.000 n=20+20)
OnesCount8   0.91ns ± 1%  0.81ns ± 0%  -11.43%  (p=0.000 n=20+20)
OnesCount16  1.51ns ± 3%  1.21ns ± 0%  -19.71%  (p=0.000 n=20+17)
OnesCount32  1.91ns ± 0%  1.12ns ± 1%  -41.60%  (p=0.000 n=19+20)
OnesCount64  3.18ns ± 4%  1.35ns ± 0%  -57.52%  (p=0.000 n=20+20)

Change-Id: Id54f0bd28b6db9a887ad12c0d72fcc168ef9c4e0
Reviewed-on: https://go-review.googlesource.com/114675
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-03 14:35:38 +00:00
Andrew Bonventre
5ac2476748 cmd/compile: make math/bits.RotateLeft* an intrinsic on amd64
Previously, pattern matching was good enough to achieve good performance
for the RotateLeft* functions, but the inlining cost for them was much
too high. Make RotateLeft* intrinsic on amd64 as a stop-gap for now to
reduce inlining costs.

This should be done (or at least looked at) for other architectures
as well.

Updates golang/go#17566

Change-Id: I6a106ff00b6c4e3f490650af3e083ed2be00c819
Reviewed-on: https://go-review.googlesource.com/132435
Run-TryBot: Andrew Bonventre <andybons@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-08-30 22:48:28 +00:00
Yury Smolsky
4cc027fb55 cmd/compile: display AST IR in ssa.html
This change adds a new column, AST IR. That column contains
nodes for a function specified in $GOSSAFUNC.

Also this CL enables horizontal scrolling of sources and AST columns.

Fixes #26662

Change-Id: I3fba39fd998bb05e9c93038e8ec2384c69613b24
Reviewed-on: https://go-review.googlesource.com/126858
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-24 19:11:12 +00:00
Kazuhiro Sera
ad644d2e86 all: fix typos detected by github.com/client9/misspell
Change-Id: Iadb3c5de8ae9ea45855013997ed70f7929a88661
GitHub-Last-Rev: ae85bcf82be8fee533e2b9901c6133921382c70a
GitHub-Pull-Request: golang/go#26920
Reviewed-on: https://go-review.googlesource.com/128955
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-08-23 15:54:07 +00:00
Yury Smolsky
9e2a04d5eb cmd/compile: add sources for inlined functions to ssa.html
This CL adds the source code of all inlined functions
into the function specified in $GOSSAFUNC.
The code is appended to the sources column of ssa.html.

ssaDumpInlined is populated with references to inlined functions.
Then it is used for dumping the sources in buildssa.

The source columns contains code in following order:
target function, inlined functions sorted by filename, lineno.

Fixes #25904

Change-Id: I4f6d4834376f1efdfda1f968a5335c0543ed36bc
Reviewed-on: https://go-review.googlesource.com/126606
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-23 05:11:33 +00:00
Yury Smolsky
c35069d642 cmd/compile: clean the output of GOSSAFUNC
Since we print almost everything to ssa.html in the GOSSAFUNC mode,
there is a need to stop spamming stdout when user just wants to see
ssa.html.

This changes cleans output of the GOSSAFUNC debug mode.
To enable the dump of the debug data to stdout, one must
put suffix + after the function name like that:

GOSSAFUNC=Foo+

Otherwise gc will not print the IR and ASM to stdout after each phase.
AST IR is still sent to stdout because it is not included
into ssa.html. It will be fixed in a separate change.

The change adds printing out the full path to the ssa.html file.

Updates #25942

Change-Id: I711e145e05f0443c7df5459ca528dced273a62ee
Reviewed-on: https://go-review.googlesource.com/126603
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-23 05:11:17 +00:00
Yury Smolsky
e34f660a52 cmd/compile: cache the value of environment variable GOSSAFUNC
Store the value of GOSSAFUNC in a global variable to avoid
multiple calls to os.Getenv from gc.buildssa and gc.mkinlcall1.
The latter is implemented in the CL 126606.

Updates #25942

Change-Id: I58caaef2fee23694d80dc5a561a2e809bf077fa4
Reviewed-on: https://go-review.googlesource.com/126604
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-22 21:16:19 +00:00
Yury Smolsky
8c0425825c cmd/compile: display Go code for a function in ssa.html
This CL adds the "sources" column at the beginning of SSA table.
This column displays the source code for the function being passed
in the GOSSAFUNC env variable.

Also UI was extended so that clicking on particular line will
highlight all places this line is referenced.

JS code was cleaned and formatted.

This CL does not handle inlined functions. See issue 25904.

Change-Id: Ic7833a0b05e38795f4cf090f3dc82abf62d97026
Reviewed-on: https://go-review.googlesource.com/119035
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-22 19:58:07 +00:00
Daniel Martí
c180798956 cmd/compile: fix racy setting of gc's Config.Race
ssaConfig.Race was being set by many goroutines concurrently, resulting
in a data race seen below. This was very likely introduced by CL 121235.

	WARNING: DATA RACE
	Write at 0x00c000344408 by goroutine 12:
	  cmd/compile/internal/gc.buildssa()
	      /workdir/go/src/cmd/compile/internal/gc/ssa.go:134 +0x7a8
	  cmd/compile/internal/gc.compileSSA()
	      /workdir/go/src/cmd/compile/internal/gc/pgen.go:259 +0x5d
	  cmd/compile/internal/gc.compileFunctions.func2()
	      /workdir/go/src/cmd/compile/internal/gc/pgen.go:323 +0x5a

	Previous write at 0x00c000344408 by goroutine 11:
	  cmd/compile/internal/gc.buildssa()
	      /workdir/go/src/cmd/compile/internal/gc/ssa.go:134 +0x7a8
	  cmd/compile/internal/gc.compileSSA()
	      /workdir/go/src/cmd/compile/internal/gc/pgen.go:259 +0x5d
	  cmd/compile/internal/gc.compileFunctions.func2()
	      /workdir/go/src/cmd/compile/internal/gc/pgen.go:323 +0x5a

	Goroutine 12 (running) created at:
	  cmd/compile/internal/gc.compileFunctions()
	      /workdir/go/src/cmd/compile/internal/gc/pgen.go:321 +0x39b
	  cmd/compile/internal/gc.Main()
	      /workdir/go/src/cmd/compile/internal/gc/main.go:651 +0x437d
	  main.main()
	      /workdir/go/src/cmd/compile/main.go:51 +0x100

	Goroutine 11 (running) created at:
	  cmd/compile/internal/gc.compileFunctions()
	      /workdir/go/src/cmd/compile/internal/gc/pgen.go:321 +0x39b
	  cmd/compile/internal/gc.Main()
	      /workdir/go/src/cmd/compile/internal/gc/main.go:651 +0x437d
	  main.main()
	      /workdir/go/src/cmd/compile/main.go:51 +0x100

Instead, set up the field exactly once as part of initssaconfig.

Change-Id: I2c30c6b1cf92b8fd98e7cb5c2e10c526467d0b0a
Reviewed-on: https://go-review.googlesource.com/130375
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2018-08-21 12:29:21 +00:00
Tobias Klauser
e7f59f0284 cmd/compile/internal/gc: unexport Deferproc and Newproc
They are no longer used outside the package since CL 38080.

Passes toolstash-check -all

Change-Id: I30977ed2b233b7c8c53632cc420938bc3b0e37c6
Reviewed-on: https://go-review.googlesource.com/129781
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-21 08:12:06 +00:00
Ilya Tocar
4201c2077e cmd/compile: omit racefuncentry/exit when they are not needed
When compiling with -race, we insert calls to racefuncentry,
into every function. Add a rule that removes them in leaf functions,
without instrumented loads/stores.
Shaves ~30kb from "-race" version of go tool:

file difference:
go_old 15626192
go_new 15597520 [-28672 bytes]

section differences:
global text (code) = -24513 bytes (-0.358598%)
read-only data = -5849 bytes (-0.167064%)
Total difference -30362 bytes (-0.097928%)

Fixes #24662

Change-Id: Ia63bf1827f4cf2c25e3e28dcd097c150994ade0a
Reviewed-on: https://go-review.googlesource.com/121235
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-20 22:07:22 +00:00
Richard Musiol
81555cb4f3 cmd/compile/internal/gc: add nil check for closure call on wasm
This commit adds an explicit nil check for closure calls on wasm,
so calling a nil func causes a proper panic instead of crashing on the
WebAssembly level.

Change-Id: I6246844f316677976cdd420618be5664444c25ae
Reviewed-on: https://go-review.googlesource.com/127759
Run-TryBot: Richard Musiol <neelance@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-14 09:19:38 +00:00