62 Commits

Author SHA1 Message Date
Cherry Zhang
ee330385ca cmd/internal/obj, runtime: preempt & restart some instruction sequences
On some architectures, for async preemption the injected call
needs to clobber a register (usually REGTMP) in order to return
to the preempted function. As a consequence, the PC ranges where
REGTMP is live are not preemptible.

The uses of REGTMP are usually generated by the assembler, where
it needs to load or materialize a large constant or offset that
doesn't fit into the instruction. In those cases, REGTMP is not
live at the start of the instruction sequence. Instead of giving
up preemption in those cases, we could preempt it and restart the
sequence when resuming the execution. Basically, this is like
reissuing an interrupted instruction, except that here the
"instruction" is a Prog that consists of multiple machine
instructions. For this to work, we need to generate PC data to
mark the start of the Prog.

Currently this is only done for ARM64.

TODO: the split-stack function prologue is currently not async
preemptible. We could use this mechanism, preempt it and restart
at the function entry.

Change-Id: I37cb282f8e606e7ab6f67b3edfdc6063097b4bd1
Reviewed-on: https://go-review.googlesource.com/c/go/+/208126
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-05-06 15:41:12 +00:00
Ruixin(Peter) Bao
a45ea55da7 cmd/internal: allow ADDE to work with memory location on s390x
Originally on s390x, ADDE does not work when adding numbers from a memory location.
For example: ADDE (R3), R4 will result in a failure.

Since ADDC, ADD and ADDW already supports adding from memory location,
let's support that for ADDE as well.

Change-Id: I7cbe112ea154733a621b948c6a21bbee63fb0c62
Reviewed-on: https://go-review.googlesource.com/c/go/+/229304
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-22 11:37:03 +00:00
Ruixin(Peter) Bao
553a8626ba cmd/internal: add MVCIN instruction to s390x assembler
On s390x, we already have MVCIN opcode in asmz.go,
but we did not use it. This CL uses that opcode and adds MVCIN
instruction.

MVCIN instruction can be used to move data from one storage location
to another while reversing the order of bytes within the field. This
could be useful when transforming data from little-endian to big-endian.

Change-Id: Ifa1a911c0d3442f4a62f91f74ed25b196d01636b
Reviewed-on: https://go-review.googlesource.com/c/go/+/227478
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-07 15:03:09 +00:00
Russ Cox
fc8a6336d1 cmd/asm, cmd/compile, runtime: add -spectre=ret mode
This commit extends the -spectre flag to cmd/asm and adds
a new Spectre mitigation mode "ret", which enables the use
of retpolines.

Retpolines prevent speculation about the target of an indirect
jump or call and are described in more detail here:
https://support.google.com/faqs/answer/7625886

Change-Id: I4f2cb982fa94e44d91e49bd98974fd125619c93a
Reviewed-on: https://go-review.googlesource.com/c/go/+/222661
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-13 19:05:54 +00:00
Cherry Zhang
4751db93ef cmd/internal/obj/s390x: mark unsafe points
For async preemption, we will be using REGTMP as a temporary
register in injected call on S390X, which will clobber it. So any
code that uses REGTMP is not safe for async preemption.

In the assembler backend, we expand a Prog to multiple machine
instructions and use REGTMP as a temporary register if necessary.
These need to be marked unsafe. Unlike ARM64 and MIPS,
instructions on S390X are variable length so we don't use the
length as a condition. Instead, we set a bit on the Prog whenever
REGTMP is used.

Change-Id: Ie5d14068a950f4c7cea51dff2c4a8bdc19ec9348
Reviewed-on: https://go-review.googlesource.com/c/go/+/204105
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07 20:34:27 +00:00
Cherry Zhang
1da575a7bc cmd/internal/obj/s390x: add support of SPM instruction
For restoring condition code (we already support IPM instruction
for saving condition code).

Change-Id: I56d376df44a5f831134a130d052521cec6b5b781
Reviewed-on: https://go-review.googlesource.com/c/go/+/204104
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2019-11-04 17:19:36 +00:00
Ruixin(Peter) Bao
a8fc82f77a cmd/asm/internal/asm/testdata/s390x: add test cases for some assembly instructions
From CL 199979, I noticed that there were some
instructions not covered by the test cases. Added those in this CL.

Additional tests for assembly instructions are also added
based on suggestions made during the review of this CL.

Previously, VSB and VSH are not included in asmz.go, they were also
added in this patch.

Change-Id: I6060a9813b483a161d61ad2240c30eec6de61536
Reviewed-on: https://go-review.googlesource.com/c/go/+/203721
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2019-11-04 15:43:40 +00:00
Michael Munday
38c4a73706 cmd/asm: add s390x branch-on-count instructions
The branch-on-count instructions on s390x decrement the input
register and then compare its value to 0. If not equal the branch
is taken.

These instructions are useful for implementing loops with a set
number of iterations (which might be in a register).

For example, this for loop:

	for i := 0; i < n; i++ {
		... // i is not used or modified in the loop
	}

Could be implemented using this assembly:

	MOVD  Rn, Ri
loop:
	...
	BRCTG Ri, loop

Note that i will count down from n in the assembly whereas in the
original for loop it counted up to n which is why we can't use i
in the loop.

These instructions will only be used in hand-written codegen and
assembly for now since SSA blocks cannot currently modify values.
We could look into this in the future though.

Change-Id: Iaab93b8aa2699513b825439b8ea20d8fe2ea1ee6
Reviewed-on: https://go-review.googlesource.com/c/go/+/199977
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-09 15:04:59 +00:00
Michael Munday
cf03238020 cmd/compile: use numeric condition code masks on s390x
Prior to this CL conditional branches on s390x always used an
extended mnemonic such as BNE, BLT and so on to represent branch
instructions with different condition code masks. This CL adds
support for numeric condition code masks to the s390x SSA backend
so that we can encode the condition under which a Block's
successor is chosen as a field in that Block rather than in its
type.

This change will be useful as we come to add support for combined
compare-and-branch instructions. Rather than trying to add extended
mnemonics for every possible combination of mask and compare-and-
branch instruction we can instead use a single mnemonic for each
instruction.

Change-Id: Idb7458f187b50906877d683695c291dff5279553
Reviewed-on: https://go-review.googlesource.com/c/go/+/197178
Reviewed-by: Keith Randall <khr@golang.org>
2019-09-26 14:47:12 +00:00
Michael Munday
8c99e45ef9 cmd/asm: add masked branch and conditional load instructions to s390x
The branch-relative-on-condition (BRC) instruction allows us to use
an immediate to specify under what conditions the branch is taken.
For example, `BRC $7, L1` is equivalent to `BNE L1`. It is sometimes
useful to specify branches in this way when either we don't have
an extended mnemonic for a particular mask value or we want to
generate the condition code mask programmatically.

The new load-on-condition (LOCR and LOCGR) and compare-and-branch
(CRJ, CGRJ, CLRJ, CLGRJ, CIJ, CGIJ, CLIJ and CLGIJ) instructions
provide the same flexibility for conditional loads and combined
compare and branch instructions.

Change-Id: Ic6f5d399b0157e278b39bd3645f4ee0f4df8e5fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/196558
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-09-25 22:24:41 +00:00
Ruixin Bao
98aa97806b cmd/compile: add math/bits.Mul64 intrinsic on s390x
This change adds an intrinsic for Mul64 on s390x. To achieve that,
a new assembly instruction, MLGR, is introduced in s390x/asmz.go. This assembly
instruction directly uses an existing instruction on Z and supports multiplication
of two 64 bit unsigned integer and stores the result in two separate registers.

In this case, we require the multiplcand to be stored in register R3 and
the output result (the high and low 64 bit of the product) to be stored in
R2 and R3 respectively.

A test case is also added.

Benchmark:
name      old time/op  new time/op  delta
Mul-18    11.1ns ± 0%   1.4ns ± 0%  -87.39%  (p=0.002 n=8+10)
Mul32-18  2.07ns ± 0%  2.07ns ± 0%     ~     (all equal)
Mul64-18  11.1ns ± 1%   1.4ns ± 0%  -87.42%  (p=0.000 n=10+10)

Change-Id: Ieca6ad1f61fff9a48a31d50bbd3f3c6d9e6675c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/194572
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-13 09:04:48 +00:00
Ruixin Bao
90e0b40ab2 cmd/internal/obj/s390x: use 12 bit load and store instruction when possible on s390x
Originally, we default to use load and store instruction with 20 bit displacement.
However, that is not necessary. Some instructions have a displacement smaller
than 12 bit. This CL allows the usage of 12 bit load and store instruction when
that happens.

This change also reduces the size of .text section in go binary by 19 KB.

Some tests are also added to verify the functionality of the change.

Change-Id: I13edea06ca653d4b9ffeaefe8d010bc2f065c2ba
Reviewed-on: https://go-review.googlesource.com/c/go/+/194857
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-12 15:49:54 +00:00
Ruixin(Peter) Bao
11c2411c50 cmd/compile/internal/s390x: replace 4-byte NOP with a 2-byte NOP on s390x
Added a new instruction, NOPH, with the encoding [0x0700](i.e: bcr 0, 0) and
replace the current 4-byte nop that was encoded using the WORD instruction.

This reduces the size of .text section in go binary by around 17KB and make
generated code easier to read.

Change-Id: I6a756df39e93c4415ea6d038ba4af001b8ccb286
Reviewed-on: https://go-review.googlesource.com/c/go/+/194344
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-11 12:19:26 +00:00
Shulhan
ed7f323c8f all: simplify code using "gofmt -s -w"
Most changes are removing redundant declaration of type when direct
instantiating value of map or slice, e.g. []T{T{}} become []T{{}}.

Small changes are removing the high order of subslice if its value
is the length of slice itself, e.g. T[:len(T)] become T[:].

The following file is excluded due to incompatibility with go1.4,

- src/cmd/compile/internal/gc/ssa.go

Change-Id: Id3abb09401795ce1e6da591a89749cba8502fb26
Reviewed-on: https://go-review.googlesource.com/c/go/+/166437
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-05-06 22:19:22 +00:00
Michael Munday
2c1b5130aa cmd/compile: add math/bits.{Add,Sub}64 intrinsics on s390x
This CL adds intrinsics for the 64-bit addition and subtraction
functions in math/bits. These intrinsics use the condition code
to propagate the carry or borrow bit.

To make the carry chains more efficient I've removed the
'clobberFlags' property from most of the load and store
operations. Originally these ops did clobber flags when using
offsets that didn't fit in a signed 20-bit integer, however
that is no longer true.

As with other platforms the intrinsics are faster when executed
in a chain rather than a loop because currently we need to spill
and restore the carry bit between each loop iteration. We may
be able to reduce the need to do this on s390x (e.g. by using
compare-and-branch instructions that do not clobber flags) in the
future.

name           old time/op  new time/op  delta
Add64          1.21ns ± 2%  2.03ns ± 2%  +67.18%  (p=0.000 n=7+10)
Add64multiple  2.98ns ± 3%  1.03ns ± 0%  -65.39%  (p=0.000 n=10+9)
Sub64          1.23ns ± 4%  2.03ns ± 1%  +64.85%  (p=0.000 n=10+10)
Sub64multiple  3.73ns ± 4%  1.04ns ± 1%  -72.28%  (p=0.000 n=10+8)

Change-Id: I913bbd5e19e6b95bef52f5bc4f14d6fe40119083
Reviewed-on: https://go-review.googlesource.com/c/go/+/174303
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-05-03 10:41:15 +00:00
Michael Munday
0f79510dc5 cmd/asm: add s390x 'rotate then ... selected bits' instructions
This CL adds the following instructions, useful for shifting/rotating
and masking operations:

 * RNSBG - rotate then and selected bits
 * ROSBG - rotate then or selected bits
 * RXSBG - rotate then exclusive or selected bits
 * RISBG - rotate then insert selected bits

It also adds the 'T' (test), 'Z' (zero), 'H' (high), 'L' (low) and
'N' (no test) variants of these instructions as appropriate.

Operands are ordered as: I₃, I₄, I₅, R₂, R₁.

Key: I₃=start, I₄=end, I₅=amount, R₂=source, R₁=destination

Change-Id: I200d12287e1df7447f37f4919da5e9a93d27c792
Reviewed-on: https://go-review.googlesource.com/c/go/+/159357
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-16 09:17:24 +00:00
Michael Munday
9c843f031d cmd/internal/obj/s390x: handle RestArgs in s390x assembler
Allow up to 3 RestArgs arguments to be specified. This is needed to
for us to add the 'rotate and ... bits' instructions, which require
5 arguments, cleanly.

Change-Id: I76b89adfb5e3cd85a43023e412f0cc202d489e0b
Reviewed-on: https://go-review.googlesource.com/c/go/+/171726
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-16 09:17:14 +00:00
Michael Munday
e61985427e cmd/internal/obj/s390x: remove param field from optab
The param field isn't useful, we can just use REGSP instead.

Change-Id: I2ac68131c390209cc84e43aa7620ccbf5ae69120
Reviewed-on: https://go-review.googlesource.com/c/go/+/171725
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-16 09:17:06 +00:00
Michael Munday
6966b67510 cmd/asm: add 'insert program mask' instruction for s390x
This CL adds the 'insert program mask' (IPM) instruction to s390x.
IPM stores the current program mask (which contains the condition
code) into a general purpose register.

This instruction will be useful when implementing intrinsics for
the arithmetic functions in the math/bits package. We can also
potentially use it to convert some condition codes into bool
values.

The condition code can be saved and restored using an instruction
sequence such as:

  IPM  R4          // save condition code to R4
  ...
  TMLH R4, $0x3000 // restore condition code from R4

We can also use IPM to save the carry bit to a register using an
instruction sequence such as:

  IPM     R4                   // save condition code to R4
  RISBLGZ $31, $31, $3, R4, R4 // isolate carry bit in R4

Change-Id: I169d450b6ea1a7ff8c0286115ddc42618da8a2f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/165997
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-29 17:34:06 +00:00
Ian Lance Taylor
e546ef123e cmd/internal/obj/s390x: don't crash on invalid instruction
I didn't bother with a test as there doesn't seem to be an existing
framework for testing assembler failures, and tests for invalid code
aren't all that interesting.

Fixes #26700

Change-Id: I719410d83527802a09b9d38625954fdb36a3c0f7
Reviewed-on: https://go-review.googlesource.com/c/153177
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-12-07 15:43:09 +00:00
bill_ofarrell
3f3142ad99 cmd/asm: add s390x VMSLG instruction variants
VMSLG has three variants on z14 and later machines. These variants are used in "limbified" squaring:
VMSLEG: Even Shift Indication -- the even-indexed intermediate result is doubled
VMSLOG: Odd Shift Indication -- the odd-indexed intermediate result is doubled
VMSLEOG: Even and Odd Shift Indication -- both intermediate results are doubled
Limbified squaring is very useful for high performance cryptographic algorithms, such as
elliptic curve. This change allows these instructions to be used in Go assembly.

Change-Id: Iaad577b07320205539f99b3cb37a2a984882721b
Reviewed-on: https://go-review.googlesource.com/c/145180
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2018-10-29 09:54:51 +00:00
Michael Munday
6f9b94ab66 cmd/compile: implement OnesCount{8,16,32,64} intrinsics on s390x
This CL implements the math/bits.OnesCount{8,16,32,64} functions
as intrinsics on s390x using the 'population count' (popcnt)
instruction. This instruction was released as the 'population-count'
facility which uses the same facility bit (45) as the
'distinct-operands' facility which is a pre-requisite for Go on
s390x. We can therefore use it without a feature check.

The s390x popcnt instruction treats a 64 bit register as a vector
of 8 bytes, summing the number of ones in each byte individually.
It then writes the results to the corresponding bytes in the
output register. Therefore to implement OnesCount{16,32,64} we
need to sum the individual byte counts using some extra
instructions. To do this efficiently I've added some additional
pseudo operations to the s390x SSA backend.

Unlike other architectures the new instruction sequence is faster
for OnesCount8, so that is implemented using the intrinsic.

name         old time/op  new time/op  delta
OnesCount    3.21ns ± 1%  1.35ns ± 0%  -58.00%  (p=0.000 n=20+20)
OnesCount8   0.91ns ± 1%  0.81ns ± 0%  -11.43%  (p=0.000 n=20+20)
OnesCount16  1.51ns ± 3%  1.21ns ± 0%  -19.71%  (p=0.000 n=20+17)
OnesCount32  1.91ns ± 0%  1.12ns ± 1%  -41.60%  (p=0.000 n=19+20)
OnesCount64  3.18ns ± 4%  1.35ns ± 0%  -57.52%  (p=0.000 n=20+20)

Change-Id: Id54f0bd28b6db9a887ad12c0d72fcc168ef9c4e0
Reviewed-on: https://go-review.googlesource.com/114675
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-03 14:35:38 +00:00
Michael Munday
9fa988547a cmd/internal/obj/s390x: increase maximum number of loop iterations
The maximum number of 'spanz' iterations that the s390x assembler
performs to reach a fixed point for relative offsets was 10. This
turned out to be too aggressive for one example of auto-generated
fuzzing code. Increase the number of iterations by 10x to reduce
the likelihood that the limit will be hit again. This limit only
exists to help find bugs in the assembler.

master at tip does not fail with the example code in the issue, I
have therefore not submitted it as a test (it is also quite large).
I tested this change with the example code at the commit given and
it fixes the issue.

Fixes #25269.

Change-Id: I0e44948957a7faff51c7d27c0b7746ed6e2d47bb
Reviewed-on: https://go-review.googlesource.com/122235
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-07-05 07:21:50 +00:00
Daniel Martí
9ecf899b29 cmd: remove some unnecessary gotos
Pick the low-hanging fruit, which are the gotos that don't go very far
and labels that aren't used often. All of them have easy replacements
with breaks and returns.

One slightly tricky rewrite is defaultlitreuse. We cannot use a defer
func to reset lineno, because one of its return paths does not reset
lineno, and thus broke toolstash -cmp.

Passes toolstash -cmp on std cmd.

Change-Id: Id1c0967868d69bb073addc7c5c3017ca91ff966f
Reviewed-on: https://go-review.googlesource.com/110063
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-05-01 10:46:08 +00:00
bill_ofarrell
3c65bb5b90 cmd/asm: add s390x VMSLG instruction
This instruction was introduced on the z14 to accelerate "limbified"
multiplications for certain cryptographic algorithms. This change allows
it to be used in Go assembly.

Change-Id: Ic93dae7fec1756f662874c08a5abc435bce9dd9e
Reviewed-on: https://go-review.googlesource.com/109695
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-28 22:15:25 +00:00
Michael Munday
32e6461dc6 cmd/asm, math: add s390x floating point test instructions
Floating point test instructions allow special cases (NaN, ±∞ and
a few other useful properties) to be checked directly.

This CL adds the following instructions to the assembler:
 * LTEBR - load and test (float32)
 * LTDBR - load and test (float64)
 * TCEB  - test data class (float32)
 * TCDB  - test data class (float64)

Note that I have only added immediate versions of the 'test data
class' instructions for now as that's the only case I think the
compiler will use.

Change-Id: I3398aab2b3a758bf909bd158042234030c8af582
Reviewed-on: https://go-review.googlesource.com/104457
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-03 16:08:04 +00:00
Michael Munday
c280126557 cmd/asm, cmd/internal/obj/s390x, math: add "test under mask" instructions
Adds the following s390x test under mask (immediate) instructions:

TMHH
TMHL
TMLH
TMLL

These are useful for testing bits and are already used in the math package.

Change-Id: Idffb3f83b238dba76ac1e42ac6b0bf7f1d11bea2
Reviewed-on: https://go-review.googlesource.com/41092
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-30 23:55:14 +00:00
Michael Munday
96cdacb971 cmd/asm, cmd/compile: optimize math.Abs and math.Copysign on s390x
This change adds three new instructions:

- LPDFR: load positive (math.Abs(x))
- LNDFR: load negative (-math.Abs(x))
- CPSDR: copy sign (math.Copysign(x, y))

By making use of GPR <-> FPR moves we can now compile math.Abs and
math.Copysign to these instructions using SSA rules.

This CL also adds new rules to merge address generation into combined
load operations. This makes GPR <-> FPR move matching more reliable.

name                 old time/op  new time/op  delta
Copysign             1.85ns ± 0%  1.40ns ± 1%  -24.65%  (p=0.000 n=8+10)
Abs                  1.58ns ± 1%  0.73ns ± 1%  -53.64%  (p=0.000 n=10+10)

The geo mean improvement for all math package benchmarks was 4.6%.

Change-Id: I0cec35c5c1b3fb45243bf666b56b57faca981bc9
Reviewed-on: https://go-review.googlesource.com/73950
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-30 23:42:51 +00:00
isharipo
8c67f210a1 cmd/internal/obj: change Prog.From3 to RestArgs ([]Addr)
This change makes it easier to express instructions
with arbitrary number of operands.

Rationale: previous approach with operand "hiding" does
not scale well, AVX and especially AVX512 have many
instructions with 3+ operands.

x86 asm backend is updated to handle up to 6 explicit operands.
It also fixes issue with 4-th immediate operand type checks.
All `ytab` tables are updated accordingly.

Changes to non-x86 backends only include these patterns:
`p.From3 = X` => `p.SetFrom3(X)`
`p.From3.X = Y` => `p.GetFrom3().X = Y`

Over time, other backends can adapt Prog.RestArgs
and reduce the amount of workarounds.

-- Performance --

x/benchmark/build:

$ benchstat upstream.bench patched.bench
name      old time/op                 new time/op                 delta
Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)

name      old binary-size             new binary-size             delta
Build-48                  10.3M ± 0%                  10.3M ± 0%   ~     (all equal)

name      old build-time/op           new build-time/op           delta
Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)

name      old build-peak-RSS-bytes    new build-peak-RSS-bytes    delta
Build-48                  145MB ± 5%                  148MB ± 5%   ~     (p=0.218 n=10+10)

name      old build-user+sys-time/op  new build-user+sys-time/op  delta
Build-48                  21.0s ± 2%                  21.2s ± 2%   ~     (p=0.075 n=10+10)

Microbenchmark shows a slight slowdown.

name        old time/op  new time/op  delta
AMD64asm-4  49.5ms ± 1%  49.9ms ± 1%  +0.67%  (p=0.001 n=23+15)

func BenchmarkAMD64asm(b *testing.B) {
  for i := 0; i < b.N; i++ {
    TestAMD64EndToEnd(nil)
    TestAMD64Encoder(nil)
  }
}

Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183
Reviewed-on: https://go-review.googlesource.com/63490
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-15 21:05:03 +00:00
Cherry Zhang
f20944de78 cmd/compile: set/unset base register for better assembly print
For address of an auto or arg, on all non-x86 architectures
the assembler backend encodes the actual SP offset in the
instruction but leaves the offset in Prog unchanged. When the
assembly is printed in compile -S, it shows an offset
relative to pseudo FP/SP with an actual hardware SP base
register (e.g. R13 on ARM). This is confusing. Unset the
base register if it is indeed SP, so the assembly output is
consistent. If the base register isn't SP, it should be an
error and the error output contains the actual base register.

For address loading instructions, the base register isn't set
in the compiler on non-x86 architectures. Set it. Normally it
is SP and will be unset in the change mentioned above for
printing. If it is not, it will be an error and the error
output contains the actual base register.

No change in generated binary, only printed assembly. Passes
"go build -a -toolexec 'toolstash -cmp' std cmd" on all
architectures.

Fixes #21064.

Change-Id: Ifafe8d5f9b437efbe824b63b3cbc2f5f6cdc1fd5
Reviewed-on: https://go-review.googlesource.com/49432
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-08-02 12:24:02 +00:00
Michael Munday
35cf3843a4 cmd/{asm,compile}: avoid zeroAuto clobbering flags on s390x
This CL modifies how MOV[DWHB] instructions that store a constant to
memory are assembled to avoid them clobbering the condition code
(flags). It also modifies zeroAuto to use MOVD instructions instead of
CLEAR (which is assembled as XC).

MOV[DWHB]storeconst ops also no longer clobbers flags.

Note: this CL modifies the assembler so that it can no longer handle
immediates outside the range of an int16 or offsets from SB, which
reflects what the machine instructions support. The compiler doesn't
need this capability any more and I don't think this affects any existing
assembly, but it is easy to workaround if it does.

Fixes #20187.

Change-Id: Ie54947ff38367bd6a19962bf1a6d0296a4accffb
Reviewed-on: https://go-review.googlesource.com/42179
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-05-02 17:43:31 +00:00
Michael Hudson-Doyle
d2a9545178 cmd/internal: remove SymKind values that are only checked for, never set
Change-Id: Id152767c033c12966e9e12ae303b99f38776f919
Reviewed-on: https://go-review.googlesource.com/40987
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-28 20:01:54 +00:00
Michael Munday
db6f3bbc9a cmd: fix the order that s390x operands are printed in
The assembler reordered the operands of some instructions to put the
first operand into From3. Unfortunately this meant that when the
instructions were printed the operands were in a different order than
the assembler would expect as input. For example, 'MVC $8, (R1), (R2)'
would be printed as 'MVC (R1), $8, (R2)'.

Originally this was done to ensure that From contained the source
memory operand. The current compiler no longer requires this and so
this CL simply makes all instructions use the standard order for
operands: From, Reg, From3 and finally To.

Fixes #18295

Change-Id: Ib2b5ec29c647ca7a995eb03dc78f82d99618b092
Reviewed-on: https://go-review.googlesource.com/40299
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-25 15:16:56 +00:00
Matthew Dempsky
1e3570ac86 cmd/internal/objabi: extract shared functionality from obj
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing
the assembler backends no longer requires reinstalling cmd/link or
cmd/addr2line.

There's also now one canonical definition of the object file format in
cmd/internal/objabi/doc.go, with a warning to update all three
implementations.

objabi is still something of a grab bag of unrelated code (e.g., flag
and environment variable handling probably belong in a separate "tool"
package), but this is still progress.

Fixes #15165.
Fixes #20026.

Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-19 00:00:09 +00:00
Matthew Dempsky
1747078695 cmd/internal/obj: un-embed FuncInfo field in LSym
Automated refactoring using github.com/mdempsky/unbed (to rewrite
s.Foo to s.FuncInfo.Foo) and then gorename (to rename the FuncInfo
field to just Func).

Passes toolstash-check -all.

Change-Id: I802c07a1239e0efea058a91a87c5efe12170083a
Reviewed-on: https://go-review.googlesource.com/40670
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-18 17:29:50 +00:00
Michael Munday
eed6938cbb cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructions
The instructions allow moves between floating point and general
purpose registers without any conversion taking place.

Change-Id: I82c6f3ad9c841a83783b5be80dcf5cd538ff49e6
Reviewed-on: https://go-review.googlesource.com/38777
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-17 16:33:51 +00:00
Josh Bleecher Snyder
1e69245418 cmd/internal/obj/s390x: make assembler almost concurrency-safe
CL 39922 made the arm assembler concurrency-safe.
This CL does the same, but for s390x.
The approach is similar: introduce ctxtz to hold
function-local state and thread it through
the assembler as necessary.

One race remains after this CL, similar to CL 40252.

That race is conceptually unrelated to this refactoring,
and will be addressed in a separate CL.

Passes toolstash-check -all.

Updates #15756

Change-Id: Iabf17aa242b70c0b078c2e85dae3d93a5e512372
Reviewed-on: https://go-review.googlesource.com/40371
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
2017-04-11 15:02:28 +00:00
Josh Bleecher Snyder
63c1aff60b cmd/internal/obj: eagerly initialize assemblers
CL 38662 changed the x86 assembler to be eagerly
initialized, for a concurrent backend.

This CL puts in place a proper mechanism for doing so,
and switches all architectures to use it.

Passes toolstash-check -all.

Updates #15756

Change-Id: Id2aa527d3a8259c95797d63a2f0d1123e3ca2a1c
Reviewed-on: https://go-review.googlesource.com/39917
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-07 16:57:03 +00:00
Josh Bleecher Snyder
5b59b32c97 cmd/compile: teach assemblers to accept a Prog allocator
The existing bulk Prog allocator is not concurrency-safe.
To allow for concurrency-safe bulk allocation of Progs,
I want to move Prog allocation and caching upstream,
to the clients of cmd/internal/obj.

This is a preliminary enabling refactoring.
After this CL, instead of calling Ctxt.NewProg
throughout the assemblers, we thread through
a newprog function that returns a new Prog.

That function is set up to be Ctxt.NewProg,
so there are no real changes in this CL;
this CL only establishes the plumbing.

Passes toolstash-check -all.
Negligible compiler performance impact.

Updates #15756

name        old time/op     new time/op     delta
Template        213ms ± 3%      214ms ± 4%    ~     (p=0.574 n=49+47)
Unicode        90.1ms ± 5%     89.9ms ± 4%    ~     (p=0.417 n=50+49)
GoTypes         585ms ± 4%      584ms ± 3%    ~     (p=0.466 n=49+49)
SSA             6.50s ± 3%      6.52s ± 2%    ~     (p=0.251 n=49+49)
Flate           128ms ± 4%      128ms ± 4%    ~     (p=0.673 n=49+50)
GoParser        152ms ± 3%      152ms ± 3%    ~     (p=0.810 n=48+49)
Reflect         372ms ± 4%      372ms ± 5%    ~     (p=0.778 n=49+50)
Tar             113ms ± 5%      111ms ± 4%  -0.98%  (p=0.016 n=50+49)
XML             208ms ± 3%      208ms ± 2%    ~     (p=0.483 n=47+49)
[Geo mean]      285ms           285ms       -0.17%

name        old user-ns/op  new user-ns/op  delta
Template         253M ± 8%       254M ± 9%    ~     (p=0.899 n=50+50)
Unicode          106M ± 9%       106M ±11%    ~     (p=0.642 n=50+50)
GoTypes          736M ± 4%       740M ± 4%    ~     (p=0.121 n=50+49)
SSA             8.82G ± 3%      8.88G ± 2%  +0.65%  (p=0.006 n=49+48)
Flate            147M ± 4%       147M ± 5%    ~     (p=0.844 n=47+48)
GoParser         179M ± 4%       178M ± 6%    ~     (p=0.785 n=50+50)
Reflect          443M ± 6%       441M ± 5%    ~     (p=0.850 n=48+47)
Tar              126M ± 5%       126M ± 5%    ~     (p=0.734 n=50+50)
XML              244M ± 5%       244M ± 5%    ~     (p=0.594 n=49+50)
[Geo mean]       341M            341M       +0.11%

Change-Id: Ice962f61eb3a524c2db00a166cb582c22caa7d68
Reviewed-on: https://go-review.googlesource.com/39633
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-06 02:07:21 +00:00
Josh Bleecher Snyder
7e817859b3 cmd/internal/obj: eliminate Curp
Remove the global obj.Link.Curp.

In asmz.go, replace the only use by passing it as an argument.
In asm0.go and asm9.go, it was written but never read.
In asm5.go and asm7.go, thread it through as an argument.

Passes toolstash-check -all.

Updates #15756

Change-Id: I1a0faa89e768820f35d73a8b37ec8088d78d15f7
Reviewed-on: https://go-review.googlesource.com/38715
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-27 18:51:42 +00:00
Michael Munday
a524616860 cmd/{asm,internal/obj/s390x}, math: remove emulated float instructions
The s390x port was based on the ppc64 port and, because of the way the
port was done, inherited some instructions from it. ppc64 supports
3-operand (4-operand for FMADD etc.) floating point instructions
but s390x doesn't (the destination register is always an input) and
so these were emulated.

There is a bug in the emulation of FMADD whereby if the destination
register is also a source for the multiplication it will be
clobbered. This doesn't break any assembly code in the std lib but
could affect future work.

To fix this I have gone through the floating point instructions and
removed all unnecessary 3-/4-operand emulation. The compiler doesn't
need it and assembly writers don't need it, it's just a source of
bugs.

I've also deleted the FNMADD family of emulated instructions. They
aren't used anywhere.

Change-Id: Ic07cedcf141a6a3b43a0c84895460f6cfbf56c04
Reviewed-on: https://go-review.googlesource.com/33350
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-02-10 16:11:25 +00:00
Michael Munday
fd118b69fa cmd/asm, cmd/internal/obj/s390x: fix encoding of VREPI{H,F,G}
Also adds tests for all missing VRI-a instructions (which may be
affected by this change).

Fixes #18749.

Change-Id: I48249dda626f32555da9ab58659e2e140de6504a
Reviewed-on: https://go-review.googlesource.com/35561
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-02-01 19:37:18 +00:00
Michael Munday
3ef07c412f cmd, runtime: remove s390x 3 operand immediate logical ops
These are emulated by the assembler and we don't need them.

Change-Id: I2b07c5315a5b642fdb5e50b468453260ae121164
Reviewed-on: https://go-review.googlesource.com/31758
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-25 12:36:06 +00:00
Michael Munday
930ab0afd7 cmd/asm, cmd/internal/obj/s390x: fix VFMA and VFMS encoding
The m5 and m6 fields were the wrong way round.

Fixes #17444.

Change-Id: I10297064f2cd09d037eac581c96a011358f70aae
Reviewed-on: https://go-review.googlesource.com/31130
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-10-21 02:14:57 +00:00
Ian Lance Taylor
e32ac7978d cmd/link, cmd/internal/obj: stop exporting various names
Just happened to notice that these names (funcAlign and friends) are
never referenced outside their package, so no need to export them.

Change-Id: I4bbdaa4b0ef330c3c3ef50a2ca39593977a83545
Reviewed-on: https://go-review.googlesource.com/31496
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-10-19 21:16:58 +00:00
Michael Munday
1cfb5c3fd5 cmd/compile: merge loads into operations on s390x
Adds the new canMergeLoad function which can be used by rules to
decide whether a load can be merged into an operation. The function
ensures that the merge will not reorder the load relative to memory
operations (for example, stores) in such a way that the block can no
longer be scheduled.

This new function enables transformations such as:

MOVD 0(R1), R2
ADD  R2, R3

to:

ADD  0(R1), R3

The two-operand form of the following instructions can now read a
single memory operand:

 - ADD
 - ADDC
 - ADDW
 - MULLD
 - MULLW
 - SUB
 - SUBC
 - SUBE
 - SUBW
 - AND
 - ANDW
 - OR
 - ORW
 - XOR
 - XORW

Improves SHA3 performance by 6-8%.

Updates #15054.

Change-Id: Ibcb9122126cd1a26f2c01c0dfdbb42fe5e7b5b94
Reviewed-on: https://go-review.googlesource.com/29272
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-17 19:45:20 +00:00
Michael Munday
f13372c9f7 cmd/internal/obj/s390x: remove support for stores of global addresses
This CL removes support for MOVD instructions that store the address
of a global variable. For example:

  MOVD $main·a(SB), (R1)
  MOVD $main·b(SB), main·c(SB)

These instructions are emulated and the new backend doesn't need them
(the stores now always go through an intermediate register).

Change-Id: I3a1bcb3f19c5096ad0426afd76d35a4d7975733b
Reviewed-on: https://go-review.googlesource.com/30720
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-09 20:19:31 +00:00
Michael Munday
45b26a93f3 cmd/{asm,compile}: replace TESTB op with CMPWconst on s390x
TESTB was implemented as AND $0xff, Rx, REGTMP. Unfortunately there
is no 3-operand AND-with-immediate instruction and so it was emulated
by the assembler using two instructions.

This CL uses CMPW instead of AND and also optimizes CMPW to use
the chi instruction where possible.

Overall this CL reduces the size of the .text section of the
bin/go binary by ~2%.

Change-Id: Ic335c29fc1129378fcbb1265bfb10f5b744a0f3f
Reviewed-on: https://go-review.googlesource.com/30690
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-07 20:02:59 +00:00
Michael Munday
91706c04b9 cmd/asm, cmd/internal/obj/s390x: delete unused instructions
Deletes the following s390x instructions:

 - ADDME
 - ADDZE
 - SUBME
 - SUBZE

They appear to be emulated PPC instructions left over from the
porting process and I don't think they will ever be useful.

Change-Id: I9b1ba78019dbd1218d0c8f8ea2903878802d1990
Reviewed-on: https://go-review.googlesource.com/30538
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-06 11:45:48 +00:00
Michael Munday
dd1dcf9496 cmd/{asm,compile}: add ANDW, ORW and XORW instructions to s390x
Adds the following instructions and uses them in the SSA backend:

 - ANDW
 - ORW
 - XORW

The instruction encodings for 32-bit operations are typically shorter,
particularly when an immediate is used. For example, XORW $-1, R1
only requires one instruction, whereas XOR requires two.

Also removes some unused instructions (that were emulated):

 - ANDN
 - NAND
 - ORN
 - NOR

Change-Id: Iff2a16f52004ba498720034e354be9771b10cac4
Reviewed-on: https://go-review.googlesource.com/30291
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-06 02:59:04 +00:00