Explcitly block fused multiply-add pattern matching when a cast is used
after the multiplication, for example:
- (a * b) + c // can emit fused multiply-add
- float64(a * b) + c // cannot emit fused multiply-add
float{32,64} and complex{64,128} casts of matching types are now kept
as OCONV operations rather than being replaced with OCONVNOP operations
because they now imply a rounding operation (and therefore aren't a
no-op anymore).
Operations (for example, multiplication) on complex types may utilize
fused multiply-add and -subtract instructions internally. There is no
way to disable this behavior at the moment.
Improves the performance of the floating point implementation of
poly1305:
name old speed new speed delta
64 246MB/s ± 0% 275MB/s ± 0% +11.48% (p=0.000 n=10+8)
1K 312MB/s ± 0% 357MB/s ± 0% +14.41% (p=0.000 n=10+10)
64Unaligned 246MB/s ± 0% 274MB/s ± 0% +11.43% (p=0.000 n=10+10)
1KUnaligned 312MB/s ± 0% 357MB/s ± 0% +14.39% (p=0.000 n=10+8)
Updates #17895.
Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16
Reviewed-on: https://go-review.googlesource.com/36963
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Fix up and enable a few rules.
They trigger a handful of times in std,
despite the frontend handling.
Change-Id: I83378c057cbbc95a4f2b58cd8c36aec0e9dc547f
Reviewed-on: https://go-review.googlesource.com/37227
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Replaces pairs of shifts with sign/zero extension where possible.
For example:
(uint64(x) << 32) >> 32 -> uint64(uint32(x))
Reduces the execution time of the following code by ~4.5% on s390x:
for i := 0; i < N; i++ {
x += (uint64(i)<<32)>>32
}
Change-Id: Idb2d56f27e80a2e1366bc995922ad3fd958c51a7
Reviewed-on: https://go-review.googlesource.com/37292
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
The type of an intermediate multiply was wrong. When that
intermediate multiply was spilled, the top 32 bits were lost.
Fixes#19153
Change-Id: Ib29350a4351efa405935b7f7ee3c112668e64108
Reviewed-on: https://go-review.googlesource.com/37212
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently the conversion from constant divides to multiplies is mostly
done during the walk pass. This is suboptimal because SSA can
determine that the value being divided by is constant more often
(e.g. after inlining).
Change-Id: If1a9b993edd71be37396b9167f77da271966f85f
Reviewed-on: https://go-review.googlesource.com/37015
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
These rules trigger 116 times while running make.bash.
And at least for the sample code at
https://github.com/golang/go/issues/18906#issuecomment-277174241
they are providing optimizations not already present
in amd64.
Updates #18906
Change-Id: I410a480f566f5ab176fc573fb5ac74f9cffec225
Reviewed-on: https://go-review.googlesource.com/36217
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This is a mostly mechanical rename followed by manual fixes where necessary.
Change-Id: Ie5c670b133db978f15dc03e50dc2da0c80fc8842
Reviewed-on: https://go-review.googlesource.com/34137
Reviewed-by: David Lazar <lazard@golang.org>
Adjust cmd/compile accordingly.
This will make it easier to replace the underlying implementation.
Change-Id: I33645850bb18c839b24785b6222a9e028617addb
Reviewed-on: https://go-review.googlesource.com/34133
Reviewed-by: David Lazar <lazard@golang.org>
The code to do the conversion is smaller than the
call to the runtime.
The 1-result asserts need to call panic if they fail, but that
code is out of line.
The only conversions left in the runtime are those which
might allocate and those which might need to generate an itab.
Given the following types:
type E interface{}
type I interface { foo() }
type I2 iterface { foo(); bar() }
type Big [10]int
func (b Big) foo() { ... }
This CL inlines the following conversions:
was assertE2T
var e E = ...
b := i.(Big)
was assertE2T2
var e E = ...
b, ok := i.(Big)
was assertI2T
var i I = ...
b := i.(Big)
was assertI2T2
var i I = ...
b, ok := i.(Big)
was assertI2E
var i I = ...
e := i.(E)
was assertI2E2
var i I = ...
e, ok := i.(E)
These are the remaining runtime calls:
convT2E:
var b Big = ...
var e E = b
convT2I:
var b Big = ...
var i I = b
convI2I:
var i2 I2 = ...
var i I = i2
assertE2I:
var e E = ...
i := e.(I)
assertE2I2:
var e E = ...
i, ok := e.(I)
assertI2I:
var i I = ...
i2 := i.(I2)
assertI2I2:
var i I = ...
i2, ok := i.(I2)
Fixes#17405Fixes#8422
Change-Id: Ida2367bf8ce3cd2c6bb599a1814f1d275afabe21
Reviewed-on: https://go-review.googlesource.com/32313
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
We used to have to keep on-stack copies of these types.
Now they can be registerized.
[0]T is kind of trivial but might as well handle it.
This change enables another change I'm working on to improve how x.(T)
expressions are handled (#17405). This CL helps because now all
types that are direct interface types are registerizeable (e.g. [1]*byte).
No higher-degree arrays for now because non-constant indexes are hard.
Update #17405
Change-Id: I2399940965d17b3969ae66f6fe447a8cefdd6edd
Reviewed-on: https://go-review.googlesource.com/32416
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
var x uint64
uint8(x >> 56)
We don't need to generate any code for the uint8().
Update #15090
Change-Id: Ie1ca4e32022dccf7f7bc42d531a285521fb67872
Reviewed-on: https://go-review.googlesource.com/28191
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
When we do
var x []byte = ...
y := x[i:]
We can't just use y.ptr = x.ptr + i, as the new pointer may point to the
next object in memory after the backing array.
We used to fix this by doing:
y.cap = x.cap - i
delta := i
if y.cap == 0 {
delta = 0
}
y.ptr = x.ptr + delta
That generates a branch in what is otherwise straight-line code.
Better to do:
y.cap = x.cap - i
mask := (y.cap - 1) >> 63 // -1 if y.cap==0, 0 otherwise
y.ptr = x.ptr + i &^ mask
It's about the same number of instructions (~4, depending on what
parts are constant, and the target architecture), but it is all
inline. It plays nicely with CSE, and the mask can be computed
in parallel with the index (in cases where a multiply is required).
It is a minor win in both speed and space.
Change-Id: Ied60465a0b8abb683c02208402e5bb7ac0e8370f
Reviewed-on: https://go-review.googlesource.com/32022
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Adapt old test for prove's bounds check elimination.
Added missing rule to generic rules that lead to differences
between 32 and 64 bit platforms on sliceopt test.
Added debugging to prove.go that was helpful-to-necessary to
discover that missing rule.
Lowered debugging level on prove.go from 3 to 1; no idea
why it was previously 3.
Change-Id: I09de206aeb2fced9f2796efe2bfd4a59927eda0c
Reviewed-on: https://go-review.googlesource.com/23290
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Get rid of BlockCheck. Josh goaded me into it, and I went
down a rabbithole making it happen.
NilCheck now panics if the pointer is nil and returns void, as before.
BlockCheck is gone, and NilCheck is no longer a Control value for
any block. It just exists (and deadcode knows not to throw it away).
I rewrote the nilcheckelim pass to handle this case. In particular,
there can now be multiple NilCheck ops per block.
I moved all of the arch-dependent nil check elimination done as
part of ssaGenValue into its own proper pass, so we don't have to
duplicate that code for every architecture.
Making the arch-dependent nil check its own pass means I needed
to add a bunch of flags to the opcode table so I could write
the code without arch-dependent ops everywhere.
Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
Reviewed-on: https://go-review.googlesource.com/29120
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
If a program wants to evaluate math.Sqrt for any constant value
(for example, math.Sqrt(3)), we can replace that expression with
its evaluation (1.7320508075688772) at compile time, instead of
generating a SQRT assembly command or equivalent.
Adds tests that math.Sqrt generates the correct values. I also
compiled a short program and verified that the Sqrt expression was
replaced by a constant value in the "after opt" step.
Adds a short doc to the top of generic.rules explaining what the file
does and how other files interact with it.
Fixes#15543.
Change-Id: I6b6e63ac61cec50763a09ba581024adeee03d4fa
Reviewed-on: https://go-review.googlesource.com/27457
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Previously, genMatch0 and genResult0 contained
lots of duplication: locating the op, parsing
the value, validation, etc.
Parsing and validation was mixed in with code gen.
Extract a helper, parseValue. It is responsible
for parsing the value, locating the op, and doing
shared validation.
As a bonus (and possibly as my original motivation),
make op selection pay attention to the number
of args present.
This allows arch-specific ops to share a name
with generic ops as long as there is no ambiguity.
It also detects and reports unresolved ambiguity,
unlike before, where it would simply always
pick the generic op, with no warning.
Also use parseValue when generating the top-level
op dispatch, to ensure its opinion about ops
matches genMatch0 and genResult0.
The order of statements in the generated code used
to depend on the exact rule. It is now somewhat
independent of the rule. That is the source
of some of the generated code changes in this CL.
See rewritedec64 and rewritegeneric for examples.
It is a one-time change.
The op dispatch switch and functions used to be
sorted by opname without architecture. The sort
now includes the architecture, leading to further
generated code changes.
See rewriteARM and rewriteAMD64 for examples.
Again, it is a one-time change.
There are no functional changes.
Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c
Reviewed-on: https://go-review.googlesource.com/24649
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Add some simplification rules for floating point ops.
cmd/internal/obj/arm supports instructions that compare FP register
to 0, but runtime softfloat simulator does not. This CL adds these
instructions to softfloat simulator as well.
Updates #15365.
Change-Id: I29405b2bfcb4c8cf106cb7a1a811409fec91b170
Reviewed-on: https://go-review.googlesource.com/24790
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
These were helpful for some autogenerated code
I'm working with.
Change-Id: I7b89c69552ca99bf560a14bfbcd6bd238595ddf6
Reviewed-on: https://go-review.googlesource.com/24742
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Encode the size and the alignment into AuxInt of Zero and Move ops.
On AMD64, we simply don't look at the alignment. On ARM and PPC64, we
only generate aligned stores.
Updates #15365.
Change-Id: Ifdcc205c364f67c4516b9adebfe7d50d223b6863
Reviewed-on: https://go-review.googlesource.com/24511
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Break really long lines.
Add spacing to line up columns.
In AMD64, put all the optimization rules after all the
lowering rules.
Change-Id: I45cc7368bf278416e67f89e74358db1bd4326a93
Reviewed-on: https://go-review.googlesource.com/22470
Reviewed-by: David Chase <drchase@google.com>
Make sure ops have the right number of args, set
aux and auxint only if allowed, etc.
Normalize error reporting format.
Change-Id: Ie545fcc5990c8c7d62d40d9a0a55885f941eb645
Reviewed-on: https://go-review.googlesource.com/22320
Reviewed-by: David Chase <drchase@google.com>
Make sure the results of unsigned constant-folded
shifts are sign-extended into the AuxInt field.
Fixes#15175
Change-Id: I3490d1bc3d9b2e1578ed30964645508577894f58
Reviewed-on: https://go-review.googlesource.com/21586
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In b we only need the division by 0 check.
func b(i uint, v []byte) byte {
return v[i%uint(len(v))]
}
Updates #15079.
Change-Id: Ic7491e677dd57cd6ba577efbce576dbb6e023cbd
Reviewed-on: https://go-review.googlesource.com/21502
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Ahmed Waheed <oneofone@gmail.com>
Compound AUTO types weren't named previously. That was because live
variable analysis (plive.go) doesn't handle spilling to compound types.
It can't handle them because there is no valid place to put VARDEFs when
regalloc is spilling compound types.
compound types = multiword builtin types: complex, string, slice, and
interface.
Instead, we split named AUTOs into individual one-word variables. For
example, a string s gets split into a byte ptr s.ptr and an integer
s.len. Those two variables can be spilled to / restored from
independently. As a result, live variable analysis can handle them
because they are one-word objects.
This CL will change how AUTOs are described in DWARF information.
Consider the code:
func f(s string, i int) int {
x := s[i:i+5]
g()
return lookup(x)
}
The old compiler would spill x to two consecutive slots on the stack,
both named x (at offsets 0 and 8). The new compiler spills the pointer
of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a
constant (5) and can be rematerialized for the call to lookup.
So compound objects may not be spilled in their entirety, and even if
they are they won't necessarily be contiguous. Such is the price of
optimization.
Re-enable live variable analysis tests. One test remains disabled, it
fails because of #14904.
Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801
Reviewed-on: https://go-review.googlesource.com/21233
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Previously if we were only using the low bits of AuxInt,
the high bits were ignored and could be junk. This CL
changes that behavior to define the high bits to be the
sign-extended version of the low bits for all cases.
There are 2 main benefits:
- Deterministic representation. This helps with CSE.
(Const8 [0x1]) and (Const8 [0x101]) used to be the same "value"
but CSE couldn't see them as such.
- Testability. We can check that all ops leave AuxInt in a state
consistent with the new rule. In the old scheme, it was hard
to check whether a rule correctly used only the low-order bits.
Side benefits:
- ==0 and !=0 tests are easier.
Drawbacks:
- This differs from the runtime representation in registers,
where it is important that we allow upper bits to be undefined
(so we're not sign/zero-extending all the time).
- Ops that treat AuxInt as unsigned (shifts, mostly) need to be
a bit more careful.
Change-Id: I9a685ff27e36dc03287c9ab1cecd6c0b4045c819
Reviewed-on: https://go-review.googlesource.com/21256
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
khr: Lifting the nil check out of the loop altogether is an admirable
goal, and this rewrite is one step on the way. But without lifting it
out of the loop, the rewrite is just hurting us.
Fixes#14917
Change-Id: Idb917f37d89f50f8e046d5ebd7c092b1e0eb0633
Reviewed-on: https://go-review.googlesource.com/21040
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Shows up occassionally, especially after p = p[:8:len(p)]
Updates #14905
Change-Id: Iab35ef2eac57817e6a10c6aaeeb84709e8021641
Reviewed-on: https://go-review.googlesource.com/21025
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Allow names to be used for subexpressions of match rules.
For example:
(OpA x:(OpB y)) -> ..use x here to refer to the OpB value..
This gets rid of the .Args[0].Args[0]... way of naming we
used to use.
While we're here, give all subexpression matches names instead
of recomputing them with .Args[i] sequences each time they
are referenced. Makes the generated rule code a bit smaller.
Change-Id: Ie42139f6f208933b75bd2ae8bd34e95419bc0e4e
Reviewed-on: https://go-review.googlesource.com/20997
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
For the following example, but there are a few more in the stdlib:
func histogram(b []byte, h *[256]int32) {
for _, t := range b {
h[t]++
}
}
Change-Id: I56615f341ae52e02ef34025588dc6d1c52122295
Reviewed-on: https://go-review.googlesource.com/20924
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Keep track of how many uses each Value has. Each appearance in
Value.Args and in Block.Control counts once.
The number of uses of a value is generically useful to
constrain rewrite rules. For instance, we might want to
prevent merging index operations into loads if the same
index expression is used lots of times.
But I have one use in particular for which the use count is required.
We must make sure we don't combine ops with loads if the load has
more than one use. Otherwise, we may split a single load
into multiple loads and that breaks perceived behavior in
the presence of races. In particular, the load of m.state
in sync/mutex.go:Lock can't be done twice. (I have a separate
CL which triggers the mutex failure. This CL has a test which
demonstrates a similar failure.)
Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e
Reviewed-on: https://go-review.googlesource.com/20790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
All of a struct's fields have to fit into memory anyway, so index them
with int instead of int64. This also makes it nicer for
cmd/compile/internal/gc to reuse the same NumFields function.
Change-Id: I210be804a0c33370ec9977414918c02c675b0fbe
Reviewed-on: https://go-review.googlesource.com/20691
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Split the auxFloat type into 32/64 bit versions and perform checking for
exactly representable float32 values. Perform const folding on
float32/64. Comment out some const negation rules that the frontend
already performs.
Change-Id: Ib3f8d59fa8b30e50fe0267786cfb3c50a06169d2
Reviewed-on: https://go-review.googlesource.com/20568
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
* Refacts a bit saving and restoring parents restrictions
* Shaves ~100k from pkg/tools/linux_amd64,
but most of the savings come from the rewrite rules.
* Improves on the following artificial test case:
func f1(a4 bool, a6 bool) bool {
return a6 || (a6 || (a6 || a4)) || (a6 || (a4 || a6 || (false || a6)))
}
Change-Id: I714000f75a37a3a6617c6e6834c75bd23674215f
Reviewed-on: https://go-review.googlesource.com/20306
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
I would like to add a
func (t *Type) Elem() *Type
method to package gc, but that would collide with the existing
func (t *Type) Elem() ssa.Type
method needed to make *gc.Type implement ssa.Type. Because the latter
is much less widely used right now than the former will be, this CL
renames it to ElemType.
Longer term, hopefully gc and ssa will share a common Type interface,
and ElemType can go away.
Change-Id: I270008515dc4c01ef531cf715637a924659c4735
Reviewed-on: https://go-review.googlesource.com/20546
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
* Move lowering into a separate pass.
* SliceLen/SliceCap is now available to various intermediate passes
which use useful for bounds checking.
* Add a second opt pass to handle the new opportunities
Decreases the code size of binaries in pkg/tool/linux_amd64
by ~45K.
Updates #14564#14606
Change-Id: I5b2bd6202181c50623a3585fbf15c0d6db6d4685
Reviewed-on: https://go-review.googlesource.com/20172
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>