Converted working slices of pointers into slices of pointer indexes (IDs).
Half the size (on a 64-bit machine) and no pointers
to trace if GC occurs while they're live.
TODO - could expose slice mapping ID->*Block; some dom
clients also construct these.
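A minimal sketch of the idea, using hypothetical standalone types rather
than the real ones in the ssa package:

    package main

    import "fmt"

    type ID int32

    type Block struct {
        ID ID
    }

    func main() {
        blocks := []*Block{{ID: 0}, {ID: 1}, {ID: 2}}

        // Working slice of IDs: 4 bytes per entry instead of an 8-byte
        // pointer, and nothing for the GC to trace while it is live.
        work := []ID{2, 0, 1}

        // An ID->*Block slice recovers the pointer when it is needed.
        byID := make([]*Block, len(blocks))
        for _, b := range blocks {
            byID[b.ID] = b
        }
        for _, id := range work {
            fmt.Println(byID[id].ID)
        }
    }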
Minor optimization in regalloc that cuts allocation count.
Minor optimization in compile.go that cuts calls to Sprintf.
Change-Id: I28f0bfed422b7344af333dc52ea272441e28e463
Reviewed-on: https://go-review.googlesource.com/19104
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
For each value that needs to be in a fixed register at the end of the
block, try to pick that fixed register when the instruction
generating that value is scheduled (or restored from a spill).
Just used for end-of-block register requirements for now.
Fixed-register instruction requirements (e.g. shift in ecx) can be
added later. Also two-instruction constraints (input reg == output
reg) might be recorded in a similar manner.
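A rough sketch of the hinting, with made-up types standing in for the real
regalloc data structures:

    package main

    import "fmt"

    type ValID int
    type Reg uint8

    func main() {
        // desired[v] is the register v should be in at the end of the block.
        desired := map[ValID]Reg{1: 0 /* AX */, 2: 3 /* BX */}
        var used uint64 // bitmask of occupied registers

        alloc := func(v ValID) Reg {
            if r, ok := desired[v]; ok && used&(1<<r) == 0 {
                used |= 1 << r
                return r // honor the end-of-block requirement
            }
            for r := Reg(0); r < 16; r++ { // otherwise any free register
                if used&(1<<r) == 0 {
                    used |= 1 << r
                    return r
                }
            }
            panic("no free register")
        }

        fmt.Println(alloc(1), alloc(2), alloc(3)) // 0 3 1
    }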
Change-Id: I59916e2e7f73657bb4fc3e3b65389749d7a23fa8
Reviewed-on: https://go-review.googlesource.com/18774
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
The OpSB hack didn't quite work. We need to really
CSE these ops to make regalloc happy.
Change-Id: I9f4d7bfb0929407c84ee60c9e25ff0c0fbea84af
Reviewed-on: https://go-review.googlesource.com/19083
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
CSE is one of the slowest compiler phases right now, and we
run it twice.
Instead of using a map to make the initial partition, use a sort.
It is much less memory intensive.
Do a few optimizations to avoid work for size-1 equivalence classes.
Implement -N.
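A hedged sketch of the sort-based partition; the key here is a stand-in for
what the real code uses (op, type, aux, and argument count):

    package main

    import (
        "fmt"
        "sort"
    )

    type Value struct {
        ID  int
        Key string // stand-in for (op, type, aux, nargs)
    }

    func main() {
        vals := []*Value{{1, "add"}, {2, "mul"}, {3, "add"}, {4, "const"}}

        // Sort by key, then slice adjacent runs of equal keys into classes.
        // No map of key -> []*Value is needed, so far fewer allocations.
        sort.Slice(vals, func(i, j int) bool { return vals[i].Key < vals[j].Key })

        var classes [][]*Value
        for i := 0; i < len(vals); {
            j := i + 1
            for j < len(vals) && vals[j].Key == vals[i].Key {
                j++
            }
            if j-i > 1 { // size-1 classes can never merge; skip the work
                classes = append(classes, vals[i:j])
            }
            i = j
        }
        fmt.Println(len(classes)) // 1: just the two "add" values
    }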
Change-Id: I1d2d85d3771abc918db4dd7cc30b0b2d854b15e1
Reviewed-on: https://go-review.googlesource.com/19024
Reviewed-by: David Chase <drchase@google.com>
The x86 backend automatically rewrites MOV $0, AX to
XOR AX, AX. That rewrite isn't ok when the flags register
is live across the MOV. Keep track of which moves care
about preserving flags, then disable this rewrite for them.
On x86, Prog.Mark was being used to hold the length of the
instruction. We already store that in Prog.Isize, so no
need to store it in Prog.Mark also. This frees up Prog.Mark
to hold a bitmask on x86 just like all the other architectures.
Update #12405
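A small sketch of the guarded rewrite, with a hypothetical mark bit (the
real bit lives in the Prog.Mark bitmask this change frees up):

    package main

    import "fmt"

    const markPreserveFlags = 1 << 0 // hypothetical bit name

    type prog struct {
        op   string
        mark int
    }

    // zeroToXor models the peephole: rewrite "MOV $0, reg" to "XOR reg, reg"
    // only when the move is allowed to clobber the flags register.
    func zeroToXor(p *prog) {
        if p.op == "MOVQ $0" && p.mark&markPreserveFlags == 0 {
            p.op = "XORQ" // shorter encoding, but clobbers flags
        }
    }

    func main() {
        a := &prog{op: "MOVQ $0"}                          // flags dead: rewrite
        b := &prog{op: "MOVQ $0", mark: markPreserveFlags} // flags live: keep MOV
        zeroToXor(a)
        zeroToXor(b)
        fmt.Println(a.op, "/", b.op) // XORQ / MOVQ $0
    }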
Change-Id: Ibad8a8f41fc6222bec1e4904221887d3cc3ca029
Reviewed-on: https://go-review.googlesource.com/18861
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
In code that does:
    var x, z int32
    var y int64
    z = phi(x, int32(y))
we silently drop the int32 cast because truncation is a no-op.
The phi operation needs to make sure it uses the size of the
phi, not the size of its arguments, when generating spills.
Change-Id: I1f7baf44f019256977a46fdd3dad1972be209042
Reviewed-on: https://go-review.googlesource.com/18390
Reviewed-by: David Chase <drchase@google.com>
Forgot to reset these masks before each merge edge is processed.
Change-Id: I2f593189b63f50a1cd12b2dd4645ca7b9614f1f3
Reviewed-on: https://go-review.googlesource.com/18223
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reorder how register & stack allocation is done. We used to allocate
registers, then fix up merge edges, then allocate stack slots. This
led to lots of unnecessary copies on merge edges:
    v2 = LoadReg v1
    v3 = StoreReg v2
If v1 and v3 are allocated to the same stack slot, then this code is
unnecessary. But at regalloc time we didn't know the homes of v1 and
v3.
To fix this problem, allocate all the stack slots before fixing up the
merge edges. That way, we know what stack slots values use so we know
what copies are required.
Use a good technique for shuffling values around on merge edges.
Improves performance of the go1 TimeParse benchmark by ~12%.
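One classic way to do the shuffling, sketched here in isolation (this is an
illustration of the technique, not the actual code in regalloc.go): treat the
edge as a set of parallel copies, emit any copy whose destination is no longer
needed as a source, and break the remaining cycles through a scratch location.

    package main

    import "fmt"

    func srcSet(moves map[string]string) map[string]bool {
        s := map[string]bool{}
        for _, src := range moves {
            s[src] = true
        }
        return s
    }

    // sequentialize turns the parallel copies on a merge edge into an
    // ordered list of moves that never clobbers a value it still needs.
    func sequentialize(moves map[string]string) []string {
        var out []string
        for len(moves) > 0 {
            progressed := false
            for dst, src := range moves {
                if dst == src {
                    delete(moves, dst) // already in place
                    progressed = true
                    continue
                }
                if !srcSet(moves)[dst] {
                    out = append(out, dst+" = "+src)
                    delete(moves, dst)
                    progressed = true
                }
            }
            if !progressed { // only cycles remain; break one with a temp
                for dst, src := range moves {
                    out = append(out, "tmp = "+src)
                    delete(moves, dst)
                    moves[dst] = "tmp"
                    break
                }
            }
        }
        return out
    }

    func main() {
        // AX and BX swap on this edge; CX simply gets DX.
        fmt.Println(sequentialize(map[string]string{"AX": "BX", "BX": "AX", "CX": "DX"}))
    }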
Change-Id: I731f43e4ff1a7e0dc4cd4aa428fcdb97812b86fa
Reviewed-on: https://go-review.googlesource.com/17915
Reviewed-by: David Chase <drchase@google.com>
Spilling/restoring flag values is a pain to do during regalloc.
Instead, allocate the flag register in a separate pass. Regalloc then
operates normally on any flag recomputation instructions.
Change-Id: Ia1c3d9e6eff678861193093c0b48a00f90e4156b
Reviewed-on: https://go-review.googlesource.com/17694
Reviewed-by: David Chase <drchase@google.com>
Use a more precise computation of next use. It properly
detects lifetime holes and deallocates values during those holes.
It also uses a more precise version of distance to next use which
affects which values get spilled.
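For illustration, a hedged sketch of the "spill the value whose next use is
farthest away" heuristic that these distances feed (hypothetical shape, not
the real regalloc code):

    package main

    import "fmt"

    // spillCandidate picks the register value to evict: the one whose next
    // use is farthest in the future. A value with no next use at all (a
    // lifetime hole) could simply be dropped instead of spilled.
    func spillCandidate(nextUse map[string]int) string {
        best, bestDist := "", -1
        for v, d := range nextUse {
            if d > bestDist {
                best, bestDist = v, d
            }
        }
        return best
    }

    func main() {
        // distance (in instructions) to each live value's next use
        fmt.Println(spillCandidate(map[string]int{"a": 2, "b": 40, "c": 7})) // b
    }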
Change-Id: I49eb3ebe2d2cb64842ecdaa7fb4f3792f8afb90b
Reviewed-on: https://go-review.googlesource.com/16760
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Declare a function's arguments as having already been
spilled so their use just requires a restore.
Allow spill locations to be portions of larger objects on the stack.
Required to load portions of compound input arguments.
Rename the memory input to InputMem. Use Arg for the
pre-spilled argument values.
Change-Id: I8fe2a03ffbba1022d98bfae2052b376b96d32dda
Reviewed-on: https://go-review.googlesource.com/16536
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
We "spill" flag values by recomputing them from their original
inputs. The "find original inputs" part of the algorithm was
a hack. It was broken by rematerialization. This change does
the real job of keeping track of original values for each
spill/restore/flagrecompute/rematerialization we issue.
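A toy sketch of the bookkeeping (hypothetical names; the real code records
the original *Value on each spill, restore, or rematerialized copy it issues):

    package main

    import "fmt"

    func main() {
        original := map[string]string{}
        record := func(copyName, of string) {
            if orig, ok := original[of]; ok {
                of = orig // a copy of a copy still points at the original
            }
            original[copyName] = of
        }
        record("v10_spill", "v10")
        record("v10_restore", "v10_spill")
        // Recomputing a flags value later can now look up the true,
        // original inputs rather than some intermediate copy.
        fmt.Println(original["v10_restore"]) // v10
    }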
Change-Id: I95088326a4ee4958c98148b063e518c80e863e4c
Reviewed-on: https://go-review.googlesource.com/16500
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Use faulting loads instead of test/jeq to do nil checks.
Fold nil checks into a following load/store if possible.
Makes binaries about 2% smaller.
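For example, after this change a dereference like the one below needs no
explicit compare-and-branch; the load itself faults at a small address if p
is nil, and the runtime turns that fault into the usual nil-pointer panic:

    package main

    import "fmt"

    type T struct{ x int }

    // The nil check for p is folded into the MOV that loads p.x.
    func get(p *T) int {
        return p.x
    }

    func main() {
        fmt.Println(get(&T{x: 7}))
    }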
Change-Id: I54af0f0a93c853f37e34e0ce7e3f01dd2ac87f64
Reviewed-on: https://go-review.googlesource.com/16287
Reviewed-by: David Chase <drchase@google.com>
Register phis are better than stack phis. If we have
unused registers available, use them for phis.
Change-Id: I3045711c65caa1b6d0be29131b87b57466320cc2
Reviewed-on: https://go-review.googlesource.com/16080
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
getg reads from memory, so it should really have a
memory arg. In functions that call setg, it is critical
that getg gets ordered correctly with setg.
Change-Id: Ief4875421f741fc49c07b0e1f065ce2535232341
Reviewed-on: https://go-review.googlesource.com/16100
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
It isn't safe in functions that also call setg.
Change-Id: I76a7bf0401b4b6c8a129c245b15a2d6f06080e94
Reviewed-on: https://go-review.googlesource.com/16095
Reviewed-by: Todd Neal <todd@tneal.org>
Rematerialize constants instead of spilling and loading them.
"Constants" includes constant offsets from SP and SB.
Should help somewhat with stack frame sizes. I'm not sure
exactly how much yet.
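A hedged sketch of the decision, with made-up op names: a value that can be
recomputed from nothing is cheaper to re-emit at its use than to spill and
reload.

    package main

    import "fmt"

    type value struct {
        op     string
        auxInt int64
    }

    // rematerializeable reports whether v can simply be regenerated:
    // constants, and fixed offsets from SP or SB.
    func rematerializeable(v *value) bool {
        switch v.op {
        case "Const64", "OffSP", "OffSB": // illustrative names only
            return true
        }
        return false
    }

    func main() {
        fmt.Println(rematerializeable(&value{op: "Const64", auxInt: 42})) // true
        fmt.Println(rematerializeable(&value{op: "Load"}))                // false
    }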
Change-Id: I44dbad97aae870cf31cb6e89c92fe4f6a2b9586f
Reviewed-on: https://go-review.googlesource.com/16029
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Cleaned up first-block-in-function code.
Added cases for |PHEAP for PPARAM and PAUTO.
Made PPARAMOUT act more like PAUTO for purposes
of address generation and vardef placement.
Added cases for OCLOSUREVAR and Ops for getting the closure
pointer. Closure ops are scheduled at the top of the entry block
to capture DX.
Wrote a test that seems to show proper behavior for addressed
parameters, locals, and returns.
Change-Id: Iee93ebf9e3d9f74cfb4d1c1da8038eb278d8a857
Reviewed-on: https://go-review.googlesource.com/14650
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Load-and-sign-extend opcodes were being generated in the
wrong block, leading to having more than one memory variable
live at once. Fix the rules + add a test.
Change-Id: Iadf80e55ea901549c15c628ae295c2d0f1f64525
Reviewed-on: https://go-review.googlesource.com/14591
Reviewed-by: Todd Neal <todd@tneal.org>
Run-TryBot: Todd Neal <todd@tneal.org>
The code previously always used AX, causing errors. For now, just
switch off the type in order to at least generate valid code.
Change-Id: Iaf13120a24b62456b9b33c04ab31f2d5104b381b
Reviewed-on: https://go-review.googlesource.com/13943
Reviewed-by: David Chase <drchase@google.com>
This CL takes a simple approach to spilling and loading flags.
We never spill. When a load is needed, we recalculate,
loading the arguments as needed.
This is simple and architecture-independent.
It is not very efficient, but as of this CL,
there are fewer than 200 flag spills during make.bash.
This was tested by manually reverting CLs 13813 and 13843,
causing SETcc, MOV, and LEA instructions to clobber flags,
which dramatically increases the number of flags spills.
With that done, all stdlib tests that used to pass
still pass.
For future reference, here are some other, more efficient
amd64-only schemes that we could adapt in the future if needed.
(1) Spill exactly the flags needed.
For example, if we know that the flags will be needed
by a SETcc or Jcc op later, we could use SETcc to
extract just the relevant flag. When needed,
we could use TESTB and change the op to JNE/SETNE.
(Alternatively, we could leave the op unaltered
and prepare an appropriate CMPB instruction
to produce the desired flag.)
However, this requires separate handling for every
instruction that uses the flags register,
including (say) SBBQcarrymask.
We could enable this on an ad hoc basis for common cases
and fall back to recalculation for other cases.
(2) Spill all flags with PUSHF and POPF
This modifies SP, which the runtime won't like.
It also requires coordination with stackalloc to
make sure that we have a stack slot ready for use.
(3) Spill almost all flags with LAHF, SETO, and SAHF
See http://blog.freearrow.com/archives/396
for details. This would handle all the flags we currently
use. However, LAHF and SAHF are not universally available,
and this scheme requires arranging for AX to be free.
Change-Id: Ie36600fd8e807ef2bee83e2e2ae3685112a7f276
Reviewed-on: https://go-review.googlesource.com/13844
Reviewed-by: Keith Randall <khr@golang.org>
Implement a global (whole function) register allocator.
This replaces the local (per basic block) register allocator.
Clobbering of registers by instructions is handled properly.
A separate change will add the correct clobbers to all the instructions.
Change-Id: I38ce4dc7dccb8303c1c0e0295fe70247b0a3f2ea
Reviewed-on: https://go-review.googlesource.com/13622
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Todd Neal <todd@tneal.org>
regalloc expects to find all OpSP and OpSB values
in the entry block.
There is no value in moving them; don't.
Change-Id: I775198f03ce7420348721ffc5e7d2bab065465b1
Reviewed-on: https://go-review.googlesource.com/13266
Reviewed-by: Keith Randall <khr@golang.org>
Failure to treat control ops as live can lead
to them being eliminated when they live in
other blocks.
Change-Id: I604a1977a3d3884b1f4516bea4e15885ce38272d
Reviewed-on: https://go-review.googlesource.com/13138
Reviewed-by: Keith Randall <khr@golang.org>
The existing backend simply elides OCONVNOP.
There's no reason for us to do any differently.
Rather than insert ConvNops and then rewrite them
away, stop creating them in the first place.
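For example, the conversion below changes only the static type, not the
representation, so there is nothing for the back end to do with it:

    package main

    import "fmt"

    type MyInt int

    func main() {
        var x int = 3
        y := MyInt(x) // a no-op at runtime; no ConvNop needs to be emitted
        fmt.Println(y)
    }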
Change-Id: I4bcbe2229fcebd189ae18df24f2c612feb6e215e
Reviewed-on: https://go-review.googlesource.com/12810
Reviewed-by: Keith Randall <khr@golang.org>
If flushing a value from a register that might be used by the current
old-schedule value, save it to the home location.
This resolves the error that was changed from panic to unimplemented in
CL 12655.
Change-Id: If864be34abcd6e11d6117a061376e048a3e29b3a
Reviewed-on: https://go-review.googlesource.com/12682
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Prior to this, we were smashing our own stack,
which caused the crypto/sha256 tests to fail.
Change-Id: I7dd94cf466d175b3be0cd65f9c4fe8b1223081fe
Reviewed-on: https://go-review.googlesource.com/12660
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
This reduces the wall time to run test/slice3.go
on my laptop from >10m to ~20s.
This could perhaps be further reduced by using
a worklist of blocks and/or implementing the
suggestion in the comment in this CL, but at this
point, it's fast enough that there is no need.
Change-Id: I741119e0c8310051d7185459f78be8b89237b85b
Reviewed-on: https://go-review.googlesource.com/12564
Reviewed-by: Keith Randall <khr@golang.org>
Use *Node of type ONAME instead of string as the key for variable maps.
This will prevent aliasing between two identically named but
differently scoped variables.
Introduce an Aux value that encodes the offset of a variable
from a base pointer (either global base pointer or stack pointer).
Allow LEAQ and derivatives (MOVQ, etc.) to also have such an Aux field.
Allocate space for AUTO variables in stackalloc.
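A hedged sketch of the keying and the aux encoding (stand-in types; the real
code uses the front end's *Node and an ssa Aux value):

    package main

    import "fmt"

    type Node struct{ Name string } // stand-in for the front end's *Node

    // An auto variable is identified by its *Node, not its name, and carries
    // an offset from the base pointer (SP here) once stackalloc assigns one.
    type AutoSymbol struct {
        Node   *Node
        Offset int64
    }

    func main() {
        x1 := &Node{Name: "x"} // two distinct variables that happen
        x2 := &Node{Name: "x"} // to share the name "x"
        a, b := AutoSymbol{Node: x1}, AutoSymbol{Node: x2}
        fmt.Println(a.Node == b.Node) // false: no aliasing by name
    }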
Change-Id: Ibdccdaea4bbc63a1f4882959ac374f2b467e3acd
Reviewed-on: https://go-review.googlesource.com/11238
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
The SSA implementation logs for three purposes:
* debug logging
* fatal errors
* unimplemented features
Separating these three uses lets us attempt an SSA
implementation for all functions, not just
_ssa functions. This turns the entire standard
library into a compilation test, and makes it
easy to figure out things like
"how much coverage does SSA have now" and
"what should we do next to get more coverage?".
Functions called _ssa are still special.
They log profusely by default and
the output of the SSA implementation
is used. For all other functions,
logging is off, and the implementation
is built and discarded, due to lack of
support for the runtime.
While we're here, fix a few minor bugs and
add some extra Unimplementeds to allow
all.bash to pass.
As of now, SSA handles 20.79% of the functions
in the standard library (689 of 3314).
The top missing features are:
10.03% 2597 SSA unimplemented: zero for type error not implemented
7.79% 2016 SSA unimplemented: addr: bad op DOTPTR
7.33% 1898 SSA unimplemented: unhandled expr EQ
6.10% 1579 SSA unimplemented: unhandled expr OROR
4.91% 1271 SSA unimplemented: unhandled expr NE
4.49% 1163 SSA unimplemented: unhandled expr LROT
4.00% 1036 SSA unimplemented: unhandled expr LEN
3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC
2.37% 615 SSA unimplemented: zero for type []byte not implemented
1.90% 492 SSA unimplemented: unhandled stmt CALLMETH
1.74% 450 SSA unimplemented: unhandled expr CALLINTER
1.74% 450 SSA unimplemented: unhandled expr DOT
1.71% 444 SSA unimplemented: unhandled expr ANDAND
1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR
1.54% 400 SSA unimplemented: unhandled expr CALLMETH
1.51% 390 SSA unimplemented: unhandled stmt SWITCH
1.47% 380 SSA unimplemented: unhandled expr CONV
1.33% 345 SSA unimplemented: addr: bad op *
1.30% 336 SSA unimplemented: unhandled OLITERAL 6
Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7
Reviewed-on: https://go-review.googlesource.com/11160
Reviewed-by: Keith Randall <khr@golang.org>
Add an additional int64 auxiliary field to Value.
There are two main reasons for doing this:
1) Ints in interfaces require allocation, and we store ints in Aux a lot.
2) I'd like to have both *gc.Sym and int offsets included in lots
of operations (e.g. MOVQloadidx8). It will be more efficient to
store them as separate fields instead of a pointer to a sym/int pair.
It also simplifies a bunch of code.
This is just the refactoring. I'll start using this some more in a
subsequent changelist.
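A sketch of the resulting shape (not the exact ssa.Value definition):

    package main

    import "fmt"

    type Op int32

    type Value struct {
        Op     Op
        AuxInt int64       // integer aux: constants, offsets; no boxing needed
        Aux    interface{} // symbolic aux: e.g. a *gc.Sym
    }

    func main() {
        // e.g. a load at offset 16 from a named symbol
        v := Value{Op: 1, AuxInt: 16, Aux: "some.Sym"}
        fmt.Printf("%+v\n", v)
    }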
Change-Id: I1ca797ff572553986cf90cab3ac0a0c1d01ad241
Reviewed-on: https://go-review.googlesource.com/10929
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Revamp autogeneration. Get rid of gogenerate commands; they are more
trouble than they are worth. (If the code won't compile, gogenerate
doesn't work.)
Generate opcode enums & tables. This means we only have to specify
opcodes in one place instead of two.
Add arch prefixes to opcodes so they will be globally unique.
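A sketch of what the generated, globally unique enum looks like, with
illustrative names only:

    package main

    import "fmt"

    type Op int32

    const (
        OpAdd Op = iota // generic op
        OpAMD64ADDQ     // amd64-specific op: the arch prefix keeps it unique
        OpAMD64MOVQload
    )

    var opNames = [...]string{"Add", "AMD64ADDQ", "AMD64MOVQload"}

    func (o Op) String() string { return opNames[o] }

    func main() { fmt.Println(OpAMD64ADDQ) }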
Change-Id: I175d0a89b701b2377bbe699f3756731b7c9f5a9f
Reviewed-on: https://go-review.googlesource.com/10812
Reviewed-by: Alan Donovan <adonovan@google.com>
Add ops to load, store, select ptr & len, and build constant strings.
A few other minor cleanups.
Change-Id: I6f0f7419d641b119b613ed44561cd308a466051c
Reviewed-on: https://go-review.googlesource.com/10449
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Semi-regular merge of tip to dev.ssa.
Complicated a bit by the move of cmd/internal/* to cmd/compile/internal/*.
Change-Id: I1c66d3c29bb95cce4a53c5a3476373aa5245303d