802 Commits

Author SHA1 Message Date
philhofer
d68bb16b1e cmd/compile/internal/ssa: recognize constant pointer comparison
Teach the backend to recognize that the address of a symbol
is equal with itself, and that the addresses of two different
symbols are different.

Some examples of where this rule hits in the standard library:

 - inlined uses of (*time.Time).setLoc (e.g. time.UTC)
 - inlined uses of bufio.NewReader (via type assertion)

Change-Id: I23dcb068c2ec333655c1292917bec13bbd908c24
Reviewed-on: https://go-review.googlesource.com/38338
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-20 21:04:44 +00:00
Josh Bleecher Snyder
e22ba7f0fb cmd/compile: enable CSE of constant strings
CL 27254 changed a constant string to a byte array
in encoding/hex and got significant performance
improvements.

hex.Encode used the string twice in a single function.
The rewrite rules lower constant strings into components.
The pointer component requires an aux symbol.
The existing implementation created a new aux symbol every time.
As a result, constant string pointers were never CSE'd.
Tighten then moved the pointer calculation next to the uses, i.e.
into the loop.

The re-use of aux syms enabled by this CL
occurs 3691 times during make.bash.

This CL should not go in without CL 38338
or something like it.

Change-Id: Ibbf5b17283c0e31821d04c7e08d995c654de5663
Reviewed-on: https://go-review.googlesource.com/28219
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-20 20:22:26 +00:00
Martin Möhrmann
2805d20689 cmd/compile: replace all uses of ptrto by typPtr
This makes the overall naming and use of the functions
to create a Type more consistent.

Passes toolstash -cmp.

Change-Id: Ie0d40b42cc32b5ecf5f20502675a225038ea40e4
Reviewed-on: https://go-review.googlesource.com/38354
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 14:29:53 +00:00
Josh Bleecher Snyder
872db79989 cmd/compile: add more types to ssa.Types
This reduces the number of calls back into the
gc Type routines, which will help performance
in a concurrent backend.
It also reduces the number of callsites
that must be considered in making the transition.

Passes toolstash-check -all. No compiler performance changes.

Updates #15756

Change-Id: Ic7a8f1daac7e01a21658ae61ac118b2a70804117
Reviewed-on: https://go-review.googlesource.com/38340
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-19 02:03:44 +00:00
Josh Bleecher Snyder
aea3aff669 cmd/compile: separate ssa.Frontend and ssa.TypeSource
Prior to this CL, the ssa.Frontend field was responsible
for providing types to the backend during compilation.
However, the types needed by the backend are few and static.
It makes more sense to use a struct for them
and to hang that struct off the ssa.Config,
which is the correct home for readonly data.
Now that Types is a struct, we can clean up the names a bit as well.

This has the added benefit of allowing early construction
of all types needed by the backend.
This will be useful for concurrent backend compilation.

Passes toolstash-check -all. No compiler performance change.

Updates #15756

Change-Id: I021658c8cf2836d6a22bbc20cc828ac38c7da08a
Reviewed-on: https://go-review.googlesource.com/38336
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-19 00:21:23 +00:00
Josh Bleecher Snyder
dc4434a0c0 cmd/compile: make stkptrsize local
While we're here, also eliminate a few more Curfn uses.

Passes toolstash -cmp. No compiler performance impact.

Updates #15756

Change-Id: Ib8db9e23467bbaf16cc44bf62d604910f733d6b8
Reviewed-on: https://go-review.googlesource.com/38331
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 23:57:27 +00:00
Josh Bleecher Snyder
758b5b3284 cmd/compile: make Stksize local
Passes toolstash -cmp. No compiler performance impact.

Updates #15756

Change-Id: I85b45244453ae28d4da76be4313badddcbf3f5dc
Reviewed-on: https://go-review.googlesource.com/38330
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 23:57:15 +00:00
Josh Bleecher Snyder
e0a5e69be2 cmd/compile: make Maxarg local
Passes toolstash -cmp. No compiler performance impact.

Updates #15756

Change-Id: I1294058716d83dd1be495d399ed7ab2277754dc6
Reviewed-on: https://go-review.googlesource.com/38329
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 23:57:03 +00:00
Josh Bleecher Snyder
174b858f78 cmd/compile: pass frame size to defframe
Preparation for de-globalizing Stksize and MaxArg.

Passes toolstash -cmp. No compiler performance impact.

Updates #15756

Change-Id: I312f0bbd15587a6aebf472cd66c8e62b89e55c8a
Reviewed-on: https://go-review.googlesource.com/38328
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 23:56:22 +00:00
Josh Bleecher Snyder
da8e939ba9 cmd/compile: thread Curfn through SSA
This is a first step towards eliminating the
Curfn global in the backend.
There's more to do.

Passes toolstash -cmp. No compiler performance impact.

Updates #15756

Change-Id: Ib09f550a001e279a5aeeed0f85698290f890939c
Reviewed-on: https://go-review.googlesource.com/38232
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 23:45:22 +00:00
Cherry Zhang
ce584e516e cmd/compile: using a single Store op for non-pointer non-skip store
This makes fewer Values around until decomposing, reducing
allocation in compiler.

name       old alloc/op    new alloc/op    delta
Template      41.4MB ± 0%     40.8MB ± 0%  -1.29%  (p=0.000 n=10+10)
Unicode       30.3MB ± 0%     30.2MB ± 0%  -0.24%  (p=0.000 n=10+10)
GoTypes        118MB ± 0%      115MB ± 0%  -2.23%  (p=0.000 n=10+10)
Compiler       505MB ± 0%      493MB ± 0%  -2.47%  (p=0.000 n=10+10)
SSA            881MB ± 0%      872MB ± 0%  -1.03%  (p=0.000 n=10+10)

name       old allocs/op   new allocs/op   delta
Template        401k ± 1%       400k ± 1%    ~     (p=0.631 n=10+10)
Unicode         321k ± 0%       321k ± 1%    ~     (p=0.684 n=10+10)
GoTypes        1.18M ± 0%      1.17M ± 0%  -0.34%  (p=0.000 n=10+10)
Compiler       4.63M ± 0%      4.61M ± 0%  -0.43%  (p=0.000 n=10+10)
SSA            7.83M ± 0%      7.82M ± 0%  -0.13%  (p=0.000 n=10+10)

Change-Id: I8f736396294444248a439bd4c90be1357024ce88
Reviewed-on: https://go-review.googlesource.com/38294
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-17 23:21:45 +00:00
Josh Bleecher Snyder
2cdb7f118a cmd/compile: move Frontend field from ssa.Config to ssa.Func
Suggested by mdempsky in CL 38232.
This allows us to use the Frontend field
to associate frontend state and information
with a function.
See the following CL in the series for examples.

This is a giant CL, but it is almost entirely routine refactoring.

The ssa test API is starting to feel a bit unwieldy.
I will clean it up separately, once the dust has settled.

Passes toolstash -cmp.

Updates #15756

Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf
Reviewed-on: https://go-review.googlesource.com/38327
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 23:18:57 +00:00
Matthew Dempsky
09272ae981 cmd/compile/internal/gc: rename Thearch to thearch
Prepared using gorename.

Change-Id: Id55dac9ae5446a8bfeac06e7995b35f4c249eeca
Reviewed-on: https://go-review.googlesource.com/38302
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-17 22:10:58 +00:00
Matthew Dempsky
3e2f980e27 cmd/compile: eliminate direct uses of gc.Thearch in backends
This CL changes the GOARCH.Init functions to take gc.Thearch as a
parameter, which gc.Main supplies.

Additionally, the x86 backend is refactored to decide within Init
whether to use the 387 or SSE2 instruction generators, rather than for
each individual SSA Value/Block.

Passes toolstash-check -all.

Change-Id: Ie6305a6cd6f6ab4e89ecbb3cbbaf5ffd57057a24
Reviewed-on: https://go-review.googlesource.com/38301
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-17 22:10:53 +00:00
Keith Randall
42e97468a1 cmd/compile: intrinsic for math/bits.Reverse on ARM64
I don't know that it exists for any other architectures.

Update #18616

Change-Id: Idfe5dee251764d32787915889ec0be4bebc5be24
Reviewed-on: https://go-review.googlesource.com/38323
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-17 18:07:18 +00:00
Josh Bleecher Snyder
a5e3cac895 cmd/compile: rearrange fields between ssa.Func, ssa.Cache, and ssa.Config
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill
the roles laid out for them in CL 38160.

The only non-trivial change in this CL is how cached
values and blocks get IDs. Prior to this CL, their IDs were
assigned as part of resetting the cache, and only modified
IDs were reset. This required knowing how many values and
blocks were modified, which required a tight coupling between
ssa.Func and ssa.Config. To eliminate that coupling,
we now zero values and blocks during reset,
and assign their IDs when they are used.
Since unused values and blocks have ID == 0,
we can efficiently find the last used value/block,
to avoid zeroing everything.
Bulk zeroing is efficient, but not efficient enough
to obviate the need to avoid zeroing everything every time.
As a happy side-effect, ssa.Func.Free is no longer necessary.

DebugHashMatch and friends now belong in func.go.
They have been left in place for clarity and review.
I will move them in a subsequent CL.

Passes toolstash -cmp. No compiler performance impact.
No change in 'go test cmd/compile/internal/ssa' execution time.

Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098
Reviewed-on: https://go-review.googlesource.com/38167
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-17 05:21:42 +00:00
Josh Bleecher Snyder
ccaa8e3c6f cmd/compile: avoid calling unnecessary Sym format routine
Minor cleanup only.

No reason to go through String() when it is
just as easy to do a direct string comparison.

Eliminates a surprising number of allocations.

name       old alloc/op    new alloc/op    delta
Template      40.9MB ± 0%     40.9MB ± 0%    ~     (p=0.190 n=10+10)
Unicode       30.3MB ± 0%     30.3MB ± 0%    ~     (p=0.218 n=10+10)
GoTypes        116MB ± 0%      116MB ± 0%  -0.09%  (p=0.000 n=10+10)
SSA            871MB ± 0%      869MB ± 0%  -0.14%  (p=0.000 n=10+9)
Flate         26.2MB ± 0%     26.2MB ± 0%  -0.15%  (p=0.002 n=10+10)
GoParser      32.5MB ± 0%     32.5MB ± 0%    ~     (p=0.165 n=10+10)
Reflect       80.5MB ± 0%     80.4MB ± 0%  -0.12%  (p=0.003 n=9+10)
Tar           27.3MB ± 0%     27.3MB ± 0%  -0.13%  (p=0.008 n=10+9)
XML           43.1MB ± 0%     43.1MB ± 0%    ~     (p=0.218 n=10+10)

name       old allocs/op   new allocs/op   delta
Template        402k ± 1%       400k ± 1%  -0.64%  (p=0.002 n=10+10)
Unicode         322k ± 1%       321k ± 1%    ~     (p=0.075 n=10+10)
GoTypes        1.19M ± 0%      1.18M ± 0%  -0.90%  (p=0.000 n=10+10)
SSA            7.94M ± 0%      7.81M ± 0%  -1.66%  (p=0.000 n=10+9)
Flate           246k ± 0%       242k ± 1%  -1.42%  (p=0.000 n=10+10)
GoParser        325k ± 1%       323k ± 1%  -0.84%  (p=0.000 n=10+10)
Reflect        1.02M ± 0%      1.01M ± 0%  -0.99%  (p=0.000 n=10+10)
Tar             259k ± 0%       257k ± 1%  -0.72%  (p=0.009 n=10+10)
XML             406k ± 1%       403k ± 1%  -0.69%  (p=0.001 n=10+10)

Change-Id: Ia129a4cd272027d627e1f3b27e9f07f93e3aa27e
Reviewed-on: https://go-review.googlesource.com/38230
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-17 05:12:23 +00:00
Josh Bleecher Snyder
0cfb23135c cmd/compile: move hasdefer to Func
Passes toolstash -cmp.

Updates #15756

Change-Id: Ia071dbbd7f2ee0f8433d8c37af4f7b588016244e
Reviewed-on: https://go-review.googlesource.com/38231
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 04:56:11 +00:00
Keith Randall
495b167919 cmd/compile: intrinsics for math/bits.{Len,LeadingZeros}
name              old time/op  new time/op  delta
LeadingZeros-4    2.00ns ± 0%  1.34ns ± 1%  -33.02%  (p=0.000 n=8+10)
LeadingZeros16-4  1.62ns ± 0%  1.57ns ± 0%   -3.09%  (p=0.001 n=8+9)
LeadingZeros32-4  2.14ns ± 0%  1.48ns ± 0%  -30.84%  (p=0.002 n=8+10)
LeadingZeros64-4  2.06ns ± 1%  1.33ns ± 0%  -35.08%  (p=0.000 n=8+8)

8-bit args is a special case - the Go code is really fast because
it is just a single table lookup.  So I've disabled that for now.
Intrinsics were actually slower:
LeadingZeros8-4   1.22ns ± 3%  1.58ns ± 1%  +29.56%  (p=0.000 n=10+10)

Update #18616

Change-Id: Ia9c289b9ba59c583ea64060470315fd637e814cf
Reviewed-on: https://go-review.googlesource.com/38311
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-16 22:53:49 +00:00
Keith Randall
dd9892e31b cmd/compile: intrinsify math/bits.ReverseBytes
Update #18616

Change-Id: I0c2d643cbbeb131b4c9b12194697afa4af48e1d2
Reviewed-on: https://go-review.googlesource.com/38166
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-16 19:41:56 +00:00
Cherry Zhang
c8f38b3398 cmd/compile: use type information in Aux for Store size
Remove size AuxInt in Store, and alignment in Move/Zero. We still
pass size AuxInt to Move/Zero, as it is used for partial Move/Zero
lowering (e.g. cmd/compile/internal/ssa/gen/386.rules:288).
SizeAndAlign is gone.

Passes "toolstash -cmp" on std.

Change-Id: I1ca34652b65dd30de886940e789fcf41d521475d
Reviewed-on: https://go-review.googlesource.com/38150
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-16 14:25:04 +00:00
Cherry Zhang
1b85300602 cmd/compile: clean up SSA-building code
Now that the write barrier insertion is moved to SSA, the SSA
building code can be simplified.

Updates #17583.

Change-Id: I5cacc034b11aa90b0abe6f8dd97e4e3994e2bc25
Reviewed-on: https://go-review.googlesource.com/36840
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-16 14:24:40 +00:00
Cherry Zhang
9ebf3d5100 cmd/compile: move write barrier insertion to SSA
When the compiler insert write barriers, the frontend makes
conservative decisions at an early stage. This sometimes have
false positives because of the lack of information, for example,
writes on stack. SSA's writebarrier pass identifies writes on
stack and eliminates write barriers for them.

This CL moves write barrier insertion into SSA. The frontend no
longer makes decisions about write barriers, and simply does
normal assignments and emits normal Store ops when building SSA.
SSA writebarrier pass inserts write barrier for Stores when needed.
There, it has better information about the store because Phi and
Copy propagation are done at that time.

This CL only changes StoreWB to Store in gc/ssa.go. A followup CL
simplifies SSA building code.

Updates #17583.

Change-Id: I4592d9bc0067503befc169c50b4e6f4765673bec
Reviewed-on: https://go-review.googlesource.com/36839
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-16 14:24:21 +00:00
Cherry Zhang
211c8c9f1a cmd/compile: pass types on SSA Store/Move/Zero ops
For SSA Store/Move/Zero ops, attach the type of the value being
stored to the op as the Aux field. This type will be used for
write barrier insertion (in a followup CL). Since SSA passes
do not accurately propagate types of values (because of type
casting), we can't simply use type of the store's arguments
for write barrier insertion.

Passes "toolstash -cmp" on std.

Updates #17583.

Change-Id: I051d5e5c482931640d1d7d879b2a6bb91f2e0056
Reviewed-on: https://go-review.googlesource.com/36838
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-16 14:22:53 +00:00
Keith Randall
d5dc490519 cmd/compile: intrinsics for math/bits.TrailingZerosX
Implement math/bits.TrailingZerosX using intrinsics.

Generally reorganize the intrinsic spec a bit.
The instrinsics data structure is now built at init time.
This will make doing the other functions in math/bits easier.

Update sys.CtzX to return int instead of uint{64,32} so it
matches math/bits.TrailingZerosX.

Improve the intrinsics a bit for amd64.  We don't need the CMOV
for <64 bit versions.

Update #18616

Change-Id: Ic1c5339c943f961d830ae56f12674d7b29d4ff39
Reviewed-on: https://go-review.googlesource.com/38155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-16 02:44:16 +00:00
Josh Bleecher Snyder
c03e75e539 cmd/compile: check labels and gotos before building SSA
This CL introduces yet another compiler pass,
which checks for correct control flow constructs
prior to converting from AST to SSA form.

It cannot be integrated with walk, since walk rewrites
switch and select statements on the fly.

To reduce code duplication, this CL also does some
minor refactoring.

With this pass in place, the AST to SSA converter
can now stop generating SSA for any known-dead code.
This minor savings pays for the minor cost of the new pass.

Performance is almost a wash:

name       old time/op     new time/op     delta
Template       206ms ± 4%      205ms ± 4%   ~     (p=0.108 n=43+43)
Unicode       84.0ms ± 4%     84.0ms ± 4%   ~     (p=0.979 n=43+43)
GoTypes        550ms ± 3%      553ms ± 3%   ~     (p=0.065 n=40+41)
Compiler       2.57s ± 4%      2.58s ± 2%   ~     (p=0.103 n=44+41)
SSA            3.94s ± 3%      3.93s ± 2%   ~     (p=0.833 n=44+42)
Flate          126ms ± 6%      125ms ± 4%   ~     (p=0.941 n=43+39)
GoParser       147ms ± 4%      148ms ± 3%   ~     (p=0.164 n=42+39)
Reflect        359ms ± 3%      357ms ± 5%   ~     (p=0.241 n=43+44)
Tar            106ms ± 5%      106ms ± 7%   ~     (p=0.853 n=40+43)
XML            202ms ± 3%      203ms ± 3%   ~     (p=0.488 n=42+41)

name       old user-ns/op  new user-ns/op  delta
Template        240M ± 4%       239M ± 4%   ~     (p=0.844 n=42+43)
Unicode         107M ± 5%       107M ± 4%   ~     (p=0.332 n=40+43)
GoTypes         735M ± 3%       731M ± 4%   ~     (p=0.141 n=43+44)
Compiler       3.51G ± 3%      3.52G ± 3%   ~     (p=0.208 n=42+43)
SSA            5.72G ± 4%      5.72G ± 3%   ~     (p=0.928 n=44+42)
Flate           151M ± 7%       150M ± 8%   ~     (p=0.662 n=44+43)
GoParser        181M ± 5%       181M ± 4%   ~     (p=0.379 n=41+44)
Reflect         447M ± 4%       445M ± 4%   ~     (p=0.344 n=43+43)
Tar             125M ± 7%       124M ± 6%   ~     (p=0.353 n=43+43)
XML             248M ± 4%       250M ± 6%   ~     (p=0.158 n=44+44)

name       old alloc/op    new alloc/op    delta
Template      40.3MB ± 0%     40.2MB ± 0%  -0.27%  (p=0.000 n=10+10)
Unicode       30.3MB ± 0%     30.2MB ± 0%  -0.10%  (p=0.015 n=10+10)
GoTypes        114MB ± 0%      114MB ± 0%  -0.06%  (p=0.000 n=7+9)
Compiler       480MB ± 0%      481MB ± 0%  +0.07%  (p=0.000 n=10+10)
SSA            864MB ± 0%      862MB ± 0%  -0.25%  (p=0.000 n=9+10)
Flate         25.9MB ± 0%     25.9MB ± 0%    ~     (p=0.123 n=10+10)
GoParser      32.1MB ± 0%     32.1MB ± 0%    ~     (p=0.631 n=10+10)
Reflect       79.9MB ± 0%     79.6MB ± 0%  -0.39%  (p=0.000 n=10+9)
Tar           27.1MB ± 0%     27.0MB ± 0%  -0.18%  (p=0.003 n=10+10)
XML           42.6MB ± 0%     42.6MB ± 0%    ~     (p=0.143 n=10+10)

name       old allocs/op   new allocs/op   delta
Template        401k ± 0%       401k ± 1%    ~     (p=0.353 n=10+10)
Unicode         322k ± 0%       322k ± 0%    ~     (p=0.739 n=10+10)
GoTypes        1.18M ± 0%      1.18M ± 0%  +0.25%  (p=0.001 n=7+8)
Compiler       4.51M ± 0%      4.53M ± 0%  +0.37%  (p=0.000 n=10+10)
SSA            7.91M ± 0%      7.93M ± 0%  +0.20%  (p=0.000 n=9+10)
Flate           244k ± 0%       245k ± 0%    ~     (p=0.123 n=10+10)
GoParser        323k ± 1%       324k ± 1%  +0.40%  (p=0.035 n=10+10)
Reflect        1.01M ± 0%      1.02M ± 0%  +0.37%  (p=0.000 n=10+9)
Tar             258k ± 1%       258k ± 1%    ~     (p=0.661 n=10+9)
XML             403k ± 0%       405k ± 0%  +0.47%  (p=0.004 n=10+10)

Updates #15756
Updates #19250

Change-Id: I647bfbb745c35630447eb79dfcaa994b490ce942
Reviewed-on: https://go-review.googlesource.com/38159
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-15 21:44:57 +00:00
philhofer
295307ae78 cmd/compile: de-virtualize interface calls
With this change, code like

    h := sha1.New()
    h.Write(buf)
    sum := h.Sum()

gets compiled into static calls rather than
interface calls, because the compiler is able
to prove that 'h' is really a *sha1.digest.

The InterCall re-write rule hits a few dozen times
during make.bash, and hundreds of times during all.bash.

The most common pattern identified by the compiler
is a constructor like

    func New() Interface { return &impl{...} }

where the constructor gets inlined into the caller,
and the result is used immediately. Examples include
{sha1,md5,crc32,crc64,...}.New, base64.NewEncoder,
base64.NewDecoder, errors.New, net.Pipe, and so on.

Some existing benchmarks that change on darwin/amd64:

Crc64/ISO4KB-8        2.67µs ± 1%    2.66µs ± 0%  -0.36%  (p=0.015 n=10+10)
Crc64/ISO1KB-8         694ns ± 0%     690ns ± 1%  -0.59%  (p=0.001 n=10+10)
Adler32KB-8            473ns ± 1%     471ns ± 0%  -0.39%  (p=0.010 n=10+9)

On architectures like amd64, the reduction in code size
appears to contribute more to benchmark improvements than just
removing the indirect call, since that branch gets predicted
accurately when called in a loop.

Updates #19361

Change-Id: I57d4dc21ef40a05ec0fbd55a9bb0eb74cdc67a3d
Reviewed-on: https://go-review.googlesource.com/38139
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-03-14 18:49:23 +00:00
David Chase
b59a405656 Revert "cmd/compile: de-virtualize interface calls"
This reverts commit 4e0c7c3f61475116c4ae8d11ef796819d9c404f0.

Reason for revert: The presence-of-optimization test program is fragile, breaks under noopt, and might break if the Go libraries are tweaked.  It needs to be (re)written without reference to other packages.

Change-Id: I3aaf1ab006a1a255f961a978e9c984341740e3c7
Reviewed-on: https://go-review.googlesource.com/38097
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 21:15:32 +00:00
Matthew Dempsky
118b3fe7bb cmd/compile/internal/gc: refactor ACALL Prog creation
This abstracts creation of ACALL Progs into package gc. The main
benefit of this today is we can refactor away a lot of common
boilerplate code.

Later, once liveness analysis happens on the SSA graph, this will also
provide an easy insertion point for emitting the PCDATA Progs
immediately before call instructions.

Passes toolstash-check -all.

Change-Id: Ia15108ace97201cd84314f1ca916dfeb4f09d61c
Reviewed-on: https://go-review.googlesource.com/38081
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 21:04:16 +00:00
Matthew Dempsky
08d8d5c986 cmd/compile/internal/ssa: replace {Defer,Go}Call with StaticCall
Passes toolstash-check -all.

Change-Id: Icf8b75364e4761a5e56567f503b2c1cb17382ed2
Reviewed-on: https://go-review.googlesource.com/38080
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 19:44:36 +00:00
Philip Hofer
4e0c7c3f61 cmd/compile: de-virtualize interface calls
With this change, code like

    h := sha1.New()
    h.Write(buf)
    sum := h.Sum()

gets compiled into static calls rather than
interface calls, because the compiler is able
to prove that 'h' is really a *sha1.digest.

The InterCall re-write rule hits a few dozen times
during make.bash, and hundreds of times during all.bash.

The most common pattern identified by the compiler
is a constructor like

    func New() Interface { return &impl{...} }

where the constructor gets inlined into the caller,
and the result is used immediately. Examples include
{sha1,md5,crc32,crc64,...}.New, base64.NewEncoder,
base64.NewDecoder, errors.New, net.Pipe, and so on.

Some existing benchmarks that change on darwin/amd64:

Crc64/ISO4KB-8        2.67µs ± 1%    2.66µs ± 0%  -0.36%  (p=0.015 n=10+10)
Crc64/ISO1KB-8         694ns ± 0%     690ns ± 1%  -0.59%  (p=0.001 n=10+10)
Adler32KB-8            473ns ± 1%     471ns ± 0%  -0.39%  (p=0.010 n=10+9)

On architectures like amd64, the reduction in code size
appears to contribute more to benchmark improvements than just
removing the indirect call, since that branch gets predicted
accurately when called in a loop.

Updates #19361

Change-Id: Ia9d30afdd5f6b4d38d38b14b88f308acae8ce7ed
Reviewed-on: https://go-review.googlesource.com/37751
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 18:24:57 +00:00
Josh Bleecher Snyder
3dcfce8d19 cmd/compile: add OpOffPtr [c] SP to constant cache
They accounted for almost 30% of all CSE'd values.

By never creating the duplicates in the first place,
we reduce the high water mark of Value IDs,
which in turn makes all SSA phases cheaper,
particularly regalloc.

name       old time/op     new time/op     delta
Template       200ms ± 3%      198ms ± 4%  -0.87%  (p=0.016 n=50+49)
Unicode       86.9ms ± 2%     85.5ms ± 3%  -1.56%  (p=0.000 n=49+50)
GoTypes        553ms ± 4%      551ms ± 4%    ~     (p=0.183 n=50+49)
SSA            3.97s ± 3%      3.93s ± 2%  -1.06%  (p=0.000 n=48+48)
Flate          124ms ± 4%      124ms ± 3%    ~     (p=0.545 n=48+50)
GoParser       146ms ± 4%      146ms ± 4%    ~     (p=0.810 n=49+49)
Reflect        357ms ± 3%      355ms ± 3%  -0.59%  (p=0.049 n=50+48)
Tar            106ms ± 4%      107ms ± 5%    ~     (p=0.454 n=49+50)
XML            203ms ± 4%      203ms ± 4%    ~     (p=0.726 n=48+50)

name       old user-ns/op  new user-ns/op  delta
Template        237M ± 3%       235M ± 4%    ~     (p=0.208 n=47+48)
Unicode         111M ± 4%       108M ± 9%  -2.50%  (p=0.000 n=47+50)
GoTypes         736M ± 5%       729M ± 4%  -0.95%  (p=0.017 n=50+46)
SSA            5.73G ± 4%      5.74G ± 4%    ~     (p=0.765 n=50+50)
Flate           150M ± 5%       148M ± 6%  -0.89%  (p=0.045 n=48+47)
GoParser        180M ± 5%       178M ± 7%  -1.34%  (p=0.012 n=50+50)
Reflect         450M ± 4%       444M ± 4%  -1.40%  (p=0.000 n=50+49)
Tar             124M ± 7%       123M ± 7%    ~     (p=0.092 n=50+50)
XML             248M ± 6%       245M ± 5%    ~     (p=0.057 n=50+50)

name       old alloc/op    new alloc/op    delta
Template      39.4MB ± 0%     39.3MB ± 0%  -0.37%  (p=0.000 n=50+50)
Unicode       30.9MB ± 0%     30.9MB ± 0%  -0.27%  (p=0.000 n=48+50)
GoTypes        114MB ± 0%      113MB ± 0%  -1.03%  (p=0.000 n=50+49)
SSA            882MB ± 0%      865MB ± 0%  -1.95%  (p=0.000 n=49+49)
Flate         25.8MB ± 0%     25.7MB ± 0%  -0.21%  (p=0.000 n=50+50)
GoParser      31.7MB ± 0%     31.6MB ± 0%  -0.33%  (p=0.000 n=50+50)
Reflect       79.7MB ± 0%     79.3MB ± 0%  -0.49%  (p=0.000 n=44+49)
Tar           27.2MB ± 0%     27.1MB ± 0%  -0.31%  (p=0.000 n=50+50)
XML           42.7MB ± 0%     42.3MB ± 0%  -1.05%  (p=0.000 n=48+49)

name       old allocs/op   new allocs/op   delta
Template        379k ± 1%       380k ± 1%  +0.26%  (p=0.000 n=50+50)
Unicode         324k ± 1%       324k ± 1%    ~     (p=0.964 n=49+50)
GoTypes        1.14M ± 0%      1.15M ± 0%  +0.14%  (p=0.000 n=50+49)
SSA            7.89M ± 0%      7.89M ± 0%  -0.05%  (p=0.000 n=49+49)
Flate           240k ± 1%       241k ± 1%  +0.27%  (p=0.001 n=50+50)
GoParser        310k ± 1%       311k ± 1%  +0.48%  (p=0.000 n=50+49)
Reflect        1.00M ± 0%      1.00M ± 0%  +0.17%  (p=0.000 n=48+50)
Tar             254k ± 1%       255k ± 1%  +0.23%  (p=0.005 n=50+50)
XML             395k ± 1%       395k ± 1%  +0.19%  (p=0.002 n=49+47)

Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f
Reviewed-on: https://go-review.googlesource.com/38003
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-10 16:50:58 +00:00
Josh Bleecher Snyder
678f35b676 cmd/compile: fix SSA type for first runtime call arg/result
CLs 37254 and 37869 contained similar fixes.

Change-Id: I0cbf01c691b54d82acef398489df6e9c89ebb83f
Reviewed-on: https://go-review.googlesource.com/38000
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
2017-03-10 01:02:14 +00:00
Keith Randall
67d69f899c cmd/compile: set base register of spill/restore to SP
Previously the base register was unset, which lead to the disassembler
using "FP" instead of "SP" as the base register.  That lead to some
confusion as to what the difference betweeen the two was.
Be consistent and always use SP.

Fixes #19458

Change-Id: Ie8f8ee54653bd202c0cf6fbf1d350e3c8c8b67a0
Reviewed-on: https://go-review.googlesource.com/37971
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-09 01:19:00 +00:00
David Chase
d71f36b5aa cmd/compile: check loop rescheduling with stack bound, not counter
After benchmarking with a compiler modified to have better
spill location, it became clear that this method of checking
was actually faster on (at least) two different architectures
(ppc64 and amd64) and it also provides more timely interruption
of loops.

This change adds a modified FOR loop node "FORUNTIL" that
checks after executing the loop body instead of before (i.e.,
always at least once).  This ensures that a pointer past the
end of a slice or array is not made visible to the garbage
collector.

Without the rescheduling checks inserted, the restructured
loop from this  change apparently provides a 1% geomean
improvement on PPC64 running the go1 benchmarks; the
improvement on AMD64 is only 0.12%.

Inserting the rescheduling check exposed some peculiar bug
with the ssa test code for s390x; this was updated based on
initial code actually generated for GOARCH=s390x to use
appropriate OpArg, OpAddr, and OpVarDef.

NaCl is disabled in testing.

Change-Id: Ieafaa9a61d2a583ad00968110ef3e7a441abca50
Reviewed-on: https://go-review.googlesource.com/36206
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 18:52:12 +00:00
Matthew Dempsky
5e4a958351 cmd/compile: refactor portable SSA Op handling
Several SSA ops will always behave identically regardless of target
architecture, so handle those within gc/ssa.go instead.

Passes toolstash-check -all.

Change-Id: I54d514e80ab86723e44332a5a38e3054cbca8c5d
Reviewed-on: https://go-review.googlesource.com/37931
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-07 21:17:03 +00:00
Matthew Dempsky
c310c688ff cmd/compile, runtime: simplify multiway select implementation
This commit reworks multiway select statements to use normal control
flow primitives instead of the previous setjmp/longjmp-like behavior.
This simplifies liveness analysis and should prevent issues around
"returns twice" function calls within SSA passes.

test/live.go is updated because liveness analysis's CFG is more
representative of actual control flow. The case bodies are the only
real successors of the selectgo call, but previously the selectsend,
selectrecv, etc. calls were included in the successors list too.

Updates #19331.

Change-Id: I7f879b103a4b85e62fc36a270d812f54c0aa3e83
Reviewed-on: https://go-review.googlesource.com/37661
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-07 20:14:17 +00:00
Josh Bleecher Snyder
4e428907c5 cmd/compile: avoid generating some dead blocks
We generate a lot of pointless dead blocks
during the AST to SSA conversion.
There are a few commonly occurring kinds
of statements that contain neither variables
nor code and that switch to a new block themselves.
Stop making dead blocks for them.

For the code in #19379, this reduces compilation
wall time by 36% and max rss by 28%.

This also helps a little for regular code,
particularly code heavy on switch statements.

name       old time/op      new time/op      delta
Template        231ms ± 3%       230ms ± 5%    ~     (p=0.402 n=17+16)
Unicode         101ms ± 4%       103ms ± 5%    ~     (p=0.221 n=19+18)
GoTypes         635ms ± 5%       625ms ± 4%    ~     (p=0.063 n=20+18)
Compiler        2.93s ± 2%       2.89s ± 2%  -1.22%  (p=0.003 n=20+19)
SSA             4.53s ± 3%       4.52s ± 3%    ~     (p=0.380 n=20+19)
Flate           132ms ± 4%       133ms ± 5%    ~     (p=0.647 n=20+19)
GoParser        161ms ± 3%       161ms ± 4%    ~     (p=0.749 n=20+19)
Reflect         403ms ± 4%       397ms ± 3%  -1.53%  (p=0.030 n=20+19)
Tar             121ms ± 2%       121ms ± 8%    ~     (p=0.544 n=19+19)
XML             225ms ± 3%       224ms ± 4%    ~     (p=0.396 n=20+19)

name       old user-ns/op   new user-ns/op   delta
Template   302user-ms ± 1%  297user-ms ± 7%  -1.49%  (p=0.048 n=15+18)
Unicode    142user-ms ± 3%  143user-ms ± 5%    ~     (p=0.363 n=19+17)
GoTypes    852user-ms ± 5%  851user-ms ± 3%    ~     (p=0.851 n=20+18)
Compiler   4.11user-s ± 6%  3.98user-s ± 3%  -3.08%  (p=0.000 n=20+19)
SSA        6.91user-s ± 5%  6.82user-s ± 7%    ~     (p=0.113 n=20+19)
Flate      164user-ms ± 4%  168user-ms ± 4%  +2.42%  (p=0.001 n=18+19)
GoParser   207user-ms ± 4%  206user-ms ± 4%    ~     (p=0.176 n=20+18)
Reflect    509user-ms ± 4%  505user-ms ± 4%    ~     (p=0.113 n=20+19)
Tar        153user-ms ± 7%  151user-ms ± 9%    ~     (p=0.283 n=20+19)
XML        284user-ms ± 4%  282user-ms ± 4%    ~     (p=0.270 n=20+19)

name       old alloc/op     new alloc/op     delta
Template       42.6MB ± 0%      41.9MB ± 0%  -1.55%  (p=0.000 n=19+19)
Unicode        31.7MB ± 0%      31.7MB ± 0%    ~     (p=0.828 n=20+18)
GoTypes         124MB ± 0%       121MB ± 0%  -2.11%  (p=0.000 n=20+17)
Compiler        534MB ± 0%       523MB ± 0%  -2.06%  (p=0.000 n=20+19)
SSA             989MB ± 0%       977MB ± 0%  -1.28%  (p=0.000 n=20+19)
Flate          27.8MB ± 0%      27.5MB ± 0%  -0.98%  (p=0.000 n=20+19)
GoParser       34.3MB ± 0%      34.0MB ± 0%  -0.81%  (p=0.000 n=20+19)
Reflect        84.6MB ± 0%      82.9MB ± 0%  -2.00%  (p=0.000 n=17+18)
Tar            28.8MB ± 0%      28.3MB ± 0%  -1.52%  (p=0.000 n=16+18)
XML            47.2MB ± 0%      45.8MB ± 0%  -2.99%  (p=0.000 n=20+19)

name       old allocs/op    new allocs/op    delta
Template         421k ± 1%        419k ± 1%  -0.41%  (p=0.001 n=20+19)
Unicode          338k ± 1%        338k ± 1%    ~     (p=0.478 n=20+19)
GoTypes         1.28M ± 0%       1.28M ± 0%  -0.36%  (p=0.000 n=20+18)
Compiler        5.06M ± 0%       5.03M ± 0%  -0.63%  (p=0.000 n=20+19)
SSA             9.14M ± 0%       9.11M ± 0%  -0.34%  (p=0.000 n=20+19)
Flate            267k ± 1%        266k ± 1%    ~     (p=0.149 n=20+19)
GoParser         347k ± 0%        347k ± 1%    ~     (p=0.103 n=19+19)
Reflect         1.07M ± 0%       1.07M ± 0%  -0.42%  (p=0.000 n=16+18)
Tar              274k ± 0%        273k ± 1%    ~     (p=0.116 n=19+19)
XML              449k ± 0%        446k ± 1%  -0.60%  (p=0.000 n=20+19)

Updates #19379

Change-Id: Ie798c347a0c081f5e349e1529880bebaae290967
Reviewed-on: https://go-review.googlesource.com/37760
Reviewed-by: David Chase <drchase@google.com>
2017-03-06 18:31:03 +00:00
Matthew Dempsky
870d079c76 cmd/compile/internal/gc: replace Node.Ullman with Node.HasCall
Since switching to SSA, the only remaining use for the Ullman field
was in tracking whether or not an expression contained a function
call. Give it a new name and encode it in our fancy new bitset field.

Passes toolstash-check.

Change-Id: I95b7f9cb053856320c0d66efe14996667e6011c2
Reviewed-on: https://go-review.googlesource.com/37721
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-03 22:35:44 +00:00
Aliaksandr Valialkin
ed70f37e73 cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets
This reduces compiler memory usage by up to 4% - see compilebench
results below.

name       old time/op     new time/op     delta
Template       245ms ± 4%      241ms ± 2%  -1.88%  (p=0.029 n=10+10)
Unicode        126ms ± 3%      124ms ± 3%    ~     (p=0.105 n=10+10)
GoTypes        805ms ± 2%      813ms ± 3%    ~     (p=0.515 n=8+10)
Compiler       3.95s ± 2%      3.83s ± 1%  -2.96%  (p=0.000 n=9+10)
MakeBash       47.4s ± 4%      46.6s ± 1%  -1.59%  (p=0.028 n=9+10)

name       old user-ns/op  new user-ns/op  delta
Template        324M ± 5%       326M ± 3%    ~     (p=0.935 n=10+10)
Unicode         186M ± 5%       178M ±10%    ~     (p=0.067 n=9+10)
GoTypes        1.08G ± 7%      1.09G ± 4%    ~     (p=0.956 n=10+10)
Compiler       5.34G ± 4%      5.31G ± 1%    ~     (p=0.501 n=10+8)

name       old alloc/op    new alloc/op    delta
Template      41.0MB ± 0%     39.8MB ± 0%  -3.03%  (p=0.000 n=10+10)
Unicode       32.3MB ± 0%     31.0MB ± 0%  -4.13%  (p=0.000 n=10+10)
GoTypes        119MB ± 0%      116MB ± 0%  -2.39%  (p=0.000 n=10+10)
Compiler       499MB ± 0%      487MB ± 0%  -2.48%  (p=0.000 n=10+10)

name       old allocs/op   new allocs/op   delta
Template        380k ± 1%       379k ± 1%    ~     (p=0.436 n=10+10)
Unicode         324k ± 1%       324k ± 0%    ~     (p=0.853 n=10+10)
GoTypes        1.15M ± 0%      1.15M ± 0%    ~     (p=0.481 n=10+10)
Compiler       4.41M ± 0%      4.41M ± 0%  -0.12%  (p=0.007 n=10+10)

name       old text-bytes  new text-bytes  delta
HelloSize       623k ± 0%       623k ± 0%    ~     (all equal)
CmdGoSize      6.64M ± 0%      6.64M ± 0%    ~     (all equal)

name       old data-bytes  new data-bytes  delta
HelloSize      5.81k ± 0%      5.81k ± 0%    ~     (all equal)
CmdGoSize       238k ± 0%       238k ± 0%    ~     (all equal)

name       old bss-bytes   new bss-bytes   delta
HelloSize       134k ± 0%       134k ± 0%    ~     (all equal)
CmdGoSize       152k ± 0%       152k ± 0%    ~     (all equal)

name       old exe-bytes   new exe-bytes   delta
HelloSize       967k ± 0%       967k ± 0%    ~     (all equal)
CmdGoSize      10.2M ± 0%      10.2M ± 0%    ~     (all equal)

Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52
Reviewed-on: https://go-review.googlesource.com/37445
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-03 21:06:03 +00:00
Matthew Dempsky
d8a0f74801 cmd/compile/internal/gc: remove OHMUL Op
Previously the compiler rewrote constant division into OHMUL
operations, but that rewriting was moved to SSA in CL 37015. Now OHMUL
is unused, so we can get rid of it.

Change-Id: Ib6fc7c2b6435510bafb5735b3b4f42cfd8ed8cdb
Reviewed-on: https://go-review.googlesource.com/37750
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-03 17:47:53 +00:00
Cherry Zhang
5bfd1ef036 cmd/compile: get rid of "volatile" in SSA
A value is "volatile" if it is a pointer to the argument region
on stack which will be clobbered by function call. This is used
to make sure the value is safe when inserting write barrier calls.
The writebarrier pass can tell whether a value is such a pointer.
Therefore no need to mark it when building SSA and thread this
information through.

Passes "toolstash -cmp" on std.

Updates #17583.

Change-Id: Idc5fc0d710152b94b3c504ce8db55ea9ff5b5195
Reviewed-on: https://go-review.googlesource.com/36835
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-03 13:26:15 +00:00
Matthew Dempsky
0ee9c46cb1 cmd/compile: add missing WBs for reflect.{Slice,String}Header.Data
Fixes #19168.

Change-Id: I3f4fcc0b189c53819ac29ef8de86fdad76a17488
Reviewed-on: https://go-review.googlesource.com/37663
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-02 17:21:50 +00:00
Lynn Boger
95c9583a18 cmd/compile: intrinsify atomics on ppc64x
This adds the necessary changes so that atomics are treated as
intrinsics on ppc64x.

The implementations of And8 and Or8 require power8 for
both ppc64 and ppc64le.  This is a new requirement
for ppc64.

Fixes #8739

Change-Id: Icb85e2755a49166ee3652668279f6ed5ebbca901
Reviewed-on: https://go-review.googlesource.com/36832
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-01 19:56:01 +00:00
Michael Munday
bd8a39b67a cmd/compile: emit fused multiply-{add,subtract} instructions on s390x
Explcitly block fused multiply-add pattern matching when a cast is used
after the multiplication, for example:

    - (a * b) + c        // can emit fused multiply-add
    - float64(a * b) + c // cannot emit fused multiply-add

float{32,64} and complex{64,128} casts of matching types are now kept
as OCONV operations rather than being replaced with OCONVNOP operations
because they now imply a rounding operation (and therefore aren't a
no-op anymore).

Operations (for example, multiplication) on complex types may utilize
fused multiply-add and -subtract instructions internally. There is no
way to disable this behavior at the moment.

Improves the performance of the floating point implementation of
poly1305:

name         old speed     new speed     delta
64           246MB/s ± 0%  275MB/s ± 0%  +11.48%   (p=0.000 n=10+8)
1K           312MB/s ± 0%  357MB/s ± 0%  +14.41%  (p=0.000 n=10+10)
64Unaligned  246MB/s ± 0%  274MB/s ± 0%  +11.43%  (p=0.000 n=10+10)
1KUnaligned  312MB/s ± 0%  357MB/s ± 0%  +14.39%   (p=0.000 n=10+8)

Updates #17895.

Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16
Reviewed-on: https://go-review.googlesource.com/36963
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-28 15:34:20 +00:00
Michael Munday
10d718b983 cmd/compile: fix type of OffPtr generated by ODOTPTR
The type of the OffPtr should be consistent with the type of the
following load. Before this CL it was typed as a pointer to the
struct.

Fixes #19164.

Change-Id: Ibcdec4411c6f719702f76f8dba3cce8691bfbe0c
Reviewed-on: https://go-review.googlesource.com/37254
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-02-21 19:28:38 +00:00
Matthew Dempsky
c61cf5e6b7 cmd/compile/internal/gc: remove Node.IsStatic field
We can immediately emit static assignment data rather than queueing
them up to be processed during SSA building.

Passes toolstash -cmp.

Change-Id: I8bcea4b72eafb0cc0b849cd93e9cde9d84f30d5e
Reviewed-on: https://go-review.googlesource.com/37024
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-17 22:06:52 +00:00
Cherry Zhang
c4b8dadb40 cmd/compile: fix some types in SSA
These seem not to really matter, but good to be correct.

Change-Id: I02edb9797c3d6739725cfbe4723c75f151acd05e
Reviewed-on: https://go-review.googlesource.com/36837
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-17 19:20:46 +00:00
Cherry Zhang
c4ef597c47 cmd/compile: redo writebarrier pass
SSA's writebarrier pass requires WB store ops are always at the
end of a block. If we move write barrier insertion into SSA and
emits normal Store ops when building SSA, this requirement becomes
impractical -- it will create too many blocks for all the Store
ops.

Redo SSA's writebarrier pass, explicitly order values in store
order, so it no longer needs this requirement.

Updates #17583.
Fixes #19067.

Change-Id: I66e817e526affb7e13517d4245905300a90b7170
Reviewed-on: https://go-review.googlesource.com/36834
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-02-17 19:20:25 +00:00
Matthew Dempsky
794f1ebff7 cmd/compile: simplify needwritebarrier
Currently, whether we need a write barrier is simply a property of the
pointer slot being written to.

The only optimization we currently apply using the value being written
is that pointers to stack variables can omit write barriers because
they're only written to stack slots... but we already omit write
barriers for all writes to the stack anyway.

Passes toolstash -cmp.

Change-Id: I7f16b71ff473899ed96706232d371d5b2b7ae789
Reviewed-on: https://go-review.googlesource.com/37109
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-16 22:42:36 +00:00