XPos is a compact (8 instead of 16 bytes on a 64bit machine) source
position representation. There is a 1:1 correspondence between each
XPos and each regular Pos, translated via a global table.
In some sense this brings back the LineHist, though positions can
track line and column information; there is an O(1) translation
between the representations (no binary search), and the translation
is factored out.
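Roughly, the idea looks like this (a hypothetical sketch with made-up
names, not the actual cmd/internal/src API):

    // An XPos is an 8-byte handle; the 16-byte Pos it stands for is
    // reachable through a global table, in O(1) in both directions.
    type PosBase struct {
        Filename string
    }

    type Pos struct {
        base *PosBase // 8 bytes on 64-bit
        lico uint32   // packed line and column
    }

    type XPos struct {
        index int32  // index of the base in the global table
        lico  uint32 // same packed line and column
    }

    type PosTable struct {
        baseList []*PosBase        // index -> base: XPos to Pos
        indexMap map[*PosBase]int32 // base -> index: Pos to XPos
    }

    func (t *PosTable) Pos(x XPos) Pos  { return Pos{t.baseList[x.index], x.lico} }
    func (t *PosTable) XPos(p Pos) XPos { return XPos{t.indexMap[p.base], p.lico} }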
The size increase from the prior change is brought down again, and
compiler speed is in line with the master repo (measured on
the same "quiet" machine as for the prior change):
name       old time/op     new time/op     delta
Template    256ms ± 1%      262ms ± 2%       ~      (p=0.063 n=5+4)
Unicode     132ms ± 1%      135ms ± 2%       ~      (p=0.063 n=5+4)
GoTypes     891ms ± 1%      871ms ± 1%     -2.28%   (p=0.016 n=5+4)
Compiler    3.84s ± 2%      3.89s ± 2%       ~      (p=0.413 n=5+4)
MakeBash    47.1s ± 1%      46.2s ± 2%       ~      (p=0.095 n=5+5)

name       old user-ns/op  new user-ns/op  delta
Template     309M ± 1%       314M ± 2%       ~      (p=0.111 n=5+4)
Unicode      165M ± 1%       172M ± 9%       ~      (p=0.151 n=5+5)
GoTypes      1.14G ± 2%      1.12G ± 1%      ~      (p=0.063 n=5+4)
Compiler     5.00G ± 1%      4.96G ± 1%      ~      (p=0.286 n=5+4)
Change-Id: Icc570cc60ab014d8d9af6976f1f961ab8828cc47
Reviewed-on: https://go-review.googlesource.com/34506
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Change-Id: I429637ca91f7db4144f17621de851a548dc1ce76
Reviewed-on: https://go-review.googlesource.com/34923
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Using a variable instead of a composite literal makes
the code independent of implementation changes of Pos.
Per David Lazar's suggestion.
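A tiny illustration of the difference (hypothetical stand-ins, not the
exact code changed):

    type Pos struct{ line, col uint32 } // representation may change later

    var noPos Pos // the zero Pos, declared once

    func resetPos(p *Pos) {
        // Assigning the variable survives changes to Pos; a composite
        // literal like Pos{} would need revisiting if Pos ever stopped
        // being a plain struct.
        *p = noPos
    }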
Change-Id: I336967ac12a027c51a728a58ac6207cb5119af4a
Reviewed-on: https://go-review.googlesource.com/34148
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This is a mostly mechanical rename followed by manual fixes where necessary.
Change-Id: Ie5c670b133db978f15dc03e50dc2da0c80fc8842
Reviewed-on: https://go-review.googlesource.com/34137
Reviewed-by: David Lazar <lazard@golang.org>
Adjust cmd/compile accordingly.
This will make it easier to replace the underlying implementation.
Change-Id: I33645850bb18c839b24785b6222a9e028617addb
Reviewed-on: https://go-review.googlesource.com/34133
Reviewed-by: David Lazar <lazard@golang.org>
This is a step toward choosing a different position representation.
By introducing an explicit type, it will be easier to make the
transition step-wise while ensuring everything keeps running.
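Roughly, the step looks like this (a hypothetical sketch, not the exact
type added):

    // Step 1: give positions their own name, even if the definition is
    // unchanged for now.
    type Pos int32 // today effectively a line number
    // Step 2 (later CLs): swap the underlying definition without having
    // to touch every caller that already says "Pos".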
This has been reviewed via https://go-review.googlesource.com/#/c/34025/.
Change-Id: Ibceddcd62d8f346321ac3250e3940e9c436ed684
Reviewed-on: https://go-review.googlesource.com/34132
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
The mechanism was initially introduced (and reviewed) in CL 30597
on S390X.
Reduces the number of "spilled value remains" by 0.4% in cmd/go.
Disabled on ARMv5 because LR is clobbered almost everywhere by
inserted softfloat calls.
Change-Id: I2934737ce2455909647ed2118fe2bd6f0aa5ac52
Reviewed-on: https://go-review.googlesource.com/32178
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
We save and restore the link register in non-leaf functions because
it is clobbered by CALLs. It is therefore available for general
purpose use.
Only enabled on s390x currently. The RC4 benchmarks in particular
benefit from the extra register:
name      old speed     new speed     delta
RC4_128   243MB/s ± 2%  341MB/s ± 2%  +40.46%  (p=0.008 n=5+5)
RC4_1K    267MB/s ± 0%  359MB/s ± 1%  +34.32%  (p=0.008 n=5+5)
RC4_8K    271MB/s ± 0%  362MB/s ± 0%  +33.61%  (p=0.008 n=5+5)
Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f
Reviewed-on: https://go-review.googlesource.com/30597
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
No point doing this check all the time.
Fixes #15621
Change-Id: I1966c061986fe98fe9ebe146d6b9738c13cef724
Reviewed-on: https://go-review.googlesource.com/30670
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Better to just rematerialize them when needed instead of
cross-register spilling or other techniques for keeping them in
registers.
This helps for amd64 code that does 1 << x. It is better to do
loop:
MOVQ $1, AX // materialize arg to SLLQ
SLLQ CX, AX
...
goto loop
than to do
MOVQ $1, AX // materialize outside of loop
loop:
MOVQ AX, DX // save value that's about to be clobbered
SLLQ CX, AX
MOVQ DX, AX // move it back to the correct register
goto loop
Update #16092
Change-Id: If7ac290208f513061ebb0736e8a79dcb0ba338c0
Reviewed-on: https://go-review.googlesource.com/30471
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Abandoned earlier efforts to expose the zero register,
but left it in the numbering to decrease squirreliness of
the register allocator.
ISELrelOp is used in code generation of bool := x relOp y.
Some patterns were added to better elide the zero case and
some sign extension.
Updates #17109
Change-Id: Ida7839f0023ca8f0ffddc0545f0ac269e65b05d9
Reviewed-on: https://go-review.googlesource.com/29380
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
We compute a lot of stuff based off the CFG: postorder traversal,
dominators, dominator tree, loop nest. Multiple phases use this
information and we end up recomputing some of it. Add a cache
for this information so if the CFG hasn't changed, we can reuse
the previous computation.
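A toy sketch of the caching pattern (not the compiler's actual types or
method names):

    type Block struct {
        Succs []*Block
    }

    type Func struct {
        Blocks []*Block

        // Cached CFG-derived info, valid only while the CFG is unchanged.
        po      []*Block
        poValid bool
    }

    func (f *Func) postorder() []*Block {
        if !f.poValid {
            f.po = computePostorder(f.Blocks) // recompute only when stale
            f.poValid = true
        }
        return f.po
    }

    // Every pass that edits the CFG must invalidate the cache.
    func (f *Func) invalidateCFG() {
        f.poValid = false
    }

    func computePostorder(bs []*Block) []*Block {
        // depth-first walk omitted in this sketch
        return bs
    }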
Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2
Reviewed-on: https://go-review.googlesource.com/29356
Reviewed-by: David Chase <drchase@google.com>
Teach SSA about the cmd/internal/obj/$ARCH register numbering.
It can then return that numbering when requested. Each architecture
now does not need to know anything about the internal SSA numbering
of registers.
Change-Id: I34472a2736227c15482e60994eebcdd2723fa52d
Reviewed-on: https://go-review.googlesource.com/29249
Reviewed-by: David Chase <drchase@google.com>
A tentative fix of #16380. It adds "line" everywhere...
This also reduces binary size slightly (cmd/go on ARM as an example):
                     before    after
total binary size    8068097   8018945  (-0.6%)
.gopclntab           1195341   1179929  (-1.3%)
.debug_line           689692    652017  (-5.5%)
Change-Id: Ibda657c6999783c5bac180cbbba487006dbf0ed7
Reviewed-on: https://go-review.googlesource.com/25082
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Rip out the code that allows SSA to be used conditionally.
No longer exists:
ssa=0 flag
GOSSAHASH
GOSSAPKG
SSATEST
GOSSAFUNC now only controls the printing of the IR/html.
Still need to rip out all of the old backend. It should no longer be
callable after this CL.
Update #16357
Change-Id: Ib30cc18fba6ca52232c41689ba610b0a94aa74f5
Reviewed-on: https://go-review.googlesource.com/29155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
The new SSA backend modifies the ABI slightly: R0 is now a usable
general purpose register.
Fixes #16677.
Change-Id: I367435ce921e0c7e79e021c80cf8ef5d1d1466cf
Reviewed-on: https://go-review.googlesource.com/28978
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
No need for it, we can treat calls as (mostly) normal values
that take a memory and return a memory.
Lowers the number of basic blocks needed to represent a function.
"go test -c net/http" uses 27% fewer basic blocks.
Probably doesn't affect generated code much, but should help
various passes whose running time and/or space depends on
the number of basic blocks.
Fixes #15631
Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6
Reviewed-on: https://go-review.googlesource.com/28950
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Inline atomic reads and writes on amd64. There's no reason
to pay the overhead of a call for these.
To keep atomic loads from being reordered, we make them
return a <value,memory> tuple.
Change the meaning of resultInArg0 for tuple-generating ops
to mean the first part of the result tuple, not the second.
This means we can always put the store part of the tuple last,
matching how arguments are laid out. This requires reordering
the outputs of add32carry and sub32carry and their descendants
in various architectures.
benchmark                 old ns/op  new ns/op  delta
BenchmarkAtomicLoad64-8        2.09       0.26  -87.56%
BenchmarkAtomicStore64-8       7.54       5.72  -24.14%
TBD (in a different CL): Cas, Or8, ...
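For reference, this is the kind of code that now compiles to inlined
instructions instead of calls (plain sync/atomic, shown only as an
illustration):

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    func main() {
        var counter uint64
        atomic.StoreUint64(&counter, 1)  // inlined on amd64 rather than a call
        v := atomic.LoadUint64(&counter) // inlined load; modeled in SSA as a
        fmt.Println(v)                   // <value,memory> tuple so it can't be reordered
    }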
Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207
Reviewed-on: https://go-review.googlesource.com/27641
Reviewed-by: Cherry Zhang <cherryyz@google.com>
- implement *, /, %, shifts, Zero, Move.
- fix mistakes in comparison.
- fix floating point rounding.
- handle RetJmp in assembler (which was not handled; as a consequence
Duff's device was disabled in the old backend).
all.bash now passes with SSA on.
Updates #16359.
Change-Id: Ia14eed0ed1176b5d800592080c8f53dded7fe73f
Reviewed-on: https://go-review.googlesource.com/27592
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This time with the cherry-pick from the proper patch of
the old CL.
Stack size increased.
Corrected NaN-comparison glitches.
Marked g register as clobbered by calls.
Fixed shared libraries.
live_ssa.go still disabled because of differences.
Presumably turning on more optimization will fix
both the stack size and the live_ssa.go glitches.
Enhanced debugging output for shared libs test.
Rebased onto master.
Updates #16010.
Change-Id: I40864faf1ef32c118fb141b7ef8e854498e6b2c4
Reviewed-on: https://go-review.googlesource.com/27159
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Last part of the 386 SSA port.
Modify the x86 backend to simulate SSE registers and
instructions with 387 registers and instructions.
The simulation isn't terribly performant, but it works,
and the old implementation wasn't very performant either.
Leaving it to people who care about 387 to optimize if they want.
Turn on SSA backend for 386 by default.
Fixes #16358
Change-Id: I678fb59132620b2c47e993c1c10c4c21135f70c0
Reviewed-on: https://go-review.googlesource.com/25271
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Access to globals requires a 2-instruction sequence on PIC 386.
MOVL foo(SB), AX
is translated by the obj package into:
CALL getPCofNextInstructionInTempRegister(SB)
MOVL (&foo-&thisInstruction)(tmpReg), AX
The call returns the PC of the next instruction in a register.
The next instruction then offsets from that register to get the
address required. The tricky part is the allocation of the
temp register. The legacy compiler always used CX, and forbade
the register allocator from allocating CX when in PIC mode.
We can't easily do that in SSA because CX is actually a required
register for shift instructions. (I think the old backend got away
with this because the register allocator never uses CX, only
codegen knows that shifts must use CX.)
Instead, we allow the temp register to be anything. When the
destination of the MOV (or LEA) is an integer register, we can
use that register. Otherwise, we make sure to compile the
operation using an LEA to reference the global. So
MOVL AX, foo(SB)
is never generated directly. Instead, SSA generates:
LEAL foo(SB), DX
MOVL AX, (DX)
which is then rewritten by the obj package to:
CALL getPcInDX(SB)
LEAL (&foo-&thisInstruction)(DX), AX
MOVL AX, (DX)
So this CL modifies the obj package to use different thunks
to materialize the pc into different registers. We use the
registers that regalloc chose so that SSA can still allocate
the full set of registers.
Change-Id: Ie095644f7164a026c62e95baf9d18a8bcaed0bba
Reviewed-on: https://go-review.googlesource.com/25442
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
It's not a new backend, just a PtrSize==4 modification
of the existing AMD64 backend.
Change-Id: Icc63521a5cf4ebb379f7430ef3f070894c09afda
Reviewed-on: https://go-review.googlesource.com/25586
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reg allocator skips flag-typed values. Flag allocator uses the type
and whether the op has "clobberFlags" set.
Tested on AMD64, ARM, ARM64, 386. Passed 'toolstash -cmp' on AMD64.
PPC64 is coded blindly.
Change-Id: Ib1cc27efecef6a1bb27f7d7ed035a582660d244f
Reviewed-on: https://go-review.googlesource.com/25480
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Includes:
- hmul (all widths)
- compare for boolean result and simplifications
- shift operations plus changes/additions for implementation
(ORN, ADDME, ADDC)
Also fixed a backwards-operand CMP.
Change-Id: Id723c4e25125c38e0d9ab9ec9448176b75f4cdb4
Reviewed-on: https://go-review.googlesource.com/25410
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Support the following:
- Shifts. ARM64 machine instructions use only the lowest 6 bits of the
shift (i.e. mod 64). Use the conditional selection instruction to
ensure Go semantics (see the sketch after this list).
- Zero/Move. Alignment is ensured.
- Hmul, Avg64u, Sqrt.
- reserve R18 (platform register in ARM64 ABI) and R29 (frame pointer
in ARM64 ABI).
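Plain Go, only to show the shift semantics the conditional select has
to preserve (not compiler code):

    // Go defines x >> s == 0 for unsigned x when s >= 64, while the
    // AArch64 shift instructions look only at the low 6 bits of s.
    func rsh(x uint64, s uint64) uint64 {
        if s >= 64 {
            return 0 // what the conditional select arranges when s is out of range
        }
        return x >> s
    }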
Everything compiles, all.bash passed (with non-SSA test disabled).
Change-Id: Ia8ed58dae5cbc001946f0b889357b258655078b1
Reviewed-on: https://go-review.googlesource.com/25290
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Make tuple types and their SelectX ops fully generic.
These ops no longer need to be lowered.
Regalloc understands them and their tuple-generating arguments.
We can now have opcodes returning arbitrary pairs of results.
(And it would be easy to move to >2 results if needed.)
Update arm implementation to the new standard.
Implement just enough in 386 port to do 64-bit add.
Change-Id: I370ed5aacce219c82e1954c61d1f63af76c16f79
Reviewed-on: https://go-review.googlesource.com/24976
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
NaCl code runs in a sandbox and there are restrictions on the
instructions it can use
(https://developer.chrome.com/native-client/reference/sandbox_internals/arm-32-bit-sandbox).
Like the legacy backend, on NaCl,
- don't use R9, which is used as NaCl's "thread pointer".
- don't use Duff's device.
- don't use indexed load/stores.
- the assembler rewrites DIV/MOD to runtime calls, which on NaCl
clobbers R12, so R12 is marked as clobbered for DIV/MOD.
- other restrictions are satisfied by the assembler.
Enable SSA specific tests on nacl/arm, and disable non-SSA ones.
Updates #15365.
Change-Id: I9262693ec6756b89ca29d3ae4e52a96fe5403b02
Reviewed-on: https://go-review.googlesource.com/24859
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
If a spill is used to satisfy a merge edge (in shuffle), don't sink
it out of the loop.
This is found in the following code (on ARM) where there is a stack
Phi (v268) inside a loop (b36 -> ... -> b47 -> b38 -> b36).
(before shuffle)
b36: <- b34 b38
...
v268 = Phi <int> v410 v360 : autotmp_198[int]
...
... -> b47
b47: <- b44
...
v360 = ... : R6
v230 = StoreReg <int> v360 : autotmp_198[int]
v261 = CMPconst <flags> [0] v360
EQ v261 -> b49 b38 (unlikely)
b38: <- b47
...
Plain -> b36
During shuffle, v230 (the spill of v360) is found to satisfy v268, but
its use was not recorded in shuffle, and v230 was sunk out of the loop
(to b49), which leads to a bad value in v268.
This never seemed to happen on AMD64 (in make.bash) until 4 registers
were removed.
Change-Id: I01dfc28ae461e853b36977c58bcfc0669e556660
Reviewed-on: https://go-review.googlesource.com/24858
Reviewed-by: David Chase <drchase@google.com>
Provide better diagnostic messages.
Use an int for numRegs comparisons,
to avoid asking whether a uint8 is > 255.
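In miniature, the pitfall being avoided (illustrative only, hypothetical
names):

    // With numRegs declared as a uint8, a guard like "numRegs > 255"
    // can never fire; doing the comparison on an int keeps it meaningful.
    func checkRegisters(registers []string) {
        numRegs := len(registers) // an int
        if numRegs > 255 {
            panic("too many registers to index with a uint8")
        }
    }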
Change-Id: I33ae193ce292b24b369865abda3902c3207d7d3f
Reviewed-on: https://go-review.googlesource.com/24135
Reviewed-by: Keith Randall <khr@golang.org>
Use the hardware g register (R10) for GetG; allow g to appear on the
LHS of some ops.
Progress on SSA backend for ARM. Now everything compiles and runs.
Updates #15365.
Change-Id: Icdf93585579faa86cc29b1e17ab7c90f0119fc4e
Reviewed-on: https://go-review.googlesource.com/23952
Reviewed-by: David Chase <drchase@google.com>
- 64x signed right shift was wrong for shift larger than 0x80000000.
- for Lsh-followed-by-Rsh, the intermediate value should be full int
width, so when it is spilled MOVW should be used.
- use RET for RetJmp, so the assembler can take care of restoring LR
for the non-leaf case.
- reserve R9 in dynlink mode. R9 is used for GOT by the assembler.
Progress on SSA backend for ARM. Still not complete.
Updates #15365.
Change-Id: I3caca256b92ff7cf96469da2feaf4868a592efc5
Reviewed-on: https://go-review.googlesource.com/23793
Reviewed-by: David Chase <drchase@google.com>
Auto-generate register masks and load them through Config.
Passed toolstash -cmp on AMD64.
Tests phi_ssa.go and regalloc_ssa.go in cmd/compile/internal/gc/testdata
passed on ARM.
Updates #15365.
Change-Id: I393924d68067f2dbb13dab82e569fb452c986593
Reviewed-on: https://go-review.googlesource.com/23292
Reviewed-by: David Chase <drchase@google.com>
This has a minor performance cost, but far less than is being gained by SSA.
As an experiment, enable it during the Go 1.7 beta.
Having frame pointers on by default makes Linux's perf, Intel VTune,
and other profilers much more useful, because it lets them gather a
stack trace efficiently on profiling events.
(It doesn't help us that much, since when we walk the stack we usually
need to look up PC-specific information as well.)
Fixes #15840.
Change-Id: I4efd38412a0de4a9c87b1b6e5d11c301e63f1a2a
Reviewed-on: https://go-review.googlesource.com/23451
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Fix hardcoded flag register mask in ssa/flagalloc.go by auto-generating
the mask.
Also fix a mistake (in the previous CL) about conditional branches.
Progress on SSA backend for ARM. Still not complete. Now "container/ring"
package compiles and tests passed.
Updates #15365.
Change-Id: Id7c8805c30dbb8107baedb485ed0f71f59ed6ea8
Reviewed-on: https://go-review.googlesource.com/23093
Reviewed-by: Keith Randall <khr@golang.org>
Run live vars test only on ssa builds.
We can't just drop KeepAlive ops during regalloc. We need
to replace them with copies.
Change-Id: Ib4b3b1381415db88fdc2165fc0a9541b73ad9759
Reviewed-on: https://go-review.googlesource.com/23225
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Introduce a KeepAlive op which makes sure that its argument is kept
live until the KeepAlive. Use KeepAlive to mark pointer input
arguments as live after each function call and at each return.
We do this change only for pointer arguments. Those are the
critical ones to handle because they might have finalizers.
Doing compound arguments (slices, structs, ...) is more complicated
because we would need to track field liveness individually (we do
that for auto variables now, but inputs require extra trickery).
Turn off the automatic marking of args as live. That way, when args
are explicitly nulled, plive will know that the original argument is
dead.
The KeepAlive op will be the eventual implementation of
runtime.KeepAlive.
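For context, the user-visible pattern this op is meant to support looks
like this (standard runtime.KeepAlive usage, shown only as an
illustration):

    package main

    import (
        "fmt"
        "os"
        "runtime"
    )

    func main() {
        f, err := os.Open("/dev/null")
        if err != nil {
            return
        }
        fd := f.Fd()
        // fd is used directly from here on; without KeepAlive, f could
        // become unreachable and its finalizer could close fd while the
        // raw descriptor is still in use.
        fmt.Println("raw fd:", fd)
        runtime.KeepAlive(f) // f stays live until this point
    }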
Fixes #15277
Change-Id: I5f223e65d99c9f8342c03fbb1512c4d363e903e5
Reviewed-on: https://go-review.googlesource.com/22365
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences, inserting phi functions where
there are any.
Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)
Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs. This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.
Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.
Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:
        #blocks    #vars        product
max        6171    28180    173,898,780
99.9%      1641     6548     10,401,878
99%         463     1909        873,721
95%         152      639         95,235
90%          84      359         30,021
The old algorithm is indeed usually fastest, for 99%ile
values of usually.
The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.
With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 4m35.200s
user 13m16.644s
sys 0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles
With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 10m36.569s
user 25m52.286s
sys 4m3.696s
and pprof reports 8.3GB allocated in the same larger profile
With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).
Fixes #15537.
Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In regalloc, a sparse map is preallocated for later use by
spill-in-loop sinking. However, variables (spills) are added
during register allocation before spill sinking, and a map
query involving any of these new variables will index out of
bounds in the map.
To fix:
1) fix the queries to use s.orig[v.ID].ID instead, to ensure
proper indexing. Note that s.orig will be nil for values
that are not eligible for spilling (like memory and flags).
2) add a test.
Fixes #15585.
Change-Id: I8f2caa93b132a0f2a9161d2178320d5550583075
Reviewed-on: https://go-review.googlesource.com/22911
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
:= is the wrong thing here. The new variable masks the old
variable so we allocate the slice afresh each time around the loop.
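A minimal illustration of the bug pattern (hypothetical names, not the
actual regalloc code):

    func fill(blocks []int, n int) {
        var scratch []bool // meant to be allocated once, lazily, and reused
        for range blocks {
            if scratch == nil {
                scratch := make([]bool, n) // BUG: ':=' declares a new 'scratch'
                _ = scratch                // scoped to this if-block; the outer
            }                              // slice stays nil, so we allocate
            // ... use scratch ...         // afresh each time around the loop.
        }
    }
    // The fix is plain assignment: scratch = make([]bool, n).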
Change-Id: I759c30e1bfa88f40decca6dd7d1e051e14ca0844
Reviewed-on: https://go-review.googlesource.com/22679
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
The cached copy's ID is sometimes outside the bounds of the orig array.
There's no reason to start at the cached copy and work backwards
to the original value. We already have the original value ID at
all the callsites.
Fixes noopt build
Change-Id: I313508a1917e838a87e8cc83b2ef3c2e4a8db304
Reviewed-on: https://go-review.googlesource.com/22355
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Improve the forward-looking desired register calculation.
It is now inter-block and handles a bunch more cases.
Fixes #14504
Fixes #14828
Fixes #15254
Change-Id: Ic240fa0ec6a779d80f577f55c8a6c4ac8c1a940a
Reviewed-on: https://go-review.googlesource.com/22160
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>