cybercyst/go - go - Gitea: Git with a cup of tea

mirror of https://github.com/golang/go.git synced 2025-05-24 00:41:24 +00:00

Author	SHA1	Message	Date
Josh Bleecher Snyder	3d6647d6f8	cmd/compile: improve regalloc live values debug printing Before: live values at end of each block b1: v3 v2 v7 avoid=0 b2: v3 v13 avoid=81 b3: v19[AX] v3 avoid=81 b6: avoid=0 b7: avoid=0 b5: avoid=0 b4: v3 v18 avoid=81 After: live values at end of each block b1: v3 v2 v7 b2: v3 v13 avoid=AX DI b3: v19[AX] v3 avoid=AX DI b6: b7: b5: b4: v3 v18 avoid=AX DI Change-Id: Ibec5c76a16151832b8d49a21c640699fdc9a9d28 Reviewed-on: https://go-review.googlesource.com/109000 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-24 17:45:48 +00:00
Austin Clements	8871c930be	cmd/compile: don't lower OpConvert Currently, each architecture lowers OpConvert to an arch-specific OpXXXconvert. This is silly because OpConvert means the same thing on all architectures and is logically a no-op that exists only to keep track of conversions to and from unsafe.Pointer. Furthermore, lowering it makes it harder to recognize in other analyses, particularly liveness analysis. This CL eliminates the lowering of OpConvert, leaving it as the generic op until code generation time. The main complexity here is that we still need to register-allocate OpConvert operations. Currently, each arch's lowered OpConvert specifies all GP registers in its register mask. Ideally, OpConvert wouldn't affect value homing at all, and we could just copy the home of OpConvert's source, but this can potentially home an OpConvert in a LocalSlot, which neither regalloc nor stackalloc expect. Rather than try to disentangle this assumption from regalloc and stackalloc, we continue to register-allocate OpConvert, but teach regalloc that OpConvert can be allocated to any allocatable GP register. For #24543. Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6 Reviewed-on: https://go-review.googlesource.com/108496 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2018-04-20 18:46:39 +00:00
David Chase	b9a365681f	cmd/compile: adjust is-statement on Pos's to improve debugging Stores to auto tmp variables can be hoisted to places where the line numbers make debugging look "jumpy". Turning those instructions into ones with is_stmt = 0 in the DWARF (accomplished by marking ssa nodes with NotStmt) makes debugging look better while still attributing the instructions with the correct line number. The same is true for certain register allocator spills and reloads. Change-Id: I97a394eb522d4911cc40b4bf5bf76d3d7221f6c0 Reviewed-on: https://go-review.googlesource.com/98415 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-04 22:14:59 +00:00
Josh Bleecher Snyder	014a9048d4	cmd/compile: prefer to evict a rematerializable register This resolves a long-standing regalloc TODO: If you must evict a register, choose to evict a register containing a rematerializable value, since that value won't need to be spilled. Provides very minor performance and size improvements. name old time/op new time/op delta BinaryTree17-8 2.20s ± 3% 2.18s ± 2% -0.77% (p=0.000 n=45+49) Fannkuch11-8 2.14s ± 2% 2.15s ± 2% +0.73% (p=0.000 n=43+44) FmtFprintfEmpty-8 30.6ns ± 4% 30.2ns ± 3% -1.14% (p=0.000 n=50+48) FmtFprintfString-8 54.5ns ± 6% 53.6ns ± 5% -1.64% (p=0.001 n=50+48) FmtFprintfInt-8 58.0ns ± 7% 57.6ns ± 4% ~ (p=0.220 n=50+50) FmtFprintfIntInt-8 85.3ns ± 2% 84.8ns ± 3% -0.62% (p=0.001 n=44+47) FmtFprintfPrefixedInt-8 93.9ns ± 6% 93.6ns ± 5% ~ (p=0.706 n=50+48) FmtFprintfFloat-8 178ns ± 4% 177ns ± 4% ~ (p=0.107 n=49+50) FmtManyArgs-8 376ns ± 4% 374ns ± 3% -0.58% (p=0.013 n=45+50) GobDecode-8 4.77ms ± 2% 4.76ms ± 3% ~ (p=0.059 n=47+46) GobEncode-8 4.04ms ± 2% 3.99ms ± 3% -1.13% (p=0.000 n=49+49) Gzip-8 177ms ± 2% 180ms ± 3% +1.43% (p=0.000 n=48+48) Gunzip-8 28.5ms ± 6% 28.3ms ± 5% ~ (p=0.104 n=50+49) HTTPClientServer-8 72.1µs ± 1% 72.0µs ± 1% -0.15% (p=0.042 n=48+42) JSONEncode-8 9.81ms ± 5% 10.03ms ± 6% +2.29% (p=0.000 n=50+49) JSONDecode-8 39.2ms ± 3% 39.3ms ± 2% ~ (p=0.095 n=49+49) Mandelbrot200-8 3.48ms ± 2% 3.46ms ± 2% -0.80% (p=0.000 n=47+48) GoParse-8 2.54ms ± 3% 2.51ms ± 3% -1.35% (p=0.000 n=49+49) RegexpMatchEasy0_32-8 66.0ns ± 7% 65.7ns ± 8% ~ (p=0.331 n=50+50) RegexpMatchEasy0_1K-8 155ns ± 4% 154ns ± 4% ~ (p=0.986 n=49+50) RegexpMatchEasy1_32-8 62.6ns ± 8% 62.2ns ± 5% ~ (p=0.395 n=50+49) RegexpMatchEasy1_1K-8 260ns ± 5% 255ns ± 3% -1.92% (p=0.000 n=49+49) RegexpMatchMedium_32-8 92.9ns ± 2% 91.8ns ± 2% -1.25% (p=0.000 n=46+48) RegexpMatchMedium_1K-8 27.7µs ± 3% 27.0µs ± 2% -2.59% (p=0.000 n=49+49) RegexpMatchHard_32-8 1.23µs ± 4% 1.21µs ± 2% -2.16% (p=0.000 n=49+44) RegexpMatchHard_1K-8 36.4µs ± 2% 35.7µs ± 2% -1.87% (p=0.000 n=48+49) Revcomp-8 274ms ± 2% 276ms ± 3% +0.70% (p=0.034 n=45+48) Template-8 45.1ms ± 8% 45.1ms ± 8% ~ (p=0.643 n=50+50) TimeParse-8 223ns ± 2% 223ns ± 2% ~ (p=0.401 n=47+47) TimeFormat-8 245ns ± 2% 246ns ± 3% ~ (p=0.758 n=49+50) [Geo mean] 36.5µs 36.3µs -0.54% name old object-bytes new object-bytes delta Template 480kB ± 0% 480kB ± 0% ~ (all equal) Unicode 214kB ± 0% 214kB ± 0% ~ (all equal) GoTypes 1.54MB ± 0% 1.54MB ± 0% -0.03% (p=0.008 n=5+5) Compiler 5.75MB ± 0% 5.75MB ± 0% ~ (all equal) SSA 14.6MB ± 0% 14.6MB ± 0% -0.01% (p=0.008 n=5+5) Flate 300kB ± 0% 300kB ± 0% -0.01% (p=0.008 n=5+5) GoParser 366kB ± 0% 366kB ± 0% ~ (all equal) Reflect 1.20MB ± 0% 1.20MB ± 0% ~ (all equal) Tar 413kB ± 0% 413kB ± 0% ~ (all equal) XML 529kB ± 0% 528kB ± 0% -0.13% (p=0.008 n=5+5) [Geo mean] 909kB 909kB -0.02% Change-Id: I46d37a55197683a98913f35801dc2b0d609653c8 Reviewed-on: https://go-review.googlesource.com/103240 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-03-29 18:47:18 +00:00
Josh Bleecher Snyder	dafca7de0f	cmd/compile: prefer rematerialization to copying Fixes #24132 name old time/op new time/op delta BinaryTree17-8 2.18s ± 2% 2.15s ± 2% -1.28% (p=0.000 n=25+26) Fannkuch11-8 2.16s ± 3% 2.13s ± 3% -1.54% (p=0.000 n=27+30) FmtFprintfEmpty-8 29.9ns ± 3% 29.6ns ± 3% -1.08% (p=0.001 n=29+26) FmtFprintfString-8 53.6ns ± 2% 54.0ns ± 4% ~ (p=0.193 n=28+29) FmtFprintfInt-8 56.8ns ± 3% 57.0ns ± 3% ~ (p=0.330 n=29+29) FmtFprintfIntInt-8 85.3ns ± 2% 85.8ns ± 3% +0.56% (p=0.042 n=30+29) FmtFprintfPrefixedInt-8 94.1ns ± 5% 99.0ns ± 8% +5.20% (p=0.000 n=27+30) FmtFprintfFloat-8 183ns ± 4% 182ns ± 3% ~ (p=0.619 n=30+26) FmtManyArgs-8 369ns ± 2% 369ns ± 2% ~ (p=0.748 n=27+29) GobDecode-8 4.78ms ± 2% 4.75ms ± 1% ~ (p=0.051 n=28+27) GobEncode-8 4.06ms ± 3% 4.07ms ± 3% ~ (p=0.781 n=29+30) Gzip-8 178ms ± 2% 177ms ± 2% ~ (p=0.171 n=29+30) Gunzip-8 28.2ms ± 7% 28.0ms ± 4% ~ (p=0.155 n=30+30) HTTPClientServer-8 71.5µs ± 3% 71.3µs ± 1% ~ (p=0.913 n=25+27) JSONEncode-8 9.71ms ± 5% 9.86ms ± 4% +1.55% (p=0.015 n=28+30) JSONDecode-8 38.8ms ± 2% 39.3ms ± 2% +1.41% (p=0.000 n=28+29) Mandelbrot200-8 3.47ms ± 6% 3.44ms ± 3% ~ (p=0.183 n=28+28) GoParse-8 2.55ms ± 2% 2.54ms ± 3% -0.58% (p=0.003 n=27+29) RegexpMatchEasy0_32-8 66.0ns ± 5% 65.3ns ± 4% ~ (p=0.124 n=30+30) RegexpMatchEasy0_1K-8 152ns ± 2% 152ns ± 3% ~ (p=0.881 n=30+30) RegexpMatchEasy1_32-8 62.9ns ± 9% 62.7ns ± 7% ~ (p=0.717 n=30+30) RegexpMatchEasy1_1K-8 263ns ± 3% 263ns ± 4% ~ (p=0.909 n=30+29) RegexpMatchMedium_32-8 93.4ns ± 3% 89.3ns ± 2% -4.32% (p=0.000 n=29+29) RegexpMatchMedium_1K-8 27.5µs ± 3% 27.1µs ± 2% -1.46% (p=0.000 n=30+27) RegexpMatchHard_32-8 1.33µs ± 3% 1.31µs ± 3% -1.50% (p=0.000 n=27+28) RegexpMatchHard_1K-8 39.4µs ± 2% 39.1µs ± 2% -0.54% (p=0.027 n=28+28) Revcomp-8 274ms ± 4% 276ms ± 2% +0.67% (p=0.048 n=29+28) Template-8 45.1ms ± 5% 44.6ms ± 7% -1.22% (p=0.029 n=30+29) TimeParse-8 227ns ± 3% 224ns ± 3% -1.25% (p=0.000 n=28+27) TimeFormat-8 248ns ± 3% 245ns ± 3% -1.33% (p=0.002 n=30+29) [Geo mean] 36.6µs 36.5µs -0.32% Change-Id: I24083f0013506b77e2d9da99c40ae2f67803285e Reviewed-on: https://go-review.googlesource.com/101076 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-28 18:43:49 +00:00
Alberto Donizetti	377a2cb2d2	cmd/compile: reduce allocations in regAllocState.regalloc name old time/op new time/op delta Template 281ms ± 2% 282ms ± 3% ~ (p=0.428 n=19+20) Unicode 138ms ± 6% 138ms ± 7% ~ (p=0.813 n=19+20) GoTypes 901ms ± 2% 895ms ± 2% ~ (p=0.050 n=19+20) Compiler 4.25s ± 1% 4.23s ± 1% -0.31% (p=0.031 n=19+18) SSA 9.77s ± 1% 9.78s ± 1% ~ (p=0.512 n=20+20) Flate 187ms ± 3% 187ms ± 4% ~ (p=0.687 n=20+19) GoParser 224ms ± 4% 222ms ± 3% ~ (p=0.301 n=20+20) Reflect 576ms ± 2% 576ms ± 2% ~ (p=0.620 n=20+20) Tar 262ms ± 3% 263ms ± 3% ~ (p=0.599 n=19+18) XML 322ms ± 4% 322ms ± 2% ~ (p=0.512 n=20+20) name old user-time/op new user-time/op delta Template 403ms ± 3% 399ms ± 5% ~ (p=0.149 n=17+20) Unicode 217ms ±12% 217ms ± 9% ~ (p=0.883 n=20+20) GoTypes 1.24s ± 3% 1.24s ± 3% ~ (p=0.718 n=20+20) Compiler 5.90s ± 3% 5.84s ± 5% ~ (p=0.217 n=18+20) SSA 14.0s ± 6% 14.1s ± 5% ~ (p=0.235 n=19+20) Flate 253ms ± 6% 254ms ± 5% ~ (p=0.749 n=20+19) GoParser 309ms ± 7% 307ms ± 5% ~ (p=0.398 n=20+20) Reflect 772ms ± 3% 771ms ± 3% ~ (p=0.901 n=20+19) Tar 368ms ± 5% 369ms ± 8% ~ (p=0.429 n=20+20) XML 435ms ± 5% 434ms ± 5% ~ (p=0.841 n=20+20) name old alloc/op new alloc/op delta Template 39.0MB ± 0% 38.9MB ± 0% -0.21% (p=0.000 n=20+19) Unicode 29.0MB ± 0% 29.0MB ± 0% -0.03% (p=0.000 n=20+20) GoTypes 116MB ± 0% 115MB ± 0% -0.33% (p=0.000 n=20+20) Compiler 498MB ± 0% 496MB ± 0% -0.37% (p=0.000 n=19+20) SSA 1.41GB ± 0% 1.40GB ± 0% -0.24% (p=0.000 n=20+20) Flate 25.0MB ± 0% 25.0MB ± 0% -0.22% (p=0.000 n=20+19) GoParser 31.0MB ± 0% 30.9MB ± 0% -0.23% (p=0.000 n=20+17) Reflect 77.1MB ± 0% 77.0MB ± 0% -0.12% (p=0.000 n=20+20) Tar 39.7MB ± 0% 39.6MB ± 0% -0.17% (p=0.000 n=20+20) XML 44.9MB ± 0% 44.8MB ± 0% -0.29% (p=0.000 n=20+20) name old allocs/op new allocs/op delta Template 386k ± 0% 385k ± 0% -0.28% (p=0.000 n=20+20) Unicode 337k ± 0% 336k ± 0% -0.07% (p=0.000 n=20+20) GoTypes 1.20M ± 0% 1.20M ± 0% -0.41% (p=0.000 n=20+20) Compiler 4.71M ± 0% 4.68M ± 0% -0.52% (p=0.000 n=20+20) SSA 11.7M ± 0% 11.6M ± 0% -0.31% (p=0.000 n=20+19) Flate 238k ± 0% 237k ± 0% -0.28% (p=0.000 n=18+20) GoParser 320k ± 0% 319k ± 0% -0.34% (p=0.000 n=20+19) Reflect 961k ± 0% 959k ± 0% -0.12% (p=0.000 n=20+20) Tar 397k ± 0% 396k ± 0% -0.23% (p=0.000 n=20+20) XML 419k ± 0% 417k ± 0% -0.39% (p=0.000 n=20+19) Change-Id: Ic7ec3614808d9892c1cab3991b996b7a3b8eff21 Reviewed-on: https://go-review.googlesource.com/102676 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-27 18:03:39 +00:00
Alberto Donizetti	2ba98f1ae9	cmd/compile: avoid some allocations in regalloc Compilebench: name old time/op new time/op delta Template 283ms ± 3% 281ms ± 4% ~ (p=0.242 n=20+20) Unicode 137ms ± 6% 135ms ± 6% ~ (p=0.194 n=20+19) GoTypes 890ms ± 2% 883ms ± 1% -0.74% (p=0.001 n=19+19) Compiler 4.21s ± 2% 4.20s ± 2% -0.40% (p=0.033 n=20+19) SSA 9.86s ± 2% 9.68s ± 1% -1.80% (p=0.000 n=20+19) Flate 185ms ± 5% 185ms ± 7% ~ (p=0.429 n=20+20) GoParser 222ms ± 3% 222ms ± 4% ~ (p=0.588 n=19+20) Reflect 572ms ± 2% 570ms ± 3% ~ (p=0.113 n=19+20) Tar 263ms ± 4% 259ms ± 2% -1.41% (p=0.013 n=20+20) XML 321ms ± 2% 321ms ± 4% ~ (p=0.835 n=20+19) name old user-time/op new user-time/op delta Template 400ms ± 5% 405ms ± 5% ~ (p=0.096 n=20+20) Unicode 217ms ± 8% 213ms ± 8% ~ (p=0.242 n=20+20) GoTypes 1.23s ± 3% 1.22s ± 3% ~ (p=0.923 n=19+20) Compiler 5.76s ± 6% 5.81s ± 2% ~ (p=0.687 n=20+19) SSA 14.2s ± 4% 14.0s ± 4% ~ (p=0.121 n=20+20) Flate 248ms ± 7% 251ms ±10% ~ (p=0.369 n=20+20) GoParser 308ms ± 5% 305ms ± 6% ~ (p=0.336 n=19+20) Reflect 771ms ± 2% 766ms ± 2% ~ (p=0.113 n=20+19) Tar 370ms ± 5% 362ms ± 7% -2.06% (p=0.036 n=19+20) XML 435ms ± 4% 432ms ± 5% ~ (p=0.369 n=20+20) name old alloc/op new alloc/op delta Template 39.5MB ± 0% 39.4MB ± 0% -0.20% (p=0.000 n=20+20) Unicode 29.1MB ± 0% 29.1MB ± 0% ~ (p=0.064 n=20+20) GoTypes 117MB ± 0% 117MB ± 0% -0.17% (p=0.000 n=20+20) Compiler 503MB ± 0% 502MB ± 0% -0.15% (p=0.000 n=19+19) SSA 1.42GB ± 0% 1.42GB ± 0% -0.16% (p=0.000 n=20+20) Flate 25.3MB ± 0% 25.3MB ± 0% -0.19% (p=0.000 n=20+20) GoParser 31.4MB ± 0% 31.3MB ± 0% -0.14% (p=0.000 n=20+18) Reflect 78.1MB ± 0% 77.9MB ± 0% -0.34% (p=0.000 n=20+19) Tar 40.1MB ± 0% 40.0MB ± 0% -0.17% (p=0.000 n=20+20) XML 45.3MB ± 0% 45.2MB ± 0% -0.13% (p=0.000 n=20+20) name old allocs/op new allocs/op delta Template 393k ± 0% 392k ± 0% -0.21% (p=0.000 n=20+19) Unicode 337k ± 0% 337k ± 0% -0.02% (p=0.000 n=20+20) GoTypes 1.22M ± 0% 1.22M ± 0% -0.21% (p=0.000 n=20+20) Compiler 4.77M ± 0% 4.76M ± 0% -0.16% (p=0.000 n=20+20) SSA 11.8M ± 0% 11.8M ± 0% -0.12% (p=0.000 n=20+20) Flate 242k ± 0% 241k ± 0% -0.20% (p=0.000 n=20+20) GoParser 324k ± 0% 324k ± 0% -0.14% (p=0.000 n=20+20) Reflect 985k ± 0% 981k ± 0% -0.38% (p=0.000 n=20+20) Tar 403k ± 0% 402k ± 0% -0.19% (p=0.000 n=20+20) XML 424k ± 0% 424k ± 0% -0.16% (p=0.000 n=19+20) Change-Id: I131e382b64cd6db11a9263a477d45d80c180c499 Reviewed-on: https://go-review.googlesource.com/102421 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-03-25 18:27:49 +00:00
Ilya Tocar	983dcf70ba	cmd/compile/internal/ssa: update regalloc in loops Currently we don't lift spill out of loop if loop contains call. However often we have code like this: for .. { if hard_case { call() } // simple case, without call } So instead of checking for any call, check for unavoidable call. For #22698 cases I see: mime/quotedprintable/Writer-6 10.9µs ± 4% 9.2µs ± 3% -15.02% (p=0.000 n=8+8) And: compress/flate/Encode/Twain/Huffman/1e4-6 99.4µs ± 6% 90.9µs ± 0% -8.57% (p=0.000 n=8+8) compress/flate/Encode/Twain/Huffman/1e5-6 760µs ± 1% 725µs ± 1% -4.56% (p=0.000 n=8+8) compress/flate/Encode/Twain/Huffman/1e6-6 7.55ms ± 0% 7.24ms ± 0% -4.07% (p=0.000 n=8+7) There are no significant changes on go1 benchmarks. But for cases with runtime arch checks, where we call generic version on old hardware, there are respectable performance gains: math/RoundToEven-6 1.43ns ± 0% 1.25ns ± 0% -12.59% (p=0.001 n=7+7) math/bits/OnesCount64-6 1.60ns ± 1% 1.42ns ± 1% -11.32% (p=0.000 n=8+8) Also on some runtime benchmarks loops have less loads and higher performance: runtime/RuneIterate/range1/ASCII-6 15.6ns ± 1% 13.9ns ± 1% -10.74% (p=0.000 n=7+8) runtime/ArrayEqual-6 3.22ns ± 0% 2.86ns ± 2% -11.06% (p=0.000 n=7+8) Fixes #22698 Updates #22234 Change-Id: I0ae2f19787d07a9026f064366dedbe601bf7257a Reviewed-on: https://go-review.googlesource.com/84055 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2018-03-20 21:02:39 +00:00
Daniel Martí	cd2cb6e3f5	cmd/compile: cache sparse maps across ssa passes This is done for sparse sets already, but it was missing for sparse maps. Only affects deadstore and regalloc, as they're the only ones that use sparse maps. name old time/op new time/op delta DSEPass-4 247µs ± 0% 216µs ± 0% -12.75% (p=0.008 n=5+5) DSEPassBlock-4 3.05ms ± 1% 2.87ms ± 1% -6.02% (p=0.002 n=6+6) CSEPass-4 2.30ms ± 0% 2.32ms ± 0% +0.53% (p=0.026 n=6+6) CSEPassBlock-4 23.8ms ± 0% 23.8ms ± 0% ~ (p=0.931 n=6+5) DeadcodePass-4 51.7µs ± 1% 51.5µs ± 2% ~ (p=0.429 n=5+6) DeadcodePassBlock-4 734µs ± 1% 742µs ± 3% ~ (p=0.394 n=6+6) MultiPass-4 152µs ± 0% 149µs ± 2% ~ (p=0.082 n=5+6) MultiPassBlock-4 2.67ms ± 1% 2.41ms ± 2% -9.77% (p=0.008 n=5+5) name old alloc/op new alloc/op delta DSEPass-4 41.2kB ± 0% 0.1kB ± 0% -99.68% (p=0.002 n=6+6) DSEPassBlock-4 560kB ± 0% 4kB ± 0% -99.34% (p=0.026 n=5+6) CSEPass-4 189kB ± 0% 189kB ± 0% ~ (all equal) CSEPassBlock-4 3.10MB ± 0% 3.10MB ± 0% ~ (p=0.444 n=5+5) DeadcodePass-4 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) DeadcodePassBlock-4 164kB ± 0% 164kB ± 0% ~ (all equal) MultiPass-4 240kB ± 0% 199kB ± 0% -17.06% (p=0.002 n=6+6) MultiPassBlock-4 3.60MB ± 0% 2.99MB ± 0% -17.06% (p=0.002 n=6+6) name old allocs/op new allocs/op delta DSEPass-4 8.00 ± 0% 4.00 ± 0% -50.00% (p=0.002 n=6+6) DSEPassBlock-4 240 ± 0% 120 ± 0% -50.00% (p=0.002 n=6+6) CSEPass-4 9.00 ± 0% 9.00 ± 0% ~ (all equal) CSEPassBlock-4 1.35k ± 0% 1.35k ± 0% ~ (all equal) DeadcodePass-4 3.00 ± 0% 3.00 ± 0% ~ (all equal) DeadcodePassBlock-4 9.00 ± 0% 9.00 ± 0% ~ (all equal) MultiPass-4 11.0 ± 0% 10.0 ± 0% -9.09% (p=0.002 n=6+6) MultiPassBlock-4 165 ± 0% 150 ± 0% -9.09% (p=0.002 n=6+6) Change-Id: I43860687c88f33605eb1415f36473c5cfe8fde4a Reviewed-on: https://go-review.googlesource.com/98449 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2018-03-15 17:24:39 +00:00
David Chase	c18ff18465	cmd/compile: decouple emitted block order from regalloc block order While tinkering with different block orders for the preemptible loop experiment, crashed the register allocator with a "bad" one (these exist). Realized that one knob was controlling two things (register allocation and branch patterns) and decided that life would be simpler if the two orders were independent. Ran some experiments and determined that we have probably, mostly, been optimizing for register allocation effects, not branch effects. Bad block orders for register allocation are somewhat costly. This will also allow separate experimentation with perhaps- better block orders for register allocation. Change-Id: I6ecf2f24cca178b6f8acc0d3c4caaef043c11ed9 Reviewed-on: https://go-review.googlesource.com/47314 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-02-22 03:02:34 +00:00
Heschi Kreinick	2075a9323d	cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was very bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2018-02-14 18:29:19 +00:00
Keith Randall	0153a4130d	cmd/compile: fix runtime.KeepAlive KeepAlive needs to introduce a use of the spill of the value it is keeping alive. Without that, we don't guarantee that the spill dominates the KeepAlive. This bug was probably introduced with the code to move spills down to the dominator of the restores, instead of always spilling just after the value itself (CL 34822). Fixes #22458. Change-Id: I94955a21960448ffdacc4df775fe1213967b1d4c Reviewed-on: https://go-review.googlesource.com/74210 Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>	2017-10-30 19:55:02 +00:00
David Chase	ca360c3992	cmd/compile: better XPos for rematerialized values and JMPs This attempts to choose better values for values that are rematerialized (uses the XPos of the consumer, not the original) and for unconditional branches (uses the last assigned XPos in the block). The JMP branches seem to sometimes end up with a PC in the destination block, I think because of register movement or rematerialization that gets placed in predecessor blocks. This may be acceptable because (eyeball-empirically) that is often the line number of the target block, so the line number flow is correct. Added proper test, that checks both -N -l and regular compilation. The test is also capable (for gdb, delve soon) of tracking variable printing based on comments in the source code. There's substantial room for improvement in debugger behavior. Updates #21098. Change-Id: I13abd48a39141583b85576a015f561065819afd0 Reviewed-on: https://go-review.googlesource.com/50610 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-07 22:12:36 +00:00
Daniel Martí	ded2c65db3	cmd/compile: simplify a few bits of the code Remove an unused type, a few redundant returns and replace a few slice append loops with a single append. Change-Id: If07248180bae5631b5b152c6051d9635889997d5 Reviewed-on: https://go-review.googlesource.com/66851 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dave Cheney <dave@cheney.net>	2017-09-28 20:40:17 +00:00
Lynn Boger	b74b43de68	cmd/compile: request r12 for indirect calls on ppc64le On ppc64le, functions compiled with -shared expect r12 to hold the function's address for indirect calls. Previously this was enforced by generating a move instruction if the address wasn't already in r12. This change avoids that extra move by requesting r12 in the CALL ops that do indirect calls. As a result of adding support for plugins on ppc64le, it was discovered that there would be more cases where this extra move was needed, so this seemed like a better solution. Updates #20756 Change-Id: I6770885a46990f78c6d2902a715dcdaa822192a1 Reviewed-on: https://go-review.googlesource.com/62890 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Crawshaw <crawshaw@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-09-11 18:02:33 +00:00
Daniel Martí	99da8730b0	all: remove some double spaces from comments Went mainly for the ones that make no sense, such as the ones mid-sentence or after commas. Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0 Reviewed-on: https://go-review.googlesource.com/57293 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-26 15:09:09 +00:00
Keith Randall	770d8d8207	cmd/compile: free value earlier in nilcheck When we remove a nil check, add it back to the free Value pool immediately. Fixes #18732 Change-Id: I8d644faabbfb52157d3f2d071150ff0342ac28dc Reviewed-on: https://go-review.googlesource.com/58810 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-08-25 06:01:26 +00:00
Keith Randall	bf4d8d3d05	cmd/compile: rename SSA Register.Name to Register.String Just to get rid of lots of .Name() stutter in printf calls. Change-Id: I86cf00b3f7b2172387a1c6a7f189c1897fab6300 Reviewed-on: https://go-review.googlesource.com/56630 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-08-17 21:53:08 +00:00
Heschi Kreinick	4c54a047c6	[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-07-27 20:19:44 +00:00
Heschi Kreinick	2d57d94ac3	[dev.debug] cmd/compile: track variable decomposition in LocalSlot When the compiler decomposes a user variable, track its origin so that it can be recomposed during DWARF generation. Change-Id: Ia71c7f8e7f4d65f0652f1c97b0dda5d9cad41936 Reviewed-on: https://go-review.googlesource.com/50878 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-07-26 18:39:39 +00:00
David Chase	00263a8968	cmd/compile: reduce debugger-worsening line number churn Reuse block head or preceding instruction's line number for register allocator's spill, fill, copy, rematerialization instructionsl; and also for phi, and for no-src-pos instructions. Assembler creates same line number tables for copy-predecessor-line and for no-src-pos, but copy-predecessor produces better-looking assembly language output with -S and with GOSSAFUNC, and does not require changes to tests of existing assembly language. Split "copyInto" into two cases, one for register allocation, one for otherwise. This caused the test score line change count to increase by one, which may reflect legitimately useful information preserved. Without any special treatment for copyInto, the change count increases by 21 more, from 51 to 72 (i.e., quite a lot). There is a test; using two naive "scores" for line number churn, the old numbering is 2x or 4x worse. Fixes #18902. Change-Id: I0a0a69659d30ee4e5d10116a0dd2b8c5df8457b1 Reviewed-on: https://go-review.googlesource.com/36207 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-05-10 17:16:44 +00:00
Josh Bleecher Snyder	46b88c9fbc	cmd/compile: change ssa.Type into types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of types.Type. The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-05-09 23:01:51 +00:00
Matthew Dempsky	1e3570ac86	cmd/internal/objabi: extract shared functionality from obj Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing the assembler backends no longer requires reinstalling cmd/link or cmd/addr2line. There's also now one canonical definition of the object file format in cmd/internal/objabi/doc.go, with a warning to update all three implementations. objabi is still something of a grab bag of unrelated code (e.g., flag and environment variable handling probably belong in a separate "tool" package), but this is still progress. Fixes #15165. Fixes #20026. Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c Reviewed-on: https://go-review.googlesource.com/40972 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-04-19 00:00:09 +00:00
Cherry Zhang	8a5175df35	cmd/compile: improve startRegs calculation In register allocation, we calculate what values are used in and after the current block. If a value is used only after a function call, since registers are clobbered in call, we don't need to mark the value live at the entrance of the block. Before this CL it is considered live, and unnecessary copy or load may be generated when resolving merge edge. Fixes #14761. On AMD64: name old time/op new time/op delta BinaryTree17-12 2.84s ± 1% 2.81s ± 1% -1.06% (p=0.000 n=10+9) Fannkuch11-12 3.61s ± 0% 3.55s ± 1% -1.77% (p=0.000 n=10+9) FmtFprintfEmpty-12 50.4ns ± 4% 50.0ns ± 1% ~ (p=0.785 n=9+8) FmtFprintfString-12 80.0ns ± 3% 78.2ns ± 3% -2.35% (p=0.004 n=10+9) FmtFprintfInt-12 81.3ns ± 4% 81.8ns ± 2% ~ (p=0.159 n=10+10) FmtFprintfIntInt-12 120ns ± 4% 118ns ± 2% ~ (p=0.218 n=10+10) FmtFprintfPrefixedInt-12 152ns ± 3% 155ns ± 2% +2.11% (p=0.026 n=10+10) FmtFprintfFloat-12 240ns ± 1% 238ns ± 1% -0.79% (p=0.005 n=9+9) FmtManyArgs-12 504ns ± 1% 510ns ± 1% +1.14% (p=0.000 n=8+9) GobDecode-12 7.00ms ± 1% 6.99ms ± 0% ~ (p=0.497 n=9+10) GobEncode-12 5.47ms ± 1% 5.48ms ± 1% ~ (p=0.218 n=10+10) Gzip-12 258ms ± 2% 256ms ± 1% -0.96% (p=0.043 n=10+9) Gunzip-12 38.6ms ± 0% 38.3ms ± 0% -0.64% (p=0.000 n=9+8) HTTPClientServer-12 90.4µs ± 3% 87.2µs ±11% ~ (p=0.053 n=9+10) JSONEncode-12 15.6ms ± 0% 15.6ms ± 1% ~ (p=0.077 n=9+9) JSONDecode-12 55.1ms ± 1% 54.6ms ± 1% -0.85% (p=0.010 n=10+9) Mandelbrot200-12 4.49ms ± 0% 4.47ms ± 0% -0.25% (p=0.000 n=10+8) GoParse-12 3.38ms ± 0% 3.37ms ± 1% ~ (p=0.315 n=8+10) RegexpMatchEasy0_32-12 82.5ns ± 4% 82.0ns ± 0% ~ (p=0.164 n=10+8) RegexpMatchEasy0_1K-12 203ns ± 1% 202ns ± 1% -0.85% (p=0.000 n=9+10) RegexpMatchEasy1_32-12 82.3ns ± 1% 81.1ns ± 0% -1.39% (p=0.000 n=10+8) RegexpMatchEasy1_1K-12 357ns ± 1% 357ns ± 1% ~ (p=0.697 n=8+9) RegexpMatchMedium_32-12 125ns ± 2% 126ns ± 2% ~ (p=0.197 n=10+10) RegexpMatchMedium_1K-12 39.6µs ± 3% 39.6µs ± 1% ~ (p=0.971 n=10+10) RegexpMatchHard_32-12 1.99µs ± 2% 1.99µs ± 4% ~ (p=0.891 n=10+9) RegexpMatchHard_1K-12 60.1µs ± 3% 60.4µs ± 3% ~ (p=0.684 n=10+10) Revcomp-12 531ms ± 6% 441ms ± 0% -16.94% (p=0.000 n=10+9) Template-12 58.9ms ± 1% 58.7ms ± 1% ~ (p=0.315 n=10+10) TimeParse-12 319ns ± 1% 320ns ± 4% ~ (p=0.215 n=9+9) TimeFormat-12 345ns ± 0% 333ns ± 1% -3.36% (p=0.000 n=9+10) [Geo mean] 52.2µs 51.6µs -1.13% On ARM64: name old time/op new time/op delta BinaryTree17-8 8.53s ± 0% 8.36s ± 0% -1.89% (p=0.000 n=10+10) Fannkuch11-8 6.15s ± 0% 6.10s ± 0% -0.67% (p=0.000 n=10+10) FmtFprintfEmpty-8 117ns ± 0% 117ns ± 0% ~ (all equal) FmtFprintfString-8 192ns ± 0% 192ns ± 0% ~ (all equal) FmtFprintfInt-8 198ns ± 0% 198ns ± 0% ~ (p=0.211 n=10+10) FmtFprintfIntInt-8 289ns ± 0% 291ns ± 0% +0.59% (p=0.000 n=7+10) FmtFprintfPrefixedInt-8 320ns ± 2% 317ns ± 0% ~ (p=0.431 n=10+8) FmtFprintfFloat-8 538ns ± 0% 538ns ± 0% ~ (all equal) FmtManyArgs-8 1.17µs ± 1% 1.18µs ± 1% ~ (p=0.063 n=10+10) GobDecode-8 17.0ms ± 1% 17.2ms ± 1% +0.83% (p=0.000 n=10+10) GobEncode-8 14.2ms ± 0% 14.1ms ± 1% -0.78% (p=0.001 n=9+10) Gzip-8 806ms ± 0% 797ms ± 0% -1.12% (p=0.000 n=6+9) Gunzip-8 131ms ± 0% 130ms ± 0% -0.51% (p=0.000 n=10+9) HTTPClientServer-8 206µs ± 9% 212µs ± 2% ~ (p=0.829 n=10+8) JSONEncode-8 40.1ms ± 0% 40.1ms ± 0% ~ (p=0.136 n=9+9) JSONDecode-8 157ms ± 0% 151ms ± 0% -3.32% (p=0.000 n=9+9) Mandelbrot200-8 10.1ms ± 0% 10.1ms ± 0% -0.05% (p=0.000 n=9+8) GoParse-8 8.43ms ± 0% 8.43ms ± 0% ~ (p=0.912 n=10+10) RegexpMatchEasy0_32-8 228ns ± 1% 227ns ± 0% -0.26% (p=0.026 n=10+9) RegexpMatchEasy0_1K-8 1.92µs ± 0% 1.63µs ± 0% -15.18% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 258ns ± 1% 250ns ± 0% -2.83% (p=0.000 n=10+10) RegexpMatchEasy1_1K-8 2.39µs ± 0% 2.13µs ± 0% -10.94% (p=0.000 n=9+9) RegexpMatchMedium_32-8 352ns ± 0% 351ns ± 0% -0.29% (p=0.004 n=9+10) RegexpMatchMedium_1K-8 104µs ± 0% 105µs ± 0% +0.58% (p=0.000 n=8+9) RegexpMatchHard_32-8 5.84µs ± 0% 5.82µs ± 0% -0.27% (p=0.000 n=9+10) RegexpMatchHard_1K-8 177µs ± 0% 177µs ± 0% -0.07% (p=0.000 n=9+9) Revcomp-8 1.57s ± 1% 1.50s ± 1% -4.60% (p=0.000 n=9+10) Template-8 157ms ± 1% 153ms ± 1% -2.28% (p=0.000 n=10+9) TimeParse-8 779ns ± 1% 770ns ± 1% -1.18% (p=0.013 n=10+10) TimeFormat-8 823ns ± 2% 826ns ± 1% ~ (p=0.324 n=10+9) [Geo mean] 144µs 142µs -1.45% Reduce cmd/go text size by 0.5%. Change-Id: I9288ff983c4a7cf03fc0cb35b9b1750828013117 Reviewed-on: https://go-review.googlesource.com/38457 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-29 19:45:32 +00:00
Josh Bleecher Snyder	34975095d0	cmd/compile: provide pos and curfn to temp Concurrent compilation requires providing an explicit position and curfn to temp. This implementation of tempAt temporarily continues to use the globals lineno and Curfn, so as not to collide with mdempsky's work for #19683 eliminating the Curfn dependency from func nod. Updates #15756 Updates #19683 Change-Id: Ib3149ca4b0740e9f6eea44babc6f34cdd63028a9 Reviewed-on: https://go-review.googlesource.com/38592 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-03-25 00:09:21 +00:00
Keith Randall	27bc723b51	cmd/compile: initialize loop depths Regalloc uses loop depths - make sure they are initialized! Test to make sure we aren't pushing spills into loops. This fixes a generated-code performance bug introduced with the better spill placement change: https://go-review.googlesource.com/c/34822/ Update #19595 Change-Id: Ib9f0da6fb588503518847d7aab51e569fd3fa61e Reviewed-on: https://go-review.googlesource.com/38434 Reviewed-by: David Chase <drchase@google.com>	2017-03-23 16:41:22 +00:00
Josh Bleecher Snyder	aea3aff669	cmd/compile: separate ssa.Frontend and ssa.TypeSource Prior to this CL, the ssa.Frontend field was responsible for providing types to the backend during compilation. However, the types needed by the backend are few and static. It makes more sense to use a struct for them and to hang that struct off the ssa.Config, which is the correct home for readonly data. Now that Types is a struct, we can clean up the names a bit as well. This has the added benefit of allowing early construction of all types needed by the backend. This will be useful for concurrent backend compilation. Passes toolstash-check -all. No compiler performance change. Updates #15756 Change-Id: I021658c8cf2836d6a22bbc20cc828ac38c7da08a Reviewed-on: https://go-review.googlesource.com/38336 Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-03-19 00:21:23 +00:00
Josh Bleecher Snyder	2cdb7f118a	cmd/compile: move Frontend field from ssa.Config to ssa.Func Suggested by mdempsky in CL 38232. This allows us to use the Frontend field to associate frontend state and information with a function. See the following CL in the series for examples. This is a giant CL, but it is almost entirely routine refactoring. The ssa test API is starting to feel a bit unwieldy. I will clean it up separately, once the dust has settled. Passes toolstash -cmp. Updates #15756 Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf Reviewed-on: https://go-review.googlesource.com/38327 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-03-17 23:18:57 +00:00
Josh Bleecher Snyder	a5e3cac895	cmd/compile: rearrange fields between ssa.Func, ssa.Cache, and ssa.Config This makes ssa.Func, ssa.Cache, and ssa.Config fulfill the roles laid out for them in CL 38160. The only non-trivial change in this CL is how cached values and blocks get IDs. Prior to this CL, their IDs were assigned as part of resetting the cache, and only modified IDs were reset. This required knowing how many values and blocks were modified, which required a tight coupling between ssa.Func and ssa.Config. To eliminate that coupling, we now zero values and blocks during reset, and assign their IDs when they are used. Since unused values and blocks have ID == 0, we can efficiently find the last used value/block, to avoid zeroing everything. Bulk zeroing is efficient, but not efficient enough to obviate the need to avoid zeroing everything every time. As a happy side-effect, ssa.Func.Free is no longer necessary. DebugHashMatch and friends now belong in func.go. They have been left in place for clarity and review. I will move them in a subsequent CL. Passes toolstash -cmp. No compiler performance impact. No change in 'go test cmd/compile/internal/ssa' execution time. Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098 Reviewed-on: https://go-review.googlesource.com/38167 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-17 05:21:42 +00:00
David Chase	886e9e6065	cmd/compile: put spills in better places Previously we always issued a spill right after the op that was being spilled. This CL pushes spills father away from the generator, hopefully pushing them into unlikely branches. For example: x = ... if unlikely { call ... } ... use x ... Used to compile to x = ... spill x if unlikely { call ... restore x } It now compiles to x = ... if unlikely { spill x call ... restore x } This is particularly useful for code which appends, as the only call is an unlikely call to growslice. It also helps for the spills needed around write barrier calls. The basic algorithm is walk down the dominator tree following a path where the block still dominates all of the restores. We're looking for a block that: 1) dominates all restores 2) has the value being spilled in a register 3) has a loop depth no deeper than the value being spilled The walking-down code is iterative. I was forced to limit it to searching 100 blocks so it doesn't become O(n^2). Maybe one day we'll find a better way. I had to delete most of David's code which pushed spills out of loops. I suspect this CL subsumes most of the cases that his code handled. Generally positive performance improvements, but hard to tell for sure with all the noise. (compilebench times are unchanged.) name old time/op new time/op delta BinaryTree17-12 2.91s ±15% 2.80s ±12% ~ (p=0.063 n=10+10) Fannkuch11-12 3.47s ± 0% 3.30s ± 4% -4.91% (p=0.000 n=9+10) FmtFprintfEmpty-12 48.0ns ± 1% 47.4ns ± 1% -1.32% (p=0.002 n=9+9) FmtFprintfString-12 85.6ns ±11% 79.4ns ± 3% -7.27% (p=0.005 n=10+10) FmtFprintfInt-12 91.8ns ±10% 85.9ns ± 4% ~ (p=0.203 n=10+9) FmtFprintfIntInt-12 135ns ±13% 127ns ± 1% -5.72% (p=0.025 n=10+9) FmtFprintfPrefixedInt-12 167ns ± 1% 168ns ± 2% ~ (p=0.580 n=9+10) FmtFprintfFloat-12 249ns ±11% 230ns ± 1% -7.32% (p=0.000 n=10+10) FmtManyArgs-12 504ns ± 7% 506ns ± 1% ~ (p=0.198 n=9+9) GobDecode-12 6.95ms ± 1% 7.04ms ± 1% +1.37% (p=0.001 n=10+10) GobEncode-12 6.32ms ±13% 6.04ms ± 1% ~ (p=0.063 n=10+10) Gzip-12 233ms ± 1% 235ms ± 0% +1.01% (p=0.000 n=10+9) Gunzip-12 40.1ms ± 1% 39.6ms ± 0% -1.12% (p=0.000 n=10+8) HTTPClientServer-12 227µs ± 9% 221µs ± 5% ~ (p=0.114 n=9+8) JSONEncode-12 16.1ms ± 2% 15.8ms ± 1% -2.09% (p=0.002 n=9+8) JSONDecode-12 61.8ms ±11% 57.9ms ± 1% -6.30% (p=0.000 n=10+9) Mandelbrot200-12 4.30ms ± 3% 4.28ms ± 1% ~ (p=0.203 n=10+8) GoParse-12 3.18ms ± 2% 3.18ms ± 2% ~ (p=0.579 n=10+10) RegexpMatchEasy0_32-12 76.7ns ± 1% 77.5ns ± 1% +0.92% (p=0.002 n=9+8) RegexpMatchEasy0_1K-12 239ns ± 3% 239ns ± 1% ~ (p=0.204 n=10+10) RegexpMatchEasy1_32-12 71.4ns ± 1% 70.6ns ± 0% -1.15% (p=0.000 n=10+9) RegexpMatchEasy1_1K-12 383ns ± 2% 390ns ±10% ~ (p=0.181 n=8+9) RegexpMatchMedium_32-12 114ns ± 0% 113ns ± 1% -0.88% (p=0.000 n=9+8) RegexpMatchMedium_1K-12 36.3µs ± 1% 36.8µs ± 1% +1.59% (p=0.000 n=10+8) RegexpMatchHard_32-12 1.90µs ± 1% 1.90µs ± 1% ~ (p=0.341 n=10+10) RegexpMatchHard_1K-12 59.4µs ±11% 57.8µs ± 1% ~ (p=0.968 n=10+9) Revcomp-12 461ms ± 1% 462ms ± 1% ~ (p=1.000 n=9+9) Template-12 67.5ms ± 1% 66.3ms ± 1% -1.77% (p=0.000 n=10+8) TimeParse-12 314ns ± 3% 309ns ± 0% -1.56% (p=0.000 n=9+8) TimeFormat-12 340ns ± 2% 331ns ± 1% -2.79% (p=0.000 n=10+10) The go binary is 0.2% larger. Not really sure why the size would change. Change-Id: Ia5116e53a3aeb025ef350ffc51c14ae5cc17871c Reviewed-on: https://go-review.googlesource.com/34822 Reviewed-by: David Chase <drchase@google.com>	2017-03-15 02:09:25 +00:00
Cherry Zhang	8a44c8efae	cmd/compile: don't spill rematerializeable value when resolving merge edges Fixes #19515. Change-Id: I4bcce152cef52d00fbb5ab4daf72a6e742bae27c Reviewed-on: https://go-review.googlesource.com/38158 Reviewed-by: Keith Randall <khr@golang.org>	2017-03-14 22:55:52 +00:00
Josh Bleecher Snyder	57546d67ec	cmd/compile: add reusable []Location to ssa.Config name old time/op new time/op delta Template 218ms ± 3% 214ms ± 3% -1.70% (p=0.000 n=30+30) Unicode 100ms ± 3% 100ms ± 4% ~ (p=0.614 n=29+30) GoTypes 657ms ± 1% 660ms ± 3% +0.46% (p=0.046 n=29+30) Compiler 2.80s ± 2% 2.80s ± 1% ~ (p=0.451 n=28+29) Flate 131ms ± 2% 132ms ± 4% ~ (p=1.000 n=29+29) GoParser 159ms ± 3% 160ms ± 5% ~ (p=0.341 n=28+30) Reflect 406ms ± 3% 408ms ± 4% ~ (p=0.511 n=28+30) Tar 118ms ± 4% 118ms ± 4% ~ (p=0.827 n=29+30) XML 222ms ± 6% 222ms ± 3% ~ (p=0.532 n=30+30) name old user-ns/op new user-ns/op delta Template 274user-ms ± 3% 272user-ms ± 3% -0.87% (p=0.015 n=29+30) Unicode 140user-ms ± 4% 140user-ms ± 3% ~ (p=0.735 n=29+30) GoTypes 890user-ms ± 1% 897user-ms ± 2% +0.88% (p=0.002 n=29+30) Compiler 3.88user-s ± 2% 3.89user-s ± 1% ~ (p=0.132 n=30+29) Flate 168user-ms ± 2% 157user-ms ± 4% -6.21% (p=0.000 n=25+28) GoParser 211user-ms ± 2% 213user-ms ± 5% ~ (p=0.086 n=28+30) Reflect 539user-ms ± 2% 541user-ms ± 3% ~ (p=0.267 n=27+29) Tar 156user-ms ± 7% 155user-ms ± 5% ~ (p=0.708 n=30+30) XML 291user-ms ± 5% 294user-ms ± 3% +0.83% (p=0.029 n=29+30) name old alloc/op new alloc/op delta Template 40.7MB ± 0% 39.4MB ± 0% -3.26% (p=0.000 n=29+26) Unicode 30.8MB ± 0% 30.7MB ± 0% -0.40% (p=0.000 n=28+30) GoTypes 123MB ± 0% 119MB ± 0% -3.47% (p=0.000 n=30+29) Compiler 472MB ± 0% 455MB ± 0% -3.60% (p=0.000 n=30+30) Flate 26.5MB ± 0% 25.6MB ± 0% -3.21% (p=0.000 n=28+30) GoParser 32.3MB ± 0% 31.4MB ± 0% -2.98% (p=0.000 n=29+30) Reflect 84.4MB ± 0% 82.1MB ± 0% -2.83% (p=0.000 n=30+30) Tar 27.3MB ± 0% 26.5MB ± 0% -2.70% (p=0.000 n=29+29) XML 44.6MB ± 0% 43.1MB ± 0% -3.49% (p=0.000 n=30+30) name old allocs/op new allocs/op delta Template 401k ± 1% 399k ± 0% -0.35% (p=0.000 n=30+28) Unicode 331k ± 0% 331k ± 1% ~ (p=0.907 n=28+30) GoTypes 1.24M ± 0% 1.23M ± 0% -0.43% (p=0.000 n=30+30) Compiler 4.26M ± 0% 4.25M ± 0% -0.34% (p=0.000 n=29+30) Flate 252k ± 1% 251k ± 1% -0.41% (p=0.000 n=30+30) GoParser 325k ± 1% 324k ± 1% -0.31% (p=0.000 n=27+30) Reflect 1.06M ± 0% 1.05M ± 0% -0.69% (p=0.000 n=30+30) Tar 266k ± 1% 265k ± 1% -0.51% (p=0.000 n=29+30) XML 416k ± 1% 415k ± 1% -0.36% (p=0.002 n=30+30) Change-Id: I8f784001324df83b2764c44f0e83a540e5beab34 Reviewed-on: https://go-review.googlesource.com/36212 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-02 22:39:32 +00:00
Russ Cox	47ce87877b	all: merge dev.inline into master Change-Id: I7715581a04e513dcda9918e853fa6b1ddc703770	2017-02-01 09:47:23 -05:00
Robert Griesemer	472c792e0a	[dev.inline] cmd/internal/src: introduce compact source position representation XPos is a compact (8 instead of 16 bytes on a 64bit machine) source position representation. There is a 1:1 correspondence between each XPos and each regular Pos, translated via a global table. In some sense this brings back the LineHist, though positions can track line and column information; there is a O(1) translation between the representations (no binary search), and the translation is factored out. The size increase with the prior change is brought down again and the compiler speed is in line with the master repo (measured on the same "quiet" machine as for prior change): name old time/op new time/op delta Template 256ms ± 1% 262ms ± 2% ~ (p=0.063 n=5+4) Unicode 132ms ± 1% 135ms ± 2% ~ (p=0.063 n=5+4) GoTypes 891ms ± 1% 871ms ± 1% -2.28% (p=0.016 n=5+4) Compiler 3.84s ± 2% 3.89s ± 2% ~ (p=0.413 n=5+4) MakeBash 47.1s ± 1% 46.2s ± 2% ~ (p=0.095 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 314M ± 2% ~ (p=0.111 n=5+4) Unicode 165M ± 1% 172M ± 9% ~ (p=0.151 n=5+5) GoTypes 1.14G ± 2% 1.12G ± 1% ~ (p=0.063 n=5+4) Compiler 5.00G ± 1% 4.96G ± 1% ~ (p=0.286 n=5+4) Change-Id: Icc570cc60ab014d8d9af6976f1f961ab8828cc47 Reviewed-on: https://go-review.googlesource.com/34506 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-01-09 22:43:22 +00:00
shawnps	067bab00a8	all: fix misspellings Change-Id: I429637ca91f7db4144f17621de851a548dc1ce76 Reviewed-on: https://go-review.googlesource.com/34923 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-01-07 16:53:25 +00:00
Robert Griesemer	eaca0e0529	[dev.inline] cmd/internal/src: introduce NoPos and use it instead Pos{} Using a variable instead of a composite literal makes the code independent of implementation changes of Pos. Per David Lazar's suggestion. Change-Id: I336967ac12a027c51a728a58ac6207cb5119af4a Reviewed-on: https://go-review.googlesource.com/34148 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2016-12-09 00:35:07 +00:00
Robert Griesemer	c10499b539	[dev.inline] cmd/compile/internal/ssa: another round of renames from line -> pos (cleanup) Mostly mechanical renames. Make variable names consistent with use. Change-Id: Iaa89d31deab11eca6e784595b58e779ad525c8a3 Reviewed-on: https://go-review.googlesource.com/34146 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-12-08 23:10:30 +00:00
Robert Griesemer	cfd17f51c8	[dev.inline] cmd/compile/internal/ssa: rename various fields from Line to Pos This is a mostly mechanical rename followed by manual fixes where necessary. Change-Id: Ie5c670b133db978f15dc03e50dc2da0c80fc8842 Reviewed-on: https://go-review.googlesource.com/34137 Reviewed-by: David Lazar <lazard@golang.org>	2016-12-08 21:36:52 +00:00
Robert Griesemer	82d0caea2c	[dev.inline] cmd/internal/src: make Pos implementation abstract Adjust cmd/compile accordingly. This will make it easier to replace the underlying implementation. Change-Id: I33645850bb18c839b24785b6222a9e028617addb Reviewed-on: https://go-review.googlesource.com/34133 Reviewed-by: David Lazar <lazard@golang.org>	2016-12-08 21:31:28 +00:00
Robert Griesemer	24597c080b	[dev.inline] cmd/compile: introduce cmd/internal/src.Pos type for line numbers This is a step toward chosing a different position representation. By introducing an explicit type, it will be easier to make the transition step-wise while ensuring everything keeps running. This has been reviewed via https://go-review.googlesource.com/#/c/34025/. Change-Id: Ibceddcd62d8f346321ac3250e3940e9c436ed684 Reviewed-on: https://go-review.googlesource.com/34132 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Lazar <lazard@golang.org>	2016-12-08 21:26:25 +00:00
Cherry Zhang	348275cda6	cmd/compile: make a copy of Phi input if it is still live Register of Phi input is allocated to the Phi. So if the Phi input is still live after Phi, we may need to use a spill. In this case, copy the Phi input to a spare register to avoid a spill. Originally targeted the code in issue #16187, and this CL indeed removes the spill, but doesn't seem to help on benchmark result. It may help in general, though. On AMD64: name old time/op new time/op delta BinaryTree17-12 2.79s ± 1% 2.76s ± 0% -1.33% (p=0.000 n=10+10) Fannkuch11-12 3.02s ± 0% 3.14s ± 0% +3.99% (p=0.000 n=10+10) FmtFprintfEmpty-12 51.2ns ± 0% 51.4ns ± 3% ~ (p=0.368 n=8+10) FmtFprintfString-12 145ns ± 0% 144ns ± 0% -0.69% (p=0.000 n=6+9) FmtFprintfInt-12 127ns ± 1% 124ns ± 1% -2.79% (p=0.000 n=10+9) FmtFprintfIntInt-12 186ns ± 0% 184ns ± 0% -1.34% (p=0.000 n=10+9) FmtFprintfPrefixedInt-12 196ns ± 0% 194ns ± 0% -0.97% (p=0.000 n=9+9) FmtFprintfFloat-12 293ns ± 2% 287ns ± 0% -2.00% (p=0.000 n=10+9) FmtManyArgs-12 847ns ± 1% 829ns ± 0% -2.17% (p=0.000 n=10+7) GobDecode-12 7.17ms ± 0% 7.18ms ± 0% ~ (p=0.123 n=10+10) GobEncode-12 6.08ms ± 1% 6.08ms ± 0% ~ (p=0.497 n=10+9) Gzip-12 277ms ± 1% 275ms ± 1% -0.47% (p=0.028 n=10+9) Gunzip-12 39.1ms ± 2% 38.2ms ± 1% -2.20% (p=0.000 n=10+9) HTTPClientServer-12 90.9µs ± 4% 87.7µs ± 2% -3.51% (p=0.001 n=9+10) JSONEncode-12 17.3ms ± 1% 16.5ms ± 0% -5.02% (p=0.000 n=9+9) JSONDecode-12 54.6ms ± 1% 54.1ms ± 0% -0.99% (p=0.000 n=9+9) Mandelbrot200-12 4.45ms ± 0% 4.45ms ± 0% -0.02% (p=0.006 n=8+9) GoParse-12 3.44ms ± 0% 3.48ms ± 1% +0.95% (p=0.000 n=10+10) RegexpMatchEasy0_32-12 84.9ns ± 0% 85.0ns ± 0% ~ (p=0.241 n=8+8) RegexpMatchEasy0_1K-12 867ns ± 3% 915ns ±11% +5.55% (p=0.037 n=10+10) RegexpMatchEasy1_32-12 82.7ns ± 5% 83.9ns ± 4% ~ (p=0.161 n=9+10) RegexpMatchEasy1_1K-12 361ns ± 1% 363ns ± 0% ~ (p=0.098 n=10+8) RegexpMatchMedium_32-12 126ns ± 0% 126ns ± 1% ~ (p=0.549 n=8+10) RegexpMatchMedium_1K-12 38.8µs ± 0% 39.1µs ± 0% +0.67% (p=0.000 n=9+8) RegexpMatchHard_32-12 1.95µs ± 0% 1.96µs ± 0% +0.43% (p=0.000 n=9+9) RegexpMatchHard_1K-12 59.0µs ± 0% 59.1µs ± 0% +0.27% (p=0.000 n=10+9) Revcomp-12 436ms ± 1% 431ms ± 1% -1.19% (p=0.005 n=10+10) Template-12 56.7ms ± 1% 57.1ms ± 1% +0.71% (p=0.001 n=10+9) TimeParse-12 312ns ± 0% 310ns ± 0% -0.80% (p=0.000 n=10+9) TimeFormat-12 336ns ± 0% 332ns ± 0% -1.19% (p=0.000 n=8+7) [Geo mean] 59.2µs 58.9µs -0.42% On PPC64: name old time/op new time/op delta BinaryTree17-2 4.67s ± 2% 4.71s ± 1% ~ (p=0.421 n=5+5) Fannkuch11-2 3.92s ± 1% 3.94s ± 0% +0.46% (p=0.032 n=5+5) FmtFprintfEmpty-2 122ns ± 0% 120ns ± 2% -1.80% (p=0.016 n=4+5) FmtFprintfString-2 305ns ± 1% 299ns ± 1% -1.84% (p=0.008 n=5+5) FmtFprintfInt-2 243ns ± 0% 241ns ± 1% -0.66% (p=0.016 n=4+5) FmtFprintfIntInt-2 361ns ± 1% 356ns ± 1% -1.49% (p=0.016 n=5+5) FmtFprintfPrefixedInt-2 355ns ± 1% 357ns ± 1% ~ (p=0.333 n=5+5) FmtFprintfFloat-2 502ns ± 2% 498ns ± 1% ~ (p=0.151 n=5+5) FmtManyArgs-2 1.55µs ± 2% 1.59µs ± 1% +2.52% (p=0.008 n=5+5) GobDecode-2 13.0ms ± 1% 13.0ms ± 1% ~ (p=0.841 n=5+5) GobEncode-2 11.8ms ± 1% 11.8ms ± 1% ~ (p=0.690 n=5+5) Gzip-2 499ms ± 1% 503ms ± 0% ~ (p=0.421 n=5+5) Gunzip-2 86.5ms ± 0% 86.4ms ± 1% ~ (p=0.841 n=5+5) HTTPClientServer-2 68.2µs ± 2% 69.6µs ± 3% ~ (p=0.151 n=5+5) JSONEncode-2 39.0ms ± 1% 37.2ms ± 1% -4.65% (p=0.008 n=5+5) JSONDecode-2 122ms ± 1% 126ms ± 1% +2.63% (p=0.008 n=5+5) Mandelbrot200-2 6.08ms ± 1% 5.89ms ± 1% -3.06% (p=0.008 n=5+5) GoParse-2 5.95ms ± 2% 5.98ms ± 1% ~ (p=0.421 n=5+5) RegexpMatchEasy0_32-2 331ns ± 1% 328ns ± 1% ~ (p=0.056 n=5+5) RegexpMatchEasy0_1K-2 1.45µs ± 0% 1.47µs ± 0% +1.13% (p=0.008 n=5+5) RegexpMatchEasy1_32-2 359ns ± 0% 353ns ± 0% -1.84% (p=0.008 n=5+5) RegexpMatchEasy1_1K-2 1.79µs ± 0% 1.81µs ± 1% +1.16% (p=0.008 n=5+5) RegexpMatchMedium_32-2 420ns ± 2% 413ns ± 0% -1.72% (p=0.008 n=5+5) RegexpMatchMedium_1K-2 70.2µs ± 1% 69.5µs ± 1% -1.09% (p=0.032 n=5+5) RegexpMatchHard_32-2 3.87µs ± 1% 3.65µs ± 0% -5.86% (p=0.008 n=5+5) RegexpMatchHard_1K-2 111µs ± 0% 105µs ± 0% -5.49% (p=0.016 n=5+4) Revcomp-2 1.00s ± 1% 1.01s ± 2% ~ (p=0.151 n=5+5) Template-2 113ms ± 1% 113ms ± 2% ~ (p=0.841 n=5+5) TimeParse-2 555ns ± 0% 550ns ± 1% -0.87% (p=0.032 n=5+5) TimeFormat-2 736ns ± 1% 704ns ± 1% -4.35% (p=0.008 n=5+5) [Geo mean] 120µs 119µs -0.77% Reduce "spilled value remains" by 0.6% in cmd/go on AMD64. Change-Id: If655df343b0f30d1a49ab1ab644f10c698b96f3e Reviewed-on: https://go-review.googlesource.com/32442 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-11-18 13:56:23 +00:00
Cherry Zhang	f9238a76ff	cmd/compile: make LR allocatable in non-leaf functions on ARM The mechanism is initially introduced (and reviewed) in CL 30597 on S390X. Reduce number of "spilled value remains" by 0.4% in cmd/go. Disabled on ARMv5 because LR is clobbered almost everywhere with inserted softfloat calls. Change-Id: I2934737ce2455909647ed2118fe2bd6f0aa5ac52 Reviewed-on: https://go-review.googlesource.com/32178 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-10-28 14:25:33 +00:00
Michael Munday	15817e409b	cmd/compile: make link register allocatable in non-leaf functions We save and restore the link register in non-leaf functions because it is clobbered by CALLs. It is therefore available for general purpose use. Only enabled on s390x currently. The RC4 benchmarks in particular benefit from the extra register: name old speed new speed delta RC4_128 243MB/s ± 2% 341MB/s ± 2% +40.46% (p=0.008 n=5+5) RC4_1K 267MB/s ± 0% 359MB/s ± 1% +34.32% (p=0.008 n=5+5) RC4_8K 271MB/s ± 0% 362MB/s ± 0% +33.61% (p=0.008 n=5+5) Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f Reviewed-on: https://go-review.googlesource.com/30597 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-10-11 18:52:35 +00:00
Keith Randall	93d5f43a29	cmd/compile: do regalloc check only when checkEnabled No point doing this check all the time. Fixes #15621 Change-Id: I1966c061986fe98fe9ebe146d6b9738c13cef724 Reviewed-on: https://go-review.googlesource.com/30670 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-10-07 17:33:15 +00:00
Keith Randall	1bddd2ee6a	cmd/compile: don't shuffle rematerializeable values around Better to just rematerialize them when needed instead of cross-register spilling or other techniques for keeping them in registers. This helps for amd64 code that does 1 << x. It is better to do loop: MOVQ $1, AX // materialize arg to SLLQ SLLQ CX, AX ... goto loop than to do MOVQ $1, AX // materialize outsize of loop loop: MOVQ AX, DX // save value that's about to be clobbered SLLQ CX, AX MOVQ DX, AX // move it back to the correct register goto loop Update #16092 Change-Id: If7ac290208f513061ebb0736e8a79dcb0ba338c0 Reviewed-on: https://go-review.googlesource.com/30471 TryBot-Result: Gobot Gobot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-10-06 02:46:43 +00:00
Matthew Dempsky	6dc356a76a	cmd/compile/internal/ssa: erase register copies deterministically Fixes #17288. Change-Id: I2ddd01d14667d5c6a2e19bd70489da8d9869d308 Reviewed-on: https://go-review.googlesource.com/30072 Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-09-29 22:22:36 +00:00
Cherry Zhang	f876fb9bae	cmd/compile: move value around before kick it out of register When allocating registers, before kicking out the existing value, copy it to a spare register if there is one. So later use of this value can be found in register instead of reload from spill. This is very helpful for instructions of which the input and/or output can only be in specific registers, e.g. DIV on x86, MUL/DIV on MIPS. May also be helpful in general. For "go build -a cmd/go" on AMD64, reduce "spilled value remains" by 1% (not including args, which almost certainly remain). For the code in issue #16061 on AMD64: MaxRem-12 111µs ± 1% 94µs ± 0% -15.38% (p=0.008 n=5+5) Go1 benchmark on AMD64: BinaryTree17-12 2.32s ± 2% 2.30s ± 1% ~ (p=0.421 n=5+5) Fannkuch11-12 2.52s ± 0% 2.44s ± 0% -3.44% (p=0.008 n=5+5) FmtFprintfEmpty-12 39.9ns ± 3% 39.8ns ± 0% ~ (p=0.635 n=5+4) FmtFprintfString-12 114ns ± 1% 113ns ± 1% ~ (p=0.905 n=5+5) FmtFprintfInt-12 102ns ± 6% 98ns ± 1% ~ (p=0.087 n=5+5) FmtFprintfIntInt-12 146ns ± 5% 147ns ± 1% ~ (p=0.238 n=5+5) FmtFprintfPrefixedInt-12 155ns ± 2% 151ns ± 1% -2.58% (p=0.008 n=5+5) FmtFprintfFloat-12 231ns ± 1% 232ns ± 1% ~ (p=0.286 n=5+5) FmtManyArgs-12 657ns ± 1% 649ns ± 0% -1.31% (p=0.008 n=5+5) GobDecode-12 6.35ms ± 0% 6.29ms ± 1% ~ (p=0.056 n=5+5) GobEncode-12 5.38ms ± 1% 5.45ms ± 1% ~ (p=0.056 n=5+5) Gzip-12 209ms ± 0% 209ms ± 1% ~ (p=0.690 n=5+5) Gunzip-12 31.2ms ± 1% 31.1ms ± 1% ~ (p=0.548 n=5+5) HTTPClientServer-12 123µs ± 4% 130µs ± 8% ~ (p=0.151 n=5+5) JSONEncode-12 14.0ms ± 1% 14.0ms ± 1% ~ (p=0.421 n=5+5) JSONDecode-12 41.2ms ± 1% 41.1ms ± 2% ~ (p=0.421 n=5+5) Mandelbrot200-12 3.96ms ± 1% 3.98ms ± 0% ~ (p=0.421 n=5+5) GoParse-12 2.88ms ± 1% 2.88ms ± 1% ~ (p=0.841 n=5+5) RegexpMatchEasy0_32-12 68.0ns ± 3% 66.6ns ± 1% -2.00% (p=0.024 n=5+5) RegexpMatchEasy0_1K-12 728ns ± 8% 682ns ± 1% -6.26% (p=0.008 n=5+5) RegexpMatchEasy1_32-12 66.8ns ± 2% 66.0ns ± 1% ~ (p=0.302 n=5+5) RegexpMatchEasy1_1K-12 291ns ± 2% 288ns ± 1% ~ (p=0.111 n=5+5) RegexpMatchMedium_32-12 103ns ± 2% 100ns ± 0% -2.53% (p=0.016 n=5+4) RegexpMatchMedium_1K-12 31.9µs ± 1% 31.3µs ± 0% -1.75% (p=0.008 n=5+5) RegexpMatchHard_32-12 1.59µs ± 2% 1.59µs ± 1% ~ (p=0.548 n=5+5) RegexpMatchHard_1K-12 48.3µs ± 2% 47.7µs ± 1% ~ (p=0.222 n=5+5) Revcomp-12 340ms ± 1% 338ms ± 1% ~ (p=0.421 n=5+5) Template-12 46.3ms ± 1% 46.5ms ± 1% ~ (p=0.690 n=5+5) TimeParse-12 252ns ± 1% 247ns ± 0% -1.91% (p=0.000 n=5+4) TimeFormat-12 277ns ± 1% 267ns ± 0% -3.82% (p=0.008 n=5+5) [Geo mean] 48.8µs 48.3µs -0.93% It has very little effect on binary size and compiler speed. compilebench: Template 230ms ±10% 231ms ± 8% ~ (p=0.546 n=9+9) Unicode 123ms ± 6% 124ms ± 9% ~ (p=0.481 n=10+10) GoTypes 742ms ± 6% 755ms ± 3% ~ (p=0.123 n=10+10) Compiler 3.10s ± 3% 3.08s ± 1% ~ (p=0.631 n=10+10) Fixes #16061. Change-Id: Id99cdc7a182ee10a704fa0f04e8e0d0809b2ac56 Reviewed-on: https://go-review.googlesource.com/29732 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-09-27 15:44:58 +00:00
David Chase	cddddbc623	cmd/compile: use ISEL, cleanup use of zero & extensions Abandoned earlier efforts to expose zero register, but left it in numbering to decrease squirrelyness of register allocator. ISELrelOp used in code generation of bool := x relOp y. Some patterns added to better elide zero case and some sign extension. Updates: #17109 Change-Id: Ida7839f0023ca8f0ffddc0545f0ac269e65b05d9 Reviewed-on: https://go-review.googlesource.com/29380 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-09-22 17:36:39 +00:00
Keith Randall	75ce89c20d	cmd/compile: cache CFG-dependent computations We compute a lot of stuff based off the CFG: postorder traversal, dominators, dominator tree, loop nest. Multiple phases use this information and we end up recomputing some of it. Add a cache for this information so if the CFG hasn't changed, we can reuse the previous computation. Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2 Reviewed-on: https://go-review.googlesource.com/29356 Reviewed-by: David Chase <drchase@google.com>	2016-09-19 16:00:13 +00:00
Keith Randall	833ed7c431	cmd/compile: reorganize SSA register numbering Teach SSA about the cmd/internal/obj/$ARCH register numbering. It can then return that numbering when requested. Each architecture now does not need to know anything about the internal SSA numbering of registers. Change-Id: I34472a2736227c15482e60994eebcdd2723fa52d Reviewed-on: https://go-review.googlesource.com/29249 Reviewed-by: David Chase <drchase@google.com>	2016-09-16 19:01:55 +00:00

1 2 3

145 Commits