mirror of
https://github.com/golang/go.git
synced 2025-05-18 13:54:40 +00:00
802 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
e8f5a33191 |
cmd/compile: fix incorrect rewriting to if condition
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag, in the following categories: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI || EQ 4. GTnoov greater than NEQ & PL The backend generates two consecutive branch instructions for 'LEnoov' and 'GTnoov' to model their expected behavior. A slight change to 'gc' and amd64/386 backends is made to unify the code generation. Add a test 'TestCondRewrite' as justification, it covers 32 incorrect rules identified on arm64, more might be needed on other arches, like 32-bit arm. Add two benchmarks profiling the aforementioned category 1&2 and category 3&4 separetely, we expect the first two categories will show performance improvement and the second will not result in visible regression compared with the non-optimized version. This change also updates TestFormats to support using %#x. Examples exhibiting where does the issue come from: 1: 'if x + 3 < 0' might be converted to: before: CMN $3, R0 BGE <else branch> // wrong branch is taken if 'x+3' overflows after: CMN $3, R0 BPL <else branch> 2: 'if y - 3 > 0' might be converted to: before: CMP $3, R0 BLE <else branch> // wrong branch is taken if 'y-3' underflows after: CMP $3, R0 BMI <else branch> BEQ <else branch> Benchmark data from different kinds of arm64 servers, 'old' is the non-optimized version (not the parent commit), generally the optimization version outperforms. S1: name old time/op new time/op delta CondRewrite/SoloJump 13.6ns ± 0% 12.9ns ± 0% -5.15% (p=0.000 n=10+10) CondRewrite/CombJump 13.8ns ± 1% 12.9ns ± 0% -6.32% (p=0.000 n=10+10) S2: name old time/op new time/op delta CondRewrite/SoloJump 11.6ns ± 0% 10.9ns ± 0% -6.03% (p=0.000 n=10+10) CondRewrite/CombJump 11.4ns ± 0% 10.8ns ± 1% -5.53% (p=0.000 n=10+10) S3: name old time/op new time/op delta CondRewrite/SoloJump 7.36ns ± 0% 7.50ns ± 0% +1.79% (p=0.000 n=9+10) CondRewrite/CombJump 7.35ns ± 0% 7.75ns ± 0% +5.51% (p=0.000 n=8+9) S4: name old time/op new time/op delta CondRewrite/SoloJump-224 11.5ns ± 1% 10.9ns ± 0% -4.97% (p=0.000 n=10+10) CondRewrite/CombJump-224 11.9ns ± 0% 11.5ns ± 0% -2.95% (p=0.000 n=10+10) S5: name old time/op new time/op delta CondRewrite/SoloJump 10.0ns ± 0% 10.0ns ± 0% -0.45% (p=0.000 n=9+10) CondRewrite/CombJump 9.93ns ± 0% 9.77ns ± 0% -1.53% (p=0.000 n=10+9) Go1 perf. data: name old time/op new time/op delta BinaryTree17 6.29s ± 1% 6.30s ± 1% ~ (p=1.000 n=5+5) Fannkuch11 5.40s ± 0% 5.40s ± 0% ~ (p=0.841 n=5+5) FmtFprintfEmpty 97.9ns ± 0% 98.9ns ± 3% ~ (p=0.937 n=4+5) FmtFprintfString 171ns ± 3% 171ns ± 2% ~ (p=0.754 n=5+5) FmtFprintfInt 212ns ± 0% 217ns ± 6% +2.55% (p=0.008 n=5+5) FmtFprintfIntInt 296ns ± 1% 297ns ± 2% ~ (p=0.516 n=5+5) FmtFprintfPrefixedInt 371ns ± 2% 374ns ± 7% ~ (p=1.000 n=5+5) FmtFprintfFloat 435ns ± 1% 439ns ± 2% ~ (p=0.056 n=5+5) FmtManyArgs 1.37µs ± 1% 1.36µs ± 1% ~ (p=0.730 n=5+5) GobDecode 14.6ms ± 4% 14.4ms ± 4% ~ (p=0.690 n=5+5) GobEncode 11.8ms ±20% 11.6ms ±15% ~ (p=1.000 n=5+5) Gzip 507ms ± 0% 491ms ± 0% -3.22% (p=0.008 n=5+5) Gunzip 73.8ms ± 0% 73.9ms ± 0% ~ (p=0.690 n=5+5) HTTPClientServer 116µs ± 0% 116µs ± 0% ~ (p=0.686 n=4+4) JSONEncode 21.8ms ± 1% 21.6ms ± 2% ~ (p=0.151 n=5+5) JSONDecode 104ms ± 1% 103ms ± 1% -1.08% (p=0.016 n=5+5) Mandelbrot200 9.53ms ± 0% 9.53ms ± 0% ~ (p=0.421 n=5+5) GoParse 7.55ms ± 1% 7.51ms ± 1% ~ (p=0.151 n=5+5) RegexpMatchEasy0_32 158ns ± 0% 158ns ± 0% ~ (all equal) RegexpMatchEasy0_1K 606ns ± 1% 608ns ± 3% ~ (p=0.937 n=5+5) RegexpMatchEasy1_32 143ns ± 0% 144ns ± 1% ~ (p=0.095 n=5+4) RegexpMatchEasy1_1K 927ns ± 2% 944ns ± 2% ~ (p=0.056 n=5+5) RegexpMatchMedium_32 16.0ns ± 0% 16.0ns ± 0% ~ (all equal) RegexpMatchMedium_1K 69.3µs ± 2% 69.7µs ± 0% ~ (p=0.690 n=5+5) RegexpMatchHard_32 3.73µs ± 0% 3.73µs ± 1% ~ (p=0.984 n=5+5) RegexpMatchHard_1K 111µs ± 1% 110µs ± 0% ~ (p=0.151 n=5+5) Revcomp 1.91s ±47% 1.77s ±68% ~ (p=1.000 n=5+5) Template 138ms ± 1% 138ms ± 1% ~ (p=1.000 n=5+5) TimeParse 787ns ± 2% 785ns ± 1% ~ (p=0.540 n=5+5) TimeFormat 729ns ± 1% 726ns ± 1% ~ (p=0.151 n=5+5) Updates #38740 Change-Id: I06c604874acdc1e63e66452dadee5df053045222 Reviewed-on: https://go-review.googlesource.com/c/go/+/233097 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> |
||
|
601bc41da2 |
cmd/compile: don't emit stack maps for write barrier calls
These are necessarily deeply non-preemptible, so there's no point in emitting stack maps for them. We already mark them as unsafe points, so this only affects the runtime, since user code does not emit stack maps at unsafe points. SSAGenState.PrepareCall also excludes them when it's sanity checking call stack maps. Right now this only drops a handful of unnecessary stack maps from the runtime, but we're about to start emitting stack maps only at calls for user code, too. At that point, this will matter much more. For #36365. Change-Id: Ib3abfedfddc8e724d933a064fa4d573500627990 Reviewed-on: https://go-review.googlesource.com/c/go/+/230542 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
faafdf5115 |
cmd/compile: fix unsafe-points with stack maps
The compiler currently conflates whether a Value has a stack map with whether it's an unsafe point. For the most part, unsafe-points don't have stack maps, so this is mostly fine, but call instructions can be both an unsafe-point *and* have a stack map. For example, none of the instructions in a nosplit function should be preemptible, but calls must still have stack maps in case the called function grows the stack or get preempted. Currently, the compiler can't distinguish this case, so calls in nosplit functions are marked as safe-points just because they have stack maps. This is particularly problematic if a nosplit function calls another nosplit function, since this can introduce a preemption point where there should be none. We realized this was a problem for split-stack prologues a while back, and CL 207349 changed the encoding of unsafe-points to use the register map index instead of the stack map index so we could record both a stack map and an unsafe-point at the same instruction. But this was never extended into the compiler. This CL fixes this problem in the compiler. We make LivenessIndex slightly more abstract by separating unsafe-point marks from stack and register map indexes. We map this to the PCDATA encoding later when producing Progs. This isn't enough to fix the whole problem for nosplit functions, because obj still adds prologues and marks those as preemptible, but it's a step in the right direction. I checked this CL by comparing maps before and after this change in the runtime and net/http. In net/http, unsafe-points match exactly; at anything that isn't an unsafe-point, both the stack and register maps are unchanged by this CL. In the runtime, at every point that was a safe-point before this change, the stack maps agree (and mostly the runtime doesn't have register maps at all now). In both, all CALLs (except write barrier calls) have stack maps. For #36365. Change-Id: I066628938b02e78be5c81a6614295bcf7cc566c2 Reviewed-on: https://go-review.googlesource.com/c/go/+/230541 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
f8ff12d480 |
cmd/compile: use dereference boundedness hint in ssa.addr
Follow-up to (and similar to) CL 228885. Triggers a handful of times in std+cmd. Change-Id: Ie04057ca3974ef9eef669335e326a5ed4b7472cc Reviewed-on: https://go-review.googlesource.com/c/go/+/228999 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
4e550bdacd |
cmd/compile: simplify state.addr
OADDR nodes can't be bounded. All calls to state.addr thus pass false. Remove the argument. Passes toolstash-check. Change-Id: I9a3fcf37f63b2b5094e043d39ab3b857b5090e91 Reviewed-on: https://go-review.googlesource.com/c/go/+/228788 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> |
||
|
e8518731be |
cmd/compile: use dereference boundedness hint during ssa conversion
This has a minor positive effect on generated code, particularly code using type switches. Change-Id: I7269769ab0d861ef6fc9e6d7809ffc3573c68340 Reviewed-on: https://go-review.googlesource.com/c/go/+/228885 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> |
||
|
843453d09e |
cmd/compile: fix misassumption about n.Left.Bounded()
n.Bounded() is overloaded for multiple meanings based on n.Op. We can't safely use n.Left.Bounded() without checking n.Left.Op. Change-Id: I71fe4faa24798dfe3a5705fa3419a35ef93b0ce2 Reviewed-on: https://go-review.googlesource.com/c/go/+/228677 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
||
|
2db4cc38a0 |
cmd/compile: improve generated code for concrete cases in type switches
Consider switch x:= x.(type) { case int: // int stmts case error: // error stmts } Prior to this change, we lowered this roughly as: if x, ok := x.(int); ok { // int stmts } else if x, ok := x.(error); ok { // error stmts } x, ok := x.(error) is implemented with a call to runtime.assertE2I2 or runtime.assertI2I2. x, ok := x.(int) generates inline code that checks whether x has type int, and populates x and ok as appropriate. We then immediately branch again on ok. The shortcircuit pass in the SSA backend is designed to recognize situations like this, in which we are immediately branching on a bool value that we just calculated with a branch. However, the shortcircuit pass has limitations when the intermediate state has phis. In this case, the phi value is x (the int). CL 222923 improved the situation, but many cases are still unhandled. I have further improvements in progress, which is how I found this particular problem, but they are expensive, and may or may not see the light of day. In the common case of a lone concrete type in a type switch case, it is easier and cheaper to simply lower a different way, roughly: if _, ok := x.(int); ok { x := x.(int) // int stmts } Instead of using a type assertion, though, we extract the value of x from the interface directly. This removes the need to track x (the int) across the branch on ok, which removes the phi, which lets the shortcircuit pass do its job. Benchmarks for encoding/binary show improvements, as well as some wild swings on the super fast benchmarks (alignment effects?): name old time/op new time/op delta ReadSlice1000Int32s-8 5.25µs ± 2% 4.87µs ± 3% -7.11% (p=0.000 n=44+49) ReadStruct-8 451ns ± 2% 417ns ± 2% -7.39% (p=0.000 n=45+46) WriteStruct-8 412ns ± 2% 405ns ± 3% -1.58% (p=0.000 n=46+48) ReadInts-8 296ns ± 8% 275ns ± 3% -7.23% (p=0.000 n=48+50) WriteInts-8 324ns ± 1% 318ns ± 2% -1.67% (p=0.000 n=44+49) WriteSlice1000Int32s-8 5.21µs ± 2% 4.92µs ± 1% -5.67% (p=0.000 n=46+44) PutUint16-8 0.58ns ± 2% 0.59ns ± 2% +0.63% (p=0.000 n=49+49) PutUint32-8 0.87ns ± 1% 0.58ns ± 1% -33.10% (p=0.000 n=46+44) PutUint64-8 0.66ns ± 2% 0.87ns ± 2% +33.07% (p=0.000 n=47+48) LittleEndianPutUint16-8 0.86ns ± 2% 0.87ns ± 2% +0.55% (p=0.003 n=47+50) LittleEndianPutUint32-8 0.87ns ± 1% 0.87ns ± 1% ~ (p=0.547 n=45+47) LittleEndianPutUint64-8 0.87ns ± 2% 0.87ns ± 1% ~ (p=0.451 n=46+47) ReadFloats-8 79.8ns ± 5% 75.9ns ± 2% -4.83% (p=0.000 n=50+47) WriteFloats-8 89.3ns ± 1% 88.9ns ± 1% -0.48% (p=0.000 n=46+44) ReadSlice1000Float32s-8 5.51µs ± 1% 4.87µs ± 2% -11.74% (p=0.000 n=47+46) WriteSlice1000Float32s-8 5.51µs ± 1% 4.93µs ± 1% -10.60% (p=0.000 n=48+47) PutUvarint32-8 25.9ns ± 2% 24.0ns ± 2% -7.02% (p=0.000 n=48+50) PutUvarint64-8 75.1ns ± 1% 61.5ns ± 2% -18.12% (p=0.000 n=45+47) [Geo mean] 57.3ns 54.3ns -5.33% Despite the rarity of type switches, this generates noticeably smaller binaries. file before after Δ % addr2line 4413296 4409200 -4096 -0.093% api 5982648 5962168 -20480 -0.342% cgo 4854168 4833688 -20480 -0.422% compile 19694784 19682560 -12224 -0.062% cover 5278008 5265720 -12288 -0.233% doc 4694824 4682536 -12288 -0.262% fix 3411336 3394952 -16384 -0.480% link 6721496 6717400 -4096 -0.061% nm 4371152 4358864 -12288 -0.281% objdump 4760960 4752768 -8192 -0.172% pprof 14810820 14790340 -20480 -0.138% trace 11681076 11668788 -12288 -0.105% vet 8285464 8244504 -40960 -0.494% total 115824120 115627576 -196544 -0.170% Compiler performance is marginally improved (note that go/types has many type switches): name old alloc/op new alloc/op delta Template 35.0MB ± 0% 35.0MB ± 0% +0.09% (p=0.008 n=5+5) Unicode 28.5MB ± 0% 28.5MB ± 0% ~ (p=0.548 n=5+5) GoTypes 114MB ± 0% 114MB ± 0% -0.76% (p=0.008 n=5+5) Compiler 541MB ± 0% 541MB ± 0% -0.03% (p=0.008 n=5+5) SSA 1.17GB ± 0% 1.17GB ± 0% ~ (p=0.841 n=5+5) Flate 21.9MB ± 0% 21.9MB ± 0% ~ (p=0.421 n=5+5) GoParser 26.9MB ± 0% 26.9MB ± 0% ~ (p=0.222 n=5+5) Reflect 74.6MB ± 0% 74.6MB ± 0% ~ (p=1.000 n=5+5) Tar 32.9MB ± 0% 32.8MB ± 0% ~ (p=0.056 n=5+5) XML 42.4MB ± 0% 42.1MB ± 0% -0.77% (p=0.008 n=5+5) [Geo mean] 73.2MB 73.1MB -0.15% name old allocs/op new allocs/op delta Template 377k ± 0% 377k ± 0% +0.06% (p=0.008 n=5+5) Unicode 354k ± 0% 354k ± 0% ~ (p=0.095 n=5+5) GoTypes 1.31M ± 0% 1.30M ± 0% -0.73% (p=0.008 n=5+5) Compiler 5.44M ± 0% 5.44M ± 0% -0.04% (p=0.008 n=5+5) SSA 11.7M ± 0% 11.7M ± 0% ~ (p=1.000 n=5+5) Flate 239k ± 0% 239k ± 0% ~ (p=1.000 n=5+5) GoParser 302k ± 0% 302k ± 0% -0.04% (p=0.008 n=5+5) Reflect 977k ± 0% 977k ± 0% ~ (p=0.690 n=5+5) Tar 346k ± 0% 346k ± 0% ~ (p=0.889 n=5+5) XML 431k ± 0% 430k ± 0% -0.25% (p=0.008 n=5+5) [Geo mean] 806k 806k -0.10% For packages with many type switches, this considerably shrinks function text size. Some examples: file before after Δ % encoding/binary.s 30726 29504 -1222 -3.977% go/printer.s 77597 76005 -1592 -2.052% cmd/vendor/golang.org/x/tools/go/ast/astutil.s 65704 63318 -2386 -3.631% cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8047 7714 -333 -4.138% Text size regressions are rare. Change-Id: Ic10982bbb04876250eaa5bfee97990141ae5fc28 Reviewed-on: https://go-review.googlesource.com/c/go/+/228106 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
1eb66be1b9 |
cmd/compile: enable Sqrt as a compiler intrinsic on riscv64
Change-Id: I829a02ced9aa73b45079e67194186116b39504b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/227805 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
|
ea7126fe14 |
cmd/compile: use a Sym type instead of interface{} for symbolic offsets
Will help with strongly typed rewrite rules. Change-Id: Ifbf316a49f4081322b3b8f13bc962713437d9aba Reviewed-on: https://go-review.googlesource.com/c/go/+/227785 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> |
||
|
28157b3292 |
cmd/compile: start implementing strongly typed aux and auxint fields
Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
||
|
376472ddb7 |
cmd/compile: clean up slice and string offsets/sizes
Minor cleanup: * Modernize comments. * Change from int to int64 to avoid conversions. * Use idiomatic names. Passes toolstash-check. Change-Id: I93560c81926c0f4e00f33129cb4846b53bea99e6 Reviewed-on: https://go-review.googlesource.com/c/go/+/227548 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> |
||
|
b6feb03b24 |
cmd/compile,runtime: pass only ptr and len to some runtime calls
Some runtime calls accept a slice, but only use ptr and len. This change modifies most such routines to accept only ptr and len. After this change, the only runtime calls that accept an unnecessary cap arg are concatstrings and slicerunetostring. Neither is particularly common, and both are complicated to modify. Negligible compiler performance impact. Shrinks binaries a little. There are only a few regressions; the one I investigated was due to register allocation fluctuation. Passes 'go test -race std cmd', modulo #38265 and #38266. Wow, does that take a long time to run. Updates #36890 file before after Δ % compile 19655024 19655152 +128 +0.001% cover 5244840 5236648 -8192 -0.156% dist 3662376 3658280 -4096 -0.112% link 6680056 6675960 -4096 -0.061% pprof 14789844 14777556 -12288 -0.083% test2json 2824744 2820648 -4096 -0.145% trace 11647876 11639684 -8192 -0.070% vet 8260472 8256376 -4096 -0.050% total 115163736 115118808 -44928 -0.039% Change-Id: Idb29fa6a81d6a82bfd3b65740b98cf3275ca0a78 Reviewed-on: https://go-review.googlesource.com/c/go/+/227163 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
bfd569fcb0 |
cmd/compile: delete the floating point Greater and Geq ops
Extend CL 220417 (which removed the integer Greater and Geq ops) to floating point comparisons. Greater and Geq can always be implemented using Less and Leq. Fixes #37316. Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7 Reviewed-on: https://go-review.googlesource.com/c/go/+/222397 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
6736b2fdb2 |
cmd/compile: refactor around HTMLWriter removing logger in favor of Func
Replace HTMLWriter's Logger field with a *Func. Implement Fatalf method for HTMLWriter which gets the Frontend() from the Func and calls down into it's Fatalf method, passing the msg and args along. Replace remaining calls to the old Logger with calls to logging methods on the Func. Change-Id: I966342ef9997396f3416fb152fa52d60080ebecb Reviewed-on: https://go-review.googlesource.com/c/go/+/227277 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
|
fff7509d47 |
cmd/compile: add intrinsic HasCPUFeature for checking cpu features
Before using some CPU instructions, we must check for their presence. We use global variables in the runtime package to record features. Prior to this CL, we issued a regular memory load for these features. The downside to this is that, because it is a regular memory load, it cannot be hoisted out of loops or otherwise reordered with other loads. This CL introduces a new intrinsic just for checking cpu features. It still ends up resulting in a memory load, but that memory load can now be floated to the entry block and rematerialized as needed. One downside is that the regular load could be combined with the comparison into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE. (It is possible that MOVBQZX+TESTQ+NE would be better.) This CL does only amd64. It is easy to extend to other architectures. For the benchmark in #36196, on my machine, this offers a mild speedup. name old time/op new time/op delta FMA-8 1.39ns ± 6% 1.29ns ± 9% -7.19% (p=0.000 n=97+96) NonFMA-8 2.03ns ±11% 2.04ns ±12% ~ (p=0.618 n=99+98) Updates #15808 Updates #36196 Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb Reviewed-on: https://go-review.googlesource.com/c/go/+/212360 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> |
||
|
8114242359 |
cmd/compile, runtime: use more registers for amd64 write barrier calls
The compiler-inserted write barrier calls use a special ABI for speed and to minimize the binary size impact. runtime.gcWriteBarrier takes its args in DI and AX. This change adds gcWriteBarrier wrapper functions, varying only in the register used for the second argument. (Allowing variation in the first argument doesn't offer improvements, which is convenient, as it avoids quadratic API growth.) This reduces the number of register copies. The goals are reduced binary size via reduced register pressure/copies. One downside to this change is that when the write barrier is on, we may bounce through several different write barrier wrappers, which is bad for the instruction cache. Package runtime write barrier benchmarks for this change: name old time/op new time/op delta WriteBarrier-8 16.6ns ± 6% 15.6ns ± 6% -5.73% (p=0.000 n=97+99) BulkWriteBarrier-8 4.37ns ± 7% 4.22ns ± 8% -3.45% (p=0.000 n=96+99) However, I don't particularly trust these numbers. I ran runtime.BenchmarkWriteBarrier multiple times as I rebased this change, and noticed that the results have high variance depending on the parent change, perhaps due to aligment. This change was stress tested with GOGC=1 GODEBUG=gccheckmark=1 go test std. This change reduces binary sizes: file before after Δ % addr2line 4308720 4296688 -12032 -0.279% api 5965592 5945368 -20224 -0.339% asm 5148088 5025464 -122624 -2.382% buildid 2848760 2844904 -3856 -0.135% cgo 4828968 4812840 -16128 -0.334% compile 19754720 19529744 -224976 -1.139% cover 5256840 5236600 -20240 -0.385% dist 3670312 3658264 -12048 -0.328% doc 4669608 4657576 -12032 -0.258% fix 3377976 3365944 -12032 -0.356% link 6614888 6586472 -28416 -0.430% nm 4258368 4254528 -3840 -0.090% objdump 4656336 4644304 -12032 -0.258% pack 2295176 2295432 +256 +0.011% pprof 14762356 14709364 -52992 -0.359% test2json 2824456 2820600 -3856 -0.137% trace 11684404 11643700 -40704 -0.348% vet 8284760 8252248 -32512 -0.392% total 115210328 114580040 -630288 -0.547% This change improves compiler performance: name old time/op new time/op delta Template 208ms ± 3% 207ms ± 3% -0.40% (p=0.030 n=43+44) Unicode 80.2ms ± 3% 81.3ms ± 3% +1.25% (p=0.000 n=41+44) GoTypes 699ms ± 3% 694ms ± 2% -0.71% (p=0.016 n=42+37) Compiler 3.26s ± 2% 3.23s ± 2% -0.86% (p=0.000 n=43+45) SSA 6.97s ± 1% 6.93s ± 1% -0.63% (p=0.000 n=43+45) Flate 134ms ± 3% 133ms ± 2% ~ (p=0.139 n=45+42) GoParser 165ms ± 2% 164ms ± 1% -0.79% (p=0.000 n=45+40) Reflect 434ms ± 4% 435ms ± 4% ~ (p=0.937 n=44+44) Tar 181ms ± 2% 181ms ± 2% ~ (p=0.702 n=43+45) XML 244ms ± 2% 244ms ± 2% ~ (p=0.237 n=45+44) [Geo mean] 403ms 402ms -0.29% name old user-time/op new user-time/op delta Template 271ms ± 2% 268ms ± 1% -1.40% (p=0.000 n=42+42) Unicode 117ms ± 3% 116ms ± 5% ~ (p=0.066 n=45+45) GoTypes 948ms ± 2% 936ms ± 2% -1.30% (p=0.000 n=41+40) Compiler 4.26s ± 1% 4.21s ± 2% -1.25% (p=0.000 n=37+45) SSA 9.52s ± 2% 9.41s ± 1% -1.18% (p=0.000 n=44+45) Flate 167ms ± 2% 165ms ± 2% -1.15% (p=0.000 n=44+41) GoParser 201ms ± 2% 198ms ± 1% -1.40% (p=0.000 n=43+43) Reflect 563ms ± 8% 560ms ± 7% ~ (p=0.206 n=45+44) Tar 224ms ± 2% 222ms ± 2% -0.81% (p=0.000 n=45+45) XML 308ms ± 2% 304ms ± 1% -1.17% (p=0.000 n=42+43) [Geo mean] 525ms 519ms -1.08% name old alloc/op new alloc/op delta Template 36.3MB ± 0% 36.3MB ± 0% ~ (p=0.421 n=5+5) Unicode 28.4MB ± 0% 28.3MB ± 0% ~ (p=0.056 n=5+5) GoTypes 121MB ± 0% 121MB ± 0% -0.14% (p=0.008 n=5+5) Compiler 567MB ± 0% 567MB ± 0% -0.06% (p=0.016 n=4+5) SSA 1.26GB ± 0% 1.26GB ± 0% -0.07% (p=0.008 n=5+5) Flate 22.9MB ± 0% 22.8MB ± 0% ~ (p=0.310 n=5+5) GoParser 28.0MB ± 0% 27.9MB ± 0% -0.09% (p=0.008 n=5+5) Reflect 78.4MB ± 0% 78.4MB ± 0% -0.03% (p=0.008 n=5+5) Tar 34.2MB ± 0% 34.2MB ± 0% -0.05% (p=0.008 n=5+5) XML 44.4MB ± 0% 44.4MB ± 0% -0.04% (p=0.016 n=5+5) [Geo mean] 76.4MB 76.3MB -0.05% name old allocs/op new allocs/op delta Template 356k ± 0% 356k ± 0% -0.13% (p=0.008 n=5+5) Unicode 326k ± 0% 326k ± 0% -0.07% (p=0.008 n=5+5) GoTypes 1.24M ± 0% 1.24M ± 0% -0.24% (p=0.008 n=5+5) Compiler 5.30M ± 0% 5.28M ± 0% -0.34% (p=0.008 n=5+5) SSA 11.9M ± 0% 11.9M ± 0% -0.16% (p=0.008 n=5+5) Flate 226k ± 0% 225k ± 0% -0.12% (p=0.008 n=5+5) GoParser 287k ± 0% 286k ± 0% -0.29% (p=0.008 n=5+5) Reflect 930k ± 0% 929k ± 0% -0.05% (p=0.008 n=5+5) Tar 332k ± 0% 331k ± 0% -0.12% (p=0.008 n=5+5) XML 411k ± 0% 411k ± 0% -0.12% (p=0.008 n=5+5) [Geo mean] 771k 770k -0.16% For some packages, this change significantly reduces the size of executable text. Examples: file before after Δ % cmd/internal/obj/arm.s 68658 66855 -1803 -2.626% cmd/internal/obj/mips.s 57486 56272 -1214 -2.112% cmd/internal/obj/arm64.s 152107 147163 -4944 -3.250% cmd/internal/obj/ppc64.s 125544 120456 -5088 -4.053% cmd/vendor/golang.org/x/tools/go/cfg.s 31699 30742 -957 -3.019% Full listing: file before after Δ % container/ring.s 1890 1870 -20 -1.058% container/list.s 5366 5390 +24 +0.447% internal/cpu.s 3298 3295 -3 -0.091% internal/testlog.s 1507 1501 -6 -0.398% image/color.s 8281 8248 -33 -0.399% runtime.s 480970 480075 -895 -0.186% sync.s 16497 16408 -89 -0.539% internal/singleflight.s 2591 2577 -14 -0.540% math/rand.s 10456 10438 -18 -0.172% cmd/go/internal/par.s 2801 2790 -11 -0.393% internal/reflectlite.s 28477 28417 -60 -0.211% errors.s 2750 2736 -14 -0.509% internal/oserror.s 446 434 -12 -2.691% sort.s 17061 17046 -15 -0.088% io.s 17063 16999 -64 -0.375% vendor/golang.org/x/crypto/hkdf.s 1962 1936 -26 -1.325% text/tabwriter.s 9617 9574 -43 -0.447% hash/crc64.s 3414 3408 -6 -0.176% hash/crc32.s 6657 6651 -6 -0.090% bytes.s 31932 31863 -69 -0.216% strconv.s 53158 52799 -359 -0.675% strings.s 42829 42665 -164 -0.383% encoding/ascii85.s 4833 4791 -42 -0.869% vendor/golang.org/x/text/transform.s 16810 16724 -86 -0.512% path.s 6848 6845 -3 -0.044% encoding/base32.s 9658 9592 -66 -0.683% bufio.s 23051 22908 -143 -0.620% compress/bzip2.s 11773 11764 -9 -0.076% image.s 37565 37502 -63 -0.168% syscall.s 82359 82279 -80 -0.097% regexp/syntax.s 83573 82930 -643 -0.769% image/jpeg.s 36535 36490 -45 -0.123% regexp.s 64396 64214 -182 -0.283% time.s 82724 82622 -102 -0.123% plugin.s 6539 6536 -3 -0.046% context.s 10959 10865 -94 -0.858% internal/poll.s 24286 24270 -16 -0.066% reflect.s 168304 167927 -377 -0.224% internal/fmtsort.s 7416 7376 -40 -0.539% os.s 52465 51787 -678 -1.292% cmd/go/internal/lockedfile/internal/filelock.s 2326 2317 -9 -0.387% os/signal.s 4657 4648 -9 -0.193% runtime/debug.s 6040 5998 -42 -0.695% encoding/binary.s 30838 30801 -37 -0.120% vendor/golang.org/x/net/route.s 23694 23491 -203 -0.857% path/filepath.s 17895 17889 -6 -0.034% cmd/vendor/golang.org/x/sys/unix.s 78125 78109 -16 -0.020% io/ioutil.s 6999 6996 -3 -0.043% encoding/base64.s 12094 12007 -87 -0.719% crypto/cipher.s 20466 20372 -94 -0.459% cmd/go/internal/robustio.s 2672 2669 -3 -0.112% encoding/pem.s 9302 9286 -16 -0.172% internal/obscuretestdata.s 1719 1695 -24 -1.396% crypto/aes.s 11014 11002 -12 -0.109% os/exec.s 29388 29231 -157 -0.534% cmd/internal/browser.s 2266 2260 -6 -0.265% internal/goroot.s 4601 4592 -9 -0.196% vendor/golang.org/x/crypto/chacha20poly1305.s 8945 8942 -3 -0.034% cmd/vendor/golang.org/x/crypto/ssh/terminal.s 27226 27195 -31 -0.114% index/suffixarray.s 36431 36411 -20 -0.055% fmt.s 77017 76709 -308 -0.400% encoding/hex.s 6241 6154 -87 -1.394% compress/lzw.s 7133 7069 -64 -0.897% database/sql/driver.s 18888 18877 -11 -0.058% net/url.s 29838 29739 -99 -0.332% debug/plan9obj.s 8329 8279 -50 -0.600% encoding/csv.s 12986 12902 -84 -0.647% debug/gosym.s 25403 25330 -73 -0.287% compress/flate.s 51192 50970 -222 -0.434% vendor/golang.org/x/net/dns/dnsmessage.s 86769 86208 -561 -0.647% compress/gzip.s 9791 9758 -33 -0.337% compress/zlib.s 7310 7277 -33 -0.451% archive/zip.s 42356 42166 -190 -0.449% debug/dwarf.s 108259 107730 -529 -0.489% encoding/json.s 106378 105910 -468 -0.440% os/user.s 14751 14724 -27 -0.183% database/sql.s 99011 98404 -607 -0.613% log.s 9466 9423 -43 -0.454% debug/pe.s 31272 31182 -90 -0.288% debug/macho.s 32764 32608 -156 -0.476% encoding/gob.s 136976 136517 -459 -0.335% vendor/golang.org/x/text/unicode/bidi.s 27318 27276 -42 -0.154% archive/tar.s 71416 70975 -441 -0.618% vendor/golang.org/x/net/http2/hpack.s 23892 23848 -44 -0.184% vendor/golang.org/x/text/secure/bidirule.s 3354 3351 -3 -0.089% mime/quotedprintable.s 5960 5925 -35 -0.587% net/http/internal.s 5874 5853 -21 -0.358% math/big.s 184147 183692 -455 -0.247% debug/elf.s 63775 63567 -208 -0.326% mime.s 39802 39709 -93 -0.234% encoding/xml.s 111038 110713 -325 -0.293% crypto/dsa.s 6044 6029 -15 -0.248% go/token.s 12139 12077 -62 -0.511% crypto/rand.s 6889 6866 -23 -0.334% go/scanner.s 19030 19008 -22 -0.116% flag.s 22320 22236 -84 -0.376% vendor/golang.org/x/text/unicode/norm.s 66652 66391 -261 -0.392% crypto/rsa.s 31671 31650 -21 -0.066% crypto/elliptic.s 51553 51403 -150 -0.291% internal/xcoff.s 22950 22822 -128 -0.558% go/constant.s 43750 43689 -61 -0.139% encoding/asn1.s 57086 57035 -51 -0.089% runtime/trace.s 2609 2603 -6 -0.230% crypto/x509/pkix.s 10458 10471 +13 +0.124% image/gif.s 27544 27385 -159 -0.577% vendor/golang.org/x/net/idna.s 24558 24502 -56 -0.228% image/png.s 42775 42685 -90 -0.210% vendor/golang.org/x/crypto/cryptobyte.s 33616 33493 -123 -0.366% go/ast.s 80684 80449 -235 -0.291% net/internal/socktest.s 16571 16535 -36 -0.217% crypto/ecdsa.s 11948 11936 -12 -0.100% text/template/parse.s 95138 94002 -1136 -1.194% runtime/pprof.s 59702 59639 -63 -0.106% testing.s 68427 68088 -339 -0.495% internal/testenv.s 5620 5596 -24 -0.427% testing/internal/testdeps.s 3312 3294 -18 -0.543% internal/trace.s 78473 78239 -234 -0.298% testing/iotest.s 4968 4908 -60 -1.208% os/signal/internal/pty.s 3011 2990 -21 -0.697% testing/quick.s 12179 12125 -54 -0.443% cmd/internal/bio.s 9286 9274 -12 -0.129% cmd/internal/src.s 17684 17663 -21 -0.119% cmd/internal/goobj2.s 12588 12558 -30 -0.238% cmd/internal/objabi.s 16408 16390 -18 -0.110% go/printer.s 77417 77308 -109 -0.141% go/parser.s 80045 79113 -932 -1.164% go/format.s 5434 5419 -15 -0.276% cmd/internal/goobj.s 26146 25954 -192 -0.734% runtime/pprof/internal/profile.s 102518 102178 -340 -0.332% text/template.s 95343 94935 -408 -0.428% cmd/internal/dwarf.s 31718 31572 -146 -0.460% cmd/vendor/golang.org/x/arch/arm/armasm.s 45240 45151 -89 -0.197% internal/lazytemplate.s 1470 1457 -13 -0.884% cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s 37253 37220 -33 -0.089% cmd/asm/internal/flags.s 2593 2590 -3 -0.116% cmd/asm/internal/lex.s 25068 24921 -147 -0.586% cmd/internal/buildid.s 18536 18263 -273 -1.473% cmd/vendor/golang.org/x/arch/x86/x86asm.s 80209 80105 -104 -0.130% go/doc.s 75140 74585 -555 -0.739% cmd/internal/edit.s 3893 3899 +6 +0.154% html/template.s 89377 88809 -568 -0.636% cmd/vendor/golang.org/x/arch/arm64/arm64asm.s 117998 117824 -174 -0.147% cmd/internal/obj.s 115015 114290 -725 -0.630% go/build.s 69379 68862 -517 -0.745% cmd/internal/objfile.s 48106 47982 -124 -0.258% cmd/cover.s 46239 46113 -126 -0.272% cmd/addr2line.s 2845 2833 -12 -0.422% cmd/internal/obj/arm.s 68658 66855 -1803 -2.626% cmd/internal/obj/mips.s 57486 56272 -1214 -2.112% cmd/internal/obj/riscv.s 63834 63006 -828 -1.297% cmd/compile/internal/syntax.s 146582 145456 -1126 -0.768% cmd/internal/obj/wasm.s 44117 44066 -51 -0.116% cmd/cgo.s 242645 241653 -992 -0.409% cmd/internal/obj/arm64.s 152107 147163 -4944 -3.250% net.s 295972 292010 -3962 -1.339% go/types.s 321371 319432 -1939 -0.603% vendor/golang.org/x/net/http/httpproxy.s 9450 9423 -27 -0.286% net/textproto.s 19455 19406 -49 -0.252% cmd/internal/obj/ppc64.s 125544 120456 -5088 -4.053% go/internal/srcimporter.s 6475 6409 -66 -1.019% log/syslog.s 8017 7929 -88 -1.098% cmd/compile/internal/logopt.s 10183 10162 -21 -0.206% net/mail.s 24085 23948 -137 -0.569% mime/multipart.s 21527 21420 -107 -0.497% cmd/internal/obj/s390x.s 127610 127757 +147 +0.115% go/internal/gcimporter.s 34913 34548 -365 -1.045% vendor/golang.org/x/net/nettest.s 28103 28016 -87 -0.310% cmd/go/internal/cfg.s 9967 9916 -51 -0.512% cmd/api.s 39703 39603 -100 -0.252% go/internal/gccgoimporter.s 56470 56120 -350 -0.620% go/importer.s 2077 2056 -21 -1.011% cmd/compile/internal/types.s 48202 47282 -920 -1.909% cmd/go/internal/str.s 4341 4320 -21 -0.484% cmd/internal/obj/x86.s 89440 88625 -815 -0.911% cmd/go/internal/base.s 12667 12580 -87 -0.687% cmd/go/internal/cache.s 30754 30571 -183 -0.595% cmd/doc.s 62976 62755 -221 -0.351% cmd/go/internal/search.s 20114 19993 -121 -0.602% cmd/vendor/golang.org/x/xerrors.s 17923 17855 -68 -0.379% cmd/go/internal/lockedfile.s 16451 16415 -36 -0.219% cmd/vendor/golang.org/x/mod/sumdb/note.s 18200 18150 -50 -0.275% cmd/vendor/golang.org/x/mod/module.s 17869 17851 -18 -0.101% cmd/asm/internal/arch.s 37533 37482 -51 -0.136% cmd/fix.s 87728 87492 -236 -0.269% cmd/vendor/golang.org/x/mod/sumdb/tlog.s 36394 36367 -27 -0.074% cmd/vendor/golang.org/x/mod/sumdb/dirhash.s 4990 4963 -27 -0.541% cmd/go/internal/imports.s 16499 16469 -30 -0.182% cmd/vendor/golang.org/x/mod/zip.s 18816 18745 -71 -0.377% cmd/go/internal/cmdflag.s 5126 5123 -3 -0.059% cmd/internal/test2json.s 9540 9452 -88 -0.922% cmd/go/internal/tool.s 3629 3623 -6 -0.165% cmd/go/internal/version.s 11232 11220 -12 -0.107% cmd/go/internal/mvs.s 25383 25179 -204 -0.804% cmd/nm.s 5815 5803 -12 -0.206% cmd/dist.s 210146 209140 -1006 -0.479% cmd/asm/internal/asm.s 68655 68549 -106 -0.154% cmd/vendor/golang.org/x/mod/modfile.s 72974 72510 -464 -0.636% cmd/go/internal/load.s 107548 106861 -687 -0.639% cmd/link/internal/sym.s 18708 18581 -127 -0.679% cmd/asm.s 3367 3343 -24 -0.713% cmd/gofmt.s 30795 30698 -97 -0.315% cmd/link/internal/objfile.s 21828 21630 -198 -0.907% cmd/pack.s 14878 14869 -9 -0.060% cmd/vendor/github.com/google/pprof/internal/elfexec.s 6788 6782 -6 -0.088% cmd/test2json.s 1647 1641 -6 -0.364% cmd/link/internal/loader.s 48677 48483 -194 -0.399% cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s 16783 16773 -10 -0.060% cmd/link/internal/loadelf.s 35464 35126 -338 -0.953% cmd/link/internal/loadmacho.s 29438 29180 -258 -0.876% cmd/link/internal/loadpe.s 16440 16371 -69 -0.420% cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2106 2100 -6 -0.285% cmd/link/internal/loadxcoff.s 11711 11615 -96 -0.820% cmd/vendor/golang.org/x/tools/go/analysis/internal/facts.s 14954 14883 -71 -0.475% cmd/vendor/golang.org/x/tools/go/ast/inspector.s 5394 5374 -20 -0.371% cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s 37029 36822 -207 -0.559% cmd/vendor/golang.org/x/tools/go/analysis/passes/inspect.s 340 337 -3 -0.882% cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s 9919 9858 -61 -0.615% cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s 6705 6690 -15 -0.224% cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s 9783 9741 -42 -0.429% cmd/vendor/golang.org/x/tools/go/cfg.s 31699 30742 -957 -3.019% cmd/vendor/golang.org/x/tools/go/analysis/passes/ifaceassert.s 2768 2762 -6 -0.217% cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s 3031 2998 -33 -1.089% cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s 4382 4376 -6 -0.137% cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s 8654 8642 -12 -0.139% cmd/vendor/golang.org/x/tools/go/analysis/passes/stringintconv.s 3458 3446 -12 -0.347% cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s 8011 7995 -16 -0.200% cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s 6205 6193 -12 -0.193% cmd/vendor/golang.org/x/tools/go/ast/astutil.s 66183 65861 -322 -0.487% cmd/vendor/github.com/google/pprof/profile.s 150844 150261 -583 -0.386% cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8057 8054 -3 -0.037% cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s 3670 3667 -3 -0.082% cmd/vendor/github.com/google/pprof/internal/measurement.s 10464 10440 -24 -0.229% cmd/vendor/golang.org/x/tools/go/types/typeutil.s 12319 12274 -45 -0.365% cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s 13503 13342 -161 -1.192% cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s 5261 5218 -43 -0.817% cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s 1462 1459 -3 -0.205% cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s 9594 9582 -12 -0.125% cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s 34397 34338 -59 -0.172% cmd/vendor/github.com/google/pprof/internal/graph.s 53225 52936 -289 -0.543% cmd/vendor/github.com/ianlancetaylor/demangle.s 177450 175329 -2121 -1.195% crypto/x509.s 147892 147388 -504 -0.341% cmd/go/internal/work.s 306465 304950 -1515 -0.494% cmd/go/internal/run.s 4664 4657 -7 -0.150% crypto/tls.s 313130 311833 -1297 -0.414% net/http/httptrace.s 3979 3905 -74 -1.860% net/smtp.s 14413 14344 -69 -0.479% cmd/link/internal/ld.s 545343 542279 -3064 -0.562% cmd/link/internal/mips.s 6218 6215 -3 -0.048% cmd/link/internal/mips64.s 6108 6103 -5 -0.082% cmd/link/internal/amd64.s 18154 18112 -42 -0.231% cmd/link/internal/arm64.s 22527 22494 -33 -0.146% cmd/link/internal/arm.s 22574 22494 -80 -0.354% cmd/link/internal/s390x.s 20779 20746 -33 -0.159% cmd/link/internal/wasm.s 16531 16493 -38 -0.230% cmd/link/internal/x86.s 18906 18849 -57 -0.301% cmd/link/internal/ppc64.s 26856 26778 -78 -0.290% net/http.s 559101 556513 -2588 -0.463% net/http/cookiejar.s 15912 15885 -27 -0.170% expvar.s 9531 9525 -6 -0.063% net/http/httptest.s 16616 16475 -141 -0.849% net/http/cgi.s 23624 23458 -166 -0.703% cmd/go/internal/web.s 16546 16489 -57 -0.344% cmd/vendor/golang.org/x/mod/sumdb.s 33197 33117 -80 -0.241% net/http/fcgi.s 19266 19169 -97 -0.503% net/http/httputil.s 39875 39728 -147 -0.369% cmd/vendor/github.com/google/pprof/internal/symbolz.s 5888 5867 -21 -0.357% net/rpc.s 34154 34003 -151 -0.442% cmd/vendor/github.com/google/pprof/internal/transport.s 2746 2716 -30 -1.092% cmd/vendor/github.com/google/pprof/internal/binutils.s 35999 35875 -124 -0.344% net/rpc/jsonrpc.s 6637 6598 -39 -0.588% cmd/vendor/github.com/google/pprof/internal/symbolizer.s 11533 11458 -75 -0.650% cmd/go/internal/get.s 62921 62803 -118 -0.188% cmd/vendor/github.com/google/pprof/internal/report.s 80364 80058 -306 -0.381% cmd/go/internal/modfetch/codehost.s 89680 89066 -614 -0.685% cmd/trace.s 117171 116701 -470 -0.401% cmd/vendor/github.com/google/pprof/internal/driver.s 144268 143297 -971 -0.673% cmd/go/internal/modfetch.s 126299 125860 -439 -0.348% cmd/vendor/github.com/google/pprof/driver.s 9042 9000 -42 -0.464% cmd/go/internal/modconv.s 17947 17889 -58 -0.323% cmd/pprof.s 12399 12326 -73 -0.589% cmd/go/internal/modload.s 151182 150389 -793 -0.525% cmd/go/internal/generate.s 11738 11636 -102 -0.869% cmd/go/internal/help.s 6571 6531 -40 -0.609% cmd/go/internal/clean.s 11174 11142 -32 -0.286% cmd/go/internal/vet.s 7897 7867 -30 -0.380% cmd/go/internal/envcmd.s 22176 22095 -81 -0.365% cmd/go/internal/list.s 15216 15067 -149 -0.979% cmd/go/internal/modget.s 38698 38519 -179 -0.463% cmd/go/internal/modcmd.s 46674 46441 -233 -0.499% cmd/go/internal/test.s 64664 64456 -208 -0.322% cmd/go.s 6730 6703 -27 -0.401% cmd/compile/internal/ssa.s 3592565 3582500 -10065 -0.280% cmd/compile/internal/gc.s 1549123 1537123 -12000 -0.775% cmd/compile/internal/riscv64.s 14579 14483 -96 -0.658% cmd/compile/internal/mips.s 20578 20419 -159 -0.773% cmd/compile/internal/ppc64.s 25524 25359 -165 -0.646% cmd/compile/internal/mips64.s 19795 19636 -159 -0.803% cmd/compile/internal/wasm.s 13329 13290 -39 -0.293% cmd/compile/internal/s390x.s 28097 27892 -205 -0.730% cmd/compile/internal/arm.s 31489 31321 -168 -0.534% cmd/compile/internal/arm64.s 29803 29590 -213 -0.715% cmd/compile/internal/amd64.s 32961 33221 +260 +0.789% cmd/compile/internal/x86.s 31029 30878 -151 -0.487% total 18534966 18440341 -94625 -0.511% Change-Id: I830d37364f14f0297800adc42c99f60a74c51aca Reviewed-on: https://go-review.googlesource.com/c/go/+/226367 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
efb0ac4ce6 |
cmd/compile: provide Add/Cas/Exchange atomic intrinsics on riscv64
Provide Add32, Add64, Cas32, Cas64, Exchange32 and Exchange64 atomic intrinsics on riscv64. Updates #36765 Change-Id: I9a3b7d2ce3d49f699171fd76a0fed891d149a6bb Reviewed-on: https://go-review.googlesource.com/c/go/+/223559 Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
ade988623e |
cmd/compile: provide Load32/Load64/Store32/Store64 atomic intrinsics on riscv64
Updates #36765 Change-Id: Id5ce5c5f60112e4f4cf9eec1b1ec120994934950 Reviewed-on: https://go-review.googlesource.com/c/go/+/223558 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
|
8c30971da6 |
cmd/compile: panic if trying to alias an intrinsic with no definitions
Currently if we try to alias an intrinsic which hasn't been defined for any architecture (such as by accidentally creating the alias before the intrinsic is created with addF), then we'll just silently not apply any intrinsics to those aliases. Catch this particular case by panicking in alias if we try to apply the alias and it did nothing. Change-Id: I98e75fc3f7206b08fc9267cedb8db3e109ec4f5d Reviewed-on: https://go-review.googlesource.com/c/go/+/224637 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
830ee4792e |
cmd/compile: declare runtime bit func aliases after math/bits intrinsics
Currently runtime/internal/sys bit-manipulation functions are aliased to math/bits functions, which are intrinsified. Unfortunately these aliases are declared before the intrinsified versions are generated, resulting in the generic version of the code being copied over. This change moves the aliases for bit operations in runtime/internal/sys after the addF calls to generate those intrinsics in SSA, so that the intrinsified SSA representation of those functions actually get copied over. This should improve the overall performance of the runtime (especially the page allocator) since these bit operations will actually be intrinsified now. Change-Id: I4377da13f9a7bb6aee608e50df0297148bf8f806 Reviewed-on: https://go-review.googlesource.com/c/go/+/224437 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
2e918c3aab |
cmd/compile: provide Load8/Store8 atomic intrinsics on riscv64
Updates #36765 Change-Id: Ieeb6bbc54e4841a1348ad50e80342ec4bc675e07 Reviewed-on: https://go-review.googlesource.com/c/go/+/223557 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
|
877ef86bec |
cmd/compile: add spectre mitigation mode enabled by -spectre
This commit adds a new cmd/compile flag -spectre, which accepts a comma-separated list of possible Spectre mitigations to apply, or the empty string (none), or "all". The only known mitigation right now is "index", which uses conditional moves to ensure that x86-64 CPUs do not speculate past index bounds checks. Speculating past index bounds checks may be problematic on systems running privileged servers that accept requests from untrusted users who can execute their own programs on the same machine. (And some more constraints that make it even more unlikely in practice.) The cases this protects against are analogous to the ones Microsoft explains in the "Array out of bounds load/store feeding ..." sections here: https://docs.microsoft.com/en-us/cpp/security/developer-guidance-speculative-execution?view=vs-2019#array-out-of-bounds-load-feeding-an-indirect-branch Change-Id: Ib7532d7e12466b17e04c4e2075c2a456dc98f610 Reviewed-on: https://go-review.googlesource.com/c/go/+/222660 Reviewed-by: Keith Randall <khr@golang.org> |
||
|
7e1028a9ff |
cmd/compile: avoid range over copy of array
Passes toostash-check. Slightly reduce compiler binary size: file before after Δ % compile 21087288 21070776 -16512 -0.078% total 131847020 131830508 -16512 -0.013% file before after Δ % cmd/compile/internal/gc.a 9007472 8999640 -7832 -0.087% total 127117794 127109962 -7832 -0.006% Change-Id: I4aadd68d0a7545770598bed9d3a4d05899b67b52 Reviewed-on: https://go-review.googlesource.com/c/go/+/205777 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
7b0b6c2f7e |
cmd/compile: simplify converted SSA form for 'if false'
The goal here is to make it easier for a human to examine the SSA when a function contains lots of dead code. No significant compiler metric or generated code differences. Change-Id: I81915fa4639bc8820cc9a5e45e526687d0d1f57a Reviewed-on: https://go-review.googlesource.com/c/go/+/221791 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
ab7ecea0c8 |
cmd/compile: add intrinsics for runtime/internal/math on MIPS64x
name old time/op new time/op delta MulUintptr/small 8.42ns ± 0% 5.93ns ± 0% -29.66% (p=0.000 n=9+10) MulUintptr/large 11.1ns ± 0% 7.4ns ± 0% -33.17% (p=0.000 n=10+9) Change-Id: I6659a886389660461fc2c90bd248243f6e7c29d5 Reviewed-on: https://go-review.googlesource.com/c/go/+/210897 Run-TryBot: Meng Zhuo <mengzhuo1203@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
2962c96c9f |
cmd/compile: lower float to uint conversions on s390x
Add rules for lowering float <-> unsigned int on s390x. During compilation, Cvt64Uto64F rule triggers around 80 times, Cvt64Fto64U rule triggers around 20 times, Cvt64Uto32F rule triggers around 5 times. Change-Id: If4c9d128b9132fce8c0bea9abc09cb43a5df7989 Reviewed-on: https://go-review.googlesource.com/c/go/+/209177 Reviewed-by: Michael Munday <mike.munday@ibm.com> Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
|
cb74dcc172 |
cmd/compile: remove Greater* and Geq* generic integer ops
The generic Greater and Geq ops can always be replaced with the Less and Leq ops. This CL therefore removes them. This simplifies the compiler since it reduces the number of operations that need handling in both code and in rewrite rules. This will be especially true when adding control flow optimizations such as the integer-in-range optimizations in CL 165998. Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040 Reviewed-on: https://go-review.googlesource.com/c/go/+/220417 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
2e4f490b31 |
cmd/compile,cmd/link: fix and re-enable open-coded defers on riscv64
The R_CALLRISCV relocation marker is on the JALR instruction, however the actual relocation is currently two instructions previous for the AUIPC+ADDI sequence. Adjust the platform dependent offset accordingly and re-enable open-coded defers. Fixes #36786. Change-Id: I71597c193c447930fbe94ce44b7355e89ae877bb Reviewed-on: https://go-review.googlesource.com/c/go/+/216797 Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
a858d15f11 |
cmd/compile: disable open-coded defers on riscv64
Open-coded defers are currently broken on riscv64 - disable them for the time being. All of the standard package tests now pass on linux/riscv64. Updates issue #27532 and #36786 Change-Id: I20fc25ce91dfad48be32409ba5c64ca9a6acef1d Reviewed-on: https://go-review.googlesource.com/c/go/+/216517 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Dan Scales <danscales@google.com> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
|
98d2717499 |
cmd/compile: implement compiler for riscv64
Based on riscv-go port. Updates #27532 Change-Id: Ia329daa243db63ff334053b8807ea96b97ce3acf Reviewed-on: https://go-review.googlesource.com/c/go/+/204631 Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
a037582eff |
cmd/compile: mark empty block preemptible
Currently, a block's control instruction gets the liveness info of the last Value in the block. However, for an empty block, the control instruction gets the invalid liveness info and therefore not preemptible. One example is empty infinite loop, which has only a control instruction. The control instruction being non- preemptible makes the whole loop non-preemptible. Fix this by using a different, preemptible liveness info for empty block's control. We can choose an arbitrary preemptible liveness info, as at run time we don't really use the liveness map at that instruction. As before, if the last Value in the block is non-preemptible, so is the block control. For example, the conditional branch in the write barrier test block is still non-preemptible. Also, only update liveness info if we are actually emitting instructions. So zero-width Values' liveness info (which are always invalid) won't affect the block control's liveness info. For example, if the last Values in a block is a tuple-generating operation and a Select, the block control instruction is still preemptible. Fixes #35923. Change-Id: Ic5225f3254b07e4955f7905329b544515907642b Reviewed-on: https://go-review.googlesource.com/c/go/+/209659 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> |
||
|
0e02cfb369 |
cmd/compile: try harder to not use an empty src.XPos for a bogus line
The fix for #35652 did not guarantee that it was using a non-empty src position to replace an empty one. The new code checks again and falls back to a more certain position. (The input in question compiles to a single empty infinite loop, and none of the actual instructions had any source position at all. That is a bug, but given the pathology of this input, not one worth dealing with this late in the release cycle, if ever.) Literally: 00000 (5) TEXT "".f(SB), ABIInternal 00001 (5) PCDATA $0, $-2 00002 (5) PCDATA $1, $-2 00003 (5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 00004 (5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 00005 (5) FUNCDATA $2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) b2 00006 (?) XCHGL AX, AX b6 00007 (+1048575) JMP 6 00008 (?) END TODO: Add runtime.InfiniteLoop(), replace infinite loops with a call to that, and use an eco-friendly runtime.gopark instead. (This was Cherry's excellent idea.) Updates #35652 Fixes #35695 Change-Id: I4b9a841142ee4df0f6b10863cfa0721a7e13b437 Reviewed-on: https://go-review.googlesource.com/c/go/+/207964 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
9bba63bbbe |
cmd/compile: make a better bogus line for empty infinite loops
The old recipe for making an infinite loop not be infinite in the debugger could create an instruction (Prog) with a line number not tied to any file (index == 0). This caused downstream failures in DWARF processing. So don't do that. Also adds a test, also adds a check+panic to ensure that the next time this happens the error is less mystifying. Fixes #35652 Change-Id: I04f30bc94fdc4aef20dd9130561303ff84fd945e Reviewed-on: https://go-review.googlesource.com/c/go/+/207613 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
b3885dbc93 |
cmd/compile, runtime: intrinsify atomic And8 and Or8 on s390x
Intrinsify these functions to match other platforms. Update the sequence of instructions used in the assembly implementations to match the intrinsics. Also, add a micro benchmark so we can more easily measure the performance of these two functions: name old time/op new time/op delta And8-8 5.33ns ± 7% 2.55ns ± 8% -52.12% (p=0.000 n=20+20) And8Parallel-8 7.39ns ± 5% 3.74ns ± 4% -49.34% (p=0.000 n=20+20) Or8-8 4.84ns ±15% 2.64ns ±11% -45.50% (p=0.000 n=20+20) Or8Parallel-8 7.27ns ± 3% 3.84ns ± 4% -47.10% (p=0.000 n=19+20) By using a 'rotate then xor selected bits' instruction combined with either a 'load and and' or a 'load and or' instruction we can implement And8 and Or8 with far fewer instructions. Replacing 'compare and swap' with atomic instructions may also improve performance when there is contention. Change-Id: I28bb8032052b73ae8ccdf6e4c612d2877085fa01 Reviewed-on: https://go-review.googlesource.com/c/go/+/204277 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
||
|
f07059d949 |
cmd/compile: rename sizeof_Array and array_* to slice_*
Renames variables sizeof_Array and other array_* variables that were actually intended for slices and not arrays. Change-Id: I391b95880cc77cabb8472efe694b7dd19545f31a Reviewed-on: https://go-review.googlesource.com/c/go/+/180919 Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com> Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
|
cd53fddabb |
cmd/compile: add framework for logging optimizer (non)actions to LSP
This is intended to allow IDEs to note where the optimizer was not able to improve users' code. There may be other applications for this, for example in studying effectiveness of optimizer changes more quickly than running benchmarks, or in verifying that code changes did not accidentally disable optimizations in performance-critical code. Logging of nilcheck (bad) for amd64 is implemented as proof-of-concept. In general, the intent is that optimizations that didn't happen are what will be logged, because that is believed to be what IDE users want. Added flag -json=version,dest Check that version=0. (Future compilers will support a few recent versions, I hope that version is always <=3.) Dest is expected to be one of: /path (or \path in Windows) will create directory /path and fill it w/ json files file://path will create directory path, intended either for I:\dont\know\enough\about\windows\paths trustme_I_know_what_I_am_doing_probably_testing Not passing an absolute path name usually leads to json splattered all over source directories, or failure when those directories are not writeable. If you want a foot-gun, you have to ask for it. The JSON output is directed to subdirectories of dest, where each subdirectory is net/url.PathEscape of the package name, and each for each foo.go in the package, net/url.PathEscape(foo).json is created. The first line of foo.json contains version and context information, and subsequent lines contains LSP-conforming JSON describing the missing optimizations. Change-Id: Ib83176a53a8c177ee9081aefc5ae05604ccad8a0 Reviewed-on: https://go-review.googlesource.com/c/go/+/204338 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
a0262b201f |
cmd/compile: intrinsify functions added to runtime/internal/sys
This restores intrinsic status to functions copied from math/bits into runtime/internal/sys, as an aid to runtime performance. Updates #35112. Change-Id: I41a7d87cf00f1e64d82aa95c5b1000bc128de820 Reviewed-on: https://go-review.googlesource.com/c/go/+/206200 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
543c6d2e0d |
math, cmd/compile: rename Fma to FMA
This API was added for #25819, where it was discussed as math.FMA. The commit adding it used math.Fma, presumably for consistency with the rest of the unusual names in package math (Sincos, Acosh, Erfcinv, Float32bits, etc). I believe that using an idiomatic Go name is more important here than consistency with these other names, most of which are historical baggage from C's standard library. Early additions like Float32frombits happened before "uppercase for export" (so they were originally like "float32frombits") and they were not properly reconsidered when we uppercased the symbols to export them. That's a mistake we live with. The names of functions we have added since then, and even a few that were legacy, are more properly Go-cased, such as IsNaN, IsInf, and RoundToEven, rather than Isnan, Isinf, and Roundtoeven. And also constants like MaxFloat32. For new API, we should keep using proper Go-cased symbols instead of minimally-upper-cased-C symbols. So math.FMA, not math.Fma. This API has not yet been released, so this change does not break the compatibility promise. This CL also modifies cmd/compile, since the compiler knows the name of the function. I could have stopped at changing the string constants, but it seemed to make more sense to use a consistent casing everywhere. Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e Reviewed-on: https://go-review.googlesource.com/c/go/+/205317 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
||
|
cc47b0d2cd |
cmd/compile: handle some missing cases of non-SSAable values for args of open-coded defers
In my experimentation, I had found that most non-SSAable expressions were converted to autotmp variables during AST evaluation. However, this was not true generally, as witnessed by issue #35213, which has a non-SSAable field reference of a struct that is not converted to an autotmp. So, I fixed openDeferSave() to handle non-SSAable nodes more generally, and make sure that these non-SSAable expressions are not evaluated more than once (which could incorrectly repeat side effects). Fixes #35213 Change-Id: I8043d5576b455e94163599e930ca0275e550d594 Reviewed-on: https://go-review.googlesource.com/c/go/+/203888 Run-TryBot: Dan Scales <danscales@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
97592b3c14 |
cmd/compile: intrinsics for runtime/internal/atomic.Store8
For #10958, #24543, but makes sense on its own. Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a Reviewed-on: https://go-review.googlesource.com/c/go/+/203284 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
be64a19d99 |
cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata
Generate inline code at defer time to save the args of defer calls to unique (autotmp) stack slots, and generate inline code at exit time to check which defer calls were made and make the associated function/method/interface calls. We remember that a particular defer statement was reached by storing in the deferBits variable (always stored on the stack). At exit time, we check the bits of the deferBits variable to determine which defer function calls to make (in reverse order). These low-cost defers are only used for functions where no defers appear in loops. In addition, we don't do these low-cost defers if there are too many defer statements or too many exits in a function (to limit code increase). When a function uses open-coded defers, we produce extra FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and for each defer, the stack slots where the closure and associated args have been stored. The funcdata also includes the location of the deferBits variable. Therefore, for panics, we can use this funcdata to determine exactly which defers are active, and call the appropriate functions/methods/closures with the correct arguments for each active defer. In order to unwind the stack correctly after a recover(), we need to add an extra code segment to functions with open-coded defers that simply calls deferreturn() and returns. This segment is not reachable by the normal function, but is returned to by the runtime during recovery. We set the liveness information of this deferreturn() to be the same as the liveness at the first function call during the last defer exit code (so all return values and all stack slots needed by the defer calls will be live). I needed to increase the stackguard constant from 880 to 896, because of a small amount of new code in deferreturn(). The -N flag disables open-coded defers. '-d defer' prints out the kind of defer being used at each defer statement (heap-allocated, stack-allocated, or open-coded). Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ] With normal (stack-allocated) defers only: 35.4 ns/op With open-coded defers: 5.6 ns/op Cost of function call alone (remove defer keyword): 4.4 ns/op Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09% The average size increase (including funcdata) for only the functions that use open-coded defers is 1.1%. The cost of a panic followed by a recover got noticeably slower, since panic processing now requires a scan of the stack for open-coded defer frames. This scan is required, even if no frames are using open-coded defers: Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ] Without open-coded defers: 62.0 ns/op With open-coded defers: 255 ns/op A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers: CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ] Without open-coded defers: 443 ns/op With open-coded defers: 347 ns/op Updates #14939 (defer performance) Updates #34481 (design doc) Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff Reviewed-on: https://go-review.googlesource.com/c/go/+/202340 Reviewed-by: Austin Clements <austin@google.com> |
||
|
03fb1f607b |
cmd/compile: don't use FMA on plan9
CL 137156 introduces an intrinsic on AMD64 that executes vfmadd231sd when feature detection is successful. However, because floating-point isn't allowed in note handler, the builder disables SSE instructions, and fails when attempting to execute this instruction. This change disables FMA on plan9 to immediately use the software fallback. Fixes #35063. Change-Id: I87d8f0995bd2f15013d203e618938f5079c9eed2 Reviewed-on: https://go-review.googlesource.com/c/go/+/202617 Reviewed-by: Keith Randall <khr@golang.org> |
||
|
58b031949b |
cmd/compile: add fma intrinsic for arm
This change introduces an arm intrinsic that generates the FMULAD instruction for the fused-multiply-add operation on systems that support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite rule translates the generic intrinsic to FMULAD. Updates #25819. Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa Reviewed-on: https://go-review.googlesource.com/c/go/+/142117 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
7a6da218b1 |
cmd/compile: add fma intrinsic for amd64
To permit ssa-level optimization, this change introduces an amd64 intrinsic that generates the VFMADD231SD instruction for the fused-multiply-add operation on systems that support it. System support is detected via cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic ("Fma") to VFMADD231SD. The benchmark compares the software implementation (old) with the intrinsic (new). name old time/op new time/op delta Fma-4 27.2ns ± 1% 1.0ns ± 9% -96.48% (p=0.008 n=5+5) Updates #25819. Change-Id: I966655e5f96817a5d06dff5942418a3915b09584 Reviewed-on: https://go-review.googlesource.com/c/go/+/137156 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
33425ab8db |
cmd/compile: introduce generic ssa intrinsic for fused-multiply-add
In order to make math.FMA a compiler intrinsic for ISAs like ARM64, PPC64[le], and S390X, a generic 3-argument opcode "Fma" is provided and rewritten as ARM64: (Fma x y z) -> (FMADDD z x y) PPC64: (Fma x y z) -> (FMADD x y z) S390X: (Fma x y z) -> (FMADD z x y) Updates #25819. Change-Id: Ie5bc628311e6feeb28ddf9adaa6e702c8c291efa Reviewed-on: https://go-review.googlesource.com/c/go/+/131959 Run-TryBot: Akhil Indurti <aindurti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
b76e6f8825 |
Revert "cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata"
This reverts CL 190098. Reason for revert: broke several builders. Change-Id: I69161352f9ded02537d8815f259c4d391edd9220 Reviewed-on: https://go-review.googlesource.com/c/go/+/201519 Run-TryBot: Bryan C. Mills <bcmills@google.com> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Dan Scales <danscales@google.com> |
||
|
dad616375f |
cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata
Generate inline code at defer time to save the args of defer calls to unique (autotmp) stack slots, and generate inline code at exit time to check which defer calls were made and make the associated function/method/interface calls. We remember that a particular defer statement was reached by storing in the deferBits variable (always stored on the stack). At exit time, we check the bits of the deferBits variable to determine which defer function calls to make (in reverse order). These low-cost defers are only used for functions where no defers appear in loops. In addition, we don't do these low-cost defers if there are too many defer statements or too many exits in a function (to limit code increase). When a function uses open-coded defers, we produce extra FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and for each defer, the stack slots where the closure and associated args have been stored. The funcdata also includes the location of the deferBits variable. Therefore, for panics, we can use this funcdata to determine exactly which defers are active, and call the appropriate functions/methods/closures with the correct arguments for each active defer. In order to unwind the stack correctly after a recover(), we need to add an extra code segment to functions with open-coded defers that simply calls deferreturn() and returns. This segment is not reachable by the normal function, but is returned to by the runtime during recovery. We set the liveness information of this deferreturn() to be the same as the liveness at the first function call during the last defer exit code (so all return values and all stack slots needed by the defer calls will be live). I needed to increase the stackguard constant from 880 to 896, because of a small amount of new code in deferreturn(). The -N flag disables open-coded defers. '-d defer' prints out the kind of defer being used at each defer statement (heap-allocated, stack-allocated, or open-coded). Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ] With normal (stack-allocated) defers only: 35.4 ns/op With open-coded defers: 5.6 ns/op Cost of function call alone (remove defer keyword): 4.4 ns/op Text size increase (including funcdata) for go cmd without/with open-coded defers: 0.09% The average size increase (including funcdata) for only the functions that use open-coded defers is 1.1%. The cost of a panic followed by a recover got noticeably slower, since panic processing now requires a scan of the stack for open-coded defer frames. This scan is required, even if no frames are using open-coded defers: Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ] Without open-coded defers: 62.0 ns/op With open-coded defers: 255 ns/op A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers: CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ] Without open-coded defers: 443 ns/op With open-coded defers: 347 ns/op Updates #14939 (defer performance) Updates #34481 (design doc) Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28 Reviewed-on: https://go-review.googlesource.com/c/go/+/190098 Reviewed-by: Austin Clements <austin@google.com> |
||
|
c4817f5d4f |
cmd/compile: on Wasm and AIX, let deferred nil function panic at invocation
The Go spec requires If a deferred function value evaluates to nil, execution panics when the function is invoked, not when the "defer" statement is executed. On Wasm and AIX, currently we actually emit a nil check at the point of defer statement, which will make it panic too early. This CL fixes this. Also, on Wasm, now the nil function will be passed through deferreturn to jmpdefer, which does an explicit nil check and calls sigpanic if it is nil. This sigpanic, being called from assembly, is ABI0. So change the assembler backend to also handle sigpanic in ABI0. Fixes #34926. Updates #8047. Change-Id: I28489a571cee36d2aef041f917b8cfdc31d557d4 Reviewed-on: https://go-review.googlesource.com/c/go/+/201297 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
||
|
50f1157760 |
cmd/compile: add math/bits.Mul64 intrinsic on mips64x
Benchmark: name old time/op new time/op delta Mul 36.0ns ± 1% 2.8ns ± 0% -92.31% (p=0.000 n=10+10) Mul32 4.37ns ± 0% 4.37ns ± 0% ~ (p=0.429 n=6+10) Mul64 36.4ns ± 0% 2.8ns ± 0% -92.37% (p=0.000 n=10+9) Change-Id: Ic4f4e5958adbf24999abcee721d0180b5413fca7 Reviewed-on: https://go-review.googlesource.com/c/go/+/200582 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |