mirror of
https://github.com/golang/go.git
synced 2025-05-07 00:23:03 +00:00
526 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
927fdb7843 |
cmd/compile: simplify intrinsification of TrailingZeros16 and TrailingZeros8
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64 and S390X, rather than having a custom intrinsic. Note that for PPC64 this actually allows the existing Ctz16 and Ctz8 rules to be used. Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4 Reviewed-on: https://go-review.googlesource.com/c/go/+/651816 Reviewed-by: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> |
||
|
43e6525986 |
cmd/compile: load properly constant values from itabs
While looking at the SSA of following code, i noticed that these rules do not work properly, and the types are loaded indirectly through an itab, instead of statically. type M interface{ M() } type A interface{ A() } type Impl struct{} func (*Impl) M() {} func (*Impl) A() {} func main() { var a M = &Impl{} a.(A).A() } Change-Id: Ia275993f81a2e7302102d4ff87ac28586023d13c GitHub-Last-Rev: 4bfc9019172929d0b0f1c8a1b7eb28cdbc9b87ef GitHub-Pull-Request: golang/go#71784 Reviewed-on: https://go-review.googlesource.com/c/go/+/649500 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> |
||
|
d524e1eccd |
cmd/compile: on AMD64, turn x < 128 into x <= 127
x < 128 -> x <= 127 x >= 128 -> x > 127 This allows for shorter encoding as 127 fits into a single-byte immediate. archive/tar benchmark (Alder Lake 12600K) name old time/op new time/op delta /Writer/USTAR-16 1.46µs ± 0% 1.32µs ± 0% -9.43% (p=0.008 n=5+5) /Writer/GNU-16 1.85µs ± 1% 1.79µs ± 1% -3.47% (p=0.008 n=5+5) /Writer/PAX-16 3.21µs ± 0% 3.11µs ± 2% -2.96% (p=0.008 n=5+5) /Reader/USTAR-16 1.38µs ± 1% 1.37µs ± 0% ~ (p=0.127 n=5+4) /Reader/GNU-16 798ns ± 1% 800ns ± 2% ~ (p=0.548 n=5+5) /Reader/PAX-16 3.07µs ± 1% 3.00µs ± 0% -2.35% (p=0.008 n=5+5) [Geo mean] 1.76µs 1.70µs -3.15% compilecmp: hash/maphash hash/maphash.(*Hash).Write 517 -> 510 (-1.35%) runtime runtime.traceReadCPU 1626 -> 1615 (-0.68%) runtime [cmd/compile] runtime.traceReadCPU 1626 -> 1615 (-0.68%) math/rand/v2 type:.eq.[128]float32 65 -> 59 (-9.23%) bytes bytes.trimLeftUnicode 378 -> 373 (-1.32%) bytes.IndexAny 1189 -> 1157 (-2.69%) bytes.LastIndexAny 1256 -> 1239 (-1.35%) bytes.lastIndexFunc 263 -> 261 (-0.76%) strings strings.FieldsFuncSeq.func1 411 -> 399 (-2.92%) strings.EqualFold 625 -> 624 (-0.16%) strings.trimLeftUnicode 248 -> 231 (-6.85%) math/rand type:.eq.[128]float32 65 -> 59 (-9.23%) bytes [cmd/compile] bytes.LastIndexAny 1256 -> 1239 (-1.35%) bytes.lastIndexFunc 263 -> 261 (-0.76%) bytes.trimLeftUnicode 378 -> 373 (-1.32%) bytes.IndexAny 1189 -> 1157 (-2.69%) regexp/syntax regexp/syntax.(*parser).parseEscape 1113 -> 1102 (-0.99%) math/rand/v2 [cmd/compile] type:.eq.[128]float32 65 -> 59 (-9.23%) strings [cmd/compile] strings.EqualFold 625 -> 624 (-0.16%) strings.FieldsFuncSeq.func1 411 -> 399 (-2.92%) strings.trimLeftUnicode 248 -> 231 (-6.85%) math/rand [cmd/compile] type:.eq.[128]float32 65 -> 59 (-9.23%) regexp regexp.(*inputString).context 198 -> 197 (-0.51%) regexp.(*inputBytes).context 221 -> 212 (-4.07%) image/jpeg image/jpeg.(*decoder).processDQT 500 -> 491 (-1.80%) regexp/syntax [cmd/compile] regexp/syntax.(*parser).parseEscape 1113 -> 1102 (-0.99%) regexp [cmd/compile] regexp.(*inputString).context 198 -> 197 (-0.51%) regexp.(*inputBytes).context 221 -> 212 (-4.07%) encoding/csv encoding/csv.(*Writer).fieldNeedsQuotes 269 -> 266 (-1.12%) cmd/vendor/golang.org/x/sys/unix type:.eq.[131]struct 855 -> 823 (-3.74%) vendor/golang.org/x/text/unicode/norm vendor/golang.org/x/text/unicode/norm.nextDecomposed 4831 -> 4826 (-0.10%) vendor/golang.org/x/text/unicode/norm.(*Iter).returnSlice 281 -> 275 (-2.14%) vendor/golang.org/x/text/secure/bidirule vendor/golang.org/x/text/secure/bidirule.init.0 85 -> 83 (-2.35%) go/scanner go/scanner.isDigit 100 -> 98 (-2.00%) go/scanner.(*Scanner).next 431 -> 422 (-2.09%) go/scanner.isLetter 142 -> 124 (-12.68%) encoding/asn1 encoding/asn1.parseTagAndLength 1189 -> 1182 (-0.59%) encoding/asn1.makeField 3481 -> 3463 (-0.52%) text/scanner text/scanner.(*Scanner).next 1242 -> 1236 (-0.48%) archive/tar archive/tar.isASCII 133 -> 127 (-4.51%) archive/tar.(*Writer).writeRawFile 1206 -> 1198 (-0.66%) archive/tar.(*Reader).readHeader.func1 9 -> 7 (-22.22%) archive/tar.toASCII 393 -> 383 (-2.54%) archive/tar.splitUSTARPath 405 -> 396 (-2.22%) archive/tar.(*Writer).writePAXHeader.func1 627 -> 620 (-1.12%) text/template text/template.jsIsSpecial 59 -> 57 (-3.39%) go/doc go/doc.assumedPackageName 714 -> 701 (-1.82%) vendor/golang.org/x/net/http/httpguts vendor/golang.org/x/net/http/httpguts.headerValueContainsToken 965 -> 952 (-1.35%) vendor/golang.org/x/net/http/httpguts.tokenEqual 280 -> 269 (-3.93%) vendor/golang.org/x/net/http/httpguts.IsTokenRune 28 -> 26 (-7.14%) net/mail net/mail.isVchar 26 -> 24 (-7.69%) net/mail.isAtext 106 -> 104 (-1.89%) net/mail.(*Address).String 1084 -> 1052 (-2.95%) net/mail.isQtext 39 -> 37 (-5.13%) net/mail.isMultibyte 9 -> 7 (-22.22%) net/mail.isDtext 45 -> 43 (-4.44%) net/mail.(*addrParser).consumeQuotedString 1050 -> 1029 (-2.00%) net/mail.quoteString 741 -> 714 (-3.64%) cmd/internal/obj/s390x cmd/internal/obj/s390x.preprocess 6405 -> 6393 (-0.19%) cmd/internal/obj/x86 cmd/internal/obj/x86.toDisp8 303 -> 301 (-0.66%) fmt [cmd/compile] fmt.Fprintf 4726 -> 4662 (-1.35%) go/scanner [cmd/compile] go/scanner.(*Scanner).next 431 -> 422 (-2.09%) go/scanner.isLetter 142 -> 124 (-12.68%) go/scanner.isDigit 100 -> 98 (-2.00%) cmd/compile/internal/syntax cmd/compile/internal/syntax.(*source).nextch 879 -> 847 (-3.64%) cmd/vendor/golang.org/x/mod/module cmd/vendor/golang.org/x/mod/module.checkElem 1253 -> 1235 (-1.44%) cmd/vendor/golang.org/x/mod/module.escapeString 519 -> 517 (-0.39%) go/doc [cmd/compile] go/doc.assumedPackageName 714 -> 701 (-1.82%) cmd/compile/internal/syntax [cmd/compile] cmd/compile/internal/syntax.(*scanner).escape 1965 -> 1933 (-1.63%) cmd/compile/internal/syntax.(*scanner).next 8975 -> 8847 (-1.43%) cmd/internal/obj/s390x [cmd/compile] cmd/internal/obj/s390x.preprocess 6405 -> 6393 (-0.19%) cmd/internal/obj/x86 [cmd/compile] cmd/internal/obj/x86.toDisp8 303 -> 301 (-0.66%) cmd/internal/gcprog cmd/internal/gcprog.(*Writer).Repeat 688 -> 677 (-1.60%) cmd/internal/gcprog.(*Writer).varint 442 -> 439 (-0.68%) cmd/compile/internal/ir cmd/compile/internal/ir.splitPkg 331 -> 325 (-1.81%) cmd/compile/internal/ir [cmd/compile] cmd/compile/internal/ir.splitPkg 331 -> 325 (-1.81%) net/http net/http.containsDotDot.FieldsFuncSeq.func1 411 -> 399 (-2.92%) net/http.isNotToken 33 -> 30 (-9.09%) net/http.containsDotDot 606 -> 588 (-2.97%) net/http.isCookieNameValid 197 -> 191 (-3.05%) net/http.parsePattern 4330 -> 4317 (-0.30%) net/http.ParseCookie 1099 -> 1096 (-0.27%) net/http.validMethod 197 -> 187 (-5.08%) cmd/vendor/golang.org/x/text/unicode/norm cmd/vendor/golang.org/x/text/unicode/norm.(*Iter).returnSlice 281 -> 275 (-2.14%) cmd/vendor/golang.org/x/text/unicode/norm.nextDecomposed 4831 -> 4826 (-0.10%) net/http/cookiejar net/http/cookiejar.encode 1936 -> 1918 (-0.93%) expvar expvar.appendJSONQuote 972 -> 965 (-0.72%) cmd/cgo/internal/test cmd/cgo/internal/test.stack128 116 -> 114 (-1.72%) cmd/vendor/rsc.io/markdown cmd/vendor/rsc.io/markdown.newATXHeading 1637 -> 1628 (-0.55%) cmd/vendor/rsc.io/markdown.isUnicodePunct 197 -> 179 (-9.14%) Change-Id: I578bdf42ef229d687d526e378d697ced51e1880c Reviewed-on: https://go-review.googlesource.com/c/go/+/639935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> |
||
|
beac2f7d3b |
cmd/compile: fix sign extension of paired 32-bit loads on arm64
Fixes #71759 Change-Id: Iab05294ac933cc9972949158d3fe2bdc3073df5e Reviewed-on: https://go-review.googlesource.com/c/go/+/649895 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> |
||
|
187fd2698d |
cmd/compile: make write barrier code amenable to paired loads/stores
It currently isn't because it does load/store/load/store/... Rework to do overwrite processing in pairs so it is instead load/load/store/store/... Change-Id: If7be629bc4048da5f2386dafb8f05759b79e9e2b Reviewed-on: https://go-review.googlesource.com/c/go/+/631495 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
a0029e95e5 |
cmd/compile: regalloc: handle desired registers of 2-output insns
Particularly with 2-word load instructions, this becomes important. Classic example is: func f(p *string) string { return *p } We want the two loads to put the return values directly into the two ABI return registers. At this point in the stack, cmd/go is 1.1% smaller. Change-Id: I51fd1710238e81d15aab2bfb816d73c8e7c207b1 Reviewed-on: https://go-review.googlesource.com/c/go/+/631137 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
20d7c57422 |
cmd/compile: pair loads and stores on arm64
Look for possible paired load/store operations on arm64. I don't expect this would be a lot faster, but it will save binary space, and indirectly through the icache at least a bit of time. Change-Id: I4dd73b0e6329c4659b7453998f9b75320fcf380b Reviewed-on: https://go-review.googlesource.com/c/go/+/629256 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> |
||
|
89c2f282dc |
cmd/compile: move []byte->string map key optimization to ssa
If we call slicebytetostring immediately (with no intervening writes) before calling map access or delete functions with the resulting string as the key, then we can just use the ptr/len of the slicebytetostring argument as the key. This avoids an allocation. Fixes #44898 Update #71132 There's old code in cmd/compile/internal/walk/order.go that handles some of these cases. 1. m[string(b)] 2. s := string(b); m[s] 3. m[[2]string{string(b1),string(b2)}] The old code handled cases 1&3. The new code handles cases 1&2. We'll leave the old code around to keep 3 working, although it seems not terribly common. Case 2 happens particularly after inlining, so it is pretty common. Change-Id: I8913226ca79d2c65f4e2bd69a38ac8c976a57e43 Reviewed-on: https://go-review.googlesource.com/c/go/+/640656 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
43b7e67040 |
cmd/compile: lower x*z + y to FMA if FMA enabled
There is a generic opcode for FMA, but we don't use it in rewrite rules. This is maybe because some archs, like WASM and MIPS don't have a late lowering rule for it. Fixes #71204 Intel Alder Lake 12600k (GOAMD64=v3): math: name old time/op new time/op delta Acos-16 4.58ns ± 0% 3.36ns ± 0% -26.68% (p=0.008 n=5+5) Acosh-16 8.04ns ± 1% 6.44ns ± 0% -19.95% (p=0.008 n=5+5) Asin-16 4.28ns ± 0% 3.32ns ± 0% -22.24% (p=0.008 n=5+5) Asinh-16 9.92ns ± 0% 8.62ns ± 0% -13.13% (p=0.008 n=5+5) Atan-16 2.31ns ± 0% 1.84ns ± 0% -20.02% (p=0.008 n=5+5) Atanh-16 7.79ns ± 0% 7.03ns ± 0% -9.67% (p=0.008 n=5+5) Atan2-16 3.93ns ± 0% 3.52ns ± 0% -10.35% (p=0.000 n=5+4) Cbrt-16 4.62ns ± 0% 4.41ns ± 0% -4.57% (p=0.016 n=4+5) Ceil-16 0.14ns ± 1% 0.14ns ± 2% ~ (p=0.103 n=5+5) Copysign-16 0.33ns ± 0% 0.33ns ± 0% +0.03% (p=0.029 n=4+4) Cos-16 4.87ns ± 0% 4.75ns ± 0% -2.44% (p=0.016 n=5+4) Cosh-16 4.86ns ± 0% 4.86ns ± 0% ~ (p=0.317 n=5+5) Erf-16 2.71ns ± 0% 2.25ns ± 0% -16.69% (p=0.008 n=5+5) Erfc-16 3.06ns ± 0% 2.67ns ± 0% -13.00% (p=0.016 n=5+4) Erfinv-16 3.88ns ± 0% 2.84ns ± 3% -26.83% (p=0.008 n=5+5) Erfcinv-16 4.08ns ± 0% 3.01ns ± 1% -26.27% (p=0.008 n=5+5) Exp-16 3.29ns ± 0% 3.37ns ± 2% +2.64% (p=0.016 n=4+5) ExpGo-16 8.44ns ± 0% 7.48ns ± 1% -11.37% (p=0.008 n=5+5) Expm1-16 4.46ns ± 0% 3.69ns ± 2% -17.26% (p=0.016 n=4+5) Exp2-16 8.20ns ± 0% 7.39ns ± 2% -9.94% (p=0.008 n=5+5) Exp2Go-16 8.26ns ± 0% 7.23ns ± 0% -12.49% (p=0.016 n=4+5) Abs-16 0.26ns ± 3% 0.22ns ± 1% -16.34% (p=0.008 n=5+5) Dim-16 0.38ns ± 1% 0.40ns ± 2% +5.02% (p=0.008 n=5+5) Floor-16 0.11ns ± 1% 0.17ns ± 4% +54.99% (p=0.008 n=5+5) Max-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.619 n=5+5) Min-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.484 n=5+5) Mod-16 13.4ns ± 1% 12.8ns ± 0% -4.21% (p=0.016 n=5+4) Frexp-16 1.70ns ± 0% 1.71ns ± 0% +0.46% (p=0.008 n=5+5) Gamma-16 3.97ns ± 0% 3.97ns ± 0% ~ (p=0.643 n=5+5) Hypot-16 2.11ns ± 0% 2.11ns ± 0% ~ (p=0.762 n=5+5) HypotGo-16 2.48ns ± 4% 2.26ns ± 0% -8.94% (p=0.008 n=5+5) Ilogb-16 1.67ns ± 0% 1.67ns ± 0% -0.07% (p=0.048 n=5+5) J0-16 19.8ns ± 0% 19.3ns ± 0% ~ (p=0.079 n=4+5) J1-16 19.4ns ± 0% 18.9ns ± 0% -2.63% (p=0.000 n=5+4) Jn-16 41.5ns ± 0% 40.6ns ± 0% -2.32% (p=0.016 n=4+5) Ldexp-16 2.26ns ± 0% 2.26ns ± 0% ~ (p=0.683 n=5+5) Lgamma-16 4.40ns ± 0% 4.21ns ± 0% -4.21% (p=0.008 n=5+5) Log-16 4.05ns ± 0% 4.05ns ± 0% ~ (all equal) Logb-16 1.69ns ± 0% 1.69ns ± 0% ~ (p=0.429 n=5+5) Log1p-16 5.00ns ± 0% 3.99ns ± 0% -20.14% (p=0.008 n=5+5) Log10-16 4.22ns ± 0% 4.21ns ± 0% -0.15% (p=0.008 n=5+5) Log2-16 2.27ns ± 0% 2.25ns ± 0% -0.94% (p=0.008 n=5+5) Modf-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.492 n=5+5) Nextafter32-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.079 n=4+5) Nextafter64-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.095 n=4+5) PowInt-16 10.8ns ± 0% 10.8ns ± 0% ~ (all equal) PowFrac-16 25.3ns ± 0% 25.3ns ± 0% -0.09% (p=0.000 n=5+4) Pow10Pos-16 0.52ns ± 1% 0.52ns ± 0% ~ (p=0.810 n=5+5) Pow10Neg-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.381 n=5+5) Round-16 0.93ns ± 0% 0.93ns ± 0% ~ (p=0.056 n=5+5) RoundToEven-16 1.64ns ± 0% 1.64ns ± 0% ~ (all equal) Remainder-16 12.4ns ± 2% 12.0ns ± 0% -3.27% (p=0.008 n=5+5) Signbit-16 0.37ns ± 0% 0.37ns ± 0% -0.19% (p=0.008 n=5+5) Sin-16 4.04ns ± 0% 3.92ns ± 0% -3.13% (p=0.000 n=4+5) Sincos-16 5.99ns ± 0% 5.80ns ± 0% -3.03% (p=0.008 n=5+5) Sinh-16 5.22ns ± 0% 5.22ns ± 0% ~ (p=0.651 n=5+4) SqrtIndirect-16 0.41ns ± 0% 0.41ns ± 0% ~ (p=0.333 n=4+5) SqrtLatency-16 2.66ns ± 0% 2.66ns ± 0% ~ (p=0.079 n=4+5) SqrtIndirectLatency-16 2.66ns ± 0% 2.66ns ± 0% ~ (p=1.000 n=5+5) SqrtGoLatency-16 30.1ns ± 0% 28.6ns ± 1% -4.84% (p=0.008 n=5+5) SqrtPrime-16 645ns ± 0% 645ns ± 0% ~ (p=0.095 n=5+4) Tan-16 4.21ns ± 0% 4.09ns ± 0% -2.76% (p=0.029 n=4+4) Tanh-16 5.36ns ± 0% 5.36ns ± 0% ~ (p=0.444 n=5+5) Trunc-16 0.12ns ± 6% 0.11ns ± 1% -6.79% (p=0.008 n=5+5) Y0-16 19.2ns ± 0% 18.7ns ± 0% -2.52% (p=0.000 n=5+4) Y1-16 19.1ns ± 0% 18.4ns ± 0% ~ (p=0.079 n=4+5) Yn-16 40.7ns ± 0% 39.5ns ± 0% -2.82% (p=0.008 n=5+5) Float64bits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.603 n=5+5) Float64frombits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.984 n=4+5) Float32bits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.778 n=4+5) Float32frombits-16 0.21ns ± 0% 0.20ns ± 0% ~ (p=0.397 n=5+5) FMA-16 0.82ns ± 0% 0.82ns ± 0% +0.02% (p=0.029 n=4+4) [Geo mean] 2.87ns 2.74ns -4.61% math/cmplx: name old time/op new time/op delta Abs-16 2.07ns ± 0% 2.05ns ± 0% -0.70% (p=0.016 n=5+4) Acos-16 36.5ns ± 0% 35.7ns ± 0% -2.33% (p=0.029 n=4+4) Acosh-16 37.0ns ± 0% 36.2ns ± 0% -2.20% (p=0.008 n=5+5) Asin-16 36.5ns ± 0% 35.7ns ± 0% -2.29% (p=0.008 n=5+5) Asinh-16 33.5ns ± 0% 31.6ns ± 0% -5.51% (p=0.008 n=5+5) Atan-16 15.5ns ± 0% 13.9ns ± 0% -10.61% (p=0.008 n=5+5) Atanh-16 15.0ns ± 0% 13.6ns ± 0% -9.73% (p=0.008 n=5+5) Conj-16 0.11ns ± 5% 0.11ns ± 1% ~ (p=0.421 n=5+5) Cos-16 12.3ns ± 0% 12.2ns ± 0% -0.60% (p=0.000 n=4+5) Cosh-16 12.1ns ± 0% 12.0ns ± 0% ~ (p=0.079 n=4+5) Exp-16 10.0ns ± 0% 9.8ns ± 0% -1.77% (p=0.008 n=5+5) Log-16 14.5ns ± 0% 13.7ns ± 0% -5.67% (p=0.008 n=5+5) Log10-16 14.5ns ± 0% 13.7ns ± 0% -5.55% (p=0.000 n=5+4) Phase-16 5.11ns ± 0% 4.25ns ± 0% -16.90% (p=0.008 n=5+5) Polar-16 7.12ns ± 0% 6.35ns ± 0% -10.90% (p=0.008 n=5+5) Pow-16 64.3ns ± 0% 63.7ns ± 0% -0.97% (p=0.008 n=5+5) Rect-16 5.74ns ± 0% 5.58ns ± 0% -2.73% (p=0.016 n=4+5) Sin-16 12.2ns ± 0% 12.2ns ± 0% -0.54% (p=0.000 n=4+5) Sinh-16 12.1ns ± 0% 12.0ns ± 0% -0.58% (p=0.000 n=5+4) Sqrt-16 5.30ns ± 0% 5.18ns ± 0% -2.36% (p=0.008 n=5+5) Tan-16 22.7ns ± 0% 22.6ns ± 0% -0.33% (p=0.008 n=5+5) Tanh-16 21.2ns ± 0% 20.9ns ± 0% -1.32% (p=0.008 n=5+5) [Geo mean] 11.3ns 10.8ns -3.97% Change-Id: Idcc4b357ba68477929c126289e5095b27a827b1b Reviewed-on: https://go-review.googlesource.com/c/go/+/646335 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
a7e331e671 |
cmd/compile: implement signed loads from read-only memory
In addition to unsigned loads which already exist. This helps code that does switches on strings to constant-fold the switch away when the string being switched on is constant. Fixes #71699 Change-Id: If3051af0f7255d2a573da6f96b153a987a7f159d Reviewed-on: https://go-review.googlesource.com/c/go/+/649295 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@google.com> |
||
|
072eea9b3b |
cmd/compile: avoid ifaceeq call if we know the interface is direct
We can just use == if the interface is direct. Fixes #70738 Change-Id: Ia9a644791a370fec969c04c42d28a9b58f16911f Reviewed-on: https://go-review.googlesource.com/c/go/+/635435 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
cd595be6d6 |
cmd/compile: prefer an add when shifting left by 1
ADD(Q|L) has generally twice the throughput. Came up in CL 626998. Throughput by arch: Zen 4: SHLL (R64, 1): 0.5 ADD (R64, R64): 0.25 Intel Alder Lake: SHLL (R64, 1): 0.5 ADD (R64, R64): 0.2 Intel Haswell: SHLL (R64, 1): 0.5 ADD (R64, R64): 0.25 Also include a minor opt for: (x + x) << c -> x << (c + 1) Before this, the code: func addShift(x int64) int64 { return (x + x) << 1 } emitted two instructions: ADDQ AX, AX SHLQ $1, AX but we can do it in a single shift: SHLQ $2, AX Add a codegen test for clearing the last bit. compilecmp linux/amd64: math math.sqrt 243 -> 242 (-0.41%) math [cmd/compile] math.sqrt 243 -> 242 (-0.41%) runtime runtime.selectgo 5455 -> 5445 (-0.18%) runtime.sysargs 665 -> 662 (-0.45%) runtime.isPinned 145 -> 141 (-2.76%) runtime.atoi64 198 -> 194 (-2.02%) runtime.setPinned 714 -> 709 (-0.70%) runtime [cmd/compile] runtime.sysargs 665 -> 662 (-0.45%) runtime.setPinned 714 -> 709 (-0.70%) runtime.atoi64 198 -> 194 (-2.02%) runtime.isPinned 145 -> 141 (-2.76%) strconv strconv.computeBounds 109 -> 107 (-1.83%) strconv.FormatInt 201 -> 197 (-1.99%) strconv.ryuFtoaShortest 1298 -> 1266 (-2.47%) strconv.small 144 -> 134 (-6.94%) strconv.AppendInt 357 -> 344 (-3.64%) strconv.ryuDigits32 490 -> 488 (-0.41%) strconv.AppendUint 342 -> 340 (-0.58%) strconv [cmd/compile] strconv.FormatInt 201 -> 197 (-1.99%) strconv.ryuFtoaShortest 1298 -> 1266 (-2.47%) strconv.ryuDigits32 490 -> 488 (-0.41%) strconv.AppendUint 342 -> 340 (-0.58%) strconv.computeBounds 109 -> 107 (-1.83%) strconv.small 144 -> 134 (-6.94%) strconv.AppendInt 357 -> 344 (-3.64%) image image.Rectangle.Inset 101 -> 97 (-3.96%) regexp/syntax regexp/syntax.inCharClass.func1 111 -> 110 (-0.90%) regexp/syntax.(*compiler).quest 586 -> 573 (-2.22%) regexp/syntax.ranges.Less 153 -> 150 (-1.96%) regexp/syntax.(*compiler).loop 583 -> 568 (-2.57%) time time.Time.Before 179 -> 161 (-10.06%) time.Time.Compare 189 -> 166 (-12.17%) time.Time.Sub 444 -> 425 (-4.28%) time.Time.UnixMicro 106 -> 95 (-10.38%) time.div 592 -> 587 (-0.84%) time.Time.UnixNano 85 -> 78 (-8.24%) time.(*Time).UnixMilli 141 -> 140 (-0.71%) time.Time.UnixMilli 106 -> 95 (-10.38%) time.(*Time).UnixMicro 141 -> 140 (-0.71%) time.Time.After 179 -> 161 (-10.06%) time.Time.Equal 170 -> 150 (-11.76%) time.Time.AppendBinary 766 -> 757 (-1.17%) time.Time.IsZero 74 -> 66 (-10.81%) time.(*Time).UnixNano 124 -> 113 (-8.87%) time.(*Time).IsZero 113 -> 108 (-4.42%) regexp regexp.(*Regexp).FindAllStringSubmatch.func1 590 -> 569 (-3.56%) regexp.QuoteMeta 485 -> 469 (-3.30%) regexp/syntax [cmd/compile] regexp/syntax.inCharClass.func1 111 -> 110 (-0.90%) regexp/syntax.(*compiler).loop 583 -> 568 (-2.57%) regexp/syntax.(*compiler).quest 586 -> 573 (-2.22%) regexp/syntax.ranges.Less 153 -> 150 (-1.96%) encoding/base64 encoding/base64.decodedLen 92 -> 90 (-2.17%) encoding/base64.(*Encoding).DecodedLen 99 -> 97 (-2.02%) time [cmd/compile] time.(*Time).IsZero 113 -> 108 (-4.42%) time.Time.IsZero 74 -> 66 (-10.81%) time.(*Time).UnixNano 124 -> 113 (-8.87%) time.Time.UnixMilli 106 -> 95 (-10.38%) time.Time.Equal 170 -> 150 (-11.76%) time.Time.UnixMicro 106 -> 95 (-10.38%) time.(*Time).UnixMicro 141 -> 140 (-0.71%) time.Time.Before 179 -> 161 (-10.06%) time.Time.UnixNano 85 -> 78 (-8.24%) time.Time.AppendBinary 766 -> 757 (-1.17%) time.div 592 -> 587 (-0.84%) time.Time.After 179 -> 161 (-10.06%) time.Time.Compare 189 -> 166 (-12.17%) time.(*Time).UnixMilli 141 -> 140 (-0.71%) time.Time.Sub 444 -> 425 (-4.28%) index/suffixarray index/suffixarray.sais_8_32 1677 -> 1645 (-1.91%) index/suffixarray.sais_32 1677 -> 1645 (-1.91%) index/suffixarray.sais_64 1677 -> 1654 (-1.37%) index/suffixarray.sais_8_64 1677 -> 1654 (-1.37%) index/suffixarray.writeInt 249 -> 247 (-0.80%) os os.Expand 1070 -> 1051 (-1.78%) os.Chtimes 787 -> 774 (-1.65%) regexp [cmd/compile] regexp.(*Regexp).FindAllStringSubmatch.func1 590 -> 569 (-3.56%) regexp.QuoteMeta 485 -> 469 (-3.30%) encoding/base64 [cmd/compile] encoding/base64.decodedLen 92 -> 90 (-2.17%) encoding/base64.(*Encoding).DecodedLen 99 -> 97 (-2.02%) encoding/hex encoding/hex.Encode 138 -> 136 (-1.45%) encoding/hex.(*decoder).Read 830 -> 824 (-0.72%) crypto/des crypto/des.initFeistelBox 235 -> 229 (-2.55%) crypto/des.cryptBlock 549 -> 538 (-2.00%) os [cmd/compile] os.Chtimes 787 -> 774 (-1.65%) os.Expand 1070 -> 1051 (-1.78%) math/big math/big.newFloat 238 -> 223 (-6.30%) math/big.nat.mul 2138 -> 2122 (-0.75%) math/big.karatsubaSqr 1372 -> 1369 (-0.22%) math/big.(*Float).sqrtInverse 895 -> 878 (-1.90%) math/big.basicSqr 1032 -> 1017 (-1.45%) cmd/vendor/golang.org/x/sys/unix cmd/vendor/golang.org/x/sys/unix.TimeToTimespec 72 -> 66 (-8.33%) encoding/json encoding/json.Indent 404 -> 403 (-0.25%) encoding/json.MarshalIndent 303 -> 297 (-1.98%) testing testing.(*T).Deadline 84 -> 82 (-2.38%) testing.(*M).Run 3545 -> 3525 (-0.56%) archive/zip archive/zip.headerFileInfo.ModTime 229 -> 223 (-2.62%) encoding/gob encoding/gob.(*encoderState).encodeInt 474 -> 469 (-1.05%) crypto/elliptic crypto/elliptic.Marshal 728 -> 714 (-1.92%) debug/buildinfo debug/buildinfo.readString 325 -> 315 (-3.08%) image/png image/png.(*decoder).readImagePass 10866 -> 10834 (-0.29%) archive/tar archive/tar.Header.allowedFormats.func3 1768 -> 1736 (-1.81%) archive/tar.formatPAXTime 389 -> 358 (-7.97%) archive/tar.(*Writer).writeGNUHeader 741 -> 727 (-1.89%) archive/tar.readGNUSparseMap0x1 709 -> 695 (-1.97%) archive/tar.(*Writer).templateV7Plus 915 -> 909 (-0.66%) crypto/internal/cryptotest crypto/internal/cryptotest.TestHash.func4 890 -> 879 (-1.24%) crypto/internal/cryptotest.TestStream.func6.1 646 -> 645 (-0.15%) crypto/internal/cryptotest.testCipher.func3 1300 -> 1289 (-0.85%) internal/pkgbits internal/pkgbits.(*Encoder).Int64 113 -> 103 (-8.85%) internal/pkgbits.(*Encoder).rawVarint 74 -> 72 (-2.70%) testing/quick testing/quick.(*Config).getRand 316 -> 315 (-0.32%) log/slog log/slog.TimeValue 489 -> 479 (-2.04%) runtime/pprof runtime/pprof.(*profileBuilder).build 2341 -> 2322 (-0.81%) internal/coverage/cfile internal/coverage/cfile.(*emitState).openMetaFile 824 -> 822 (-0.24%) internal/coverage/cfile.(*emitState).openCounterFile 904 -> 892 (-1.33%) cmd/internal/objabi cmd/internal/objabi.expandArgs 1177 -> 1169 (-0.68%) crypto/ecdsa crypto/ecdsa.pointFromAffine 1162 -> 1144 (-1.55%) net net.minNonzeroTime 313 -> 308 (-1.60%) net.cgoLookupAddrPTR 812 -> 797 (-1.85%) net.(*IPNet).String 851 -> 827 (-2.82%) net.IP.AppendText 488 -> 471 (-3.48%) net.IPMask.String 281 -> 270 (-3.91%) net.partialDeadline 374 -> 366 (-2.14%) net.hexString 249 -> 240 (-3.61%) net.IP.String 454 -> 453 (-0.22%) internal/fuzz internal/fuzz.newPcgRand 240 -> 234 (-2.50%) crypto/x509 crypto/x509.(*Certificate).isValid 2642 -> 2611 (-1.17%) cmd/internal/obj/s390x cmd/internal/obj/s390x.buildop 33676 -> 33644 (-0.10%) encoding/hex [cmd/compile] encoding/hex.(*decoder).Read 830 -> 824 (-0.72%) encoding/hex.Encode 138 -> 136 (-1.45%) cmd/internal/objabi [cmd/compile] cmd/internal/objabi.expandArgs 1177 -> 1169 (-0.68%) math/big [cmd/compile] math/big.(*Float).sqrtInverse 895 -> 878 (-1.90%) math/big.nat.mul 2138 -> 2122 (-0.75%) math/big.karatsubaSqr 1372 -> 1369 (-0.22%) math/big.basicSqr 1032 -> 1017 (-1.45%) math/big.newFloat 238 -> 223 (-6.30%) encoding/json [cmd/compile] encoding/json.MarshalIndent 303 -> 297 (-1.98%) encoding/json.Indent 404 -> 403 (-0.25%) cmd/covdata main.(*metaMerge).emitCounters 985 -> 973 (-1.22%) runtime/pprof [cmd/compile] runtime/pprof.(*profileBuilder).build 2341 -> 2322 (-0.81%) cmd/compile/internal/syntax cmd/compile/internal/syntax.(*source).fill 722 -> 703 (-2.63%) cmd/dist main.runInstall 19081 -> 19049 (-0.17%) crypto/tls crypto/tls.extractPadding 176 -> 175 (-0.57%) slices.Clone[[]crypto/tls.SignatureScheme,crypto/tls.SignatureScheme] 253 -> 247 (-2.37%) slices.Clone[[]uint16,uint16] 253 -> 247 (-2.37%) slices.Clone[[]crypto/tls.CurveID,crypto/tls.CurveID] 253 -> 247 (-2.37%) crypto/tls.(*Config).cipherSuites 335 -> 326 (-2.69%) slices.DeleteFunc[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 437 -> 434 (-0.69%) crypto/tls.dial 1349 -> 1339 (-0.74%) slices.DeleteFunc[go.shape.[]uint16,go.shape.uint16] 437 -> 434 (-0.69%) internal/pkgbits [cmd/compile] internal/pkgbits.(*Encoder).Int64 113 -> 103 (-8.85%) internal/pkgbits.(*Encoder).rawVarint 74 -> 72 (-2.70%) cmd/compile/internal/syntax [cmd/compile] cmd/compile/internal/syntax.(*source).fill 722 -> 703 (-2.63%) cmd/internal/obj/s390x [cmd/compile] cmd/internal/obj/s390x.buildop 33676 -> 33644 (-0.10%) cmd/go/internal/trace cmd/go/internal/trace.Flow 910 -> 886 (-2.64%) cmd/go/internal/trace.(*Span).Done 311 -> 304 (-2.25%) cmd/go/internal/trace.StartSpan 620 -> 615 (-0.81%) cmd/internal/script cmd/internal/script.(*Engine).Execute.func2 534 -> 528 (-1.12%) cmd/link/internal/loader cmd/link/internal/loader.(*Loader).SetSymSect 344 -> 338 (-1.74%) net/http net/http.(*Transport).queueForIdleConn 1797 -> 1766 (-1.73%) net/http.(*Transport).getConn 2149 -> 2131 (-0.84%) net/http.(*http2ClientConn).tooIdleLocked 207 -> 197 (-4.83%) net/http.(*http2responseWriter).SetWriteDeadline.func1 520 -> 508 (-2.31%) net/http.(*Cookie).Valid 837 -> 818 (-2.27%) net/http.(*http2responseWriter).SetReadDeadline 373 -> 357 (-4.29%) net/http.checkIfRange 701 -> 690 (-1.57%) net/http.(*http2SettingsFrame).Value 325 -> 298 (-8.31%) net/http.(*http2SettingsFrame).HasDuplicates 777 -> 767 (-1.29%) net/http.(*Server).Serve 1746 -> 1739 (-0.40%) net/http.http2traceGotConn 569 -> 556 (-2.28%) net/http/pprof net/http/pprof.collectProfile 242 -> 239 (-1.24%) cmd/compile/internal/coverage cmd/compile/internal/coverage.metaHashAndLen 439 -> 438 (-0.23%) cmd/vendor/golang.org/x/telemetry/internal/upload cmd/vendor/golang.org/x/telemetry/internal/upload.(*uploader).findWork 4570 -> 4540 (-0.66%) cmd/vendor/golang.org/x/telemetry/internal/upload.(*uploader).reports 3604 -> 3572 (-0.89%) cmd/compile/internal/coverage [cmd/compile] cmd/compile/internal/coverage.metaHashAndLen 439 -> 438 (-0.23%) cmd/vendor/golang.org/x/text/language cmd/vendor/golang.org/x/text/language.regionGroupDist 287 -> 284 (-1.05%) cmd/go/internal/vcweb cmd/go/internal/vcweb.(*Server).overview.func1 1045 -> 1041 (-0.38%) cmd/go/internal/vcs cmd/go/internal/vcs.expand 761 -> 741 (-2.63%) cmd/compile/internal/inline/inlheur slices.stableCmpFunc[go.shape.struct 2300 -> 2284 (-0.70%) cmd/compile/internal/inline/inlheur [cmd/compile] slices.stableCmpFunc[go.shape.struct 2300 -> 2284 (-0.70%) cmd/go/internal/modfetch/codehost cmd/go/internal/modfetch/codehost.bzrParseStat 2217 -> 2213 (-0.18%) cmd/link/internal/ld cmd/link/internal/ld.decodetypeStructFieldCount 157 -> 152 (-3.18%) cmd/link/internal/ld.(*Link).address 12559 -> 12495 (-0.51%) cmd/link/internal/ld.(*dodataState).allocateDataSections 18345 -> 18205 (-0.76%) cmd/link/internal/ld.elfshreloc 618 -> 616 (-0.32%) cmd/link/internal/ld.(*deadcodePass).decodetypeMethods 794 -> 779 (-1.89%) cmd/link/internal/ld.(*dodataState).assignDsymsToSection 668 -> 663 (-0.75%) cmd/link/internal/ld.relocSectFn 285 -> 284 (-0.35%) cmd/link/internal/ld.decodetypeIfaceMethodCount 146 -> 144 (-1.37%) cmd/link/internal/ld.decodetypeArrayLen 157 -> 152 (-3.18%) cmd/link/internal/arm64 cmd/link/internal/arm64.gensymlate.func1 895 -> 888 (-0.78%) cmd/go/internal/modload cmd/go/internal/modload.queryProxy.func3 1029 -> 1012 (-1.65%) cmd/go/internal/load cmd/go/internal/load.(*Package).setBuildInfo 8453 -> 8447 (-0.07%) cmd/go/internal/clean cmd/go/internal/clean.runClean 2120 -> 2104 (-0.75%) cmd/compile/internal/ssa cmd/compile/internal/ssa.(*poset).aliasnodes 2010 -> 1978 (-1.59%) cmd/compile/internal/ssa.rewriteValueARM64_OpARM64MOVHstoreidx2 730 -> 719 (-1.51%) cmd/compile/internal/ssa.(*debugState).buildLocationLists 3326 -> 3294 (-0.96%) cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDLconst 3069 -> 2941 (-4.17%) cmd/compile/internal/ssa.(*debugState).processValue 9756 -> 9724 (-0.33%) cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDQconst 3069 -> 2941 (-4.17%) cmd/compile/internal/ssa.(*poset).mergeroot 1079 -> 1054 (-2.32%) cmd/compile/internal/ssa [cmd/compile] cmd/compile/internal/ssa.rewriteValueARM64_OpARM64MOVHstoreidx2 730 -> 719 (-1.51%) cmd/compile/internal/ssa.(*poset).aliasnodes 2010 -> 1978 (-1.59%) cmd/compile/internal/ssa.(*poset).mergeroot 1079 -> 1054 (-2.32%) cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDQconst 3069 -> 2941 (-4.17%) cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDLconst 3069 -> 2941 (-4.17%) file before after Δ % math/bits.s 2352 2354 +2 +0.085% math/bits [cmd/compile].s 2352 2354 +2 +0.085% math.s 35675 35674 -1 -0.003% math [cmd/compile].s 35675 35674 -1 -0.003% runtime.s 577251 577245 -6 -0.001% runtime [cmd/compile].s 642419 642438 +19 +0.003% sort.s 37434 37435 +1 +0.003% strconv.s 48391 48343 -48 -0.099% sort [cmd/compile].s 37434 37435 +1 +0.003% bufio.s 21386 21418 +32 +0.150% strconv [cmd/compile].s 48391 48343 -48 -0.099% image.s 34978 35022 +44 +0.126% regexp/syntax.s 81719 81781 +62 +0.076% time.s 94341 94184 -157 -0.166% regexp.s 60411 60399 -12 -0.020% bufio [cmd/compile].s 21512 21544 +32 +0.149% encoding/binary.s 34062 34087 +25 +0.073% regexp/syntax [cmd/compile].s 81719 81781 +62 +0.076% encoding/base64.s 11907 11903 -4 -0.034% time [cmd/compile].s 94341 94184 -157 -0.166% index/suffixarray.s 41633 41527 -106 -0.255% os.s 101770 101738 -32 -0.031% regexp [cmd/compile].s 60411 60399 -12 -0.020% encoding/binary [cmd/compile].s 37173 37198 +25 +0.067% encoding/base64 [cmd/compile].s 11907 11903 -4 -0.034% os/exec.s 23900 23907 +7 +0.029% encoding/hex.s 6038 6030 -8 -0.132% crypto/des.s 5073 5056 -17 -0.335% os [cmd/compile].s 102030 101998 -32 -0.031% vendor/golang.org/x/net/http2/hpack.s 22027 22033 +6 +0.027% math/big.s 164808 164753 -55 -0.033% cmd/vendor/golang.org/x/sys/unix.s 121450 121444 -6 -0.005% encoding/json.s 110294 110287 -7 -0.006% testing.s 115303 115281 -22 -0.019% archive/zip.s 65329 65325 -4 -0.006% os/user.s 10078 10080 +2 +0.020% encoding/gob.s 143788 143783 -5 -0.003% crypto/elliptic.s 30686 30704 +18 +0.059% go/doc/comment.s 49401 49433 +32 +0.065% debug/buildinfo.s 9095 9085 -10 -0.110% image/png.s 36113 36081 -32 -0.089% archive/tar.s 71994 71897 -97 -0.135% crypto/internal/cryptotest.s 60872 60849 -23 -0.038% internal/pkgbits.s 20441 20429 -12 -0.059% testing/quick.s 8236 8235 -1 -0.012% log/slog.s 77568 77558 -10 -0.013% internal/trace/internal/oldtrace.s 52885 52896 +11 +0.021% runtime/pprof.s 123978 123969 -9 -0.007% internal/coverage/cfile.s 25198 25184 -14 -0.056% cmd/internal/objabi.s 19954 19946 -8 -0.040% crypto/ecdsa.s 29159 29141 -18 -0.062% log/slog/internal/benchmarks.s 6694 6695 +1 +0.015% net.s 299569 299503 -66 -0.022% os/exec [cmd/compile].s 23888 23895 +7 +0.029% internal/trace.s 179226 179240 +14 +0.008% internal/fuzz.s 86190 86191 +1 +0.001% crypto/x509.s 177195 177164 -31 -0.017% cmd/internal/obj/s390x.s 121642 121610 -32 -0.026% cmd/internal/obj/ppc64.s 140118 140122 +4 +0.003% encoding/hex [cmd/compile].s 6149 6141 -8 -0.130% cmd/internal/objabi [cmd/compile].s 19954 19946 -8 -0.040% cmd/internal/obj/arm64.s 158523 158555 +32 +0.020% go/doc/comment [cmd/compile].s 49512 49544 +32 +0.065% math/big [cmd/compile].s 166394 166339 -55 -0.033% encoding/json [cmd/compile].s 110712 110705 -7 -0.006% cmd/covdata.s 39699 39687 -12 -0.030% runtime/pprof [cmd/compile].s 125209 125200 -9 -0.007% cmd/compile/internal/syntax.s 181755 181736 -19 -0.010% cmd/dist.s 177893 177861 -32 -0.018% crypto/tls.s 389157 389113 -44 -0.011% internal/pkgbits [cmd/compile].s 41644 41632 -12 -0.029% cmd/compile/internal/syntax [cmd/compile].s 196105 196086 -19 -0.010% cmd/compile/internal/types.s 71315 71345 +30 +0.042% cmd/internal/obj/s390x [cmd/compile].s 121733 121701 -32 -0.026% cmd/go/internal/trace.s 4796 4760 -36 -0.751% cmd/internal/obj/arm64 [cmd/compile].s 168120 168147 +27 +0.016% cmd/internal/obj/ppc64 [cmd/compile].s 140219 140223 +4 +0.003% cmd/internal/script.s 83442 83436 -6 -0.007% cmd/link/internal/loader.s 93299 93294 -5 -0.005% net/http.s 620639 620472 -167 -0.027% net/http/pprof.s 35016 35013 -3 -0.009% cmd/compile/internal/coverage.s 6668 6667 -1 -0.015% cmd/vendor/golang.org/x/telemetry/internal/upload.s 34210 34148 -62 -0.181% cmd/compile/internal/coverage [cmd/compile].s 6664 6663 -1 -0.015% cmd/vendor/golang.org/x/text/language.s 48077 48074 -3 -0.006% cmd/go/internal/vcweb.s 45193 45189 -4 -0.009% cmd/go/internal/vcs.s 44749 44729 -20 -0.045% cmd/compile/internal/inline/inlheur.s 83758 83742 -16 -0.019% cmd/compile/internal/inline/inlheur [cmd/compile].s 84773 84757 -16 -0.019% cmd/go/internal/modfetch/codehost.s 89098 89094 -4 -0.004% cmd/trace.s 257550 257564 +14 +0.005% cmd/link/internal/ld.s 641945 641706 -239 -0.037% cmd/link/internal/arm64.s 34805 34798 -7 -0.020% cmd/go/internal/modload.s 328971 328954 -17 -0.005% cmd/go/internal/load.s 178877 178871 -6 -0.003% cmd/go/internal/clean.s 11006 10990 -16 -0.145% cmd/compile/internal/ssa.s 3552843 3553347 +504 +0.014% cmd/compile/internal/ssa [cmd/compile].s 3752511 3753123 +612 +0.016% total 36179015 36178687 -328 -0.001% Change-Id: I251c2898ccf3c9931d162d87dabbd49cf4ec73a5 Reviewed-on: https://go-review.googlesource.com/c/go/+/641757 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
0825475599 |
cmd/compile: do not treat OpLocalAddr as load in DSE
Fixes #70409 Fixes #47107 Change-Id: I82a66c46f6b76c68e156b5d937273b0316975d44 Reviewed-on: https://go-review.googlesource.com/c/go/+/629016 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> |
||
|
e57769d5ad |
cmd/compile: on AMD64, prefer XOR/AND for (x & 1) == 0 check
It's shorter to encode. Additionally, XOR and AND generally have higher throughput than BT/SET*. compilecmp: runtime runtime.(*sweepClass).split 58 -> 56 (-3.45%) runtime.sweepClass.split 14 -> 11 (-21.43%) runtime [cmd/compile] runtime.(*sweepClass).split 58 -> 56 (-3.45%) runtime.sweepClass.split 14 -> 11 (-21.43%) strconv strconv.ryuFtoaShortest changed strconv [cmd/compile] strconv.ryuFtoaShortest changed math/big math/big.(*Int).MulRange 255 -> 252 (-1.18%) testing/quick testing/quick.sizedValue changed internal/fuzz internal/fuzz.(*pcgRand).bool 69 -> 70 (+1.45%) cmd/internal/obj/x86 cmd/internal/obj/x86.(*AsmBuf).asmevex changed math/big [cmd/compile] math/big.(*Int).MulRange 255 -> 252 (-1.18%) cmd/internal/obj/x86 [cmd/compile] cmd/internal/obj/x86.(*AsmBuf).asmevex changed net/http net/http.(*http2stream).isPushed 11 -> 10 (-9.09%) cmd/vendor/github.com/google/pprof/internal/binutils cmd/vendor/github.com/google/pprof/internal/binutils.(*file).computeBase changed Change-Id: I9cb2987eb263c85ee4e93d6f8455c91a55273173 Reviewed-on: https://go-review.googlesource.com/c/go/+/640975 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
78e6f2a1c8 |
runtime: rename mapiterinit and mapiternext
mapiterinit allows external linkname. These users must allocate their own iter struct for initialization by mapiterinit. Since the type is unexported, they also must define the struct themselves. As a result, they of course define the struct matching the old hiter definition (in map_noswiss.go). The old definition is smaller on 32-bit platforms. On those platforms, mapiternext will clobber memory outside of the caller's allocation. On all platforms, the pointer layout between the old hiter and new maps.Iter does not match. Thus the GC may miss pointers and free reachable objects early, or it may see non-pointers that look like heap pointers and throw due to invalid references to free objects. To avoid these issues, we must keep mapiterinit and mapiternext with the old hiter definition. The most straightforward way to do this is to use mapiterinit and mapiternext as a compatibility layer between the old and new iter types. The first step to that is to move normal map use off of these functions, which is what this CL does. Introduce new mapIterStart and mapIterNext functions that replace the former functions everywhere in the toolchain. These have the same behavior as the old functions. This CL temporarily makes the old functions throw to ensure we don't have hidden dependencies on them. We cannot remove them entirely because GOEXPERIMENT=noswissmap still uses the old names, and internal/goobj requires all builtins to exist regardless of GOEXPERIMENT. The next CL will introduce the compatibility layer. I want to avoid using linkname between runtime and reflect, as that would also allow external linknames. So mapIterStart and mapIterNext are duplicated in reflect, which can be done trivially, as it imports internal/runtime/maps. For #71408. Change-Id: I6a6a636c6d4bd1392618c67ca648d3f061afe669 Reviewed-on: https://go-review.googlesource.com/c/go/+/643898 Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
c5e205e928 |
internal/runtime/maps: re-enable some tests
Re-enable tests for stack-allocated maps and fast map accessors. Those are implemented now. Update #54766 Change-Id: I8c019702bd9fb077b2fe3f7c78e8e9e10d2263a6 Reviewed-on: https://go-review.googlesource.com/c/go/+/642376 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Keith Randall <khr@golang.org> |
||
|
44a6f817ea |
cmd/compile: fix write barrier coalescing
We can't coalesce a non-WB store with a subsequent Move, as the result of the store might be the source of the move. There's a simple codegen test. Not sure how we might do a real test, as all the repro's I've come up with are very expensive and unreliable. Fixes #71228 Change-Id: If18bf181a266b9b90964e2591cd2e61a7168371c Reviewed-on: https://go-review.googlesource.com/c/go/+/642197 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> |
||
|
c4e6ab9750 |
cmd/compile: modify CSE to remove redundant OpLocalAddrs
Remove the OpLocalAddrs that are unnecessary in the CSE pass, so the following passes like DSE and memcombine can do its work better. Fixes #70300 Change-Id: I600025d49eeadb3ca4f092d614428399750f69bc Reviewed-on: https://go-review.googlesource.com/c/go/+/628075 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
f0b0109242 |
cmd/compile: pull multiple adds out of an unsafe.Pointer<->uintptr conversion
This came up in some swissmap code. Change-Id: I3c6705a5cafec8cb4953dfa9535cf0b45255cc83 Reviewed-on: https://go-review.googlesource.com/c/go/+/629516 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> |
||
|
ab55465098 |
cmd/compile: wire up math/bits.TrailingZeros intrinsics for loong64
Micro-benchmark results on Loongson 3A5000 and 3A6000: goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | TrailingZeros 1.7240n ± 0% 0.8120n ± 0% -52.90% (p=0.000 n=20) TrailingZeros8 1.0530n ± 0% 0.8015n ± 0% -23.88% (p=0.000 n=20) TrailingZeros16 2.072n ± 0% 1.015n ± 0% -51.01% (p=0.000 n=20) TrailingZeros32 1.7160n ± 0% 0.8122n ± 0% -52.67% (p=0.000 n=20) TrailingZeros64 2.0060n ± 0% 0.8125n ± 0% -59.50% (p=0.000 n=20) geomean 1.669n 0.8470n -49.25% goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A5000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | TrailingZeros 2.6275n ± 0% 0.9120n ± 0% -65.29% (p=0.000 n=20) TrailingZeros8 1.451n ± 0% 1.163n ± 0% -19.85% (p=0.000 n=20) TrailingZeros16 3.069n ± 0% 1.201n ± 0% -60.87% (p=0.000 n=20) TrailingZeros32 2.9060n ± 0% 0.9115n ± 0% -68.63% (p=0.000 n=20) TrailingZeros64 2.6305n ± 0% 0.9115n ± 0% -65.35% (p=0.000 n=20) geomean 2.456n 1.011n -58.83% This patch is a copy of CL 479498. Co-authored-by: WANG Xuerui <git@xen0n.name> Change-Id: I1a5b2114a844dc0d02c8e68f41ce2443ac3b5fda Reviewed-on: https://go-review.googlesource.com/c/go/+/624356 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> |
||
|
745ec75719 |
cmd/compile/internal/ssa: improve carry addition rules on PPC64
Fold constant int16 addends for usages of math/bits.Add64(x,const,0) on PPC64. This usage shows up in a few crypto implementations; notably the go wrapper for CL 626176. Change-Id: I6963163330487d04e0479b4fdac235f97bb96889 Reviewed-on: https://go-review.googlesource.com/c/go/+/625899 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
fb9b946adc |
cmd/compile: optimize math/bits.OnesCount{16,32,64} implementation on loong64
Use Loong64's LSX instruction VPCNT to implement math/bits.OnesCount{16,32,64} and make it intrinsic. Benchmark results on loongson 3A5000 and 3A6000 machines: goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A5000-HV @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | OnesCount 4.413n ± 0% 1.401n ± 0% -68.25% (p=0.000 n=10) OnesCount8 1.364n ± 0% 1.363n ± 0% ~ (p=0.130 n=10) OnesCount16 2.112n ± 0% 1.534n ± 0% -27.37% (p=0.000 n=10) OnesCount32 4.533n ± 0% 1.529n ± 0% -66.27% (p=0.000 n=10) OnesCount64 4.565n ± 0% 1.531n ± 1% -66.46% (p=0.000 n=10) geomean 3.048n 1.470n -51.78% goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | OnesCount 3.553n ± 0% 1.201n ± 0% -66.20% (p=0.000 n=10) OnesCount8 0.8021n ± 0% 0.8004n ± 0% -0.21% (p=0.000 n=10) OnesCount16 1.216n ± 0% 1.000n ± 0% -17.76% (p=0.000 n=10) OnesCount32 3.006n ± 0% 1.035n ± 0% -65.57% (p=0.000 n=10) OnesCount64 3.503n ± 0% 1.035n ± 0% -70.45% (p=0.000 n=10) geomean 2.053n 1.006n -51.01% Change-Id: I07a5b8da2bb48711b896387ec7625145804affc8 Reviewed-on: https://go-review.googlesource.com/c/go/+/620978 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
583d750fa1 |
cmd/compile: wire up bits.Reverse intrinsics for loong64
Micro-benchmark results on Loongson 3A5000 and 3A6000: goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A6000 @ 2500.00MHz | CL 624576 | this CL | | sec/op | sec/op vs base | Reverse 2.8130n ± 0% 0.8008n ± 0% -71.53% (p=0.000 n=20) Reverse8 0.7014n ± 0% 0.4040n ± 0% -42.40% (p=0.000 n=20) Reverse16 1.2975n ± 0% 0.6632n ± 1% -48.89% (p=0.000 n=20) Reverse32 2.7520n ± 0% 0.4042n ± 0% -85.31% (p=0.000 n=20) Reverse64 2.8970n ± 0% 0.4041n ± 0% -86.05% (p=0.000 n=20) geomean 1.828n 0.5116n -72.01% goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A5000 @ 2500.00MHz | CL 624576 | this CL | | sec/op | sec/op vs base | Reverse 4.0050n ± 0% 0.8011n ± 0% -80.00% (p=0.000 n=20) Reverse8 0.8010n ± 0% 0.5210n ± 1% -34.96% (p=0.000 n=20) Reverse16 1.6160n ± 0% 0.6008n ± 0% -62.82% (p=0.000 n=20) Reverse32 3.8550n ± 0% 0.5179n ± 0% -86.57% (p=0.000 n=20) Reverse64 3.8050n ± 0% 0.5177n ± 0% -86.40% (p=0.000 n=20) geomean 2.378n 0.5828n -75.49% Updates #59120 This patch is a copy of CL 483656. Co-authored-by: WANG Xuerui <git@xen0n.name> Change-Id: I98681091763279279c8404bd0295785f13ea1c8e Reviewed-on: https://go-review.googlesource.com/c/go/+/624276 Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> |
||
|
e6cc9d228a |
cmd/compile: implement FMA codegen for loong64
Benchmark results on Loongson 3A5000 and 3A6000: goos: linux goarch: loong64 pkg: math cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | FMA 25.930n ± 0% 2.002n ± 0% -92.28% (p=0.000 n=10) goos: linux goarch: loong64 pkg: math cpu: Loongson-3A5000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | FMA 32.840n ± 0% 2.002n ± 0% -93.90% (p=0.000 n=10) Updates #59120 This patch is a copy of CL 483355. Co-authored-by: WANG Xuerui <git@xen0n.name> Change-Id: I88b89d23f00864f9173a182a47ee135afec7ed6e Reviewed-on: https://go-review.googlesource.com/c/go/+/625335 Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org> |
||
|
d6fb0ab2c7 |
cmd/compile: wire up Bswap/ReverseBytes intrinsics for loong64
Micro-benchmark results on Loongson 3A5000 and 3A6000: goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | ReverseBytes 2.0020n ± 0% 0.4040n ± 0% -79.82% (p=0.000 n=20) ReverseBytes16 0.8866n ± 1% 0.8007n ± 0% -9.69% (p=0.000 n=20) ReverseBytes32 1.2195n ± 0% 0.8007n ± 0% -34.34% (p=0.000 n=20) ReverseBytes64 2.0705n ± 0% 0.8008n ± 0% -61.32% (p=0.000 n=20) geomean 1.455n 0.6749n -53.62% goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A5000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | ReverseBytes 2.8040n ± 0% 0.5205n ± 0% -81.44% (p=0.000 n=20) ReverseBytes16 0.7066n ± 0% 0.8011n ± 0% +13.37% (p=0.000 n=20) ReverseBytes32 1.5500n ± 0% 0.8010n ± 0% -48.32% (p=0.000 n=20) ReverseBytes64 2.7665n ± 0% 0.8010n ± 0% -71.05% (p=0.000 n=20) geomean 1.707n 0.7192n -57.87% Updates #59120 This patch is a copy of CL 483357. Co-authored-by: WANG Xuerui <git@xen0n.name> Change-Id: If355354cd031533df91991fcc3392e5a6c314295 Reviewed-on: https://go-review.googlesource.com/c/go/+/624576 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> |
||
|
d98c51809d |
cmd/compile: wire up math/bits.Len intrinsics for loong64
For the SubFromLen64 codegen test case to work as intended, we need to fold c-(-(x-d)) into x+(c-d). Still, some instances of LeadingZeros are not optimized into single CLZ instructions right now (actually, the LeadingZeros micro-benchmarks are currently still compiled with redundant adds/subs of 64, due to interference of loop optimizations before lowering), but perf numbers indicate it's not that bad after all. Micro-benchmark results on Loongson 3A5000 and 3A6000: goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A5000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | LeadingZeros 3.660n ± 0% 1.348n ± 0% -63.17% (p=0.000 n=20) LeadingZeros8 1.777n ± 0% 1.767n ± 0% -0.56% (p=0.000 n=20) LeadingZeros16 2.816n ± 0% 1.770n ± 0% -37.14% (p=0.000 n=20) LeadingZeros32 5.293n ± 1% 1.683n ± 0% -68.21% (p=0.000 n=20) LeadingZeros64 3.622n ± 0% 1.349n ± 0% -62.76% (p=0.000 n=20) geomean 3.229n 1.571n -51.35% goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | LeadingZeros 2.410n ± 0% 1.103n ± 1% -54.23% (p=0.000 n=20) LeadingZeros8 1.236n ± 0% 1.501n ± 0% +21.44% (p=0.000 n=20) LeadingZeros16 2.106n ± 0% 1.501n ± 0% -28.73% (p=0.000 n=20) LeadingZeros32 2.860n ± 0% 1.324n ± 0% -53.72% (p=0.000 n=20) LeadingZeros64 2.6135n ± 0% 0.9509n ± 0% -63.62% (p=0.000 n=20) geomean 2.159n 1.256n -41.81% Updates #59120 This patch is a copy of CL 483356. Co-authored-by: WANG Xuerui <git@xen0n.name> Change-Id: Iee81a17f7da06d77a427e73dfcc016f2b15ae556 Reviewed-on: https://go-review.googlesource.com/c/go/+/624575 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: abner chenc <chenguoqi@loongson.cn> |
||
|
5f88755f43 |
cmd/compile: add loong64-specific inlining for runtime.memmove
goos: linux goarch: loong64 pkg: runtime cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | Memmove/0 0.8004n ± 0% 0.4002n ± 0% -50.00% (p=0.000 n=20) Memmove/1 2.494n ± 0% 2.136n ± 0% -14.35% (p=0.000 n=20) Memmove/2 2.802n ± 0% 2.512n ± 0% -10.35% (p=0.000 n=20) Memmove/3 2.802n ± 0% 2.497n ± 0% -10.92% (p=0.000 n=20) Memmove/4 3.202n ± 0% 2.808n ± 0% -12.30% (p=0.000 n=20) Memmove/5 2.821n ± 0% 2.658n ± 0% -5.76% (p=0.000 n=20) Memmove/6 2.819n ± 0% 2.657n ± 0% -5.73% (p=0.000 n=20) Memmove/7 2.820n ± 0% 2.654n ± 0% -5.87% (p=0.000 n=20) Memmove/8 3.202n ± 0% 2.814n ± 0% -12.12% (p=0.000 n=20) Memmove/9 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20) Memmove/10 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20) Memmove/11 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20) Memmove/12 3.202n ± 0% 3.010n ± 0% -6.01% (p=0.000 n=20) Memmove/13 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20) Memmove/14 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20) Memmove/15 3.202n ± 0% 3.010n ± 0% -6.01% (p=0.000 n=20) Memmove/16 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20) Memmove/32 3.602n ± 0% 3.603n ± 0% +0.03% (p=0.000 n=20) Memmove/64 4.202n ± 0% 4.204n ± 0% +0.05% (p=0.000 n=20) Memmove/128 8.005n ± 0% 8.007n ± 0% +0.02% (p=0.000 n=20) Memmove/256 11.21n ± 0% 10.81n ± 0% -3.57% (p=0.000 n=20) Memmove/512 17.65n ± 0% 17.96n ± 0% +1.73% (p=0.000 n=20) Memmove/1024 30.48n ± 0% 30.46n ± 0% -0.07% (p=0.000 n=20) Memmove/2048 56.43n ± 0% 56.30n ± 0% -0.24% (p=0.000 n=20) Memmove/4096 107.7n ± 0% 107.6n ± 0% -0.09% (p=0.000 n=20) MemmoveOverlap/32 4.002n ± 0% 4.003n ± 0% +0.02% (p=0.002 n=20) MemmoveOverlap/64 4.603n ± 0% 4.603n ± 0% ~ (p=0.286 n=20) MemmoveOverlap/128 8.704n ± 0% 8.699n ± 0% ~ (p=0.180 n=20) MemmoveOverlap/256 12.01n ± 0% 11.76n ± 0% -2.08% (p=0.000 n=20) MemmoveOverlap/512 18.42n ± 0% 18.36n ± 0% -0.33% (p=0.000 n=20) MemmoveOverlap/1024 31.23n ± 0% 31.16n ± 0% -0.21% (p=0.000 n=20) MemmoveOverlap/2048 57.42n ± 0% 56.82n ± 0% -1.04% (p=0.000 n=20) MemmoveOverlap/4096 108.5n ± 0% 108.0n ± 0% -0.46% (p=0.000 n=20) MemmoveUnalignedDst/0 2.804n ± 0% 2.447n ± 0% -12.70% (p=0.000 n=20) MemmoveUnalignedDst/1 2.802n ± 0% 2.491n ± 0% -11.12% (p=0.000 n=20) MemmoveUnalignedDst/2 3.202n ± 0% 2.808n ± 0% -12.29% (p=0.000 n=20) MemmoveUnalignedDst/3 3.202n ± 0% 2.814n ± 0% -12.12% (p=0.000 n=20) MemmoveUnalignedDst/4 3.602n ± 0% 3.202n ± 0% -11.10% (p=0.000 n=20) MemmoveUnalignedDst/5 3.202n ± 0% 3.203n ± 0% +0.03% (p=0.014 n=20) MemmoveUnalignedDst/6 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedDst/7 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedDst/8 3.602n ± 0% 3.202n ± 0% -11.10% (p=0.000 n=20) MemmoveUnalignedDst/9 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedDst/10 3.602n ± 0% 3.602n ± 0% ~ (p=0.091 n=20) MemmoveUnalignedDst/11 3.602n ± 0% 3.602n ± 0% ~ (p=0.613 n=20) MemmoveUnalignedDst/12 3.602n ± 0% 3.602n ± 0% ~ (p=0.165 n=20) MemmoveUnalignedDst/13 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedDst/14 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedDst/15 3.602n ± 0% 3.602n ± 0% 0.00% (p=0.027 n=20) MemmoveUnalignedDst/16 3.602n ± 0% 3.602n ± 0% ~ (p=0.661 n=20) MemmoveUnalignedDst/32 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedDst/64 6.804n ± 0% 6.804n ± 0% ~ (p=0.204 n=20) MemmoveUnalignedDst/128 12.61n ± 0% 12.61n ± 0% ~ (p=1.000 n=20) ¹ MemmoveUnalignedDst/256 16.33n ± 2% 16.32n ± 2% ~ (p=0.839 n=20) MemmoveUnalignedDst/512 25.61n ± 0% 24.71n ± 0% -3.51% (p=0.000 n=20) MemmoveUnalignedDst/1024 42.81n ± 0% 42.82n ± 0% ~ (p=0.973 n=20) MemmoveUnalignedDst/2048 74.86n ± 0% 76.03n ± 0% +1.56% (p=0.000 n=20) MemmoveUnalignedDst/4096 152.0n ± 11% 152.0n ± 0% 0.00% (p=0.013 n=20) MemmoveUnalignedDstOverlap/32 5.319n ± 0% 5.558n ± 1% +4.50% (p=0.000 n=20) MemmoveUnalignedDstOverlap/64 8.006n ± 0% 8.025n ± 0% +0.24% (p=0.000 n=20) MemmoveUnalignedDstOverlap/128 9.631n ± 0% 9.601n ± 0% -0.31% (p=0.000 n=20) MemmoveUnalignedDstOverlap/256 13.79n ± 2% 13.58n ± 1% ~ (p=0.234 n=20) MemmoveUnalignedDstOverlap/512 21.38n ± 0% 21.30n ± 0% -0.37% (p=0.000 n=20) MemmoveUnalignedDstOverlap/1024 41.71n ± 0% 41.70n ± 0% ~ (p=0.887 n=20) MemmoveUnalignedDstOverlap/2048 81.63n ± 0% 81.61n ± 0% ~ (p=0.481 n=20) MemmoveUnalignedDstOverlap/4096 162.6n ± 0% 162.6n ± 0% ~ (p=0.171 n=20) MemmoveUnalignedSrc/0 2.808n ± 0% 2.482n ± 0% -11.61% (p=0.000 n=20) MemmoveUnalignedSrc/1 2.804n ± 0% 2.577n ± 0% -8.08% (p=0.000 n=20) MemmoveUnalignedSrc/2 3.202n ± 0% 2.806n ± 0% -12.37% (p=0.000 n=20) MemmoveUnalignedSrc/3 3.202n ± 0% 2.808n ± 0% -12.30% (p=0.000 n=20) MemmoveUnalignedSrc/4 3.602n ± 0% 3.202n ± 0% -11.10% (p=0.000 n=20) MemmoveUnalignedSrc/5 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrc/6 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrc/7 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrc/8 3.602n ± 0% 3.202n ± 0% -11.10% (p=0.000 n=20) MemmoveUnalignedSrc/9 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrc/10 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrc/11 3.602n ± 0% 3.602n ± 0% ~ (p=0.746 n=20) MemmoveUnalignedSrc/12 3.602n ± 0% 3.602n ± 0% ~ (p=0.407 n=20) MemmoveUnalignedSrc/13 3.603n ± 0% 3.602n ± 0% -0.03% (p=0.001 n=20) MemmoveUnalignedSrc/14 3.603n ± 0% 3.602n ± 0% -0.01% (p=0.013 n=20) MemmoveUnalignedSrc/15 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrc/16 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrc/32 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrc/64 4.803n ± 0% 4.803n ± 0% 0.00% (p=0.008 n=20) MemmoveUnalignedSrc/128 8.405n ± 0% 8.405n ± 0% 0.00% (p=0.003 n=20) MemmoveUnalignedSrc/256 12.04n ± 3% 12.20n ± 2% ~ (p=0.151 n=20) MemmoveUnalignedSrc/512 19.11n ± 0% 19.10n ± 3% ~ (p=0.621 n=20) MemmoveUnalignedSrc/1024 35.62n ± 0% 35.62n ± 0% ~ (p=0.407 n=20) MemmoveUnalignedSrc/2048 68.04n ± 0% 68.35n ± 0% +0.46% (p=0.000 n=20) MemmoveUnalignedSrc/4096 133.2n ± 1% 133.3n ± 0% ~ (p=0.131 n=20) MemmoveUnalignedSrcDst/f_16_0 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrcDst/b_16_0 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrcDst/f_16_1 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrcDst/b_16_1 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrcDst/f_16_4 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrcDst/b_16_4 4.202n ± 0% 4.202n ± 0% ~ (p=0.661 n=20) MemmoveUnalignedSrcDst/f_16_7 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrcDst/b_16_7 4.203n ± 0% 4.202n ± 0% -0.02% (p=0.008 n=20) MemmoveUnalignedSrcDst/f_64_0 6.103n ± 0% 6.100n ± 0% ~ (p=0.595 n=20) MemmoveUnalignedSrcDst/b_64_0 6.103n ± 0% 6.102n ± 0% ~ (p=0.973 n=20) MemmoveUnalignedSrcDst/f_64_1 7.419n ± 0% 7.226n ± 0% -2.59% (p=0.000 n=20) MemmoveUnalignedSrcDst/b_64_1 6.745n ± 0% 6.941n ± 0% +2.89% (p=0.000 n=20) MemmoveUnalignedSrcDst/f_64_4 7.420n ± 0% 7.223n ± 0% -2.65% (p=0.000 n=20) MemmoveUnalignedSrcDst/b_64_4 6.753n ± 0% 6.941n ± 0% +2.79% (p=0.000 n=20) MemmoveUnalignedSrcDst/f_64_7 7.423n ± 0% 7.204n ± 0% -2.96% (p=0.000 n=20) MemmoveUnalignedSrcDst/b_64_7 6.750n ± 0% 6.941n ± 0% +2.83% (p=0.000 n=20) MemmoveUnalignedSrcDst/f_256_0 12.96n ± 0% 12.99n ± 0% +0.27% (p=0.000 n=20) MemmoveUnalignedSrcDst/b_256_0 12.91n ± 0% 12.94n ± 0% +0.23% (p=0.000 n=20) MemmoveUnalignedSrcDst/f_256_1 17.21n ± 0% 17.21n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrcDst/b_256_1 17.61n ± 0% 17.61n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrcDst/f_256_4 16.21n ± 0% 16.21n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrcDst/b_256_4 16.41n ± 0% 16.41n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrcDst/f_256_7 14.12n ± 0% 14.10n ± 0% ~ (p=0.307 n=20) MemmoveUnalignedSrcDst/b_256_7 14.81n ± 0% 14.81n ± 0% ~ (p=1.000 n=20) ¹ MemmoveUnalignedSrcDst/f_4096_0 109.3n ± 0% 109.4n ± 0% +0.09% (p=0.004 n=20) MemmoveUnalignedSrcDst/b_4096_0 109.6n ± 0% 109.6n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrcDst/f_4096_1 113.5n ± 0% 113.5n ± 0% ~ (p=1.000 n=20) MemmoveUnalignedSrcDst/b_4096_1 113.7n ± 0% 113.7n ± 0% ~ (p=1.000 n=20) ¹ MemmoveUnalignedSrcDst/f_4096_4 112.3n ± 0% 112.3n ± 0% ~ (p=0.763 n=20) MemmoveUnalignedSrcDst/b_4096_4 112.6n ± 0% 112.9n ± 1% +0.31% (p=0.032 n=20) MemmoveUnalignedSrcDst/f_4096_7 110.6n ± 0% 110.6n ± 0% ~ (p=1.000 n=20) ¹ MemmoveUnalignedSrcDst/b_4096_7 111.1n ± 0% 111.1n ± 0% ~ (p=1.000 n=20) ¹ MemmoveUnalignedSrcDst/f_65536_0 4.801µ ± 0% 4.818µ ± 0% +0.34% (p=0.000 n=20) MemmoveUnalignedSrcDst/b_65536_0 5.027µ ± 0% 5.036µ ± 0% +0.19% (p=0.007 n=20) MemmoveUnalignedSrcDst/f_65536_1 4.815µ ± 0% 4.729µ ± 0% -1.78% (p=0.000 n=20) MemmoveUnalignedSrcDst/b_65536_1 4.659µ ± 0% 4.737µ ± 1% +1.69% (p=0.000 n=20) MemmoveUnalignedSrcDst/f_65536_4 4.807µ ± 0% 4.721µ ± 0% -1.78% (p=0.000 n=20) MemmoveUnalignedSrcDst/b_65536_4 4.659µ ± 0% 4.601µ ± 0% -1.23% (p=0.000 n=20) MemmoveUnalignedSrcDst/f_65536_7 4.868µ ± 0% 4.759µ ± 0% -2.23% (p=0.000 n=20) MemmoveUnalignedSrcDst/b_65536_7 4.665µ ± 0% 4.709µ ± 0% +0.93% (p=0.000 n=20) MemmoveUnalignedSrcOverlap/32 6.804n ± 0% 6.810n ± 0% +0.09% (p=0.000 n=20) MemmoveUnalignedSrcOverlap/64 10.41n ± 0% 10.42n ± 0% +0.10% (p=0.000 n=20) MemmoveUnalignedSrcOverlap/128 11.59n ± 0% 11.58n ± 0% ~ (p=0.414 n=20) MemmoveUnalignedSrcOverlap/256 14.22n ± 0% 14.29n ± 0% +0.46% (p=0.000 n=20) MemmoveUnalignedSrcOverlap/512 23.11n ± 0% 23.04n ± 0% -0.28% (p=0.001 n=20) MemmoveUnalignedSrcOverlap/1024 41.44n ± 0% 41.47n ± 0% ~ (p=0.693 n=20) MemmoveUnalignedSrcOverlap/2048 81.25n ± 0% 81.25n ± 0% ~ (p=0.405 n=20) MemmoveUnalignedSrcOverlap/4096 166.1n ± 0% 166.1n ± 0% ~ (p=0.451 n=20) geomean 13.02n 12.69n -2.51% ¹ all samples are equal Change-Id: I712adc7670f6ae360714ec5a770d00d76c8700ed Reviewed-on: https://go-review.googlesource.com/c/go/+/618815 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> |
||
|
aef81a7551 |
cmd/compile: add rules to optimize go codes to constant 0 on loong64
goos: linux goarch: loong64 pkg: test/bench/go1 cpu: Loongson-3A6000 @ 2500.00MHz │ old.bench │ new.bench │ │ sec/op │ sec/op vs base │ BinaryTree17 7.735 ± 1% 7.716 ± 1% -0.23% (p=0.041 n=15) Fannkuch11 2.645 ± 0% 2.646 ± 0% +0.05% (p=0.013 n=15) FmtFprintfEmpty 35.87n ± 0% 35.89n ± 0% +0.06% (p=0.000 n=15) FmtFprintfString 59.54n ± 0% 59.47n ± 0% ~ (p=0.213 n=15) FmtFprintfInt 62.23n ± 0% 62.06n ± 0% ~ (p=0.212 n=15) FmtFprintfIntInt 98.16n ± 0% 97.90n ± 0% -0.26% (p=0.000 n=15) FmtFprintfPrefixedInt 117.0n ± 0% 116.7n ± 0% -0.26% (p=0.000 n=15) FmtFprintfFloat 204.6n ± 0% 204.2n ± 0% -0.20% (p=0.000 n=15) FmtManyArgs 456.3n ± 0% 455.4n ± 0% -0.20% (p=0.000 n=15) GobDecode 7.210m ± 0% 7.156m ± 1% -0.75% (p=0.000 n=15) GobEncode 8.143m ± 1% 8.177m ± 1% ~ (p=0.806 n=15) Gzip 280.2m ± 0% 279.7m ± 0% -0.19% (p=0.005 n=15) Gunzip 32.71m ± 0% 32.65m ± 0% -0.19% (p=0.000 n=15) HTTPClientServer 53.76µ ± 0% 53.65µ ± 0% ~ (p=0.083 n=15) JSONEncode 9.297m ± 0% 9.295m ± 0% ~ (p=0.806 n=15) JSONDecode 46.97m ± 1% 47.07m ± 1% ~ (p=0.683 n=15) Mandelbrot200 4.602m ± 0% 4.600m ± 0% -0.05% (p=0.001 n=15) GoParse 4.682m ± 0% 4.670m ± 1% -0.25% (p=0.001 n=15) RegexpMatchEasy0_32 59.80n ± 0% 59.63n ± 0% -0.28% (p=0.000 n=15) RegexpMatchEasy0_1K 458.3n ± 0% 457.3n ± 0% -0.22% (p=0.001 n=15) RegexpMatchEasy1_32 59.39n ± 0% 59.23n ± 0% -0.27% (p=0.000 n=15) RegexpMatchEasy1_1K 557.9n ± 0% 556.6n ± 0% -0.23% (p=0.001 n=15) RegexpMatchMedium_32 803.6n ± 0% 801.8n ± 0% -0.22% (p=0.001 n=15) RegexpMatchMedium_1K 27.32µ ± 0% 27.26µ ± 0% -0.21% (p=0.000 n=15) RegexpMatchHard_32 1.385µ ± 0% 1.382µ ± 0% -0.22% (p=0.000 n=15) RegexpMatchHard_1K 40.93µ ± 0% 40.83µ ± 0% -0.24% (p=0.000 n=15) Revcomp 474.8m ± 0% 474.3m ± 0% ~ (p=0.250 n=15) Template 77.41m ± 1% 76.63m ± 1% -1.01% (p=0.023 n=15) TimeParse 271.1n ± 0% 271.2n ± 0% +0.04% (p=0.022 n=15) TimeFormat 290.0n ± 0% 289.8n ± 0% ~ (p=0.118 n=15) geomean 51.73µ 51.64µ -0.18% Change-Id: I45a1e6c85bb3cea0f62766ec932432803e9af10a Reviewed-on: https://go-review.googlesource.com/c/go/+/619315 Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> |
||
|
bb07aa644b |
cmd/compile: add shift optimization test
For #69635 Change-Id: Id5696dc9724c3b3afcd7b60a6994f98c5309eb0e Reviewed-on: https://go-review.googlesource.com/c/go/+/621755 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> |
||
|
711552e98a |
cmd/compile: optimize type switch for a single runtime known type with a case var
Change-Id: I03ba70076d6dd3c0b9624d14699b7dd91a3c0e9b Reviewed-on: https://go-review.googlesource.com/c/go/+/618476 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> |
||
|
1846dd5a31 |
cmd/compile/internal/ssa: fix PPC64 shift codegen regression
CL 621357 introduced new generic lowering rules which caused several shift related codegen test failures. Add new rules to fix the test regressions, and cleanup tests which are changed but not regressed. Some CLRLSLDI tests are removed as they are no test CLRLSLDI rules. Fixes #70003 Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034 Reviewed-on: https://go-review.googlesource.com/c/go/+/622236 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
91d07ac71c |
cmd/compile: inline constant sized memclrNoHeapPointers calls on loong64
Tested that on loong64, the optimization effect is negative for constant size cases greater than 512. So only enable inlining for constant size cases less than 512. goos: linux goarch: loong64 pkg: runtime cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | MemclrKnownSize1 2.4070n ± 0% 0.4004n ± 0% -83.37% (p=0.000 n=20) MemclrKnownSize2 2.1365n ± 0% 0.4004n ± 0% -81.26% (p=0.000 n=20) MemclrKnownSize4 2.4445n ± 0% 0.4004n ± 0% -83.62% (p=0.000 n=20) MemclrKnownSize8 2.4200n ± 0% 0.4004n ± 0% -83.45% (p=0.000 n=20) MemclrKnownSize16 2.8030n ± 0% 0.8007n ± 0% -71.43% (p=0.000 n=20) MemclrKnownSize32 2.803n ± 0% 1.602n ± 0% -42.85% (p=0.000 n=20) MemclrKnownSize64 3.250n ± 0% 2.402n ± 0% -26.08% (p=0.000 n=20) MemclrKnownSize112 6.006n ± 0% 2.819n ± 0% -53.06% (p=0.000 n=20) MemclrKnownSize128 6.006n ± 0% 3.240n ± 0% -46.05% (p=0.000 n=20) MemclrKnownSize192 6.807n ± 0% 5.205n ± 0% -23.53% (p=0.000 n=20) MemclrKnownSize248 7.608n ± 0% 6.301n ± 0% -17.19% (p=0.000 n=20) MemclrKnownSize256 7.608n ± 0% 6.707n ± 0% -11.84% (p=0.000 n=20) MemclrKnownSize512 13.61n ± 0% 13.61n ± 0% ~ (p=0.374 n=20) MemclrKnownSize1024 26.43n ± 0% 26.43n ± 0% ~ (p=0.826 n=20) MemclrKnownSize4096 103.3n ± 0% 103.3n ± 0% ~ (p=1.000 n=20) MemclrKnownSize512KiB 26.29µ ± 0% 26.29µ ± 0% -0.00% (p=0.012 n=20) geomean 10.05n 5.006n -50.18% | bench.old | bench.new | | B/s | B/s vs base | MemclrKnownSize1 396.2Mi ± 0% 2381.9Mi ± 0% +501.21% (p=0.000 n=20) MemclrKnownSize2 892.8Mi ± 0% 4764.0Mi ± 0% +433.59% (p=0.000 n=20) MemclrKnownSize4 1.524Gi ± 0% 9.305Gi ± 0% +510.56% (p=0.000 n=20) MemclrKnownSize8 3.079Gi ± 0% 18.609Gi ± 0% +504.42% (p=0.000 n=20) MemclrKnownSize16 5.316Gi ± 0% 18.609Gi ± 0% +250.05% (p=0.000 n=20) MemclrKnownSize32 10.63Gi ± 0% 18.61Gi ± 0% +75.00% (p=0.000 n=20) MemclrKnownSize64 18.34Gi ± 0% 24.81Gi ± 0% +35.27% (p=0.000 n=20) MemclrKnownSize112 17.37Gi ± 0% 37.01Gi ± 0% +113.08% (p=0.000 n=20) MemclrKnownSize128 19.85Gi ± 0% 36.80Gi ± 0% +85.39% (p=0.000 n=20) MemclrKnownSize192 26.27Gi ± 0% 34.35Gi ± 0% +30.77% (p=0.000 n=20) MemclrKnownSize248 30.36Gi ± 0% 36.66Gi ± 0% +20.75% (p=0.000 n=20) MemclrKnownSize256 31.34Gi ± 0% 35.55Gi ± 0% +13.43% (p=0.000 n=20) MemclrKnownSize512 35.02Gi ± 0% 35.03Gi ± 0% +0.00% (p=0.030 n=20) MemclrKnownSize1024 36.09Gi ± 0% 36.09Gi ± 0% ~ (p=0.101 n=20) MemclrKnownSize4096 36.93Gi ± 0% 36.93Gi ± 0% +0.00% (p=0.003 n=20) MemclrKnownSize512KiB 18.57Gi ± 0% 18.57Gi ± 0% +0.00% (p=0.041 n=20) geomean 10.13Gi 20.33Gi +100.72% Change-Id: I460a56f7ccc9f820ca2c1934c1c517b9614809ac Reviewed-on: https://go-review.googlesource.com/c/go/+/621355 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Pratt <mpratt@google.com> |
||
|
74163c895a |
cmd/compile: use STP/LDP around morestack on arm64
The spill/restore code around morestack is almost never exectued, so we should make it as small as possible. Using 2-register loads/stores makes sense here. Also, the offsets from SP are pretty small so the offset almost always fits in the (smaller than a normal load/store) offset field of the instruction. Makes cmd/go 0.6% smaller. Change-Id: I8845283c1b269a259498153924428f6173bda293 Reviewed-on: https://go-review.googlesource.com/c/go/+/621556 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
||
|
ef3e1dae2f |
cmd/compile: optimize loong64 with register indexed load/store
goos: linux goarch: loong64 pkg: test/bench/go1 cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | BinaryTree17 7.766 ± 1% 7.640 ± 2% -1.62% (p=0.000 n=20) Fannkuch11 2.649 ± 0% 2.358 ± 0% -10.96% (p=0.000 n=20) FmtFprintfEmpty 35.89n ± 0% 35.87n ± 0% -0.06% (p=0.000 n=20) FmtFprintfString 59.44n ± 0% 57.25n ± 2% -3.68% (p=0.000 n=20) FmtFprintfInt 62.07n ± 0% 60.04n ± 0% -3.27% (p=0.000 n=20) FmtFprintfIntInt 97.90n ± 0% 97.26n ± 0% -0.65% (p=0.000 n=20) FmtFprintfPrefixedInt 116.7n ± 0% 119.2n ± 0% +2.14% (p=0.000 n=20) FmtFprintfFloat 204.5n ± 0% 201.9n ± 0% -1.30% (p=0.000 n=20) FmtManyArgs 455.9n ± 0% 466.8n ± 0% +2.39% (p=0.000 n=20) GobDecode 7.458m ± 1% 7.138m ± 1% -4.28% (p=0.000 n=20) GobEncode 8.573m ± 1% 8.473m ± 1% ~ (p=0.091 n=20) Gzip 280.2m ± 0% 284.9m ± 0% +1.67% (p=0.000 n=20) Gunzip 32.68m ± 0% 32.67m ± 0% ~ (p=0.211 n=20) HTTPClientServer 54.22µ ± 0% 53.24µ ± 0% -1.80% (p=0.000 n=20) JSONEncode 9.427m ± 1% 9.152m ± 0% -2.92% (p=0.000 n=20) JSONDecode 47.08m ± 1% 46.85m ± 1% -0.49% (p=0.007 n=20) Mandelbrot200 4.601m ± 0% 4.605m ± 0% +0.08% (p=0.000 n=20) GoParse 4.776m ± 0% 4.655m ± 1% -2.52% (p=0.000 n=20) RegexpMatchEasy0_32 59.77n ± 0% 57.59n ± 0% -3.66% (p=0.000 n=20) RegexpMatchEasy0_1K 458.1n ± 0% 458.8n ± 0% +0.15% (p=0.000 n=20) RegexpMatchEasy1_32 59.36n ± 0% 59.24n ± 0% -0.20% (p=0.000 n=20) RegexpMatchEasy1_1K 557.7n ± 0% 560.2n ± 0% +0.46% (p=0.000 n=20) RegexpMatchMedium_32 803.1n ± 0% 772.8n ± 0% -3.77% (p=0.000 n=20) RegexpMatchMedium_1K 27.29µ ± 0% 25.88µ ± 0% -5.18% (p=0.000 n=20) RegexpMatchHard_32 1.385µ ± 0% 1.304µ ± 0% -5.85% (p=0.000 n=20) RegexpMatchHard_1K 40.92µ ± 0% 39.58µ ± 0% -3.27% (p=0.000 n=20) Revcomp 474.3m ± 0% 410.0m ± 0% -13.56% (p=0.000 n=20) Template 78.16m ± 0% 76.32m ± 1% -2.36% (p=0.000 n=20) TimeParse 271.8n ± 0% 272.1n ± 0% +0.11% (p=0.000 n=20) TimeFormat 292.3n ± 0% 294.8n ± 0% +0.86% (p=0.000 n=20) geomean 51.98µ 50.82µ -2.22% Change-Id: Ia78f1ddee8f1d9ec7192a4b8d2a4ec6058679956 Reviewed-on: https://go-review.googlesource.com/c/go/+/615918 Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> |
||
|
7e2487cf65 |
cmd/compile: avoid dynamic type when possible
If the expression type is a single compile-time known type, use that type instead of the dynamic one, so the later passes of the compiler could skip un-necessary runtime calls. Thanks Youlin Feng for writing the original test case. Change-Id: I3f65ab90f041474a9731338a82136c1d394c1773 Reviewed-on: https://go-review.googlesource.com/c/go/+/616975 Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
||
|
f243cf6016 |
cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on loong64
Use float <-> int register moves without conversion instead of stores and loads to move float <-> int values like arm64 and mips64. goos: linux goarch: loong64 pkg: math cpu: Loongson-3A6000 @ 2500.00MHz │ bench.old │ bench.new │ │ sec/op │ sec/op vs base │ Acos 15.98n ± 0% 15.94n ± 0% -0.25% (p=0.000 n=20) Acosh 27.75n ± 0% 25.56n ± 0% -7.89% (p=0.000 n=20) Asin 15.85n ± 0% 15.76n ± 0% -0.57% (p=0.000 n=20) Asinh 39.79n ± 0% 37.69n ± 0% -5.28% (p=0.000 n=20) Atan 7.261n ± 0% 7.242n ± 0% -0.27% (p=0.000 n=20) Atanh 28.30n ± 0% 27.62n ± 0% -2.40% (p=0.000 n=20) Atan2 15.85n ± 0% 15.75n ± 0% -0.63% (p=0.000 n=20) Cbrt 27.02n ± 0% 21.08n ± 0% -21.98% (p=0.000 n=20) Ceil 2.830n ± 1% 2.896n ± 1% +2.31% (p=0.000 n=20) Copysign 0.8022n ± 0% 0.8004n ± 0% -0.22% (p=0.000 n=20) Cos 11.64n ± 0% 11.61n ± 0% -0.26% (p=0.000 n=20) Cosh 35.98n ± 0% 33.44n ± 0% -7.05% (p=0.000 n=20) Erf 10.09n ± 0% 10.08n ± 0% -0.10% (p=0.000 n=20) Erfc 11.40n ± 0% 11.35n ± 0% -0.44% (p=0.000 n=20) Erfinv 12.31n ± 0% 12.29n ± 0% -0.16% (p=0.000 n=20) Erfcinv 12.16n ± 0% 12.17n ± 0% +0.08% (p=0.000 n=20) Exp 28.41n ± 0% 26.44n ± 0% -6.95% (p=0.000 n=20) ExpGo 28.68n ± 0% 27.07n ± 0% -5.60% (p=0.000 n=20) Expm1 17.21n ± 0% 16.75n ± 0% -2.67% (p=0.000 n=20) Exp2 24.71n ± 0% 23.01n ± 0% -6.88% (p=0.000 n=20) Exp2Go 25.17n ± 0% 23.91n ± 0% -4.99% (p=0.000 n=20) Abs 0.8004n ± 0% 0.8004n ± 0% ~ (p=0.224 n=20) Dim 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=20) ¹ Floor 2.848n ± 0% 2.859n ± 0% +0.39% (p=0.000 n=20) Max 3.074n ± 0% 3.071n ± 0% ~ (p=0.481 n=20) Min 3.179n ± 0% 3.176n ± 0% -0.09% (p=0.003 n=20) Mod 49.62n ± 0% 44.82n ± 0% -9.67% (p=0.000 n=20) Frexp 7.604n ± 0% 6.803n ± 0% -10.53% (p=0.000 n=20) Gamma 18.01n ± 0% 17.61n ± 0% -2.22% (p=0.000 n=20) Hypot 7.204n ± 0% 7.604n ± 0% +5.55% (p=0.000 n=20) HypotGo 7.204n ± 0% 7.604n ± 0% +5.56% (p=0.000 n=20) Ilogb 6.003n ± 0% 6.003n ± 0% ~ (p=0.407 n=20) J0 76.43n ± 0% 76.24n ± 0% -0.25% (p=0.000 n=20) J1 76.44n ± 0% 76.44n ± 0% ~ (p=1.000 n=20) Jn 168.2n ± 0% 168.5n ± 0% +0.18% (p=0.000 n=20) Ldexp 8.804n ± 0% 7.604n ± 0% -13.63% (p=0.000 n=20) Lgamma 19.01n ± 0% 19.01n ± 0% ~ (p=0.695 n=20) Log 19.38n ± 0% 19.12n ± 0% -1.34% (p=0.000 n=20) Logb 6.003n ± 0% 6.003n ± 0% ~ (p=1.000 n=20) Log1p 18.57n ± 0% 16.72n ± 0% -9.96% (p=0.000 n=20) Log10 20.67n ± 0% 20.45n ± 0% -1.06% (p=0.000 n=20) Log2 9.605n ± 0% 8.804n ± 0% -8.34% (p=0.000 n=20) Modf 4.402n ± 0% 4.402n ± 0% ~ (p=1.000 n=20) Nextafter32 7.204n ± 0% 5.603n ± 0% -22.22% (p=0.000 n=20) Nextafter64 6.803n ± 0% 6.003n ± 0% -11.76% (p=0.000 n=20) PowInt 39.62n ± 0% 37.22n ± 0% -6.06% (p=0.000 n=20) PowFrac 120.9n ± 0% 108.9n ± 0% -9.93% (p=0.000 n=20) Pow10Pos 1.601n ± 0% 1.601n ± 0% ~ (p=0.487 n=20) Pow10Neg 2.675n ± 0% 2.675n ± 0% ~ (p=1.000 n=20) Round 3.018n ± 0% 2.401n ± 0% -20.46% (p=0.000 n=20) RoundToEven 3.822n ± 0% 3.001n ± 0% -21.48% (p=0.000 n=20) Remainder 45.62n ± 0% 42.42n ± 0% -7.01% (p=0.000 n=20) Signbit 0.9075n ± 0% 0.8004n ± 0% -11.81% (p=0.000 n=20) Sin 12.65n ± 0% 12.65n ± 0% ~ (p=0.503 n=20) Sincos 14.81n ± 0% 14.60n ± 0% -1.42% (p=0.000 n=20) Sinh 36.75n ± 0% 35.11n ± 0% -4.46% (p=0.000 n=20) SqrtIndirect 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=20) ¹ SqrtLatency 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20) SqrtIndirectLatency 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20) SqrtGoLatency 52.85n ± 0% 40.82n ± 0% -22.76% (p=0.000 n=20) SqrtPrime 887.4n ± 0% 887.4n ± 0% ~ (p=0.751 n=20) Tan 13.95n ± 0% 13.97n ± 0% +0.18% (p=0.000 n=20) Tanh 36.79n ± 0% 34.89n ± 0% -5.16% (p=0.000 n=20) Trunc 2.849n ± 0% 2.861n ± 0% +0.42% (p=0.000 n=20) Y0 77.44n ± 0% 77.64n ± 0% +0.26% (p=0.000 n=20) Y1 74.41n ± 0% 74.33n ± 0% -0.11% (p=0.000 n=20) Yn 158.7n ± 0% 159.0n ± 0% +0.19% (p=0.000 n=20) Float64bits 0.8774n ± 0% 0.4002n ± 0% -54.39% (p=0.000 n=20) Float64frombits 0.8042n ± 0% 0.4002n ± 0% -50.24% (p=0.000 n=20) Float32bits 1.1230n ± 0% 0.5336n ± 0% -52.48% (p=0.000 n=20) Float32frombits 1.0670n ± 0% 0.8004n ± 0% -24.99% (p=0.000 n=20) FMA 2.001n ± 0% 2.001n ± 0% ~ (p=0.605 n=20) geomean 10.87n 10.10n -7.15% ¹ all samples are equal goos: linux goarch: loong64 pkg: math cpu: Loongson-3A5000 @ 2500.00MHz │ bench.old │ bench.new │ │ sec/op │ sec/op vs base │ Acos 33.10n ± 0% 31.95n ± 2% -3.46% (p=0.000 n=20) Acosh 58.38n ± 0% 50.44n ± 0% -13.60% (p=0.000 n=20) Asin 32.70n ± 0% 31.94n ± 0% -2.32% (p=0.000 n=20) Asinh 57.65n ± 0% 50.83n ± 0% -11.82% (p=0.000 n=20) Atan 14.21n ± 0% 14.21n ± 0% ~ (p=0.501 n=20) Atanh 60.86n ± 0% 54.44n ± 0% -10.56% (p=0.000 n=20) Atan2 32.02n ± 0% 34.02n ± 0% +6.25% (p=0.000 n=20) Cbrt 55.58n ± 0% 40.64n ± 0% -26.88% (p=0.000 n=20) Ceil 9.566n ± 0% 9.566n ± 0% ~ (p=0.463 n=20) Copysign 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.806 n=20) Cos 18.02n ± 0% 18.02n ± 0% ~ (p=0.191 n=20) Cosh 64.44n ± 0% 65.64n ± 0% +1.86% (p=0.000 n=20) Erf 16.15n ± 0% 16.16n ± 0% ~ (p=0.770 n=20) Erfc 18.71n ± 0% 18.83n ± 0% +0.61% (p=0.000 n=20) Erfinv 19.33n ± 0% 19.34n ± 0% ~ (p=0.513 n=20) Erfcinv 18.90n ± 0% 19.78n ± 0% +4.63% (p=0.000 n=20) Exp 50.04n ± 0% 49.66n ± 0% -0.75% (p=0.000 n=20) ExpGo 50.03n ± 0% 50.03n ± 0% ~ (p=0.723 n=20) Expm1 28.41n ± 0% 28.27n ± 0% -0.49% (p=0.000 n=20) Exp2 50.08n ± 0% 51.23n ± 0% +2.31% (p=0.000 n=20) Exp2Go 49.77n ± 0% 49.89n ± 0% +0.24% (p=0.000 n=20) Abs 0.8009n ± 0% 0.8006n ± 0% ~ (p=0.317 n=20) Dim 1.987n ± 0% 1.993n ± 0% +0.28% (p=0.001 n=20) Floor 8.543n ± 0% 8.548n ± 0% ~ (p=0.509 n=20) Max 6.670n ± 0% 6.672n ± 0% ~ (p=0.335 n=20) Min 6.694n ± 0% 6.694n ± 0% ~ (p=0.459 n=20) Mod 56.44n ± 0% 53.23n ± 0% -5.70% (p=0.000 n=20) Frexp 8.409n ± 0% 7.606n ± 0% -9.55% (p=0.000 n=20) Gamma 35.64n ± 0% 35.23n ± 0% -1.15% (p=0.000 n=20) Hypot 11.21n ± 0% 10.61n ± 0% -5.31% (p=0.000 n=20) HypotGo 11.50n ± 0% 11.01n ± 0% -4.30% (p=0.000 n=20) Ilogb 7.606n ± 0% 6.804n ± 0% -10.54% (p=0.000 n=20) J0 125.3n ± 0% 126.5n ± 0% +0.96% (p=0.000 n=20) J1 124.9n ± 0% 125.3n ± 0% +0.32% (p=0.000 n=20) Jn 264.3n ± 0% 265.9n ± 0% +0.61% (p=0.000 n=20) Ldexp 9.606n ± 0% 9.204n ± 0% -4.19% (p=0.000 n=20) Lgamma 38.82n ± 0% 38.85n ± 0% +0.06% (p=0.019 n=20) Log 38.44n ± 0% 28.04n ± 0% -27.06% (p=0.000 n=20) Logb 8.405n ± 0% 7.605n ± 0% -9.52% (p=0.000 n=20) Log1p 31.62n ± 0% 27.11n ± 0% -14.26% (p=0.000 n=20) Log10 38.83n ± 0% 28.42n ± 0% -26.81% (p=0.000 n=20) Log2 11.21n ± 0% 10.41n ± 0% -7.14% (p=0.000 n=20) Modf 5.204n ± 0% 5.205n ± 0% ~ (p=0.983 n=20) Nextafter32 8.809n ± 0% 7.208n ± 0% -18.18% (p=0.000 n=20) Nextafter64 8.405n ± 0% 8.406n ± 0% +0.01% (p=0.007 n=20) PowInt 48.83n ± 0% 44.78n ± 0% -8.28% (p=0.000 n=20) PowFrac 146.9n ± 0% 142.1n ± 0% -3.23% (p=0.000 n=20) Pow10Pos 2.334n ± 0% 2.333n ± 0% ~ (p=0.110 n=20) Pow10Neg 4.803n ± 0% 4.803n ± 0% ~ (p=0.130 n=20) Round 4.816n ± 0% 3.819n ± 0% -20.70% (p=0.000 n=20) RoundToEven 5.735n ± 0% 5.204n ± 0% -9.26% (p=0.000 n=20) Remainder 52.05n ± 0% 49.64n ± 0% -4.63% (p=0.000 n=20) Signbit 1.201n ± 0% 1.001n ± 0% -16.65% (p=0.000 n=20) Sin 20.63n ± 0% 20.64n ± 0% +0.05% (p=0.040 n=20) Sincos 23.82n ± 0% 24.62n ± 0% +3.36% (p=0.000 n=20) Sinh 71.25n ± 0% 68.44n ± 0% -3.94% (p=0.000 n=20) SqrtIndirect 2.001n ± 0% 2.001n ± 0% ~ (p=0.182 n=20) SqrtLatency 4.003n ± 0% 4.003n ± 0% ~ (p=0.754 n=20) SqrtIndirectLatency 4.003n ± 0% 4.003n ± 0% ~ (p=0.773 n=20) SqrtGoLatency 60.84n ± 0% 81.26n ± 0% +33.56% (p=0.000 n=20) SqrtPrime 1.791µ ± 0% 1.791µ ± 0% ~ (p=0.784 n=20) Tan 27.22n ± 0% 27.22n ± 0% ~ (p=0.819 n=20) Tanh 70.88n ± 0% 69.04n ± 0% -2.60% (p=0.000 n=20) Trunc 8.543n ± 0% 8.543n ± 0% ~ (p=0.784 n=20) Y0 122.9n ± 0% 122.9n ± 0% ~ (p=0.559 n=20) Y1 123.3n ± 0% 121.7n ± 0% -1.30% (p=0.000 n=20) Yn 263.0n ± 0% 262.6n ± 0% -0.15% (p=0.000 n=20) Float64bits 1.2010n ± 0% 0.6004n ± 0% -50.01% (p=0.000 n=20) Float64frombits 1.2010n ± 0% 0.6004n ± 0% -50.01% (p=0.000 n=20) Float32bits 1.7010n ± 0% 0.8005n ± 0% -52.94% (p=0.000 n=20) Float32frombits 1.5010n ± 0% 0.8005n ± 0% -46.67% (p=0.000 n=20) FMA 2.001n ± 0% 2.001n ± 0% ~ (p=0.238 n=20) geomean 17.41n 16.15n -7.19% Change-Id: I0a0c263af2f07203eab1782e69c706f20c689d8d Reviewed-on: https://go-review.googlesource.com/c/go/+/604737 Auto-Submit: Tim King <taking@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: Tim King <taking@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> |
||
|
2c5b707b3b |
cmd/compile: optimize RotateLeft8/16 on loong64
goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A6000 @ 2500.00MHz │ bench.old │ bench.new │ │ sec/op │ sec/op vs base │ RotateLeft8 1.401n ± 0% 1.201n ± 0% -14.28% (p=0.000 n=20) RotateLeft16 1.4010n ± 0% 0.8032n ± 0% -42.67% (p=0.000 n=20) geomean 1.401n 0.9822n -29.90% goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A5000 @ 2500.00MHz │ bench.old │ bench.new │ │ sec/op │ sec/op vs base │ RotateLeft8 1.576n ± 0% 1.310n ± 0% -16.88% (p=0.000 n=20) RotateLeft16 1.576n ± 0% 1.166n ± 0% -26.02% (p=0.000 n=20) geomean 1.576n 1.236n -21.58% Change-Id: I39c18306be0b8fd31b57bd0911714abd1783b50e Reviewed-on: https://go-review.googlesource.com/c/go/+/604738 Auto-Submit: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Tim King <taking@google.com> |
||
|
2982253c42 |
test/codegen: add Rotate test for riscv64
Change-Id: I7d996b8d46fbeef933943f806052a30f1f8d50c3 Reviewed-on: https://go-review.googlesource.com/c/go/+/588836 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Tim King <taking@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> |
||
|
fe69121bc5 |
cmd/compile: optimize []byte(string1 + string2)
This CL optimizes the compilation of string-to-bytes conversion in the case of string additions. Fixes #62407 Change-Id: Ic47df758478e5d061880620025c4ec7dbbff8a64 Reviewed-on: https://go-review.googlesource.com/c/go/+/527935 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Tim King <taking@google.com> |
||
|
e126129d76 |
cmd/compile/internal/ssa: combine shift and addition for riscv64 rva22u64
When GORISCV64 enables rva22u64, combined shift and addition using the SH1ADD, SH2ADD and SH3ADD instructions that are available via the Zba extension. This results in more than 2000 instructions being removed from the Go binary on riscv64. Change-Id: Ia62ae7dda3d8083cff315113421bee73f518eea8 Reviewed-on: https://go-review.googlesource.com/c/go/+/606636 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> |
||
|
36b45bca66 |
cmd/compile: regalloc: drop values that aren't used until after a call
No point in keeping values in registers when their next use is after a call, as we'd have to spill/restore them anyway. cmd/go is 0.1% smaller. Fixes #59297 Change-Id: I10ee761d0d23229f57de278f734c44d6a8dccd6c Reviewed-on: https://go-review.googlesource.com/c/go/+/509255 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
||
|
2b0a157d68 |
cmd/compile: intrinsify math.MulUintptr on PPC64
This can be done efficiently with few instructions. This also adds MULHDUCC for further codegen improvement. Change-Id: I06320ba4383a679341b911a237a360ef07b19168 Reviewed-on: https://go-review.googlesource.com/c/go/+/605975 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Archana Ravindar <aravinda@redhat.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
||
|
02a9f51011 |
test/codegen: add initial codegen tests for integer min/max
Change-Id: I006370053748edbec930c7279ee88a805009aa0d Reviewed-on: https://go-review.googlesource.com/c/go/+/606976 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
b2cdaf7346 |
cmd/compile: improve unneeded zeroing removal
After newobject, we don't need to write zeroes to initialize the object. It has already been zeroed by the allocator. This is already handled in most cases, but because we run builtin decomposition after the opt pass, we don't handle cases where the zero of a compound builtin is being written. Improve the zero detector to handle those cases. Fixes #68845 Change-Id: If3dde2e304a05e5a6a6723565191d5444b334bcc Reviewed-on: https://go-review.googlesource.com/c/go/+/605255 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Auto-Submit: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> |
||
|
7273509466 |
cmd/compile: add additional arm64 bit field rules
Get rid of TODO in prove pass. We currently avoid marking shifts of constants as bounded, where bounded means we don't have to worry about <0 or >=bitwidth shifts. We do this because it causes different rule applications during lowering which cause some codegen tests to fail. Add some new rules which ensure that we get the right final instruction sequence regardless of the ordering. Then we can remove this special case. Change-Id: I4e962d4f09992b42ab47e123de5ded3b8b8fb205 Reviewed-on: https://go-review.googlesource.com/c/go/+/602935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> |
||
|
9b4268c3df |
cmd/compile: simplify prove pass
We don't need noLimit checks in a bunch of places. Also simplify folding of provable constant results. At this point in the CL stack, compilebench reports no performance changes. The only thing of note is that binaries got a bit smaller. name old text-bytes new text-bytes delta HelloSize 960kB ± 0% 952kB ± 0% -0.83% (p=0.000 n=10+10) CmdGoSize 12.3MB ± 0% 12.1MB ± 0% -1.53% (p=0.000 n=10+10) Change-Id: Id4be75eec0f8c93f2f3b93a8521ce2278ee2ee2c Reviewed-on: https://go-review.googlesource.com/c/go/+/599197 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> |
||
|
3b96eebcbd |
cmd/compile: rewrite the constant parts of the prove pass
Handles a lot more cases where constant ranges can eliminate various (mostly bounds failure) paths. Fixes #66826 Fixes #66692 Fixes #48213 Update #57959 TODO: remove constant logic from poset code, no longer needed. Change-Id: Id196436fcd8a0c84c7d59c04f93bd92e26a0fd7e Reviewed-on: https://go-review.googlesource.com/c/go/+/599096 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> |
||
|
ff14e08cd3 |
cmd/compile, math: improve implementation of math.{Max,Min} on loong64
Make math.{Min,Max} intrinsics and implement math.{archMax,archMin} in hardware. goos: linux goarch: loong64 pkg: math cpu: Loongson-3A6000 @ 2500.00MHz │ old.bench │ new.bench │ │ sec/op │ sec/op vs base │ Max 7.606n ± 0% 3.087n ± 0% -59.41% (p=0.000 n=20) Min 7.205n ± 0% 2.904n ± 0% -59.69% (p=0.000 n=20) MinFloat 37.220n ± 0% 4.802n ± 0% -87.10% (p=0.000 n=20) MaxFloat 33.620n ± 0% 4.802n ± 0% -85.72% (p=0.000 n=20) geomean 16.18n 3.792n -76.57% goos: linux goarch: loong64 pkg: runtime cpu: Loongson-3A5000 @ 2500.00MHz │ old.bench │ new.bench │ │ sec/op │ sec/op vs base │ Max 10.010n ± 0% 7.196n ± 0% -28.11% (p=0.000 n=20) Min 8.806n ± 0% 7.155n ± 0% -18.75% (p=0.000 n=20) MinFloat 60.010n ± 0% 7.976n ± 0% -86.71% (p=0.000 n=20) MaxFloat 56.410n ± 0% 7.980n ± 0% -85.85% (p=0.000 n=20) geomean 23.37n 7.566n -67.63% Updates #59120. Change-Id: I6815d20bc304af3cbf5d6ca8fe0ca1c2ddebea2d Reviewed-on: https://go-review.googlesource.com/c/go/+/580283 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> |
||
|
1985c0ccf9 |
cmd/compile,runtime: disable swissmap fast variants
Temporary measure to reduce the required MVP code. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: I44dc8acd0dc8280c6beb40451998e84bc85c238a Reviewed-on: https://go-review.googlesource.com/c/go/+/580915 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> |
||
|
c18ff29295 |
cmd/compile: make sync/atomic AND/OR operations intrinsic on amd64
Update #61395 Change-Id: I59a950f48efc587dfdffce00e2f4f3ab99d8df00 Reviewed-on: https://go-review.googlesource.com/c/go/+/594738 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com> |