526 Commits

Author SHA1 Message Date
Joel Sing
927fdb7843 cmd/compile: simplify intrinsification of TrailingZeros16 and TrailingZeros8
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64
and S390X, rather than having a custom intrinsic. Note that for PPC64 this
actually allows the existing Ctz16 and Ctz8 rules to be used.

Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/651816
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-27 03:45:44 -08:00
Mateusz Poliwczak
43e6525986 cmd/compile: load properly constant values from itabs
While looking at the SSA of the following code, I noticed
that these rules do not work properly, and the types
are loaded indirectly through an itab instead of statically.

type M interface{ M() }
type A interface{ A() }

type Impl struct{}
func (*Impl) M() {}
func (*Impl) A() {}

func main() {
        var a M = &Impl{}
        a.(A).A()
}

Change-Id: Ia275993f81a2e7302102d4ff87ac28586023d13c
GitHub-Last-Rev: 4bfc9019172929d0b0f1c8a1b7eb28cdbc9b87ef
GitHub-Pull-Request: golang/go#71784
Reviewed-on: https://go-review.googlesource.com/c/go/+/649500
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-19 13:39:00 -08:00
Jakub Ciolek
d524e1eccd cmd/compile: on AMD64, turn x < 128 into x <= 127
x < 128 -> x <= 127
x >= 128 -> x > 127

This allows for a shorter encoding, as 127 fits into
a single-byte (sign-extended imm8) immediate while 128 does not.
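The rewrite is safe for integers because no integer lies strictly between 127 and 128, so `x < 128` and `x <= 127` always agree. The sketch below spells out both forms with hypothetical helper names, plus an ASCII check of the kind that appears in the compilecmp list (for example `archive/tar.isASCII`). Whether a given function actually gets the shorter immediate depends on the compiler version and target.

```go
package main

// lessThan128 and atMost127 are hypothetical helpers showing the two
// equivalent comparisons; they return the same result for every int64.
func lessThan128(x int64) bool { return x < 128 }

func atMost127(x int64) bool { return x <= 127 }

// isASCIIRune is an example of a check written with 128 that the compiler
// may now emit using the constant 127, which fits in a signed 8-bit
// immediate on amd64.
func isASCIIRune(r rune) bool { return r < 128 }
```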

archive/tar benchmark (Alder Lake 12600K)

name              old time/op    new time/op    delta
/Writer/USTAR-16    1.46µs ± 0%    1.32µs ± 0%  -9.43%  (p=0.008 n=5+5)
/Writer/GNU-16      1.85µs ± 1%    1.79µs ± 1%  -3.47%  (p=0.008 n=5+5)
/Writer/PAX-16      3.21µs ± 0%    3.11µs ± 2%  -2.96%  (p=0.008 n=5+5)
/Reader/USTAR-16    1.38µs ± 1%    1.37µs ± 0%    ~     (p=0.127 n=5+4)
/Reader/GNU-16       798ns ± 1%     800ns ± 2%    ~     (p=0.548 n=5+5)
/Reader/PAX-16      3.07µs ± 1%    3.00µs ± 0%  -2.35%  (p=0.008 n=5+5)
[Geo mean]          1.76µs         1.70µs       -3.15%

compilecmp:

hash/maphash
hash/maphash.(*Hash).Write 517 -> 510  (-1.35%)

runtime
runtime.traceReadCPU 1626 -> 1615  (-0.68%)

runtime [cmd/compile]
runtime.traceReadCPU 1626 -> 1615  (-0.68%)

math/rand/v2
type:.eq.[128]float32 65 -> 59  (-9.23%)

bytes
bytes.trimLeftUnicode 378 -> 373  (-1.32%)
bytes.IndexAny 1189 -> 1157  (-2.69%)
bytes.LastIndexAny 1256 -> 1239  (-1.35%)
bytes.lastIndexFunc 263 -> 261  (-0.76%)

strings
strings.FieldsFuncSeq.func1 411 -> 399  (-2.92%)
strings.EqualFold 625 -> 624  (-0.16%)
strings.trimLeftUnicode 248 -> 231  (-6.85%)

math/rand
type:.eq.[128]float32 65 -> 59  (-9.23%)

bytes [cmd/compile]
bytes.LastIndexAny 1256 -> 1239  (-1.35%)
bytes.lastIndexFunc 263 -> 261  (-0.76%)
bytes.trimLeftUnicode 378 -> 373  (-1.32%)
bytes.IndexAny 1189 -> 1157  (-2.69%)

regexp/syntax
regexp/syntax.(*parser).parseEscape 1113 -> 1102  (-0.99%)

math/rand/v2 [cmd/compile]
type:.eq.[128]float32 65 -> 59  (-9.23%)

strings [cmd/compile]
strings.EqualFold 625 -> 624  (-0.16%)
strings.FieldsFuncSeq.func1 411 -> 399  (-2.92%)
strings.trimLeftUnicode 248 -> 231  (-6.85%)

math/rand [cmd/compile]
type:.eq.[128]float32 65 -> 59  (-9.23%)

regexp
regexp.(*inputString).context 198 -> 197  (-0.51%)
regexp.(*inputBytes).context 221 -> 212  (-4.07%)

image/jpeg
image/jpeg.(*decoder).processDQT 500 -> 491  (-1.80%)

regexp/syntax [cmd/compile]
regexp/syntax.(*parser).parseEscape 1113 -> 1102  (-0.99%)

regexp [cmd/compile]
regexp.(*inputString).context 198 -> 197  (-0.51%)
regexp.(*inputBytes).context 221 -> 212  (-4.07%)

encoding/csv
encoding/csv.(*Writer).fieldNeedsQuotes 269 -> 266  (-1.12%)

cmd/vendor/golang.org/x/sys/unix
type:.eq.[131]struct 855 -> 823  (-3.74%)

vendor/golang.org/x/text/unicode/norm
vendor/golang.org/x/text/unicode/norm.nextDecomposed 4831 -> 4826  (-0.10%)
vendor/golang.org/x/text/unicode/norm.(*Iter).returnSlice 281 -> 275  (-2.14%)

vendor/golang.org/x/text/secure/bidirule
vendor/golang.org/x/text/secure/bidirule.init.0 85 -> 83  (-2.35%)

go/scanner
go/scanner.isDigit 100 -> 98  (-2.00%)
go/scanner.(*Scanner).next 431 -> 422  (-2.09%)
go/scanner.isLetter 142 -> 124  (-12.68%)

encoding/asn1
encoding/asn1.parseTagAndLength 1189 -> 1182  (-0.59%)
encoding/asn1.makeField 3481 -> 3463  (-0.52%)

text/scanner
text/scanner.(*Scanner).next 1242 -> 1236  (-0.48%)

archive/tar
archive/tar.isASCII 133 -> 127  (-4.51%)
archive/tar.(*Writer).writeRawFile 1206 -> 1198  (-0.66%)
archive/tar.(*Reader).readHeader.func1 9 -> 7  (-22.22%)
archive/tar.toASCII 393 -> 383  (-2.54%)
archive/tar.splitUSTARPath 405 -> 396  (-2.22%)
archive/tar.(*Writer).writePAXHeader.func1 627 -> 620  (-1.12%)

text/template
text/template.jsIsSpecial 59 -> 57  (-3.39%)

go/doc
go/doc.assumedPackageName 714 -> 701  (-1.82%)

vendor/golang.org/x/net/http/httpguts
vendor/golang.org/x/net/http/httpguts.headerValueContainsToken 965 -> 952  (-1.35%)
vendor/golang.org/x/net/http/httpguts.tokenEqual 280 -> 269  (-3.93%)
vendor/golang.org/x/net/http/httpguts.IsTokenRune 28 -> 26  (-7.14%)

net/mail
net/mail.isVchar 26 -> 24  (-7.69%)
net/mail.isAtext 106 -> 104  (-1.89%)
net/mail.(*Address).String 1084 -> 1052  (-2.95%)
net/mail.isQtext 39 -> 37  (-5.13%)
net/mail.isMultibyte 9 -> 7  (-22.22%)
net/mail.isDtext 45 -> 43  (-4.44%)
net/mail.(*addrParser).consumeQuotedString 1050 -> 1029  (-2.00%)
net/mail.quoteString 741 -> 714  (-3.64%)

cmd/internal/obj/s390x
cmd/internal/obj/s390x.preprocess 6405 -> 6393  (-0.19%)

cmd/internal/obj/x86
cmd/internal/obj/x86.toDisp8 303 -> 301  (-0.66%)

fmt [cmd/compile]
fmt.Fprintf 4726 -> 4662  (-1.35%)

go/scanner [cmd/compile]
go/scanner.(*Scanner).next 431 -> 422  (-2.09%)
go/scanner.isLetter 142 -> 124  (-12.68%)
go/scanner.isDigit 100 -> 98  (-2.00%)

cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*source).nextch 879 -> 847  (-3.64%)

cmd/vendor/golang.org/x/mod/module
cmd/vendor/golang.org/x/mod/module.checkElem 1253 -> 1235  (-1.44%)
cmd/vendor/golang.org/x/mod/module.escapeString 519 -> 517  (-0.39%)

go/doc [cmd/compile]
go/doc.assumedPackageName 714 -> 701  (-1.82%)

cmd/compile/internal/syntax [cmd/compile]
cmd/compile/internal/syntax.(*scanner).escape 1965 -> 1933  (-1.63%)
cmd/compile/internal/syntax.(*scanner).next 8975 -> 8847  (-1.43%)

cmd/internal/obj/s390x [cmd/compile]
cmd/internal/obj/s390x.preprocess 6405 -> 6393  (-0.19%)

cmd/internal/obj/x86 [cmd/compile]
cmd/internal/obj/x86.toDisp8 303 -> 301  (-0.66%)

cmd/internal/gcprog
cmd/internal/gcprog.(*Writer).Repeat 688 -> 677  (-1.60%)
cmd/internal/gcprog.(*Writer).varint 442 -> 439  (-0.68%)

cmd/compile/internal/ir
cmd/compile/internal/ir.splitPkg 331 -> 325  (-1.81%)

cmd/compile/internal/ir [cmd/compile]
cmd/compile/internal/ir.splitPkg 331 -> 325  (-1.81%)

net/http
net/http.containsDotDot.FieldsFuncSeq.func1 411 -> 399  (-2.92%)
net/http.isNotToken 33 -> 30  (-9.09%)
net/http.containsDotDot 606 -> 588  (-2.97%)
net/http.isCookieNameValid 197 -> 191  (-3.05%)
net/http.parsePattern 4330 -> 4317  (-0.30%)
net/http.ParseCookie 1099 -> 1096  (-0.27%)
net/http.validMethod 197 -> 187  (-5.08%)

cmd/vendor/golang.org/x/text/unicode/norm
cmd/vendor/golang.org/x/text/unicode/norm.(*Iter).returnSlice 281 -> 275  (-2.14%)
cmd/vendor/golang.org/x/text/unicode/norm.nextDecomposed 4831 -> 4826  (-0.10%)

net/http/cookiejar
net/http/cookiejar.encode 1936 -> 1918  (-0.93%)

expvar
expvar.appendJSONQuote 972 -> 965  (-0.72%)

cmd/cgo/internal/test
cmd/cgo/internal/test.stack128 116 -> 114  (-1.72%)

cmd/vendor/rsc.io/markdown
cmd/vendor/rsc.io/markdown.newATXHeading 1637 -> 1628  (-0.55%)
cmd/vendor/rsc.io/markdown.isUnicodePunct 197 -> 179  (-9.14%)

Change-Id: I578bdf42ef229d687d526e378d697ced51e1880c
Reviewed-on: https://go-review.googlesource.com/c/go/+/639935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-16 07:23:13 -08:00
Keith Randall
beac2f7d3b cmd/compile: fix sign extension of paired 32-bit loads on arm64
Fixes #71759
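Code of the following shape, two adjacent `int32` fields widened to `int64`, is the kind where a paired 32-bit load must sign-extend each half. It is an illustrative sketch, not the reproducer from issue #71759; the type and function names are hypothetical, and whether arm64 actually emits a paired load here is an assumption.

```go
package main

// pair holds two adjacent 32-bit fields; on arm64 the compiler may load
// both with a single paired load instruction.
type pair struct {
	a, b int32
}

// widenSum sign-extends both fields to 64 bits before adding. If the loads
// were zero-extended instead, negative inputs would produce a large
// positive result.
//
//go:noinline
func widenSum(p *pair) int64 {
	return int64(p.a) + int64(p.b)
}
```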

Change-Id: Iab05294ac933cc9972949158d3fe2bdc3073df5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/649895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-02-15 07:53:28 -08:00
Keith Randall
187fd2698d cmd/compile: make write barrier code amenable to paired loads/stores
It currently isn't, because it does load/store/load/store/...
Rework it to do overwrite processing in pairs, so that it is instead
load/load/store/store/...

Change-Id: If7be629bc4048da5f2386dafb8f05759b79e9e2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/631495
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-13 14:08:14 -08:00
Keith Randall
a0029e95e5 cmd/compile: regalloc: handle desired registers of 2-output insns
Particularly with 2-word load instructions, this becomes important.
Classic example is:

    func f(p *string) string {
        return *p
    }

We want the two loads to put the return values directly into
the two ABI return registers.

At this point in the stack, cmd/go is 1.1% smaller.

Change-Id: I51fd1710238e81d15aab2bfb816d73c8e7c207b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/631137
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-13 14:08:07 -08:00
khr@golang.org
20d7c57422 cmd/compile: pair loads and stores on arm64
Look for possible paired load/store operations on arm64.
I don't expect this to be a lot faster, but it will save
binary space and, indirectly through the icache, at least a bit
of time.

Change-Id: I4dd73b0e6329c4659b7453998f9b75320fcf380b
Reviewed-on: https://go-review.googlesource.com/c/go/+/629256
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-13 14:07:47 -08:00
Keith Randall
89c2f282dc cmd/compile: move []byte->string map key optimization to ssa
If we call slicebytetostring immediately (with no intervening writes)
before calling map access or delete functions with the resulting
string as the key, then we can just use the ptr/len of the
slicebytetostring argument as the key. This avoids an allocation.

Fixes #44898
Update #71132

There's old code in cmd/compile/internal/walk/order.go that handles
some of these cases.

1. m[string(b)]
2. s := string(b); m[s]
3. m[[2]string{string(b1),string(b2)}]

The old code handled cases 1 and 3. The new code handles cases 1 and 2.
We'll leave the old code around to keep case 3 working, although it
does not seem terribly common.

Case 2 happens particularly after inlining, so it is pretty common.
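Cases 1 and 2, plus the delete path the message mentions, look like this in source form. The function names are hypothetical. Whether each conversion is actually allocation-free depends on the compiler version; `testing.AllocsPerRun` is one way to measure it.

```go
package main

// lookupDirect is case 1: the conversion appears directly in the index
// expression.
func lookupDirect(m map[string]int, b []byte) int {
	return m[string(b)]
}

// lookupViaVar is case 2: the converted string is stored in a local first.
// Nothing writes to b between the conversion and the map access, which is
// the condition the SSA-based optimization relies on.
func lookupViaVar(m map[string]int, b []byte) int {
	s := string(b)
	return m[s]
}

// deleteKey shows the same pattern feeding a map delete.
func deleteKey(m map[string]int, b []byte) {
	s := string(b)
	delete(m, s)
}
```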

Change-Id: I8913226ca79d2c65f4e2bd69a38ac8c976a57e43
Reviewed-on: https://go-review.googlesource.com/c/go/+/640656
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-13 13:03:07 -08:00
Jakub Ciolek
43b7e67040 cmd/compile: lower x*z + y to FMA if FMA enabled
There is a generic opcode for FMA, but we don't use it in rewrite rules.
This may be because some archs, like WASM and MIPS, don't have a late
lowering rule for it.

Fixes #71204

Intel Alder Lake 12600k (GOAMD64=v3):

math:

name                    old time/op  new time/op  delta
Acos-16                 4.58ns ± 0%  3.36ns ± 0%  -26.68%  (p=0.008 n=5+5)
Acosh-16                8.04ns ± 1%  6.44ns ± 0%  -19.95%  (p=0.008 n=5+5)
Asin-16                 4.28ns ± 0%  3.32ns ± 0%  -22.24%  (p=0.008 n=5+5)
Asinh-16                9.92ns ± 0%  8.62ns ± 0%  -13.13%  (p=0.008 n=5+5)
Atan-16                 2.31ns ± 0%  1.84ns ± 0%  -20.02%  (p=0.008 n=5+5)
Atanh-16                7.79ns ± 0%  7.03ns ± 0%   -9.67%  (p=0.008 n=5+5)
Atan2-16                3.93ns ± 0%  3.52ns ± 0%  -10.35%  (p=0.000 n=5+4)
Cbrt-16                 4.62ns ± 0%  4.41ns ± 0%   -4.57%  (p=0.016 n=4+5)
Ceil-16                 0.14ns ± 1%  0.14ns ± 2%     ~     (p=0.103 n=5+5)
Copysign-16             0.33ns ± 0%  0.33ns ± 0%   +0.03%  (p=0.029 n=4+4)
Cos-16                  4.87ns ± 0%  4.75ns ± 0%   -2.44%  (p=0.016 n=5+4)
Cosh-16                 4.86ns ± 0%  4.86ns ± 0%     ~     (p=0.317 n=5+5)
Erf-16                  2.71ns ± 0%  2.25ns ± 0%  -16.69%  (p=0.008 n=5+5)
Erfc-16                 3.06ns ± 0%  2.67ns ± 0%  -13.00%  (p=0.016 n=5+4)
Erfinv-16               3.88ns ± 0%  2.84ns ± 3%  -26.83%  (p=0.008 n=5+5)
Erfcinv-16              4.08ns ± 0%  3.01ns ± 1%  -26.27%  (p=0.008 n=5+5)
Exp-16                  3.29ns ± 0%  3.37ns ± 2%   +2.64%  (p=0.016 n=4+5)
ExpGo-16                8.44ns ± 0%  7.48ns ± 1%  -11.37%  (p=0.008 n=5+5)
Expm1-16                4.46ns ± 0%  3.69ns ± 2%  -17.26%  (p=0.016 n=4+5)
Exp2-16                 8.20ns ± 0%  7.39ns ± 2%   -9.94%  (p=0.008 n=5+5)
Exp2Go-16               8.26ns ± 0%  7.23ns ± 0%  -12.49%  (p=0.016 n=4+5)
Abs-16                  0.26ns ± 3%  0.22ns ± 1%  -16.34%  (p=0.008 n=5+5)
Dim-16                  0.38ns ± 1%  0.40ns ± 2%   +5.02%  (p=0.008 n=5+5)
Floor-16                0.11ns ± 1%  0.17ns ± 4%  +54.99%  (p=0.008 n=5+5)
Max-16                  1.24ns ± 0%  1.24ns ± 0%     ~     (p=0.619 n=5+5)
Min-16                  1.24ns ± 0%  1.24ns ± 0%     ~     (p=0.484 n=5+5)
Mod-16                  13.4ns ± 1%  12.8ns ± 0%   -4.21%  (p=0.016 n=5+4)
Frexp-16                1.70ns ± 0%  1.71ns ± 0%   +0.46%  (p=0.008 n=5+5)
Gamma-16                3.97ns ± 0%  3.97ns ± 0%     ~     (p=0.643 n=5+5)
Hypot-16                2.11ns ± 0%  2.11ns ± 0%     ~     (p=0.762 n=5+5)
HypotGo-16              2.48ns ± 4%  2.26ns ± 0%   -8.94%  (p=0.008 n=5+5)
Ilogb-16                1.67ns ± 0%  1.67ns ± 0%   -0.07%  (p=0.048 n=5+5)
J0-16                   19.8ns ± 0%  19.3ns ± 0%     ~     (p=0.079 n=4+5)
J1-16                   19.4ns ± 0%  18.9ns ± 0%   -2.63%  (p=0.000 n=5+4)
Jn-16                   41.5ns ± 0%  40.6ns ± 0%   -2.32%  (p=0.016 n=4+5)
Ldexp-16                2.26ns ± 0%  2.26ns ± 0%     ~     (p=0.683 n=5+5)
Lgamma-16               4.40ns ± 0%  4.21ns ± 0%   -4.21%  (p=0.008 n=5+5)
Log-16                  4.05ns ± 0%  4.05ns ± 0%     ~     (all equal)
Logb-16                 1.69ns ± 0%  1.69ns ± 0%     ~     (p=0.429 n=5+5)
Log1p-16                5.00ns ± 0%  3.99ns ± 0%  -20.14%  (p=0.008 n=5+5)
Log10-16                4.22ns ± 0%  4.21ns ± 0%   -0.15%  (p=0.008 n=5+5)
Log2-16                 2.27ns ± 0%  2.25ns ± 0%   -0.94%  (p=0.008 n=5+5)
Modf-16                 1.44ns ± 0%  1.44ns ± 0%     ~     (p=0.492 n=5+5)
Nextafter32-16          2.09ns ± 0%  2.09ns ± 0%     ~     (p=0.079 n=4+5)
Nextafter64-16          2.09ns ± 0%  2.09ns ± 0%     ~     (p=0.095 n=4+5)
PowInt-16               10.8ns ± 0%  10.8ns ± 0%     ~     (all equal)
PowFrac-16              25.3ns ± 0%  25.3ns ± 0%   -0.09%  (p=0.000 n=5+4)
Pow10Pos-16             0.52ns ± 1%  0.52ns ± 0%     ~     (p=0.810 n=5+5)
Pow10Neg-16             0.82ns ± 0%  0.82ns ± 0%     ~     (p=0.381 n=5+5)
Round-16                0.93ns ± 0%  0.93ns ± 0%     ~     (p=0.056 n=5+5)
RoundToEven-16          1.64ns ± 0%  1.64ns ± 0%     ~     (all equal)
Remainder-16            12.4ns ± 2%  12.0ns ± 0%   -3.27%  (p=0.008 n=5+5)
Signbit-16              0.37ns ± 0%  0.37ns ± 0%   -0.19%  (p=0.008 n=5+5)
Sin-16                  4.04ns ± 0%  3.92ns ± 0%   -3.13%  (p=0.000 n=4+5)
Sincos-16               5.99ns ± 0%  5.80ns ± 0%   -3.03%  (p=0.008 n=5+5)
Sinh-16                 5.22ns ± 0%  5.22ns ± 0%     ~     (p=0.651 n=5+4)
SqrtIndirect-16         0.41ns ± 0%  0.41ns ± 0%     ~     (p=0.333 n=4+5)
SqrtLatency-16          2.66ns ± 0%  2.66ns ± 0%     ~     (p=0.079 n=4+5)
SqrtIndirectLatency-16  2.66ns ± 0%  2.66ns ± 0%     ~     (p=1.000 n=5+5)
SqrtGoLatency-16        30.1ns ± 0%  28.6ns ± 1%   -4.84%  (p=0.008 n=5+5)
SqrtPrime-16             645ns ± 0%   645ns ± 0%     ~     (p=0.095 n=5+4)
Tan-16                  4.21ns ± 0%  4.09ns ± 0%   -2.76%  (p=0.029 n=4+4)
Tanh-16                 5.36ns ± 0%  5.36ns ± 0%     ~     (p=0.444 n=5+5)
Trunc-16                0.12ns ± 6%  0.11ns ± 1%   -6.79%  (p=0.008 n=5+5)
Y0-16                   19.2ns ± 0%  18.7ns ± 0%   -2.52%  (p=0.000 n=5+4)
Y1-16                   19.1ns ± 0%  18.4ns ± 0%     ~     (p=0.079 n=4+5)
Yn-16                   40.7ns ± 0%  39.5ns ± 0%   -2.82%  (p=0.008 n=5+5)
Float64bits-16          0.21ns ± 0%  0.21ns ± 0%     ~     (p=0.603 n=5+5)
Float64frombits-16      0.21ns ± 0%  0.21ns ± 0%     ~     (p=0.984 n=4+5)
Float32bits-16          0.21ns ± 0%  0.21ns ± 0%     ~     (p=0.778 n=4+5)
Float32frombits-16      0.21ns ± 0%  0.20ns ± 0%     ~     (p=0.397 n=5+5)
FMA-16                  0.82ns ± 0%  0.82ns ± 0%   +0.02%  (p=0.029 n=4+4)
[Geo mean]              2.87ns       2.74ns        -4.61%

math/cmplx:

name        old time/op  new time/op  delta
Abs-16      2.07ns ± 0%  2.05ns ± 0%   -0.70%  (p=0.016 n=5+4)
Acos-16     36.5ns ± 0%  35.7ns ± 0%   -2.33%  (p=0.029 n=4+4)
Acosh-16    37.0ns ± 0%  36.2ns ± 0%   -2.20%  (p=0.008 n=5+5)
Asin-16     36.5ns ± 0%  35.7ns ± 0%   -2.29%  (p=0.008 n=5+5)
Asinh-16    33.5ns ± 0%  31.6ns ± 0%   -5.51%  (p=0.008 n=5+5)
Atan-16     15.5ns ± 0%  13.9ns ± 0%  -10.61%  (p=0.008 n=5+5)
Atanh-16    15.0ns ± 0%  13.6ns ± 0%   -9.73%  (p=0.008 n=5+5)
Conj-16     0.11ns ± 5%  0.11ns ± 1%     ~     (p=0.421 n=5+5)
Cos-16      12.3ns ± 0%  12.2ns ± 0%   -0.60%  (p=0.000 n=4+5)
Cosh-16     12.1ns ± 0%  12.0ns ± 0%     ~     (p=0.079 n=4+5)
Exp-16      10.0ns ± 0%   9.8ns ± 0%   -1.77%  (p=0.008 n=5+5)
Log-16      14.5ns ± 0%  13.7ns ± 0%   -5.67%  (p=0.008 n=5+5)
Log10-16    14.5ns ± 0%  13.7ns ± 0%   -5.55%  (p=0.000 n=5+4)
Phase-16    5.11ns ± 0%  4.25ns ± 0%  -16.90%  (p=0.008 n=5+5)
Polar-16    7.12ns ± 0%  6.35ns ± 0%  -10.90%  (p=0.008 n=5+5)
Pow-16      64.3ns ± 0%  63.7ns ± 0%   -0.97%  (p=0.008 n=5+5)
Rect-16     5.74ns ± 0%  5.58ns ± 0%   -2.73%  (p=0.016 n=4+5)
Sin-16      12.2ns ± 0%  12.2ns ± 0%   -0.54%  (p=0.000 n=4+5)
Sinh-16     12.1ns ± 0%  12.0ns ± 0%   -0.58%  (p=0.000 n=5+4)
Sqrt-16     5.30ns ± 0%  5.18ns ± 0%   -2.36%  (p=0.008 n=5+5)
Tan-16      22.7ns ± 0%  22.6ns ± 0%   -0.33%  (p=0.008 n=5+5)
Tanh-16     21.2ns ± 0%  20.9ns ± 0%   -1.32%  (p=0.008 n=5+5)
[Geo mean]  11.3ns       10.8ns        -3.97%

Change-Id: Idcc4b357ba68477929c126289e5095b27a827b1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/646335
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-02-13 12:34:33 -08:00
Keith Randall
a7e331e671 cmd/compile: implement signed loads from read-only memory
This is in addition to the unsigned loads, which already exist.

This helps code that does switches on strings to constant-fold
the switch away when the string being switched on is constant.
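A string switch whose operand becomes constant after inlining is the pattern the message describes. In the sketch below, `code` and `postCode` are hypothetical names. Whether the switch in `postCode` folds to a constant depends on inlining decisions and the compiler version.

```go
package main

// code maps a method name to a small integer.
func code(method string) int {
	switch method {
	case "GET":
		return 1
	case "POST":
		return 2
	case "DELETE":
		return 3
	}
	return 0
}

// postCode calls code with a constant string; once code is inlined, the
// switch operand is a constant whose bytes live in read-only memory.
func postCode() int { return code("POST") }
```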

Fixes #71699

Change-Id: If3051af0f7255d2a573da6f96b153a987a7f159d
Reviewed-on: https://go-review.googlesource.com/c/go/+/649295
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
2025-02-13 12:27:55 -08:00
Keith Randall
072eea9b3b cmd/compile: avoid ifaceeq call if we know the interface is direct
We can just use == if the interface is direct.
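A "direct" interface stores a pointer-shaped value in its data word, so once the type words match, the data words can be compared with a plain `==`. The sketch below compares a non-empty interface against a concrete pointer type, a case where the compiler knows the dynamic type is direct. The names are hypothetical, and whether this exact form skips the `ifaceeq` call is an assumption.

```go
package main

type shape interface{ area() int }

type square struct{ side int }

func (s *square) area() int { return s.side * s.side }

// same compares an interface against a *square. *square is pointer-shaped,
// so equality reduces to comparing the itab word and the pointer itself.
func same(s shape, p *square) bool {
	return s == p
}
```

Two distinct pointers with equal contents still compare unequal, because only the pointers are compared.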

Fixes #70738

Change-Id: Ia9a644791a370fec969c04c42d28a9b58f16911f
Reviewed-on: https://go-review.googlesource.com/c/go/+/635435
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-10 13:28:41 -08:00
Jakub Ciolek
cd595be6d6 cmd/compile: prefer an add when shifting left by 1
ADD(Q|L) generally has twice the throughput.

Came up in CL 626998.

Reciprocal throughput (cycles per instruction; lower is better) by arch:

Zen 4:

SHLL (R64, 1):   0.5
ADD  (R64, R64): 0.25

Intel Alder Lake:

SHLL (R64, 1):   0.5
ADD  (R64, R64): 0.2

Intel Haswell:

SHLL (R64, 1):   0.5
ADD  (R64, R64): 0.25
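Replacing a left shift by 1 with an add is valid because `x << 1` and `x + x` are equal for every `int64`, including on overflow, since both wrap modulo 2^64. The helper names below are hypothetical.

```go
package main

// shl1 and addSelf compute the same value for every int64; this is the
// equivalence that lets the compiler choose ADDQ x, x over SHLQ $1, x.
func shl1(x int64) int64 { return x << 1 }

func addSelf(x int64) int64 { return x + x }
```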

Also include a minor opt for:

(x + x) << c -> x << (c + 1)

Before this, the code:

    func addShift(x int64) int64 {
        return (x + x) << 1
    }

emitted two instructions:

        ADDQ    AX, AX
        SHLQ    $1, AX

but we can do it in a single shift:

        SHLQ    $2, AX

Add a codegen test for clearing the last bit.

compilecmp linux/amd64:

math
math.sqrt 243 -> 242  (-0.41%)

math [cmd/compile]
math.sqrt 243 -> 242  (-0.41%)

runtime
runtime.selectgo 5455 -> 5445  (-0.18%)
runtime.sysargs 665 -> 662  (-0.45%)
runtime.isPinned 145 -> 141  (-2.76%)
runtime.atoi64 198 -> 194  (-2.02%)
runtime.setPinned 714 -> 709  (-0.70%)

runtime [cmd/compile]
runtime.sysargs 665 -> 662  (-0.45%)
runtime.setPinned 714 -> 709  (-0.70%)
runtime.atoi64 198 -> 194  (-2.02%)
runtime.isPinned 145 -> 141  (-2.76%)

strconv
strconv.computeBounds 109 -> 107  (-1.83%)
strconv.FormatInt 201 -> 197  (-1.99%)
strconv.ryuFtoaShortest 1298 -> 1266  (-2.47%)
strconv.small 144 -> 134  (-6.94%)
strconv.AppendInt 357 -> 344  (-3.64%)
strconv.ryuDigits32 490 -> 488  (-0.41%)
strconv.AppendUint 342 -> 340  (-0.58%)

strconv [cmd/compile]
strconv.FormatInt 201 -> 197  (-1.99%)
strconv.ryuFtoaShortest 1298 -> 1266  (-2.47%)
strconv.ryuDigits32 490 -> 488  (-0.41%)
strconv.AppendUint 342 -> 340  (-0.58%)
strconv.computeBounds 109 -> 107  (-1.83%)
strconv.small 144 -> 134  (-6.94%)
strconv.AppendInt 357 -> 344  (-3.64%)

image
image.Rectangle.Inset 101 -> 97  (-3.96%)

regexp/syntax
regexp/syntax.inCharClass.func1 111 -> 110  (-0.90%)
regexp/syntax.(*compiler).quest 586 -> 573  (-2.22%)
regexp/syntax.ranges.Less 153 -> 150  (-1.96%)
regexp/syntax.(*compiler).loop 583 -> 568  (-2.57%)

time
time.Time.Before 179 -> 161  (-10.06%)
time.Time.Compare 189 -> 166  (-12.17%)
time.Time.Sub 444 -> 425  (-4.28%)
time.Time.UnixMicro 106 -> 95  (-10.38%)
time.div 592 -> 587  (-0.84%)
time.Time.UnixNano 85 -> 78  (-8.24%)
time.(*Time).UnixMilli 141 -> 140  (-0.71%)
time.Time.UnixMilli 106 -> 95  (-10.38%)
time.(*Time).UnixMicro 141 -> 140  (-0.71%)
time.Time.After 179 -> 161  (-10.06%)
time.Time.Equal 170 -> 150  (-11.76%)
time.Time.AppendBinary 766 -> 757  (-1.17%)
time.Time.IsZero 74 -> 66  (-10.81%)
time.(*Time).UnixNano 124 -> 113  (-8.87%)
time.(*Time).IsZero 113 -> 108  (-4.42%)

regexp
regexp.(*Regexp).FindAllStringSubmatch.func1 590 -> 569  (-3.56%)
regexp.QuoteMeta 485 -> 469  (-3.30%)

regexp/syntax [cmd/compile]
regexp/syntax.inCharClass.func1 111 -> 110  (-0.90%)
regexp/syntax.(*compiler).loop 583 -> 568  (-2.57%)
regexp/syntax.(*compiler).quest 586 -> 573  (-2.22%)
regexp/syntax.ranges.Less 153 -> 150  (-1.96%)

encoding/base64
encoding/base64.decodedLen 92 -> 90  (-2.17%)
encoding/base64.(*Encoding).DecodedLen 99 -> 97  (-2.02%)

time [cmd/compile]
time.(*Time).IsZero 113 -> 108  (-4.42%)
time.Time.IsZero 74 -> 66  (-10.81%)
time.(*Time).UnixNano 124 -> 113  (-8.87%)
time.Time.UnixMilli 106 -> 95  (-10.38%)
time.Time.Equal 170 -> 150  (-11.76%)
time.Time.UnixMicro 106 -> 95  (-10.38%)
time.(*Time).UnixMicro 141 -> 140  (-0.71%)
time.Time.Before 179 -> 161  (-10.06%)
time.Time.UnixNano 85 -> 78  (-8.24%)
time.Time.AppendBinary 766 -> 757  (-1.17%)
time.div 592 -> 587  (-0.84%)
time.Time.After 179 -> 161  (-10.06%)
time.Time.Compare 189 -> 166  (-12.17%)
time.(*Time).UnixMilli 141 -> 140  (-0.71%)
time.Time.Sub 444 -> 425  (-4.28%)

index/suffixarray
index/suffixarray.sais_8_32 1677 -> 1645  (-1.91%)
index/suffixarray.sais_32 1677 -> 1645  (-1.91%)
index/suffixarray.sais_64 1677 -> 1654  (-1.37%)
index/suffixarray.sais_8_64 1677 -> 1654  (-1.37%)
index/suffixarray.writeInt 249 -> 247  (-0.80%)

os
os.Expand 1070 -> 1051  (-1.78%)
os.Chtimes 787 -> 774  (-1.65%)

regexp [cmd/compile]
regexp.(*Regexp).FindAllStringSubmatch.func1 590 -> 569  (-3.56%)
regexp.QuoteMeta 485 -> 469  (-3.30%)

encoding/base64 [cmd/compile]
encoding/base64.decodedLen 92 -> 90  (-2.17%)
encoding/base64.(*Encoding).DecodedLen 99 -> 97  (-2.02%)

encoding/hex
encoding/hex.Encode 138 -> 136  (-1.45%)
encoding/hex.(*decoder).Read 830 -> 824  (-0.72%)

crypto/des
crypto/des.initFeistelBox 235 -> 229  (-2.55%)
crypto/des.cryptBlock 549 -> 538  (-2.00%)

os [cmd/compile]
os.Chtimes 787 -> 774  (-1.65%)
os.Expand 1070 -> 1051  (-1.78%)

math/big
math/big.newFloat 238 -> 223  (-6.30%)
math/big.nat.mul 2138 -> 2122  (-0.75%)
math/big.karatsubaSqr 1372 -> 1369  (-0.22%)
math/big.(*Float).sqrtInverse 895 -> 878  (-1.90%)
math/big.basicSqr 1032 -> 1017  (-1.45%)

cmd/vendor/golang.org/x/sys/unix
cmd/vendor/golang.org/x/sys/unix.TimeToTimespec 72 -> 66  (-8.33%)

encoding/json
encoding/json.Indent 404 -> 403  (-0.25%)
encoding/json.MarshalIndent 303 -> 297  (-1.98%)

testing
testing.(*T).Deadline 84 -> 82  (-2.38%)
testing.(*M).Run 3545 -> 3525  (-0.56%)

archive/zip
archive/zip.headerFileInfo.ModTime 229 -> 223  (-2.62%)

encoding/gob
encoding/gob.(*encoderState).encodeInt 474 -> 469  (-1.05%)

crypto/elliptic
crypto/elliptic.Marshal 728 -> 714  (-1.92%)

debug/buildinfo
debug/buildinfo.readString 325 -> 315  (-3.08%)

image/png
image/png.(*decoder).readImagePass 10866 -> 10834  (-0.29%)

archive/tar
archive/tar.Header.allowedFormats.func3 1768 -> 1736  (-1.81%)
archive/tar.formatPAXTime 389 -> 358  (-7.97%)
archive/tar.(*Writer).writeGNUHeader 741 -> 727  (-1.89%)
archive/tar.readGNUSparseMap0x1 709 -> 695  (-1.97%)
archive/tar.(*Writer).templateV7Plus 915 -> 909  (-0.66%)

crypto/internal/cryptotest
crypto/internal/cryptotest.TestHash.func4 890 -> 879  (-1.24%)
crypto/internal/cryptotest.TestStream.func6.1 646 -> 645  (-0.15%)
crypto/internal/cryptotest.testCipher.func3 1300 -> 1289  (-0.85%)

internal/pkgbits
internal/pkgbits.(*Encoder).Int64 113 -> 103  (-8.85%)
internal/pkgbits.(*Encoder).rawVarint 74 -> 72  (-2.70%)

testing/quick
testing/quick.(*Config).getRand 316 -> 315  (-0.32%)

log/slog
log/slog.TimeValue 489 -> 479  (-2.04%)

runtime/pprof
runtime/pprof.(*profileBuilder).build 2341 -> 2322  (-0.81%)

internal/coverage/cfile
internal/coverage/cfile.(*emitState).openMetaFile 824 -> 822  (-0.24%)
internal/coverage/cfile.(*emitState).openCounterFile 904 -> 892  (-1.33%)

cmd/internal/objabi
cmd/internal/objabi.expandArgs 1177 -> 1169  (-0.68%)

crypto/ecdsa
crypto/ecdsa.pointFromAffine 1162 -> 1144  (-1.55%)

net
net.minNonzeroTime 313 -> 308  (-1.60%)
net.cgoLookupAddrPTR 812 -> 797  (-1.85%)
net.(*IPNet).String 851 -> 827  (-2.82%)
net.IP.AppendText 488 -> 471  (-3.48%)
net.IPMask.String 281 -> 270  (-3.91%)
net.partialDeadline 374 -> 366  (-2.14%)
net.hexString 249 -> 240  (-3.61%)
net.IP.String 454 -> 453  (-0.22%)

internal/fuzz
internal/fuzz.newPcgRand 240 -> 234  (-2.50%)

crypto/x509
crypto/x509.(*Certificate).isValid 2642 -> 2611  (-1.17%)

cmd/internal/obj/s390x
cmd/internal/obj/s390x.buildop 33676 -> 33644  (-0.10%)

encoding/hex [cmd/compile]
encoding/hex.(*decoder).Read 830 -> 824  (-0.72%)
encoding/hex.Encode 138 -> 136  (-1.45%)

cmd/internal/objabi [cmd/compile]
cmd/internal/objabi.expandArgs 1177 -> 1169  (-0.68%)

math/big [cmd/compile]
math/big.(*Float).sqrtInverse 895 -> 878  (-1.90%)
math/big.nat.mul 2138 -> 2122  (-0.75%)
math/big.karatsubaSqr 1372 -> 1369  (-0.22%)
math/big.basicSqr 1032 -> 1017  (-1.45%)
math/big.newFloat 238 -> 223  (-6.30%)

encoding/json [cmd/compile]
encoding/json.MarshalIndent 303 -> 297  (-1.98%)
encoding/json.Indent 404 -> 403  (-0.25%)

cmd/covdata
main.(*metaMerge).emitCounters 985 -> 973  (-1.22%)

runtime/pprof [cmd/compile]
runtime/pprof.(*profileBuilder).build 2341 -> 2322  (-0.81%)

cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*source).fill 722 -> 703  (-2.63%)

cmd/dist
main.runInstall 19081 -> 19049  (-0.17%)

crypto/tls
crypto/tls.extractPadding 176 -> 175  (-0.57%)
slices.Clone[[]crypto/tls.SignatureScheme,crypto/tls.SignatureScheme] 253 -> 247  (-2.37%)
slices.Clone[[]uint16,uint16] 253 -> 247  (-2.37%)
slices.Clone[[]crypto/tls.CurveID,crypto/tls.CurveID] 253 -> 247  (-2.37%)
crypto/tls.(*Config).cipherSuites 335 -> 326  (-2.69%)
slices.DeleteFunc[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 437 -> 434  (-0.69%)
crypto/tls.dial 1349 -> 1339  (-0.74%)
slices.DeleteFunc[go.shape.[]uint16,go.shape.uint16] 437 -> 434  (-0.69%)

internal/pkgbits [cmd/compile]
internal/pkgbits.(*Encoder).Int64 113 -> 103  (-8.85%)
internal/pkgbits.(*Encoder).rawVarint 74 -> 72  (-2.70%)

cmd/compile/internal/syntax [cmd/compile]
cmd/compile/internal/syntax.(*source).fill 722 -> 703  (-2.63%)

cmd/internal/obj/s390x [cmd/compile]
cmd/internal/obj/s390x.buildop 33676 -> 33644  (-0.10%)

cmd/go/internal/trace
cmd/go/internal/trace.Flow 910 -> 886  (-2.64%)
cmd/go/internal/trace.(*Span).Done 311 -> 304  (-2.25%)
cmd/go/internal/trace.StartSpan 620 -> 615  (-0.81%)

cmd/internal/script
cmd/internal/script.(*Engine).Execute.func2 534 -> 528  (-1.12%)

cmd/link/internal/loader
cmd/link/internal/loader.(*Loader).SetSymSect 344 -> 338  (-1.74%)

net/http
net/http.(*Transport).queueForIdleConn 1797 -> 1766  (-1.73%)
net/http.(*Transport).getConn 2149 -> 2131  (-0.84%)
net/http.(*http2ClientConn).tooIdleLocked 207 -> 197  (-4.83%)
net/http.(*http2responseWriter).SetWriteDeadline.func1 520 -> 508  (-2.31%)
net/http.(*Cookie).Valid 837 -> 818  (-2.27%)
net/http.(*http2responseWriter).SetReadDeadline 373 -> 357  (-4.29%)
net/http.checkIfRange 701 -> 690  (-1.57%)
net/http.(*http2SettingsFrame).Value 325 -> 298  (-8.31%)
net/http.(*http2SettingsFrame).HasDuplicates 777 -> 767  (-1.29%)
net/http.(*Server).Serve 1746 -> 1739  (-0.40%)
net/http.http2traceGotConn 569 -> 556  (-2.28%)

net/http/pprof
net/http/pprof.collectProfile 242 -> 239  (-1.24%)

cmd/compile/internal/coverage
cmd/compile/internal/coverage.metaHashAndLen 439 -> 438  (-0.23%)

cmd/vendor/golang.org/x/telemetry/internal/upload
cmd/vendor/golang.org/x/telemetry/internal/upload.(*uploader).findWork 4570 -> 4540  (-0.66%)
cmd/vendor/golang.org/x/telemetry/internal/upload.(*uploader).reports 3604 -> 3572  (-0.89%)

cmd/compile/internal/coverage [cmd/compile]
cmd/compile/internal/coverage.metaHashAndLen 439 -> 438  (-0.23%)

cmd/vendor/golang.org/x/text/language
cmd/vendor/golang.org/x/text/language.regionGroupDist 287 -> 284  (-1.05%)

cmd/go/internal/vcweb
cmd/go/internal/vcweb.(*Server).overview.func1 1045 -> 1041  (-0.38%)

cmd/go/internal/vcs
cmd/go/internal/vcs.expand 761 -> 741  (-2.63%)

cmd/compile/internal/inline/inlheur
slices.stableCmpFunc[go.shape.struct 2300 -> 2284  (-0.70%)

cmd/compile/internal/inline/inlheur [cmd/compile]
slices.stableCmpFunc[go.shape.struct 2300 -> 2284  (-0.70%)

cmd/go/internal/modfetch/codehost
cmd/go/internal/modfetch/codehost.bzrParseStat 2217 -> 2213  (-0.18%)

cmd/link/internal/ld
cmd/link/internal/ld.decodetypeStructFieldCount 157 -> 152  (-3.18%)
cmd/link/internal/ld.(*Link).address 12559 -> 12495  (-0.51%)
cmd/link/internal/ld.(*dodataState).allocateDataSections 18345 -> 18205  (-0.76%)
cmd/link/internal/ld.elfshreloc 618 -> 616  (-0.32%)
cmd/link/internal/ld.(*deadcodePass).decodetypeMethods 794 -> 779  (-1.89%)
cmd/link/internal/ld.(*dodataState).assignDsymsToSection 668 -> 663  (-0.75%)
cmd/link/internal/ld.relocSectFn 285 -> 284  (-0.35%)
cmd/link/internal/ld.decodetypeIfaceMethodCount 146 -> 144  (-1.37%)
cmd/link/internal/ld.decodetypeArrayLen 157 -> 152  (-3.18%)

cmd/link/internal/arm64
cmd/link/internal/arm64.gensymlate.func1 895 -> 888  (-0.78%)

cmd/go/internal/modload
cmd/go/internal/modload.queryProxy.func3 1029 -> 1012  (-1.65%)

cmd/go/internal/load
cmd/go/internal/load.(*Package).setBuildInfo 8453 -> 8447  (-0.07%)

cmd/go/internal/clean
cmd/go/internal/clean.runClean 2120 -> 2104  (-0.75%)

cmd/compile/internal/ssa
cmd/compile/internal/ssa.(*poset).aliasnodes 2010 -> 1978  (-1.59%)
cmd/compile/internal/ssa.rewriteValueARM64_OpARM64MOVHstoreidx2 730 -> 719  (-1.51%)
cmd/compile/internal/ssa.(*debugState).buildLocationLists 3326 -> 3294  (-0.96%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDLconst 3069 -> 2941  (-4.17%)
cmd/compile/internal/ssa.(*debugState).processValue 9756 -> 9724  (-0.33%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDQconst 3069 -> 2941  (-4.17%)
cmd/compile/internal/ssa.(*poset).mergeroot 1079 -> 1054  (-2.32%)

cmd/compile/internal/ssa [cmd/compile]
cmd/compile/internal/ssa.rewriteValueARM64_OpARM64MOVHstoreidx2 730 -> 719  (-1.51%)
cmd/compile/internal/ssa.(*poset).aliasnodes 2010 -> 1978  (-1.59%)
cmd/compile/internal/ssa.(*poset).mergeroot 1079 -> 1054  (-2.32%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDQconst 3069 -> 2941  (-4.17%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDLconst 3069 -> 2941  (-4.17%)

file                                                before   after    Δ       %
math/bits.s                                         2352     2354     +2      +0.085%
math/bits [cmd/compile].s                           2352     2354     +2      +0.085%
math.s                                              35675    35674    -1      -0.003%
math [cmd/compile].s                                35675    35674    -1      -0.003%
runtime.s                                           577251   577245   -6      -0.001%
runtime [cmd/compile].s                             642419   642438   +19     +0.003%
sort.s                                              37434    37435    +1      +0.003%
strconv.s                                           48391    48343    -48     -0.099%
sort [cmd/compile].s                                37434    37435    +1      +0.003%
bufio.s                                             21386    21418    +32     +0.150%
strconv [cmd/compile].s                             48391    48343    -48     -0.099%
image.s                                             34978    35022    +44     +0.126%
regexp/syntax.s                                     81719    81781    +62     +0.076%
time.s                                              94341    94184    -157    -0.166%
regexp.s                                            60411    60399    -12     -0.020%
bufio [cmd/compile].s                               21512    21544    +32     +0.149%
encoding/binary.s                                   34062    34087    +25     +0.073%
regexp/syntax [cmd/compile].s                       81719    81781    +62     +0.076%
encoding/base64.s                                   11907    11903    -4      -0.034%
time [cmd/compile].s                                94341    94184    -157    -0.166%
index/suffixarray.s                                 41633    41527    -106    -0.255%
os.s                                                101770   101738   -32     -0.031%
regexp [cmd/compile].s                              60411    60399    -12     -0.020%
encoding/binary [cmd/compile].s                     37173    37198    +25     +0.067%
encoding/base64 [cmd/compile].s                     11907    11903    -4      -0.034%
os/exec.s                                           23900    23907    +7      +0.029%
encoding/hex.s                                      6038     6030     -8      -0.132%
crypto/des.s                                        5073     5056     -17     -0.335%
os [cmd/compile].s                                  102030   101998   -32     -0.031%
vendor/golang.org/x/net/http2/hpack.s               22027    22033    +6      +0.027%
math/big.s                                          164808   164753   -55     -0.033%
cmd/vendor/golang.org/x/sys/unix.s                  121450   121444   -6      -0.005%
encoding/json.s                                     110294   110287   -7      -0.006%
testing.s                                           115303   115281   -22     -0.019%
archive/zip.s                                       65329    65325    -4      -0.006%
os/user.s                                           10078    10080    +2      +0.020%
encoding/gob.s                                      143788   143783   -5      -0.003%
crypto/elliptic.s                                   30686    30704    +18     +0.059%
go/doc/comment.s                                    49401    49433    +32     +0.065%
debug/buildinfo.s                                   9095     9085     -10     -0.110%
image/png.s                                         36113    36081    -32     -0.089%
archive/tar.s                                       71994    71897    -97     -0.135%
crypto/internal/cryptotest.s                        60872    60849    -23     -0.038%
internal/pkgbits.s                                  20441    20429    -12     -0.059%
testing/quick.s                                     8236     8235     -1      -0.012%
log/slog.s                                          77568    77558    -10     -0.013%
internal/trace/internal/oldtrace.s                  52885    52896    +11     +0.021%
runtime/pprof.s                                     123978   123969   -9      -0.007%
internal/coverage/cfile.s                           25198    25184    -14     -0.056%
cmd/internal/objabi.s                               19954    19946    -8      -0.040%
crypto/ecdsa.s                                      29159    29141    -18     -0.062%
log/slog/internal/benchmarks.s                      6694     6695     +1      +0.015%
net.s                                               299569   299503   -66     -0.022%
os/exec [cmd/compile].s                             23888    23895    +7      +0.029%
internal/trace.s                                    179226   179240   +14     +0.008%
internal/fuzz.s                                     86190    86191    +1      +0.001%
crypto/x509.s                                       177195   177164   -31     -0.017%
cmd/internal/obj/s390x.s                            121642   121610   -32     -0.026%
cmd/internal/obj/ppc64.s                            140118   140122   +4      +0.003%
encoding/hex [cmd/compile].s                        6149     6141     -8      -0.130%
cmd/internal/objabi [cmd/compile].s                 19954    19946    -8      -0.040%
cmd/internal/obj/arm64.s                            158523   158555   +32     +0.020%
go/doc/comment [cmd/compile].s                      49512    49544    +32     +0.065%
math/big [cmd/compile].s                            166394   166339   -55     -0.033%
encoding/json [cmd/compile].s                       110712   110705   -7      -0.006%
cmd/covdata.s                                       39699    39687    -12     -0.030%
runtime/pprof [cmd/compile].s                       125209   125200   -9      -0.007%
cmd/compile/internal/syntax.s                       181755   181736   -19     -0.010%
cmd/dist.s                                          177893   177861   -32     -0.018%
crypto/tls.s                                        389157   389113   -44     -0.011%
internal/pkgbits [cmd/compile].s                    41644    41632    -12     -0.029%
cmd/compile/internal/syntax [cmd/compile].s         196105   196086   -19     -0.010%
cmd/compile/internal/types.s                        71315    71345    +30     +0.042%
cmd/internal/obj/s390x [cmd/compile].s              121733   121701   -32     -0.026%
cmd/go/internal/trace.s                             4796     4760     -36     -0.751%
cmd/internal/obj/arm64 [cmd/compile].s              168120   168147   +27     +0.016%
cmd/internal/obj/ppc64 [cmd/compile].s              140219   140223   +4      +0.003%
cmd/internal/script.s                               83442    83436    -6      -0.007%
cmd/link/internal/loader.s                          93299    93294    -5      -0.005%
net/http.s                                          620639   620472   -167    -0.027%
net/http/pprof.s                                    35016    35013    -3      -0.009%
cmd/compile/internal/coverage.s                     6668     6667     -1      -0.015%
cmd/vendor/golang.org/x/telemetry/internal/upload.s 34210    34148    -62     -0.181%
cmd/compile/internal/coverage [cmd/compile].s       6664     6663     -1      -0.015%
cmd/vendor/golang.org/x/text/language.s             48077    48074    -3      -0.006%
cmd/go/internal/vcweb.s                             45193    45189    -4      -0.009%
cmd/go/internal/vcs.s                               44749    44729    -20     -0.045%
cmd/compile/internal/inline/inlheur.s               83758    83742    -16     -0.019%
cmd/compile/internal/inline/inlheur [cmd/compile].s 84773    84757    -16     -0.019%
cmd/go/internal/modfetch/codehost.s                 89098    89094    -4      -0.004%
cmd/trace.s                                         257550   257564   +14     +0.005%
cmd/link/internal/ld.s                              641945   641706   -239    -0.037%
cmd/link/internal/arm64.s                           34805    34798    -7      -0.020%
cmd/go/internal/modload.s                           328971   328954   -17     -0.005%
cmd/go/internal/load.s                              178877   178871   -6      -0.003%
cmd/go/internal/clean.s                             11006    10990    -16     -0.145%
cmd/compile/internal/ssa.s                          3552843  3553347  +504    +0.014%
cmd/compile/internal/ssa [cmd/compile].s            3752511  3753123  +612    +0.016%
total                                               36179015 36178687 -328    -0.001%

Change-Id: I251c2898ccf3c9931d162d87dabbd49cf4ec73a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/641757
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-06 09:05:20 -08:00
Youlin Feng
0825475599 cmd/compile: do not treat OpLocalAddr as load in DSE
Fixes #70409
Fixes #47107

Change-Id: I82a66c46f6b76c68e156b5d937273b0316975d44
Reviewed-on: https://go-review.googlesource.com/c/go/+/629016
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2025-02-04 12:52:01 -08:00
Jakub Ciolek
e57769d5ad cmd/compile: on AMD64, prefer XOR/AND for (x & 1) == 0 check
It's shorter to encode. Additionally, XOR and AND generally
have higher throughput than BT/SET*.
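For illustration, here is a minimal Go function with the (x & 1) == 0 shape this change targets. It only shows the source pattern. Which instructions are actually emitted (XOR/AND versus BT/SETcc) depends on the compiler version and the surrounding code. You can inspect them with `go build -gcflags=-S`.

```go
package main

import "fmt"

// isEven reports whether the low bit of x is clear, i.e. the
// (x & 1) == 0 comparison whose AMD64 lowering this change adjusts.
func isEven(x uint64) bool {
	return x&1 == 0
}

func main() {
	for _, v := range []uint64{0, 1, 2, 7, 10} {
		fmt.Println(v, isEven(v))
	}
}
```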

compilecmp:

runtime
runtime.(*sweepClass).split 58 -> 56  (-3.45%)
runtime.sweepClass.split 14 -> 11  (-21.43%)

runtime [cmd/compile]
runtime.(*sweepClass).split 58 -> 56  (-3.45%)
runtime.sweepClass.split 14 -> 11  (-21.43%)

strconv
strconv.ryuFtoaShortest changed

strconv [cmd/compile]
strconv.ryuFtoaShortest changed

math/big
math/big.(*Int).MulRange 255 -> 252  (-1.18%)

testing/quick
testing/quick.sizedValue changed

internal/fuzz
internal/fuzz.(*pcgRand).bool 69 -> 70  (+1.45%)

cmd/internal/obj/x86
cmd/internal/obj/x86.(*AsmBuf).asmevex changed

math/big [cmd/compile]
math/big.(*Int).MulRange 255 -> 252  (-1.18%)

cmd/internal/obj/x86 [cmd/compile]
cmd/internal/obj/x86.(*AsmBuf).asmevex changed

net/http
net/http.(*http2stream).isPushed 11 -> 10  (-9.09%)

cmd/vendor/github.com/google/pprof/internal/binutils
cmd/vendor/github.com/google/pprof/internal/binutils.(*file).computeBase changed

Change-Id: I9cb2987eb263c85ee4e93d6f8455c91a55273173
Reviewed-on: https://go-review.googlesource.com/c/go/+/640975
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-03 08:42:01 -08:00
Michael Pratt
78e6f2a1c8 runtime: rename mapiterinit and mapiternext
mapiterinit allows external linkname. These users must allocate their
own iter struct for initialization by mapiterinit. Since the type is
unexported, they also must define the struct themselves. As a result,
they of course define the struct matching the old hiter definition (in
map_noswiss.go).

The old definition is smaller on 32-bit platforms. On those platforms,
mapiternext will clobber memory outside of the caller's allocation.

On all platforms, the pointer layout between the old hiter and new
maps.Iter does not match. Thus the GC may miss pointers and free
reachable objects early, or it may see non-pointers that look like heap
pointers and throw due to invalid references to free objects.

To avoid these issues, we must keep mapiterinit and mapiternext with the
old hiter definition. The most straightforward way to do this is to use
mapiterinit and mapiternext as a compatibility layer between the old and
new iter types.

The first step to that is to move normal map use off of these functions,
which is what this CL does.

Introduce new mapIterStart and mapIterNext functions that replace the
former functions everywhere in the toolchain. These have the same
behavior as the old functions.

This CL temporarily makes the old functions throw to ensure we don't
have hidden dependencies on them. We cannot remove them entirely because
GOEXPERIMENT=noswissmap still uses the old names, and internal/goobj
requires all builtins to exist regardless of GOEXPERIMENT. The next CL
will introduce the compatibility layer.

I want to avoid using linkname between runtime and reflect, as that
would also allow external linknames. So mapIterStart and mapIterNext are
duplicated in reflect, which can be done trivially, as it imports
internal/runtime/maps.

For #71408.

Change-Id: I6a6a636c6d4bd1392618c67ca648d3f061afe669
Reviewed-on: https://go-review.googlesource.com/c/go/+/643898
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-01-28 10:54:43 -08:00
Keith Randall
c5e205e928 internal/runtime/maps: re-enable some tests
Re-enable tests for stack-allocated maps and fast map accessors.
Those are implemented now.

Update #54766

Change-Id: I8c019702bd9fb077b2fe3f7c78e8e9e10d2263a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/642376
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-01-14 09:55:06 -08:00
Keith Randall
44a6f817ea cmd/compile: fix write barrier coalescing
We can't coalesce a non-WB store with a subsequent Move, as the
result of the store might be the source of the move.
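The hazard can be pictured at the Go source level with the sketch below. It assumes, without confirming it from the compiler output, that the scalar assignment becomes a plain (non-write-barrier) store and the struct copy becomes a Move with write barriers. The point it shows is the data dependence: the copy reads memory the preceding store just wrote, so the two cannot be treated as independent and merged.

```go
package main

import "fmt"

// pair mixes a scalar field with a pointer field, so copying a whole pair
// also copies a pointer.
type pair struct {
	n int
	p *int
}

// update stores a scalar into *src and then copies *src into *dst.
// The copy must observe the value written by the first store.
func update(dst, src *pair, n int) {
	src.n = n   // scalar store: an int field needs no write barrier
	*dst = *src // whole-struct copy (a Move) that includes a pointer
}

func main() {
	v := 42
	src := &pair{p: &v}
	dst := &pair{}
	update(dst, src, 7)
	fmt.Println(dst.n, *dst.p) // 7 42
}
```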

There's a simple codegen test. It's not clear how to write a real test,
as all the repros I've come up with are very expensive and unreliable.

Fixes #71228

Change-Id: If18bf181a266b9b90964e2591cd2e61a7168371c
Reviewed-on: https://go-review.googlesource.com/c/go/+/642197
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2025-01-12 22:49:39 -08:00
Youlin Feng
c4e6ab9750 cmd/compile: modify CSE to remove redundant OpLocalAddrs
Remove unnecessary OpLocalAddrs in the CSE pass, so that later passes
such as DSE and memcombine can do their work better.

Fixes #70300

Change-Id: I600025d49eeadb3ca4f092d614428399750f69bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/628075
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2024-11-22 00:12:03 +00:00
Keith Randall
f0b0109242 cmd/compile: pull multiple adds out of an unsafe.Pointer<->uintptr conversion
This came up in some swissmap code.

Change-Id: I3c6705a5cafec8cb4953dfa9535cf0b45255cc83
Reviewed-on: https://go-review.googlesource.com/c/go/+/629516
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2024-11-21 22:57:04 +00:00
Xiaolin Zhao
ab55465098 cmd/compile: wire up math/bits.TrailingZeros intrinsics for loong64
Micro-benchmark results on Loongson 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
                |  bench.old   |              bench.new               |
                |    sec/op    |    sec/op     vs base                |
TrailingZeros     1.7240n ± 0%   0.8120n ± 0%  -52.90% (p=0.000 n=20)
TrailingZeros8    1.0530n ± 0%   0.8015n ± 0%  -23.88% (p=0.000 n=20)
TrailingZeros16    2.072n ± 0%    1.015n ± 0%  -51.01% (p=0.000 n=20)
TrailingZeros32   1.7160n ± 0%   0.8122n ± 0%  -52.67% (p=0.000 n=20)
TrailingZeros64   2.0060n ± 0%   0.8125n ± 0%  -59.50% (p=0.000 n=20)
geomean            1.669n        0.8470n       -49.25%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
                |  bench.old   |              bench.new               |
                |    sec/op    |    sec/op     vs base                |
TrailingZeros     2.6275n ± 0%   0.9120n ± 0%  -65.29% (p=0.000 n=20)
TrailingZeros8     1.451n ± 0%    1.163n ± 0%  -19.85% (p=0.000 n=20)
TrailingZeros16    3.069n ± 0%    1.201n ± 0%  -60.87% (p=0.000 n=20)
TrailingZeros32   2.9060n ± 0%   0.9115n ± 0%  -68.63% (p=0.000 n=20)
TrailingZeros64   2.6305n ± 0%   0.9115n ± 0%  -65.35% (p=0.000 n=20)
geomean            2.456n         1.011n       -58.83%

This patch is a copy of CL 479498.
Co-authored-by: WANG Xuerui <git@xen0n.name>

Change-Id: I1a5b2114a844dc0d02c8e68f41ce2443ac3b5fda
Reviewed-on: https://go-review.googlesource.com/c/go/+/624356
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-11-13 00:57:25 +00:00
Paul E. Murphy
745ec75719 cmd/compile/internal/ssa: improve carry addition rules on PPC64
Fold constant int16 addends in uses of math/bits.Add64(x, const, 0)
on PPC64. This pattern shows up in a few crypto implementations,
notably the Go wrapper for CL 626176.
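A minimal sketch of the math/bits.Add64(x, const, 0) shape is shown below. A small constant is added to the low word of a 128-bit value with a zero carry-in, and the carry is then propagated into the high word. The constant 19 and the helper name addSmall are arbitrary choices for illustration. How PPC64 encodes the folded immediate is not shown here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addSmall adds the constant 19 to the 128-bit value hi:lo.
// The first call is the Add64(x, const, 0) pattern; the second
// propagates the carry out of the low word into the high word.
func addSmall(hi, lo uint64) (uint64, uint64) {
	lo, carry := bits.Add64(lo, 19, 0)
	hi, _ = bits.Add64(hi, 0, carry)
	return hi, lo
}

func main() {
	hi, lo := addSmall(0, ^uint64(0)) // 2^64-1 + 19 = 2^64 + 18
	fmt.Println(hi, lo)               // 1 18
}
```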

Change-Id: I6963163330487d04e0479b4fdac235f97bb96889
Reviewed-on: https://go-review.googlesource.com/c/go/+/625899
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2024-11-12 17:40:44 +00:00
Guoqi Chen
fb9b946adc cmd/compile: optimize math/bits.OnesCount{16,32,64} implementation on loong64
Use Loong64's LSX instruction VPCNT to implement math/bits.OnesCount{16,32,64}
and make it intrinsic.

Benchmark results on Loongson 3A5000 and 3A6000 machines:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000-HV @ 2500.00MHz
            |   bench.old   |   bench.new                          |
            |    sec/op     |    sec/op       vs base               |
OnesCount      4.413n ± 0%     1.401n ± 0%   -68.25% (p=0.000 n=10)
OnesCount8     1.364n ± 0%     1.363n ± 0%         ~ (p=0.130 n=10)
OnesCount16    2.112n ± 0%     1.534n ± 0%   -27.37% (p=0.000 n=10)
OnesCount32    4.533n ± 0%     1.529n ± 0%   -66.27% (p=0.000 n=10)
OnesCount64    4.565n ± 0%     1.531n ± 1%   -66.46% (p=0.000 n=10)
geomean        3.048n          1.470n        -51.78%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
            |   bench.old   |   bench.new                          |
            |    sec/op     |    sec/op       vs base              |
OnesCount       3.553n ± 0%     1.201n ± 0%  -66.20% (p=0.000 n=10)
OnesCount8     0.8021n ± 0%    0.8004n ± 0%   -0.21% (p=0.000 n=10)
OnesCount16     1.216n ± 0%     1.000n ± 0%  -17.76% (p=0.000 n=10)
OnesCount32     3.006n ± 0%     1.035n ± 0%  -65.57% (p=0.000 n=10)
OnesCount64     3.503n ± 0%     1.035n ± 0%  -70.45% (p=0.000 n=10)
geomean         2.053n          1.006n       -51.01%

Change-Id: I07a5b8da2bb48711b896387ec7625145804affc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/620978
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-12 00:48:04 +00:00
Xiaolin Zhao
583d750fa1 cmd/compile: wire up bits.Reverse intrinsics for loong64
Micro-benchmark results on Loongson 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
          |  CL 624576   |               this CL                |
          |    sec/op    |    sec/op     vs base                |
Reverse     2.8130n ± 0%   0.8008n ± 0%  -71.53% (p=0.000 n=20)
Reverse8    0.7014n ± 0%   0.4040n ± 0%  -42.40% (p=0.000 n=20)
Reverse16   1.2975n ± 0%   0.6632n ± 1%  -48.89% (p=0.000 n=20)
Reverse32   2.7520n ± 0%   0.4042n ± 0%  -85.31% (p=0.000 n=20)
Reverse64   2.8970n ± 0%   0.4041n ± 0%  -86.05% (p=0.000 n=20)
geomean      1.828n        0.5116n       -72.01%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
          |  CL 624576   |               this CL                |
          |    sec/op    |    sec/op     vs base                |
Reverse     4.0050n ± 0%   0.8011n ± 0%  -80.00% (p=0.000 n=20)
Reverse8    0.8010n ± 0%   0.5210n ± 1%  -34.96% (p=0.000 n=20)
Reverse16   1.6160n ± 0%   0.6008n ± 0%  -62.82% (p=0.000 n=20)
Reverse32   3.8550n ± 0%   0.5179n ± 0%  -86.57% (p=0.000 n=20)
Reverse64   3.8050n ± 0%   0.5177n ± 0%  -86.40% (p=0.000 n=20)
geomean      2.378n        0.5828n       -75.49%

Updates #59120

This patch is a copy of CL 483656.
Co-authored-by: WANG Xuerui <git@xen0n.name>

Change-Id: I98681091763279279c8404bd0295785f13ea1c8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/624276
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2024-11-11 00:08:45 +00:00
Xiaolin Zhao
e6cc9d228a cmd/compile: implement FMA codegen for loong64
Benchmark results on Loongson 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
    |  bench.old   |              bench.new              |
    |    sec/op    |   sec/op     vs base                |
FMA   25.930n ± 0%   2.002n ± 0%  -92.28% (p=0.000 n=10)

goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
    |  bench.old   |              bench.new              |
    |    sec/op    |   sec/op     vs base                |
FMA   32.840n ± 0%   2.002n ± 0%  -93.90% (p=0.000 n=10)

Updates #59120

This patch is a copy of CL 483355.
Co-authored-by: WANG Xuerui <git@xen0n.name>

Change-Id: I88b89d23f00864f9173a182a47ee135afec7ed6e
Reviewed-on: https://go-review.googlesource.com/c/go/+/625335
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-11-08 01:05:48 +00:00
Xiaolin Zhao
d6fb0ab2c7 cmd/compile: wire up Bswap/ReverseBytes intrinsics for loong64
Micro-benchmark results on Loongson 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
               |  bench.old   |              bench.new               |
               |    sec/op    |    sec/op     vs base                |
ReverseBytes     2.0020n ± 0%   0.4040n ± 0%  -79.82% (p=0.000 n=20)
ReverseBytes16   0.8866n ± 1%   0.8007n ± 0%   -9.69% (p=0.000 n=20)
ReverseBytes32   1.2195n ± 0%   0.8007n ± 0%  -34.34% (p=0.000 n=20)
ReverseBytes64   2.0705n ± 0%   0.8008n ± 0%  -61.32% (p=0.000 n=20)
geomean           1.455n        0.6749n       -53.62%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
               |  bench.old   |              bench.new               |
               |    sec/op    |    sec/op     vs base                |
ReverseBytes     2.8040n ± 0%   0.5205n ± 0%  -81.44% (p=0.000 n=20)
ReverseBytes16   0.7066n ± 0%   0.8011n ± 0%  +13.37% (p=0.000 n=20)
ReverseBytes32   1.5500n ± 0%   0.8010n ± 0%  -48.32% (p=0.000 n=20)
ReverseBytes64   2.7665n ± 0%   0.8010n ± 0%  -71.05% (p=0.000 n=20)
geomean           1.707n        0.7192n       -57.87%

Updates #59120

This patch is a copy of CL 483357.
Co-authored-by: WANG Xuerui <git@xen0n.name>

Change-Id: If355354cd031533df91991fcc3392e5a6c314295
Reviewed-on: https://go-review.googlesource.com/c/go/+/624576
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-11-06 03:12:50 +00:00
Xiaolin Zhao
d98c51809d cmd/compile: wire up math/bits.Len intrinsics for loong64
For the SubFromLen64 codegen test case to work as intended, we need
to fold c-(-(x-d)) into x+(c-d).
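The fold is sound for all inputs because Go's signed integer arithmetic wraps around in two's complement, so c-(-(x-d)) and x+(c-d) agree modulo 2^64 even when intermediate values overflow. The small check below compares both forms. The helper names before and after are hypothetical.

```go
package main

import "fmt"

// before and after are the two sides of the rewrite c-(-(x-d)) => x+(c-d).
// Signed overflow in Go wraps deterministically, so they are equal for
// every int64 input, including ones where x-d or the negation overflows.
func before(x, c, d int64) int64 { return c - (-(x - d)) }
func after(x, c, d int64) int64  { return x + (c - d) }

func main() {
	fmt.Println(before(10, 64, 3), after(10, 64, 3)) // 71 71

	maxInt64 := int64(1<<63 - 1)
	fmt.Println(before(maxInt64, 64, -5) == after(maxInt64, 64, -5)) // true
}
```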

Still, some instances of LeadingZeros are not yet optimized into single
CLZ instructions. In fact, the LeadingZeros micro-benchmarks are still
compiled with redundant adds/subs of 64, due to interference from loop
optimizations before lowering, but the perf numbers indicate the impact
is not that bad.

Micro-benchmark results on Loongson 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
               |  bench.old  |              bench.new              |
               |   sec/op    |   sec/op     vs base                |
LeadingZeros     3.660n ± 0%   1.348n ± 0%  -63.17% (p=0.000 n=20)
LeadingZeros8    1.777n ± 0%   1.767n ± 0%   -0.56% (p=0.000 n=20)
LeadingZeros16   2.816n ± 0%   1.770n ± 0%  -37.14% (p=0.000 n=20)
LeadingZeros32   5.293n ± 1%   1.683n ± 0%  -68.21% (p=0.000 n=20)
LeadingZeros64   3.622n ± 0%   1.349n ± 0%  -62.76% (p=0.000 n=20)
geomean          3.229n        1.571n       -51.35%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
               |  bench.old   |              bench.new               |
               |    sec/op    |    sec/op     vs base                |
LeadingZeros      2.410n ± 0%    1.103n ± 1%  -54.23% (p=0.000 n=20)
LeadingZeros8     1.236n ± 0%    1.501n ± 0%  +21.44% (p=0.000 n=20)
LeadingZeros16    2.106n ± 0%    1.501n ± 0%  -28.73% (p=0.000 n=20)
LeadingZeros32    2.860n ± 0%    1.324n ± 0%  -53.72% (p=0.000 n=20)
LeadingZeros64   2.6135n ± 0%   0.9509n ± 0%  -63.62% (p=0.000 n=20)
geomean           2.159n         1.256n       -41.81%

Updates #59120

This patch is a copy of CL 483356.
Co-authored-by: WANG Xuerui <git@xen0n.name>

Change-Id: Iee81a17f7da06d77a427e73dfcc016f2b15ae556
Reviewed-on: https://go-review.googlesource.com/c/go/+/624575
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2024-11-06 00:40:40 +00:00
Xiaolin Zhao
5f88755f43 cmd/compile: add loong64-specific inlining for runtime.memmove
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
                                 |   bench.old   |               bench.new                |
                                 |    sec/op     |    sec/op     vs base                  |
Memmove/0                          0.8004n ±  0%   0.4002n ± 0%  -50.00% (p=0.000 n=20)
Memmove/1                           2.494n ±  0%    2.136n ± 0%  -14.35% (p=0.000 n=20)
Memmove/2                           2.802n ±  0%    2.512n ± 0%  -10.35% (p=0.000 n=20)
Memmove/3                           2.802n ±  0%    2.497n ± 0%  -10.92% (p=0.000 n=20)
Memmove/4                           3.202n ±  0%    2.808n ± 0%  -12.30% (p=0.000 n=20)
Memmove/5                           2.821n ±  0%    2.658n ± 0%   -5.76% (p=0.000 n=20)
Memmove/6                           2.819n ±  0%    2.657n ± 0%   -5.73% (p=0.000 n=20)
Memmove/7                           2.820n ±  0%    2.654n ± 0%   -5.87% (p=0.000 n=20)
Memmove/8                           3.202n ±  0%    2.814n ± 0%  -12.12% (p=0.000 n=20)
Memmove/9                           3.202n ±  0%    3.009n ± 0%   -6.03% (p=0.000 n=20)
Memmove/10                          3.202n ±  0%    3.009n ± 0%   -6.03% (p=0.000 n=20)
Memmove/11                          3.202n ±  0%    3.009n ± 0%   -6.03% (p=0.000 n=20)
Memmove/12                          3.202n ±  0%    3.010n ± 0%   -6.01% (p=0.000 n=20)
Memmove/13                          3.202n ±  0%    3.009n ± 0%   -6.03% (p=0.000 n=20)
Memmove/14                          3.202n ±  0%    3.009n ± 0%   -6.03% (p=0.000 n=20)
Memmove/15                          3.202n ±  0%    3.010n ± 0%   -6.01% (p=0.000 n=20)
Memmove/16                          3.202n ±  0%    3.009n ± 0%   -6.03% (p=0.000 n=20)
Memmove/32                          3.602n ±  0%    3.603n ± 0%   +0.03% (p=0.000 n=20)
Memmove/64                          4.202n ±  0%    4.204n ± 0%   +0.05% (p=0.000 n=20)
Memmove/128                         8.005n ±  0%    8.007n ± 0%   +0.02% (p=0.000 n=20)
Memmove/256                         11.21n ±  0%    10.81n ± 0%   -3.57% (p=0.000 n=20)
Memmove/512                         17.65n ±  0%    17.96n ± 0%   +1.73% (p=0.000 n=20)
Memmove/1024                        30.48n ±  0%    30.46n ± 0%   -0.07% (p=0.000 n=20)
Memmove/2048                        56.43n ±  0%    56.30n ± 0%   -0.24% (p=0.000 n=20)
Memmove/4096                        107.7n ±  0%    107.6n ± 0%   -0.09% (p=0.000 n=20)
MemmoveOverlap/32                   4.002n ±  0%    4.003n ± 0%   +0.02% (p=0.002 n=20)
MemmoveOverlap/64                   4.603n ±  0%    4.603n ± 0%        ~ (p=0.286 n=20)
MemmoveOverlap/128                  8.704n ±  0%    8.699n ± 0%        ~ (p=0.180 n=20)
MemmoveOverlap/256                  12.01n ±  0%    11.76n ± 0%   -2.08% (p=0.000 n=20)
MemmoveOverlap/512                  18.42n ±  0%    18.36n ± 0%   -0.33% (p=0.000 n=20)
MemmoveOverlap/1024                 31.23n ±  0%    31.16n ± 0%   -0.21% (p=0.000 n=20)
MemmoveOverlap/2048                 57.42n ±  0%    56.82n ± 0%   -1.04% (p=0.000 n=20)
MemmoveOverlap/4096                 108.5n ±  0%    108.0n ± 0%   -0.46% (p=0.000 n=20)
MemmoveUnalignedDst/0               2.804n ±  0%    2.447n ± 0%  -12.70% (p=0.000 n=20)
MemmoveUnalignedDst/1               2.802n ±  0%    2.491n ± 0%  -11.12% (p=0.000 n=20)
MemmoveUnalignedDst/2               3.202n ±  0%    2.808n ± 0%  -12.29% (p=0.000 n=20)
MemmoveUnalignedDst/3               3.202n ±  0%    2.814n ± 0%  -12.12% (p=0.000 n=20)
MemmoveUnalignedDst/4               3.602n ±  0%    3.202n ± 0%  -11.10% (p=0.000 n=20)
MemmoveUnalignedDst/5               3.202n ±  0%    3.203n ± 0%   +0.03% (p=0.014 n=20)
MemmoveUnalignedDst/6               3.202n ±  0%    3.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedDst/7               3.202n ±  0%    3.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedDst/8               3.602n ±  0%    3.202n ± 0%  -11.10% (p=0.000 n=20)
MemmoveUnalignedDst/9               3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedDst/10              3.602n ±  0%    3.602n ± 0%        ~ (p=0.091 n=20)
MemmoveUnalignedDst/11              3.602n ±  0%    3.602n ± 0%        ~ (p=0.613 n=20)
MemmoveUnalignedDst/12              3.602n ±  0%    3.602n ± 0%        ~ (p=0.165 n=20)
MemmoveUnalignedDst/13              3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedDst/14              3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedDst/15              3.602n ±  0%    3.602n ± 0%    0.00% (p=0.027 n=20)
MemmoveUnalignedDst/16              3.602n ±  0%    3.602n ± 0%        ~ (p=0.661 n=20)
MemmoveUnalignedDst/32              4.002n ±  0%    4.002n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedDst/64              6.804n ±  0%    6.804n ± 0%        ~ (p=0.204 n=20)
MemmoveUnalignedDst/128             12.61n ±  0%    12.61n ± 0%        ~ (p=1.000 n=20) ¹
MemmoveUnalignedDst/256             16.33n ±  2%    16.32n ± 2%        ~ (p=0.839 n=20)
MemmoveUnalignedDst/512             25.61n ±  0%    24.71n ± 0%   -3.51% (p=0.000 n=20)
MemmoveUnalignedDst/1024            42.81n ±  0%    42.82n ± 0%        ~ (p=0.973 n=20)
MemmoveUnalignedDst/2048            74.86n ±  0%    76.03n ± 0%   +1.56% (p=0.000 n=20)
MemmoveUnalignedDst/4096            152.0n ± 11%    152.0n ± 0%    0.00% (p=0.013 n=20)
MemmoveUnalignedDstOverlap/32       5.319n ±  0%    5.558n ± 1%   +4.50% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/64       8.006n ±  0%    8.025n ± 0%   +0.24% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/128      9.631n ±  0%    9.601n ± 0%   -0.31% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/256      13.79n ±  2%    13.58n ± 1%        ~ (p=0.234 n=20)
MemmoveUnalignedDstOverlap/512      21.38n ±  0%    21.30n ± 0%   -0.37% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/1024     41.71n ±  0%    41.70n ± 0%        ~ (p=0.887 n=20)
MemmoveUnalignedDstOverlap/2048     81.63n ±  0%    81.61n ± 0%        ~ (p=0.481 n=20)
MemmoveUnalignedDstOverlap/4096     162.6n ±  0%    162.6n ± 0%        ~ (p=0.171 n=20)
MemmoveUnalignedSrc/0               2.808n ±  0%    2.482n ± 0%  -11.61% (p=0.000 n=20)
MemmoveUnalignedSrc/1               2.804n ±  0%    2.577n ± 0%   -8.08% (p=0.000 n=20)
MemmoveUnalignedSrc/2               3.202n ±  0%    2.806n ± 0%  -12.37% (p=0.000 n=20)
MemmoveUnalignedSrc/3               3.202n ±  0%    2.808n ± 0%  -12.30% (p=0.000 n=20)
MemmoveUnalignedSrc/4               3.602n ±  0%    3.202n ± 0%  -11.10% (p=0.000 n=20)
MemmoveUnalignedSrc/5               3.202n ±  0%    3.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/6               3.202n ±  0%    3.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/7               3.202n ±  0%    3.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/8               3.602n ±  0%    3.202n ± 0%  -11.10% (p=0.000 n=20)
MemmoveUnalignedSrc/9               3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/10              3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/11              3.602n ±  0%    3.602n ± 0%        ~ (p=0.746 n=20)
MemmoveUnalignedSrc/12              3.602n ±  0%    3.602n ± 0%        ~ (p=0.407 n=20)
MemmoveUnalignedSrc/13              3.603n ±  0%    3.602n ± 0%   -0.03% (p=0.001 n=20)
MemmoveUnalignedSrc/14              3.603n ±  0%    3.602n ± 0%   -0.01% (p=0.013 n=20)
MemmoveUnalignedSrc/15              3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/16              3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/32              4.002n ±  0%    4.002n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/64              4.803n ±  0%    4.803n ± 0%    0.00% (p=0.008 n=20)
MemmoveUnalignedSrc/128             8.405n ±  0%    8.405n ± 0%    0.00% (p=0.003 n=20)
MemmoveUnalignedSrc/256             12.04n ±  3%    12.20n ± 2%        ~ (p=0.151 n=20)
MemmoveUnalignedSrc/512             19.11n ±  0%    19.10n ± 3%        ~ (p=0.621 n=20)
MemmoveUnalignedSrc/1024            35.62n ±  0%    35.62n ± 0%        ~ (p=0.407 n=20)
MemmoveUnalignedSrc/2048            68.04n ±  0%    68.35n ± 0%   +0.46% (p=0.000 n=20)
MemmoveUnalignedSrc/4096            133.2n ±  1%    133.3n ± 0%        ~ (p=0.131 n=20)
MemmoveUnalignedSrcDst/f_16_0       4.202n ±  0%    4.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_0       4.202n ±  0%    4.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_16_1       4.202n ±  0%    4.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_1       4.202n ±  0%    4.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_16_4       4.202n ±  0%    4.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_4       4.202n ±  0%    4.202n ± 0%        ~ (p=0.661 n=20)
MemmoveUnalignedSrcDst/f_16_7       4.202n ±  0%    4.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_7       4.203n ±  0%    4.202n ± 0%   -0.02% (p=0.008 n=20)
MemmoveUnalignedSrcDst/f_64_0       6.103n ±  0%    6.100n ± 0%        ~ (p=0.595 n=20)
MemmoveUnalignedSrcDst/b_64_0       6.103n ±  0%    6.102n ± 0%        ~ (p=0.973 n=20)
MemmoveUnalignedSrcDst/f_64_1       7.419n ±  0%    7.226n ± 0%   -2.59% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_64_1       6.745n ±  0%    6.941n ± 0%   +2.89% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_64_4       7.420n ±  0%    7.223n ± 0%   -2.65% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_64_4       6.753n ±  0%    6.941n ± 0%   +2.79% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_64_7       7.423n ±  0%    7.204n ± 0%   -2.96% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_64_7       6.750n ±  0%    6.941n ± 0%   +2.83% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_256_0      12.96n ±  0%    12.99n ± 0%   +0.27% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_256_0      12.91n ±  0%    12.94n ± 0%   +0.23% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_256_1      17.21n ±  0%    17.21n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_256_1      17.61n ±  0%    17.61n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_256_4      16.21n ±  0%    16.21n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_256_4      16.41n ±  0%    16.41n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_256_7      14.12n ±  0%    14.10n ± 0%        ~ (p=0.307 n=20)
MemmoveUnalignedSrcDst/b_256_7      14.81n ±  0%    14.81n ± 0%        ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/f_4096_0     109.3n ±  0%    109.4n ± 0%   +0.09% (p=0.004 n=20)
MemmoveUnalignedSrcDst/b_4096_0     109.6n ±  0%    109.6n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_4096_1     113.5n ±  0%    113.5n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_4096_1     113.7n ±  0%    113.7n ± 0%        ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/f_4096_4     112.3n ±  0%    112.3n ± 0%        ~ (p=0.763 n=20)
MemmoveUnalignedSrcDst/b_4096_4     112.6n ±  0%    112.9n ± 1%   +0.31% (p=0.032 n=20)
MemmoveUnalignedSrcDst/f_4096_7     110.6n ±  0%    110.6n ± 0%        ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/b_4096_7     111.1n ±  0%    111.1n ± 0%        ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/f_65536_0    4.801µ ±  0%    4.818µ ± 0%   +0.34% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_0    5.027µ ±  0%    5.036µ ± 0%   +0.19% (p=0.007 n=20)
MemmoveUnalignedSrcDst/f_65536_1    4.815µ ±  0%    4.729µ ± 0%   -1.78% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_1    4.659µ ±  0%    4.737µ ± 1%   +1.69% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_65536_4    4.807µ ±  0%    4.721µ ± 0%   -1.78% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_4    4.659µ ±  0%    4.601µ ± 0%   -1.23% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_65536_7    4.868µ ±  0%    4.759µ ± 0%   -2.23% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_7    4.665µ ±  0%    4.709µ ± 0%   +0.93% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/32       6.804n ±  0%    6.810n ± 0%   +0.09% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/64       10.41n ±  0%    10.42n ± 0%   +0.10% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/128      11.59n ±  0%    11.58n ± 0%        ~ (p=0.414 n=20)
MemmoveUnalignedSrcOverlap/256      14.22n ±  0%    14.29n ± 0%   +0.46% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/512      23.11n ±  0%    23.04n ± 0%   -0.28% (p=0.001 n=20)
MemmoveUnalignedSrcOverlap/1024     41.44n ±  0%    41.47n ± 0%        ~ (p=0.693 n=20)
MemmoveUnalignedSrcOverlap/2048     81.25n ±  0%    81.25n ± 0%        ~ (p=0.405 n=20)
MemmoveUnalignedSrcOverlap/4096     166.1n ±  0%    166.1n ± 0%        ~ (p=0.451 n=20)
geomean                             13.02n          12.69n        -2.51%
¹ all samples are equal

Change-Id: I712adc7670f6ae360714ec5a770d00d76c8700ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/618815
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2024-11-05 00:44:11 +00:00
Xiaolin Zhao
aef81a7551 cmd/compile: add rules to optimize go codes to constant 0 on loong64
goos: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A6000 @ 2500.00MHz
                      │  old.bench  │             new.bench              │
                      │   sec/op    │   sec/op     vs base               │
BinaryTree17             7.735 ± 1%    7.716 ± 1%  -0.23% (p=0.041 n=15)
Fannkuch11               2.645 ± 0%    2.646 ± 0%  +0.05% (p=0.013 n=15)
FmtFprintfEmpty         35.87n ± 0%   35.89n ± 0%  +0.06% (p=0.000 n=15)
FmtFprintfString        59.54n ± 0%   59.47n ± 0%       ~ (p=0.213 n=15)
FmtFprintfInt           62.23n ± 0%   62.06n ± 0%       ~ (p=0.212 n=15)
FmtFprintfIntInt        98.16n ± 0%   97.90n ± 0%  -0.26% (p=0.000 n=15)
FmtFprintfPrefixedInt   117.0n ± 0%   116.7n ± 0%  -0.26% (p=0.000 n=15)
FmtFprintfFloat         204.6n ± 0%   204.2n ± 0%  -0.20% (p=0.000 n=15)
FmtManyArgs             456.3n ± 0%   455.4n ± 0%  -0.20% (p=0.000 n=15)
GobDecode               7.210m ± 0%   7.156m ± 1%  -0.75% (p=0.000 n=15)
GobEncode               8.143m ± 1%   8.177m ± 1%       ~ (p=0.806 n=15)
Gzip                    280.2m ± 0%   279.7m ± 0%  -0.19% (p=0.005 n=15)
Gunzip                  32.71m ± 0%   32.65m ± 0%  -0.19% (p=0.000 n=15)
HTTPClientServer        53.76µ ± 0%   53.65µ ± 0%       ~ (p=0.083 n=15)
JSONEncode              9.297m ± 0%   9.295m ± 0%       ~ (p=0.806 n=15)
JSONDecode              46.97m ± 1%   47.07m ± 1%       ~ (p=0.683 n=15)
Mandelbrot200           4.602m ± 0%   4.600m ± 0%  -0.05% (p=0.001 n=15)
GoParse                 4.682m ± 0%   4.670m ± 1%  -0.25% (p=0.001 n=15)
RegexpMatchEasy0_32     59.80n ± 0%   59.63n ± 0%  -0.28% (p=0.000 n=15)
RegexpMatchEasy0_1K     458.3n ± 0%   457.3n ± 0%  -0.22% (p=0.001 n=15)
RegexpMatchEasy1_32     59.39n ± 0%   59.23n ± 0%  -0.27% (p=0.000 n=15)
RegexpMatchEasy1_1K     557.9n ± 0%   556.6n ± 0%  -0.23% (p=0.001 n=15)
RegexpMatchMedium_32    803.6n ± 0%   801.8n ± 0%  -0.22% (p=0.001 n=15)
RegexpMatchMedium_1K    27.32µ ± 0%   27.26µ ± 0%  -0.21% (p=0.000 n=15)
RegexpMatchHard_32      1.385µ ± 0%   1.382µ ± 0%  -0.22% (p=0.000 n=15)
RegexpMatchHard_1K      40.93µ ± 0%   40.83µ ± 0%  -0.24% (p=0.000 n=15)
Revcomp                 474.8m ± 0%   474.3m ± 0%       ~ (p=0.250 n=15)
Template                77.41m ± 1%   76.63m ± 1%  -1.01% (p=0.023 n=15)
TimeParse               271.1n ± 0%   271.2n ± 0%  +0.04% (p=0.022 n=15)
TimeFormat              290.0n ± 0%   289.8n ± 0%       ~ (p=0.118 n=15)
geomean                 51.73µ        51.64µ       -0.18%
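
The commit message does not say which expressions the new rules match. As an illustration only, one family of Go expressions whose semantics force a constant 0 is an unsigned shift by a count at least as wide as the operand. It is an assumption, not taken from the CL, that such shifts are among the cases targeted. The helper `shr` below is hypothetical:

```go
package main

import "fmt"

// shr is a hypothetical helper. Go defines x >> s for unsigned x as 0 whenever
// s is at least the bit width of x. Hardware shift instructions usually look
// only at the low bits of the count, so the compiler must emit extra code to
// honor the Go semantics, or fold the result to 0 when the count is known.
func shr(x uint64, s uint) uint64 {
	return x >> s
}

func main() {
	fmt.Println(shr(0xFF, 4))   // 15
	fmt.Println(shr(0xFF, 64))  // 0
	fmt.Println(shr(0xFF, 100)) // 0
}
```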

Change-Id: I45a1e6c85bb3cea0f62766ec932432803e9af10a
Reviewed-on: https://go-review.googlesource.com/c/go/+/619315
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-10-29 01:17:54 +00:00
Youlin Feng
bb07aa644b cmd/compile: add shift optimization test
For #69635

Change-Id: Id5696dc9724c3b3afcd7b60a6994f98c5309eb0e
Reviewed-on: https://go-review.googlesource.com/c/go/+/621755
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2024-10-25 15:35:29 +00:00
Youlin Feng
711552e98a cmd/compile: optimize type switch for a single runtime known type with a case var
Change-Id: I03ba70076d6dd3c0b9624d14699b7dd91a3c0e9b
Reviewed-on: https://go-review.googlesource.com/c/go/+/618476
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2024-10-25 02:56:11 +00:00
Paul E. Murphy
1846dd5a31 cmd/compile/internal/ssa: fix PPC64 shift codegen regression
CL 621357 introduced new generic lowering rules that caused
several shift-related codegen test failures.

Add new rules to fix the test regressions, and clean up tests
that changed but did not regress. Some CLRLSLDI tests are
removed because they no longer exercise any CLRLSLDI rules.

Fixes #70003

Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034
Reviewed-on: https://go-review.googlesource.com/c/go/+/622236
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-24 17:32:18 +00:00
Xiaolin Zhao
91d07ac71c cmd/compile: inline constant sized memclrNoHeapPointers calls on loong64
Testing on loong64 showed that inlining is counterproductive for
constant sizes greater than 512 bytes, so inlining is only enabled
for constant sizes less than 512.
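
For context, a minimal sketch of source code that can produce a constant-sized memclrNoHeapPointers call: the range-loop clearing idiom over a fixed-size, pointer-free array. The buffer and function names are hypothetical, and the exact lowering (runtime call versus inline stores) depends on the compiler version and target:

```go
package main

import "fmt"

// Hypothetical buffers used only for this illustration.
var (
	small [256]byte  // 256 bytes: below the 512-byte cutoff described above
	large [1024]byte // 1024 bytes: above the cutoff, so presumably still a runtime call
)

// clearSmall and clearLarge use the range-loop clearing idiom, which the Go
// compiler can replace with a call to runtime.memclrNoHeapPointers because
// byte holds no pointers. Whether that call is then expanded inline is the
// target-specific decision this commit makes for loong64.
func clearSmall() {
	for i := range small {
		small[i] = 0
	}
}

func clearLarge() {
	for i := range large {
		large[i] = 0
	}
}

// fill sets every byte of b to 0xFF so clearing is observable.
func fill(b []byte) {
	for i := range b {
		b[i] = 0xFF
	}
}

// allZero reports whether every byte of b is zero.
func allZero(b []byte) bool {
	for _, v := range b {
		if v != 0 {
			return false
		}
	}
	return true
}

func main() {
	fill(small[:])
	fill(large[:])
	clearSmall()
	clearLarge()
	fmt.Println(allZero(small[:]), allZero(large[:])) // true true
}
```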

goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
                      |  bench.old   |              bench.new               |
                      |    sec/op    |    sec/op     vs base                |
MemclrKnownSize1        2.4070n ± 0%   0.4004n ± 0%  -83.37% (p=0.000 n=20)
MemclrKnownSize2        2.1365n ± 0%   0.4004n ± 0%  -81.26% (p=0.000 n=20)
MemclrKnownSize4        2.4445n ± 0%   0.4004n ± 0%  -83.62% (p=0.000 n=20)
MemclrKnownSize8        2.4200n ± 0%   0.4004n ± 0%  -83.45% (p=0.000 n=20)
MemclrKnownSize16       2.8030n ± 0%   0.8007n ± 0%  -71.43% (p=0.000 n=20)
MemclrKnownSize32        2.803n ± 0%    1.602n ± 0%  -42.85% (p=0.000 n=20)
MemclrKnownSize64        3.250n ± 0%    2.402n ± 0%  -26.08% (p=0.000 n=20)
MemclrKnownSize112       6.006n ± 0%    2.819n ± 0%  -53.06% (p=0.000 n=20)
MemclrKnownSize128       6.006n ± 0%    3.240n ± 0%  -46.05% (p=0.000 n=20)
MemclrKnownSize192       6.807n ± 0%    5.205n ± 0%  -23.53% (p=0.000 n=20)
MemclrKnownSize248       7.608n ± 0%    6.301n ± 0%  -17.19% (p=0.000 n=20)
MemclrKnownSize256       7.608n ± 0%    6.707n ± 0%  -11.84% (p=0.000 n=20)
MemclrKnownSize512       13.61n ± 0%    13.61n ± 0%        ~ (p=0.374 n=20)
MemclrKnownSize1024      26.43n ± 0%    26.43n ± 0%        ~ (p=0.826 n=20)
MemclrKnownSize4096      103.3n ± 0%    103.3n ± 0%        ~ (p=1.000 n=20)
MemclrKnownSize512KiB    26.29µ ± 0%    26.29µ ± 0%   -0.00% (p=0.012 n=20)
geomean                  10.05n         5.006n       -50.18%

                      |  bench.old   |               bench.new                |
                      |     B/s      |      B/s       vs base                 |
MemclrKnownSize1        396.2Mi ± 0%   2381.9Mi ± 0%  +501.21% (p=0.000 n=20)
MemclrKnownSize2        892.8Mi ± 0%   4764.0Mi ± 0%  +433.59% (p=0.000 n=20)
MemclrKnownSize4        1.524Gi ± 0%    9.305Gi ± 0%  +510.56% (p=0.000 n=20)
MemclrKnownSize8        3.079Gi ± 0%   18.609Gi ± 0%  +504.42% (p=0.000 n=20)
MemclrKnownSize16       5.316Gi ± 0%   18.609Gi ± 0%  +250.05% (p=0.000 n=20)
MemclrKnownSize32       10.63Gi ± 0%    18.61Gi ± 0%   +75.00% (p=0.000 n=20)
MemclrKnownSize64       18.34Gi ± 0%    24.81Gi ± 0%   +35.27% (p=0.000 n=20)
MemclrKnownSize112      17.37Gi ± 0%    37.01Gi ± 0%  +113.08% (p=0.000 n=20)
MemclrKnownSize128      19.85Gi ± 0%    36.80Gi ± 0%   +85.39% (p=0.000 n=20)
MemclrKnownSize192      26.27Gi ± 0%    34.35Gi ± 0%   +30.77% (p=0.000 n=20)
MemclrKnownSize248      30.36Gi ± 0%    36.66Gi ± 0%   +20.75% (p=0.000 n=20)
MemclrKnownSize256      31.34Gi ± 0%    35.55Gi ± 0%   +13.43% (p=0.000 n=20)
MemclrKnownSize512      35.02Gi ± 0%    35.03Gi ± 0%    +0.00% (p=0.030 n=20)
MemclrKnownSize1024     36.09Gi ± 0%    36.09Gi ± 0%         ~ (p=0.101 n=20)
MemclrKnownSize4096     36.93Gi ± 0%    36.93Gi ± 0%    +0.00% (p=0.003 n=20)
MemclrKnownSize512KiB   18.57Gi ± 0%    18.57Gi ± 0%    +0.00% (p=0.041 n=20)
geomean                 10.13Gi         20.33Gi       +100.72%

Change-Id: I460a56f7ccc9f820ca2c1934c1c517b9614809ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/621355
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
2024-10-24 08:55:31 +00:00
Keith Randall
74163c895a cmd/compile: use STP/LDP around morestack on arm64
The spill/restore code around morestack is almost never executed, so
we should make it as small as possible. Using 2-register loads/stores
makes sense here. Also, the offsets from SP are small, so they almost
always fit in the offset field of these instructions, which is smaller
than that of a normal load/store.

Makes cmd/go 0.6% smaller.

Change-Id: I8845283c1b269a259498153924428f6173bda293
Reviewed-on: https://go-review.googlesource.com/c/go/+/621556
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-10-22 16:23:12 +00:00
Xiaolin Zhao
ef3e1dae2f cmd/compile: optimize loong64 with register indexed load/store
goos: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A6000 @ 2500.00MHz
                      |  bench.old  |              bench.new              |
                      |   sec/op    |   sec/op     vs base                |
BinaryTree17             7.766 ± 1%    7.640 ± 2%   -1.62% (p=0.000 n=20)
Fannkuch11               2.649 ± 0%    2.358 ± 0%  -10.96% (p=0.000 n=20)
FmtFprintfEmpty         35.89n ± 0%   35.87n ± 0%   -0.06% (p=0.000 n=20)
FmtFprintfString        59.44n ± 0%   57.25n ± 2%   -3.68% (p=0.000 n=20)
FmtFprintfInt           62.07n ± 0%   60.04n ± 0%   -3.27% (p=0.000 n=20)
FmtFprintfIntInt        97.90n ± 0%   97.26n ± 0%   -0.65% (p=0.000 n=20)
FmtFprintfPrefixedInt   116.7n ± 0%   119.2n ± 0%   +2.14% (p=0.000 n=20)
FmtFprintfFloat         204.5n ± 0%   201.9n ± 0%   -1.30% (p=0.000 n=20)
FmtManyArgs             455.9n ± 0%   466.8n ± 0%   +2.39% (p=0.000 n=20)
GobDecode               7.458m ± 1%   7.138m ± 1%   -4.28% (p=0.000 n=20)
GobEncode               8.573m ± 1%   8.473m ± 1%        ~ (p=0.091 n=20)
Gzip                    280.2m ± 0%   284.9m ± 0%   +1.67% (p=0.000 n=20)
Gunzip                  32.68m ± 0%   32.67m ± 0%        ~ (p=0.211 n=20)
HTTPClientServer        54.22µ ± 0%   53.24µ ± 0%   -1.80% (p=0.000 n=20)
JSONEncode              9.427m ± 1%   9.152m ± 0%   -2.92% (p=0.000 n=20)
JSONDecode              47.08m ± 1%   46.85m ± 1%   -0.49% (p=0.007 n=20)
Mandelbrot200           4.601m ± 0%   4.605m ± 0%   +0.08% (p=0.000 n=20)
GoParse                 4.776m ± 0%   4.655m ± 1%   -2.52% (p=0.000 n=20)
RegexpMatchEasy0_32     59.77n ± 0%   57.59n ± 0%   -3.66% (p=0.000 n=20)
RegexpMatchEasy0_1K     458.1n ± 0%   458.8n ± 0%   +0.15% (p=0.000 n=20)
RegexpMatchEasy1_32     59.36n ± 0%   59.24n ± 0%   -0.20% (p=0.000 n=20)
RegexpMatchEasy1_1K     557.7n ± 0%   560.2n ± 0%   +0.46% (p=0.000 n=20)
RegexpMatchMedium_32    803.1n ± 0%   772.8n ± 0%   -3.77% (p=0.000 n=20)
RegexpMatchMedium_1K    27.29µ ± 0%   25.88µ ± 0%   -5.18% (p=0.000 n=20)
RegexpMatchHard_32      1.385µ ± 0%   1.304µ ± 0%   -5.85% (p=0.000 n=20)
RegexpMatchHard_1K      40.92µ ± 0%   39.58µ ± 0%   -3.27% (p=0.000 n=20)
Revcomp                 474.3m ± 0%   410.0m ± 0%  -13.56% (p=0.000 n=20)
Template                78.16m ± 0%   76.32m ± 1%   -2.36% (p=0.000 n=20)
TimeParse               271.8n ± 0%   272.1n ± 0%   +0.11% (p=0.000 n=20)
TimeFormat              292.3n ± 0%   294.8n ± 0%   +0.86% (p=0.000 n=20)
geomean                 51.98µ        50.82µ        -2.22%

Change-Id: Ia78f1ddee8f1d9ec7192a4b8d2a4ec6058679956
Reviewed-on: https://go-review.googlesource.com/c/go/+/615918
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2024-10-17 07:32:25 +00:00
Cuong Manh Le
7e2487cf65 cmd/compile: avoid dynamic type when possible
If the expression's type is a single compile-time known type, use that
type instead of the dynamic one, so that later compiler passes can
skip unnecessary runtime calls.
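
A hedged illustration of the general idea, not the test case from the CL: when a value of one concrete type is passed as an interface and then type-switched, the concrete type is known statically at the call site, so a compiler could in principle use it directly instead of reading the dynamic type out of the interface at run time. The types `Shape` and `Square` and the function `describe` are hypothetical, and whether this particular change fires for this exact shape is an assumption:

```go
package main

import "fmt"

// Shape and Square are hypothetical types used only for illustration.
type Shape interface{ Area() float64 }

type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

// describe type-switches on an interface value, binding the case variable v.
func describe(s Shape) string {
	switch v := s.(type) {
	case Square:
		return fmt.Sprintf("square with area %v", v.Area())
	default:
		return "unknown shape"
	}
}

func main() {
	// The argument's static type is the single concrete type Square; after
	// inlining, the type switch could be resolved without a dynamic lookup.
	fmt.Println(describe(Square{Side: 3})) // square with area 9
}
```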

Thanks Youlin Feng for writing the original test case.

Change-Id: I3f65ab90f041474a9731338a82136c1d394c1773
Reviewed-on: https://go-review.googlesource.com/c/go/+/616975
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-10-07 19:12:01 +00:00
Xiaolin Zhao
f243cf6016 cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on loong64
Use float <-> int register moves without conversion, instead of a
store followed by a load, to move float <-> int values, as is already
done on arm64 and mips64.

goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
                    │  bench.old   │               bench.new                │
                    │    sec/op    │    sec/op     vs base                  │
Acos                   15.98n ± 0%    15.94n ± 0%   -0.25% (p=0.000 n=20)
Acosh                  27.75n ± 0%    25.56n ± 0%   -7.89% (p=0.000 n=20)
Asin                   15.85n ± 0%    15.76n ± 0%   -0.57% (p=0.000 n=20)
Asinh                  39.79n ± 0%    37.69n ± 0%   -5.28% (p=0.000 n=20)
Atan                   7.261n ± 0%    7.242n ± 0%   -0.27% (p=0.000 n=20)
Atanh                  28.30n ± 0%    27.62n ± 0%   -2.40% (p=0.000 n=20)
Atan2                  15.85n ± 0%    15.75n ± 0%   -0.63% (p=0.000 n=20)
Cbrt                   27.02n ± 0%    21.08n ± 0%  -21.98% (p=0.000 n=20)
Ceil                   2.830n ± 1%    2.896n ± 1%   +2.31% (p=0.000 n=20)
Copysign              0.8022n ± 0%   0.8004n ± 0%   -0.22% (p=0.000 n=20)
Cos                    11.64n ± 0%    11.61n ± 0%   -0.26% (p=0.000 n=20)
Cosh                   35.98n ± 0%    33.44n ± 0%   -7.05% (p=0.000 n=20)
Erf                    10.09n ± 0%    10.08n ± 0%   -0.10% (p=0.000 n=20)
Erfc                   11.40n ± 0%    11.35n ± 0%   -0.44% (p=0.000 n=20)
Erfinv                 12.31n ± 0%    12.29n ± 0%   -0.16% (p=0.000 n=20)
Erfcinv                12.16n ± 0%    12.17n ± 0%   +0.08% (p=0.000 n=20)
Exp                    28.41n ± 0%    26.44n ± 0%   -6.95% (p=0.000 n=20)
ExpGo                  28.68n ± 0%    27.07n ± 0%   -5.60% (p=0.000 n=20)
Expm1                  17.21n ± 0%    16.75n ± 0%   -2.67% (p=0.000 n=20)
Exp2                   24.71n ± 0%    23.01n ± 0%   -6.88% (p=0.000 n=20)
Exp2Go                 25.17n ± 0%    23.91n ± 0%   -4.99% (p=0.000 n=20)
Abs                   0.8004n ± 0%   0.8004n ± 0%        ~ (p=0.224 n=20)
Dim                    1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=20) ¹
Floor                  2.848n ± 0%    2.859n ± 0%   +0.39% (p=0.000 n=20)
Max                    3.074n ± 0%    3.071n ± 0%        ~ (p=0.481 n=20)
Min                    3.179n ± 0%    3.176n ± 0%   -0.09% (p=0.003 n=20)
Mod                    49.62n ± 0%    44.82n ± 0%   -9.67% (p=0.000 n=20)
Frexp                  7.604n ± 0%    6.803n ± 0%  -10.53% (p=0.000 n=20)
Gamma                  18.01n ± 0%    17.61n ± 0%   -2.22% (p=0.000 n=20)
Hypot                  7.204n ± 0%    7.604n ± 0%   +5.55% (p=0.000 n=20)
HypotGo                7.204n ± 0%    7.604n ± 0%   +5.56% (p=0.000 n=20)
Ilogb                  6.003n ± 0%    6.003n ± 0%        ~ (p=0.407 n=20)
J0                     76.43n ± 0%    76.24n ± 0%   -0.25% (p=0.000 n=20)
J1                     76.44n ± 0%    76.44n ± 0%        ~ (p=1.000 n=20)
Jn                     168.2n ± 0%    168.5n ± 0%   +0.18% (p=0.000 n=20)
Ldexp                  8.804n ± 0%    7.604n ± 0%  -13.63% (p=0.000 n=20)
Lgamma                 19.01n ± 0%    19.01n ± 0%        ~ (p=0.695 n=20)
Log                    19.38n ± 0%    19.12n ± 0%   -1.34% (p=0.000 n=20)
Logb                   6.003n ± 0%    6.003n ± 0%        ~ (p=1.000 n=20)
Log1p                  18.57n ± 0%    16.72n ± 0%   -9.96% (p=0.000 n=20)
Log10                  20.67n ± 0%    20.45n ± 0%   -1.06% (p=0.000 n=20)
Log2                   9.605n ± 0%    8.804n ± 0%   -8.34% (p=0.000 n=20)
Modf                   4.402n ± 0%    4.402n ± 0%        ~ (p=1.000 n=20)
Nextafter32            7.204n ± 0%    5.603n ± 0%  -22.22% (p=0.000 n=20)
Nextafter64            6.803n ± 0%    6.003n ± 0%  -11.76% (p=0.000 n=20)
PowInt                 39.62n ± 0%    37.22n ± 0%   -6.06% (p=0.000 n=20)
PowFrac                120.9n ± 0%    108.9n ± 0%   -9.93% (p=0.000 n=20)
Pow10Pos               1.601n ± 0%    1.601n ± 0%        ~ (p=0.487 n=20)
Pow10Neg               2.675n ± 0%    2.675n ± 0%        ~ (p=1.000 n=20)
Round                  3.018n ± 0%    2.401n ± 0%  -20.46% (p=0.000 n=20)
RoundToEven            3.822n ± 0%    3.001n ± 0%  -21.48% (p=0.000 n=20)
Remainder              45.62n ± 0%    42.42n ± 0%   -7.01% (p=0.000 n=20)
Signbit               0.9075n ± 0%   0.8004n ± 0%  -11.81% (p=0.000 n=20)
Sin                    12.65n ± 0%    12.65n ± 0%        ~ (p=0.503 n=20)
Sincos                 14.81n ± 0%    14.60n ± 0%   -1.42% (p=0.000 n=20)
Sinh                   36.75n ± 0%    35.11n ± 0%   -4.46% (p=0.000 n=20)
SqrtIndirect           1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=20) ¹
SqrtLatency            4.002n ± 0%    4.002n ± 0%        ~ (p=1.000 n=20)
SqrtIndirectLatency    4.002n ± 0%    4.002n ± 0%        ~ (p=1.000 n=20)
SqrtGoLatency          52.85n ± 0%    40.82n ± 0%  -22.76% (p=0.000 n=20)
SqrtPrime              887.4n ± 0%    887.4n ± 0%        ~ (p=0.751 n=20)
Tan                    13.95n ± 0%    13.97n ± 0%   +0.18% (p=0.000 n=20)
Tanh                   36.79n ± 0%    34.89n ± 0%   -5.16% (p=0.000 n=20)
Trunc                  2.849n ± 0%    2.861n ± 0%   +0.42% (p=0.000 n=20)
Y0                     77.44n ± 0%    77.64n ± 0%   +0.26% (p=0.000 n=20)
Y1                     74.41n ± 0%    74.33n ± 0%   -0.11% (p=0.000 n=20)
Yn                     158.7n ± 0%    159.0n ± 0%   +0.19% (p=0.000 n=20)
Float64bits           0.8774n ± 0%   0.4002n ± 0%  -54.39% (p=0.000 n=20)
Float64frombits       0.8042n ± 0%   0.4002n ± 0%  -50.24% (p=0.000 n=20)
Float32bits           1.1230n ± 0%   0.5336n ± 0%  -52.48% (p=0.000 n=20)
Float32frombits       1.0670n ± 0%   0.8004n ± 0%  -24.99% (p=0.000 n=20)
FMA                    2.001n ± 0%    2.001n ± 0%        ~ (p=0.605 n=20)
geomean                10.87n         10.10n        -7.15%
¹ all samples are equal

goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
                    │  bench.old   │              bench.new               │
                    │    sec/op    │    sec/op     vs base                │
Acos                   33.10n ± 0%    31.95n ± 2%   -3.46% (p=0.000 n=20)
Acosh                  58.38n ± 0%    50.44n ± 0%  -13.60% (p=0.000 n=20)
Asin                   32.70n ± 0%    31.94n ± 0%   -2.32% (p=0.000 n=20)
Asinh                  57.65n ± 0%    50.83n ± 0%  -11.82% (p=0.000 n=20)
Atan                   14.21n ± 0%    14.21n ± 0%        ~ (p=0.501 n=20)
Atanh                  60.86n ± 0%    54.44n ± 0%  -10.56% (p=0.000 n=20)
Atan2                  32.02n ± 0%    34.02n ± 0%   +6.25% (p=0.000 n=20)
Cbrt                   55.58n ± 0%    40.64n ± 0%  -26.88% (p=0.000 n=20)
Ceil                   9.566n ± 0%    9.566n ± 0%        ~ (p=0.463 n=20)
Copysign              0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.806 n=20)
Cos                    18.02n ± 0%    18.02n ± 0%        ~ (p=0.191 n=20)
Cosh                   64.44n ± 0%    65.64n ± 0%   +1.86% (p=0.000 n=20)
Erf                    16.15n ± 0%    16.16n ± 0%        ~ (p=0.770 n=20)
Erfc                   18.71n ± 0%    18.83n ± 0%   +0.61% (p=0.000 n=20)
Erfinv                 19.33n ± 0%    19.34n ± 0%        ~ (p=0.513 n=20)
Erfcinv                18.90n ± 0%    19.78n ± 0%   +4.63% (p=0.000 n=20)
Exp                    50.04n ± 0%    49.66n ± 0%   -0.75% (p=0.000 n=20)
ExpGo                  50.03n ± 0%    50.03n ± 0%        ~ (p=0.723 n=20)
Expm1                  28.41n ± 0%    28.27n ± 0%   -0.49% (p=0.000 n=20)
Exp2                   50.08n ± 0%    51.23n ± 0%   +2.31% (p=0.000 n=20)
Exp2Go                 49.77n ± 0%    49.89n ± 0%   +0.24% (p=0.000 n=20)
Abs                   0.8009n ± 0%   0.8006n ± 0%        ~ (p=0.317 n=20)
Dim                    1.987n ± 0%    1.993n ± 0%   +0.28% (p=0.001 n=20)
Floor                  8.543n ± 0%    8.548n ± 0%        ~ (p=0.509 n=20)
Max                    6.670n ± 0%    6.672n ± 0%        ~ (p=0.335 n=20)
Min                    6.694n ± 0%    6.694n ± 0%        ~ (p=0.459 n=20)
Mod                    56.44n ± 0%    53.23n ± 0%   -5.70% (p=0.000 n=20)
Frexp                  8.409n ± 0%    7.606n ± 0%   -9.55% (p=0.000 n=20)
Gamma                  35.64n ± 0%    35.23n ± 0%   -1.15% (p=0.000 n=20)
Hypot                  11.21n ± 0%    10.61n ± 0%   -5.31% (p=0.000 n=20)
HypotGo                11.50n ± 0%    11.01n ± 0%   -4.30% (p=0.000 n=20)
Ilogb                  7.606n ± 0%    6.804n ± 0%  -10.54% (p=0.000 n=20)
J0                     125.3n ± 0%    126.5n ± 0%   +0.96% (p=0.000 n=20)
J1                     124.9n ± 0%    125.3n ± 0%   +0.32% (p=0.000 n=20)
Jn                     264.3n ± 0%    265.9n ± 0%   +0.61% (p=0.000 n=20)
Ldexp                  9.606n ± 0%    9.204n ± 0%   -4.19% (p=0.000 n=20)
Lgamma                 38.82n ± 0%    38.85n ± 0%   +0.06% (p=0.019 n=20)
Log                    38.44n ± 0%    28.04n ± 0%  -27.06% (p=0.000 n=20)
Logb                   8.405n ± 0%    7.605n ± 0%   -9.52% (p=0.000 n=20)
Log1p                  31.62n ± 0%    27.11n ± 0%  -14.26% (p=0.000 n=20)
Log10                  38.83n ± 0%    28.42n ± 0%  -26.81% (p=0.000 n=20)
Log2                   11.21n ± 0%    10.41n ± 0%   -7.14% (p=0.000 n=20)
Modf                   5.204n ± 0%    5.205n ± 0%        ~ (p=0.983 n=20)
Nextafter32            8.809n ± 0%    7.208n ± 0%  -18.18% (p=0.000 n=20)
Nextafter64            8.405n ± 0%    8.406n ± 0%   +0.01% (p=0.007 n=20)
PowInt                 48.83n ± 0%    44.78n ± 0%   -8.28% (p=0.000 n=20)
PowFrac                146.9n ± 0%    142.1n ± 0%   -3.23% (p=0.000 n=20)
Pow10Pos               2.334n ± 0%    2.333n ± 0%        ~ (p=0.110 n=20)
Pow10Neg               4.803n ± 0%    4.803n ± 0%        ~ (p=0.130 n=20)
Round                  4.816n ± 0%    3.819n ± 0%  -20.70% (p=0.000 n=20)
RoundToEven            5.735n ± 0%    5.204n ± 0%   -9.26% (p=0.000 n=20)
Remainder              52.05n ± 0%    49.64n ± 0%   -4.63% (p=0.000 n=20)
Signbit                1.201n ± 0%    1.001n ± 0%  -16.65% (p=0.000 n=20)
Sin                    20.63n ± 0%    20.64n ± 0%   +0.05% (p=0.040 n=20)
Sincos                 23.82n ± 0%    24.62n ± 0%   +3.36% (p=0.000 n=20)
Sinh                   71.25n ± 0%    68.44n ± 0%   -3.94% (p=0.000 n=20)
SqrtIndirect           2.001n ± 0%    2.001n ± 0%        ~ (p=0.182 n=20)
SqrtLatency            4.003n ± 0%    4.003n ± 0%        ~ (p=0.754 n=20)
SqrtIndirectLatency    4.003n ± 0%    4.003n ± 0%        ~ (p=0.773 n=20)
SqrtGoLatency          60.84n ± 0%    81.26n ± 0%  +33.56% (p=0.000 n=20)
SqrtPrime              1.791µ ± 0%    1.791µ ± 0%        ~ (p=0.784 n=20)
Tan                    27.22n ± 0%    27.22n ± 0%        ~ (p=0.819 n=20)
Tanh                   70.88n ± 0%    69.04n ± 0%   -2.60% (p=0.000 n=20)
Trunc                  8.543n ± 0%    8.543n ± 0%        ~ (p=0.784 n=20)
Y0                     122.9n ± 0%    122.9n ± 0%        ~ (p=0.559 n=20)
Y1                     123.3n ± 0%    121.7n ± 0%   -1.30% (p=0.000 n=20)
Yn                     263.0n ± 0%    262.6n ± 0%   -0.15% (p=0.000 n=20)
Float64bits           1.2010n ± 0%   0.6004n ± 0%  -50.01% (p=0.000 n=20)
Float64frombits       1.2010n ± 0%   0.6004n ± 0%  -50.01% (p=0.000 n=20)
Float32bits           1.7010n ± 0%   0.8005n ± 0%  -52.94% (p=0.000 n=20)
Float32frombits       1.5010n ± 0%   0.8005n ± 0%  -46.67% (p=0.000 n=20)
FMA                    2.001n ± 0%    2.001n ± 0%        ~ (p=0.238 n=20)
geomean                17.41n         16.15n        -7.19%

Change-Id: I0a0c263af2f07203eab1782e69c706f20c689d8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/604737
Auto-Submit: Tim King <taking@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Tim King <taking@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2024-09-13 19:29:23 +00:00
Xiaolin Zhao
2c5b707b3b cmd/compile: optimize RotateLeft8/16 on loong64
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
             │  bench.old   │              bench.new               │
             │    sec/op    │    sec/op     vs base                │
RotateLeft8     1.401n ± 0%    1.201n ± 0%  -14.28% (p=0.000 n=20)
RotateLeft16   1.4010n ± 0%   0.8032n ± 0%  -42.67% (p=0.000 n=20)
geomean         1.401n        0.9822n       -29.90%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
             │  bench.old  │              bench.new              │
             │   sec/op    │   sec/op     vs base                │
RotateLeft8    1.576n ± 0%   1.310n ± 0%  -16.88% (p=0.000 n=20)
RotateLeft16   1.576n ± 0%   1.166n ± 0%  -26.02% (p=0.000 n=20)
geomean        1.576n        1.236n       -21.58%

Change-Id: I39c18306be0b8fd31b57bd0911714abd1783b50e
Reviewed-on: https://go-review.googlesource.com/c/go/+/604738
Auto-Submit: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Tim King <taking@google.com>
2024-09-13 17:15:09 +00:00
Meng Zhuo
2982253c42 test/codegen: add Rotate test for riscv64
Change-Id: I7d996b8d46fbeef933943f806052a30f1f8d50c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/588836
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Tim King <taking@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-09-11 01:37:00 +00:00
Paschalis Tsilias
fe69121bc5 cmd/compile: optimize []byte(string1 + string2)
This CL optimizes the compilation of string-to-[]byte conversions whose
operand is a string concatenation, such as []byte(s1 + s2).
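The source pattern in question is a concatenation that is immediately converted to a byte slice. Lowered naively, the compiler builds the intermediate string and then copies it again into a new slice. In plain Go, the single-allocation result that such an optimization aims for looks like the sketch below. `concatBytes` is a hypothetical helper. The compiler's actual lowering is not shown here.

```go
package main

import "fmt"

// concatBytes returns the same bytes as []byte(a + b), but allocates
// only the destination slice, not an intermediate string as well.
func concatBytes(a, b string) []byte {
	buf := make([]byte, len(a)+len(b))
	n := copy(buf, a) // copy from a string into a []byte is allowed
	copy(buf[n:], b)
	return buf
}

func main() {
	a, b := "hello, ", "world"
	naive := []byte(a + b) // the expression form the CL optimizes
	fmt.Println(string(concatBytes(a, b)) == string(naive)) // true
}
```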

Fixes #62407

Change-Id: Ic47df758478e5d061880620025c4ec7dbbff8a64
Reviewed-on: https://go-review.googlesource.com/c/go/+/527935
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Tim King <taking@google.com>
2024-09-10 21:20:57 +00:00
Joel Sing
e126129d76 cmd/compile/internal/ssa: combine shift and addition for riscv64 rva22u64
When GORISCV64 enables rva22u64, combine shift and addition using the
SH1ADD, SH2ADD and SH3ADD instructions that are available via the Zba
extension. This removes more than 2000 instructions from the Go binary
on riscv64.
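The Zba instructions SHnADD compute `rd = rs2 + (rs1 << n)` for n = 1, 2 and 3. This pattern occurs constantly in address arithmetic. For example, locating element `i` of a `[]int64` needs `base + i<<3`. The Go sketch below models only the instruction semantics. `shNadd` is a hypothetical name, not a compiler or assembler API.

```go
package main

import "fmt"

// shNadd models the RISC-V Zba SH1ADD/SH2ADD/SH3ADD instructions:
// rd = rs2 + (rs1 << n), n in {1, 2, 3}, wrapping on overflow.
func shNadd(n uint, rs1, rs2 uint64) uint64 {
	if n < 1 || n > 3 {
		panic("Zba only provides shift amounts 1, 2 and 3")
	}
	return rs2 + rs1<<n
}

func main() {
	// Address of element i of a []int64 starting at base: base + i*8,
	// which one SH3ADD computes instead of a shift followed by an add.
	base, i := uint64(0x1000), uint64(5)
	fmt.Printf("%#x\n", shNadd(3, i, base)) // 0x1028
}
```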

Change-Id: Ia62ae7dda3d8083cff315113421bee73f518eea8
Reviewed-on: https://go-review.googlesource.com/c/go/+/606636
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
2024-08-28 13:46:24 +00:00
Keith Randall
36b45bca66 cmd/compile: regalloc: drop values that aren't used until after a call
No point in keeping values in registers when their next use is after
a call, as we'd have to spill/restore them anyway.
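The following is a source-level illustration of the situation described above. The names `f` and `g` are hypothetical. Here `y` is computed before a call but first used after it. Go's internal calling convention does not preserve registers across calls, so `y` has to be kept on the stack around `g()` regardless. Holding it in a register between its definition and the call only ties up that register.

```go
package main

import "fmt"

//go:noinline
func g() {} // stands in for any real, non-inlined call

// f computes y before the call but first uses it after the call.
// Because the call clobbers registers, y must be saved to the stack
// around g() anyway; keeping it in a register until then gains nothing.
func f(x int) int {
	y := x * 3
	g()
	return y + 1
}

func main() {
	fmt.Println(f(2)) // 7
}
```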

cmd/go is 0.1% smaller.

Fixes #59297

Change-Id: I10ee761d0d23229f57de278f734c44d6a8dccd6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/509255
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-08-26 22:29:43 +00:00
Paul E. Murphy
2b0a157d68 cmd/compile: intrinsify math.MulUintptr on PPC64
This can be done efficiently with only a few instructions.

This also adds MULHDUCC for further codegen improvement.
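The runtime-internal `MulUintptr` returns `a*b` together with a flag that reports whether the product overflowed. The following is a minimal sketch of the same contract for 64-bit operands. It uses `math/bits.Mul64` to get the high half of the full product, which is the half that PPC64's multiply-high-doubleword-unsigned instruction produces. `mulUint64` is a hypothetical name, and this is not the runtime's code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulUint64 returns a*b (mod 2^64) and whether the true product overflowed.
// Overflow happens exactly when the high 64 bits of the 128-bit product are non-zero.
func mulUint64(a, b uint64) (uint64, bool) {
	hi, lo := bits.Mul64(a, b)
	return lo, hi != 0
}

func main() {
	fmt.Println(mulUint64(1<<32, 1<<31)) // 9223372036854775808 false
	fmt.Println(mulUint64(1<<32, 1<<32)) // 0 true
}
```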

Change-Id: I06320ba4383a679341b911a237a360ef07b19168
Reviewed-on: https://go-review.googlesource.com/c/go/+/605975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-08-26 17:02:43 +00:00
Joel Sing
02a9f51011 test/codegen: add initial codegen tests for integer min/max
Change-Id: I006370053748edbec930c7279ee88a805009aa0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/606976
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-08-23 15:17:17 +00:00
Keith Randall
b2cdaf7346 cmd/compile: improve unneeded zeroing removal
After newobject, we don't need to write zeroes to initialize the
object.  It has already been zeroed by the allocator.

This is already handled in most cases, but because we run builtin
decomposition after the opt pass, we don't handle cases where the zero
of a compound builtin is being written. Improve the zero detector to
handle those cases.
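The following illustrates the kind of source this concerns. The type and function names are hypothetical, and the example is not taken from the linked issue. A freshly allocated object has fields of compound builtin types (string, slice, interface) explicitly set to their zero values. Because `new` already returns zeroed memory, those stores are redundant. Whether a particular store is actually dropped depends on the compiler version. The sketch shows only the pattern.

```go
package main

import "fmt"

type record struct {
	name string
	tags []string
	meta any
}

// newRecord stores zero values of compound builtin types into memory
// that new has already zeroed, so these stores are redundant and a
// compiler may remove them without changing behavior.
func newRecord() *record {
	r := new(record)
	r.name = ""
	r.tags = nil
	r.meta = nil
	return r
}

func main() {
	r := newRecord()
	fmt.Println(r.name == "", r.tags == nil, r.meta == nil) // true true true
}
```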

Fixes #68845

Change-Id: If3dde2e304a05e5a6a6723565191d5444b334bcc
Reviewed-on: https://go-review.googlesource.com/c/go/+/605255
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-08-14 18:16:29 +00:00
khr@golang.org
7273509466 cmd/compile: add additional arm64 bit field rules
Get rid of a TODO in the prove pass.
We currently avoid marking shifts of constants as bounded, where
bounded means we don't have to worry about <0 or >=bitwidth shifts.
We do this because marking them bounded causes different rules to apply
during lowering, which makes some codegen tests fail.

Add some new rules which ensure that we get the right final instruction
sequence regardless of the ordering. Then we can remove this special case.
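Go defines shifts by counts at or above the operand width: the result is 0 for left shifts and for unsigned right shifts. A negative signed count panics. Unless the compiler can prove the count is in range, it must emit extra code to handle these cases. A shift is "bounded" when that proof exists. The function names in this sketch are hypothetical:

```go
package main

import "fmt"

// shl is not known to be bounded: Go defines x<<s == 0 for s >= 64,
// so the compiler must handle large counts unless it can prove s < 64.
func shl(x uint64, s uint) uint64 {
	return x << s
}

// shlMasked is bounded: s&63 is always in [0, 63], so a plain hardware
// shift is enough on targets whose shifts use only the low count bits.
func shlMasked(x uint64, s uint) uint64 {
	return x << (s & 63)
}

func main() {
	fmt.Println(shl(1, 64), shlMasked(1, 64)) // 0 1
}
```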

Change-Id: I4e962d4f09992b42ab47e123de5ded3b8b8fb205
Reviewed-on: https://go-review.googlesource.com/c/go/+/602935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-08-12 21:03:55 +00:00
khr@golang.org
9b4268c3df cmd/compile: simplify prove pass
We don't need noLimit checks in a bunch of places.
Also simplify folding of provable constant results.

At this point in the CL stack, compilebench reports no performance
changes. The only thing of note is that binaries got a bit smaller.

name                      old text-bytes    new text-bytes    delta
HelloSize                       960kB ± 0%        952kB ± 0%  -0.83%  (p=0.000 n=10+10)
CmdGoSize                      12.3MB ± 0%       12.1MB ± 0%  -1.53%  (p=0.000 n=10+10)

Change-Id: Id4be75eec0f8c93f2f3b93a8521ce2278ee2ee2c
Reviewed-on: https://go-review.googlesource.com/c/go/+/599197
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-08-07 16:08:20 +00:00
khr@golang.org
3b96eebcbd cmd/compile: rewrite the constant parts of the prove pass
Handles a lot more cases where constant ranges can eliminate
various (mostly bounds failure) paths.
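The following hypothetical example shows the kind of code where knowing an index's constant range lets prove drop a bounds check. It is not taken from the linked issues. After the early return, `i` is known to lie in [0, 8), so the implicit check on `table[i]` and its panic path can be removed. Whether a given check disappears depends on the compiler version. Building with `-gcflags=-d=ssa/check_bce/debug=1` reports the checks that remain.

```go
package main

import "fmt"

var table = [8]byte{1, 2, 3, 5, 8, 13, 21, 34}

// lookup only indexes table when i is in [0, 8); a prove pass that
// tracks constant ranges for i can show table[i] never goes out of
// bounds and eliminate the check.
func lookup(i int) (byte, bool) {
	if i < 0 || i >= len(table) {
		return 0, false
	}
	return table[i], true
}

func main() {
	fmt.Println(lookup(3)) // 5 true
	fmt.Println(lookup(9)) // 0 false
}
```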

Fixes #66826
Fixes #66692
Fixes #48213
Update #57959

TODO: remove constant logic from poset code, no longer needed.

Change-Id: Id196436fcd8a0c84c7d59c04f93bd92e26a0fd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/599096
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-08-07 16:07:33 +00:00
Xiaolin Zhao
ff14e08cd3 cmd/compile, math: improve implementation of math.{Max,Min} on loong64
Make math.{Min,Max} intrinsics and implement math.{archMax,archMin}
in hardware.

goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
         │  old.bench   │              new.bench              │
         │    sec/op    │   sec/op     vs base                │
Max         7.606n ± 0%   3.087n ± 0%  -59.41% (p=0.000 n=20)
Min         7.205n ± 0%   2.904n ± 0%  -59.69% (p=0.000 n=20)
MinFloat   37.220n ± 0%   4.802n ± 0%  -87.10% (p=0.000 n=20)
MaxFloat   33.620n ± 0%   4.802n ± 0%  -85.72% (p=0.000 n=20)
geomean     16.18n        3.792n       -76.57%

goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A5000 @ 2500.00MHz
         │  old.bench   │              new.bench              │
         │    sec/op    │   sec/op     vs base                │
Max        10.010n ± 0%   7.196n ± 0%  -28.11% (p=0.000 n=20)
Min         8.806n ± 0%   7.155n ± 0%  -18.75% (p=0.000 n=20)
MinFloat   60.010n ± 0%   7.976n ± 0%  -86.71% (p=0.000 n=20)
MaxFloat   56.410n ± 0%   7.980n ± 0%  -85.85% (p=0.000 n=20)
geomean     23.37n        7.566n       -67.63%

Updates #59120.

Change-Id: I6815d20bc304af3cbf5d6ca8fe0ca1c2ddebea2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/580283
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2024-08-07 01:16:28 +00:00
Michael Pratt
1985c0ccf9 cmd/compile,runtime: disable swissmap fast variants
Temporary measure to reduce the required MVP code.

For #54766.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I44dc8acd0dc8280c6beb40451998e84bc85c238a
Reviewed-on: https://go-review.googlesource.com/c/go/+/580915
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-08-02 16:47:38 +00:00
Keith Randall
c18ff29295 cmd/compile: make sync/atomic AND/OR operations intrinsic on amd64
Update #61395

Change-Id: I59a950f48efc587dfdffce00e2f4f3ab99d8df00
Reviewed-on: https://go-review.googlesource.com/c/go/+/594738
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Nicolas Hillegeer <aktau@google.com>
2024-07-23 21:29:38 +00:00