mirror of
https://github.com/golang/go.git
synced 2025-05-22 16:09:37 +00:00
120 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
e8f5a33191 |
cmd/compile: fix incorrect rewriting to if condition
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag, in the following categories: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI || EQ 4. GTnoov greater than NEQ & PL The backend generates two consecutive branch instructions for 'LEnoov' and 'GTnoov' to model their expected behavior. A slight change to 'gc' and amd64/386 backends is made to unify the code generation. Add a test 'TestCondRewrite' as justification, it covers 32 incorrect rules identified on arm64, more might be needed on other arches, like 32-bit arm. Add two benchmarks profiling the aforementioned category 1&2 and category 3&4 separetely, we expect the first two categories will show performance improvement and the second will not result in visible regression compared with the non-optimized version. This change also updates TestFormats to support using %#x. Examples exhibiting where does the issue come from: 1: 'if x + 3 < 0' might be converted to: before: CMN $3, R0 BGE <else branch> // wrong branch is taken if 'x+3' overflows after: CMN $3, R0 BPL <else branch> 2: 'if y - 3 > 0' might be converted to: before: CMP $3, R0 BLE <else branch> // wrong branch is taken if 'y-3' underflows after: CMP $3, R0 BMI <else branch> BEQ <else branch> Benchmark data from different kinds of arm64 servers, 'old' is the non-optimized version (not the parent commit), generally the optimization version outperforms. S1: name old time/op new time/op delta CondRewrite/SoloJump 13.6ns ± 0% 12.9ns ± 0% -5.15% (p=0.000 n=10+10) CondRewrite/CombJump 13.8ns ± 1% 12.9ns ± 0% -6.32% (p=0.000 n=10+10) S2: name old time/op new time/op delta CondRewrite/SoloJump 11.6ns ± 0% 10.9ns ± 0% -6.03% (p=0.000 n=10+10) CondRewrite/CombJump 11.4ns ± 0% 10.8ns ± 1% -5.53% (p=0.000 n=10+10) S3: name old time/op new time/op delta CondRewrite/SoloJump 7.36ns ± 0% 7.50ns ± 0% +1.79% (p=0.000 n=9+10) CondRewrite/CombJump 7.35ns ± 0% 7.75ns ± 0% +5.51% (p=0.000 n=8+9) S4: name old time/op new time/op delta CondRewrite/SoloJump-224 11.5ns ± 1% 10.9ns ± 0% -4.97% (p=0.000 n=10+10) CondRewrite/CombJump-224 11.9ns ± 0% 11.5ns ± 0% -2.95% (p=0.000 n=10+10) S5: name old time/op new time/op delta CondRewrite/SoloJump 10.0ns ± 0% 10.0ns ± 0% -0.45% (p=0.000 n=9+10) CondRewrite/CombJump 9.93ns ± 0% 9.77ns ± 0% -1.53% (p=0.000 n=10+9) Go1 perf. data: name old time/op new time/op delta BinaryTree17 6.29s ± 1% 6.30s ± 1% ~ (p=1.000 n=5+5) Fannkuch11 5.40s ± 0% 5.40s ± 0% ~ (p=0.841 n=5+5) FmtFprintfEmpty 97.9ns ± 0% 98.9ns ± 3% ~ (p=0.937 n=4+5) FmtFprintfString 171ns ± 3% 171ns ± 2% ~ (p=0.754 n=5+5) FmtFprintfInt 212ns ± 0% 217ns ± 6% +2.55% (p=0.008 n=5+5) FmtFprintfIntInt 296ns ± 1% 297ns ± 2% ~ (p=0.516 n=5+5) FmtFprintfPrefixedInt 371ns ± 2% 374ns ± 7% ~ (p=1.000 n=5+5) FmtFprintfFloat 435ns ± 1% 439ns ± 2% ~ (p=0.056 n=5+5) FmtManyArgs 1.37µs ± 1% 1.36µs ± 1% ~ (p=0.730 n=5+5) GobDecode 14.6ms ± 4% 14.4ms ± 4% ~ (p=0.690 n=5+5) GobEncode 11.8ms ±20% 11.6ms ±15% ~ (p=1.000 n=5+5) Gzip 507ms ± 0% 491ms ± 0% -3.22% (p=0.008 n=5+5) Gunzip 73.8ms ± 0% 73.9ms ± 0% ~ (p=0.690 n=5+5) HTTPClientServer 116µs ± 0% 116µs ± 0% ~ (p=0.686 n=4+4) JSONEncode 21.8ms ± 1% 21.6ms ± 2% ~ (p=0.151 n=5+5) JSONDecode 104ms ± 1% 103ms ± 1% -1.08% (p=0.016 n=5+5) Mandelbrot200 9.53ms ± 0% 9.53ms ± 0% ~ (p=0.421 n=5+5) GoParse 7.55ms ± 1% 7.51ms ± 1% ~ (p=0.151 n=5+5) RegexpMatchEasy0_32 158ns ± 0% 158ns ± 0% ~ (all equal) RegexpMatchEasy0_1K 606ns ± 1% 608ns ± 3% ~ (p=0.937 n=5+5) RegexpMatchEasy1_32 143ns ± 0% 144ns ± 1% ~ (p=0.095 n=5+4) RegexpMatchEasy1_1K 927ns ± 2% 944ns ± 2% ~ (p=0.056 n=5+5) RegexpMatchMedium_32 16.0ns ± 0% 16.0ns ± 0% ~ (all equal) RegexpMatchMedium_1K 69.3µs ± 2% 69.7µs ± 0% ~ (p=0.690 n=5+5) RegexpMatchHard_32 3.73µs ± 0% 3.73µs ± 1% ~ (p=0.984 n=5+5) RegexpMatchHard_1K 111µs ± 1% 110µs ± 0% ~ (p=0.151 n=5+5) Revcomp 1.91s ±47% 1.77s ±68% ~ (p=1.000 n=5+5) Template 138ms ± 1% 138ms ± 1% ~ (p=1.000 n=5+5) TimeParse 787ns ± 2% 785ns ± 1% ~ (p=0.540 n=5+5) TimeFormat 729ns ± 1% 726ns ± 1% ~ (p=0.151 n=5+5) Updates #38740 Change-Id: I06c604874acdc1e63e66452dadee5df053045222 Reviewed-on: https://go-review.googlesource.com/c/go/+/233097 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> |
||
|
9ed0fb42e3 |
cmd/compile: add indexed memory modification ops to amd64
name old time/op new time/op delta Modify-16 404ns ± 1% 365ns ± 1% -9.73% (p=0.000 n=10+10) ConstModify-16 407ns ± 0% 385ns ± 2% -5.56% (p=0.000 n=9+10) Seems to generally help generated code. Binary size change is in the noise. Change-Id: I57891bfaf0f7dfc5d143bb9f7ebafc7079d2614f Reviewed-on: https://go-review.googlesource.com/c/go/+/228098 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
||
|
882ec701d2 |
cmd/compile: add indexed load+op operations to amd64
name old time/op new time/op delta LoadAdd-16 545ns ± 0% 456ns ± 0% -16.31% (p=0.000 n=10+10) Update #36468 Change-Id: I84f390d55490648fa1f58cdbc24fd74c4f1bc8c1 Reviewed-on: https://go-review.googlesource.com/c/go/+/227960 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
||
|
b3e8a00060 |
cmd/compile: move duffcopy auxint calculation out of rewrite rules
Package amd64 is a more natural home for it. It also makes it easier to see how many bytes are being copied in ssa.html. Passes toolstash-check. Change-Id: I5ecf0f0f18e8db2faa2caf7a05028c310952bd94 Reviewed-on: https://go-review.googlesource.com/c/go/+/229703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
ea7126fe14 |
cmd/compile: use a Sym type instead of interface{} for symbolic offsets
Will help with strongly typed rewrite rules. Change-Id: Ifbf316a49f4081322b3b8f13bc962713437d9aba Reviewed-on: https://go-review.googlesource.com/c/go/+/227785 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> |
||
|
7ee8467b27 |
cmd/compile: use MOVBQZX for OpAMD64LoweredHasCPUFeature
In the commit message of CL 212360, I wrote: > This new intrinsic ... generates MOVB+TESTB+NE. > (It is possible that MOVBQZX+TESTQ+NE would be better.) I should have tested. MOVBQZX+TESTQ+NE does in fact appear to be better. For the benchmark in #36196, on my machine: name old time/op new time/op delta FMA-8 0.86ns ± 6% 0.70ns ± 5% -18.79% (p=0.000 n=98+97) NonFMA-8 0.61ns ± 5% 0.60ns ± 4% -0.74% (p=0.001 n=100+97) Interestingly, these are both considerably faster than the measurements I took a couple of months ago (1.4ns/2ns). It appears that CL 219131 (clearing VZEROUPPER in asyncPreempt) helped a lot. And FMA is now once again slower than NonFMA, although this change helps it regain some ground. Updates #15808 Updates #36351 Updates #36196 Change-Id: I8a326289a963b1939aaa7eaa2fab2ec536467c7d Reviewed-on: https://go-review.googlesource.com/c/go/+/227238 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
fff7509d47 |
cmd/compile: add intrinsic HasCPUFeature for checking cpu features
Before using some CPU instructions, we must check for their presence. We use global variables in the runtime package to record features. Prior to this CL, we issued a regular memory load for these features. The downside to this is that, because it is a regular memory load, it cannot be hoisted out of loops or otherwise reordered with other loads. This CL introduces a new intrinsic just for checking cpu features. It still ends up resulting in a memory load, but that memory load can now be floated to the entry block and rematerialized as needed. One downside is that the regular load could be combined with the comparison into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE. (It is possible that MOVBQZX+TESTQ+NE would be better.) This CL does only amd64. It is easy to extend to other architectures. For the benchmark in #36196, on my machine, this offers a mild speedup. name old time/op new time/op delta FMA-8 1.39ns ± 6% 1.29ns ± 9% -7.19% (p=0.000 n=97+96) NonFMA-8 2.03ns ±11% 2.04ns ±12% ~ (p=0.618 n=99+98) Updates #15808 Updates #36196 Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb Reviewed-on: https://go-review.googlesource.com/c/go/+/212360 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> |
||
|
bba88467f8 |
cmd/compile: add indexed-load CMP instructions
Things like CMPQ 4(AX)(BX*8), CX Fixes #37955 Change-Id: Icbed430f65c91a0e3f38a633d8321d79433ad8b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/224219 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
||
|
8114242359 |
cmd/compile, runtime: use more registers for amd64 write barrier calls
The compiler-inserted write barrier calls use a special ABI for speed and to minimize the binary size impact. runtime.gcWriteBarrier takes its args in DI and AX. This change adds gcWriteBarrier wrapper functions, varying only in the register used for the second argument. (Allowing variation in the first argument doesn't offer improvements, which is convenient, as it avoids quadratic API growth.) This reduces the number of register copies. The goals are reduced binary size via reduced register pressure/copies. One downside to this change is that when the write barrier is on, we may bounce through several different write barrier wrappers, which is bad for the instruction cache. Package runtime write barrier benchmarks for this change: name old time/op new time/op delta WriteBarrier-8 16.6ns ± 6% 15.6ns ± 6% -5.73% (p=0.000 n=97+99) BulkWriteBarrier-8 4.37ns ± 7% 4.22ns ± 8% -3.45% (p=0.000 n=96+99) However, I don't particularly trust these numbers. I ran runtime.BenchmarkWriteBarrier multiple times as I rebased this change, and noticed that the results have high variance depending on the parent change, perhaps due to aligment. This change was stress tested with GOGC=1 GODEBUG=gccheckmark=1 go test std. This change reduces binary sizes: file before after Δ % addr2line 4308720 4296688 -12032 -0.279% api 5965592 5945368 -20224 -0.339% asm 5148088 5025464 -122624 -2.382% buildid 2848760 2844904 -3856 -0.135% cgo 4828968 4812840 -16128 -0.334% compile 19754720 19529744 -224976 -1.139% cover 5256840 5236600 -20240 -0.385% dist 3670312 3658264 -12048 -0.328% doc 4669608 4657576 -12032 -0.258% fix 3377976 3365944 -12032 -0.356% link 6614888 6586472 -28416 -0.430% nm 4258368 4254528 -3840 -0.090% objdump 4656336 4644304 -12032 -0.258% pack 2295176 2295432 +256 +0.011% pprof 14762356 14709364 -52992 -0.359% test2json 2824456 2820600 -3856 -0.137% trace 11684404 11643700 -40704 -0.348% vet 8284760 8252248 -32512 -0.392% total 115210328 114580040 -630288 -0.547% This change improves compiler performance: name old time/op new time/op delta Template 208ms ± 3% 207ms ± 3% -0.40% (p=0.030 n=43+44) Unicode 80.2ms ± 3% 81.3ms ± 3% +1.25% (p=0.000 n=41+44) GoTypes 699ms ± 3% 694ms ± 2% -0.71% (p=0.016 n=42+37) Compiler 3.26s ± 2% 3.23s ± 2% -0.86% (p=0.000 n=43+45) SSA 6.97s ± 1% 6.93s ± 1% -0.63% (p=0.000 n=43+45) Flate 134ms ± 3% 133ms ± 2% ~ (p=0.139 n=45+42) GoParser 165ms ± 2% 164ms ± 1% -0.79% (p=0.000 n=45+40) Reflect 434ms ± 4% 435ms ± 4% ~ (p=0.937 n=44+44) Tar 181ms ± 2% 181ms ± 2% ~ (p=0.702 n=43+45) XML 244ms ± 2% 244ms ± 2% ~ (p=0.237 n=45+44) [Geo mean] 403ms 402ms -0.29% name old user-time/op new user-time/op delta Template 271ms ± 2% 268ms ± 1% -1.40% (p=0.000 n=42+42) Unicode 117ms ± 3% 116ms ± 5% ~ (p=0.066 n=45+45) GoTypes 948ms ± 2% 936ms ± 2% -1.30% (p=0.000 n=41+40) Compiler 4.26s ± 1% 4.21s ± 2% -1.25% (p=0.000 n=37+45) SSA 9.52s ± 2% 9.41s ± 1% -1.18% (p=0.000 n=44+45) Flate 167ms ± 2% 165ms ± 2% -1.15% (p=0.000 n=44+41) GoParser 201ms ± 2% 198ms ± 1% -1.40% (p=0.000 n=43+43) Reflect 563ms ± 8% 560ms ± 7% ~ (p=0.206 n=45+44) Tar 224ms ± 2% 222ms ± 2% -0.81% (p=0.000 n=45+45) XML 308ms ± 2% 304ms ± 1% -1.17% (p=0.000 n=42+43) [Geo mean] 525ms 519ms -1.08% name old alloc/op new alloc/op delta Template 36.3MB ± 0% 36.3MB ± 0% ~ (p=0.421 n=5+5) Unicode 28.4MB ± 0% 28.3MB ± 0% ~ (p=0.056 n=5+5) GoTypes 121MB ± 0% 121MB ± 0% -0.14% (p=0.008 n=5+5) Compiler 567MB ± 0% 567MB ± 0% -0.06% (p=0.016 n=4+5) SSA 1.26GB ± 0% 1.26GB ± 0% -0.07% (p=0.008 n=5+5) Flate 22.9MB ± 0% 22.8MB ± 0% ~ (p=0.310 n=5+5) GoParser 28.0MB ± 0% 27.9MB ± 0% -0.09% (p=0.008 n=5+5) Reflect 78.4MB ± 0% 78.4MB ± 0% -0.03% (p=0.008 n=5+5) Tar 34.2MB ± 0% 34.2MB ± 0% -0.05% (p=0.008 n=5+5) XML 44.4MB ± 0% 44.4MB ± 0% -0.04% (p=0.016 n=5+5) [Geo mean] 76.4MB 76.3MB -0.05% name old allocs/op new allocs/op delta Template 356k ± 0% 356k ± 0% -0.13% (p=0.008 n=5+5) Unicode 326k ± 0% 326k ± 0% -0.07% (p=0.008 n=5+5) GoTypes 1.24M ± 0% 1.24M ± 0% -0.24% (p=0.008 n=5+5) Compiler 5.30M ± 0% 5.28M ± 0% -0.34% (p=0.008 n=5+5) SSA 11.9M ± 0% 11.9M ± 0% -0.16% (p=0.008 n=5+5) Flate 226k ± 0% 225k ± 0% -0.12% (p=0.008 n=5+5) GoParser 287k ± 0% 286k ± 0% -0.29% (p=0.008 n=5+5) Reflect 930k ± 0% 929k ± 0% -0.05% (p=0.008 n=5+5) Tar 332k ± 0% 331k ± 0% -0.12% (p=0.008 n=5+5) XML 411k ± 0% 411k ± 0% -0.12% (p=0.008 n=5+5) [Geo mean] 771k 770k -0.16% For some packages, this change significantly reduces the size of executable text. Examples: file before after Δ % cmd/internal/obj/arm.s 68658 66855 -1803 -2.626% cmd/internal/obj/mips.s 57486 56272 -1214 -2.112% cmd/internal/obj/arm64.s 152107 147163 -4944 -3.250% cmd/internal/obj/ppc64.s 125544 120456 -5088 -4.053% cmd/vendor/golang.org/x/tools/go/cfg.s 31699 30742 -957 -3.019% Full listing: file before after Δ % container/ring.s 1890 1870 -20 -1.058% container/list.s 5366 5390 +24 +0.447% internal/cpu.s 3298 3295 -3 -0.091% internal/testlog.s 1507 1501 -6 -0.398% image/color.s 8281 8248 -33 -0.399% runtime.s 480970 480075 -895 -0.186% sync.s 16497 16408 -89 -0.539% internal/singleflight.s 2591 2577 -14 -0.540% math/rand.s 10456 10438 -18 -0.172% cmd/go/internal/par.s 2801 2790 -11 -0.393% internal/reflectlite.s 28477 28417 -60 -0.211% errors.s 2750 2736 -14 -0.509% internal/oserror.s 446 434 -12 -2.691% sort.s 17061 17046 -15 -0.088% io.s 17063 16999 -64 -0.375% vendor/golang.org/x/crypto/hkdf.s 1962 1936 -26 -1.325% text/tabwriter.s 9617 9574 -43 -0.447% hash/crc64.s 3414 3408 -6 -0.176% hash/crc32.s 6657 6651 -6 -0.090% bytes.s 31932 31863 -69 -0.216% strconv.s 53158 52799 -359 -0.675% strings.s 42829 42665 -164 -0.383% encoding/ascii85.s 4833 4791 -42 -0.869% vendor/golang.org/x/text/transform.s 16810 16724 -86 -0.512% path.s 6848 6845 -3 -0.044% encoding/base32.s 9658 9592 -66 -0.683% bufio.s 23051 22908 -143 -0.620% compress/bzip2.s 11773 11764 -9 -0.076% image.s 37565 37502 -63 -0.168% syscall.s 82359 82279 -80 -0.097% regexp/syntax.s 83573 82930 -643 -0.769% image/jpeg.s 36535 36490 -45 -0.123% regexp.s 64396 64214 -182 -0.283% time.s 82724 82622 -102 -0.123% plugin.s 6539 6536 -3 -0.046% context.s 10959 10865 -94 -0.858% internal/poll.s 24286 24270 -16 -0.066% reflect.s 168304 167927 -377 -0.224% internal/fmtsort.s 7416 7376 -40 -0.539% os.s 52465 51787 -678 -1.292% cmd/go/internal/lockedfile/internal/filelock.s 2326 2317 -9 -0.387% os/signal.s 4657 4648 -9 -0.193% runtime/debug.s 6040 5998 -42 -0.695% encoding/binary.s 30838 30801 -37 -0.120% vendor/golang.org/x/net/route.s 23694 23491 -203 -0.857% path/filepath.s 17895 17889 -6 -0.034% cmd/vendor/golang.org/x/sys/unix.s 78125 78109 -16 -0.020% io/ioutil.s 6999 6996 -3 -0.043% encoding/base64.s 12094 12007 -87 -0.719% crypto/cipher.s 20466 20372 -94 -0.459% cmd/go/internal/robustio.s 2672 2669 -3 -0.112% encoding/pem.s 9302 9286 -16 -0.172% internal/obscuretestdata.s 1719 1695 -24 -1.396% crypto/aes.s 11014 11002 -12 -0.109% os/exec.s 29388 29231 -157 -0.534% cmd/internal/browser.s 2266 2260 -6 -0.265% internal/goroot.s 4601 4592 -9 -0.196% vendor/golang.org/x/crypto/chacha20poly1305.s 8945 8942 -3 -0.034% cmd/vendor/golang.org/x/crypto/ssh/terminal.s 27226 27195 -31 -0.114% index/suffixarray.s 36431 36411 -20 -0.055% fmt.s 77017 76709 -308 -0.400% encoding/hex.s 6241 6154 -87 -1.394% compress/lzw.s 7133 7069 -64 -0.897% database/sql/driver.s 18888 18877 -11 -0.058% net/url.s 29838 29739 -99 -0.332% debug/plan9obj.s 8329 8279 -50 -0.600% encoding/csv.s 12986 12902 -84 -0.647% debug/gosym.s 25403 25330 -73 -0.287% compress/flate.s 51192 50970 -222 -0.434% vendor/golang.org/x/net/dns/dnsmessage.s 86769 86208 -561 -0.647% compress/gzip.s 9791 9758 -33 -0.337% compress/zlib.s 7310 7277 -33 -0.451% archive/zip.s 42356 42166 -190 -0.449% debug/dwarf.s 108259 107730 -529 -0.489% encoding/json.s 106378 105910 -468 -0.440% os/user.s 14751 14724 -27 -0.183% database/sql.s 99011 98404 -607 -0.613% log.s 9466 9423 -43 -0.454% debug/pe.s 31272 31182 -90 -0.288% debug/macho.s 32764 32608 -156 -0.476% encoding/gob.s 136976 136517 -459 -0.335% vendor/golang.org/x/text/unicode/bidi.s 27318 27276 -42 -0.154% archive/tar.s 71416 70975 -441 -0.618% vendor/golang.org/x/net/http2/hpack.s 23892 23848 -44 -0.184% vendor/golang.org/x/text/secure/bidirule.s 3354 3351 -3 -0.089% mime/quotedprintable.s 5960 5925 -35 -0.587% net/http/internal.s 5874 5853 -21 -0.358% math/big.s 184147 183692 -455 -0.247% debug/elf.s 63775 63567 -208 -0.326% mime.s 39802 39709 -93 -0.234% encoding/xml.s 111038 110713 -325 -0.293% crypto/dsa.s 6044 6029 -15 -0.248% go/token.s 12139 12077 -62 -0.511% crypto/rand.s 6889 6866 -23 -0.334% go/scanner.s 19030 19008 -22 -0.116% flag.s 22320 22236 -84 -0.376% vendor/golang.org/x/text/unicode/norm.s 66652 66391 -261 -0.392% crypto/rsa.s 31671 31650 -21 -0.066% crypto/elliptic.s 51553 51403 -150 -0.291% internal/xcoff.s 22950 22822 -128 -0.558% go/constant.s 43750 43689 -61 -0.139% encoding/asn1.s 57086 57035 -51 -0.089% runtime/trace.s 2609 2603 -6 -0.230% crypto/x509/pkix.s 10458 10471 +13 +0.124% image/gif.s 27544 27385 -159 -0.577% vendor/golang.org/x/net/idna.s 24558 24502 -56 -0.228% image/png.s 42775 42685 -90 -0.210% vendor/golang.org/x/crypto/cryptobyte.s 33616 33493 -123 -0.366% go/ast.s 80684 80449 -235 -0.291% net/internal/socktest.s 16571 16535 -36 -0.217% crypto/ecdsa.s 11948 11936 -12 -0.100% text/template/parse.s 95138 94002 -1136 -1.194% runtime/pprof.s 59702 59639 -63 -0.106% testing.s 68427 68088 -339 -0.495% internal/testenv.s 5620 5596 -24 -0.427% testing/internal/testdeps.s 3312 3294 -18 -0.543% internal/trace.s 78473 78239 -234 -0.298% testing/iotest.s 4968 4908 -60 -1.208% os/signal/internal/pty.s 3011 2990 -21 -0.697% testing/quick.s 12179 12125 -54 -0.443% cmd/internal/bio.s 9286 9274 -12 -0.129% cmd/internal/src.s 17684 17663 -21 -0.119% cmd/internal/goobj2.s 12588 12558 -30 -0.238% cmd/internal/objabi.s 16408 16390 -18 -0.110% go/printer.s 77417 77308 -109 -0.141% go/parser.s 80045 79113 -932 -1.164% go/format.s 5434 5419 -15 -0.276% cmd/internal/goobj.s 26146 25954 -192 -0.734% runtime/pprof/internal/profile.s 102518 102178 -340 -0.332% text/template.s 95343 94935 -408 -0.428% cmd/internal/dwarf.s 31718 31572 -146 -0.460% cmd/vendor/golang.org/x/arch/arm/armasm.s 45240 45151 -89 -0.197% internal/lazytemplate.s 1470 1457 -13 -0.884% cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s 37253 37220 -33 -0.089% cmd/asm/internal/flags.s 2593 2590 -3 -0.116% cmd/asm/internal/lex.s 25068 24921 -147 -0.586% cmd/internal/buildid.s 18536 18263 -273 -1.473% cmd/vendor/golang.org/x/arch/x86/x86asm.s 80209 80105 -104 -0.130% go/doc.s 75140 74585 -555 -0.739% cmd/internal/edit.s 3893 3899 +6 +0.154% html/template.s 89377 88809 -568 -0.636% cmd/vendor/golang.org/x/arch/arm64/arm64asm.s 117998 117824 -174 -0.147% cmd/internal/obj.s 115015 114290 -725 -0.630% go/build.s 69379 68862 -517 -0.745% cmd/internal/objfile.s 48106 47982 -124 -0.258% cmd/cover.s 46239 46113 -126 -0.272% cmd/addr2line.s 2845 2833 -12 -0.422% cmd/internal/obj/arm.s 68658 66855 -1803 -2.626% cmd/internal/obj/mips.s 57486 56272 -1214 -2.112% cmd/internal/obj/riscv.s 63834 63006 -828 -1.297% cmd/compile/internal/syntax.s 146582 145456 -1126 -0.768% cmd/internal/obj/wasm.s 44117 44066 -51 -0.116% cmd/cgo.s 242645 241653 -992 -0.409% cmd/internal/obj/arm64.s 152107 147163 -4944 -3.250% net.s 295972 292010 -3962 -1.339% go/types.s 321371 319432 -1939 -0.603% vendor/golang.org/x/net/http/httpproxy.s 9450 9423 -27 -0.286% net/textproto.s 19455 19406 -49 -0.252% cmd/internal/obj/ppc64.s 125544 120456 -5088 -4.053% go/internal/srcimporter.s 6475 6409 -66 -1.019% log/syslog.s 8017 7929 -88 -1.098% cmd/compile/internal/logopt.s 10183 10162 -21 -0.206% net/mail.s 24085 23948 -137 -0.569% mime/multipart.s 21527 21420 -107 -0.497% cmd/internal/obj/s390x.s 127610 127757 +147 +0.115% go/internal/gcimporter.s 34913 34548 -365 -1.045% vendor/golang.org/x/net/nettest.s 28103 28016 -87 -0.310% cmd/go/internal/cfg.s 9967 9916 -51 -0.512% cmd/api.s 39703 39603 -100 -0.252% go/internal/gccgoimporter.s 56470 56120 -350 -0.620% go/importer.s 2077 2056 -21 -1.011% cmd/compile/internal/types.s 48202 47282 -920 -1.909% cmd/go/internal/str.s 4341 4320 -21 -0.484% cmd/internal/obj/x86.s 89440 88625 -815 -0.911% cmd/go/internal/base.s 12667 12580 -87 -0.687% cmd/go/internal/cache.s 30754 30571 -183 -0.595% cmd/doc.s 62976 62755 -221 -0.351% cmd/go/internal/search.s 20114 19993 -121 -0.602% cmd/vendor/golang.org/x/xerrors.s 17923 17855 -68 -0.379% cmd/go/internal/lockedfile.s 16451 16415 -36 -0.219% cmd/vendor/golang.org/x/mod/sumdb/note.s 18200 18150 -50 -0.275% cmd/vendor/golang.org/x/mod/module.s 17869 17851 -18 -0.101% cmd/asm/internal/arch.s 37533 37482 -51 -0.136% cmd/fix.s 87728 87492 -236 -0.269% cmd/vendor/golang.org/x/mod/sumdb/tlog.s 36394 36367 -27 -0.074% cmd/vendor/golang.org/x/mod/sumdb/dirhash.s 4990 4963 -27 -0.541% cmd/go/internal/imports.s 16499 16469 -30 -0.182% cmd/vendor/golang.org/x/mod/zip.s 18816 18745 -71 -0.377% cmd/go/internal/cmdflag.s 5126 5123 -3 -0.059% cmd/internal/test2json.s 9540 9452 -88 -0.922% cmd/go/internal/tool.s 3629 3623 -6 -0.165% cmd/go/internal/version.s 11232 11220 -12 -0.107% cmd/go/internal/mvs.s 25383 25179 -204 -0.804% cmd/nm.s 5815 5803 -12 -0.206% cmd/dist.s 210146 209140 -1006 -0.479% cmd/asm/internal/asm.s 68655 68549 -106 -0.154% cmd/vendor/golang.org/x/mod/modfile.s 72974 72510 -464 -0.636% cmd/go/internal/load.s 107548 106861 -687 -0.639% cmd/link/internal/sym.s 18708 18581 -127 -0.679% cmd/asm.s 3367 3343 -24 -0.713% cmd/gofmt.s 30795 30698 -97 -0.315% cmd/link/internal/objfile.s 21828 21630 -198 -0.907% cmd/pack.s 14878 14869 -9 -0.060% cmd/vendor/github.com/google/pprof/internal/elfexec.s 6788 6782 -6 -0.088% cmd/test2json.s 1647 1641 -6 -0.364% cmd/link/internal/loader.s 48677 48483 -194 -0.399% cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s 16783 16773 -10 -0.060% cmd/link/internal/loadelf.s 35464 35126 -338 -0.953% cmd/link/internal/loadmacho.s 29438 29180 -258 -0.876% cmd/link/internal/loadpe.s 16440 16371 -69 -0.420% cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2106 2100 -6 -0.285% cmd/link/internal/loadxcoff.s 11711 11615 -96 -0.820% cmd/vendor/golang.org/x/tools/go/analysis/internal/facts.s 14954 14883 -71 -0.475% cmd/vendor/golang.org/x/tools/go/ast/inspector.s 5394 5374 -20 -0.371% cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s 37029 36822 -207 -0.559% cmd/vendor/golang.org/x/tools/go/analysis/passes/inspect.s 340 337 -3 -0.882% cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s 9919 9858 -61 -0.615% cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s 6705 6690 -15 -0.224% cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s 9783 9741 -42 -0.429% cmd/vendor/golang.org/x/tools/go/cfg.s 31699 30742 -957 -3.019% cmd/vendor/golang.org/x/tools/go/analysis/passes/ifaceassert.s 2768 2762 -6 -0.217% cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s 3031 2998 -33 -1.089% cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s 4382 4376 -6 -0.137% cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s 8654 8642 -12 -0.139% cmd/vendor/golang.org/x/tools/go/analysis/passes/stringintconv.s 3458 3446 -12 -0.347% cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s 8011 7995 -16 -0.200% cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s 6205 6193 -12 -0.193% cmd/vendor/golang.org/x/tools/go/ast/astutil.s 66183 65861 -322 -0.487% cmd/vendor/github.com/google/pprof/profile.s 150844 150261 -583 -0.386% cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8057 8054 -3 -0.037% cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s 3670 3667 -3 -0.082% cmd/vendor/github.com/google/pprof/internal/measurement.s 10464 10440 -24 -0.229% cmd/vendor/golang.org/x/tools/go/types/typeutil.s 12319 12274 -45 -0.365% cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s 13503 13342 -161 -1.192% cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s 5261 5218 -43 -0.817% cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s 1462 1459 -3 -0.205% cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s 9594 9582 -12 -0.125% cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s 34397 34338 -59 -0.172% cmd/vendor/github.com/google/pprof/internal/graph.s 53225 52936 -289 -0.543% cmd/vendor/github.com/ianlancetaylor/demangle.s 177450 175329 -2121 -1.195% crypto/x509.s 147892 147388 -504 -0.341% cmd/go/internal/work.s 306465 304950 -1515 -0.494% cmd/go/internal/run.s 4664 4657 -7 -0.150% crypto/tls.s 313130 311833 -1297 -0.414% net/http/httptrace.s 3979 3905 -74 -1.860% net/smtp.s 14413 14344 -69 -0.479% cmd/link/internal/ld.s 545343 542279 -3064 -0.562% cmd/link/internal/mips.s 6218 6215 -3 -0.048% cmd/link/internal/mips64.s 6108 6103 -5 -0.082% cmd/link/internal/amd64.s 18154 18112 -42 -0.231% cmd/link/internal/arm64.s 22527 22494 -33 -0.146% cmd/link/internal/arm.s 22574 22494 -80 -0.354% cmd/link/internal/s390x.s 20779 20746 -33 -0.159% cmd/link/internal/wasm.s 16531 16493 -38 -0.230% cmd/link/internal/x86.s 18906 18849 -57 -0.301% cmd/link/internal/ppc64.s 26856 26778 -78 -0.290% net/http.s 559101 556513 -2588 -0.463% net/http/cookiejar.s 15912 15885 -27 -0.170% expvar.s 9531 9525 -6 -0.063% net/http/httptest.s 16616 16475 -141 -0.849% net/http/cgi.s 23624 23458 -166 -0.703% cmd/go/internal/web.s 16546 16489 -57 -0.344% cmd/vendor/golang.org/x/mod/sumdb.s 33197 33117 -80 -0.241% net/http/fcgi.s 19266 19169 -97 -0.503% net/http/httputil.s 39875 39728 -147 -0.369% cmd/vendor/github.com/google/pprof/internal/symbolz.s 5888 5867 -21 -0.357% net/rpc.s 34154 34003 -151 -0.442% cmd/vendor/github.com/google/pprof/internal/transport.s 2746 2716 -30 -1.092% cmd/vendor/github.com/google/pprof/internal/binutils.s 35999 35875 -124 -0.344% net/rpc/jsonrpc.s 6637 6598 -39 -0.588% cmd/vendor/github.com/google/pprof/internal/symbolizer.s 11533 11458 -75 -0.650% cmd/go/internal/get.s 62921 62803 -118 -0.188% cmd/vendor/github.com/google/pprof/internal/report.s 80364 80058 -306 -0.381% cmd/go/internal/modfetch/codehost.s 89680 89066 -614 -0.685% cmd/trace.s 117171 116701 -470 -0.401% cmd/vendor/github.com/google/pprof/internal/driver.s 144268 143297 -971 -0.673% cmd/go/internal/modfetch.s 126299 125860 -439 -0.348% cmd/vendor/github.com/google/pprof/driver.s 9042 9000 -42 -0.464% cmd/go/internal/modconv.s 17947 17889 -58 -0.323% cmd/pprof.s 12399 12326 -73 -0.589% cmd/go/internal/modload.s 151182 150389 -793 -0.525% cmd/go/internal/generate.s 11738 11636 -102 -0.869% cmd/go/internal/help.s 6571 6531 -40 -0.609% cmd/go/internal/clean.s 11174 11142 -32 -0.286% cmd/go/internal/vet.s 7897 7867 -30 -0.380% cmd/go/internal/envcmd.s 22176 22095 -81 -0.365% cmd/go/internal/list.s 15216 15067 -149 -0.979% cmd/go/internal/modget.s 38698 38519 -179 -0.463% cmd/go/internal/modcmd.s 46674 46441 -233 -0.499% cmd/go/internal/test.s 64664 64456 -208 -0.322% cmd/go.s 6730 6703 -27 -0.401% cmd/compile/internal/ssa.s 3592565 3582500 -10065 -0.280% cmd/compile/internal/gc.s 1549123 1537123 -12000 -0.775% cmd/compile/internal/riscv64.s 14579 14483 -96 -0.658% cmd/compile/internal/mips.s 20578 20419 -159 -0.773% cmd/compile/internal/ppc64.s 25524 25359 -165 -0.646% cmd/compile/internal/mips64.s 19795 19636 -159 -0.803% cmd/compile/internal/wasm.s 13329 13290 -39 -0.293% cmd/compile/internal/s390x.s 28097 27892 -205 -0.730% cmd/compile/internal/arm.s 31489 31321 -168 -0.534% cmd/compile/internal/arm64.s 29803 29590 -213 -0.715% cmd/compile/internal/amd64.s 32961 33221 +260 +0.789% cmd/compile/internal/x86.s 31029 30878 -151 -0.487% total 18534966 18440341 -94625 -0.511% Change-Id: I830d37364f14f0297800adc42c99f60a74c51aca Reviewed-on: https://go-review.googlesource.com/c/go/+/226367 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
1b47fde55c |
cmd/compile: clarify division bounds check optimization
The name of the function should mention division. Eliminate double negatives from the comment describing it. Change-Id: Icef1a5139b3a91b86acb930af97938f5160f7342 Reviewed-on: https://go-review.googlesource.com/c/go/+/217001 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> |
||
|
40ebcfaa17 |
cmd/compile: enable nil check logging for other architectures.
Change-Id: If82ebd9cd6470863eb5de9e031e7905a66218857 Reviewed-on: https://go-review.googlesource.com/c/go/+/204159 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
cd53fddabb |
cmd/compile: add framework for logging optimizer (non)actions to LSP
This is intended to allow IDEs to note where the optimizer was not able to improve users' code. There may be other applications for this, for example in studying effectiveness of optimizer changes more quickly than running benchmarks, or in verifying that code changes did not accidentally disable optimizations in performance-critical code. Logging of nilcheck (bad) for amd64 is implemented as proof-of-concept. In general, the intent is that optimizations that didn't happen are what will be logged, because that is believed to be what IDE users want. Added flag -json=version,dest Check that version=0. (Future compilers will support a few recent versions, I hope that version is always <=3.) Dest is expected to be one of: /path (or \path in Windows) will create directory /path and fill it w/ json files file://path will create directory path, intended either for I:\dont\know\enough\about\windows\paths trustme_I_know_what_I_am_doing_probably_testing Not passing an absolute path name usually leads to json splattered all over source directories, or failure when those directories are not writeable. If you want a foot-gun, you have to ask for it. The JSON output is directed to subdirectories of dest, where each subdirectory is net/url.PathEscape of the package name, and each for each foo.go in the package, net/url.PathEscape(foo).json is created. The first line of foo.json contains version and context information, and subsequent lines contains LSP-conforming JSON describing the missing optimizations. Change-Id: Ib83176a53a8c177ee9081aefc5ae05604ccad8a0 Reviewed-on: https://go-review.googlesource.com/c/go/+/204338 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
97592b3c14 |
cmd/compile: intrinsics for runtime/internal/atomic.Store8
For #10958, #24543, but makes sense on its own. Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a Reviewed-on: https://go-review.googlesource.com/c/go/+/203284 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
7a6da218b1 |
cmd/compile: add fma intrinsic for amd64
To permit ssa-level optimization, this change introduces an amd64 intrinsic that generates the VFMADD231SD instruction for the fused-multiply-add operation on systems that support it. System support is detected via cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic ("Fma") to VFMADD231SD. The benchmark compares the software implementation (old) with the intrinsic (new). name old time/op new time/op delta Fma-4 27.2ns ± 1% 1.0ns ± 9% -96.48% (p=0.008 n=5+5) Updates #25819. Change-Id: I966655e5f96817a5d06dff5942418a3915b09584 Reviewed-on: https://go-review.googlesource.com/c/go/+/137156 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
11d7775c9f |
cmd/compile: remove some nacl SSA rules
Updates golang/go#30439 Change-Id: I7ef5301fbd650d26a37a1241ddf7ca1ccd58b89d Reviewed-on: https://go-review.googlesource.com/c/go/+/200941 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
||
|
9c2e7e8bed |
cmd/compile: allow multiple SSA block control values
Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
4a4e05b0b1 |
cmd/compile,runtime/internal/atomic: add Load8
Change-Id: Id52a5730cf9207ee7ccebac4ef12791dc5720e7c Reviewed-on: https://go-review.googlesource.com/c/go/+/172283 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
||
|
ad6c691542 |
cmd/compile: remove AUNDEF opcode
This opcode was only used to mark unreachable code for plive to use. plive now uses the SSA representation, so it knows locations are unreachable because they are ends of Exit blocks. It doesn't need these opcodes any more. These opcodes actually used space in the binary, 2 bytes per undef on x86 and more for other archs. Makes the amd64 go binary 0.2% smaller. Change-Id: I64c84c35db7c7949617a3a5830f09c8e5fcd2620 Reviewed-on: https://go-review.googlesource.com/c/go/+/171058 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
||
|
250b96a7bf |
cmd/compile: slightly optimize adding 128
'SUBQ $-0x80, r' is shorter to encode than 'ADDQ $0x80, r', and functionally equivalent. Use it instead. Shaves off a few bytes here and there: file before after Δ % compile 25935856 25927664 -8192 -0.032% nm 4251840 4247744 -4096 -0.096% Change-Id: Ia9e02ea38cbded6a52a613b92e3a914f878d931e Reviewed-on: https://go-review.googlesource.com/c/go/+/168344 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
||
|
2c423f063b |
cmd/compile,runtime: provide index information on bounds check failure
A few examples (for accessing a slice of length 3): s[-1] runtime error: index out of range [-1] s[3] runtime error: index out of range [3] with length 3 s[-1:0] runtime error: slice bounds out of range [-1:] s[3:0] runtime error: slice bounds out of range [3:0] s[3:-1] runtime error: slice bounds out of range [:-1] s[3:4] runtime error: slice bounds out of range [:4] with capacity 3 s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3 Note that in cases where there are multiple things wrong with the indexes (e.g. s[3:-1]), we report one of those errors kind of arbitrarily, currently the rightmost one. An exhaustive set of examples is in issue30116[u].out in the CL. The message text has the same prefix as the old message text. That leads to slightly awkward phrasing but hopefully minimizes the chance that code depending on the error text will break. Increases the size of the go binary by 0.5% (amd64). The panic functions take arguments in registers in order to keep the size of the compiled code as small as possible. Fixes #30116 Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd Reviewed-on: https://go-review.googlesource.com/c/go/+/161477 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
||
|
07b4b4a1a8 |
cmd/compile: add scale field to SSA Ops
Refactoring only. This makes it easier to add ops that do indexed memory loads/stores. Passes toolstash-check. Change-Id: I82df0d4154718577ec42106fa1bc76571bf65096 Reviewed-on: https://go-review.googlesource.com/c/go/+/166425 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
5f5ea3fd4d |
cmd/compile: optimize amd64's ADDQconstmodify/ADDLconstmodify
This CL optimize amd64's code: "ADDQ $-1, MEM_OP" -> "DECQ MEM_OP" "ADDL $-1, MEM_OP" -> "DECL MEM_OP" 1. The total size of pkg/linux_amd64 (excluding cmd/compile) decreases about 0.1KB. 2. The go1 benchmark shows little regression, excluding noise. name old time/op new time/op delta BinaryTree17-4 2.60s ± 5% 2.64s ± 3% +1.53% (p=0.000 n=38+39) Fannkuch11-4 2.37s ± 2% 2.38s ± 2% ~ (p=0.950 n=40+40) FmtFprintfEmpty-4 40.4ns ± 5% 40.5ns ± 5% ~ (p=0.711 n=40+40) FmtFprintfString-4 72.4ns ± 5% 72.3ns ± 3% ~ (p=0.485 n=40+40) FmtFprintfInt-4 79.7ns ± 3% 80.1ns ± 3% ~ (p=0.124 n=40+40) FmtFprintfIntInt-4 126ns ± 3% 127ns ± 3% +0.71% (p=0.027 n=40+40) FmtFprintfPrefixedInt-4 153ns ± 4% 153ns ± 2% ~ (p=0.604 n=40+40) FmtFprintfFloat-4 206ns ± 5% 210ns ± 5% +1.79% (p=0.002 n=40+40) FmtManyArgs-4 498ns ± 3% 496ns ± 3% ~ (p=0.099 n=40+40) GobDecode-4 6.48ms ± 6% 6.47ms ± 7% ~ (p=0.686 n=39+40) GobEncode-4 5.95ms ± 7% 5.96ms ± 6% ~ (p=0.670 n=40+34) Gzip-4 224ms ± 6% 223ms ± 5% ~ (p=0.143 n=40+40) Gunzip-4 36.5ms ± 4% 36.5ms ± 4% ~ (p=0.556 n=40+40) HTTPClientServer-4 60.7µs ± 2% 59.9µs ± 3% -1.20% (p=0.000 n=39+39) JSONEncode-4 9.03ms ± 4% 9.04ms ± 4% ~ (p=0.589 n=40+40) JSONDecode-4 49.4ms ± 4% 49.2ms ± 4% ~ (p=0.276 n=40+40) Mandelbrot200-4 3.80ms ± 4% 3.79ms ± 4% ~ (p=0.837 n=40+40) GoParse-4 3.15ms ± 5% 3.13ms ± 5% ~ (p=0.240 n=40+40) RegexpMatchEasy0_32-4 72.9ns ± 3% 72.0ns ± 8% -1.25% (p=0.003 n=40+40) RegexpMatchEasy0_1K-4 229ns ± 5% 230ns ± 4% ~ (p=0.318 n=40+40) RegexpMatchEasy1_32-4 66.9ns ± 3% 67.3ns ± 7% ~ (p=0.817 n=40+40) RegexpMatchEasy1_1K-4 371ns ± 5% 370ns ± 4% ~ (p=0.275 n=40+40) RegexpMatchMedium_32-4 106ns ± 4% 104ns ± 7% -2.28% (p=0.000 n=40+40) RegexpMatchMedium_1K-4 32.0µs ± 2% 31.4µs ± 3% -2.08% (p=0.000 n=40+40) RegexpMatchHard_32-4 1.54µs ± 7% 1.52µs ± 3% -1.80% (p=0.007 n=39+40) RegexpMatchHard_1K-4 45.8µs ± 4% 45.5µs ± 3% ~ (p=0.707 n=40+40) Revcomp-4 401ms ± 5% 401ms ± 6% ~ (p=0.935 n=40+40) Template-4 62.4ms ± 4% 61.2ms ± 3% -1.85% (p=0.000 n=40+40) TimeParse-4 315ns ± 2% 318ns ± 3% +1.10% (p=0.002 n=40+40) TimeFormat-4 297ns ± 3% 298ns ± 3% ~ (p=0.238 n=40+40) [Geo mean] 45.8µs 45.7µs -0.22% name old speed new speed delta GobDecode-4 119MB/s ± 6% 119MB/s ± 7% ~ (p=0.684 n=39+40) GobEncode-4 129MB/s ± 7% 128MB/s ± 6% ~ (p=0.413 n=40+34) Gzip-4 86.6MB/s ± 6% 87.0MB/s ± 6% ~ (p=0.145 n=40+40) Gunzip-4 532MB/s ± 4% 532MB/s ± 4% ~ (p=0.556 n=40+40) JSONEncode-4 215MB/s ± 4% 215MB/s ± 4% ~ (p=0.583 n=40+40) JSONDecode-4 39.3MB/s ± 4% 39.5MB/s ± 4% ~ (p=0.277 n=40+40) GoParse-4 18.4MB/s ± 5% 18.5MB/s ± 5% ~ (p=0.229 n=40+40) RegexpMatchEasy0_32-4 439MB/s ± 3% 445MB/s ± 8% +1.28% (p=0.003 n=40+40) RegexpMatchEasy0_1K-4 4.46GB/s ± 4% 4.45GB/s ± 4% ~ (p=0.343 n=40+40) RegexpMatchEasy1_32-4 479MB/s ± 3% 476MB/s ± 7% ~ (p=0.855 n=40+40) RegexpMatchEasy1_1K-4 2.76GB/s ± 5% 2.77GB/s ± 4% ~ (p=0.250 n=40+40) RegexpMatchMedium_32-4 9.36MB/s ± 4% 9.58MB/s ± 6% +2.31% (p=0.001 n=40+40) RegexpMatchMedium_1K-4 32.0MB/s ± 2% 32.7MB/s ± 3% +2.12% (p=0.000 n=40+40) RegexpMatchHard_32-4 20.7MB/s ± 7% 21.1MB/s ± 3% +1.95% (p=0.005 n=40+40) RegexpMatchHard_1K-4 22.4MB/s ± 4% 22.5MB/s ± 3% ~ (p=0.689 n=40+40) Revcomp-4 634MB/s ± 5% 634MB/s ± 6% ~ (p=0.935 n=40+40) Template-4 31.1MB/s ± 3% 31.7MB/s ± 3% +1.88% (p=0.000 n=40+40) [Geo mean] 129MB/s 130MB/s +0.62% Change-Id: I9d61ee810d900920c572cbe89e2f1626bfed12b7 Reviewed-on: https://go-review.googlesource.com/c/145209 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
dd789550a7 |
cmd/compile: intrinsify math/bits.Sub on amd64
name old time/op new time/op delta Sub-8 1.12ns ± 1% 1.17ns ± 1% +5.20% (p=0.008 n=5+5) Sub32-8 1.11ns ± 0% 1.11ns ± 0% ~ (all samples are equal) Sub64-8 1.12ns ± 0% 1.18ns ± 1% +5.00% (p=0.016 n=4+5) Sub64multiple-8 4.10ns ± 1% 0.86ns ± 1% -78.93% (p=0.008 n=5+5) Fixes #28273 Change-Id: Ibcb6f2fd32d987c3bcbae4f4cd9d335a3de98548 Reviewed-on: https://go-review.googlesource.com/c/144258 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
899f3a2892 |
cmd/compile: intrinsify math/bits.Add on amd64
name old time/op new time/op delta Add-8 1.11ns ± 0% 1.18ns ± 0% +6.31% (p=0.029 n=4+4) Add32-8 1.02ns ± 0% 1.02ns ± 1% ~ (p=0.333 n=4+5) Add64-8 1.11ns ± 1% 1.17ns ± 0% +5.79% (p=0.008 n=5+5) Add64multiple-8 4.35ns ± 1% 0.86ns ± 0% -80.22% (p=0.000 n=5+4) The individual ops are a bit slower (but still very fast). Using the ops in carry chains is very fast. Update #28273 Change-Id: Id975f76df2b930abf0e412911d327b6c5b1befe5 Reviewed-on: https://go-review.googlesource.com/c/144257 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
13d5cd7847 |
cmd/compile: use proved bounds to remove signed division fix-ups
prove is able to find 94 occurrences in std cmd where a divisor can't have the value -1. The change removes the extraneous fix-up code for these cases. Fixes #25239 Change-Id: Ic184de971f47cc57c702eb72805b8e291c14035d Reviewed-on: https://go-review.googlesource.com/c/130215 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
a1ca4893ff |
cmd/compile: add intrinsics for runtime/internal/math on 386 and amd64
Add generic, 386 and amd64 specific ops and SSA rules for multiplication with overflow and branching based on overflow flags. Use these to intrinsify runtime/internal/math.MulUinptr. On amd64 mul, overflow := math.MulUintptr(a, b) if overflow { is lowered to two instructions: MULQ SI JO 0x10ee35c No codegen tests as codegen can not currently test unexported internal runtime functions. amd64: name old time/op new time/op delta MulUintptr/small 1.16ns ± 5% 0.88ns ± 6% -24.36% (p=0.000 n=19+20) MulUintptr/large 10.7ns ± 1% 1.1ns ± 1% -89.28% (p=0.000 n=17+19) Change-Id: If60739a86f820e5044d677276c21df90d3c7a86a Reviewed-on: https://go-review.googlesource.com/c/141820 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
ccc337d8ee |
cmd/compile: combine similar code in amd64's assembly generator
BSFQ/BSRQ/BSFL/BSRL/SQRTSD have similar logic in amd64's assembly generator. This CL combines them together while does not impact generated amd64 code. The total size of pkg/linux_amd64/cmd/compile/internal decreases about 1.8KB. Change-Id: I5f3210c5178c20ac9108877c69f17234baf5b6b7 Reviewed-on: https://go-review.googlesource.com/c/140438 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
2294e3ebd3 |
cmd/compile: combine similar code in amd64's assembly generator
This CL combines similar code in amd64's assembly generator. The total size of pkg/linux_amd64/cmd/compile/ decreases about 4.5KB, while the generated amd64 code is not affected. Change-Id: I4cdbdd22bde8857aafdc29b47fa100a906fa1598 Reviewed-on: https://go-review.googlesource.com/c/140298 Run-TryBot: Ben Shi <powerman1st@163.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
c6bf9a8109 |
cmd/compile: optimize AMD64's bit wise operation
Currently "arr[idx] |= 0x80" is compiled to MOVLload->BTSL->MOVLstore. And this CL optimizes it to a single BTSLconstmodify. Other bit wise operations with a direct memory operand are also implemented. 1. The size of the executable bin/go decreases about 4KB, and the total size of pkg/linux_amd64 (excluding cmd/compile) decreases about 0.6KB. 2. There a little improvement in the go1 benchmark test (excluding noise). name old time/op new time/op delta BinaryTree17-4 2.66s ± 4% 2.66s ± 3% ~ (p=0.596 n=49+49) Fannkuch11-4 2.38s ± 2% 2.32s ± 2% -2.69% (p=0.000 n=50+50) FmtFprintfEmpty-4 42.7ns ± 4% 43.2ns ± 7% +1.31% (p=0.009 n=50+50) FmtFprintfString-4 71.0ns ± 5% 72.0ns ± 3% +1.33% (p=0.000 n=50+50) FmtFprintfInt-4 80.7ns ± 4% 80.6ns ± 3% ~ (p=0.931 n=50+50) FmtFprintfIntInt-4 125ns ± 3% 126ns ± 4% ~ (p=0.051 n=50+50) FmtFprintfPrefixedInt-4 158ns ± 1% 142ns ± 3% -9.84% (p=0.000 n=36+50) FmtFprintfFloat-4 215ns ± 4% 212ns ± 4% -1.23% (p=0.002 n=50+50) FmtManyArgs-4 519ns ± 3% 510ns ± 3% -1.77% (p=0.000 n=50+50) GobDecode-4 6.49ms ± 6% 6.52ms ± 5% ~ (p=0.866 n=50+50) GobEncode-4 5.93ms ± 8% 6.01ms ± 7% ~ (p=0.076 n=50+50) Gzip-4 222ms ± 4% 224ms ± 8% +0.80% (p=0.001 n=50+50) Gunzip-4 36.6ms ± 5% 36.4ms ± 4% ~ (p=0.093 n=50+50) HTTPClientServer-4 59.1µs ± 1% 58.9µs ± 2% -0.24% (p=0.039 n=49+48) JSONEncode-4 9.23ms ± 4% 9.21ms ± 5% ~ (p=0.244 n=50+50) JSONDecode-4 48.8ms ± 4% 48.7ms ± 4% ~ (p=0.653 n=50+50) Mandelbrot200-4 3.81ms ± 4% 3.80ms ± 3% ~ (p=0.834 n=50+50) GoParse-4 3.20ms ± 5% 3.19ms ± 5% ~ (p=0.494 n=50+50) RegexpMatchEasy0_32-4 78.1ns ± 2% 77.4ns ± 3% -0.86% (p=0.005 n=50+50) RegexpMatchEasy0_1K-4 233ns ± 3% 233ns ± 3% ~ (p=0.074 n=50+50) RegexpMatchEasy1_32-4 74.2ns ± 3% 73.4ns ± 3% -1.06% (p=0.000 n=50+50) RegexpMatchEasy1_1K-4 369ns ± 2% 364ns ± 4% -1.41% (p=0.000 n=36+50) RegexpMatchMedium_32-4 109ns ± 4% 107ns ± 3% -2.06% (p=0.001 n=50+50) RegexpMatchMedium_1K-4 31.5µs ± 3% 30.8µs ± 3% -2.20% (p=0.000 n=50+50) RegexpMatchHard_32-4 1.57µs ± 3% 1.56µs ± 2% -0.57% (p=0.016 n=50+50) RegexpMatchHard_1K-4 47.4µs ± 4% 47.0µs ± 3% -0.82% (p=0.008 n=50+50) Revcomp-4 414ms ± 7% 412ms ± 7% ~ (p=0.285 n=50+50) Template-4 64.3ms ± 4% 62.7ms ± 3% -2.44% (p=0.000 n=50+50) TimeParse-4 316ns ± 3% 313ns ± 3% ~ (p=0.122 n=50+50) TimeFormat-4 291ns ± 3% 293ns ± 3% +0.80% (p=0.001 n=50+50) [Geo mean] 46.5µs 46.2µs -0.81% name old speed new speed delta GobDecode-4 118MB/s ± 6% 118MB/s ± 5% ~ (p=0.863 n=50+50) GobEncode-4 130MB/s ± 9% 128MB/s ± 8% ~ (p=0.076 n=50+50) Gzip-4 87.4MB/s ± 4% 86.8MB/s ± 7% -0.78% (p=0.002 n=50+50) Gunzip-4 531MB/s ± 5% 533MB/s ± 4% ~ (p=0.093 n=50+50) JSONEncode-4 210MB/s ± 4% 211MB/s ± 5% ~ (p=0.247 n=50+50) JSONDecode-4 39.8MB/s ± 4% 39.9MB/s ± 4% ~ (p=0.654 n=50+50) GoParse-4 18.1MB/s ± 5% 18.2MB/s ± 5% ~ (p=0.493 n=50+50) RegexpMatchEasy0_32-4 410MB/s ± 2% 413MB/s ± 3% +0.86% (p=0.004 n=50+50) RegexpMatchEasy0_1K-4 4.39GB/s ± 3% 4.38GB/s ± 3% ~ (p=0.063 n=50+50) RegexpMatchEasy1_32-4 432MB/s ± 3% 436MB/s ± 3% +1.07% (p=0.000 n=50+50) RegexpMatchEasy1_1K-4 2.77GB/s ± 2% 2.81GB/s ± 4% +1.46% (p=0.000 n=36+50) RegexpMatchMedium_32-4 9.16MB/s ± 3% 9.35MB/s ± 4% +2.09% (p=0.001 n=50+50) RegexpMatchMedium_1K-4 32.5MB/s ± 3% 33.2MB/s ± 3% +2.25% (p=0.000 n=50+50) RegexpMatchHard_32-4 20.4MB/s ± 3% 20.5MB/s ± 2% +0.56% (p=0.017 n=50+50) RegexpMatchHard_1K-4 21.6MB/s ± 4% 21.8MB/s ± 3% +0.83% (p=0.008 n=50+50) Revcomp-4 613MB/s ± 4% 618MB/s ± 7% ~ (p=0.152 n=48+50) Template-4 30.2MB/s ± 4% 30.9MB/s ± 3% +2.49% (p=0.000 n=50+50) [Geo mean] 127MB/s 128MB/s +0.64% Change-Id: If405198283855d75697f66cf894b2bef458f620e Reviewed-on: https://go-review.googlesource.com/135422 Reviewed-by: Keith Randall <khr@golang.org> |
||
|
d17ac29158 |
cmd/compile: simplify AMD64's assembly generator
AMD64's ADDQconstmodify/ADDLconstmodify have similar logic with other constmodify like operators, but seperated case statements. This CL simplify them with a fallthrough. Change-Id: Ia73ffeaddc5080182f68c06c9d9b48fe32a14e38 Reviewed-on: https://go-review.googlesource.com/135855 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
4545fea4fc |
cmd/compile/internal/amd64: simplify assembly generator
Merge two case-statements together, since they have similar logic. 1. That makes the assembly generator more clear. 2. The total size of cmd/compile decreases about 0.8KB. Change-Id: I0144a07152202ee7b21e323bcd5dea80a351a6e3 Reviewed-on: https://go-review.googlesource.com/134215 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
3bc34385fa |
cmd/compile: introduce more read-modify-write operations for amd64
Add suport of read-modify-write for AND/SUB/AND/OR/XOR on amd64. 1. The total size of pkg/linux_amd64 decreases about 4KB, excluding cmd/compile. 2. The go1 benchmark shows a little improvement, excluding noise. name old time/op new time/op delta BinaryTree17-4 2.63s ± 3% 2.65s ± 4% +1.01% (p=0.037 n=35+35) Fannkuch11-4 2.33s ± 2% 2.39s ± 2% +2.49% (p=0.000 n=35+35) FmtFprintfEmpty-4 45.4ns ± 5% 40.8ns ± 6% -10.09% (p=0.000 n=35+35) FmtFprintfString-4 73.3ns ± 4% 70.9ns ± 3% -3.23% (p=0.000 n=30+35) FmtFprintfInt-4 79.9ns ± 4% 79.5ns ± 3% ~ (p=0.736 n=34+35) FmtFprintfIntInt-4 126ns ± 4% 125ns ± 4% ~ (p=0.083 n=35+35) FmtFprintfPrefixedInt-4 152ns ± 6% 152ns ± 3% ~ (p=0.855 n=34+35) FmtFprintfFloat-4 215ns ± 4% 213ns ± 4% ~ (p=0.066 n=35+35) FmtManyArgs-4 522ns ± 3% 506ns ± 3% -3.15% (p=0.000 n=35+35) GobDecode-4 6.45ms ± 8% 6.51ms ± 7% +0.96% (p=0.026 n=35+35) GobEncode-4 6.10ms ± 6% 6.02ms ± 8% ~ (p=0.160 n=35+35) Gzip-4 228ms ± 3% 221ms ± 3% -2.92% (p=0.000 n=35+35) Gunzip-4 37.5ms ± 4% 37.2ms ± 3% -0.78% (p=0.036 n=35+35) HTTPClientServer-4 58.7µs ± 2% 59.2µs ± 1% +0.80% (p=0.000 n=33+33) JSONEncode-4 12.0ms ± 3% 12.2ms ± 3% +1.84% (p=0.008 n=35+35) JSONDecode-4 57.0ms ± 4% 56.6ms ± 3% ~ (p=0.320 n=35+35) Mandelbrot200-4 3.82ms ± 3% 3.79ms ± 3% ~ (p=0.074 n=35+35) GoParse-4 3.21ms ± 5% 3.24ms ± 4% ~ (p=0.119 n=35+35) RegexpMatchEasy0_32-4 76.3ns ± 4% 75.4ns ± 4% -1.14% (p=0.014 n=34+33) RegexpMatchEasy0_1K-4 251ns ± 4% 254ns ± 3% +1.28% (p=0.016 n=35+35) RegexpMatchEasy1_32-4 69.6ns ± 3% 70.1ns ± 3% +0.82% (p=0.005 n=35+35) RegexpMatchEasy1_1K-4 367ns ± 4% 376ns ± 4% +2.47% (p=0.000 n=35+35) RegexpMatchMedium_32-4 108ns ± 5% 104ns ± 4% -3.18% (p=0.000 n=35+35) RegexpMatchMedium_1K-4 33.8µs ± 3% 32.7µs ± 3% -3.27% (p=0.000 n=35+35) RegexpMatchHard_32-4 1.55µs ± 3% 1.52µs ± 3% -1.64% (p=0.000 n=35+35) RegexpMatchHard_1K-4 46.6µs ± 3% 46.6µs ± 4% ~ (p=0.149 n=35+35) Revcomp-4 416ms ± 7% 412ms ± 6% -0.95% (p=0.033 n=33+35) Template-4 64.3ms ± 3% 62.4ms ± 7% -2.94% (p=0.000 n=35+35) TimeParse-4 320ns ± 2% 322ns ± 3% ~ (p=0.589 n=35+35) TimeFormat-4 300ns ± 3% 300ns ± 3% ~ (p=0.597 n=35+35) [Geo mean] 47.4µs 47.0µs -0.86% name old speed new speed delta GobDecode-4 119MB/s ± 7% 118MB/s ± 7% -0.96% (p=0.027 n=35+35) GobEncode-4 126MB/s ± 7% 127MB/s ± 6% ~ (p=0.157 n=34+34) Gzip-4 85.3MB/s ± 3% 87.9MB/s ± 3% +3.02% (p=0.000 n=35+35) Gunzip-4 518MB/s ± 4% 522MB/s ± 3% +0.79% (p=0.037 n=35+35) JSONEncode-4 162MB/s ± 3% 159MB/s ± 3% -1.81% (p=0.009 n=35+35) JSONDecode-4 34.1MB/s ± 4% 34.3MB/s ± 3% ~ (p=0.318 n=35+35) GoParse-4 18.0MB/s ± 5% 17.9MB/s ± 4% ~ (p=0.117 n=35+35) RegexpMatchEasy0_32-4 419MB/s ± 3% 425MB/s ± 4% +1.46% (p=0.003 n=32+33) RegexpMatchEasy0_1K-4 4.07GB/s ± 4% 4.02GB/s ± 3% -1.28% (p=0.014 n=35+35) RegexpMatchEasy1_32-4 460MB/s ± 3% 456MB/s ± 4% -0.82% (p=0.004 n=35+35) RegexpMatchEasy1_1K-4 2.79GB/s ± 4% 2.72GB/s ± 4% -2.39% (p=0.000 n=35+35) RegexpMatchMedium_32-4 9.23MB/s ± 4% 9.53MB/s ± 4% +3.16% (p=0.000 n=35+35) RegexpMatchMedium_1K-4 30.3MB/s ± 3% 31.3MB/s ± 3% +3.38% (p=0.000 n=35+35) RegexpMatchHard_32-4 20.7MB/s ± 3% 21.0MB/s ± 3% +1.67% (p=0.000 n=35+35) RegexpMatchHard_1K-4 22.0MB/s ± 3% 21.9MB/s ± 4% ~ (p=0.277 n=35+33) Revcomp-4 612MB/s ± 7% 618MB/s ± 6% +0.96% (p=0.034 n=33+35) Template-4 30.2MB/s ± 3% 31.1MB/s ± 6% +3.05% (p=0.000 n=35+35) [Geo mean] 123MB/s 124MB/s +0.64% Change-Id: Ia025da272e07d0069413824bfff3471b106d6280 Reviewed-on: https://go-review.googlesource.com/121535 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
705f3c74e6 |
cmd/compile: optimize AMD64 with DIVSSload and DIVSDload
DIVSSload & DIVSDload directly operate on a memory operand. And binary size can be reduced by them, while the performance is not affected. The total size of pkg/linux_amd64 (excluding cmd/compile) decreases about 6KB. There is little regression in the go1 benchmark test (excluding noise). name old time/op new time/op delta BinaryTree17-4 2.63s ± 4% 2.62s ± 4% ~ (p=0.809 n=30+30) Fannkuch11-4 2.40s ± 2% 2.40s ± 2% ~ (p=0.109 n=30+30) FmtFprintfEmpty-4 43.1ns ± 4% 43.2ns ± 9% ~ (p=0.168 n=30+30) FmtFprintfString-4 73.6ns ± 4% 74.1ns ± 4% ~ (p=0.069 n=30+30) FmtFprintfInt-4 81.0ns ± 3% 81.4ns ± 5% ~ (p=0.350 n=30+30) FmtFprintfIntInt-4 127ns ± 4% 129ns ± 4% +0.99% (p=0.021 n=30+30) FmtFprintfPrefixedInt-4 156ns ± 4% 155ns ± 4% ~ (p=0.415 n=30+30) FmtFprintfFloat-4 219ns ± 4% 218ns ± 4% ~ (p=0.071 n=30+30) FmtManyArgs-4 522ns ± 3% 518ns ± 3% -0.68% (p=0.034 n=30+30) GobDecode-4 6.49ms ± 6% 6.52ms ± 6% ~ (p=0.832 n=30+30) GobEncode-4 6.10ms ± 9% 6.14ms ± 7% ~ (p=0.485 n=30+30) Gzip-4 227ms ± 1% 224ms ± 4% ~ (p=0.484 n=24+30) Gunzip-4 37.2ms ± 3% 36.8ms ± 4% ~ (p=0.889 n=30+30) HTTPClientServer-4 58.9µs ± 1% 58.7µs ± 2% -0.42% (p=0.003 n=28+28) JSONEncode-4 12.0ms ± 3% 12.0ms ± 4% ~ (p=0.523 n=30+30) JSONDecode-4 54.6ms ± 4% 54.5ms ± 4% ~ (p=0.708 n=30+30) Mandelbrot200-4 3.78ms ± 4% 3.81ms ± 3% +0.99% (p=0.016 n=30+30) GoParse-4 3.20ms ± 4% 3.20ms ± 5% ~ (p=0.994 n=30+30) RegexpMatchEasy0_32-4 77.0ns ± 4% 75.9ns ± 3% -1.39% (p=0.006 n=29+30) RegexpMatchEasy0_1K-4 255ns ± 4% 253ns ± 4% ~ (p=0.091 n=30+30) RegexpMatchEasy1_32-4 69.7ns ± 3% 70.3ns ± 4% ~ (p=0.120 n=30+30) RegexpMatchEasy1_1K-4 373ns ± 2% 378ns ± 3% +1.43% (p=0.000 n=21+26) RegexpMatchMedium_32-4 107ns ± 2% 108ns ± 4% +1.50% (p=0.012 n=22+30) RegexpMatchMedium_1K-4 34.0µs ± 1% 34.3µs ± 3% +1.08% (p=0.008 n=24+30) RegexpMatchHard_32-4 1.53µs ± 3% 1.54µs ± 3% ~ (p=0.234 n=30+30) RegexpMatchHard_1K-4 46.7µs ± 4% 47.0µs ± 4% ~ (p=0.420 n=30+30) Revcomp-4 411ms ± 7% 415ms ± 6% ~ (p=0.059 n=30+30) Template-4 65.5ms ± 5% 66.9ms ± 4% +2.21% (p=0.001 n=30+30) TimeParse-4 317ns ± 3% 311ns ± 3% -1.97% (p=0.000 n=30+30) TimeFormat-4 293ns ± 3% 294ns ± 3% ~ (p=0.243 n=30+30) [Geo mean] 47.4µs 47.5µs +0.17% name old speed new speed delta GobDecode-4 118MB/s ± 5% 118MB/s ± 6% ~ (p=0.832 n=30+30) GobEncode-4 125MB/s ± 7% 125MB/s ± 7% ~ (p=0.625 n=29+30) Gzip-4 85.3MB/s ± 1% 86.6MB/s ± 4% ~ (p=0.486 n=24+30) Gunzip-4 522MB/s ± 3% 527MB/s ± 4% ~ (p=0.889 n=30+30) JSONEncode-4 162MB/s ± 3% 162MB/s ± 4% ~ (p=0.520 n=30+30) JSONDecode-4 35.5MB/s ± 4% 35.6MB/s ± 4% ~ (p=0.701 n=30+30) GoParse-4 18.1MB/s ± 4% 18.1MB/s ± 4% ~ (p=0.891 n=29+30) RegexpMatchEasy0_32-4 416MB/s ± 4% 422MB/s ± 3% +1.43% (p=0.005 n=29+30) RegexpMatchEasy0_1K-4 4.01GB/s ± 4% 4.04GB/s ± 4% ~ (p=0.091 n=30+30) RegexpMatchEasy1_32-4 460MB/s ± 3% 456MB/s ± 5% ~ (p=0.123 n=30+30) RegexpMatchEasy1_1K-4 2.74GB/s ± 2% 2.70GB/s ± 3% -1.33% (p=0.000 n=22+26) RegexpMatchMedium_32-4 9.39MB/s ± 3% 9.19MB/s ± 4% -2.06% (p=0.001 n=28+30) RegexpMatchMedium_1K-4 30.1MB/s ± 1% 29.8MB/s ± 3% -1.04% (p=0.008 n=24+30) RegexpMatchHard_32-4 20.9MB/s ± 3% 20.8MB/s ± 3% ~ (p=0.234 n=30+30) RegexpMatchHard_1K-4 21.9MB/s ± 4% 21.8MB/s ± 4% ~ (p=0.420 n=30+30) Revcomp-4 619MB/s ± 7% 612MB/s ± 7% ~ (p=0.059 n=30+30) Template-4 29.6MB/s ± 4% 29.0MB/s ± 4% -2.16% (p=0.002 n=30+30) [Geo mean] 123MB/s 123MB/s -0.33% Change-Id: Ia59e077feae4f2824df79059daea4d0f678e3e4c Reviewed-on: https://go-review.googlesource.com/120275 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> |
||
|
b75c5c5992 |
cmd/compile: optimize AMD64 with more read-modify-write operations
6 more operations which do read-modify-write with a constant source operand are added. 1. The total size of pkg/linux_amd64 decreases about 3KB, excluding cmd/compile. 2. The go1 benckmark shows a slight improvement. name old time/op new time/op delta BinaryTree17-4 2.61s ± 4% 2.67s ± 2% +2.26% (p=0.000 n=30+29) Fannkuch11-4 2.39s ± 2% 2.32s ± 2% -2.67% (p=0.000 n=30+30) FmtFprintfEmpty-4 44.0ns ± 4% 41.7ns ± 4% -5.15% (p=0.000 n=30+30) FmtFprintfString-4 74.2ns ± 4% 72.3ns ± 4% -2.59% (p=0.000 n=30+30) FmtFprintfInt-4 81.7ns ± 3% 78.8ns ± 4% -3.54% (p=0.000 n=27+30) FmtFprintfIntInt-4 130ns ± 4% 124ns ± 5% -4.60% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 154ns ± 3% 152ns ± 3% -1.13% (p=0.012 n=30+30) FmtFprintfFloat-4 215ns ± 4% 212ns ± 5% -1.56% (p=0.002 n=30+30) FmtManyArgs-4 522ns ± 3% 512ns ± 3% -1.84% (p=0.001 n=30+30) GobDecode-4 6.42ms ± 5% 6.49ms ± 7% ~ (p=0.070 n=30+30) GobEncode-4 6.07ms ± 8% 5.98ms ± 8% ~ (p=0.150 n=30+30) Gzip-4 236ms ± 4% 223ms ± 4% -5.57% (p=0.000 n=30+30) Gunzip-4 37.4ms ± 3% 36.7ms ± 4% -2.03% (p=0.000 n=30+30) HTTPClientServer-4 58.7µs ± 1% 58.5µs ± 2% -0.37% (p=0.018 n=30+29) JSONEncode-4 12.0ms ± 4% 12.1ms ± 3% ~ (p=0.112 n=30+30) JSONDecode-4 54.5ms ± 3% 55.5ms ± 4% +1.80% (p=0.006 n=30+30) Mandelbrot200-4 3.78ms ± 4% 3.78ms ± 4% ~ (p=0.173 n=30+30) GoParse-4 3.16ms ± 5% 3.22ms ± 5% +1.75% (p=0.010 n=30+30) RegexpMatchEasy0_32-4 76.6ns ± 1% 75.9ns ± 3% ~ (p=0.672 n=25+30) RegexpMatchEasy0_1K-4 252ns ± 3% 253ns ± 3% +0.57% (p=0.027 n=30+30) RegexpMatchEasy1_32-4 69.8ns ± 4% 70.2ns ± 6% ~ (p=0.539 n=30+30) RegexpMatchEasy1_1K-4 374ns ± 3% 373ns ± 5% ~ (p=0.263 n=30+30) RegexpMatchMedium_32-4 107ns ± 4% 109ns ± 3% ~ (p=0.067 n=30+30) RegexpMatchMedium_1K-4 33.9µs ± 5% 34.1µs ± 4% ~ (p=0.297 n=30+30) RegexpMatchHard_32-4 1.54µs ± 3% 1.56µs ± 4% +1.43% (p=0.002 n=30+30) RegexpMatchHard_1K-4 46.6µs ± 3% 47.0µs ± 3% ~ (p=0.055 n=30+30) Revcomp-4 411ms ± 6% 407ms ± 6% ~ (p=0.219 n=30+30) Template-4 66.8ms ± 3% 64.8ms ± 5% -3.01% (p=0.000 n=30+30) TimeParse-4 312ns ± 2% 319ns ± 3% +2.50% (p=0.000 n=30+30) TimeFormat-4 296ns ± 5% 299ns ± 3% +0.93% (p=0.005 n=30+30) [Geo mean] 47.5µs 47.1µs -0.75% name old speed new speed delta GobDecode-4 120MB/s ± 5% 118MB/s ± 6% ~ (p=0.072 n=30+30) GobEncode-4 127MB/s ± 8% 129MB/s ± 8% ~ (p=0.150 n=30+30) Gzip-4 82.1MB/s ± 4% 87.0MB/s ± 4% +5.90% (p=0.000 n=30+30) Gunzip-4 519MB/s ± 4% 529MB/s ± 4% +2.07% (p=0.001 n=30+30) JSONEncode-4 162MB/s ± 4% 161MB/s ± 3% ~ (p=0.110 n=30+30) JSONDecode-4 35.6MB/s ± 3% 35.0MB/s ± 4% -1.77% (p=0.007 n=30+30) GoParse-4 18.3MB/s ± 4% 18.0MB/s ± 4% -1.72% (p=0.009 n=30+30) RegexpMatchEasy0_32-4 418MB/s ± 1% 422MB/s ± 3% ~ (p=0.645 n=25+30) RegexpMatchEasy0_1K-4 4.06GB/s ± 3% 4.04GB/s ± 3% -0.57% (p=0.033 n=30+30) RegexpMatchEasy1_32-4 459MB/s ± 4% 456MB/s ± 6% ~ (p=0.530 n=30+30) RegexpMatchEasy1_1K-4 2.73GB/s ± 3% 2.75GB/s ± 5% ~ (p=0.279 n=30+30) RegexpMatchMedium_32-4 9.28MB/s ± 5% 9.18MB/s ± 4% ~ (p=0.086 n=30+30) RegexpMatchMedium_1K-4 30.2MB/s ± 4% 30.0MB/s ± 4% ~ (p=0.300 n=30+30) RegexpMatchHard_32-4 20.8MB/s ± 3% 20.5MB/s ± 4% -1.41% (p=0.002 n=30+30) RegexpMatchHard_1K-4 22.0MB/s ± 3% 21.8MB/s ± 3% ~ (p=0.051 n=30+30) Revcomp-4 619MB/s ± 7% 625MB/s ± 7% ~ (p=0.219 n=30+30) Template-4 29.0MB/s ± 3% 29.9MB/s ± 4% +3.11% (p=0.000 n=30+30) [Geo mean] 123MB/s 123MB/s +0.28% Change-Id: I850652cfd53329c1af804b7f57f4393d8097bb0d Reviewed-on: https://go-review.googlesource.com/121135 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> |
||
|
b40db51438 |
cmd/compile: split slow 3 operand LEA instructions into two LEAs
go tool objdump ../bin/go | grep "\.go\:" | grep -c "LEA.*0x.*[(].*[(].*" Before: 1012 After: 20 Updates #21735 Benchmarks thanks to drchase@google.com Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz benchstat -geomean *.stdout | grep -v pkg: name old time/op new time/op delta FastTest2KB-12 131.0ns ± 0% 131.0ns ± 0% ~ (all equal) BaseTest2KB-12 601.3ns ± 0% 601.3ns ± 0% ~ (p=0.942 n=45+41) Encoding4KBVerySparse-12 15.38µs ± 1% 15.41µs ± 0% +0.24% (p=0.000 n=44+43) Join_8-12 2.117s ± 2% 2.128s ± 2% +0.51% (p=0.007 n=48+48) HashimotoLight-12 1.663ms ± 1% 1.668ms ± 1% +0.35% (p=0.006 n=49+49) Sha3_224_MTU-12 4.843µs ± 0% 4.836µs ± 0% -0.14% (p=0.000 n=45+42) GenSharedKeyP256-12 73.74µs ± 0% 70.92µs ± 0% -3.82% (p=0.000 n=48+47) Run/10k/1-12 24.81s ± 0% 24.88s ± 0% +0.30% (p=0.000 n=48+47) Run/10k/16-12 4.621s ± 2% 4.625s ± 3% ~ (p=0.776 n=50+50) Dnrm2MediumPosInc-12 4.018µs ± 0% 4.019µs ± 0% ~ (p=0.060 n=50+48) DasumMediumUnitaryInc-12 855.3ns ± 0% 855.0ns ± 0% -0.03% (p=0.000 n=47+42) Dgeev/Circulant10-12 40.45µs ± 1% 41.11µs ± 1% +1.62% (p=0.000 n=45+49) Dgeev/Circulant100-12 10.42ms ± 2% 10.61ms ± 1% +1.83% (p=0.000 n=49+48) MulWorkspaceDense1000Hundredth-12 64.69ms ± 1% 64.63ms ± 1% ~ (p=0.718 n=48+48) ScaleVec10000Inc20-12 22.31µs ± 1% 22.29µs ± 1% ~ (p=0.424 n=50+50) ValidateVersionTildeFail-12 737.6ns ± 1% 736.0ns ± 1% -0.22% (p=0.000 n=49+49) StripHTML-12 2.846µs ± 0% 2.806µs ± 1% -1.40% (p=0.000 n=43+50) ReaderContains-12 6.073µs ± 0% 5.999µs ± 0% -1.22% (p=0.000 n=48+48) EncodeCodecFromInternalProtobuf-12 5.817µs ± 2% 5.555µs ± 2% -4.51% (p=0.000 n=47+47) TarjanSCCGnp_10_tenth-12 7.091µs ± 5% 7.132µs ± 7% ~ (p=0.361 n=50+50) TarjanSCCGnp_1000_half-12 82.25ms ± 3% 81.29ms ± 2% -1.16% (p=0.000 n=50+43) AStarUndirectedmallWorld_10_2_2_2_Heur-12 15.18µs ± 8% 15.11µs ± 7% ~ (p=0.511 n=50+49) LouvainDirectedMultiplex-12 20.92ms ± 1% 21.00ms ± 1% +0.36% (p=0.000 n=48+49) WalkAllBreadthFirstGnp_10_tenth-12 2.974µs ± 4% 2.964µs ± 5% ~ (p=0.504 n=50+50) WalkAllBreadthFirstGnp_1000_tenth-12 9.733ms ± 4% 9.741ms ± 4% ~ (p=0.774 n=48+50) TextMovementBetweenSegments-12 432.8µs ± 0% 433.2µs ± 1% ~ (p=0.128 n=50+50) Growth_MultiSegment-12 13.11ms ± 0% 13.19ms ± 1% +0.58% (p=0.000 n=44+46) AddingFields/Zap.Sugar-12 1.296µs ± 1% 1.310µs ± 2% +1.09% (p=0.000 n=43+43) AddingFields/apex/log-12 34.19µs ± 1% 34.31µs ± 1% +0.35% (p=0.000 n=45+45) AddingFields/inconshreveable/log15-12 30.08µs ± 2% 30.07µs ± 2% ~ (p=0.803 n=48+47) AddingFields/sirupsen/logrus-12 6.683µs ± 3% 6.735µs ± 3% +0.78% (p=0.000 n=43+42) [Geo mean] 143.5µs 143.3µs -0.16% Change-Id: I637203c75c837737f1febced75d5985703e51044 Reviewed-on: https://go-review.googlesource.com/114655 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> |
||
|
f32348669c |
cmd/compile: rename memory-using operations
Some *mem ops are loads, some are stores, some are modifications. Replace mem->load for the loads. Replace mem->store for the stores. Replace mem->modify for the load-modify-stores. The only semantic change in this CL is to mark ADD(Q|L)constmodify (which used to be ADD(Q|L)constmem) as both a read and a write, instead of just a write. This is arguably a bug fix, but the bug isn't triggerable at the moment, see CL 112157. Change-Id: Iccb45aea817b606adb2d712ff99b10ee28e4616a Reviewed-on: https://go-review.googlesource.com/112159 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
|
8bf4b7e673 |
cmd/compile: convert amd64 BSFL and BSRL from tuple to result only
Change-Id: I220a459f67ecb310b6e9a526a1ff55527d421e70 Reviewed-on: https://go-review.googlesource.com/109416 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
22115859a5 |
cmd/compile: add amd64 LEAL{1,2,4,8} ops
For future use in rewrite rules. Change-Id: Ic9875beb0dea6e0bbcbd4b75d99a53f4a9a7c3fd Reviewed-on: https://go-review.googlesource.com/101275 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
||
|
8871c930be |
cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific OpXXXconvert. This is silly because OpConvert means the same thing on all architectures and is logically a no-op that exists only to keep track of conversions to and from unsafe.Pointer. Furthermore, lowering it makes it harder to recognize in other analyses, particularly liveness analysis. This CL eliminates the lowering of OpConvert, leaving it as the generic op until code generation time. The main complexity here is that we still need to register-allocate OpConvert operations. Currently, each arch's lowered OpConvert specifies all GP registers in its register mask. Ideally, OpConvert wouldn't affect value homing at all, and we could just copy the home of OpConvert's source, but this can potentially home an OpConvert in a LocalSlot, which neither regalloc nor stackalloc expect. Rather than try to disentangle this assumption from regalloc and stackalloc, we continue to register-allocate OpConvert, but teach regalloc that OpConvert can be allocated to any allocatable GP register. For #24543. Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6 Reviewed-on: https://go-review.googlesource.com/108496 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
||
|
ab48574bab |
cmd/compile: use Block.Likely to put likely branch first
When a neither of a conditional block's successors follows, the block must end with a conditional branch followed by a an unconditional branch. If the (conditional) branch is "unlikely", invert it and swap successors to make it likely instead. This doesn't matter to most benchmarks on amd64, but in one instance on amd64 it caused a 30% improvement, and it is otherwise harmless. The problematic loop is for i := 0; i < w; i++ { if pw[i] != 0 { return true } } compiled under GOEXPERIMENT=preemptibleloops This the very worst-case benchmark for that experiment. Also in this CL is a commoning up of heavily-repeated boilerplate, which made it much easier to see that the changes were applied correctly. In the future this should allow un-exporting of SSAGenState.Branches once the boilerplate-replacement is done everywhere. Change-Id: I0e5ded6eeb3ab1e3e0138e12d54c7e056bd99335 Reviewed-on: https://go-review.googlesource.com/104977 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
|
79112707bb |
cmd/compile: add patterns for bit set/clear/complement on amd64
This patch completes implementation of BT(Q|L), and adds support for BT(S|R|C)(Q|L). Example of code changes from time.(*Time).addSec: if t.wall&hasMonotonic != 0 { 0x1073465 488b08 MOVQ 0(AX), CX 0x1073468 4889ca MOVQ CX, DX 0x107346b 48c1e93f SHRQ $0x3f, CX 0x107346f 48c1e13f SHLQ $0x3f, CX 0x1073473 48f7c1ffffffff TESTQ $-0x1, CX 0x107347a 746b JE 0x10734e7 if t.wall&hasMonotonic != 0 { 0x1073435 488b08 MOVQ 0(AX), CX 0x1073438 480fbae13f BTQ $0x3f, CX 0x107343d 7363 JAE 0x10734a2 Another example: t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic 0x10734c8 4881e1ffffff3f ANDQ $0x3fffffff, CX 0x10734cf 48c1e61e SHLQ $0x1e, SI 0x10734d3 4809ce ORQ CX, SI 0x10734d6 48b90000000000000080 MOVQ $0x8000000000000000, CX 0x10734e0 4809f1 ORQ SI, CX 0x10734e3 488908 MOVQ CX, 0(AX) t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic 0x107348b 4881e2ffffff3f ANDQ $0x3fffffff, DX 0x1073492 48c1e61e SHLQ $0x1e, SI 0x1073496 4809f2 ORQ SI, DX 0x1073499 480fbaea3f BTSQ $0x3f, DX 0x107349e 488910 MOVQ DX, 0(AX) Go1 benchmarks seem unaffected, and I would be surprised otherwise: name old time/op new time/op delta BinaryTree17-4 2.64s ± 4% 2.56s ± 9% -2.92% (p=0.008 n=9+9) Fannkuch11-4 2.90s ± 1% 2.95s ± 3% +1.76% (p=0.010 n=10+9) FmtFprintfEmpty-4 35.3ns ± 1% 34.5ns ± 2% -2.34% (p=0.004 n=9+8) FmtFprintfString-4 57.0ns ± 1% 58.4ns ± 5% +2.52% (p=0.029 n=9+10) FmtFprintfInt-4 59.8ns ± 3% 59.8ns ± 6% ~ (p=0.565 n=10+10) FmtFprintfIntInt-4 93.9ns ± 3% 91.2ns ± 5% -2.94% (p=0.014 n=10+9) FmtFprintfPrefixedInt-4 107ns ± 6% 104ns ± 6% ~ (p=0.099 n=10+10) FmtFprintfFloat-4 187ns ± 3% 188ns ± 3% ~ (p=0.505 n=10+9) FmtManyArgs-4 410ns ± 1% 415ns ± 6% ~ (p=0.649 n=8+10) GobDecode-4 5.30ms ± 3% 5.27ms ± 3% ~ (p=0.436 n=10+10) GobEncode-4 4.62ms ± 5% 4.47ms ± 2% -3.24% (p=0.001 n=9+10) Gzip-4 197ms ± 4% 193ms ± 3% ~ (p=0.123 n=10+10) Gunzip-4 30.4ms ± 3% 30.1ms ± 3% ~ (p=0.481 n=10+10) HTTPClientServer-4 76.3µs ± 1% 76.0µs ± 1% ~ (p=0.236 n=8+9) JSONEncode-4 10.5ms ± 9% 10.3ms ± 3% ~ (p=0.280 n=10+10) JSONDecode-4 42.3ms ±10% 41.3ms ± 2% ~ (p=0.053 n=9+10) Mandelbrot200-4 3.80ms ± 2% 3.72ms ± 2% -2.15% (p=0.001 n=9+10) GoParse-4 2.88ms ±10% 2.81ms ± 2% ~ (p=0.247 n=10+10) RegexpMatchEasy0_32-4 69.5ns ± 4% 68.6ns ± 2% ~ (p=0.171 n=10+10) RegexpMatchEasy0_1K-4 165ns ± 3% 162ns ± 3% ~ (p=0.137 n=10+10) RegexpMatchEasy1_32-4 65.7ns ± 6% 64.4ns ± 2% -2.02% (p=0.037 n=10+10) RegexpMatchEasy1_1K-4 278ns ± 2% 279ns ± 3% ~ (p=0.991 n=8+9) RegexpMatchMedium_32-4 99.3ns ± 3% 98.5ns ± 4% ~ (p=0.457 n=10+9) RegexpMatchMedium_1K-4 30.1µs ± 1% 30.4µs ± 2% ~ (p=0.173 n=8+10) RegexpMatchHard_32-4 1.40µs ± 2% 1.41µs ± 4% ~ (p=0.565 n=10+10) RegexpMatchHard_1K-4 42.5µs ± 1% 41.5µs ± 3% -2.13% (p=0.002 n=8+9) Revcomp-4 332ms ± 4% 328ms ± 5% ~ (p=0.720 n=9+10) Template-4 48.3ms ± 2% 49.6ms ± 3% +2.56% (p=0.002 n=8+10) TimeParse-4 252ns ± 2% 249ns ± 3% ~ (p=0.116 n=9+10) TimeFormat-4 262ns ± 4% 252ns ± 3% -4.01% (p=0.000 n=9+10) name old speed new speed delta GobDecode-4 145MB/s ± 3% 146MB/s ± 3% ~ (p=0.436 n=10+10) GobEncode-4 166MB/s ± 5% 172MB/s ± 2% +3.28% (p=0.001 n=9+10) Gzip-4 98.6MB/s ± 4% 100.4MB/s ± 3% ~ (p=0.123 n=10+10) Gunzip-4 639MB/s ± 3% 645MB/s ± 3% ~ (p=0.481 n=10+10) JSONEncode-4 185MB/s ± 8% 189MB/s ± 3% ~ (p=0.280 n=10+10) JSONDecode-4 46.0MB/s ± 9% 47.0MB/s ± 2% +2.21% (p=0.046 n=9+10) GoParse-4 20.1MB/s ± 9% 20.6MB/s ± 2% ~ (p=0.239 n=10+10) RegexpMatchEasy0_32-4 460MB/s ± 4% 467MB/s ± 2% ~ (p=0.165 n=10+10) RegexpMatchEasy0_1K-4 6.19GB/s ± 3% 6.28GB/s ± 3% ~ (p=0.165 n=10+10) RegexpMatchEasy1_32-4 487MB/s ± 5% 497MB/s ± 2% +2.00% (p=0.043 n=10+10) RegexpMatchEasy1_1K-4 3.67GB/s ± 2% 3.67GB/s ± 3% ~ (p=0.963 n=8+9) RegexpMatchMedium_32-4 10.1MB/s ± 3% 10.1MB/s ± 4% ~ (p=0.435 n=10+9) RegexpMatchMedium_1K-4 34.0MB/s ± 1% 33.7MB/s ± 2% ~ (p=0.173 n=8+10) RegexpMatchHard_32-4 22.9MB/s ± 2% 22.7MB/s ± 4% ~ (p=0.565 n=10+10) RegexpMatchHard_1K-4 24.0MB/s ± 3% 24.7MB/s ± 3% +2.64% (p=0.001 n=9+9) Revcomp-4 766MB/s ± 4% 775MB/s ± 5% ~ (p=0.720 n=9+10) Template-4 40.2MB/s ± 2% 39.2MB/s ± 3% -2.47% (p=0.002 n=8+10) The rules match ~1800 times during all.bash. Fixes #18943 Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4 Reviewed-on: https://go-review.googlesource.com/94766 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
a35ec9a59e |
cmd/compile: implement CMOV on amd64
This builds upon the branchelim pass, activating it for amd64 and lowering CondSelect. Special care is made to FPU instructions for NaN handling. Benchmark results on Xeon E5630 (Westmere EP): name old time/op new time/op delta BinaryTree17-16 4.99s ± 9% 4.66s ± 2% ~ (p=0.095 n=5+5) Fannkuch11-16 4.93s ± 3% 5.04s ± 2% ~ (p=0.548 n=5+5) FmtFprintfEmpty-16 58.8ns ± 7% 61.4ns ±14% ~ (p=0.579 n=5+5) FmtFprintfString-16 114ns ± 2% 114ns ± 4% ~ (p=0.603 n=5+5) FmtFprintfInt-16 181ns ± 4% 125ns ± 3% -30.90% (p=0.008 n=5+5) FmtFprintfIntInt-16 263ns ± 2% 217ns ± 2% -17.34% (p=0.008 n=5+5) FmtFprintfPrefixedInt-16 230ns ± 1% 212ns ± 1% -7.99% (p=0.008 n=5+5) FmtFprintfFloat-16 411ns ± 3% 344ns ± 5% -16.43% (p=0.008 n=5+5) FmtManyArgs-16 828ns ± 4% 790ns ± 2% -4.59% (p=0.032 n=5+5) GobDecode-16 10.9ms ± 4% 10.8ms ± 5% ~ (p=0.548 n=5+5) GobEncode-16 9.52ms ± 5% 9.46ms ± 2% ~ (p=1.000 n=5+5) Gzip-16 334ms ± 2% 337ms ± 2% ~ (p=0.548 n=5+5) Gunzip-16 64.4ms ± 1% 65.0ms ± 1% +1.00% (p=0.008 n=5+5) HTTPClientServer-16 156µs ± 3% 155µs ± 3% ~ (p=0.690 n=5+5) JSONEncode-16 21.0ms ± 1% 21.8ms ± 0% +3.76% (p=0.016 n=5+4) JSONDecode-16 95.1ms ± 0% 95.7ms ± 1% ~ (p=0.151 n=5+5) Mandelbrot200-16 6.38ms ± 1% 6.42ms ± 1% ~ (p=0.095 n=5+5) GoParse-16 5.47ms ± 2% 5.36ms ± 1% -1.95% (p=0.016 n=5+5) RegexpMatchEasy0_32-16 111ns ± 1% 111ns ± 1% ~ (p=0.635 n=5+4) RegexpMatchEasy0_1K-16 408ns ± 1% 411ns ± 2% ~ (p=0.087 n=5+5) RegexpMatchEasy1_32-16 103ns ± 1% 104ns ± 1% ~ (p=0.484 n=5+5) RegexpMatchEasy1_1K-16 659ns ± 2% 652ns ± 1% ~ (p=0.571 n=5+5) RegexpMatchMedium_32-16 176ns ± 2% 174ns ± 1% ~ (p=0.476 n=5+5) RegexpMatchMedium_1K-16 58.6µs ± 4% 57.7µs ± 4% ~ (p=0.548 n=5+5) RegexpMatchHard_32-16 3.07µs ± 3% 3.04µs ± 4% ~ (p=0.421 n=5+5) RegexpMatchHard_1K-16 89.2µs ± 1% 87.9µs ± 2% -1.52% (p=0.032 n=5+5) Revcomp-16 575ms ± 0% 587ms ± 2% +2.12% (p=0.032 n=4+5) Template-16 110ms ± 1% 107ms ± 3% -3.00% (p=0.032 n=5+5) TimeParse-16 463ns ± 0% 462ns ± 0% ~ (p=0.810 n=5+4) TimeFormat-16 538ns ± 0% 535ns ± 0% -0.63% (p=0.024 n=5+5) name old speed new speed delta GobDecode-16 70.7MB/s ± 4% 71.4MB/s ± 5% ~ (p=0.452 n=5+5) GobEncode-16 80.7MB/s ± 5% 81.2MB/s ± 2% ~ (p=1.000 n=5+5) Gzip-16 58.2MB/s ± 2% 57.7MB/s ± 2% ~ (p=0.452 n=5+5) Gunzip-16 302MB/s ± 1% 299MB/s ± 1% -0.99% (p=0.008 n=5+5) JSONEncode-16 92.4MB/s ± 1% 89.1MB/s ± 0% -3.63% (p=0.016 n=5+4) JSONDecode-16 20.4MB/s ± 0% 20.3MB/s ± 1% ~ (p=0.135 n=5+5) GoParse-16 10.6MB/s ± 2% 10.8MB/s ± 1% +2.00% (p=0.016 n=5+5) RegexpMatchEasy0_32-16 286MB/s ± 1% 285MB/s ± 3% ~ (p=1.000 n=5+5) RegexpMatchEasy0_1K-16 2.51GB/s ± 1% 2.49GB/s ± 2% ~ (p=0.095 n=5+5) RegexpMatchEasy1_32-16 309MB/s ± 1% 307MB/s ± 1% ~ (p=0.548 n=5+5) RegexpMatchEasy1_1K-16 1.55GB/s ± 2% 1.57GB/s ± 1% ~ (p=0.690 n=5+5) RegexpMatchMedium_32-16 5.68MB/s ± 2% 5.73MB/s ± 1% ~ (p=0.579 n=5+5) RegexpMatchMedium_1K-16 17.5MB/s ± 4% 17.8MB/s ± 4% ~ (p=0.500 n=5+5) RegexpMatchHard_32-16 10.4MB/s ± 3% 10.5MB/s ± 4% ~ (p=0.460 n=5+5) RegexpMatchHard_1K-16 11.5MB/s ± 1% 11.7MB/s ± 2% +1.57% (p=0.032 n=5+5) Revcomp-16 442MB/s ± 0% 433MB/s ± 2% -2.05% (p=0.032 n=4+5) Template-16 17.7MB/s ± 1% 18.2MB/s ± 3% +3.12% (p=0.032 n=5+5) Change-Id: I6972e8f35f2b31f9a42ac473a6bf419a18022558 Reviewed-on: https://go-review.googlesource.com/100935 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
644d14ea0f |
Revert "cmd/compile: implement CMOV on amd64"
This reverts commit 080187f4f72bd6594e3c2efc35cf51bf61378552. It broke build of golang.org/x/exp/shiny/iconvg See issue 24395 for details Change-Id: Ifd6134f6214e6cee40bd3c63c32941d5fc96ae8b Reviewed-on: https://go-review.googlesource.com/100755 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
eb44b8635c |
cmd/compile: remove BTQconst rule
This rule is meant for code optimization, but it makes other rules potentially more complex, as they need to cope with the fact that a 32-bit op (BTLconst) can appear everywhere a 64-bit rule maches. Move the optimization to opcode expansion instead. Tests will be added in following CL. Change-Id: Ica5ef291e7963c4af17c124d4a2869e6c8f7b0c7 Reviewed-on: https://go-review.googlesource.com/99995 Reviewed-by: Keith Randall <khr@golang.org> |
||
|
85a8d25d53 |
cmd/compile/internal/ssa: emit IMUL3{L/Q} for MUL{L/Q}const on x86
cmd/asm now supports three-operand form of IMUL, so instead of using IMUL with resultInArg0, emit IMUL3 instruction. This results in less redundant MOVs where SSA assigns different registers to input[0] and dst arguments. Note: these have exactly the same encoding when reg0=reg1: IMUL3x $const, reg0, reg1 IMULx $const, reg Two-operand IMULx is like a crippled IMUL3x, with dst fixed to input[0]. This is why we don't bother to generate IMULx for the case where dst is the same as input[0]. Change-Id: I4becda475b3dffdd07b6fdf1c75bacc82af654e4 Reviewed-on: https://go-review.googlesource.com/99656 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
080187f4f7 |
cmd/compile: implement CMOV on amd64
This builds upon the branchelim pass, activating it for amd64 and lowering CondSelect. Special care is made to FPU instructions for NaN handling. Benchmark results on Xeon E5630 (Westmere EP): name old time/op new time/op delta BinaryTree17-16 4.99s ± 9% 4.66s ± 2% ~ (p=0.095 n=5+5) Fannkuch11-16 4.93s ± 3% 5.04s ± 2% ~ (p=0.548 n=5+5) FmtFprintfEmpty-16 58.8ns ± 7% 61.4ns ±14% ~ (p=0.579 n=5+5) FmtFprintfString-16 114ns ± 2% 114ns ± 4% ~ (p=0.603 n=5+5) FmtFprintfInt-16 181ns ± 4% 125ns ± 3% -30.90% (p=0.008 n=5+5) FmtFprintfIntInt-16 263ns ± 2% 217ns ± 2% -17.34% (p=0.008 n=5+5) FmtFprintfPrefixedInt-16 230ns ± 1% 212ns ± 1% -7.99% (p=0.008 n=5+5) FmtFprintfFloat-16 411ns ± 3% 344ns ± 5% -16.43% (p=0.008 n=5+5) FmtManyArgs-16 828ns ± 4% 790ns ± 2% -4.59% (p=0.032 n=5+5) GobDecode-16 10.9ms ± 4% 10.8ms ± 5% ~ (p=0.548 n=5+5) GobEncode-16 9.52ms ± 5% 9.46ms ± 2% ~ (p=1.000 n=5+5) Gzip-16 334ms ± 2% 337ms ± 2% ~ (p=0.548 n=5+5) Gunzip-16 64.4ms ± 1% 65.0ms ± 1% +1.00% (p=0.008 n=5+5) HTTPClientServer-16 156µs ± 3% 155µs ± 3% ~ (p=0.690 n=5+5) JSONEncode-16 21.0ms ± 1% 21.8ms ± 0% +3.76% (p=0.016 n=5+4) JSONDecode-16 95.1ms ± 0% 95.7ms ± 1% ~ (p=0.151 n=5+5) Mandelbrot200-16 6.38ms ± 1% 6.42ms ± 1% ~ (p=0.095 n=5+5) GoParse-16 5.47ms ± 2% 5.36ms ± 1% -1.95% (p=0.016 n=5+5) RegexpMatchEasy0_32-16 111ns ± 1% 111ns ± 1% ~ (p=0.635 n=5+4) RegexpMatchEasy0_1K-16 408ns ± 1% 411ns ± 2% ~ (p=0.087 n=5+5) RegexpMatchEasy1_32-16 103ns ± 1% 104ns ± 1% ~ (p=0.484 n=5+5) RegexpMatchEasy1_1K-16 659ns ± 2% 652ns ± 1% ~ (p=0.571 n=5+5) RegexpMatchMedium_32-16 176ns ± 2% 174ns ± 1% ~ (p=0.476 n=5+5) RegexpMatchMedium_1K-16 58.6µs ± 4% 57.7µs ± 4% ~ (p=0.548 n=5+5) RegexpMatchHard_32-16 3.07µs ± 3% 3.04µs ± 4% ~ (p=0.421 n=5+5) RegexpMatchHard_1K-16 89.2µs ± 1% 87.9µs ± 2% -1.52% (p=0.032 n=5+5) Revcomp-16 575ms ± 0% 587ms ± 2% +2.12% (p=0.032 n=4+5) Template-16 110ms ± 1% 107ms ± 3% -3.00% (p=0.032 n=5+5) TimeParse-16 463ns ± 0% 462ns ± 0% ~ (p=0.810 n=5+4) TimeFormat-16 538ns ± 0% 535ns ± 0% -0.63% (p=0.024 n=5+5) name old speed new speed delta GobDecode-16 70.7MB/s ± 4% 71.4MB/s ± 5% ~ (p=0.452 n=5+5) GobEncode-16 80.7MB/s ± 5% 81.2MB/s ± 2% ~ (p=1.000 n=5+5) Gzip-16 58.2MB/s ± 2% 57.7MB/s ± 2% ~ (p=0.452 n=5+5) Gunzip-16 302MB/s ± 1% 299MB/s ± 1% -0.99% (p=0.008 n=5+5) JSONEncode-16 92.4MB/s ± 1% 89.1MB/s ± 0% -3.63% (p=0.016 n=5+4) JSONDecode-16 20.4MB/s ± 0% 20.3MB/s ± 1% ~ (p=0.135 n=5+5) GoParse-16 10.6MB/s ± 2% 10.8MB/s ± 1% +2.00% (p=0.016 n=5+5) RegexpMatchEasy0_32-16 286MB/s ± 1% 285MB/s ± 3% ~ (p=1.000 n=5+5) RegexpMatchEasy0_1K-16 2.51GB/s ± 1% 2.49GB/s ± 2% ~ (p=0.095 n=5+5) RegexpMatchEasy1_32-16 309MB/s ± 1% 307MB/s ± 1% ~ (p=0.548 n=5+5) RegexpMatchEasy1_1K-16 1.55GB/s ± 2% 1.57GB/s ± 1% ~ (p=0.690 n=5+5) RegexpMatchMedium_32-16 5.68MB/s ± 2% 5.73MB/s ± 1% ~ (p=0.579 n=5+5) RegexpMatchMedium_1K-16 17.5MB/s ± 4% 17.8MB/s ± 4% ~ (p=0.500 n=5+5) RegexpMatchHard_32-16 10.4MB/s ± 3% 10.5MB/s ± 4% ~ (p=0.460 n=5+5) RegexpMatchHard_1K-16 11.5MB/s ± 1% 11.7MB/s ± 2% +1.57% (p=0.032 n=5+5) Revcomp-16 442MB/s ± 0% 433MB/s ± 2% -2.05% (p=0.032 n=4+5) Template-16 17.7MB/s ± 1% 18.2MB/s ± 3% +3.12% (p=0.032 n=5+5) Change-Id: Ic7cb7374d07da031e771bdcbfdd832fd1b17159c Reviewed-on: https://go-review.googlesource.com/98695 Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> |
||
|
4b00d3f4a2 |
cmd/compile: implement comparisons directly with memory
Allow the compiler to generate code like CMPQ 16(AX), $7 It's tricky because it's difficult to spill such a comparison during flagalloc, because the same memory state might not be available at the restore locations. Solve this problem by decomposing the compare+load back into its parts if it needs to be spilled. The big win is that the write barrier test goes from: MOVL runtime.writeBarrier(SB), CX TESTL CX, CX JNE 60 to CMPL runtime.writeBarrier(SB), $0 JNE 59 It's one instruction and one byte smaller. Fixes #19485 Fixes #15245 Update #22460 Binaries are about 0.15% smaller. Change-Id: I4fd8d1111b6b9924d52f9a0901ca1b2e5cce0836 Reviewed-on: https://go-review.googlesource.com/86035 Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> |
||
|
f4d9c30901 |
cmd/compile/internal/amd64: use appropriate NEG for div
Currently we generate NEGQ for DIV{Q,L,W}. By generating NEGL and NEGW, we will reduce code size, because NEGL doesn't require rex prefix. This also guarantees that upper 32 bits are zeroed, so we can revert CL 85736, and remove zero-extensions of DIVL results. Also adds test for redundant zero extend elimination. Fixes #23310 Change-Id: Ic58c3104c255a71371a06e09d10a975bbe5df587 Reviewed-on: https://go-review.googlesource.com/96815 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> |
||
|
de4edf3de7 |
cmd/compile/internal/amd64: update popcnt code generation
Popcnt has false dependency on output register and generates MOVQ $0, reg to break it. But recently we switched MOVQ $0, reg encoding from xor reg, reg to actual mov $0, reg. This CL updates code generation for popcnt to use actual XOR. Change-Id: I4c1fc11e85758b53ba2679165fa55614ec54b27d Reviewed-on: https://go-review.googlesource.com/82516 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
9b331189c1 |
cmd/internal/obj/x86: adjust SP correctly for tail calls
Currently, tail calls on x86 don't adjust the SP on return, so it's important that the compiler produce a zero-sized frame and disable the frame pointer. However, these constraints aren't necessary. For example, on other architectures it's generally necessary to restore the saved LR before a tail call, so obj simply makes this work. Likewise, on x86, there's no reason we can't simply make this work. Hence, this CL adjusts the compiler to use the same tail call convention for x86 that we use on LR machines by producing a RET with a target, rather than a JMP with a target. In fact, obj already understands this convention for x86 except that it's buggy with non-zero frame sizes. So we also fix this bug obj. As a result of these fixes, the compiler no longer needs to mark wrappers as NoFramePointer since it's now perfectly fine to save the frame pointer. In fact, this eliminates the only use of NoFramePointer in the compiler, which will enable further cleanups. This also fixes what is very nearly, but not quite, a code generation bug. NoFramePointer becomes obj.NOFRAME in the object file, which on ppc64 and s390x means to omit the saved LR. Hence, on these architectures, NoFramePointer (and NOFRAME) is only safe to set on leaf functions. However, on *most* architectures, wrappers aren't necessarily leaf functions because they may call DUFFZERO. We're saved on ppc64 and s390x only because the compiler doesn't have the rules to produce DUFFZERO calls on these architectures. Hence, this only works because the set of LR architectures that implement NOFRAME is disjoint from the set where the compiler produces DUFFZERO operations. (I discovered this whole mess when I attempted to add NOFRAME support to arm.) Change-Id: Icc589aeb86beacb850d0a6a80bd3024974a33947 Reviewed-on: https://go-review.googlesource.com/92035 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |