cybercyst/go - go - Gitea: Git with a cup of tea

mirror of https://github.com/golang/go.git synced 2025-05-20 06:43:26 +00:00

Author	SHA1	Message	Date
Ilya Tocar	4201c2077e	cmd/compile: omit racefuncentry/exit when they are not needed When compiling with -race, we insert calls to racefuncentry, into every function. Add a rule that removes them in leaf functions, without instrumented loads/stores. Shaves ~30kb from "-race" version of go tool: file difference: go_old 15626192 go_new 15597520 [-28672 bytes] section differences: global text (code) = -24513 bytes (-0.358598%) read-only data = -5849 bytes (-0.167064%) Total difference -30362 bytes (-0.097928%) Fixes #24662 Change-Id: Ia63bf1827f4cf2c25e3e28dcd097c150994ade0a Reviewed-on: https://go-review.googlesource.com/121235 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-08-20 22:07:22 +00:00
Richard Musiol	81555cb4f3	cmd/compile/internal/gc: add nil check for closure call on wasm This commit adds an explicit nil check for closure calls on wasm, so calling a nil func causes a proper panic instead of crashing on the WebAssembly level. Change-Id: I6246844f316677976cdd420618be5664444c25ae Reviewed-on: https://go-review.googlesource.com/127759 Run-TryBot: Richard Musiol <neelance@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-08-14 09:19:38 +00:00
David Chase	0029cd479e	cmd/compile: add LocalAddr that takes SP,mem operands Lack of a well-defined order between VarDef and related address operations sometimes causes problems with store order and write barrier transformations; glitches in the order are made irreparable (by later optimizations) if the two parts of the glitch straddle a split in the original block caused by insertion of a write barrier diamond. Fix this by creating a LocalAddr for addresses of locals (what VarDef matters for) that takes a memory input to help make the order explicit. Addr is modified to only be legal for SB operand, so there is no overlap between Addr and LocalAddr uses (there may be some downstream cleanup from this). Changes to generic.rules and rewrite.go ensure that codegen tests continue to pass; CSE of LocalAddr is impaired, not quite sure of the cost. Fixes #26105. Change-Id: Id4192b4440aa4e9d7ba54a465c456df9b530b515 Reviewed-on: https://go-review.googlesource.com/122483 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-07-12 18:45:31 +00:00
Cherry Zhang	d21bdf125c	cmd/compile: check SSAability in handling of INDEX of 1-element array SSA can handle 1-element array, but only when the element type is SSAable. When building SSA for INDEX of 1-element array, we did not check the element type is SSAable. And when it's not, it resulted in an unhandled SSA op. Fixes #26120. Change-Id: Id709996b5d9d90212f6c56d3f27eed320a4d8360 Reviewed-on: https://go-review.googlesource.com/121496 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2018-06-29 12:05:05 +00:00
Wei Xiao	0a7ac93c27	cmd/compile: improve atomic add intrinsics with ARMv8.1 new instruction ARMv8.1 has added new instruction (LDADDAL) for atomic memory operations. This CL improves existing atomic add intrinsics with the new instruction. Since the new instruction is only guaranteed to be present after ARMv8.1, we guard its usage with a conditional on CPU feature. Performance result on ARMv8.1 machine: name old time/op new time/op delta Xadd-224 1.05µs ± 6% 0.02µs ± 4% -98.06% (p=0.000 n=10+8) Xadd64-224 1.05µs ± 3% 0.02µs ±13% -98.10% (p=0.000 n=9+10) [Geo mean] 1.05µs 0.02µs -98.08% Performance result on ARMv8.0 machine: name old time/op new time/op delta Xadd-46 538ns ± 1% 541ns ± 1% +0.62% (p=0.000 n=9+9) Xadd64-46 505ns ± 1% 508ns ± 0% +0.48% (p=0.003 n=9+8) [Geo mean] 521ns 524ns +0.55% Change-Id: If4b5d8d0e2d6f84fe1492a4f5de0789910ad0ee9 Reviewed-on: https://go-review.googlesource.com/81877 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-06-21 14:52:43 +00:00
Yury Smolsky	5eb98b3c30	cmd/compile: use expandable columns in ssa.html Display just a few columns in ssa.html, other columns can be expanded by clicking on collapsed column. Use sans serif font for the text, slightly smaller font size for non program text. Fixes #25286 Change-Id: I1094695135401602d90b97b69e42f6dda05871a2 Reviewed-on: https://go-review.googlesource.com/117275 Run-TryBot: Yury Smolsky <yury@smolsky.by> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-06-13 21:02:54 +00:00
Richard Musiol	482d241936	cmd/compile: add wasm stack optimization Go's SSA instructions only operate on registers. For example, an add instruction would read two registers, do the addition and then write to a register. WebAssembly's instructions, on the other hand, operate on the stack. The add instruction first pops two values from the stack, does the addition, then pushes the result to the stack. To fulfill Go's semantics, one needs to map Go's single add instruction to 4 WebAssembly instructions: - Push the value of local variable A to the stack - Push the value of local variable B to the stack - Do addition - Write value from stack to local variable C Now consider that B was set to the constant 42 before the addition: - Push constant 42 to the stack - Write value from stack to local variable B This works, but is inefficient. Instead, the stack is used directly by inlining instructions if possible. With inlining it becomes: - Push the value of local variable A to the stack (add) - Push constant 42 to the stack (constant) - Do addition (add) - Write value from stack to local variable C (add) Note that the two SSA instructions can not be generated sequentially anymore, because their WebAssembly instructions are interleaved. Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4 Updates #18892 Change-Id: Ie35e1c0bebf4985fddda0d6330eb2066f9ad6dec Reviewed-on: https://go-review.googlesource.com/103535 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-05-23 21:37:59 +00:00
Austin Clements	ed7a0682e2	cmd/compile: output stack map index everywhere it changes Currently, the code generator only considers outputting stack map indexes at CALL instructions. Raise this into the code generator loop itself so that changes in the stack map index at any instruction emit a PCDATA Prog before the actual instruction. We'll optimize this in later CLs: name old time/op new time/op delta Template 190ms ± 2% 191ms ± 2% ~ (p=0.529 n=10+10) Unicode 96.4ms ± 1% 98.5ms ± 3% +2.18% (p=0.001 n=9+10) GoTypes 669ms ± 1% 673ms ± 1% +0.62% (p=0.004 n=9+9) Compiler 3.18s ± 1% 3.22s ± 1% +1.06% (p=0.000 n=10+9) SSA 7.59s ± 1% 7.64s ± 1% +0.66% (p=0.023 n=10+10) Flate 128ms ± 1% 130ms ± 2% +1.07% (p=0.043 n=10+10) GoParser 157ms ± 2% 158ms ± 3% ~ (p=0.123 n=10+10) Reflect 442ms ± 1% 445ms ± 1% +0.73% (p=0.017 n=10+9) Tar 179ms ± 1% 180ms ± 1% +0.58% (p=0.019 n=9+9) XML 229ms ± 1% 232ms ± 2% +1.27% (p=0.009 n=10+10) [Geo mean] 401ms 405ms +0.94% name old exe-bytes new exe-bytes delta HelloSize 1.46M ± 0% 1.47M ± 0% +0.84% (p=0.000 n=10+10) [Geo mean] 1.46M 1.47M +0.84% For #24543. Change-Id: I4bfe45b767c9d9db47308a27763b303fa75bfa54 Reviewed-on: https://go-review.googlesource.com/109350 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-05-22 14:43:38 +00:00
Austin Clements	a367f44c18	cmd/compile: enable stack maps everywhere except unsafe points This modifies issafepoint in liveness analysis to report almost every operation as a safe point. There are four things we don't mark as safe-points: 1. Runtime code (other than at calls). 2. go:nosplit functions (other than at calls). 3. Instructions between the load of the write barrier-enabled flag and the write. 4. Instructions leading up to a uintptr -> unsafe.Pointer conversion. We'll optimize this in later CLs: name old time/op new time/op delta Template 185ms ± 2% 190ms ± 2% +2.95% (p=0.000 n=10+10) Unicode 96.3ms ± 3% 96.4ms ± 1% ~ (p=0.905 n=10+9) GoTypes 658ms ± 0% 669ms ± 1% +1.72% (p=0.000 n=10+9) Compiler 3.14s ± 1% 3.18s ± 1% +1.56% (p=0.000 n=9+10) SSA 7.41s ± 2% 7.59s ± 1% +2.48% (p=0.000 n=9+10) Flate 126ms ± 1% 128ms ± 1% +2.08% (p=0.000 n=10+10) GoParser 153ms ± 1% 157ms ± 2% +2.38% (p=0.000 n=10+10) Reflect 437ms ± 1% 442ms ± 1% +0.98% (p=0.001 n=10+10) Tar 178ms ± 1% 179ms ± 1% +0.67% (p=0.035 n=10+9) XML 223ms ± 1% 229ms ± 1% +2.58% (p=0.000 n=10+10) [Geo mean] 394ms 401ms +1.75% No effect on binary size because we're not yet emitting these extra safe points. For #24543. Change-Id: I16a1eebb9183cad7cef9d53c0fd21a973cad6859 Reviewed-on: https://go-review.googlesource.com/109348 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2018-05-22 14:43:37 +00:00
Austin Clements	d7d9df8a5b	cmd/compile: introduce LivenessMap and LivenessIndex Currently liveness only produces a stack map index at each safe point, so the information is summarized in a map[*ssa.Value]int. We're about to have both a stack map index and a register map index, so replace the int with a LivenessIndex type we can extend, and replace the map with a LivenessMap that we can also change more easily in the future. This also gives us an easy hook for defining the value that means "not a safe point". Passes toolstash -cmp. For #24543. Change-Id: Ic4c069839635efed4fd0f603899b80f8be3b56ec Reviewed-on: https://go-review.googlesource.com/109347 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-05-22 14:43:36 +00:00
Austin Clements	837ed98d63	cmd/compile: don't produce a past-the-end pointer in range loops Currently, range loops over slices and arrays are compiled roughly like: for i, x := range s { b } ⇓ for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b } ⇓ i, _n, _p := 0, len(s), &s[0] goto cond body: { b } i, _p = i+1, _p + unsafe.Sizeof(s[0]) cond: if i < _n { goto body } else { goto end } end: The problem with this lowering is that _p may temporarily point past the end of the allocation the moment before the loop terminates. Right now this isn't a problem because there's never a safe-point during this brief moment. We're about to introduce safe-points everywhere, so this bad pointer is going to be a problem. We could mark the increment as an unsafe block, but this inhibits reordering opportunities and could result in infrequent safe-points if the body is short. Instead, this CL fixes this by changing how we compile range loops to never produce this past-the-end pointer. It changes the lowering to roughly: i, _n, _p := 0, len(s), &s[0] if i < _n { goto body } else { goto end } top: _p += unsafe.Sizeof(s[0]) body: { b } i++ if i < _n { goto top } else { goto end } end: Notably, the increment is split into two parts: we increment the index before checking the condition, but increment the pointer only after the condition check has succeeded. The implementation builds on the OFORUNTIL construct that was introduced during the loop preemption experiments, since OFORUNTIL places the increment and condition after the loop body. To support the extra "late increment" step, we further define OFORUNTIL's "List" field to contain the late increment statements. This makes all of this a relatively small change. This depends on the improvements to the prove pass in CL 102603. With the current lowering, bounds-check elimination knows that i < _n in the body because the body block is dominated by the cond block. In the new lowering, deriving this fact requires detecting that i < _n on both paths into body and hence is true in body. CL 102603 made prove able to detect this. The code size effect of this is minimal. The cmd/go binary on linux/amd64 increases by 0.17%. Performance-wise, this actually appears to be a net win, though it's mostly noise: name old time/op new time/op delta BinaryTree17-12 2.80s ± 0% 2.61s ± 1% -6.88% (p=0.000 n=20+18) Fannkuch11-12 2.41s ± 0% 2.42s ± 0% +0.05% (p=0.005 n=20+20) FmtFprintfEmpty-12 41.6ns ± 5% 41.4ns ± 6% ~ (p=0.765 n=20+19) FmtFprintfString-12 69.4ns ± 3% 69.3ns ± 1% ~ (p=0.084 n=19+17) FmtFprintfInt-12 76.1ns ± 1% 77.3ns ± 1% +1.57% (p=0.000 n=19+19) FmtFprintfIntInt-12 122ns ± 2% 123ns ± 3% +0.95% (p=0.015 n=20+20) FmtFprintfPrefixedInt-12 153ns ± 2% 151ns ± 3% -1.27% (p=0.013 n=20+20) FmtFprintfFloat-12 215ns ± 0% 216ns ± 0% +0.47% (p=0.000 n=20+16) FmtManyArgs-12 486ns ± 1% 498ns ± 0% +2.40% (p=0.000 n=20+17) GobDecode-12 6.43ms ± 0% 6.50ms ± 0% +1.08% (p=0.000 n=18+19) GobEncode-12 5.43ms ± 1% 5.47ms ± 0% +0.76% (p=0.000 n=20+20) Gzip-12 218ms ± 1% 218ms ± 1% ~ (p=0.883 n=20+20) Gunzip-12 38.8ms ± 0% 38.9ms ± 0% ~ (p=0.644 n=19+19) HTTPClientServer-12 76.2µs ± 1% 76.4µs ± 2% ~ (p=0.218 n=20+20) JSONEncode-12 12.2ms ± 0% 12.3ms ± 1% +0.45% (p=0.000 n=19+19) JSONDecode-12 54.2ms ± 1% 53.3ms ± 0% -1.67% (p=0.000 n=20+20) Mandelbrot200-12 3.71ms ± 0% 3.71ms ± 0% ~ (p=0.143 n=19+20) GoParse-12 3.22ms ± 0% 3.19ms ± 1% -0.72% (p=0.000 n=20+20) RegexpMatchEasy0_32-12 76.7ns ± 1% 75.8ns ± 1% -1.19% (p=0.000 n=20+17) RegexpMatchEasy0_1K-12 245ns ± 1% 243ns ± 0% -0.72% (p=0.000 n=18+17) RegexpMatchEasy1_32-12 71.9ns ± 0% 71.7ns ± 1% -0.39% (p=0.006 n=12+18) RegexpMatchEasy1_1K-12 358ns ± 1% 354ns ± 1% -1.13% (p=0.000 n=20+19) RegexpMatchMedium_32-12 105ns ± 2% 105ns ± 1% -0.63% (p=0.007 n=19+20) RegexpMatchMedium_1K-12 31.9µs ± 1% 31.9µs ± 1% ~ (p=1.000 n=17+17) RegexpMatchHard_32-12 1.51µs ± 1% 1.52µs ± 2% +0.46% (p=0.042 n=18+18) RegexpMatchHard_1K-12 45.3µs ± 1% 45.5µs ± 2% +0.44% (p=0.029 n=18+19) Revcomp-12 388ms ± 1% 385ms ± 0% -0.57% (p=0.000 n=19+18) Template-12 63.0ms ± 1% 63.3ms ± 0% +0.50% (p=0.000 n=19+20) TimeParse-12 309ns ± 1% 307ns ± 0% -0.62% (p=0.000 n=20+20) TimeFormat-12 328ns ± 0% 333ns ± 0% +1.35% (p=0.000 n=19+19) [Geo mean] 47.0µs 46.9µs -0.20% (https://perf.golang.org/search?q=upload:20180326.1) For #10958. For #24543. Change-Id: Icbd52e711fdbe7938a1fea3e6baca1104b53ac3a Reviewed-on: https://go-review.googlesource.com/102604 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>	2018-05-22 14:15:46 +00:00
David Chase	529362e1e2	cmd/compile: refactor and cleanup of common code/workaround There's a glitch in how attributes from procs that do not generate code are combined, and the workaround for this glitch appeared in two places. "One big pile is better than two little ones." Updates #25426. Change-Id: I252f9adc5b77591720a61fa22e6f9dda33d95350 Reviewed-on: https://go-review.googlesource.com/113717 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2018-05-18 21:00:56 +00:00
David Chase	c2c1822b12	cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2018-05-14 14:09:49 +00:00
Martin Möhrmann	a8a60ac2a7	cmd/compile: optimize append(x, make([]T, y)...) slice extension Changes the compiler to recognize the slice extension pattern append(x, make([]T, y)...) and replace it with growslice and an optional memclr to avoid an allocation for make([]T, y). Memclr is not called in case growslice already allocated a new cleared backing array when T contains pointers. amd64: name old time/op new time/op delta ExtendSlice/IntSlice 103ns ± 4% 57ns ± 4% -44.55% (p=0.000 n=18+18) ExtendSlice/PointerSlice 155ns ± 3% 77ns ± 3% -49.93% (p=0.000 n=20+20) ExtendSlice/NoGrow 50.2ns ± 3% 5.2ns ± 2% -89.67% (p=0.000 n=18+18) name old alloc/op new alloc/op delta ExtendSlice/IntSlice 64.0B ± 0% 32.0B ± 0% -50.00% (p=0.000 n=20+20) ExtendSlice/PointerSlice 64.0B ± 0% 32.0B ± 0% -50.00% (p=0.000 n=20+20) ExtendSlice/NoGrow 32.0B ± 0% 0.0B -100.00% (p=0.000 n=20+20) name old allocs/op new allocs/op delta ExtendSlice/IntSlice 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=20+20) ExtendSlice/PointerSlice 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=20+20) ExtendSlice/NoGrow 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Fixes #21266 Change-Id: Idc3077665f63cbe89762b590c5967a864fd1c07f Reviewed-on: https://go-review.googlesource.com/109517 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2018-05-06 04:28:23 +00:00
Richard Musiol	3b137dd2df	cmd/compile: add wasm architecture This commit adds the wasm architecture to the compile command. A later commit will contain the corresponding linker changes. Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4 The following files are generated: - src/cmd/compile/internal/ssa/opGen.go - src/cmd/compile/internal/ssa/rewriteWasm.go - src/cmd/internal/obj/wasm/anames.go Updates #18892 Change-Id: Ifb4a96a3e427aac2362a1c97967d5667450fba3b Reviewed-on: https://go-review.googlesource.com/103295 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-05-04 17:56:12 +00:00
Wei Xiao	20102594a0	cmd/compile: intrinsify runtime.getcallerpc on all link register architectures Add a compiler intrinsic for getcallerpc on following architectures: arm mips mipsle mips64 mips64le ppc64 ppc64le s390x Change-Id: I758f3d4742fc214b206bcd07d90408622c17dbef Reviewed-on: https://go-review.googlesource.com/110835 Run-TryBot: Wei Xiao <Wei.Xiao@arm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-05-02 16:59:27 +00:00
Matthew Dempsky	004260afde	cmd/compile: open code select{send,recv,default} Registration now looks like: var cases [4]runtime.scases var order [8]uint16 cases[0].kind = caseSend cases[0].c = c1 cases[0].elem = &v1 if raceenabled \|\| msanenabled { selectsetpc(&cases[0]) } cases[1].kind = caseRecv cases[1].c = c2 cases[1].elem = &v2 if raceenabled \|\| msanenabled { selectsetpc(&cases[1]) } ... Change-Id: Ib9bcf426a4797fe4bfd8152ca9e6e08e39a70b48 Reviewed-on: https://go-review.googlesource.com/37934 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2018-05-01 03:17:44 +00:00
Wei Xiao	bd8a88729c	cmd/compile: intrinsify runtime.getcallerpc on arm64 Add a compiler intrinsic for getcallerpc on arm64 for better code generation. Change-Id: I897e670a2b8ffa1a8c2fdc638f5b2c44bda26318 Reviewed-on: https://go-review.googlesource.com/109276 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-04-30 13:29:14 +00:00
Josh Bleecher Snyder	5af0b28a73	runtime: iterate over set bits in adjustpointers There are several things combined in this change. First, eliminate the gobitvector type in favor of adding a ptrbit method to bitvector. In non-performance-critical code, use that method. In performance critical code, though, load the bitvector data one byte at a time and iterate only over set bits. To support that, add and use sys.Ctz8. name old time/op new time/op delta StackCopyPtr-8 81.8ms ± 5% 78.9ms ± 3% -3.58% (p=0.000 n=97+96) StackCopy-8 65.9ms ± 3% 62.8ms ± 3% -4.67% (p=0.000 n=96+92) StackCopyNoCache-8 105ms ± 3% 102ms ± 3% -3.38% (p=0.000 n=96+95) Change-Id: I00b80f45612708bd440b1a411a57fa6dfa24aa74 Reviewed-on: https://go-review.googlesource.com/109716 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2018-04-29 05:24:44 +00:00
Carlos Eduardo Seo	ebb67d993a	cmd/compile, cmd/internal/obj/ppc64: make math.Round an intrinsic on ppc64x This change implements math.Round as an intrinsic on ppc64x so it can be done using a single instruction. benchmark old ns/op new ns/op delta BenchmarkRound-16 2.60 0.69 -73.46% Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8 Reviewed-on: https://go-review.googlesource.com/109395 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2018-04-26 14:12:09 +00:00
Matthew Dempsky	736390c2bd	cmd/compile: allow SSA of multi-field structs while instrumenting When we moved racewalk to buildssa, we disabled SSA of structs with 2-4 fields while instrumenting because it caused false dependencies because for "x.f" we would emit (StructSelect (Load (Addr x)) "f") Even though we later simplify this to (Load (OffPtr (Addr x) "f")) the instrumentation saw a load of x in its entirety and would issue appropriate race/msan calls. The fix taken here is to directly emit the OffPtr form when x.f is addressable and can't be represented in SSA form. Change-Id: I0caf37bced52e9c16937466b0ac8cab6d356e525 Reviewed-on: https://go-review.googlesource.com/109360 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-04-26 07:07:18 +00:00
Josh Bleecher Snyder	c5f0104daf	cmd/compile: use intrinsic for LeadingZeros8 on amd64 The previous change sped up the pure computation form of LeadingZeros8. This places it somewhat close to the table lookup form. Depending on something that varies from toolchain to toolchain (alignment, perhaps?), the slowdown from ditching the table lookup is either 20% or 5%. This benchmark is the best case scenario for the table lookup: It is in the L1 cache already. I think we're close enough that we can switch to the computational version, and trust that the memory effects and binary size savings will be worth it. Code: func f8(x uint8) { z = bits.LeadingZeros8(x) } Before: "".f8 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) MOVBLZX AL, AX 0x0008 00008 (x.go:7) LEAQ math/bits.len8tab(SB), CX 0x000f 00015 (x.go:7) MOVBLZX (CX)(AX1), AX 0x0013 00019 (x.go:7) ADDQ $-8, AX 0x0017 00023 (x.go:7) NEGQ AX 0x001a 00026 (x.go:7) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:7) RET After: "".f8 STEXT nosplit size=30 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) MOVBLZX AL, AX 0x0008 00008 (x.go:7) LEAL 1(AX)(AX1), AX 0x000c 00012 (x.go:7) BSRL AX, AX 0x000f 00015 (x.go:7) ADDQ $-8, AX 0x0013 00019 (x.go:7) NEGQ AX 0x0016 00022 (x.go:7) MOVQ AX, "".z(SB) 0x001d 00029 (x.go:7) RET Change-Id: Icc7db50a7820fb9a3da8a816d6b6940d7f8e193e Reviewed-on: https://go-review.googlesource.com/108942 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-25 21:34:15 +00:00
Josh Bleecher Snyder	1d321ada73	cmd/compile: optimize LeadingZeros(16\|32) on amd64 Introduce Len8 and Len16 ops and provide optimized lowerings for them. amd64 only for this CL, although it wouldn't surprise me if other architectures also admit of optimized lowerings. Also use and optimize the Len32 lowering, along the same lines. Leave Len8 unused for the moment; a subsequent CL will enable it. For 16 and 32 bits, this leads to a speed-up. name old time/op new time/op delta LeadingZeros16-8 1.42ns ± 5% 1.23ns ± 5% -13.42% (p=0.000 n=20+20) LeadingZeros32-8 1.25ns ± 5% 1.03ns ± 5% -17.63% (p=0.000 n=20+16) Code: func f16(x uint16) { z = bits.LeadingZeros16(x) } func f32(x uint32) { z = bits.LeadingZeros32(x) } Before: "".f16 STEXT nosplit size=38 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) MOVWLZX AX, AX 0x0008 00008 (x.go:8) BSRQ AX, AX 0x000c 00012 (x.go:8) MOVQ $-1, CX 0x0013 00019 (x.go:8) CMOVQEQ CX, AX 0x0017 00023 (x.go:8) ADDQ $-15, AX 0x001b 00027 (x.go:8) NEGQ AX 0x001e 00030 (x.go:8) MOVQ AX, "".z(SB) 0x0025 00037 (x.go:8) RET "".f32 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:9) TEXT "".f32(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:9) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:9) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:9) MOVL "".x+8(SP), AX 0x0004 00004 (x.go:9) BSRQ AX, AX 0x0008 00008 (x.go:9) MOVQ $-1, CX 0x000f 00015 (x.go:9) CMOVQEQ CX, AX 0x0013 00019 (x.go:9) ADDQ $-31, AX 0x0017 00023 (x.go:9) NEGQ AX 0x001a 00026 (x.go:9) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:9) RET After: "".f16 STEXT nosplit size=30 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) MOVWLZX AX, AX 0x0008 00008 (x.go:8) LEAL 1(AX)(AX1), AX 0x000c 00012 (x.go:8) BSRL AX, AX 0x000f 00015 (x.go:8) ADDQ $-16, AX 0x0013 00019 (x.go:8) NEGQ AX 0x0016 00022 (x.go:8) MOVQ AX, "".z(SB) 0x001d 00029 (x.go:8) RET "".f32 STEXT nosplit size=28 args=0x8 locals=0x0 0x0000 00000 (x.go:9) TEXT "".f32(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:9) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:9) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:9) MOVL "".x+8(SP), AX 0x0004 00004 (x.go:9) LEAQ 1(AX)(AX1), AX 0x0009 00009 (x.go:9) BSRQ AX, AX 0x000d 00013 (x.go:9) ADDQ $-32, AX 0x0011 00017 (x.go:9) NEGQ AX 0x0014 00020 (x.go:9) MOVQ AX, "".z(SB) 0x001b 00027 (x.go:9) RET Change-Id: I6c93c173752a7bfdeab8be30777ae05a736e1f4b Reviewed-on: https://go-review.googlesource.com/108941 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-25 21:34:04 +00:00
Josh Bleecher Snyder	54dbab5221	cmd/compile: optimize TrailingZeros(8\|16) on amd64 Introduce Ctz8 and Ctz16 ops and provide optimized lowerings for them. amd64 only for this CL, although it wouldn't surprise me if other architectures also admit of optimized lowerings. name old time/op new time/op delta TrailingZeros8-8 1.33ns ± 6% 0.84ns ± 3% -36.90% (p=0.000 n=20+20) TrailingZeros16-8 1.26ns ± 5% 0.84ns ± 5% -33.50% (p=0.000 n=20+18) Code: func f8(x uint8) { z = bits.TrailingZeros8(x) } func f16(x uint16) { z = bits.TrailingZeros16(x) } Before: "".f8 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) MOVBLZX AL, AX 0x0008 00008 (x.go:7) BTSQ $8, AX 0x000d 00013 (x.go:7) BSFQ AX, AX 0x0011 00017 (x.go:7) MOVL $64, CX 0x0016 00022 (x.go:7) CMOVQEQ CX, AX 0x001a 00026 (x.go:7) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:7) RET "".f16 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) MOVWLZX AX, AX 0x0008 00008 (x.go:8) BTSQ $16, AX 0x000d 00013 (x.go:8) BSFQ AX, AX 0x0011 00017 (x.go:8) MOVL $64, CX 0x0016 00022 (x.go:8) CMOVQEQ CX, AX 0x001a 00026 (x.go:8) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:8) RET After: "".f8 STEXT nosplit size=20 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) BTSL $8, AX 0x0009 00009 (x.go:7) BSFL AX, AX 0x000c 00012 (x.go:7) MOVQ AX, "".z(SB) 0x0013 00019 (x.go:7) RET "".f16 STEXT nosplit size=20 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) BTSL $16, AX 0x0009 00009 (x.go:8) BSFL AX, AX 0x000c 00012 (x.go:8) MOVQ AX, "".z(SB) 0x0013 00019 (x.go:8) RET Change-Id: I0551e357348de2b724737d569afd6ac9f5c3aa11 Reviewed-on: https://go-review.googlesource.com/108940 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-25 21:33:52 +00:00
Matthew Dempsky	e10ee798c4	cmd/compile/internal/types: remove ElemType wrapper This was an artifact from when we had a separate ssa.Type interface to break circular dependency between packages ssa and gc. It's no longer needed now that package ssa directly uses package types. Change-Id: I6a93e5d79082815f7f0eb89507381969cc6cb403 Reviewed-on: https://go-review.googlesource.com/109137 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2018-04-24 22:24:47 +00:00
Austin Clements	8871c930be	cmd/compile: don't lower OpConvert Currently, each architecture lowers OpConvert to an arch-specific OpXXXconvert. This is silly because OpConvert means the same thing on all architectures and is logically a no-op that exists only to keep track of conversions to and from unsafe.Pointer. Furthermore, lowering it makes it harder to recognize in other analyses, particularly liveness analysis. This CL eliminates the lowering of OpConvert, leaving it as the generic op until code generation time. The main complexity here is that we still need to register-allocate OpConvert operations. Currently, each arch's lowered OpConvert specifies all GP registers in its register mask. Ideally, OpConvert wouldn't affect value homing at all, and we could just copy the home of OpConvert's source, but this can potentially home an OpConvert in a LocalSlot, which neither regalloc nor stackalloc expect. Rather than try to disentangle this assumption from regalloc and stackalloc, we continue to register-allocate OpConvert, but teach regalloc that OpConvert can be allocated to any allocatable GP register. For #24543. Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6 Reviewed-on: https://go-review.googlesource.com/108496 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2018-04-20 18:46:39 +00:00
Josh Bleecher Snyder	24d5c871aa	cmd/compile: alphabetize sysfunc lists Change-Id: Ia95643752d743d208363e3434497ffcf0af7b2d7 Reviewed-on: https://go-review.googlesource.com/107816 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-18 17:15:42 +00:00
Josh Bleecher Snyder	581331e75b	cmd/compile: fix race in SSA construction syslook cannot be called safely during SSA construction. Change-Id: Ief173babd2e964fd5016578073dd3ba12e5731c5 Reviewed-on: https://go-review.googlesource.com/107815 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-18 17:15:35 +00:00
Daniel Martí	2b2348ab14	cmd/compile/internal/gc: add some Node methods Focus on "isfoo" funcs that take a Node, and conver them to isFoo methods instead. This makes for more idiomatic Go code, and also more readable func names. Found candidates with grep, and applied most changes with sed. The funcs chosen were isgoconst, isnil, and isblank. All had the same signature, func(Node) bool. While at it, camelCase the isliteral and iszero function names. Don't move these to methods, as they are only used in the backend part of gc, which might one day be split into a separate package. Passes toolstash -cmp on std cmd. Change-Id: I4df081b12d36c46c253167c8841c5a841f1c5a16 Reviewed-on: https://go-review.googlesource.com/105555 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2018-04-16 00:16:55 +00:00
David Chase	ab48574bab	cmd/compile: use Block.Likely to put likely branch first When a neither of a conditional block's successors follows, the block must end with a conditional branch followed by a an unconditional branch. If the (conditional) branch is "unlikely", invert it and swap successors to make it likely instead. This doesn't matter to most benchmarks on amd64, but in one instance on amd64 it caused a 30% improvement, and it is otherwise harmless. The problematic loop is for i := 0; i < w; i++ { if pw[i] != 0 { return true } } compiled under GOEXPERIMENT=preemptibleloops This the very worst-case benchmark for that experiment. Also in this CL is a commoning up of heavily-repeated boilerplate, which made it much easier to see that the changes were applied correctly. In the future this should allow un-exporting of SSAGenState.Branches once the boilerplate-replacement is done everywhere. Change-Id: I0e5ded6eeb3ab1e3e0138e12d54c7e056bd99335 Reviewed-on: https://go-review.googlesource.com/104977 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-04-11 17:35:34 +00:00
Matthew Dempsky	17df5ed910	cmd/compile: insert instrumentation during SSA building Insert appropriate race/msan calls before each memory operation during SSA construction. This is conceptually simple, but subtle because we need to be careful that inserted instrumentation calls don't clobber arguments that are currently being prepared for a user function call. reorder1 already handles introducing temporary variables for arguments in some cases. This CL changes it to use them for all arguments when instrumenting. Also, we can't SSA struct types with more than one field while instrumenting. Otherwise, concurrent uses of disjoint fields within an SSA-able struct can introduce false races. This is both somewhat better and somewhat worse than the old racewalk instrumentation pass. We're now able to easily recognize cases like constructing non-escaping closures on the stack or accessing closure variables don't need instrumentation calls. On the other hand, spilling escaping parameters to the heap now results in an instrumentation call. Overall, this CL results in a small net reduction in the number of instrumentation calls, but a small net increase in binary size for instrumented executables. cmd/go ends up with 5.6% fewer calls, but a 2.4% larger binary. Fixes #19054. Change-Id: I70d1dd32ad6340e6fdb691e6d5a01452f58e97f3 Reviewed-on: https://go-review.googlesource.com/102817 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-04-09 18:40:55 +00:00
Daniel Martí	14393c5cd4	cmd: remove a few more unused parameters ssa's pos parameter on the Const* funcs is unused, so remove it. ld's alloc parameter on elfnote is always true, so remove the arguments and simplify the code. Finally, arm's addpltreloc never has its return parameter used, so remove it. Change-Id: I63387ecf6ab7b5f7c20df36be823322bb98427b8 Reviewed-on: https://go-review.googlesource.com/104456 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-09 17:10:25 +00:00
David Chase	eda22a06fb	cmd/compile: ensure first instruction of function is stmt In gdb, "b f" gets confused if the first instruction of "f" is not marked as a statement in the DWARF line table. To ensure gdb is not confused, move the first statement marker in "f" to its first instruction. The screwy-looking conditional for "what's the first instruction with a statement marker" will become simpler in the future. Fixes #24695. Change-Id: I2eef81676b64d1bd9bff5da03b89b9dc0c18f44f Reviewed-on: https://go-review.googlesource.com/104955 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-04-06 16:05:42 +00:00
Daniel Martí	284c53498f	cmd: some semi-automated cleanups * Remove some redundant returns * Replace HasPrefix with TrimPrefix * Remove some obviously dead code Passes toolstash -cmp on std cmd. Change-Id: Ifb0d70a45cbb8a8553758a8c4878598b7fe932bc Reviewed-on: https://go-review.googlesource.com/105017 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-06 13:59:29 +00:00
Richard Musiol	533fdfd00e	cmd/compile/internal/gc: factor out beginning of SSAGenState.Call This commit does not change the semantics of the Call method. Its purpose is to avoid duplication of code by making PrepareCall available for separate use by the wasm backend. Updates #18892 Change-Id: I04a3098f56ebf0d995791c5375dd4c03b6a202a3 Reviewed-on: https://go-review.googlesource.com/103275 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2018-04-05 21:17:53 +00:00
David Chase	b9a365681f	cmd/compile: adjust is-statement on Pos's to improve debugging Stores to auto tmp variables can be hoisted to places where the line numbers make debugging look "jumpy". Turning those instructions into ones with is_stmt = 0 in the DWARF (accomplished by marking ssa nodes with NotStmt) makes debugging look better while still attributing the instructions with the correct line number. The same is true for certain register allocator spills and reloads. Change-Id: I97a394eb522d4911cc40b4bf5bf76d3d7221f6c0 Reviewed-on: https://go-review.googlesource.com/98415 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-04 22:14:59 +00:00
David Chase	619679a397	cmd/compile: add IsStmt breakpoint info to src.lico Add IsStmt information to src.lico so that suitable lines for breakpoints (or not) can be noted, eventually for communication to the debugger via the linker and DWARF. The expectation is that the front end will apply statement boundary marks because it has best information about the input, and the optimizer will attempt to preserve these. The exact method for placing these marks is still TBD; ideally stopping "at" line N in unoptimized code will occur at a point where none of the side effects of N have occurred and all of the inputs for line N can still be observed. The optimizer will work with the same markings supplied for unoptimized code. It is a goal that non-optimizing compilation should conserve statement marks. The optimizer will also use the not-a-statement annotation to indicate instructions that have a line number (for profiling purposes) but should not be the target of debugger step, next, or breakpoints. Because instructions marked as statements are sometimes removed, a third value indicating that a position (instruction) can serve as a statement if the optimizer removes the current instruction marked as a statement for the same line. The optimizer should attempt to conserve statement marks, but it is not a bug if some are lost. Includes changes to html output for GOSSAFUNC to indicate not-default is-a-statement with bold and not-a-statement with strikethrough. Change-Id: Ia22c9a682f276e2ca2a4ef7a85d4b6ebf9c62b7f Reviewed-on: https://go-review.googlesource.com/93663 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-04-04 22:11:34 +00:00
Balaram Makam	d7c7d88b2c	cmd/compile: intrinsify math/big.mulWW on ARM64 Performance numbers on amberwing: pkg: math/big name old time/op new time/op delta QuoRem 3.08µs ± 0% 2.93µs ± 1% -4.89% (p=0.008 n=5+5) ModSqrt225_Tonelli 721µs ± 0% 718µs ± 0% -0.46% (p=0.008 n=5+5) ModSqrt224_3Mod4 218µs ± 0% 217µs ± 0% -0.27% (p=0.008 n=5+5) ModSqrt5430_Tonelli 2.91s ± 0% 2.91s ± 0% ~ (p=0.222 n=5+5) ModSqrt5430_3Mod4 970ms ± 0% 970ms ± 0% ~ (p=0.151 n=5+5) Sqrt 45.9µs ± 0% 43.8µs ± 0% -4.63% (p=0.008 n=5+5) IntSqr/1 19.9ns ± 0% 17.3ns ± 0% -13.07% (p=0.008 n=5+5) IntSqr/2 52.6ns ± 0% 50.8ns ± 0% -3.35% (p=0.008 n=5+5) IntSqr/3 70.4ns ± 0% 69.4ns ± 0% ~ (p=0.079 n=4+5) IntSqr/5 103ns ± 0% 99ns ± 0% -3.98% (p=0.008 n=5+5) IntSqr/8 179ns ± 0% 178ns ± 0% -0.56% (p=0.008 n=5+5) IntSqr/10 272ns ± 0% 272ns ± 0% ~ (all equal) IntSqr/20 763ns ± 0% 787ns ± 0% +3.15% (p=0.016 n=5+4) IntSqr/30 1.25µs ± 1% 1.29µs ± 1% +3.27% (p=0.008 n=5+5) IntSqr/50 2.64µs ± 0% 2.71µs ± 0% +2.61% (p=0.008 n=5+5) IntSqr/80 5.67µs ± 0% 5.72µs ± 0% +0.88% (p=0.008 n=5+5) IntSqr/100 8.05µs ± 0% 8.09µs ± 0% +0.45% (p=0.008 n=5+5) IntSqr/200 28.0µs ± 0% 28.1µs ± 0% ~ (p=0.151 n=5+5) IntSqr/300 59.4µs ± 0% 59.6µs ± 0% +0.36% (p=0.008 n=5+5) IntSqr/500 141µs ± 0% 141µs ± 0% +0.08% (p=0.008 n=5+5) IntSqr/800 280µs ± 0% 280µs ± 0% -0.12% (p=0.008 n=5+5) IntSqr/1000 429µs ± 0% 428µs ± 0% -0.27% (p=0.008 n=5+5) pkg: crypto-ecdsa name old time/op new time/op delta SignP384 7.85ms ± 1% 7.61ms ± 1% -3.12% (p=0.008 n=5+5) Change-Id: I1ab30856cc0e570f6312f0bd8914779b55adbc16 Reviewed-on: https://go-review.googlesource.com/104135 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-04-04 18:37:24 +00:00
Matthew Dempsky	64bf90576e	cmd/compile: refactor memory op constructors in SSA builder Pulling these operations out separately so it's easier to add instrumentation calls. Passes toolstash-check. Updates #19054. Change-Id: If6a537124a87bac2eceff1d66d9df5ebb3bf07be Reviewed-on: https://go-review.googlesource.com/102816 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-29 20:51:13 +00:00
isharipo	3afd2d7fc8	cmd/compile/internal/gc: properly initialize ssa.Func Type field The ssa.Func has Type field that is described as function signature type. It never gets any value and remains nil. This leads to "<T>" signature printed representation. Given this function declaration: func foo(x int, f func() string) (int, error) GOSSAFUNC printed it as below: compiling foo foo <T> After this change: compiling foo foo func(int, func() string) (int, error) Change-Id: Iec5eec8aac5c76ff184659e30f41b2f5fe86d329 Reviewed-on: https://go-review.googlesource.com/102375 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2018-03-24 02:18:52 +00:00
Matthew Dempsky	b55eedd173	Revert "cmd/compile: cleanup nodpc and nodfp" This reverts commit dcac984b97470c4f047f0d3d87b0af40f5246ed2. Reason for revert: broke LR architectures (arm64, ppc64, s390x) Change-Id: I531d311c9053e81503c8c78d6cf044b318fc828b Reviewed-on: https://go-review.googlesource.com/99695 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2018-03-08 21:23:01 +00:00
Matthew Dempsky	dcac984b97	cmd/compile: cleanup nodpc and nodfp Instead of creating a new &nodfp expression for every recover() call, or a new nodpc variable for every function instrumented by the race detector, this CL introduces two new uintptr-typed pseudo-variables callerSP and callerPC. These pseudo-variables act just like calls to the runtime's getcallersp() and getcallerpc() functions. For consistency, change runtime.gorecover's builtin stub's parameter type from "*int32" to "uintptr". Passes toolstash-check, but toolstash-check -race fails because of register allocator changes. Change-Id: I985d644653de2dac8b7b03a28829ad04dfd4f358 Reviewed-on: https://go-review.googlesource.com/99416 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-03-08 18:22:29 +00:00
Matthew Dempsky	8b766e5d09	cmd/compile: remove state.exitCode We're holding onto the function's complete AST anyway, so might as well grab the exit code from there. Passes toolstash-check. Change-Id: I851b5dfdb53f991e9cd9488d25d0d2abc2a8379f Reviewed-on: https://go-review.googlesource.com/99417 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-03-08 18:22:04 +00:00
Keith Randall	2413b54888	cmd/compile: mark the first word of an interface as a uintptr The first word of an interface is a pointer, but for the purposes of GC we don't need to treat it as such. 1. If it is a non-empty interface, the pointer points to an itab which is always in persistentalloc space. 2. If it is an empty interface, the pointer points to a _type. a. If it is a compile-time-allocated type, it points into the read-only data section. b. If it is a reflect-allocated type, it points into the Go heap. Reflect is responsible for keeping a reference to the underlying type so it won't be GCd. If we ever have a moving GC, we need to change this for 2b (as well as scan itabs to update their itab._type fields). Write barriers on the first word of interfaces have already been removed. Change-Id: I643e91d7ac4de980ac2717436eff94097c65d959 Reviewed-on: https://go-review.googlesource.com/97518 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2018-02-27 22:58:32 +00:00
ChrisALiles	4f5389c321	cmd/compile: move the SSA local type definitions to a single location Fixes #20304 Change-Id: I52ee02d1602ed7fffc96b27fd60990203c771aaf Reviewed-on: https://go-review.googlesource.com/94256 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2018-02-27 19:40:36 +00:00
Kunpei Sakai	0c471dfae2	cmd: avoid unnecessary type conversions CL generated mechanically with github.com/mdempsky/unconvert. Also updated cmd/compile/internal/ssa/gen/*.rules manually. Change-Id: If721ef73cf0771ae83ce7e2d11623fc8d9155768 Reviewed-on: https://go-review.googlesource.com/97075 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-02-26 20:22:06 +00:00
Alberto Donizetti	9ee78af806	cmd/compile: intrinsify math.Sqrt on 386 It seems like all the pieces were already there, it only needed the final plumbing. Before: 0x001b 00027 (test.go:9) MOVSD X0, (SP) 0x0020 00032 (test.go:9) CALL math.Sqrt(SB) 0x0025 00037 (test.go:9) MOVSD 8(SP), X0 After: 0x0018 00024 (test.go:9) SQRTSD X0, X0 name old time/op new time/op delta Sqrt-4 4.60ns ± 2% 0.45ns ± 1% -90.33% (p=0.000 n=10+10) Change-Id: I0f623958e19e726840140bf9b495d3f3a9184b9d Reviewed-on: https://go-review.googlesource.com/96615 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-02-23 13:49:55 +00:00
Alberto Donizetti	6aeddb1b57	cmd/compile: intrinsify math.Sqrt on mips64 Fixes #24006 Change-Id: Ic1438b121fe705f9a6e3ed8340882e9dfd26ecf7 Reviewed-on: https://go-review.googlesource.com/95916 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com>	2018-02-21 17:34:02 +00:00
Chad Rosier	07f0f09563	cmd/compile: make math.Ceil/Floor/Round/Trunc intrinsics on arm64 name old time/op new time/op delta Ceil 550ns ± 0% 486ns ± 7% -11.64% (p=0.000 n=13+18) Floor 495ns ±19% 512ns ±12% ~ (p=0.164 n=20+20) Round 550ns ± 0% 487ns ± 8% -11.49% (p=0.000 n=12+19) Trunc 563ns ± 7% 488ns ±13% -13.44% (p=0.000 n=15+2) Change-Id: I53f234b160b3c026a277506e2cf977d150379464 Reviewed-on: https://go-review.googlesource.com/88295 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-02-16 15:37:57 +00:00
Balaram Makam	fcba05148f	cmd/compile: arm64 intrinsics for math/bits.OnesCount This adds math/bits intrinsics for OnesCount on arm64. name old time/op new time/op delta OnesCount 3.81ns ± 0% 1.60ns ± 0% -57.96% (p=0.000 n=7+8) OnesCount8 1.60ns ± 0% 1.60ns ± 0% ~ (all equal) OnesCount16 2.41ns ± 0% 1.60ns ± 0% -33.61% (p=0.000 n=8+8) OnesCount32 4.17ns ± 0% 1.60ns ± 0% -61.58% (p=0.000 n=8+8) OnesCount64 3.80ns ± 0% 1.60ns ± 0% -57.84% (p=0.000 n=8+8) Update #18616 Conflicts: src/cmd/compile/internal/gc/asm_test.go Change-Id: I63ac2f63acafdb1f60656ab8a56be0b326eec5cb Reviewed-on: https://go-review.googlesource.com/90835 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-02-15 23:00:20 +00:00

1 2 3 4 5 ...

654 Commits