633 Commits

Author SHA1 Message Date
Josh Bleecher Snyder
c5f0104daf cmd/compile: use intrinsic for LeadingZeros8 on amd64
The previous change sped up the pure computation form of LeadingZeros8.
This places it somewhat close to the table lookup form.
Depending on something that varies from toolchain to toolchain
(alignment, perhaps?), the slowdown from ditching the table lookup
is either 20% or 5%.

This benchmark is the best-case scenario for the table lookup:
it is in the L1 cache already.

I think we're close enough that we can switch to the computational version,
and trust that the memory effects and binary size savings will be worth it.
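
A rough sketch of the two forms being compared (illustrative, not the
math/bits source; len8tab is the 256-entry length table visible in the
"Before" assembly below, and the function names here are made up):

func len8Table(x uint8) int   { return int(len8tab[x]) }                // one load from a 256-byte table
func len8Compute(x uint8) int { return bits.Len32(uint32(x)<<1|1) - 1 } // 2*x+1 is never zero, so BSR needs no zero fixup

In both cases, LeadingZeros8(x) == 8 - len8(x).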

Code:

func f8(x uint8)   { z = bits.LeadingZeros8(x) }

Before:

"".f8 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	LEAQ	math/bits.len8tab(SB), CX
	0x000f 00015 (x.go:7)	MOVBLZX	(CX)(AX*1), AX
	0x0013 00019 (x.go:7)	ADDQ	$-8, AX
	0x0017 00023 (x.go:7)	NEGQ	AX
	0x001a 00026 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:7)	RET

After:

"".f8 STEXT nosplit size=30 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	LEAL	1(AX)(AX*1), AX
	0x000c 00012 (x.go:7)	BSRL	AX, AX
	0x000f 00015 (x.go:7)	ADDQ	$-8, AX
	0x0013 00019 (x.go:7)	NEGQ	AX
	0x0016 00022 (x.go:7)	MOVQ	AX, "".z(SB)
	0x001d 00029 (x.go:7)	RET

Change-Id: Icc7db50a7820fb9a3da8a816d6b6940d7f8e193e
Reviewed-on: https://go-review.googlesource.com/108942
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:34:15 +00:00
Josh Bleecher Snyder
1d321ada73 cmd/compile: optimize LeadingZeros(16|32) on amd64
Introduce Len8 and Len16 ops and provide optimized lowerings for them.
amd64 only for this CL, although it wouldn't surprise me
if other architectures also admit of optimized lowerings.

Also use and optimize the Len32 lowering, along the same lines.

Leave Len8 unused for the moment; a subsequent CL will enable it.

For 16 and 32 bits, this leads to a speed-up.

name              old time/op  new time/op  delta
LeadingZeros16-8  1.42ns ± 5%  1.23ns ± 5%  -13.42%  (p=0.000 n=20+20)
LeadingZeros32-8  1.25ns ± 5%  1.03ns ± 5%  -17.63%  (p=0.000 n=20+16)

Code:

func f16(x uint16) { z = bits.LeadingZeros16(x) }
func f32(x uint32) { z = bits.LeadingZeros32(x) }

Before:

"".f16 STEXT nosplit size=38 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	BSRQ	AX, AX
	0x000c 00012 (x.go:8)	MOVQ	$-1, CX
	0x0013 00019 (x.go:8)	CMOVQEQ	CX, AX
	0x0017 00023 (x.go:8)	ADDQ	$-15, AX
	0x001b 00027 (x.go:8)	NEGQ	AX
	0x001e 00030 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0025 00037 (x.go:8)	RET

"".f32 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:9)	TEXT	"".f32(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:9)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:9)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:9)	MOVL	"".x+8(SP), AX
	0x0004 00004 (x.go:9)	BSRQ	AX, AX
	0x0008 00008 (x.go:9)	MOVQ	$-1, CX
	0x000f 00015 (x.go:9)	CMOVQEQ	CX, AX
	0x0013 00019 (x.go:9)	ADDQ	$-31, AX
	0x0017 00023 (x.go:9)	NEGQ	AX
	0x001a 00026 (x.go:9)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:9)	RET

After:

"".f16 STEXT nosplit size=30 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	LEAL	1(AX)(AX*1), AX
	0x000c 00012 (x.go:8)	BSRL	AX, AX
	0x000f 00015 (x.go:8)	ADDQ	$-16, AX
	0x0013 00019 (x.go:8)	NEGQ	AX
	0x0016 00022 (x.go:8)	MOVQ	AX, "".z(SB)
	0x001d 00029 (x.go:8)	RET

"".f32 STEXT nosplit size=28 args=0x8 locals=0x0
	0x0000 00000 (x.go:9)	TEXT	"".f32(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:9)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:9)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:9)	MOVL	"".x+8(SP), AX
	0x0004 00004 (x.go:9)	LEAQ	1(AX)(AX*1), AX
	0x0009 00009 (x.go:9)	BSRQ	AX, AX
	0x000d 00013 (x.go:9)	ADDQ	$-32, AX
	0x0011 00017 (x.go:9)	NEGQ	AX
	0x0014 00020 (x.go:9)	MOVQ	AX, "".z(SB)
	0x001b 00027 (x.go:9)	RET

Change-Id: I6c93c173752a7bfdeab8be30777ae05a736e1f4b
Reviewed-on: https://go-review.googlesource.com/108941
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:34:04 +00:00
Josh Bleecher Snyder
54dbab5221 cmd/compile: optimize TrailingZeros(8|16) on amd64
Introduce Ctz8 and Ctz16 ops and provide optimized lowerings for them.
amd64 only for this CL, although it wouldn't surprise me
if other architectures also admit of optimized lowerings.
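
The trick common to the lowerings is to set a sentinel bit just above the
operand width so that a zero input yields exactly 8 (or 16) trailing zeros;
as a sketch (illustrative names, not the math/bits source):

func ctz8(x uint8) int   { return bits.TrailingZeros32(uint32(x) | 1<<8) }  // BTSL $8 + BSFL below
func ctz16(x uint16) int { return bits.TrailingZeros32(uint32(x) | 1<<16) } // BTSL $16 + BSFL below

The new Ctz8/Ctz16 ops let the compiler drop the CMOV that the generic
64-bit lowering carried for the zero-input case.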

name               old time/op  new time/op  delta
TrailingZeros8-8   1.33ns ± 6%  0.84ns ± 3%  -36.90%  (p=0.000 n=20+20)
TrailingZeros16-8  1.26ns ± 5%  0.84ns ± 5%  -33.50%  (p=0.000 n=20+18)

Code:

func f8(x uint8)   { z = bits.TrailingZeros8(x) }
func f16(x uint16) { z = bits.TrailingZeros16(x) }

Before:

"".f8 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	BTSQ	$8, AX
	0x000d 00013 (x.go:7)	BSFQ	AX, AX
	0x0011 00017 (x.go:7)	MOVL	$64, CX
	0x0016 00022 (x.go:7)	CMOVQEQ	CX, AX
	0x001a 00026 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:7)	RET

"".f16 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	BTSQ	$16, AX
	0x000d 00013 (x.go:8)	BSFQ	AX, AX
	0x0011 00017 (x.go:8)	MOVL	$64, CX
	0x0016 00022 (x.go:8)	CMOVQEQ	CX, AX
	0x001a 00026 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:8)	RET

After:

"".f8 STEXT nosplit size=20 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	BTSL	$8, AX
	0x0009 00009 (x.go:7)	BSFL	AX, AX
	0x000c 00012 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0013 00019 (x.go:7)	RET

"".f16 STEXT nosplit size=20 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	BTSL	$16, AX
	0x0009 00009 (x.go:8)	BSFL	AX, AX
	0x000c 00012 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0013 00019 (x.go:8)	RET

Change-Id: I0551e357348de2b724737d569afd6ac9f5c3aa11
Reviewed-on: https://go-review.googlesource.com/108940
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:33:52 +00:00
Matthew Dempsky
e10ee798c4 cmd/compile/internal/types: remove ElemType wrapper
This was an artifact from when we had a separate ssa.Type interface to
break a circular dependency between packages ssa and gc. It's no longer
needed now that package ssa directly uses package types.

Change-Id: I6a93e5d79082815f7f0eb89507381969cc6cb403
Reviewed-on: https://go-review.googlesource.com/109137
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-04-24 22:24:47 +00:00
Austin Clements
8871c930be cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.

This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.

The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.
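
As a reminder of what OpConvert stands for, an illustrative fragment (not
compiler code): the conversions it records are those between unsafe.Pointer
and uintptr, so that liveness analysis can tell that x must stay reachable
across them.

p := unsafe.Pointer(&x) // x is some local variable
u := uintptr(p)         // unsafe.Pointer -> uintptr
q := unsafe.Pointer(u)  // uintptr -> unsafe.Pointer; OpConvert marks this round trip
_ = q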

For #24543.

Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 18:46:39 +00:00
Josh Bleecher Snyder
24d5c871aa cmd/compile: alphabetize sysfunc lists
Change-Id: Ia95643752d743d208363e3434497ffcf0af7b2d7
Reviewed-on: https://go-review.googlesource.com/107816
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-18 17:15:42 +00:00
Josh Bleecher Snyder
581331e75b cmd/compile: fix race in SSA construction
syslook cannot be called safely during SSA construction.

Change-Id: Ief173babd2e964fd5016578073dd3ba12e5731c5
Reviewed-on: https://go-review.googlesource.com/107815
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-18 17:15:35 +00:00
Daniel Martí
2b2348ab14 cmd/compile/internal/gc: add some Node methods
Focus on "isfoo" funcs that take a *Node, and convert them to isFoo
methods instead. This makes for more idiomatic Go code, and also more
readable func names.

Found candidates with grep, and applied most changes with sed. The funcs
chosen were isgoconst, isnil, and isblank. All had the same signature,
func(*Node) bool.

While at it, camelCase the isliteral and iszero function names. Don't
move these to methods, as they are only used in the backend part of gc,
which might one day be split into a separate package.
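
The shape of the conversion, schematically (bodies unchanged, only the
receiver and call sites move):

before: func isnil(n *Node) bool      called as isnil(n)
after:  func (n *Node) isNil() bool   called as n.isNil()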

Passes toolstash -cmp on std cmd.

Change-Id: I4df081b12d36c46c253167c8841c5a841f1c5a16
Reviewed-on: https://go-review.googlesource.com/105555
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-04-16 00:16:55 +00:00
David Chase
ab48574bab cmd/compile: use Block.Likely to put likely branch first
When neither of a conditional block's successors follows it,
the block must end with a conditional branch followed by
an unconditional branch.  If the (conditional) branch is
"unlikely", invert it and swap the successors to make it
likely instead.

This doesn't matter to most benchmarks on amd64, but in one
instance on amd64 it caused a 30% improvement, and it is
otherwise harmless.  The problematic loop is

		for i := 0; i < w; i++ {
			if pw[i] != 0 {
				return true
			}
		}

compiled under GOEXPERIMENT=preemptibleloops.
This is the very worst-case benchmark for that experiment.

Also in this CL is a commoning up of heavily-repeated
boilerplate, which made it much easier to see that the
changes were applied correctly.  In the future this should
allow un-exporting of SSAGenState.Branches once the
boilerplate-replacement is done everywhere.

Change-Id: I0e5ded6eeb3ab1e3e0138e12d54c7e056bd99335
Reviewed-on: https://go-review.googlesource.com/104977
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-11 17:35:34 +00:00
Matthew Dempsky
17df5ed910 cmd/compile: insert instrumentation during SSA building
Insert appropriate race/msan calls before each memory operation during
SSA construction.

This is conceptually simple, but subtle because we need to be careful
that inserted instrumentation calls don't clobber arguments that are
currently being prepared for a user function call.

reorder1 already handles introducing temporary variables for arguments
in some cases. This CL changes it to use them for all arguments when
instrumenting.

Also, we can't SSA struct types with more than one field while
instrumenting. Otherwise, concurrent uses of disjoint fields within an
SSA-able struct can introduce false races.
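
A sketch of the hazard being avoided (illustrative only): concurrent writes
to disjoint fields of the same struct are not a race, but if the struct were
kept as a single multi-field SSA value, instrumenting whole-struct
reads/writes would report one.

func disjointWrites(p *struct{ x, y int32 }) {
	go func() { p.x = 1 }() // touches only x
	go func() { p.y = 2 }() // touches only y; no data race between the two goroutines
}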

This is both somewhat better and somewhat worse than the old racewalk
instrumentation pass. We're now able to easily recognize that cases like
constructing non-escaping closures on the stack or accessing closure
variables don't need instrumentation calls. On the other hand,
spilling escaping parameters to the heap now results in an
instrumentation call.

Overall, this CL results in a small net reduction in the number of
instrumentation calls, but a small net increase in binary size for
instrumented executables. cmd/go ends up with 5.6% fewer calls, but a
2.4% larger binary.

Fixes #19054.

Change-Id: I70d1dd32ad6340e6fdb691e6d5a01452f58e97f3
Reviewed-on: https://go-review.googlesource.com/102817
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-09 18:40:55 +00:00
Daniel Martí
14393c5cd4 cmd: remove a few more unused parameters
ssa's pos parameter on the Const* funcs is unused, so remove it.

ld's alloc parameter on elfnote is always true, so remove the arguments
and simplify the code.

Finally, arm's addpltreloc never has its return parameter used, so
remove it.

Change-Id: I63387ecf6ab7b5f7c20df36be823322bb98427b8
Reviewed-on: https://go-review.googlesource.com/104456
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-09 17:10:25 +00:00
David Chase
eda22a06fb cmd/compile: ensure first instruction of function is stmt
In gdb, "b f" gets confused if the first instruction of "f"
is not marked as a statement in the DWARF line table.

To ensure gdb is not confused, move the first statement
marker in "f" to its first instruction.

The screwy-looking conditional for "what's the first
instruction with a statement marker" will become simpler in
the future.

Fixes #24695.

Change-Id: I2eef81676b64d1bd9bff5da03b89b9dc0c18f44f
Reviewed-on: https://go-review.googlesource.com/104955
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-06 16:05:42 +00:00
Daniel Martí
284c53498f cmd: some semi-automated cleanups
* Remove some redundant returns
* Replace HasPrefix with TrimPrefix
* Remove some obviously dead code

Passes toolstash -cmp on std cmd.

Change-Id: Ifb0d70a45cbb8a8553758a8c4878598b7fe932bc
Reviewed-on: https://go-review.googlesource.com/105017
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-06 13:59:29 +00:00
Richard Musiol
533fdfd00e cmd/compile/internal/gc: factor out beginning of SSAGenState.Call
This commit does not change the semantics of the Call method. Its
purpose is to avoid duplication of code by making PrepareCall available
for separate use by the wasm backend.

Updates #18892

Change-Id: I04a3098f56ebf0d995791c5375dd4c03b6a202a3
Reviewed-on: https://go-review.googlesource.com/103275
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-04-05 21:17:53 +00:00
David Chase
b9a365681f cmd/compile: adjust is-statement on Pos's to improve debugging
Stores to auto tmp variables can be hoisted to places
where the line numbers make debugging look "jumpy".
Turning those instructions into ones with is_stmt = 0 in
the DWARF (accomplished by marking ssa nodes with NotStmt)
makes debugging look better while still attributing the
instructions with the correct line number.

The same is true for certain register allocator spills and
reloads.

Change-Id: I97a394eb522d4911cc40b4bf5bf76d3d7221f6c0
Reviewed-on: https://go-review.googlesource.com/98415
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-04 22:14:59 +00:00
David Chase
619679a397 cmd/compile: add IsStmt breakpoint info to src.lico
Add IsStmt information to src.lico so that suitable lines
for breakpoints (or not) can be noted, eventually for
communication to the debugger via the linker and DWARF.

The expectation is that the front end will apply statement
boundary marks because it has best information about the
input, and the optimizer will attempt to preserve these.
The exact method for placing these marks is still TBD;
ideally stopping "at" line N in unoptimized code will occur
at a point where none of the side effects of N have occurred
and all of the inputs for line N can still be observed.
The optimizer will work with the same markings supplied
for unoptimized code.

It is a goal that non-optimizing compilation should conserve
statement marks.

The optimizer will also use the not-a-statement annotation
to indicate instructions that have a line number (for
profiling purposes) but should not be the target of
debugger step, next, or breakpoints.  Because instructions
marked as statements are sometimes removed, a third value is added,
indicating that a position (instruction) can serve as a
statement if the optimizer removes the current instruction
marked as a statement for the same line.  The optimizer
should attempt to conserve statement marks, but it is not
a bug if some are lost.

Includes changes to html output for GOSSAFUNC to indicate
not-default is-a-statement with bold and not-a-statement
with strikethrough.

Change-Id: Ia22c9a682f276e2ca2a4ef7a85d4b6ebf9c62b7f
Reviewed-on: https://go-review.googlesource.com/93663
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-04 22:11:34 +00:00
Balaram Makam
d7c7d88b2c cmd/compile: intrinsify math/big.mulWW on ARM64
Performance numbers on amberwing:

pkg: math/big
name                            old time/op    new time/op    delta
QuoRem                            3.08µs ± 0%    2.93µs ± 1%   -4.89%  (p=0.008 n=5+5)
ModSqrt225_Tonelli                 721µs ± 0%     718µs ± 0%   -0.46%  (p=0.008 n=5+5)
ModSqrt224_3Mod4                   218µs ± 0%     217µs ± 0%   -0.27%  (p=0.008 n=5+5)
ModSqrt5430_Tonelli                2.91s ± 0%     2.91s ± 0%     ~     (p=0.222 n=5+5)
ModSqrt5430_3Mod4                  970ms ± 0%     970ms ± 0%     ~     (p=0.151 n=5+5)
Sqrt                              45.9µs ± 0%    43.8µs ± 0%   -4.63%  (p=0.008 n=5+5)
IntSqr/1                          19.9ns ± 0%    17.3ns ± 0%  -13.07%  (p=0.008 n=5+5)
IntSqr/2                          52.6ns ± 0%    50.8ns ± 0%   -3.35%  (p=0.008 n=5+5)
IntSqr/3                          70.4ns ± 0%    69.4ns ± 0%     ~     (p=0.079 n=4+5)
IntSqr/5                           103ns ± 0%      99ns ± 0%   -3.98%  (p=0.008 n=5+5)
IntSqr/8                           179ns ± 0%     178ns ± 0%   -0.56%  (p=0.008 n=5+5)
IntSqr/10                          272ns ± 0%     272ns ± 0%     ~     (all equal)
IntSqr/20                          763ns ± 0%     787ns ± 0%   +3.15%  (p=0.016 n=5+4)
IntSqr/30                         1.25µs ± 1%    1.29µs ± 1%   +3.27%  (p=0.008 n=5+5)
IntSqr/50                         2.64µs ± 0%    2.71µs ± 0%   +2.61%  (p=0.008 n=5+5)
IntSqr/80                         5.67µs ± 0%    5.72µs ± 0%   +0.88%  (p=0.008 n=5+5)
IntSqr/100                        8.05µs ± 0%    8.09µs ± 0%   +0.45%  (p=0.008 n=5+5)
IntSqr/200                        28.0µs ± 0%    28.1µs ± 0%     ~     (p=0.151 n=5+5)
IntSqr/300                        59.4µs ± 0%    59.6µs ± 0%   +0.36%  (p=0.008 n=5+5)
IntSqr/500                         141µs ± 0%     141µs ± 0%   +0.08%  (p=0.008 n=5+5)
IntSqr/800                         280µs ± 0%     280µs ± 0%   -0.12%  (p=0.008 n=5+5)
IntSqr/1000                        429µs ± 0%     428µs ± 0%   -0.27%  (p=0.008 n=5+5)

pkg: crypto/ecdsa
name      old time/op    new time/op    delta
SignP384    7.85ms ± 1%    7.61ms ± 1%  -3.12%  (p=0.008 n=5+5)

Change-Id: I1ab30856cc0e570f6312f0bd8914779b55adbc16
Reviewed-on: https://go-review.googlesource.com/104135
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-04 18:37:24 +00:00
Matthew Dempsky
64bf90576e cmd/compile: refactor memory op constructors in SSA builder
Pulling these operations out separately so it's easier to add
instrumentation calls.

Passes toolstash-check.

Updates #19054.

Change-Id: If6a537124a87bac2eceff1d66d9df5ebb3bf07be
Reviewed-on: https://go-review.googlesource.com/102816
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-29 20:51:13 +00:00
isharipo
3afd2d7fc8 cmd/compile/internal/gc: properly initialize ssa.Func Type field
The ssa.Func has a Type field that is described as the
function signature type.

It never gets any value and remains nil.
This leads to "<T>" being printed as the signature.

Given this function declaration:
	func foo(x int, f func() string) (int, error)

GOSSAFUNC printed it as below:
	compiling foo
	foo <T>

After this change:
	compiling foo
	foo func(int, func() string) (int, error)

Change-Id: Iec5eec8aac5c76ff184659e30f41b2f5fe86d329
Reviewed-on: https://go-review.googlesource.com/102375
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-03-24 02:18:52 +00:00
Matthew Dempsky
b55eedd173 Revert "cmd/compile: cleanup nodpc and nodfp"
This reverts commit dcac984b97470c4f047f0d3d87b0af40f5246ed2.

Reason for revert: broke LR architectures (arm64, ppc64, s390x)

Change-Id: I531d311c9053e81503c8c78d6cf044b318fc828b
Reviewed-on: https://go-review.googlesource.com/99695
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-03-08 21:23:01 +00:00
Matthew Dempsky
dcac984b97 cmd/compile: cleanup nodpc and nodfp
Instead of creating a new &nodfp expression for every recover() call,
or a new nodpc variable for every function instrumented by the race
detector, this CL introduces two new uintptr-typed pseudo-variables
callerSP and callerPC. These pseudo-variables act just like calls to
the runtime's getcallersp() and getcallerpc() functions.

For consistency, change runtime.gorecover's builtin stub's parameter
type from "*int32" to "uintptr".

Passes toolstash-check, but toolstash-check -race fails because of
register allocator changes.

Change-Id: I985d644653de2dac8b7b03a28829ad04dfd4f358
Reviewed-on: https://go-review.googlesource.com/99416
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-08 18:22:29 +00:00
Matthew Dempsky
8b766e5d09 cmd/compile: remove state.exitCode
We're holding onto the function's complete AST anyway, so might as
well grab the exit code from there.

Passes toolstash-check.

Change-Id: I851b5dfdb53f991e9cd9488d25d0d2abc2a8379f
Reviewed-on: https://go-review.googlesource.com/99417
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-08 18:22:04 +00:00
Keith Randall
2413b54888 cmd/compile: mark the first word of an interface as a uintptr
The first word of an interface is a pointer, but for the purposes
of GC we don't need to treat it as such.
 1. If it is a non-empty interface, the pointer points to an itab
    which is always in persistentalloc space.
 2. If it is an empty interface, the pointer points to a _type.
   a. If it is a compile-time-allocated type, it points into
      the read-only data section.
   b. If it is a reflect-allocated type, it points into the Go heap.
      Reflect is responsible for keeping a reference to
      the underlying type so it won't be GCd.

If we ever have a moving GC, we need to change this for 2b (as
well as scan itabs to update their itab._type fields).

Write barriers on the first word of interfaces have already been removed.
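
For reference, the two layouts involved, shown schematically (itab and
_type stand for the runtime's real types; these are not compilable
declarations):

type iface struct { tab *itab; data unsafe.Pointer }    // non-empty: tab always lives in persistentalloc space
type eface struct { _type *_type; data unsafe.Pointer } // empty: _type is in read-only data, or on the heap for reflect-built types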

Change-Id: I643e91d7ac4de980ac2717436eff94097c65d959
Reviewed-on: https://go-review.googlesource.com/97518
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-02-27 22:58:32 +00:00
ChrisALiles
4f5389c321 cmd/compile: move the SSA local type definitions to a single location
Fixes #20304

Change-Id: I52ee02d1602ed7fffc96b27fd60990203c771aaf
Reviewed-on: https://go-review.googlesource.com/94256
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-02-27 19:40:36 +00:00
Kunpei Sakai
0c471dfae2 cmd: avoid unnecessary type conversions
CL generated mechanically with github.com/mdempsky/unconvert.

Also updated cmd/compile/internal/ssa/gen/*.rules manually.

Change-Id: If721ef73cf0771ae83ce7e2d11623fc8d9155768
Reviewed-on: https://go-review.googlesource.com/97075
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-26 20:22:06 +00:00
Alberto Donizetti
9ee78af806 cmd/compile: intrinsify math.Sqrt on 386
It seems like all the pieces were already there; it only needed the
final plumbing.

Before:

	0x001b 00027 (test.go:9)	MOVSD	X0, (SP)
	0x0020 00032 (test.go:9)	CALL	math.Sqrt(SB)
	0x0025 00037 (test.go:9)	MOVSD	8(SP), X0

After:

	0x0018 00024 (test.go:9)	SQRTSD	X0, X0

name    old time/op  new time/op  delta
Sqrt-4  4.60ns ± 2%  0.45ns ± 1%  -90.33%  (p=0.000 n=10+10)

Change-Id: I0f623958e19e726840140bf9b495d3f3a9184b9d
Reviewed-on: https://go-review.googlesource.com/96615
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-23 13:49:55 +00:00
Alberto Donizetti
6aeddb1b57 cmd/compile: intrinsify math.Sqrt on mips64
Fixes #24006

Change-Id: Ic1438b121fe705f9a6e3ed8340882e9dfd26ecf7
Reviewed-on: https://go-review.googlesource.com/95916
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
2018-02-21 17:34:02 +00:00
Chad Rosier
07f0f09563 cmd/compile: make math.Ceil/Floor/Round/Trunc intrinsics on arm64
name       old time/op  new time/op  delta
Ceil        550ns ± 0%   486ns ± 7%  -11.64%  (p=0.000 n=13+18)
Floor       495ns ±19%   512ns ±12%     ~     (p=0.164 n=20+20)
Round       550ns ± 0%   487ns ± 8%  -11.49%  (p=0.000 n=12+19)
Trunc       563ns ± 7%   488ns ±13%  -13.44%  (p=0.000 n=15+2)

Change-Id: I53f234b160b3c026a277506e2cf977d150379464
Reviewed-on: https://go-review.googlesource.com/88295
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-16 15:37:57 +00:00
Balaram Makam
fcba05148f cmd/compile: arm64 intrinsics for math/bits.OnesCount
This adds math/bits intrinsics for OnesCount on arm64.

name         old time/op  new time/op  delta
OnesCount    3.81ns ± 0%  1.60ns ± 0%  -57.96%  (p=0.000 n=7+8)
OnesCount8   1.60ns ± 0%  1.60ns ± 0%     ~     (all equal)
OnesCount16  2.41ns ± 0%  1.60ns ± 0%  -33.61%  (p=0.000 n=8+8)
OnesCount32  4.17ns ± 0%  1.60ns ± 0%  -61.58%  (p=0.000 n=8+8)
OnesCount64  3.80ns ± 0%  1.60ns ± 0%  -57.84%  (p=0.000 n=8+8)

Update #18616

Conflicts:
	src/cmd/compile/internal/gc/asm_test.go

Change-Id: I63ac2f63acafdb1f60656ab8a56be0b326eec5cb
Reviewed-on: https://go-review.googlesource.com/90835
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-15 23:00:20 +00:00
Heschi Kreinick
b8644e3243 cmd/compile/internal: reuse memory for valueToProgAfter
Not a big improvement, but does help edge cases like the SSA package.

Change-Id: I40e531110b97efd5f45955be477fd0f4faa8d545
Reviewed-on: https://go-review.googlesource.com/92396
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-02-14 18:29:23 +00:00
Heschi Kreinick
2075a9323d cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.

RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.

The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.

Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.

Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.

TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.

Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-02-14 18:29:19 +00:00
Austin Clements
2010189407 runtime: remove legacy eager write barrier
Now that the buffered write barrier is implemented for all
architectures, we can remove the old eager write barrier
implementation. This CL removes the implementation from the runtime,
support in the compiler for calling it, and updates some compiler
tests that relied on the old eager barrier support. It also makes sure
that all of the useful comments from the old write barrier
implementation still have a place to live.

Fixes #22460.

Updates #21640 since this fixes the layering concerns of the write
barrier (but not the other things in that issue).

Change-Id: I580f93c152e89607e0a72fe43370237ba97bae74
Reviewed-on: https://go-review.googlesource.com/92705
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-13 16:34:46 +00:00
Keith Randall
9d70b3ae04 cmd/compile: fix noopt builder, weird append case
Turn off append-to-itself optimization if optimizations are turned off.

This optimization triggered a bug when doing
  s = append(s, s)
where we write to the leftmost s before reading the rightmost s.

Update #17039

Change-Id: I21996532d20a75db6ec8d49db50cb157a1360b80
Reviewed-on: https://go-review.googlesource.com/81816
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-12-04 21:45:40 +00:00
David Chase
f22cf7131a cmd/compile: use src.NoXPos for entry-block constants
The ssa backend is aggressive about placing constants and
certain other values in the Entry block.  It's implausible
that the original line numbers for these constants makes
any sort of sense when it appears to a user stepping in a
debugger, and they're also not that useful in dumps since
entry-block instructions tend to be constants (i.e.,
unlikely to be the cause of a crash).

Therefore, use src.NoXPos for any values that are explicitly
inserted into a function's entry block.

Passes all tests, including ssa/debug_test.go with both
gdb and a fairly recent dlv.  Hand-verified that it solves
the reported problem; constructed a test that reproduced
a problem, and fixed it.

Modified test harness to allow injection of slightly more
interesting inputs.

Fixes #22558.

Change-Id: I4476927067846bc4366da7793d2375c111694c55
Reviewed-on: https://go-review.googlesource.com/81215
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-12-01 07:09:54 +00:00
Vladimir Stefanovic
6be1c09e19 cmd/compile: use soft-float routines for soft-float targets
Updates #18162 (mostly fixes)

Change-Id: I35bcb8a688bdaa432adb0ddbb73a2f7adda47b9e
Reviewed-on: https://go-review.googlesource.com/37958
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-30 17:37:37 +00:00
David Chase
bd41c6783b cmd/compile: improve debugging output for GOSSAFUNC
This changes the assembly language output to use the
innermost (instead of outermost) position for line
number and file.

The file is printed separately, only when it changes,
to remove redundant and space-consuming noise from the
output.

Unknown positions have line number "?".

The output format was changed slightly to make it
easier to read.

One source of gratuitous variation in debugging output was
removed.

Change-Id: I1fd9c8b0ddd82766488582fb684dce4b04f35723
Reviewed-on: https://go-review.googlesource.com/78895
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-21 16:29:33 +00:00
Keith Randall
fa1f52c5f6 cmd/compile: always nil check before interface call
Fixes #22703

The fix was already done by Cherry for defer/go of an interface call (CL 23820).
We just need to do it everywhere.

Change-Id: I0115d22e443931fe1bcce44c93c4d0770b5fd268
Reviewed-on: https://go-review.googlesource.com/77450
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-14 05:39:45 +00:00
Daniel Martí
79dbc1cc7b cmd/compile: replace classnames with Class.String
Since the slice of names is almost exactly the same as what stringer is
already generating for us.

Passes toolstash -cmp on std cmd.

Change-Id: I3f1e95efc690c0108236689e721627f00f79a461
Reviewed-on: https://go-review.googlesource.com/77190
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2017-11-12 13:25:35 +00:00
Daniel Martí
3231d4e4ef cmd/compile: replace opnames with stringer
Now possible, since stringer just got the -trimprefix flag added.

While at it, simplify a few Op stringifications since we can now use %v,
and no longer have to worry about o<len(opnames).
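
The kind of directive this enables looks roughly like the following (the
exact flags and file in the tree may differ):

//go:generate stringer -type=Op -trimprefix=O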

Passes toolstash -cmp on std cmd.

Fixes #15462.

Change-Id: Icdcde0b0a5eb165d18488918175024da274f782b
Reviewed-on: https://go-review.googlesource.com/76790
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Dave Cheney <dave@cheney.net>
2017-11-10 10:56:22 +00:00
Jeff R. Allen
d7ac9bb992 cmd/compile: do not write slices/strings > 2g
The linker will refuse to work on objects larger than
2e9 bytes (see issue #9862 for why).

With this change, the compiler gives a useful error
message explaining this, instead of leaving it to the
linker to give a cryptic message later.

Fixes #1700.

Change-Id: I3933ce08ef846721ece7405bdba81dff644cb004
Reviewed-on: https://go-review.googlesource.com/74330
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-09 18:50:22 +00:00
David Chase
a042221cdb cmd/compile: adjust Pos setting for "empty" blocks
Plain blocks that contain only uninteresting instructions
(that do not have reliable Pos information themselves)
need to have their Pos left unset so that they can
inherit it from their successors.  The "uninteresting"
test was not properly applied and not properly defined.
OpFwdRef does not appear in the ssa.html debugging output,
but at the time of the test these instructions did appear,
and it needs to be part of the test.

Fixes #22365.

Change-Id: I99e5b271acd8f6bcfe0f72395f905c7744ea9a02
Reviewed-on: https://go-review.googlesource.com/74252
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-11-08 22:39:49 +00:00
Austin Clements
3a446d8652 cmd/compile: []T where T is go:notinheap does not need write barriers
Currently, assigning a []T where T is a go:notinheap type generates an
unnecessary write barrier for storing the slice pointer.

Fix this by teaching HasHeapPointer that this type does not
have a heap pointer, and by tweaking the lowering of slice assignments so
that the pointer store retains the correct type rather than being lowered
to a *uint8 store.
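
An illustrative case (go:notinheap is only legal inside the runtime, so this
is a sketch of the pattern rather than a standalone program):

//go:notinheap
type T struct{ x uintptr }

var dst []T

func setDst(src []T) {
	dst = src // the stored data pointer can never point into the GC'd heap, so no write barrier is needed
}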

Change-Id: I8bf7c66e64a7fefdd14f2bd0de8a5a3596340bab
Reviewed-on: https://go-review.googlesource.com/76027
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-06 21:07:57 +00:00
David Chase
d58d90152b cmd/compile: adjust locationlist lifetimes
A statement like

  foo = bar + qux

might compile to

  AX := AX + BX

resulting in a regkill for AX before this instruction.
The buggy behavior is to kill AX "at" this instruction,
before it has executed.  (Code generation of no-instruction
values like RegKills applies their effects at the
next actual instruction emitted).

However, bar is still associated with AX until after the
instruction executes, so the effect of the regkill must
occur at the boundary between this instruction and the
next.  Similarly, the new value bound to AX is not visible
until this instruction executes (and in the case of values
that require multiple instructions in code generation, until
all of them have executed).

The ranges are adjusted so that a value's start occurs
at the next following instruction after its evaluation,
and the end occurs after (execution of) the first
instruction following the end of the lifetime as a value.

(Notice the asymmetry; the entire value must be finished
before it is visible, but execution of a single instruction
invalidates.  However, the value *is* visible before that
next instruction executes).

The test was adjusted to make it insensitive to the result
numbering for variables printed by gdb, since that is not
relevant to the test and makes the differences introduced
by small changes larger than necessary/useful.

The test was also improved to present variable probes
more intuitively, and also to allow explicit indication
of "this variable was optimized out"

Change-Id: I39453eead8399e6bb05ebd957289b112d1100c0e
Reviewed-on: https://go-review.googlesource.com/74090
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-05 18:32:53 +00:00
Ilya Tocar
2f1f607b21 cmd/compile: intrinsify math.RoundToEven on amd64
We already do this for floor/ceil, but RoundToEven was added later.
Intrinsify it also.

name           old time/op  new time/op  delta
RoundToEven-8  3.00ns ± 1%  0.68ns ± 2%  -77.34%  (p=0.000 n=10+10)

Change-Id: Ib158cbceb436c6725b2d9353a526c5c4be19bcad
Reviewed-on: https://go-review.googlesource.com/74852
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-11-02 17:33:52 +00:00
Martin Möhrmann
fbfc2031a6 cmd/compile: specialize map creation for small hint sizes
Handle make(map[any]any) and make(map[any]any, hint) where
hint <= BUCKETSIZE specially, to allow for faster map initialization
and to improve binary size by using runtime calls with fewer arguments.

Given a hint smaller than or equal to BUCKETSIZE, in which case
overLoadFactor(hint, 0) is false and no buckets would be allocated by makemap:
* If hmap needs to be allocated on the stack then only hmap's hash0
  field needs to be initialized and no call to makemap is needed.
* If hmap needs to be allocated on the heap then a new special
  makehmap function will allocate hmap and initialize hmap's
  hash0 field.
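
For example (illustrative; BUCKETSIZE corresponds to the runtime's bucket
size of 8 entries):

m := make(map[int]string)    // no hint
n := make(map[int]string, 8) // hint <= BUCKETSIZE

Both now skip eager bucket allocation: a stack-allocated hmap only has its
hash0 field initialized inline, and a heap-allocated one goes through the
new makehmap call instead of makemap.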

Reduces the size of the godoc binary by ~36kb.

AMD64
name         old time/op    new time/op    delta
NewEmptyMap    16.6ns ± 2%     5.5ns ± 2%  -66.72%  (p=0.000 n=10+10)
NewSmallMap    64.8ns ± 1%    56.5ns ± 1%  -12.75%  (p=0.000 n=9+10)

Updates #6853

Change-Id: I624e90da6775afaa061178e95db8aca674f44e9b
Reviewed-on: https://go-review.googlesource.com/61190
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-11-02 17:03:45 +00:00
Ilya Tocar
94484d8ed5 cmd/compile: intrinsify math.{Trunc/Ceil/Floor} on amd64
This significantly speeds up Trunc.
Ceil/Floor use the same instruction, so do them too.

name     old time/op  new time/op  delta
Floor-6  3.33ns ± 1%  3.22ns ± 0%   -3.39%  (p=0.000 n=10+10)
Ceil-6   3.33ns ± 1%  3.22ns ± 0%   -3.16%  (p=0.000 n=10+7)
Trunc-6  4.83ns ± 0%  3.22ns ± 0%  -33.36%  (p=0.000 n=6+8)

Change-Id: If848790e458eedfe38a6a0407bb4f589c68ac254
Reviewed-on: https://go-review.googlesource.com/68630
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-31 19:30:54 +00:00
Michael Munday
4745604bcb cmd/compile: intrinsify math.RoundToEven on s390x
The new RoundToEven function can be implemented as a single FIDBR
instruction on s390x.

name         old time/op  new time/op  delta
RoundToEven  5.32ns ± 1%  0.86ns ± 1%  -83.86%  (p=0.000 n=10+10)

Change-Id: Iaf597e57a0d1085961701e3c75ff4f6f6dcebb5f
Reviewed-on: https://go-review.googlesource.com/74350
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-31 18:04:27 +00:00
Austin Clements
7e343134d3 cmd/compile: compiler support for buffered write barrier
This CL implements the compiler support for calling the buffered write
barrier added by the previous CL.

Since the buffered write barrier is only implemented on amd64 right
now, this still supports the old, eager write barrier as well. There's
little overhead to supporting both and this way a few tests in
test/fixedbugs that expect to have liveness maps at write barrier
calls can easily opt-in to the old, eager barrier.

This significantly improves the performance of the write barrier:

name             old time/op  new time/op  delta
WriteBarrier-12  73.5ns ±20%  19.2ns ±27%  -73.90%  (p=0.000 n=19+18)

It also reduces the size of binaries because the write barrier call is
more compact:

name        old object-bytes  new object-bytes  delta
Template           398k ± 0%         393k ± 0%  -1.14%  (p=0.008 n=5+5)
Unicode            208k ± 0%         206k ± 0%  -1.00%  (p=0.008 n=5+5)
GoTypes           1.18M ± 0%        1.15M ± 0%  -2.00%  (p=0.008 n=5+5)
Compiler          4.05M ± 0%        3.88M ± 0%  -4.26%  (p=0.008 n=5+5)
SSA               8.25M ± 0%        8.11M ± 0%  -1.59%  (p=0.008 n=5+5)
Flate              228k ± 0%         224k ± 0%  -1.83%  (p=0.008 n=5+5)
GoParser           295k ± 0%         284k ± 0%  -3.62%  (p=0.008 n=5+5)
Reflect           1.00M ± 0%        0.99M ± 0%  -0.70%  (p=0.008 n=5+5)
Tar                339k ± 0%         333k ± 0%  -1.67%  (p=0.008 n=5+5)
XML                404k ± 0%         395k ± 0%  -2.10%  (p=0.008 n=5+5)
[Geo mean]         704k              690k       -2.00%

name        old exe-bytes     new exe-bytes     delta
HelloSize         1.05M ± 0%        1.04M ± 0%  -1.55%  (p=0.008 n=5+5)

https://perf.golang.org/search?q=upload:20171027.1

(Amusingly, this also reduces compiler allocations by 0.75%, which,
combined with the better write barrier, speeds up the compiler overall
by 2.10%. See the perf link.)

It slightly improves the performance of most of the go1 benchmarks and
improves the performance of the x/benchmarks:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.40s ± 1%     2.47s ± 1%  +2.69%  (p=0.000 n=19+19)
Fannkuch11-12                2.95s ± 0%     2.95s ± 0%  +0.21%  (p=0.000 n=20+19)
FmtFprintfEmpty-12          41.8ns ± 4%    41.4ns ± 2%  -1.03%  (p=0.014 n=20+20)
FmtFprintfString-12         68.7ns ± 2%    67.5ns ± 1%  -1.75%  (p=0.000 n=20+17)
FmtFprintfInt-12            79.0ns ± 3%    77.1ns ± 1%  -2.40%  (p=0.000 n=19+17)
FmtFprintfIntInt-12          127ns ± 1%     123ns ± 3%  -3.42%  (p=0.000 n=20+20)
FmtFprintfPrefixedInt-12     152ns ± 1%     150ns ± 1%  -1.02%  (p=0.000 n=18+17)
FmtFprintfFloat-12           211ns ± 1%     209ns ± 0%  -0.99%  (p=0.000 n=20+16)
FmtManyArgs-12               500ns ± 0%     496ns ± 0%  -0.73%  (p=0.000 n=17+20)
GobDecode-12                6.44ms ± 1%    6.53ms ± 0%  +1.28%  (p=0.000 n=20+19)
GobEncode-12                5.46ms ± 0%    5.46ms ± 1%    ~     (p=0.550 n=19+20)
Gzip-12                      220ms ± 1%     216ms ± 0%  -1.75%  (p=0.000 n=19+19)
Gunzip-12                   38.8ms ± 0%    38.6ms ± 0%  -0.30%  (p=0.000 n=18+19)
HTTPClientServer-12         79.0µs ± 1%    78.2µs ± 1%  -1.01%  (p=0.000 n=20+20)
JSONEncode-12               11.9ms ± 0%    11.9ms ± 0%  -0.29%  (p=0.000 n=20+19)
JSONDecode-12               52.6ms ± 0%    52.2ms ± 0%  -0.68%  (p=0.000 n=19+20)
Mandelbrot200-12            3.69ms ± 0%    3.68ms ± 0%  -0.36%  (p=0.000 n=20+20)
GoParse-12                  3.13ms ± 1%    3.18ms ± 1%  +1.67%  (p=0.000 n=19+20)
RegexpMatchEasy0_32-12      73.2ns ± 1%    72.3ns ± 1%  -1.19%  (p=0.000 n=19+18)
RegexpMatchEasy0_1K-12       241ns ± 0%     239ns ± 0%  -0.83%  (p=0.000 n=17+16)
RegexpMatchEasy1_32-12      68.6ns ± 1%    69.0ns ± 1%  +0.47%  (p=0.015 n=18+16)
RegexpMatchEasy1_1K-12       364ns ± 0%     361ns ± 0%  -0.67%  (p=0.000 n=16+17)
RegexpMatchMedium_32-12      104ns ± 1%     103ns ± 1%  -0.79%  (p=0.001 n=20+15)
RegexpMatchMedium_1K-12     33.8µs ± 3%    34.0µs ± 2%    ~     (p=0.267 n=20+19)
RegexpMatchHard_32-12       1.64µs ± 1%    1.62µs ± 2%  -1.25%  (p=0.000 n=19+18)
RegexpMatchHard_1K-12       49.2µs ± 0%    48.7µs ± 1%  -0.93%  (p=0.000 n=19+18)
Revcomp-12                   391ms ± 5%     396ms ± 7%    ~     (p=0.154 n=19+19)
Template-12                 63.1ms ± 0%    59.5ms ± 0%  -5.76%  (p=0.000 n=18+19)
TimeParse-12                 307ns ± 0%     306ns ± 0%  -0.39%  (p=0.000 n=19+17)
TimeFormat-12                325ns ± 0%     323ns ± 0%  -0.50%  (p=0.000 n=19+19)
[Geo mean]                  47.3µs         46.9µs       -0.67%

https://perf.golang.org/search?q=upload:20171026.1

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.25ms ± 1%  2.20ms ± 1%  -2.31%  (p=0.000 n=18+18)
HTTP-12                    12.6µs ± 0%  12.6µs ± 0%  -0.72%  (p=0.000 n=18+17)
JSON-12                    11.0ms ± 0%  11.0ms ± 1%  -0.68%  (p=0.000 n=17+19)

https://perf.golang.org/search?q=upload:20171026.2

Updates #14951.
Updates #22460.

Change-Id: Id4c0932890a1d41020071bec73b8522b1367d3e7
Reviewed-on: https://go-review.googlesource.com/73712
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-30 18:12:46 +00:00
Lynn Boger
4d0151ede5 cmd/compile,cmd/internal/obj/ppc64: make math.Abs,math.Copysign intrinsics on ppc64x
This adds support for math.Abs and math.Copysign to be intrinsics on ppc64x.

New instruction FCPSGN is added to generate fcpsgn. Some new
rules are added to improve the int<->float conversions that are
generated mainly due to the Float64bits and Float64frombits in
the math package. PPC64.rules is also modified as suggested
in the review for CL 63290.

Improvements:
benchmark                           old ns/op     new ns/op     delta
BenchmarkAbs-16                   1.12          0.69          -38.39%
BenchmarkCopysign-16              1.30          0.93          -28.46%
BenchmarkNextafter32-16           9.34          8.05          -13.81%
BenchmarkFrexp-16                 8.81          7.60          -13.73%

Others that used Copysign also saw smaller improvements.

I attempted to make this work using rules since that
seems to be preferred, but due to the use of Float64bits and
Float64frombits in these functions, several rules had to be added and
even then not all cases were matched. Using rules became too
complicated and seemed too fragile for these.
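
For context, math.Abs is written in terms of those conversions, roughly:

func abs(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x) &^ (1 << 63)) // clear the sign bit
}

so matching it with rewrite rules means recognizing the whole
bits-to-float round trip, which is what made the rule-based approach
fragile here.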

Updates #21390

Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995
Reviewed-on: https://go-review.googlesource.com/67130
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-30 13:56:39 +00:00
Austin Clements
afbe646ab4 cmd/compile: report typedslicecopy write barriers
Most write barrier calls are inserted by SSA, but copy and append are
lowered to runtime.typedslicecopy during walk. Fix these to set
Func.WBPos and emit the "write barrier" warning, as done for the write
barriers inserted by SSA. As part of this, we refactor setting WBPos
and emitting this warning into the frontend so it can be shared by
both walk and SSA.
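
The cases in question are ordinary copies and appends of pointer-containing
element types, e.g. (illustrative):

n := copy(dst, src)       // lowered to runtime.typedslicecopy during walk
dst = append(dst, src...) // append's element copy is lowered the same way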

Change-Id: I5fe9997d9bdb55e03e01dd58aee28908c35f606b
Reviewed-on: https://go-review.googlesource.com/73411
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-10-29 20:21:43 +00:00