140 Commits

Author SHA1 Message Date
Ian Lance Taylor
e24228af25 runtime: enable/disable SIGPROF if needed when profiling
This ensures that SIGPROF is handled correctly when using
runtime/pprof in a c-archive or c-shared library.

Separate profiler handling into per-process changes and per-thread
changes. Simplify the Windows code slightly accordingly.

Fixes #18220.

Change-Id: I5060f7084c91ef0bbe797848978bdc527c312777
Reviewed-on: https://go-review.googlesource.com/34018
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2017-02-09 18:53:34 +00:00
Russ Cox
e4371fb179 time: optimize Now on darwin, windows
Fetch both monotonic and wall time together when possible.
Avoids skew and is cheaper.

Also shave a few ns off in conversion in package time.
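For context, with the monotonic-time support this CL builds on, a single
time.Now call carries both readings, so elapsed-time measurement uses the
monotonic clock automatically. A minimal user-level sketch (nothing here is
specific to this CL):

	package main

	import (
		"fmt"
		"time"
	)

	func main() {
		// time.Now reads the wall and monotonic clocks together; Since/Sub
		// use the monotonic reading, so the result is unaffected by
		// wall-clock adjustments made while we sleep.
		start := time.Now()
		time.Sleep(10 * time.Millisecond)
		fmt.Println("elapsed:", time.Since(start))
	}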

Compared to current implementation (after monotonic changes):

name   old time/op  new time/op  delta
Now    19.6ns ± 1%   9.7ns ± 1%  -50.63%  (p=0.000 n=41+49) darwin/amd64
Now    23.5ns ± 4%  10.6ns ± 5%  -54.61%  (p=0.000 n=30+28) windows/amd64
Now    54.5ns ± 5%  29.8ns ± 9%  -45.40%  (p=0.000 n=27+29) windows/386

More importantly, compared to Go 1.8:

name   old time/op  new time/op  delta
Now     9.5ns ± 1%   9.7ns ± 1%   +1.94%  (p=0.000 n=41+49) darwin/amd64
Now    12.9ns ± 5%  10.6ns ± 5%  -17.73%  (p=0.000 n=30+28) windows/amd64
Now    15.3ns ± 5%  29.8ns ± 9%  +94.36%  (p=0.000 n=30+29) windows/386

This brings time.Now back in line with Go 1.8 on darwin/amd64 and windows/amd64.

It's not obvious why windows/386 is still noticeably worse than Go 1.8,
but it's better than before this CL. The windows/386 speed is not too
important; the changes just keep the two architectures similar.

Change-Id: If69b94970c8a1a57910a371ee91e0d4e82e46c5d
Reviewed-on: https://go-review.googlesource.com/36428
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-09 14:45:16 +00:00
Michael Matloob
62956897c1 runtime: add definitions for SetGoroutineLabels and Do
This change defines runtime/pprof.SetGoroutineLabels and runtime/pprof.Do, which
are used to set profiler labels on goroutines. The change defines functions
in the runtime for setting and getting profile labels, and sets and unsets
profile labels when goroutines are created and deleted. The change also adds
the package runtime/internal/proflabel, which defines the structure the runtime
uses to store profile labels.
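A minimal usage sketch of the two entry points, assuming the API as it later
settled in runtime/pprof (Labels and LabelSet are part of that surface, not
described in this CL):

	package main

	import (
		"context"
		"runtime/pprof"
	)

	func main() {
		ctx := context.Background()

		// Do attaches the given labels to the calling goroutine for the
		// duration of f and restores the previous labels afterwards.
		pprof.Do(ctx, pprof.Labels("worker", "indexer"), func(ctx context.Context) {
			handle(ctx) // CPU profile samples taken here carry worker=indexer
		})
	}

	func handle(ctx context.Context) {
		// SetGoroutineLabels is the lower-level call: it installs the labels
		// stored in ctx on the current goroutine, with no scoped restore.
		pprof.SetGoroutineLabels(ctx)
	}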

Change-Id: I747a4400141f89b6e8160dab6aa94ca9f0d4c94d
Reviewed-on: https://go-review.googlesource.com/34198
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-on: https://go-review.googlesource.com/35010
2017-02-06 20:29:37 +00:00
David Chase
7f1ff65c39 cmd/compile: insert scheduling checks on loop backedges
Loop breaking with a counter.  Benchmarked (see comments),
eyeball checked for sanity on popular loops.  This code
ought to handle loops in general, and properly inserts phi
functions in cases where the earlier version might not have.

Includes a test, plus modifications to test/run.go to deal with
timing out and killing a looping test.  Tests that would be broken by
the extra code added for the checks (branch frequency and live vars)
turn check insertion off.

If GOEXPERIMENT=preemptibleloops, the compiler inserts reschedule
checks on every backedge of every reducible loop.  Alternately,
specifying GO_GCFLAGS=-d=ssa/insert_resched_checks/on will
enable it for a single compilation, but because the core Go
libraries contain some loops that may run long, this is less
likely to have the desired effect.

This is intended as a tool to help in the study and diagnosis
of GC and other latency problems, now that goal STW GC latency
is on the order of 100 microseconds or less.

Updates #17831.
Updates #10958.

Change-Id: I6206c163a5b0248e3f21eb4fc65f73a179e1f639
Reviewed-on: https://go-review.googlesource.com/33910
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-01-09 21:01:29 +00:00
Austin Clements
0c942e8f2c runtime: avoid incorrect panic when a signal arrives during STW
Stop-the-world and freeze-the-world (used for unhandled panics) are
currently not safe to do at the same time. While a regular unhandled
panic can't happen concurrently with STW (if the P hasn't been
stopped, then the panic blocks the STW), a panic from a _SigThrow
signal can happen on an already-stopped P, racing with STW. When this
happens, freezetheworld sets sched.stopwait to 0x7fffffff and
stopTheWorldWithSema panics because sched.stopwait != 0.

Fix this by detecting when freeze-the-world happens before
stop-the-world has completely stopped the world and, in that case,
freezing the STW operation rather than panicking.

Fixes #17442.

Change-Id: I646a7341221dd6d33ea21d818c2f7218e2cb7e20
Reviewed-on: https://go-review.googlesource.com/34611
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-12-20 17:27:47 +00:00
Austin Clements
0bae74e8c9 runtime: wake idle Ps when enqueuing GC work
If the scheduler has no user work and there's no GC work visible, it
puts the P to sleep (or blocks on the network). However, if we later
enqueue more GC work, there's currently nothing that specifically
wakes up the scheduler to let it start an idle GC worker. As a result,
we can underutilize the CPU during GC if Ps have been put to sleep.

Fix this by making GC wake idle Ps when work buffers are put on the
full list. We already have a hook to do this, since we use this to
preempt a random P if we need more dedicated workers. We expand this
hook to instead wake an idle P if there is one. The logic we use for
this is identical to the logic used to wake an idle P when we ready a
goroutine.

To make this really sound, we also fix the scheduler to re-check the
idle GC worker condition after releasing its P. This closes a race
where 1) the scheduler checks for idle work and finds none, 2) new
work is enqueued but there are no idle Ps so none are woken, and 3)
the scheduler releases its P.

There is one subtlety here. Currently we call enlistWorker directly
from putfull, but the gcWork is in an inconsistent state in the places
that call putfull. This isn't a problem right now because nothing that
enlistWorker does touches the gcWork, but with the added call to
wakep, it's possible to get a recursive call into the gcWork
(specifically, while write barriers are disallowed, this can do an
allocation, which can dispose a gcWork, which can put a workbuf). To
handle this, we lift the enlistWorker calls up a layer and delay them
until the gcWork is in a consistent state.

Fixes #14179.

Change-Id: Ia2467a52e54c9688c3c1752e1fc00f5b37bbfeeb
Reviewed-on: https://go-review.googlesource.com/32434
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-11-20 22:44:22 +00:00
Austin Clements
49ea9207b6 runtime: exit idle worker if there's higher-priority work
Idle GC workers trigger whenever there's a GC running and the
scheduler doesn't find any other work. However, they currently run for
a full scheduler quantum (~10ms) once started.

This is really bad for event-driven applications, where work may come
in on the network hundreds of times during that window. In the
go-gcbench rpc benchmark, this is bad enough to often cause effective
STWs where all Ps are in the idle worker. When this happens, we don't
even poll the network any more (except for the background 10ms poll in
sysmon), so we don't even know there's more work to do.

Fix this by making idle workers check with the scheduler roughly every
100 µs to see if there's any higher-priority work the P should be
doing. This check includes polling the network for incoming work.

Fixes #16528.

Change-Id: I6f62ebf6d36a92368da9891bafbbfd609b9bd003
Reviewed-on: https://go-review.googlesource.com/32433
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-11-20 22:44:17 +00:00
Russ Cox
e6da64b6c0 runtime: fix Windows profiling crash
I don't have any way to test or reproduce this problem,
but the current code is clearly wrong for Windows.
Make it better.

As I said on #17165:

But the borrowing of M's and the profiling of M's by the CPU profiler
do not seem sufficiently synchronized. This code implements the CPU profiler
on Windows:

	func profileloop1(param uintptr) uint32 {
		stdcall2(_SetThreadPriority, currentThread, _THREAD_PRIORITY_HIGHEST)

		for {
			stdcall2(_WaitForSingleObject, profiletimer, _INFINITE)
			first := (*m)(atomic.Loadp(unsafe.Pointer(&allm)))
			for mp := first; mp != nil; mp = mp.alllink {
				thread := atomic.Loaduintptr(&mp.thread)
				// Do not profile threads blocked on Notes,
				// this includes idle worker threads,
				// idle timer thread, idle heap scavenger, etc.
				if thread == 0 || mp.profilehz == 0 || mp.blocked {
					continue
				}
				stdcall1(_SuspendThread, thread)
				if mp.profilehz != 0 && !mp.blocked {
					profilem(mp)
				}
				stdcall1(_ResumeThread, thread)
			}
		}
	}

	func profilem(mp *m) {
		var r *context
		rbuf := make([]byte, unsafe.Sizeof(*r)+15)

		tls := &mp.tls[0]
		gp := *((**g)(unsafe.Pointer(tls)))

		// align Context to 16 bytes
		r = (*context)(unsafe.Pointer((uintptr(unsafe.Pointer(&rbuf[15]))) &^ 15))
		r.contextflags = _CONTEXT_CONTROL
		stdcall2(_GetThreadContext, mp.thread, uintptr(unsafe.Pointer(r)))
		sigprof(r.ip(), r.sp(), 0, gp, mp)
	}

	func sigprof(pc, sp, lr uintptr, gp *g, mp *m) {
		if prof.hz == 0 {
			return
		}

		// Profiling runs concurrently with GC, so it must not allocate.
		mp.mallocing++

		... lots of code ...

		mp.mallocing--
	}

A borrowed M may migrate between threads. Between the
atomic.Loaduintptr(&mp.thread) and the SuspendThread, mp may have
moved to a new thread, so that it's in active use. In particular
it might be calling malloc, as in the crash stack trace. If so, the
mp.mallocing++ in sigprof would provoke the crash.

Those lines are trying to guard against allocation during sigprof.
But on Windows, mp is the thread being traced, not the current
thread. Those lines should really be using getg().m.mallocing, which
is the same on Unix but not on Windows. With that change, it's
possible the race on the actual thread is not a problem: the traceback
would get confused and eventually return an error, but that's fine.
The code expects that possibility.

Fixes #17165.

Change-Id: If6619731910d65ca4b1a6e7de761fa2518ef339e
Reviewed-on: https://go-review.googlesource.com/33132
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-11-11 20:50:08 +00:00
David Crawshaw
47c1715ba5 runtime: address comments from CL 32357
Change-Id: I174d7307bfdd8ec57bb4266dab8569fd2234abb4
Reviewed-on: https://go-review.googlesource.com/32610
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-11-02 13:49:25 +00:00
David Crawshaw
54ec7b072e runtime: access modules via a slice
The introduction of -buildmode=plugin means modules can be added to a
Go program while it is running. This means there is a window, while
the program is running, when the module is on the moduledata linked
list but has not yet been initialized to the satisfaction of other
parts of the runtime. Notably, the GC.

This CL adds a new way of accessing modules: an activeModules function.
It returns a slice of modules that is built in the background and
atomically swapped in. The parts of the runtime that need to wait on
module initialization can use this slice instead of the linked list.

Fixes #17455

Change-Id: I04790fd07e40c7295beb47cea202eb439206d33d
Reviewed-on: https://go-review.googlesource.com/32357
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-11-01 16:04:12 +00:00
Martin Möhrmann
d7b34d5f29 runtime: improve atoi implementation
- Adds overflow checks
- Adds parsing of negative integers
- Adds boolean return value to signal parsing errors
- Adds atoi32 for parsing of integers that fit in an int32
- Adds tests

Handling of errors to provide error messages
at the call sites is left to future CLs.

Updates #17718

Change-Id: I3cacd0ab1230b9efc5404c68edae7304d39bcbc0
Reviewed-on: https://go-review.googlesource.com/32390
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-11-01 14:04:39 +00:00
Austin Clements
8f81dfe8b4 runtime: perform write barrier before pointer write
Currently, we perform write barriers after performing pointer writes.
At the moment, it simply doesn't matter what order this happens in, as
long as they appear atomic to GC. But both the hybrid barrier and ROC
are going to require a pre-write write barrier.

For the hybrid barrier, this is important because the barrier needs to
observe both the current value of the slot and the value that will be
written to it. (Alternatively, the caller could do the write and pass
in the old value, but it seems easier and more useful to just swap the
order of the barrier and the write.)

For ROC, this is necessary because, if the pointer write is going to
make the pointer reachable to some goroutine that it currently is not
visible to, the garbage collector must take some special action before
that pointer becomes more broadly visible.

This commit swaps pointer writes around so the write barrier occurs
before the pointer write.
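Schematically, the ordering change is the one below; the function names are
hypothetical stand-ins, since the real barrier is emitted by the compiler and
lives in the runtime:

	package main

	import "unsafe"

	// preWriteBarrier is a hypothetical stand-in for the GC barrier. Because
	// it runs before the store, it can still observe the old value in *slot
	// as well as the value about to be written.
	func preWriteBarrier(slot *unsafe.Pointer, newVal unsafe.Pointer) {
		_ = *slot  // old pointer, still intact
		_ = newVal // new pointer, not yet installed
	}

	// storePointer shows the new ordering: barrier first, then the write.
	// Previously the write happened first, so a barrier could only ever see
	// the new value.
	func storePointer(slot *unsafe.Pointer, newVal unsafe.Pointer) {
		preWriteBarrier(slot, newVal)
		*slot = newVal
	}

	func main() {
		var slot unsafe.Pointer
		x := 42
		storePointer(&slot, unsafe.Pointer(&x))
		_ = slot
	}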

The main subtlety here is bulk memory writes. Currently, these copy to
the destination first and then use the pointer bitmap of the
destination to find the copied pointers and invoke the write barrier.
This is necessary because the source may not have a pointer bitmap. To
handle these, we pass both the source and the destination to the bulk
memory barrier, which uses the pointer bitmap of the destination, but
reads the pointer values from the source.

Updates #17503.

Change-Id: I78ecc0c5c94ee81c29019c305b3d232069294a55
Reviewed-on: https://go-review.googlesource.com/31763
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 20:47:52 +00:00
Austin Clements
c3163d23f0 runtime: eliminate write barriers from save
As for dropg, save is writing a nil pointer that will generate a write
barrier with the hybrid barrier. However, in this case, ctxt always
should already be nil, so replace the write with an assertion that
this is the case.

At this point, we're ready to disable the write barrier elision
optimizations that interfere with the hybrid barrier.

Updates #17503.

Change-Id: I83208e65aa33403d442401f355b2e013ab9a50e9
Reviewed-on: https://go-review.googlesource.com/31571
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 20:05:49 +00:00
Austin Clements
8044b77a57 runtime: eliminate write barriers from dropg
Currently this contains no write barriers because it's writing nil
pointers, but with the hybrid barrier, even these will produce write
barriers. However, since these are *gs and *ms, they don't need write
barriers, so we can simply eliminate them.

Updates #17503.

Change-Id: Ib188a60492c5cfb352814bf9b2bcb2941fb7d6c0
Reviewed-on: https://go-review.googlesource.com/31570
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 20:05:39 +00:00
Austin Clements
ee785f03a2 runtime: shade stack-to-stack copy when starting a goroutine
The hybrid barrier requires barriers on stack-to-stack copies if
either stack is grey. There are only two instances of this in the
runtime: channel sends and starting a goroutine. Channel sends already
use typedmemmove and hence have the necessary barriers. This commit
adds barriers for the stack-to-stack copy when starting a goroutine.

Updates #17503.

Change-Id: Ibb55e08127ca4d021ac54be61cb96732efa5df5b
Reviewed-on: https://go-review.googlesource.com/31455
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 20:05:18 +00:00
Austin Clements
88518e7dd6 runtime: zero-initialize LR on new stacks
Currently we initialize LR on a new stack by writing nil to it. But
this is an initializing write since the newly allocated stack is not
zeroed, so this is unsafe with the hybrid barrier. Change this to a
uintptr write to avoid a bad write barrier.

Updates #17503.

Change-Id: I062ac352e35df7da4644c1f2a5aaab87049d1f60
Reviewed-on: https://go-review.googlesource.com/32093
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 19:14:03 +00:00
Austin Clements
87e48c5afd runtime, cmd/compile: rename memclr -> memclrNoHeapPointers
Since barrier-less memclr is only safe in very narrow circumstances,
this commit renames memclr to avoid accidentally calling memclr on
typed memory. This can cause subtle, non-deterministic bugs, so it's
worth some effort to prevent. In the near term, this will also prevent
bugs creeping in from any concurrent CLs that add calls to memclr; if
this happens, whichever patch hits master second will fail to compile.

This also adds the other new memclr variants to the compiler's
builtin.go to minimize the churn on that binary blob. We'll use these
in future commits.

Updates #17503.

Change-Id: I00eead049f5bd35ca107ea525966831f3d1ed9ca
Reviewed-on: https://go-review.googlesource.com/31369
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:20:33 +00:00
Peter Weinberger
ca922b6d36 runtime: Profile goroutines holding contended mutexes.
runtime.SetMutexProfileFraction(n int) will capture 1/n-th of stack
traces of goroutines holding contended mutexes if n > 0. From runtime/pprof,
pprof.Lookup("mutex").WriteTo writes the accumulated
stack traces to w (in essentially the same format that blocking
profiling uses).
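A usage sketch of the new API (the sampling rate and the contention loop here
are arbitrary):

	package main

	import (
		"os"
		"runtime"
		"runtime/pprof"
		"sync"
	)

	func main() {
		// Sample roughly 1 in 5 mutex contention events.
		runtime.SetMutexProfileFraction(5)

		var mu sync.Mutex
		var wg sync.WaitGroup
		for i := 0; i < 8; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for j := 0; j < 1000; j++ {
					mu.Lock()
					mu.Unlock()
				}
			}()
		}
		wg.Wait()

		// Write the accumulated contention stacks, analogous to the
		// existing block profile.
		pprof.Lookup("mutex").WriteTo(os.Stdout, 1)
	}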

Change-Id: Ie0b54fa4226853d99aa42c14cb529ae586a8335a
Reviewed-on: https://go-review.googlesource.com/29650
Reviewed-by: Austin Clements <austin@google.com>
2016-10-28 11:47:16 +00:00
Alberto Donizetti
10560afb54 runtime/debug: avoid overflow in SetMaxThreads
Fixes #16076

Change-Id: I91fa87b642592ee4604537dd8c3197cd61ec8b31
Reviewed-on: https://go-review.googlesource.com/31516
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-20 21:08:18 +00:00
Austin Clements
2be3ab4415 runtime: keep gcMarkRootCheck happy with spare Gs
oneNewExtraM creates a spare M and G for use with cgo callbacks. The G
doesn't run right away, but goes directly into syscall status. For the
garbage collector, it's marked as "scan valid" and not on the rescan
list, but I forgot to also mark it as "scan done". As a result,
gcMarkRootCheck thinks that the goroutine hasn't been scanned and
panics.

This only affects GODEBUG=gccheckmark=1 mode, since we otherwise skip
the gcMarkRootCheck.

Fixes #17473.

Change-Id: I94f5671c42eb44bd5ea7dc68fbf85f0c19e2e52c
Reviewed-on: https://go-review.googlesource.com/31139
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-19 21:36:53 +00:00
Austin Clements
687d9d5d78 runtime: print a message on bad morestack
If morestack runs on the g0 or gsignal stack, it currently performs
some abort operation that typically produces a signal (e.g., it does
an INT $3 on x86). This is useful if you're running in a debugger, but
if you're not, the runtime tries to trap this signal, which is likely
to send the program into a deeper spiral of collapse and lead to very
confusing diagnostic output.

Help out people trying to debug without a debugger by making morestack
print an informative message before blowing up.

Change-Id: I2814c64509b137bfe20a00091d8551d18c2c4749
Reviewed-on: https://go-review.googlesource.com/31133
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-17 18:56:09 +00:00
Austin Clements
9897e40811 runtime: use more go:nowritebarrierrec in proc.go
Currently we use go:nowritebarrier in many places in proc.go.
go:notinheap and go:yeswritebarrierrec now let us use
go:nowritebarrierrec (the recursive form of the go:nowritebarrier
pragma) more liberally. Do so in proc.go.

Change-Id: Ia7fcbc12ce6c51cb24730bf835fb7634ad53462f
Reviewed-on: https://go-review.googlesource.com/30942
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-15 17:58:23 +00:00
Austin Clements
a9e6cebde2 cmd/compile, runtime: add go:yeswritebarrierrec pragma
This pragma cancels the effect of go:nowritebarrierrec. This is useful
in the scheduler because there are places where we enter a function
without a valid P (and hence cannot have write barriers), but then
obtain a P. This allows us to annotate the function with
go:nowritebarrierrec and split out the part after we've obtained a P
into a go:yeswritebarrierrec function.
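Schematically, the split looks like the sketch below. This is illustrative
only: the pragmas are honored only when compiling the runtime package itself,
and the function and type names here are hypothetical.

	// enterWithoutP may be called without a valid P, so no write barriers
	// are allowed in it or anything it calls...
	//
	//go:nowritebarrierrec
	func enterWithoutP() {
		p := acquireP() // hypothetical: obtain a P
		runWithP(p)
	}

	// ...except in the part that runs after a P has been obtained, which is
	// split out and re-enables the write-barrier check.
	//
	//go:yeswritebarrierrec
	func runWithP(p *pstruct) {
		// write barriers are permitted again here
	}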

Change-Id: Ic8ce4b6d3c074a1ecd8280ad90eaf39f0ffbcc2a
Reviewed-on: https://go-review.googlesource.com/30938
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-15 17:58:11 +00:00
Cherry Zhang
7c431cb7f9 cmd/link: insert trampolines for too-far jumps on ARM
ARM direct CALL/JMP instruction has 24 bit offset, which can only
encodes jumps within +/-32M. When the target is too far, the top
bits get truncated and the program jumps wild.

This CL detects too-far jumps and automatically inserts trampolines,
currently only for internal linking on ARM.

It is necessary to make the following changes to the linker:
- Resolve direct jump relocs when assigning addresses to functions.
  this allows trampoline insertion without moving code that has
  already been laid down.
- Lay down packages in dependency order, so that when resolving an
  inter-package direct jump reloc, the target address is already
  known. Intra-package jumps are assumed never too far.
- a linker flag -debugtramp is added for debugging trampolines:
    "-debugtramp=1 -v" prints trampoline debug message
    "-debugtramp=2"    forces all inter-package jump to use
                       trampolines (currently ARM only)
    "-debugtramp=2 -v" does both
- Some data structures are changed for bookkeeping.

On ARM, pseudo DIV/DIVU/MOD/MODU instructions now clobber R8
(unfortunate). In the standard library there is no ARM assembly
code that uses these instructions, and the compiler no longer emits
them (CL 29390).

all.bash passes with -debugtramp=2, except a disassembly test (this
is unavoidable as we changed the instruction).

TBD: debug info of trampolines?

Fixes #17028.

Change-Id: Idcce347ea7e0af77c4079041a160b2f6e114b474
Reviewed-on: https://go-review.googlesource.com/29397
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-11 13:35:33 +00:00
Ian Lance Taylor
d03e8b226c runtime: record current PC for SIGPROF on non-Go thread
If we get a SIGPROF on a non-Go thread, and the program has not called
runtime.SetCgoTraceback so we have no way to collect a stack trace, then
record a profile that is just the PC where the signal occurred. That
will at least point the user to the right area.

Retrieving the PC from the sigctxt in a signal handler on a non-G thread
required marking a number of trivial sigctxt methods as nosplit, and,
for extra safety, nowritebarrierrec.

The test shows that the existing test CgoPprofThread test does not test
the stack trace, just the profile signal. Leaving that for later.

Change-Id: I8f8f3ff09ac099fc9d9df94b5a9d210ffc20c4ab
Reviewed-on: https://go-review.googlesource.com/30252
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-10-11 12:56:15 +00:00
Dmitry Vyukov
c14050646f runtime: fix newextram PC passed to race detector
PC passed to racegostart is expected to be a return PC
of the go statement. Race runtime will subtract 1 from the PC
before symbolization. Passing start PC of a function is wrong.
Add sys.PCQuantum to the function start PC.

Update #17190

Change-Id: Ia504c49e79af84ed4ea360c2aea472b370ea8bf5
Reviewed-on: https://go-review.googlesource.com/29712
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-25 12:15:40 +00:00
David Crawshaw
8607bed744 runtime: avoid dependence on main symbol
For -buildmode=plugin, this lets the linker drop the main.main symbol
out of the binary while including most of the runtime.

(In the future it should be possible to drop the entire runtime
package from plugins.)

Change-Id: I3e7a024ddf5cc945e3d8b84bf37a0b7cb2a00eb6
Reviewed-on: https://go-review.googlesource.com/27821
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-16 14:49:27 +00:00
Josh Bleecher Snyder
2b74de3ed9 runtime: rename fastrand1 to fastrand
Change-Id: I37706ff0a3486827c5b072c95ad890ea87ede847
Reviewed-on: https://go-review.googlesource.com/28210
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-30 23:59:21 +00:00
Keith Randall
0d7a2241cb runtime: update a few comments
noescape is now 0 instructions with the SSA backend.
Fast atomics are no longer a TODO (at least for amd64).

Change-Id: Ib6e06f7471bef282a47ba236d8ce95404bb60a42
Reviewed-on: https://go-review.googlesource.com/28087
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-30 18:16:28 +00:00
David Crawshaw
f135c32640 runtime: initialize hash algs before typemap
When compiling with -buildmode=shared, a map[int32]*_type is created for
each extra module mapping duplicate types back to a canonical object.
This is done in the function typelinksinit, which is called before the
init function that sets up the hash functions for the map
implementation. The result is that typemap becomes unusable after
runtime initialization.

The fix in this CL is to move algorithm init before typelinksinit in
the runtime setup process. (For 1.8, we may want to turn typemap into
a sorted slice of types and use binary search.)

Manually tested on GOOS=linux with:

	GOHOSTARCH=386 GOARCH=386 ./make.bash && \
		go install -buildmode=shared std && \
		cd ../test && \
		go run run.go -linkshared

Fixes #16590

Change-Id: Idc08c50cc70d20028276fbf564509d2cd5405210
Reviewed-on: https://go-review.googlesource.com/25469
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-04 17:39:05 +00:00
Ian Lance Taylor
50048a4e8e runtime: add as many extra M's as needed
When a non-Go thread calls into Go, the runtime needs an M to run the Go
code. The runtime keeps a list of extra M's available. When the last
extra M is allocated, the needextram field is set to tell it to allocate
a new extra M as soon as it is running in Go. This ensures that an extra
M will always be available for the next thread.

However, if many threads need an extra M at the same time, this
serializes them all. One thread will get an extra M with the needextram
field set. All the other threads will see that there is no M available
and will go to sleep. The one thread that succeeded will create a new
extra M. One lucky thread will get it. All the other threads will see
that there is no M available and will go to sleep. The effect is
thundering herd, as all the threads looking for an extra M go through
the process one by one. This seems to have a particularly bad effect on
the FreeBSD scheduler for some reason.

With this change, we track the number of threads waiting for an M, and
create all of them as soon as one thread gets through. This still means
that all the threads will fight for the lock to pick up the next M. But
at least each thread that gets the lock will succeed, instead of going
to sleep only to fight again.

This smooths out the performance greatly on FreeBSD, reducing the
average wall time of `testprogcgo CgoCallbackGC` by 74%.  On GNU/Linux
the average wall time goes down by 9%.

Fixes #13926
Fixes #16396

Change-Id: I6dc42a4156085a7ed4e5334c60b39db8f8ef8fea
Reviewed-on: https://go-review.googlesource.com/25047
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-07-20 13:31:55 +00:00
Ian Lance Taylor
25a609556a runtime: correct printing of blocked field in scheduler trace
When the blocked field was first introduced back in
https://golang.org/cl/61250043 the scheduler trace code incorrectly used
m->blocked instead of mp->blocked.  That has carried through the
conversion to Go.  This CL fixes it.

Change-Id: Id81907b625221895aa5c85b9853f7c185efd8f4b
Reviewed-on: https://go-review.googlesource.com/24571
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-06-29 01:38:39 +00:00
Ian Lance Taylor
84d8aff94c runtime: collect stack trace if SIGPROF arrives on non-Go thread
Fixes #15994.

Change-Id: I5aca91ab53985ac7dcb07ce094ec15eb8ec341f8
Reviewed-on: https://go-review.googlesource.com/23891
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-06-13 21:43:19 +00:00
Russ Cox
7fdec6216c build: enable framepointer mode by default
This has a minor performance cost, but far less than is being gained by SSA.
As an experiment, enable it during the Go 1.7 beta.
Having frame pointers on by default makes Linux's perf, Intel VTune,
and other profilers much more useful, because it lets them gather a
stack trace efficiently on profiling events.
(It doesn't help us that much, since when we walk the stack we usually
need to look up PC-specific information as well.)

Fixes #15840.

Change-Id: I4efd38412a0de4a9c87b1b6e5d11c301e63f1a2a
Reviewed-on: https://go-review.googlesource.com/23451
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-26 19:02:00 +00:00
Ian Lance Taylor
a5d1a72a40 cmd/cgo, runtime, runtime/cgo: TSAN support for malloc
Acquire and release the TSAN synchronization point when calling malloc,
just as we do when calling any other C function. If we don't do this,
TSAN will report false positive errors about races calling malloc and
free.

We used to have a special code path for malloc and free, going through
the runtime functions cmalloc and cfree. The special code path for cfree
was no longer used even before this CL. This CL stops using the special
code path for malloc, because there is no place along that path where we
could conditionally insert the TSAN synchronization. This CL removes
the support for the special code path for both functions.

Instead, cgo now automatically generates the malloc function as though
it were referenced as C.malloc.  We need to automatically generate it
even if C.malloc is not called, even if malloc and size_t are not
declared, to support cgo-provided functions like C.CString.

Change-Id: I829854ec0787a80f33fa0a8a0dc2ee1d617830e2
Reviewed-on: https://go-review.googlesource.com/23260
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-05-25 23:22:24 +00:00
Austin Clements
3be48b4dc8 runtime: pass gcWork to scanstack
Currently scanstack obtains its own gcWork from the P for the duration
of the stack scan and then, if called during mark termination,
disposes the gcWork.

However, this means that the number of workbufs allocated will be at
least the number of stacks scanned during mark termination, which may
be very high (especially during a STW GC). This happens because, in
steady state, each scanstack will obtain a fresh workbuf (either from
the empty list or by allocating it), fill it with the scan results,
and then dispose it to the full list. Nothing is consuming from the
full list during this (and hence nothing is recycling them to the
empty list), so the length of the full list by the time mark
termination starts draining it is at least the number of stacks
scanned.

Fix this by pushing the gcWork acquisition up the stack to either the
gcDrain that calls markroot that calls scanstack (which batches across
many stack scans and is the path taken during STW GC) or to newstack
(which is still a single scanstack call, but this is roughly bounded
by the number of Ps).

This fix reduces the workbuf allocation for the test program from
issue #15319 from 213 MB (roughly 2KB * 1e5 goroutines) to 10 MB.

Fixes #15319.

Note that there's potentially a similar issue in write barriers during
mark 2. Fixing that will be more difficult since there's no broader
non-preemptible context, but it should also be less of a problem since
the full list is being drained during mark 2.

Some overall improvements in the go1 benchmarks, plus the usual noise.
No significant change in the garbage benchmark (time/op or GC memory).

name                      old time/op    new time/op    delta
BinaryTree17-12              2.54s ± 1%     2.51s ± 1%  -1.09%  (p=0.000 n=20+19)
Fannkuch11-12                2.12s ± 0%     2.17s ± 0%  +2.18%  (p=0.000 n=19+18)
FmtFprintfEmpty-12          45.1ns ± 1%    45.2ns ± 0%    ~     (p=0.078 n=19+18)
FmtFprintfString-12          127ns ± 0%     128ns ± 0%  +1.08%  (p=0.000 n=19+16)
FmtFprintfInt-12             125ns ± 0%     122ns ± 1%  -2.71%  (p=0.000 n=14+18)
FmtFprintfIntInt-12          196ns ± 0%     190ns ± 1%  -2.91%  (p=0.000 n=12+20)
FmtFprintfPrefixedInt-12     196ns ± 0%     194ns ± 1%  -0.94%  (p=0.000 n=13+18)
FmtFprintfFloat-12           253ns ± 1%     251ns ± 1%  -0.86%  (p=0.000 n=19+20)
FmtManyArgs-12               807ns ± 1%     784ns ± 1%  -2.85%  (p=0.000 n=20+20)
GobDecode-12                7.13ms ± 1%    7.12ms ± 1%    ~     (p=0.351 n=19+20)
GobEncode-12                5.89ms ± 0%    5.95ms ± 0%  +0.94%  (p=0.000 n=19+19)
Gzip-12                      219ms ± 1%     221ms ± 1%  +1.35%  (p=0.000 n=18+20)
Gunzip-12                   37.5ms ± 1%    37.4ms ± 0%    ~     (p=0.057 n=20+19)
HTTPClientServer-12         81.4µs ± 4%    81.9µs ± 3%    ~     (p=0.118 n=17+18)
JSONEncode-12               15.7ms ± 1%    15.8ms ± 1%  +0.73%  (p=0.000 n=17+18)
JSONDecode-12               57.9ms ± 1%    57.2ms ± 1%  -1.34%  (p=0.000 n=19+19)
Mandelbrot200-12            4.12ms ± 1%    4.10ms ± 0%  -0.33%  (p=0.000 n=19+17)
GoParse-12                  3.22ms ± 2%    3.25ms ± 1%  +0.72%  (p=0.000 n=18+20)
RegexpMatchEasy0_32-12      70.6ns ± 1%    71.1ns ± 2%  +0.63%  (p=0.005 n=19+20)
RegexpMatchEasy0_1K-12       240ns ± 0%     239ns ± 1%  -0.59%  (p=0.000 n=19+20)
RegexpMatchEasy1_32-12      71.3ns ± 1%    71.3ns ± 1%    ~     (p=0.844 n=17+17)
RegexpMatchEasy1_1K-12       384ns ± 2%     371ns ± 1%  -3.45%  (p=0.000 n=19+20)
RegexpMatchMedium_32-12      109ns ± 1%     108ns ± 2%  -0.48%  (p=0.029 n=19+19)
RegexpMatchMedium_1K-12     34.3µs ± 1%    34.5µs ± 2%    ~     (p=0.160 n=18+20)
RegexpMatchHard_32-12       1.79µs ± 9%    1.72µs ± 2%  -3.83%  (p=0.000 n=19+19)
RegexpMatchHard_1K-12       53.3µs ± 4%    51.8µs ± 1%  -2.82%  (p=0.000 n=19+20)
Revcomp-12                   386ms ± 0%     388ms ± 0%  +0.72%  (p=0.000 n=17+20)
Template-12                 62.9ms ± 1%    62.5ms ± 1%  -0.57%  (p=0.010 n=18+19)
TimeParse-12                 325ns ± 0%     331ns ± 0%  +1.84%  (p=0.000 n=18+19)
TimeFormat-12                338ns ± 0%     343ns ± 0%  +1.34%  (p=0.000 n=18+20)
[Geo mean]                  52.7µs         52.5µs       -0.42%

Change-Id: Ib2d34736c4ae2ec329605b0fbc44636038d8d018
Reviewed-on: https://go-review.googlesource.com/23391
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-05-25 21:11:47 +00:00
Austin Clements
91740582c3 runtime: add 'next' flag to ready
Currently ready always puts the readied goroutine in runnext. We're
going to have to change this for some uses, so add a flag for whether
or not to use runnext.

For now we always pass true so this is a no-op change.

For #15706.

Change-Id: Iaa66d8355ccfe4bbe347570cc1b1878c70fa25df
Reviewed-on: https://go-review.googlesource.com/23171
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-05-19 18:17:58 +00:00
Ian Lance Taylor
1f7a0d4b5e runtime: don't do a plain throw when throwsplit == true
The test case in #15639 somehow causes an invalid syscall frame. The
failure is obscured because the throw occurs when throwsplit == true,
which causes a "stack split at bad time" error when trying to print the
throw message.

This CL fixes the "stack split at bad time" by using systemstack. No
test because there shouldn't be any way to trigger this error anyhow.

Update #15639.

Change-Id: I4240f3fd01bdc3c112f3ffd1316b68504222d9e1
Reviewed-on: https://go-review.googlesource.com/23153
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-05-19 04:37:45 +00:00
Ian Lance Taylor
84e808043f runtime: use cgo traceback for SIGPROF
If we collected a cgo traceback when entering the SIGPROF signal
handler, record it as part of the profiling stack trace.

This serves as the promised test for https://golang.org/cl/21055 .

Change-Id: I5f60cd6cea1d9b7c3932211483a6bfab60ed21d2
Reviewed-on: https://go-review.googlesource.com/22650
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-05-04 00:08:19 +00:00
Dmitry Vyukov
caa2147532 runtime: per-P contexts for race detector
The race runtime also needs local malloc caches and currently uses
a mix of per-OS-thread and per-goroutine caches. This leads to
increased memory consumption. But more importantly, the cache of
synchronization objects is per-goroutine and we don't always
have goroutine context when freeing memory in GC. As a result,
synchronization object descriptors leak (more precisely, they
can be reused if another synchronization object is recreated
at the same address, but that does not always help). For example,
the added BenchmarkSyncLeak has effectively runaway memory
consumption (based on a real long-running server).

This change updates race runtime with support for per-P contexts.
BenchmarkSyncLeak now stabilizes at ~1GB memory consumption.

Long term, this will allow us to remove race runtime dependency
on glibc (as malloc is the main cornerstone).

I've also implemented a different scheme to pass the P context to the
race runtime: the scheduler notified the race runtime about the
association between G and P by calling procwire(g, p)/procunwire(g, p).
But it turned out to be very messy, as we have lots of places
where the association changes (e.g. syscalls). So I dropped it
in favor of the current scheme: the race runtime asks the scheduler
for the current P.

Fixes #14533

Change-Id: Iad10d2f816a44affae1b9fed446b3580eafd8c69
Reviewed-on: https://go-review.googlesource.com/19970
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-03 11:00:43 +00:00
Dmitry Vyukov
fcd7c02c70 runtime: fix CPU underutilization
Runqempty is a critical predicate for the scheduler. If runqempty spuriously
returns true, then the scheduler can fail to schedule an arbitrary number of
runnable goroutines on idle Ps for an arbitrarily long time. With the addition
of runnext, the runqempty predicate became broken (it can spuriously return
true). Consider that runnext is not nil and the main array is empty. Runqempty
observes that the array is empty, then it is descheduled for some time.
Then the queue owner pushes another element to the queue, evicting runnext
into the array. Then the queue owner pops runnext. Then runqempty resumes
and observes runnext is nil and returns true. But there was no point
in time when the queue was empty.

Fix the runqempty predicate so it does not spuriously return true.
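A minimal sketch of the race-free shape of such a predicate, with the queue
fields simplified (this is not the runtime's actual data structure): load
head, tail, and runnext, then re-check that tail has not moved before
trusting the combined observation.

	package main

	import "sync/atomic"

	// queue is a simplified stand-in for a P's run queue: a ring described
	// by head/tail plus a single-slot runnext.
	type queue struct {
		head, tail uint32
		runnext    uintptr
	}

	// empty reports whether q has no work. Re-reading tail and retrying if
	// it changed ensures head, tail, and runnext come from one consistent
	// snapshot, so an eviction of runnext into the array between the loads
	// cannot make the queue look empty.
	func (q *queue) empty() bool {
		for {
			head := atomic.LoadUint32(&q.head)
			tail := atomic.LoadUint32(&q.tail)
			next := atomic.LoadUintptr(&q.runnext)
			if tail == atomic.LoadUint32(&q.tail) {
				return head == tail && next == 0
			}
		}
	}

	func main() {
		q := &queue{}
		_ = q.empty()
	}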

Change-Id: Ifb7d75a699101f3ff753c4ce7c983cf08befd31e
Reviewed-on: https://go-review.googlesource.com/20858
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-03 10:06:32 +00:00
Austin Clements
2a889b9d93 runtime: make stack re-scan O(# dirty stacks)
Currently the stack re-scan during mark termination is O(# stacks)
because we enqueue a root marking job for every goroutine. It takes
~34ns to process this root marking job for a valid (clean) stack, so
at around 300k goroutines we exceed the 10ms pause goal. A non-trivial
portion of this time is spent simply taking the cache miss to check
the gcscanvalid flag, so simply optimizing the path that handles clean
stacks can only improve this so much.

Fix this by keeping an explicit list of goroutines with dirty stacks
that need to be rescanned. When a goroutine first transitions to
running after a stack scan and marks its stack dirty, it adds itself
to this list. We enqueue root marking jobs only for the goroutines in
this list, so this improves stack re-scanning asymptotically by
completely eliminating time spent on clean goroutines.

This reduces mark termination time for 500k idle goroutines from 15ms
to 238µs. Overall performance effect is negligible.

name \ 95%ile-time/markTerm     old           new         delta
IdleGs/gs:500000/gomaxprocs:12  15000µs ± 0%  238µs ± 5%  -98.41% (p=0.000 n=10+10)

name              old time/op  new time/op  delta
XBenchGarbage-12  2.30ms ± 3%  2.29ms ± 1%  -0.43%  (p=0.049 n=17+18)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.57s ± 3%     2.59s ± 2%    ~     (p=0.141 n=19+20)
Fannkuch11-12                2.09s ± 0%     2.10s ± 1%  +0.53%  (p=0.000 n=19+19)
FmtFprintfEmpty-12          45.3ns ± 3%    45.2ns ± 2%    ~     (p=0.845 n=20+20)
FmtFprintfString-12          129ns ± 0%     127ns ± 0%  -1.55%  (p=0.000 n=16+16)
FmtFprintfInt-12             123ns ± 0%     119ns ± 1%  -3.24%  (p=0.000 n=19+19)
FmtFprintfIntInt-12          195ns ± 1%     189ns ± 1%  -3.11%  (p=0.000 n=17+17)
FmtFprintfPrefixedInt-12     193ns ± 1%     187ns ± 1%  -3.06%  (p=0.000 n=19+19)
FmtFprintfFloat-12           254ns ± 0%     255ns ± 1%  +0.35%  (p=0.001 n=14+17)
FmtManyArgs-12               781ns ± 0%     770ns ± 0%  -1.48%  (p=0.000 n=16+19)
GobDecode-12                7.00ms ± 1%    6.98ms ± 1%    ~     (p=0.563 n=19+19)
GobEncode-12                5.91ms ± 1%    5.92ms ± 0%    ~     (p=0.118 n=19+18)
Gzip-12                      219ms ± 1%     215ms ± 1%  -1.81%  (p=0.000 n=18+18)
Gunzip-12                   37.2ms ± 0%    37.4ms ± 0%  +0.45%  (p=0.000 n=17+19)
HTTPClientServer-12         76.9µs ± 3%    77.5µs ± 2%  +0.81%  (p=0.030 n=20+19)
JSONEncode-12               15.0ms ± 0%    14.8ms ± 1%  -0.88%  (p=0.001 n=15+19)
JSONDecode-12               50.6ms ± 0%    53.2ms ± 2%  +5.07%  (p=0.000 n=17+19)
Mandelbrot200-12            4.05ms ± 0%    4.05ms ± 1%    ~     (p=0.581 n=16+17)
GoParse-12                  3.34ms ± 1%    3.30ms ± 1%  -1.21%  (p=0.000 n=15+20)
RegexpMatchEasy0_32-12      69.6ns ± 1%    69.8ns ± 2%    ~     (p=0.566 n=19+19)
RegexpMatchEasy0_1K-12       238ns ± 1%     236ns ± 0%  -0.91%  (p=0.000 n=17+13)
RegexpMatchEasy1_32-12      69.8ns ± 1%    70.0ns ± 1%  +0.23%  (p=0.026 n=17+16)
RegexpMatchEasy1_1K-12       371ns ± 1%     363ns ± 1%  -2.07%  (p=0.000 n=19+19)
RegexpMatchMedium_32-12      107ns ± 2%     106ns ± 1%  -0.51%  (p=0.031 n=18+20)
RegexpMatchMedium_1K-12     33.0µs ± 0%    32.9µs ± 0%  -0.30%  (p=0.004 n=16+16)
RegexpMatchHard_32-12       1.70µs ± 0%    1.70µs ± 0%  +0.45%  (p=0.000 n=16+17)
RegexpMatchHard_1K-12       51.1µs ± 2%    51.4µs ± 1%  +0.53%  (p=0.000 n=17+19)
Revcomp-12                   378ms ± 1%     385ms ± 1%  +1.92%  (p=0.000 n=19+18)
Template-12                 64.3ms ± 2%    65.0ms ± 2%  +1.09%  (p=0.001 n=19+19)
TimeParse-12                 315ns ± 1%     317ns ± 2%    ~     (p=0.108 n=18+20)
TimeFormat-12                360ns ± 1%     337ns ± 0%  -6.30%  (p=0.000 n=18+13)
[Geo mean]                  51.8µs         51.6µs       -0.48%

Change-Id: Icf8994671476840e3998236e15407a505d4c760c
Reviewed-on: https://go-review.googlesource.com/20700
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:40:13 +00:00
Austin Clements
5b765ce310 runtime: don't clear gcscanvalid in casfrom_Gscanstatus
Currently we clear gcscanvalid in both casgstatus and
casfrom_Gscanstatus if the new status is _Grunning. This is very
important to do in casgstatus. However, this is potentially wrong in
casfrom_Gscanstatus because in this case the caller doesn't own gp and
hence the write is racy. Unlike the other _Gscan statuses, during
_Gscanrunning, the G is still running. This does not indicate that
it's transitioning into a running state. The scan simply hasn't
happened yet, so it's neither valid nor invalid.

Conveniently, this also means clearing gcscanvalid is unnecessary in
this case because the G was already in _Grunning, so we can simply
remove this code. What will happen instead is that the G will be
preempted to scan itself, that scan will set gcscanvalid to true, and
then the G will return to _Grunning via casgstatus, clearing
gcscanvalid.

This fix will become necessary shortly when we start keeping track of
the set of G's with dirty stacks, since it will no longer be
idempotent to simply set gcscanvalid to false.

Change-Id: I688c82e6fbf00d5dbbbff49efa66acb99ee86785
Reviewed-on: https://go-review.googlesource.com/20669
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:40:10 +00:00
Austin Clements
c707d83856 runtime: fix typos in comment about gcscanvalid
Change-Id: Id4ad7ebf88a21eba2bc5714b96570ed5cfaed757
Reviewed-on: https://go-review.googlesource.com/22210
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:40:07 +00:00
Austin Clements
1a2cf91f5e runtime: split gfree list into with-stacks and without-stacks
Currently all free Gs are added to one list. Split this into two
lists: one for free Gs with cached stacks and one for Gs without
cached stacks.

This lets us preferentially allocate Gs that already have a stack, but
more importantly, it sets us up to free cached G stacks concurrently.

Change-Id: Idbe486f708997e1c9d166662995283f02d1eeb3c
Reviewed-on: https://go-review.googlesource.com/20664
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:39:51 +00:00
Dmitry Vyukov
a3703618ea runtime: use per-goroutine sequence numbers in tracer
Currently the tracer uses a global sequencer, which introduces
significant slowdown on parallel machines (up to 10x).
Replace the global sequencer with a per-goroutine sequencer.

If we assign per-goroutine sequence numbers to only 3 types
of events (start, unblock and syscall exit), it is enough to
restore consistent partial ordering of all events. Even these
events don't need sequence numbers all the time (if goroutine
starts on the same P where it was unblocked, then start does
not need sequence number).
The burden of restoring the order is put on trace parser.
Details of the algorithm are described in the comments.

On http benchmark with GOMAXPROCS=48:
no tracing: 5026 ns/op
tracing: 27803 ns/op (+453%)
with this change: 6369 ns/op (+26%, mostly for traceback)

Also trace size is reduced by ~22%. Average event size before: 4.63
bytes/event, after: 3.62 bytes/event.

Besides running trace tests, I've also tested with manually broken
cputicks (random skew for each event, per-P skew and episodic random skew).
In all cases the broken timestamps were detected and there were no test failures.

Change-Id: I078bde421ccc386a66f6c2051ab207bcd5613efa
Reviewed-on: https://go-review.googlesource.com/21512
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-23 15:57:05 +00:00
David Crawshaw
7d469179e6 cmd/compile, etc: store method tables as offsets
This CL introduces the typeOff type and a lookup method of the same
name that can turn a typeOff offset into an *rtype.

In a typical Go binary (built with buildmode=exe, pie, c-archive, or
c-shared), there is one moduledata and all typeOff values are offsets
relative to firstmoduledata.types. This makes computing the pointer
cheap in typical programs.

With buildmode=shared (and one day, buildmode=plugin) there are
multiple modules whose relative offset is determined at runtime.
We identify a type in the general case by the pair of the original
*rtype that references it and its typeOff value. We determine
the module from the original pointer, and then use the typeOff from
there to compute the final *rtype.

To ensure there is only one *rtype representing each type, the
runtime initializes a typemap for each module, using any identical
type from an earlier module when resolving that offset. This means
that types computed from an offset match the type mapped by the
pointer dynamic relocations.

A series of followup CLs will replace other *rtype values with typeOff
(and name/*string with nameOff).

For types created at runtime by reflect, type offsets are treated as
global IDs and reference into a reflect offset map kept by the runtime.
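In the common single-module case the lookup is just an addition relative to
the module's type data; a simplified sketch (the types and helper here are
illustrative, not the runtime's definitions):

	package main

	import "unsafe"

	// rtype and moduledata stand in for the runtime's structures.
	type rtype struct{ size uintptr }

	type typeOff int32

	type moduledata struct {
		types uintptr // base address of this module's type information
	}

	// resolveTypeOff turns an offset back into a *rtype by adding it to the
	// module's types base. With multiple modules, the module would first be
	// chosen using the *rtype that references the offset.
	func resolveTypeOff(md *moduledata, off typeOff) *rtype {
		return (*rtype)(unsafe.Pointer(md.types + uintptr(off)))
	}

	func main() {
		_ = resolveTypeOff
	}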

darwin/amd64:
	cmd/go:  -57KB (0.6%)
	jujud:  -557KB (0.8%)

linux/amd64 PIE:
	cmd/go: -361KB (3.0%)
	jujud:  -3.5MB (4.2%)

For #6853.

Change-Id: Icf096fd884a0a0cb9f280f46f7a26c70a9006c96
Reviewed-on: https://go-review.googlesource.com/21285
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-13 13:03:11 +00:00
Emmanuel Odeke
e4f1d9cf2e runtime: make execution error panic values implement the Error interface
Make execution panics implement Error as
mandated by https://golang.org/ref/spec#Run_time_panics,
instead of panicking with strings.
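For example, after this change a recovered run-time panic value satisfies
both error and runtime.Error:

	package main

	import (
		"fmt"
		"runtime"
	)

	func main() {
		defer func() {
			if r := recover(); r != nil {
				// The panic value is not a plain string: it implements
				// error (and runtime.Error).
				if err, ok := r.(runtime.Error); ok {
					fmt.Println("runtime error:", err.Error())
					return
				}
				fmt.Println("other panic:", r)
			}
		}()

		var s []int
		_ = s[3] // out-of-range index triggers a run-time panic
	}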

Fixes #14965

Change-Id: I7827f898b9b9c08af541db922cc24fa0800ff18a
Reviewed-on: https://go-review.googlesource.com/21214
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-10 01:16:30 +00:00
Michael Hudson-Doyle
31cf1c1779 runtime: clamp OS-reported number of processors to _MaxGomaxprocs
So that all Go processes do not die on startup on a system with >256 CPUs.

I tested this by hacking osinit to set ncpu to 1000.

Updates #15131

Change-Id: I52e061a0de97be41d684dd8b748fa9087d6f1aef
Reviewed-on: https://go-review.googlesource.com/21599
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-07 00:11:25 +00:00
Dmitry Vyukov
475d113b53 runtime: don't burn CPU unnecessarily
Two GC-related functions, scang and casgstatus, wait in an active spin loop.
Active spinning is never a good idea in user-space. Once we wait several
times more than the expected wait time, something unexpected is happening
(e.g. the thread we are waiting for is descheduled or handling a page fault)
and we need to yield to the OS scheduler. Moreover, the expected wait time is
very high for these functions: scang wait time can be tens of milliseconds,
casgstatus can be hundreds of microseconds. It does not make sense to spin
even for that time.

go install -a std profile on a 4-core machine shows that 11% of time is spent
in the active spin in scang:

  6.12%    compile  compile                [.] runtime.scang
  3.27%    compile  compile                [.] runtime.readgstatus
  1.72%    compile  compile                [.] runtime/internal/atomic.Load

The active spin also increases tail latency in the case of the slightest
oversubscription: GC goroutines spend whole quantum in the loop instead of
executing user code.

Here is scang wait time histogram during go install -a std:

13707.0000 - 1815442.7667 [   118]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎...
1815442.7667 - 3617178.5333 [     9]: ∎∎∎∎∎∎∎∎∎
3617178.5333 - 5418914.3000 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
5418914.3000 - 7220650.0667 [     5]: ∎∎∎∎∎
7220650.0667 - 9022385.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
9022385.8333 - 10824121.6000 [    13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
10824121.6000 - 12625857.3667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
12625857.3667 - 14427593.1333 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
14427593.1333 - 16229328.9000 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
16229328.9000 - 18031064.6667 [    32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
18031064.6667 - 19832800.4333 [    28]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
19832800.4333 - 21634536.2000 [     6]: ∎∎∎∎∎∎
21634536.2000 - 23436271.9667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
23436271.9667 - 25238007.7333 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
25238007.7333 - 27039743.5000 [    27]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
27039743.5000 - 28841479.2667 [    20]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
28841479.2667 - 30643215.0333 [    10]: ∎∎∎∎∎∎∎∎∎∎
30643215.0333 - 32444950.8000 [     7]: ∎∎∎∎∎∎∎
32444950.8000 - 34246686.5667 [     4]: ∎∎∎∎
34246686.5667 - 36048422.3333 [     4]: ∎∎∎∎
36048422.3333 - 37850158.1000 [     1]: ∎
37850158.1000 - 39651893.8667 [     5]: ∎∎∎∎∎
39651893.8667 - 41453629.6333 [     2]: ∎∎
41453629.6333 - 43255365.4000 [     2]: ∎∎
43255365.4000 - 45057101.1667 [     2]: ∎∎
45057101.1667 - 46858836.9333 [     1]: ∎
46858836.9333 - 48660572.7000 [     2]: ∎∎
48660572.7000 - 50462308.4667 [     3]: ∎∎∎
50462308.4667 - 52264044.2333 [     2]: ∎∎
52264044.2333 - 54065780.0000 [     2]: ∎∎

and the zoomed-in first part:

13707.0000 - 19916.7667 [     2]: ∎∎
19916.7667 - 26126.5333 [     2]: ∎∎
26126.5333 - 32336.3000 [     9]: ∎∎∎∎∎∎∎∎∎
32336.3000 - 38546.0667 [     8]: ∎∎∎∎∎∎∎∎
38546.0667 - 44755.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
44755.8333 - 50965.6000 [    10]: ∎∎∎∎∎∎∎∎∎∎
50965.6000 - 57175.3667 [     5]: ∎∎∎∎∎
57175.3667 - 63385.1333 [     6]: ∎∎∎∎∎∎
63385.1333 - 69594.9000 [     5]: ∎∎∎∎∎
69594.9000 - 75804.6667 [     6]: ∎∎∎∎∎∎
75804.6667 - 82014.4333 [     6]: ∎∎∎∎∎∎
82014.4333 - 88224.2000 [     4]: ∎∎∎∎
88224.2000 - 94433.9667 [     1]: ∎
94433.9667 - 100643.7333 [     1]: ∎
100643.7333 - 106853.5000 [     2]: ∎∎
106853.5000 - 113063.2667 [     0]:
113063.2667 - 119273.0333 [     2]: ∎∎
119273.0333 - 125482.8000 [     2]: ∎∎
125482.8000 - 131692.5667 [     1]: ∎
131692.5667 - 137902.3333 [     1]: ∎
137902.3333 - 144112.1000 [     0]:
144112.1000 - 150321.8667 [     2]: ∎∎
150321.8667 - 156531.6333 [     1]: ∎
156531.6333 - 162741.4000 [     1]: ∎
162741.4000 - 168951.1667 [     0]:
168951.1667 - 175160.9333 [     0]:
175160.9333 - 181370.7000 [     1]: ∎
181370.7000 - 187580.4667 [     1]: ∎
187580.4667 - 193790.2333 [     2]: ∎∎
193790.2333 - 200000.0000 [     0]:

Here is casgstatus wait time histogram:

  631.0000 -  5276.6333 [     3]: ∎∎∎
 5276.6333 -  9922.2667 [     5]: ∎∎∎∎∎
 9922.2667 - 14567.9000 [     2]: ∎∎
14567.9000 - 19213.5333 [     6]: ∎∎∎∎∎∎
19213.5333 - 23859.1667 [     5]: ∎∎∎∎∎
23859.1667 - 28504.8000 [     6]: ∎∎∎∎∎∎
28504.8000 - 33150.4333 [     6]: ∎∎∎∎∎∎
33150.4333 - 37796.0667 [     2]: ∎∎
37796.0667 - 42441.7000 [     1]: ∎
42441.7000 - 47087.3333 [     3]: ∎∎∎
47087.3333 - 51732.9667 [     0]:
51732.9667 - 56378.6000 [     1]: ∎
56378.6000 - 61024.2333 [     0]:
61024.2333 - 65669.8667 [     0]:
65669.8667 - 70315.5000 [     0]:
70315.5000 - 74961.1333 [     1]: ∎
74961.1333 - 79606.7667 [     0]:
79606.7667 - 84252.4000 [     0]:
84252.4000 - 88898.0333 [     0]:
88898.0333 - 93543.6667 [     0]:
93543.6667 - 98189.3000 [     0]:
98189.3000 - 102834.9333 [     0]:
102834.9333 - 107480.5667 [     1]: ∎
107480.5667 - 112126.2000 [     0]:
112126.2000 - 116771.8333 [     0]:
116771.8333 - 121417.4667 [     0]:
121417.4667 - 126063.1000 [     0]:
126063.1000 - 130708.7333 [     0]:
130708.7333 - 135354.3667 [     0]:
135354.3667 - 140000.0000 [     1]: ∎

Ideally we would eliminate the waiting by switching to an async
state machine for GC, but for now just yield to the OS scheduler
after a reasonable wait time.
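The general shape — spin briefly on the expectation of a short wait, then
yield — is sketched below at user level, with runtime.Gosched standing in for
the runtime-internal yield to the OS (names and limits here are illustrative):

	package main

	import (
		"runtime"
		"sync/atomic"
	)

	// waitFor spins for a bounded number of iterations, which is cheap when
	// the wait is short, and then starts yielding so that a descheduled or
	// page-faulting peer does not cost us a full quantum of busy-waiting.
	func waitFor(flag *uint32) {
		const spinLimit = 100
		for i := 0; atomic.LoadUint32(flag) == 0; i++ {
			if i < spinLimit {
				continue // active spin: the expected wait is short
			}
			runtime.Gosched() // yield instead of burning CPU
		}
	}

	func main() {
		var flag uint32
		go func() { atomic.StoreUint32(&flag, 1) }()
		waitFor(&flag)
	}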

To choose yielding parameters I've measured
golang.org/x/benchmarks/http tail latencies with different yield
delays and oversubscription levels.

With no oversubscription (to the degree possible):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50   1.41ms ±15%  1.41ms ± 5%    ~     (p=0.611 n=13+12)
Latency-95   5.21ms ± 2%  5.15ms ± 2%  -1.15%  (p=0.012 n=13+13)
Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.54%  (p=0.002 n=13+13)
Latency-999  10.7ms ± 9%  10.2ms ±10%  -5.46%  (p=0.004 n=12+13)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50   1.41ms ±15%  1.41ms ± 8%    ~     (p=0.511 n=13+13)
Latency-95   5.21ms ± 2%  5.14ms ± 2%  -1.23%  (p=0.006 n=13+13)
Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.94%  (p=0.000 n=13+13)
Latency-999  10.7ms ± 9%  10.1ms ± 8%  -6.14%  (p=0.000 n=12+13)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50   1.41ms ±15%  1.45ms ± 6%    ~     (p=0.724 n=13+13)
Latency-95   5.21ms ± 2%  5.18ms ± 1%    ~     (p=0.287 n=13+13)
Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.64%  (p=0.002 n=13+13)
Latency-999  10.7ms ± 9%  10.0ms ± 5%  -6.72%  (p=0.000 n=12+13)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50   1.41ms ±15%  1.51ms ± 7%  +6.57%  (p=0.002 n=13+13)
Latency-95   5.21ms ± 2%  5.21ms ± 2%    ~     (p=0.960 n=13+13)
Latency-99   7.16ms ± 2%  7.06ms ± 2%  -1.50%  (p=0.012 n=13+13)
Latency-999  10.7ms ± 9%  10.0ms ± 6%  -6.49%  (p=0.000 n=12+13)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50   1.41ms ±15%  1.53ms ± 6%  +8.48%  (p=0.000 n=13+12)
Latency-95   5.21ms ± 2%  5.23ms ± 2%    ~     (p=0.287 n=13+13)
Latency-99   7.16ms ± 2%  7.08ms ± 2%  -1.21%  (p=0.004 n=13+13)
Latency-999  10.7ms ± 9%   9.9ms ± 3%  -7.99%  (p=0.000 n=12+12)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50   1.41ms ±15%  1.47ms ± 5%    ~     (p=0.072 n=13+13)
Latency-95   5.21ms ± 2%  5.17ms ± 2%    ~     (p=0.091 n=13+13)
Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.99%  (p=0.000 n=13+13)
Latency-999  10.7ms ± 9%   9.9ms ± 5%  -7.86%  (p=0.000 n=12+13)

With slight oversubscription (another instance of http benchmark
was running in background with reduced GOMAXPROCS):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50    840µs ± 3%   804µs ± 3%  -4.37%  (p=0.000 n=15+18)
Latency-95   6.52ms ± 4%  6.03ms ± 4%  -7.51%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.0ms ± 4%  -7.33%  (p=0.000 n=18+14)
Latency-999  18.0ms ± 9%  16.8ms ± 7%  -6.84%  (p=0.000 n=18+18)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50    840µs ± 3%   809µs ± 3%  -3.71%  (p=0.000 n=15+17)
Latency-95   6.52ms ± 4%  6.11ms ± 4%  -6.29%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%   9.9ms ± 6%  -7.55%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.5ms ±11%  -8.49%  (p=0.000 n=18+18)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50    840µs ± 3%   823µs ± 5%  -2.06%  (p=0.002 n=15+18)
Latency-95   6.52ms ± 4%  6.32ms ± 3%  -3.05%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 4%  -5.22%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.7ms ±10%  -7.09%  (p=0.000 n=18+18)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50    840µs ± 3%   836µs ± 5%    ~     (p=0.442 n=15+18)
Latency-95   6.52ms ± 4%  6.39ms ± 3%  -2.00%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 6%  -5.15%  (p=0.000 n=18+17)
Latency-999  18.0ms ± 9%  16.6ms ± 8%  -7.48%  (p=0.000 n=18+18)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50    840µs ± 3%   836µs ± 6%    ~     (p=0.401 n=15+18)
Latency-95   6.52ms ± 4%  6.40ms ± 4%  -1.79%  (p=0.010 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 5%  -4.95%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.5ms ±14%  -8.17%  (p=0.000 n=18+18)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50    840µs ± 3%   828µs ± 2%  -1.49%  (p=0.001 n=15+17)
Latency-95   6.52ms ± 4%  6.38ms ± 4%  -2.04%  (p=0.001 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 4%  -4.77%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.9ms ± 9%  -6.23%  (p=0.000 n=18+18)

With significant oversubscription (background http benchmark
was running with full GOMAXPROCS):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50   1.32ms ±12%  1.30ms ±13%    ~     (p=0.454 n=14+14)
Latency-95   16.3ms ±10%  15.3ms ± 7%  -6.29%  (p=0.001 n=14+14)
Latency-99   29.4ms ±10%  27.9ms ± 5%  -5.04%  (p=0.001 n=14+12)
Latency-999  49.9ms ±19%  45.9ms ± 5%  -8.00%  (p=0.008 n=14+13)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50   1.32ms ±12%  1.29ms ± 9%    ~     (p=0.227 n=14+14)
Latency-95   16.3ms ±10%  15.4ms ± 5%  -5.27%  (p=0.002 n=14+14)
Latency-99   29.4ms ±10%  27.9ms ± 6%  -5.16%  (p=0.001 n=14+14)
Latency-999  49.9ms ±19%  46.8ms ± 8%  -6.21%  (p=0.050 n=14+14)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50   1.32ms ±12%  1.35ms ± 9%     ~     (p=0.401 n=14+14)
Latency-95   16.3ms ±10%  15.0ms ± 4%   -7.67%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.4ms ± 5%   -6.98%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.7ms ± 5%  -10.56%  (p=0.000 n=14+11)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50   1.32ms ±12%  1.36ms ±10%     ~     (p=0.246 n=14+14)
Latency-95   16.3ms ±10%  14.9ms ± 5%   -8.31%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.4ms ± 7%   -6.70%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.9ms ±15%  -10.13%  (p=0.003 n=14+14)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50   1.32ms ±12%  1.41ms ± 9%  +6.37%  (p=0.008 n=14+13)
Latency-95   16.3ms ±10%  15.1ms ± 8%  -7.45%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.5ms ±12%  -6.67%  (p=0.002 n=14+14)
Latency-999  49.9ms ±19%  45.9ms ±16%  -8.06%  (p=0.019 n=14+14)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50   1.32ms ±12%  1.42ms ±10%   +7.21%  (p=0.003 n=14+14)
Latency-95   16.3ms ±10%  15.0ms ± 7%   -7.59%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.3ms ± 8%   -7.20%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.8ms ± 8%  -10.21%  (p=0.001 n=14+13)

All numbers are on 8 cores and with GOGC=10 (http benchmark has
tiny heap, few goroutines and low allocation rate, so by default
GC barely affects tail latency).

10us/5us yield delays seem to provide a reasonable compromise
and give 5-10% tail latency reduction. That's what is used in this change.

go install -a std results on 4 core machine:

name      old time/op  new time/op  delta
Time       8.39s ± 2%   7.94s ± 2%  -5.34%  (p=0.000 n=47+49)
UserTime   24.6s ± 2%   22.9s ± 2%  -6.76%  (p=0.000 n=49+49)
SysTime    1.77s ± 9%   1.89s ±11%  +7.00%  (p=0.000 n=49+49)
CpuLoad    315ns ± 2%   313ns ± 1%  -0.59%  (p=0.000 n=49+48) # %CPU
MaxRSS    97.1ms ± 4%  97.5ms ± 9%    ~     (p=0.838 n=46+49) # bytes

Update #14396
Update #14189

Change-Id: I3f4109bf8f7fd79b39c466576690a778232055a2
Reviewed-on: https://go-review.googlesource.com/21503
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-04-05 15:52:03 +00:00