Cleanup and friction reduction
For #65355.
Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511
Reviewed-on: https://go-review.googlesource.com/c/go/+/600436
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
When using frame pointer unwinding, we defer frame skipping and inline
expansion for call stacks until profile reporting time. We can end up
with records which have different stacks if no frames are skipped, but
identical stacks once skipping is taken into account. Returning multiple
records with the same stack (but different values) has broken programs
which rely on the records already being fully aggregated by call stack
when returned from runtime.MutexProfile. This CL addresses the problem
by handling skipping at recording time. We do full inline expansion to
correctly skip the desired number of frames when recording the call
stack, and then handle the rest of inline expansion when reporting the
profile.
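To make the broken assumption concrete, here is a minimal sketch (not taken from the report) of the kind of consumer that expects runtime.MutexProfile to return at most one record per call stack; the aggregation map is purely illustrative:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        runtime.SetMutexProfileFraction(1) // sample every contention event

        // Grow the buffer until every record fits, as documented for MutexProfile.
        var records []runtime.BlockProfileRecord
        for {
            n, ok := runtime.MutexProfile(records)
            if ok {
                records = records[:n]
                break
            }
            records = make([]runtime.BlockProfileRecord, n+50)
        }

        // Consumers like this assume the runtime has already aggregated by call
        // stack; duplicate stacks would silently overwrite or double-count here.
        byStack := make(map[string]runtime.BlockProfileRecord)
        for _, r := range records {
            byStack[fmt.Sprint(r.Stack())] = r
        }
        fmt.Println("unique stacks:", len(byStack))
    }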
The regression test in this CL is adapted from the reproducer in
https://github.com/grafana/pyroscope-go/issues/103, authored by Tolya
Korniltsev.
Fixes #67548
This reapplies CL 595966.
The original version of this CL introduced a bounds error in
MutexProfile and failed to correctly expand inlined frames from that
call. This CL applies the original CL, fixing the bounds error and
adding a test for the output of MutexProfile to ensure the frames are
expanded properly.
Change-Id: I5ef8aafb9f88152a704034065c0742ba767c4dbb
Reviewed-on: https://go-review.googlesource.com/c/go/+/598515
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
When using frame pointer unwinding, we defer frame skipping and inline
expansion for call stacks until profile reporting time. We can end up
with records which have different stacks if no frames are skipped, but
identical stacks once skipping is taken into account. Returning multiple
records with the same stack (but different values) has broken programs
which rely on the records already being fully aggregated by call stack
when returned from runtime.MutexProfile. This CL addresses the problem
by handling skipping at recording time. We do full inline expansion to
correctly skip the desired number of frames when recording the call
stack, and then handle the rest of inline expansion when reporting the
profile.
The regression test in this CL is adapted from the reproducer in
https://github.com/grafana/pyroscope-go/issues/103, authored by Tolya
Korniltsev.
Fixes #67548
Co-Authored-By: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Change-Id: I6a42ce612377f235b2c8c0cec9ba8e9331224b84
Reviewed-on: https://go-review.googlesource.com/c/go/+/595966
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
We have an optimization that if the memory profile is not consumed
anywhere, we set the memory profiling rate to 0 to disable the
"background" low-rate profiling. We detect whether the memory
profile is used by checking whether the runtime.MemProfile function
is reachable at link time. Previously, all APIs that accessed the
memory profile went through runtime.MemProfile. But the code was
refactored in CL 572396, and now the legacy entry point
WriteHeapProfile uses pprof_memProfileInternal without going
through runtime.MemProfile. In fact, even with the recommended
runtime/pprof.Profile API (pprof.Lookup or pprof.Profiles),
runtime.MemProfile is reachable only incidentally, through countHeap.
Change the linker to check runtime.memProfileInternal instead,
which is on all code paths that retrieve the memory profile. Add
a test case for WriteHeapProfile, so we cover all entry points.
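For reference, a minimal sketch of the legacy entry point whose code path changed; with the old link-time check, merely calling WriteHeapProfile no longer kept runtime.MemProfile live, so the background profiling rate was wrongly disabled:

    package main

    import (
        "os"
        "runtime/pprof"
    )

    func main() {
        f, err := os.Create("heap.pprof")
        if err != nil {
            panic(err)
        }
        defer f.Close()

        // Legacy entry point: after CL 572396 this reaches
        // pprof_memProfileInternal without going through runtime.MemProfile,
        // which is why the linker check had to move to the internal function.
        if err := pprof.WriteHeapProfile(f); err != nil {
            panic(err)
        }
    }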
Fixes #68136.
Change-Id: I075c8d45c95c81825a1822f032e23107aea4303c
Reviewed-on: https://go-review.googlesource.com/c/go/+/596538
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This reverts commit be0b569caa0eab1a7f30edf64e550bbf5f6ff235 (CL 585635).
Reason for revert: This is part of a patch series that changed the
handling of contended lock2/unlock2 calls, reducing the maximum
throughput of contended runtime.mutex values, and causing a performance
regression on applications where that is (or became) the bottleneck.
Updates #66999
Updates #67585
Change-Id: I7843ccaecbd273b7ceacfa0f420dd993b4b15a0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/589117
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This reverts commit d881ed6384ae58154d99682f1e20160c64e7c3c2 (CL 585637).
Reason for revert: This is part of a patch series that changed the
handling of contended lock2/unlock2 calls, reducing the maximum
throughput of contended runtime.mutex values, and causing a performance
regression on applications where that is (or became) the bottleneck.
Updates #66999
Updates #67585
Change-Id: I70d8d0b74f73be95c43d664f584e8d98519aba26
Reviewed-on: https://go-review.googlesource.com/c/go/+/589116
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This reverts commit ba1c5b2c4573e10f3b5f0e0f25a27f17fba67eb0 (CL 585638).
Reason for revert: This is part of a patch series that changed the
handling of contended lock2/unlock2 calls, reducing the maximum
throughput of contended runtime.mutex values, and causing a performance
regression on applications where that is (or became) the bottleneck.
Updates #66999
Updates #67585
Change-Id: Ibeec5d8deb17e87966cf352fefc7efe2267839d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/589115
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This reverts commit 87e930f7289136fad1310d4b63dd4127e409bac5 (CL 585639).
Reason for revert: This is part of a patch series that changed the
handling of contended lock2/unlock2 calls, reducing the maximum
throughput of contended runtime.mutex values, and causing a performance
regression on applications where that is (or became) the bottleneck.
Updates #66999
Updates #67585
Change-Id: I1e286d2a16d16e4af202cd5dc04b2d9c4ee71b32
Reviewed-on: https://go-review.googlesource.com/c/go/+/589097
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
This reverts commit 8ab131fb1256a4a795c610e145c022e22e2d1567 (CL 586796).
Reason for revert: This is part of a patch series that changed the
handling of contended lock2/unlock2 calls, reducing the maximum
throughput of contended runtime.mutex values, and causing a performance
regression on applications where that is (or became) the bottleneck.
Updates #66999
Updates #67585
Change-Id: I54711691e86e072081482102019d168292b5150a
Reviewed-on: https://go-review.googlesource.com/c/go/+/589095
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Mutex contention measurements work with two clocks: nanotime for use in
runtime/metrics, and cputicks for the runtime/pprof profile. They're
subject to different sampling rates: the runtime/metrics view is always
enabled, but the profile is adjustable and is turned off by default.
They have different levels of overhead: it can take as little as one
instruction to read cputicks while nanotime calls are more elaborate
(although some platforms implement cputicks as a nanotime call). The use
of the timestamps is also different: the profile's view needs to attach
the delay in some Ms' lock2 calls to another M's unlock2 call stack, but
the metric's view is only an int64.
Treat them differently. Don't bother threading the nanotime clock
through to the unlock2 call; instead, measure and report it directly
within lock2. Sample nanotime at a constant gTrackingPeriod.
Don't consult any clocks unless the mutex is actually contended.
Continue liberal use of cputicks for now.
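As a rough illustration of the two views, assuming the always-on view referred to here is the /sync/mutex/wait/total:seconds total exposed by runtime/metrics:

    package main

    import (
        "fmt"
        "runtime"
        "runtime/metrics"
    )

    func main() {
        // Always-on view: a single accumulated number, measured with nanotime
        // internally, with no call stacks attached.
        s := []metrics.Sample{{Name: "/sync/mutex/wait/total:seconds"}}
        metrics.Read(s)
        fmt.Println("total mutex wait:", s[0].Value.Float64(), "seconds")

        // Sampled view: must be enabled explicitly, attributes delay to call
        // stacks, and therefore wants the cheaper cputicks clock.
        runtime.SetMutexProfileFraction(5)
    }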
For #66999
Change-Id: I1c2085ea0e695bfa90c30fadedc99ced9eb1f69e
Reviewed-on: https://go-review.googlesource.com/c/go/+/586796
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Go 1.22 promised to remove the runtimecontentionstacks setting in a
future release once the semantics of runtime-internal lock contention
matched that of sync.Mutex. That work is done, so remove the setting.
For #66999
Change-Id: I3c4894148385adf2756d8754e44d7317305ad758
Reviewed-on: https://go-review.googlesource.com/c/go/+/585639
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
When an M's use of a lock causes delays in other Ms, capture the stack
of the unlock call that caused the delay. This makes the meaning of the
mutex profile for runtime-internal mutexes match the behavior for
sync.Mutex: the profile points to the end of the critical section that
is responsible for delaying other work.
Fixes #66999
Change-Id: I4abc4a1df00a48765d29c07776481a1cbd539ff8
Reviewed-on: https://go-review.googlesource.com/c/go/+/585638
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
When an M unlocks a contended mutex, it needs to consult a list of the
Ms that had to wait during its critical section. This allows the M to
attribute the appropriate amount of blame to the unlocking call stack.
Mirroring the implementation for users' sync.Mutex contention (via
sudog), we can (in a future commit) use the time that the head and tail
of the wait list started waiting, and the number of waiters, to estimate
the sum of the Ms' delays.
When an M acquires the mutex, it needs to remove itself from the list of
waiters. Since the futex-based lock implementation leaves the OS in
control of the order of M wakeups, we need to be able to remove any M
from the list quickly, in constant time.
First, have each M add itself to a singly-linked wait list when it finds
that its lock call will need to sleep. This case is safe against
live-lock, since any delay to one M adding itself to the list would be
due to another M making durable progress.
Second, have the M that holds the lock (either right before releasing,
or right after acquiring) update metadata on the list of waiting Ms to
double-link the list and maintain a tail pointer and waiter count. That
work is amortized-constant: we'll avoid contended locks becoming
proportionally more contended and undergoing performance collapse.
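A conceptual sketch of the two phases, with made-up types; the runtime's real implementation keeps these links in per-M fields inside lock2/unlock2 rather than in a separate list type:

    package waitlist

    // waiter stands in for the per-M wait-list metadata.
    type waiter struct {
        next, prev *waiter
    }

    type waitList struct {
        head, tail *waiter
        count      int
    }

    // Phase 1: a sleeping M links itself onto the head with a single store.
    func (l *waitList) push(w *waiter) {
        w.next = l.head
        l.head = w
    }

    // Phase 2: the M that holds the lock fills in prev pointers, the tail and
    // the waiter count. (The real code only touches the newly added prefix so
    // the work stays amortized-constant; this sketch rebuilds everything.)
    func (l *waitList) fixup() {
        l.count = 0
        l.tail = nil
        var prev *waiter
        for w := l.head; w != nil; w = w.next {
            w.prev = prev
            prev = w
            l.tail = w
            l.count++
        }
    }

    // remove unlinks an arbitrary waiter in constant time once fixup has run,
    // which matters because the OS decides which M wakes first.
    func (l *waitList) remove(w *waiter) {
        if w.prev != nil {
            w.prev.next = w.next
        } else {
            l.head = w.next
        }
        if w.next != nil {
            w.next.prev = w.prev
        } else {
            l.tail = w.prev
        }
        l.count--
    }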
For #66999
Change-Id: If75cdea915afb59ccec47294e0b52c466aac8736
Reviewed-on: https://go-review.googlesource.com/c/go/+/585637
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Move the nextwaitm field into a small struct, in preparation for
additional metadata to track how long Ms need to wait for locks.
For #66999
Change-Id: Ib40e43c15cde22f7e35922641107973d99439ecd
Reviewed-on: https://go-review.googlesource.com/c/go/+/585635
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Allow users to decrease the profiling stack depth back to 32 in case
they experience any problems with the new default of 128.
Users may also use this option to increase the depth up to 1024.
Change-Id: Ieaab2513024915a223239278dd97a6e161dde1cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/581917
Reviewed-by: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
The current stack depth limit of 32 for the alloc, mutex, block,
threadcreate and goroutine profiles frequently leads to truncated stack
traces in production applications. Increase the limit to 128, which is the same
size used by the execution tracer.
Create internal/profilerecord to define variants of the runtime's
StackRecord, MemProfileRecord and BlockProfileRecord types that can hold
arbitrarily big stack traces. Implement internal profiling APIs based on
these new types and use them for creating protobuf profiles and to act
as shims for the public profiling APIs using the old types.
This will lead to an increase in memory usage for applications that
use the impacted profile types and have stack traces exceeding the
current limit of 32. Those applications will also experience a slight
increase in CPU usage, but this will hopefully soon be mitigated via CL
540476 and 533258 which introduce frame pointer unwinding for the
relevant profile types.
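A small demonstration of the kind of program that hits the old limit; with 32 frames the leaf of the recursion was cut off in the heap profile, while 128 frames keeps it (the recursion depth and allocation size here are arbitrary):

    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
    )

    var sink []byte

    // deep builds a call stack well past the old 32-frame limit, then allocates.
    //
    //go:noinline
    func deep(n int) {
        if n == 0 {
            sink = make([]byte, 1<<20)
            return
        }
        deep(n - 1)
    }

    func main() {
        runtime.MemProfileRate = 1 // sample every allocation for the demo
        deep(64)

        f, _ := os.Create("heap.pprof")
        defer f.Close()
        pprof.Lookup("heap").WriteTo(f, 0)
    }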
For #43669.
Change-Id: Ie53762e65d0f6295f5d4c7d3c87172d5a052164e
Reviewed-on: https://go-review.googlesource.com/c/go/+/572396
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Previously it was possible for mutex and block profile stack traces to
contain up to 32 frames in Stack0 or the resulting pprof profiles.
CL 533258 changed this behavior by using some of the space to
record skipped frames that are discarded when performing delayed inline
expansion. This has lowered the effective maximum stack size from 32 to
27 (the max skip value is 5), which can be seen as a small regression.
Add TestProfilerStackDepth to demonstrate the issue and protect all
profile types from similar regressions in the future. Fix the issue by
increasing the internal maxStack limit to take the maxSkip value into
account. Assert that the maxSkip value is never exceeded when recording
mutex and block profile stack traces.
Three alternative solutions to the problem were considered and
discarded:
1) Revert CL 533258 and give up on frame pointer unwinding. This seems
unappealing as we would lose the performance benefits of frame
pointer unwinding.
2) Discard skipped frames when recording the initial stack trace. This
would require eager inline expansion for up to maxSkip frames and
partially negate the performance benefits of frame pointer
unwinding.
3) Accept and document the new behavior. This would simplify the
implementation, but seems more confusing from a user perspective. It
also complicates the creation of test cases that make assertions
about the maximum profiling stack depth.
The execution tracer still has the same issue due to CL 463835. This
should be addressed in a follow-up CL.
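The arithmetic behind the fix, with illustrative names; the actual constants live in runtime/mprof.go and these identifiers are assumptions:

    package sketch

    // Illustrative only; names and layout are assumptions.
    const (
        maxSkip     = 5  // deepest skip requested by mutex/block profiling call sites
        publicDepth = 32 // frames the public API can return (Stack0)
    )

    // Before the fix the internal buffer held publicDepth frames, so after
    // discarding up to maxSkip skipped frames only 32-5 = 27 useful frames
    // could survive. Sizing the internal buffer at publicDepth+maxSkip keeps
    // the full 32 frames available after skipping.
    const internalDepth = publicDepth + maxSkip // 37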
Co-authored-by: Nick Ripley <nick.ripley@datadoghq.com>
Change-Id: Ibf4dbf08a5166c9cb32470068c69f58bc5f98d2c
Reviewed-on: https://go-review.googlesource.com/c/go/+/586657
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Use frame pointer unwinding, where supported, to collect call stacks for
the block, and mutex profilers. This method of collecting call stacks is
typically an order of magnitude faster than callers/tracebackPCs. The
marginal benefit for these profile types is likely small compared to
using frame pointer unwinding for the execution tracer. However, the
block profiler can have noticeable overhead unless the sampling rate is
very high. Additionally, using frame pointer unwinding in more places
helps ensure more testing/support, which benefits systems like the
execution tracer which rely on frame pointer unwinding to be practical
to use.
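For context, the affected profilers are the ones enabled like this; a high sampling frequency (rate 1) is exactly where cheaper stack collection matters most:

    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
    )

    func main() {
        runtime.SetBlockProfileRate(1)     // record every blocking event
        runtime.SetMutexProfileFraction(1) // record every contention event

        // ... run the workload under test ...

        f, _ := os.Create("block.pprof")
        defer f.Close()
        pprof.Lookup("block").WriteTo(f, 0)
    }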
Change-Id: I4b36c90cd2df844645fd275a41b247352d635727
Reviewed-on: https://go-review.googlesource.com/c/go/+/533258
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Move profiling pc buffers from being stack allocated to an m field.
This is motivated by the next patch, which will increase the default
stack depth to 128, which might lead to undesirable stack growth for
goroutines that produce profiling events.
Additionally, this change paves the way to make the stack depth
configurable via GODEBUG.
Change-Id: Ifa407f899188e2c7c0a81de92194fdb627cb4b36
Reviewed-on: https://go-review.googlesource.com/c/go/+/574699
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
allocfreetrace prints all allocations and frees to stderr. It's not
terribly useful: its overhead is so high that it is not
feasible to use except for the most trivial programs. A follow-up CL
will replace it with something that is both more thorough and also lower
overhead.
Change-Id: I1d668fee8b6aaef5251a5aea3054ec2444d75eb6
Reviewed-on: https://go-review.googlesource.com/c/go/+/583376
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
CL 464349 added a new linkname to provide gcount to runtime/pprof to
avoid a STW when estimating the goroutine profile allocation size.
However, adding a linkname here isn't necessary for a few reasons:
1. We already export gcount via runtime.NumGoroutine. I completely forgot about
this during review.
2. aktau suggested that goroutineProfileWithLabelsConcurrent return
gcount as a fast path estimate when the input is empty.
The second point keeps the code cleaner overall, so I've done that.
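The exported counter referred to in point 1 is runtime.NumGoroutine; a sketch of the equivalent public-API pattern for sizing a goroutine-profile buffer (the slack of 10 is arbitrary):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // runtime.NumGoroutine already exposes gcount, so no linkname is needed
        // to estimate the goroutine-profile buffer size without a STW.
        estimate := runtime.NumGoroutine()
        buf := make([]runtime.StackRecord, estimate+10)
        n, ok := runtime.GoroutineProfile(buf)
        fmt.Println(n, ok)
    }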
For #54014.
Change-Id: I6cb0811a769c805e269b55774cdd43509854078e
Reviewed-on: https://go-review.googlesource.com/c/go/+/559515
Auto-Submit: Michael Pratt <mpratt@google.com>
Auto-Submit: Nicolas Hillegeer <aktau@google.com>
Reviewed-by: Nicolas Hillegeer <aktau@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
profileruntimelocks is new in CL 544195, but the name is deceptive. Even
with profileruntimelocks=0, runtime-internal locks are still profiled.
The actual difference is that call stacks are not collected. Instead all
contention is reported at runtime._LostContendedLock.
Rename this setting to runtimecontentionstacks to make its name more
aligned with its behavior.
In addition, for this release the default is profileruntimelocks=0,
meaning that users are fairly likely to encounter
runtime._LostContendedLock. Rename it to
runtime._LostContendedRuntimeLock in an attempt to make it more
intuitive that these are runtime locks, not locks in application code.
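With the renamed setting, collecting runtime-lock contention stacks looks like this; without the GODEBUG, the same contention still appears, but under the runtime._LostContendedRuntimeLock sentinel:

    // Run as: GODEBUG=runtimecontentionstacks=1 ./prog
    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
    )

    func main() {
        // Runtime-internal locks are sampled at the same fraction as sync.Mutex.
        runtime.SetMutexProfileFraction(5)

        // ... workload ...

        f, _ := os.Create("mutex.pprof")
        defer f.Close()
        pprof.Lookup("mutex").WriteTo(f, 0)
    }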
For #57071.
Change-Id: I38aac28b2c0852db643d53b1eab3f3bc42a43393
Reviewed-on: https://go-review.googlesource.com/c/go/+/547055
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
Add runtime-internal locks to the mutex contention profile.
Store up to one call stack responsible for lock contention on the M,
until it's safe to contribute its value to the mprof table. Try to use
that limited local storage space for a relatively large source of
contention, and attribute any contention in stacks we're not able to
store to a sentinel _LostContendedLock function.
Avoid ballooning lock contention while manipulating the mprof table by
attributing to that sentinel function any lock contention experienced
while reporting lock contention.
Guard collecting real call stacks with GODEBUG=profileruntimelocks=1,
since the available data has mixed semantics; we can easily capture an
M's own wait time, but we'd prefer for the profile entry of each
critical section to describe how long it made the other Ms wait. It's
too late in the Go 1.22 cycle to make the required changes to
futex-based locks. When not enabled, attribute the time to the sentinel
function instead.
Fixes #57071
This is a roll-forward of https://go.dev/cl/528657, which was reverted
in https://go.dev/cl/543660
Reason for revert: de-flakes tests (reduces dependence on fine-grained
timers, correctly identifies contention on big-endian futex locks,
attempts to measure contention in the semaphore implementation but only
uses that secondary measurement to finish the test early, skips tests on
single-processor systems)
Change-Id: I31389f24283d85e46ad9ba8d4f514cb9add8dfb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/544195
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Rhys Hiltner <rhys@justin.tv>
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
This reverts commit go.dev/cl/528657.
Reason for revert: broke a lot of builders.
Change-Id: I70c33062020e997c4df67b3eaa2e886cf0da961e
Reviewed-on: https://go-review.googlesource.com/c/go/+/543660
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Add runtime-internal locks to the mutex contention profile.
Store up to one call stack responsible for lock contention on the M,
until it's safe to contribute its value to the mprof table. Try to use
that limited local storage space for a relatively large source of
contention, and attribute any contention in stacks we're not able to
store to a sentinel _LostContendedLock function.
Avoid ballooning lock contention while manipulating the mprof table by
attributing to that sentinel function any lock contention experienced
while reporting lock contention.
Guard collecting real call stacks with GODEBUG=profileruntimelocks=1,
since the available data has mixed semantics; we can easily capture an
M's own wait time, but we'd prefer for the profile entry of each
critical section to describe how long it made the other Ms wait. It's
too late in the Go 1.22 cycle to make the required changes to
futex-based locks. When not enabled, attribute the time to the sentinel
function instead.
Fixes #57071
Change-Id: I3eee0ccbfc20f333b56f20d8725dfd7f3a526b41
Reviewed-on: https://go-review.googlesource.com/c/go/+/528657
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Than McIntosh <thanm@google.com>
This CL adds four new time histogram metrics:
/sched/pauses/stopping/gc:seconds
/sched/pauses/stopping/other:seconds
/sched/pauses/total/gc:seconds
/sched/pauses/total/other:seconds
The "stopping" metrics measure the time taken to start a stop-the-world
pause, i.e., how long it takes stopTheWorldWithSema to stop all Ps. This
can be used to detect STWs that are struggling to preempt Ps.
The "total" metrics measure the total duration of a stop-the-world
pause, from the start of stopping the world until the world is started
again. This includes the time spent in the "stopping" phase.
The "gc" metrics are used for GC-related STW pauses. The "other" metrics
are used for all other STW pauses.
All of these metrics start timing in stopTheWorldWithSema only after
successfully acquiring sched.lock, thus excluding lock contention on
sched.lock. The reasoning behind this is that while waiting on
sched.lock the world is not stopped at all (all other Ps can run), so
the impact of this contention is primarily limited to the goroutine
attempting to stop-the-world. Additionally, we already have some
visibility into sched.lock contention via contention profiles (#57071).
/sched/pauses/total/gc:seconds is conceptually equivalent to
/gc/pauses:seconds, so the latter is marked as deprecated and returns
the same histogram as the former.
In the implementation, there are a few minor differences:
* For both mark and sweep termination stops, /gc/pauses:seconds started
timing prior to calling startTheWorldWithSema, thus including lock
contention.
These details are minor enough that I do not believe the slight change
in reporting will matter. For mark termination stops, moving the end of
timing into startTheWorldWithSema does have the side effect of requiring
other GC metric calculations to move outside of the STW, as they depend
on the same end time.
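Reading the new histograms from application code is straightforward; this sketch just counts how many pauses each histogram has recorded:

    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        samples := []metrics.Sample{
            {Name: "/sched/pauses/stopping/gc:seconds"},
            {Name: "/sched/pauses/stopping/other:seconds"},
            {Name: "/sched/pauses/total/gc:seconds"},
            {Name: "/sched/pauses/total/other:seconds"},
        }
        metrics.Read(samples)

        for _, s := range samples {
            h := s.Value.Float64Histogram()
            var n uint64
            for _, c := range h.Counts {
                n += c
            }
            fmt.Printf("%-42s %d pauses\n", s.Name, n)
        }
    }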
Fixes #63340
Change-Id: Iacd0bab11bedab85d3dcfb982361413a7d9c0d05
Reviewed-on: https://go-review.googlesource.com/c/go/+/534161
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Currently tickspersecond forces a 100 millisecond sleep the first time
it's called. This isn't great for profiling short-lived programs, since
both CPU profiling and block profiling might call into it.
100 milliseconds is a long time, but it's chosen to try and capture a
decent estimate of the conversion on platforms with coarse-granularity
clocks. If the granularity is 15 ms, it'll only be 15% off at worst.
Let's try a different strategy. First, let's require 5 milliseconds of
time to have elapsed at a minimum. This should be plenty on platforms
with nanosecond time granularity from the system clock, provided the
caller of tickspersecond intends to use it for calculating durations,
not timestamps. Next, grab a timestamp as close to process start as
possible, so that we can cover some of that 5 millisecond just during
runtime start.
Finally, this function is only ever called from normal goroutine
contexts. Let's do a regular goroutine sleep instead of a thread-level
sleep under a runtime lock, which has all sorts of nasty effects on
preemption.
While we're here, let's also rename tickspersecond to ticksPerSecond.
Also, let's write down some explicit rules of thumb on when to use this
function. Clocks are hard, and using this for timestamp conversion is
likely to make lining up those timestamps with other clocks on the
system difficult if not impossible.
Note that while this improves ticksPerSecond on platforms with good
clocks, we still end up with a pretty coarse sleep on platforms with
coarse clocks, and a pretty coarse result. On these platforms, keep the
minimum required elapsed time at 100 ms. There's not much we can do
about these platforms except spin and try to catch the clock boundary,
but at 10+ ms of granularity, that might be a lot of spinning.
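A userspace sketch of the estimation strategy described above; readTicks is a hypothetical stand-in for the runtime's cputicks, and the real ticksPerSecond caches its result rather than recomputing it:

    package ticks

    import "time"

    var (
        startTime  = time.Now() // captured as close to process start as possible
        startTicks = readTicks()
    )

    const minElapsed = 5 * time.Millisecond // 100ms on coarse-clock platforms

    // perSecond sleeps as an ordinary goroutine (not under a runtime lock)
    // until enough wall time has passed for the ratio to be usable for
    // converting tick deltas into durations.
    func perSecond() int64 {
        for {
            elapsed := time.Since(startTime)
            if elapsed >= minElapsed {
                return int64(float64(readTicks()-startTicks) / elapsed.Seconds())
            }
            time.Sleep(minElapsed - elapsed)
        }
    }

    // readTicks is hypothetical; a real implementation would read the CPU's
    // tick counter (e.g. the TSC on amd64).
    func readTicks() int64 { return time.Now().UnixNano() }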
Fixes #63103.
Fixes #63078.
Change-Id: Ic32a4ba70a03bdf5c13cb80c2669c4064aa4cca2
Reviewed-on: https://go-review.googlesource.com/c/go/+/538898
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
The goroutine profile has close to three code paths for adding a
goroutine record to the goroutine profile: one for the goroutine that
requested the profile, one for every other goroutine, plus some special
handling for the finalizer goroutine. The first of those captured the
goroutine stack, but neglected to include that goroutine's labels.
Update the tests to check for the inclusion of labels for all three
types of goroutines, and include labels for the creator of the goroutine
profile.
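A minimal sketch of the scenario the fix covers: the goroutine that requests the profile carries labels itself, and those labels now appear on its own record too:

    package main

    import (
        "context"
        "os"
        "runtime/pprof"
    )

    func main() {
        pprof.Do(context.Background(), pprof.Labels("requester", "true"), func(ctx context.Context) {
            // This goroutine both carries labels and requests the profile;
            // previously its record had a stack but no labels.
            f, _ := os.Create("goroutine.pprof")
            defer f.Close()
            pprof.Lookup("goroutine").WriteTo(f, 0)
        })
    }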
For #63712
Change-Id: Id5387a5f536d3c37268c240e0b6db3d329a3d632
Reviewed-on: https://go-review.googlesource.com/c/go/+/537515
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: David Chase <drchase@google.com>
Change-Id: If93b6cfa5a598a5f4101c879a0cd88a194e4a6aa
Reviewed-on: https://go-review.googlesource.com/c/go/+/518116
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Andy Pan <panjf2000@gmail.com>
The pprof mutex profile was meant to match the Google C++ (now Abseil)
mutex profiler, originally designed and implemented by Mike Burrows.
When we worked on the Go version, pjw and I missed that C++ counts the
time each thread is blocked, even if multiple threads are blocked on a
mutex. That is, if 100 threads are blocked on the same mutex for the
same 10ms, that still counts as 1000ms of contention in C++. In Go, to
date, /debug/pprof/mutex has counted that as only 10ms of contention.
If 100 goroutines are blocked on one mutex and only 1 goroutine is
blocked on another mutex, we probably do want to see the first mutex
as being more contended, so the Abseil approach is the more useful one.
This CL adopts "contention scales with number of goroutines blocked",
to better match Abseil [1]. However, it still makes sure to attribute the
time to the unlock that caused the backup, not subsequent innocent
unlocks that were affected by the congestion. In this way it still gives
more accurate profiles than Abseil does.
[1] https://github.com/abseil/abseil-cpp/blob/lts_2023_01_25/absl/synchronization/mutex.cc#L2390
Fixes #61015.
Change-Id: I7eb9e706867ffa8c0abb5b26a1b448f6eba49331
Reviewed-on: https://go-review.googlesource.com/c/go/+/506415
Run-TryBot: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Currently STW events are only emitted for GC STWs. There's little reason
why the trace can't contain events for every STW: they're rare so don't
take up much space in the trace, yet being able to see when the world
was stopped is often critical to debugging certain latency issues,
especially when they stem from user-level APIs.
This change adds new "kinds" to the EvGCSTWStart event, renames the
GCSTW events to just "STW," and lets the parser deal with unknown STW
kinds for future backwards compatibility.
But, this change must break trace compatibility, so it bumps the trace
version to Go 1.21.
This change also includes a small cleanup in the trace command, which
previously checked for STW events when deciding whether user tasks
overlapped with a GC. Looking at the source, I don't see a way for STW
events to ever enter the stream that that code looks at, so that
condition has been deleted.
Change-Id: I9a5dc144092c53e92eb6950e9a5504a790ac00cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/494495
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Currently, filling PC traceback buffers is one of the jobs of
gentraceback. This moves it into a new function, tracebackPCs, with a
simple API built around unwinder, and changes all callers to use this
new API.
Updates #54466.
Change-Id: Id2038bded81bf533a5a4e71178a7c014904d938c
Reviewed-on: https://go-review.googlesource.com/c/go/+/468300
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Samples in the mutex profile have their count and duration scaled
according to the probability they were sampled. This is done when the
profile is actually requested. The adjustment is done using the
currently configured sampling rate. However, if the sample rate is changed
after a specific sample is recorded, then the sample will be scaled
incorrectly. In particular, if the sampling rate is changed to 0, all of
the samples in the encoded profile will have 0 count and duration. This
means the profile will be "empty", even if it should have had samples.
This CL scales the samples in the profile when they are recorded, rather
than when the profile is requested. This matches what is currently done
for the block profile.
With this change, neither the block profile nor mutex profile are scaled
when they are encoded, so the logic for scaling the samples can be
removed.
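A sketch of the behavior this fixes, using only public APIs; with record-time scaling, lowering the rate to 0 after the contention no longer zeroes out the already-recorded samples:

    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
        "sync"
        "time"
    )

    func main() {
        // Roughly 1 in 10 contention events is sampled; each sample is now
        // scaled by 10 at the moment it is recorded.
        runtime.SetMutexProfileFraction(10)

        var mu sync.Mutex
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                mu.Lock()
                time.Sleep(time.Millisecond)
                mu.Unlock()
            }()
        }
        wg.Wait()

        // Turning sampling off before encoding used to zero every sample;
        // with record-time scaling the profile keeps the contention above.
        runtime.SetMutexProfileFraction(0)

        f, _ := os.Create("mutex.pprof")
        defer f.Close()
        pprof.Lookup("mutex").WriteTo(f, 0)
    }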
Change-Id: If228cf39284385aa8fb9a2d62492d839e02f027f
Reviewed-on: https://go-review.googlesource.com/c/go/+/443056
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Use an atomic.Uint32 to represent the state of the finalizer goroutine.
fingStatus will only be changed to fingWake in a non-fingWait state,
so it is safe to set the fingRunningFinalizer status in runfinq.
name            old time/op  new time/op  delta
Finalizer-8      592µs ± 4%   561µs ± 1%  -5.22%  (p=0.000 n=10+10)
FinalizerRun-8   694ns ± 6%   675ns ± 7%     ~    (p=0.059 n=9+8)
Change-Id: I7e4da30cec98ce99f7d8cf4c97f933a8a2d1cae1
Reviewed-on: https://go-review.googlesource.com/c/go/+/400134
Reviewed-by: Joedian Reid <joedian@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Michael Pratt <mpratt@google.com>
"finializer" => "finalizer"
Change-Id: Ia3c12bff8556b6a8d33b700c59357f47502757b1
GitHub-Last-Rev: c64cf47974020c8480039ba61d0890bdc07a3b0f
GitHub-Pull-Request: golang/go#53917
Reviewed-on: https://go-review.googlesource.com/c/go/+/417915
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Change-Id: Iee18987c495d1d4bde9da888d454eea8079d3ebc
GitHub-Last-Rev: ff5e01599ddf7deb3ab6ce190ba92eb02ae2cb15
GitHub-Pull-Request: golang/go#52949
Reviewed-on: https://go-review.googlesource.com/c/go/+/406915
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
If SetFinalizer is never called, we might readgstatus on a nil fing
variable, resulting in a crash. This change guards code that accesses
fing by a nil check.
Fixes #52821.
Change-Id: I3e8e7004f97f073dc622e801a1d37003ea153a29
Reviewed-on: https://go-review.googlesource.com/c/go/+/405475
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
The profiles for memory allocations, sync.Mutex contention, and general
blocking store their data in a shared hash table. The bookkeeping work
at the end of a garbage collection cycle involves maintenance on each
memory allocation record. Previously, a single lock guarded access to
the hash table and the contents of all records. When a program has
allocated memory at a large number of unique call stacks, the
maintenance following every garbage collection can hold that lock for
several milliseconds. That can prevent progress on all other goroutines
by delaying acquirep's call to mcache.prepareForSweep, which needs the
lock in mProf_Free to report when a profiled allocation is no longer in
use. With no user goroutines making progress, it is in effect a
multi-millisecond GC-related stop-the-world pause.
Split the lock so the call to mProf_Flush no longer delays each P's call
to mProf_Free: mProf_Free uses a lock on the memory records' N+1 cycle,
and mProf_Flush uses locks on the memory records' accumulator and their
N cycle. mProf_Malloc also no longer competes with mProf_Flush, as it
uses a lock on the memory records' N+2 cycle. The profiles for
sync.Mutex contention and general blocking now share a separate lock,
and another lock guards insertions to the shared hash table (uncommon in
the steady-state). Consumers of each type of profile take the matching
accumulator lock, so will observe consistent count and magnitude values
for each record.
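A conceptual sketch of the resulting lock layout, using made-up names and sync.Mutex as a stand-in for the runtime's internal mutex type; the real declarations in runtime/mprof.go differ:

    package mprofsketch

    import "sync"

    // Which path takes which lock (conceptual mapping from the description above):
    //   - inserting a brand-new record into the hash table: insertLock (rare)
    //   - block and sync.Mutex contention records:          blockLock
    //   - mProf_Malloc, recording into cycle N+2:           memFutureLock[(c+2)%3]
    //   - mProf_Free, marking frees in cycle N+1:           memFutureLock[(c+1)%3]
    //   - mProf_Flush, publishing cycle N:                  memActiveLock, memFutureLock[c%3]
    var (
        insertLock    sync.Mutex
        blockLock     sync.Mutex
        memActiveLock sync.Mutex
        memFutureLock [3]sync.Mutex
    )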
For #45894
Change-Id: I615ff80618d10e71025423daa64b0b7f9dc57daa
Reviewed-on: https://go-review.googlesource.com/c/go/+/399956
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>