cybercyst/go - go - Gitea: Git with a cup of tea

mirror of https://github.com/golang/go.git synced 2025-05-05 23:53:05 +00:00

Author	SHA1	Message	Date
Michael Pratt	7b874619be	runtime/cgo: store M for C-created thread in pthread key This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and CL 485316 incorporated. CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go calls by binding the M to the C thread. See below for its description. CL 482975 is a followup fix to a C declaration in testprogcgo. CL 485315 is a followup fix for x_cgo_getstackbound on Illumos. CL 485316 is a followup cleanup for ppc64 assembly. [Original CL 481061 description] This reapplies CL 392854, with the followup fixes in CL 479255, CL 479915, and CL 481057 incorporated. CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go calls by binding the M to the C thread. See below for its description. CL 479255 is a followup fix for a small bug in ARM assembly code. CL 479915 is another followup fix to address C to Go calls after the C code uses some stack, but that CL is also buggy. CL 481057, by Michael Knyszek, is a followup fix for a memory leak bug of CL 479915. [Original CL 392854 description] In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls. So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call. Instead, we only dropm while the C thread exits, so the extra M won't leak. When invoking a Go function from C: Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor. And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits. When returning back to C: Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C. This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows. This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread. For the newly added BenchmarkCGoInCThread, some benchmark results: 1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz 2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz [CL 479915 description] Currently, when C calls into Go the first time, we grab an M using needm, which sets m.g0's stack bounds using the SP. We don't know how big the stack is, so we simply assume 32K. Previously, when the Go function returns to C, we drop the M, and the next time C calls into Go, we put a new stack bound on the g0 based on the current SP. After CL 392854, we don't drop the M, and the next time C calls into Go, we reuse the same g0, without recomputing the stack bounds. If the C code uses quite a bit of stack space before calling into Go, the SP may be well below the 32K stack bound we assumed, so the runtime thinks the g0 stack overflows. This CL makes needm get a more accurate stack bound from pthread. (In some platforms this may still be a guess as we don't know exactly where we are in the C stack), but it is probably better than simply assuming 32K. [CL 485500 description] CL 479915 passed the G to _cgo_getstackbound for direct updates to gp.stack.lo. A G can be reused on a new thread after the previous thread exited. This could trigger the C TSAN race detector because it couldn't see the synchronization in Go (lockextra) preventing the same G from being used on multiple threads at the same time. We work around this by passing the address of a stack variable to _cgo_getstackbound rather than the G. The stack is generally unique per thread, so TSAN won't see the same address from multiple threads. Even if stacks are reused across threads by pthread, C TSAN should see the synchonization in the stack allocator. A regression test is added to misc/cgo/testsanitizer. Fixes #51676. Fixes #59294. Fixes #59678. Change-Id: Ic62be31a06ee83568215e875a891df37084e08ca Reviewed-on: https://go-review.googlesource.com/c/go/+/485500 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Michael Pratt <mpratt@google.com>	2023-04-26 19:25:46 +00:00
Michael Pratt	94850c6f79	Revert "runtime/cgo: store M for C-created thread in pthread key" This reverts CL 481061. Reason for revert: When built with C TSAN, x_cgo_getstackbound triggers race detection on `g->stacklo` because the synchronization is in Go, which isn't instrumented. For #51676. For #59294. For #59678. Change-Id: I38afcda9fcffd6537582a39a5214bc23dc147d47 Reviewed-on: https://go-review.googlesource.com/c/go/+/485275 TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> Reviewed-by: Than McIntosh <thanm@google.com>	2023-04-17 18:47:08 +00:00
Michael Pratt	0e9b2bc39a	Revert "runtime/cgo: use pthread_attr_get_np on Illumos" This reverts CL 481795. Reason for revert: CL 481061 causes C TSAN failures and must be reverted. See CL 485275. This CL depends on CL 481061. For #59678. Change-Id: I5ec1f495154205ebdf19cd44c6e6452a7a3606f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/485315 Auto-Submit: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Pratt <mpratt@google.com> Reviewed-by: Than McIntosh <thanm@google.com>	2023-04-17 18:43:56 +00:00
Cherry Mui	ad87a124be	runtime/cgo: use pthread_attr_get_np on Illumos While Solaris supports pthread_getattr_np, Illumos doesn't... Instead, Illumos supports pthread_attr_get_np. Updates #59294. Change-Id: I2c66dad79b8bf3d510352875bf21d04415f23eeb Reviewed-on: https://go-review.googlesource.com/c/go/+/481795 TryBot-Bypass: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2023-04-04 03:37:07 +00:00
doujiang24	ccad8a9f9c	runtime/cgo: store M for C-created thread in pthread key This reapplies CL 392854, with the followup fixes in CL 479255, CL 479915, and CL 481057 incorporated. CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go calls by binding the M to the C thread. See below for its description. CL 479255 is a followup fix for a small bug in ARM assembly code. CL 479915 is another followup fix to address C to Go calls after the C code uses some stack, but that CL is also buggy. CL 481057, by Michael Knyszek, is a followup fix for a memory leak bug of CL 479915. [Original CL 392854 description] In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls. So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call. Instead, we only dropm while the C thread exits, so the extra M won't leak. When invoking a Go function from C: Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor. And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits. When returning back to C: Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C. This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows. This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread. For the newly added BenchmarkCGoInCThread, some benchmark results: 1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz 2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz [CL 479915 description] Currently, when C calls into Go the first time, we grab an M using needm, which sets m.g0's stack bounds using the SP. We don't know how big the stack is, so we simply assume 32K. Previously, when the Go function returns to C, we drop the M, and the next time C calls into Go, we put a new stack bound on the g0 based on the current SP. After CL 392854, we don't drop the M, and the next time C calls into Go, we reuse the same g0, without recomputing the stack bounds. If the C code uses quite a bit of stack space before calling into Go, the SP may be well below the 32K stack bound we assumed, so the runtime thinks the g0 stack overflows. This CL makes needm get a more accurate stack bound from pthread. (In some platforms this may still be a guess as we don't know exactly where we are in the C stack), but it is probably better than simply assuming 32K. Fixes #51676. Fixes #59294. Change-Id: I9bf1400106d5c08ce621d2ed1df3a2d9e3f55494 Reviewed-on: https://go-review.googlesource.com/c/go/+/481061 Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: DeJiang Zhu (doujiang) <doujiang24@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-04-03 18:34:11 +00:00
Cherry Mui	63ef9059a2	Revert "runtime: get a better g0 stack bound in needm" This reverts CL 479915. Reason for revert: breaks a lot google internal tests. Change-Id: I13a9422e810af7ba58cbf4a7e6e55f4d8cc0ca51 Reviewed-on: https://go-review.googlesource.com/c/go/+/481055 Reviewed-by: Chressie Himpel <chressie@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-03-31 18:42:48 +00:00
Cherry Mui	443eb9757c	runtime: get a better g0 stack bound in needm Currently, when C calls into Go the first time, we grab an M using needm, which sets m.g0's stack bounds using the SP. We don't know how big the stack is, so we simply assume 32K. Previously, when the Go function returns to C, we drop the M, and the next time C calls into Go, we put a new stack bound on the g0 based on the current SP. After CL 392854, we don't drop the M, and the next time C calls into Go, we reuse the same g0, without recomputing the stack bounds. If the C code uses quite a bit of stack space before calling into Go, the SP may be well below the 32K stack bound we assumed, so the runtime thinks the g0 stack overflows. This CL makes needm get a more accurate stack bound from pthread. (In some platforms this may still be a guess as we don't know exactly where we are in the C stack), but it is probably better than simply assuming 32K. For #59294. Change-Id: Ie52a8f931e0648d8753e4c1dbe45468b8748b527 Reviewed-on: https://go-review.googlesource.com/c/go/+/479915 Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2023-03-30 23:23:55 +00:00

7 Commits