This change modifies the semantics of waking the scavenger: rather than
wake on any update to pacing, wake when we know we will have work to do,
that is, when the sweeper is done. The current scavenger runs over the
address space just once per GC cycle, and we want to maximize the chance
that the scavenger observes the most attractive scavengable memory in
that pass (i.e. free memory with the highest address), so the timing is
important. By having the scavenger awaken and reset its search space
when the sweeper is done, we increase the chance that the scavenger will
observe the most attractive scavengable memory, because no more memory
will be freed that GC cycle (so the highest scavengable address should
now be available).
Furthermore, in applications that go idle, this means the background
scavenger will be awoken even if another GC doesn't happen, which isn't
true today.
However, we're unable to wake the scavenger directly from within the
sweeper; waking the scavenger involves modifying timers and readying
goroutines, the latter of which may trigger an allocation today (and the
sweeper may run during allocation!). Instead, we do the following:
1. Set a flag which is checked by sysmon. sysmon will clear the flag and
wake the scavenger.
2. Wake the scavenger unconditionally at sweep termination.
The idea behind this policy is that it gets us close enough to the state
above without having to deal with the complexity of waking the scavenger
in deep parts of the runtime. If the application goes idle and sweeping
finishes (so we don't reach sweep termination), then sysmon will wake
the scavenger. sysmon has a worst-case 20 ms delay in responding to this
signal, which is probably fine if the application is completely idle
anyway, but if the application is actively allocating, then the
proportional sweeper should help ensure that sweeping ends very close to
sweep termination, so sweep termination is a perfectly reasonable time
to wake up the scavenger.
Updates #35788.
Change-Id: I84289b37816a7d595d803c72a71b7f5c59d47e6b
Reviewed-on: https://go-review.googlesource.com/c/go/+/207998
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
All five calls to wakep are protected by the same check of nmidle and
nmspinning. Move this check into wakep.
Change-Id: I2094eec211ce551e462e87614578f37f1896ba38
Reviewed-on: https://go-review.googlesource.com/c/go/+/230757
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Currently, newproc1 allocates, initializes, and schedules a new
goroutine. We're about to change debug call injection in a way that
will need to create a new goroutine without immediately scheduling it.
To prepare for that, make scheduling the responsibility of newproc1's
caller. Currently, there's exactly one caller (newproc), so this
simply shifts that responsibility.
For #36365.
Change-Id: Idacd06b63e738982e840fe995d891bfd377ce23b
Reviewed-on: https://go-review.googlesource.com/c/go/+/229298
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
I took some of the infrastructure from Austin's lock logging CR
https://go-review.googlesource.com/c/go/+/192704 (with deadlock
detection from the logs), and developed a setup to give static lock
ranking for runtime locks.
Static lock ranking establishes a documented total ordering among locks,
and then reports an error if the total order is violated. This can
happen if a deadlock happens (by acquiring a sequence of locks in
different orders), or if just one side of a possible deadlock happens.
Lock ordering deadlocks cannot happen as long as the lock ordering is
followed.
Along the way, I found a deadlock involving the new timer code, which Ian fixed
via https://go-review.googlesource.com/c/go/+/207348, as well as two other
potential deadlocks.
See the constants at the top of runtime/lockrank.go to show the static
lock ranking that I ended up with, along with some comments. This is
great documentation of the current intended lock ordering when acquiring
multiple locks in the runtime.
I also added an array lockPartialOrder[] which shows and enforces the
current partial ordering among locks (which is embedded within the total
ordering). This is more specific about the dependencies among locks.
I don't try to check the ranking within a lock class with multiple locks
that can be acquired at the same time (i.e. check the ranking when
multiple hchan locks are acquired).
Currently, I am doing a lockInit() call to set the lock rank of most
locks. Any lock that is not otherwise initialized is assumed to be a
leaf lock (a very high rank lock), so that eliminates the need to do
anything for a bunch of locks (including all architecture-dependent
locks). For two locks, root.lock and notifyList.lock (only in the
runtime/sema.go file), it is not as easy to do lock initialization, so
instead, I am passing the lock rank with the lock calls.
For Windows compilation, I needed to increase the StackGuard size from
896 to 928 because of the new lock-rank checking functions.
Checking of the static lock ranking is enabled by setting
GOEXPERIMENT=staticlockranking before doing a run.
To make sure that the static lock ranking code has no overhead in memory
or CPU when not enabled by GOEXPERIMENT, I changed 'go build/install' so
that it defines a build tag (with the same name) whenever any experiment
has been baked into the toolchain (by checking Expstring()). This allows
me to avoid increasing the size of the 'mutex' type when static lock
ranking is not enabled.
Fixes#38029
Change-Id: I154217ff307c47051f8dae9c2a03b53081acd83a
Reviewed-on: https://go-review.googlesource.com/c/go/+/207619
Reviewed-by: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change makes it so that worldsema isn't held across the mark phase.
This means that various operations like ReadMemStats may now stop the
world during the mark phase, reducing latency on such operations.
Only three such operations are still no longer allowed to occur during
marking: GOMAXPROCS, StartTrace, and StopTrace.
For the former it's because any change to GOMAXPROCS impacts GC mark
background worker scheduling and the details there are tricky.
For the latter two it's because tracing needs to observe consistent GC
start and GC end events, and if StartTrace or StopTrace may stop the
world during marking, then it's possible for it to see a GC end event
without a start or GC start event without an end, respectively.
To ensure that GOMAXPROCS and StartTrace/StopTrace cannot proceed until
marking is complete, the runtime now holds a new semaphore, gcsema,
across the mark phase just like it used to with worldsema.
This change is being landed once more after being reverted in the Go
1.14 release cycle, since CL 215157 allows it to have a positive
effect on system performance.
For the benchmark BenchmarkReadMemStatsLatency in the runtime, which
measures ReadMemStats latencies while the GC is exercised, the tail of
these latencies reduced dramatically on an 8-core machine:
name old 50%tile-ns new 50%tile-ns delta
ReadMemStatsLatency-8 4.40M ±74% 0.12M ± 2% -97.35% (p=0.008 n=5+5)
name old 90%tile-ns new 90%tile-ns delta
ReadMemStatsLatency-8 102M ± 6% 0M ±14% -99.79% (p=0.008 n=5+5)
name old 99%tile-ns new 99%tile-ns delta
ReadMemStatsLatency-8 147M ±18% 4M ±57% -97.43% (p=0.008 n=5+5)
Fixes#19812.
Change-Id: If66c3c97d171524ae29f0e7af4bd33509d9fd0bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/216557
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
In Go 1.13, when the network poller found a list of ready goroutines,
they were added to the global run queue. The timer goroutine would
typically sleep in a futex with a timeout, and when the timeout
expired the timer goroutine would either be handed off to an idle P
or added to the global run queue. The effect was that on a busy system
with no idle P's goroutines waiting for timeouts and goroutines waiting
for the network would start at the same priority.
That changed on tip with the new timer code. Now timer functions are
invoked directly from a P, and it happens that the functions used
by time.Sleep and time.After and time.Ticker add the newly ready
goroutines to the local run queue. When a P looks for work it will
prefer goroutines on the local run queue; in fact it will only
occasionally look at the global run queue, and even when it does it
will just pull one goroutine off. So on a busy system with both active
timers and active network connections the system can noticeably prefer
to run goroutines waiting for timers rather than goroutines waiting
for the network.
This CL undoes that change by, when possible, adding goroutines
waiting for the network to the local run queue of the P that checked.
This doesn't affect network poller checks done by sysmon, but it
does affect network poller checks done as each P enters the scheduler.
This CL also makes injecting a list into either the local or global run
queue more efficient, using bulk operations rather than individual ones.
Change-Id: I85a66ad74e4fc3b458256fb7ab395d06f0d2ffac
Reviewed-on: https://go-review.googlesource.com/c/go/+/216198
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Having an mcache field in both m and p is confusing, so remove it from m.
Always use mcache field from p. Use new variable mcache0 during bootstrap.
Change-Id: If2cba9f8bb131d911d512b61fd883a86cf62cc98
Reviewed-on: https://go-review.googlesource.com/c/go/+/205239
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This reverts commit 7b294cdd8df0a9523010f6ffc80c59e64578f34b, CL 182657.
Reason for revert: This change may be causing latency problems
for applications which call ReadMemStats, because it may cause
all goroutines to stop until the GC completes.
https://golang.org/cl/215157 fixes this problem, but it's too
late in the cycle to land that.
Updates #19812.
Change-Id: Iaa26f4dec9b06b9db2a771a44e45f58d0aa8f26d
Reviewed-on: https://go-review.googlesource.com/c/go/+/216358
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The timers code used to have a problem: if code started and stopped a
lot of timers, as would happen with, for example, lots of calls to
context.WithTimeout, then it would steadily use memory holding timers
that had stopped but not been removed from the timer heap.
That problem was fixed by CL 214299, which would remove all deleted
timers whenever they got to be more than 1/4 of the total number of
timers on the heap.
The timers code had a different problem: if there were some idle P's,
the running P's would have lock contention trying to steal their timers.
That problem was fixed by CL 214185, which only acquired the timer lock
if the next timer was ready to run or there were some timers to adjust.
Unfortunately, CL 214185 partially undid 214299, in that we could now
accumulate an increasing number of deleted timers while there were no
timers ready to run. This CL restores the 214299 behavior, by checking
whether there are lots of deleted timers without acquiring the lock.
This is a performance issue to consider for the 1.14 release.
Change-Id: I13c980efdcc2a46eb84882750c39e3f7c5b2e7c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/215722
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This reduces lock contention when only a few P's are running and
checking for whether they need to run timers on the sleeping P's.
Without this change the running P's would get lock contention
while looking at the sleeping P's timers. With this change a single
atomic load suffices to determine whether there are any ready timers.
Change-Id: Ie843782bd56df49867a01ecf19c47498ec827452
Reviewed-on: https://go-review.googlesource.com/c/go/+/214185
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
Whenever more than 1/4 of the timers on a P's heap are deleted,
remove them from the heap.
Change-Id: Iff63ed3d04e6f33ffc5c834f77f645c52c007e52
Reviewed-on: https://go-review.googlesource.com/c/go/+/214299
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
When a goroutine yields the remainder of its time to another goroutine
during direct semaphore handoff (as in an Unlock of a sync.Mutex in
starvation mode), it needs to signal that change to the execution
tracer. The discussion in CL 200577 didn't reach consensus on how best
to describe that, but pointed out that "traceEvGoSched / goroutine calls
Gosched" could be confusing.
Emit a "traceEvGoPreempt / goroutine is preempted" event in this case,
to allow the execution tracer to find a consistent event ordering
without being both specific and inaccurate about why the active
goroutine has changed.
Fixes#36186
Change-Id: Ic4ade19325126db2599aff6aba7cba028bb0bee9
Reviewed-on: https://go-review.googlesource.com/c/go/+/211797
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Dan Scales pointed out a theoretical deadlock in the runtime.
The timer code runs timer functions while holding the timers lock for a P.
The scavenger queues up a timer function that calls wakeScavenger,
which acquires the scavenger lock.
The scavengeSleep function acquires the scavenger lock,
then calls resetTimer which can call addInitializedTimer
which acquires the timers lock for the current P.
So there is a potential deadlock, in that the scavenger lock and
the timers lock for some P may both be acquired in different order.
It's not clear to me whether this deadlock can ever actually occur.
Issue 35532 describes another possible deadlock.
The pollSetDeadline function acquires pd.lock for some poll descriptor,
and in some cases calls resettimer which can in some cases acquire
the timers lock for the current P.
The timer code runs timer functions while holding the timers lock for a P.
The timer function for poll descriptors winds up in netpolldeadlineimpl
which acquires pd.lock.
So again there is a potential deadlock, in that the pd lock for some
poll descriptor and the timers lock for some P may both be acquired in
different order. I think this can happen if we change the deadline
for a network connection exactly as the former deadline expires.
Looking at the code, I don't see any reason why we have to hold
the timers lock while running a timer function.
This CL implements that change.
Updates #6239
Updates #27707Fixes#35532
Change-Id: I17792f5a0120e01ea07cf1b2de8434d5c10704dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/207348
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
In the discussion of CL 171828 we decided that it was not necessary to
acquire timersLock around the call to moveTimers, because the world is
stopped. However, that is not correct, as sysmon runs even when the world
is stopped, and it calls timeSleepUntil which looks through the timers.
timeSleepUntil acquires timersLock, but that doesn't help if moveTimers
is running at the same time.
Updates #6239
Updates #27707
Updates #35462
Change-Id: I346c5bde594c4aff9955ae430b37c2b6fc71567f
Reviewed-on: https://go-review.googlesource.com/c/go/+/206938
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Some, but not all, architectures mix in OS-provided random seeds when
initializing the fastrand state. The others have TODOs saying we need
to do the same. Lift that logic up in the architecture-independent
part, and use memhash to mix the seed instead of a simple addition.
Previously, dumping the fastrand state at initialization would yield
something like the following on linux-amd64, where the values in the
first column do not change between runs (as thread IDs are sequential
and always start at 0), and the values in the second column, while
changing every run, are pretty correlated:
first run:
0x0 0x44d82f1c
0x5f356495 0x44f339de
0xbe6ac92a 0x44f91cd8
0x1da02dbf 0x44fd91bc
0x7cd59254 0x44fee8a4
0xdc0af6e9 0x4547a1e0
0x3b405b7e 0x474c76fc
0x9a75c013 0x475309dc
0xf9ab24a8 0x4bffd075
second run:
0x0 0xa63fc3eb
0x5f356495 0xa6648dc2
0xbe6ac92a 0xa66c1c59
0x1da02dbf 0xa671bce8
0x7cd59254 0xa70e8287
0xdc0af6e9 0xa7129d2e
0x3b405b7e 0xa7379e2d
0x9a75c013 0xa7e4c64c
0xf9ab24a8 0xa7ecce07
With this change, we get initial states that appear to be much more
unpredictable, both within the same run as well as between runs:
0x11bddad7 0x97241c63
0x553dacc6 0x2bcd8523
0x62c01085 0x16413d92
0x6f40e9e6 0x7a138de6
0xa4898053 0x70d816f0
0x5ca5b433 0x188a395b
0x62778ca9 0xd462c3b5
0xd6e160e4 0xac9b4bd
0xb9571d65 0x597a981d
Change-Id: Ib22c530157d74200df0083f830e0408fd4aaea58
Reviewed-on: https://go-review.googlesource.com/c/go/+/203439
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
When we have already assigned the semaphore ticket to a specific
waiter, we want to get the waiter running as fast as possible since
no other G waiting on the semaphore can acquire it optimistically.
The net effect is that, when a sync.Mutex is contended, the code in
the critical section guarded by the Mutex gets a priority boost.
Fixes#33747
The original work was done in CL 200577 by Carlo Alberto Ferraris. The
change was reverted in CL 205817 because it broke the linux-arm64-packet
and solaris-amd64-oraclerel builders.
Change-Id: I76d79b1d63fd206ed1c57fe6900cb7ae9e4d46cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/206180
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This change adds a per-p free page cache which the page allocator may
allocate out of without a lock. The change also introduces a completely
lockless page allocator fast path.
Although the cache contains at most 64 pages (and usually less), the
vast majority (85%+) of page allocations are exactly 1 page in size.
Updates #35112.
Change-Id: I170bf0a9375873e7e3230845eb1df7e5cf741b78
Reviewed-on: https://go-review.googlesource.com/c/go/+/195701
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
This change adds a per-p mspan object cache similar to the sudog cache.
Unfortunately this cache can't quite operate like the sudog cache, since
it is used in contexts where write barriers are disallowed (i.e.
allocation codepaths), so rather than managing an array and a slice,
it's just an array and a length. A little bit more unsafe, but avoids
any write barriers.
The purpose of this change is to reduce the number of operations which
require the heap lock in allocation, paving the way for a lockless fast
path.
Updates #35112.
Change-Id: I32cfdcd8528fb7be985640e4f3a13cb98ffb7865
Reviewed-on: https://go-review.googlesource.com/c/go/+/196642
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
When we have already assigned the semaphore ticket to a specific
waiter, we want to get the waiter running as fast as possible since
no other G waiting on the semaphore can acquire it optimistically.
The net effect is that, when a sync.Mutex is contented, the code in
the critical section guarded by the Mutex gets a priority boost.
Fixes#33747
Change-Id: I9967f0f763c25504010651bdd7f944ee0189cd45
Reviewed-on: https://go-review.googlesource.com/c/go/+/200577
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Otherwise, we can get into a deadlock: sysmon takes the scheduler lock
and calls timeSleepUntil which takes each P's timer lock. Simultaneously,
some P calls runtimer (holding the P's own timer lock) which wakes up
the scavenger, calling goready, calling wakep, calling startm, getting
the scheduler lock. Now the sysmon thread is holding the scheduler lock
and trying to get a P's timer lock, while some other thread running on
that P is holding the P's timer lock and trying to get the scheduler lock.
So change sysmon to call timeSleepUntil without holding the scheduler
lock, and change timeSleepUntil to use allpLock, which is only held for
limited periods of time and should never compete with timer locks.
This hopefully
Fixes#35375
At least it should fix the linux-arm64-packet builder problems,
which occurred more reliably as that system has GOMAXPROCS == 96,
giving a lot more scope for this deadlock.
Change-Id: I7a7917daf7a4882e0b27ca416e4f6300cfaaa774
Reviewed-on: https://go-review.googlesource.com/c/go/+/205558
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
When dropping a P, if it has any timers, and if some thread is
sleeping in the netpoller, wake the netpoller to run the P's timers.
This mitigates races between the netpoller deciding how long to sleep
and a new timer being added.
In sysmon, if all P's are idle, check the timers to decide how long to sleep.
This avoids oversleeping if no thread is using the netpoller.
This can happen in particular if some threads use runtime.LockOSThread,
as those threads do not block in the netpoller.
Also, print the number of timers per P for GODEBUG=scheddetail=1.
Before this CL, TestLockedDeadlock2 would fail about 1% of the time.
With this CL, I ran it 150,000 times with no failures.
Updates #6239
Updates #27707Fixes#35274Fixes#35288
Change-Id: I7e5193e6c885e567f0b1ee023664aa3e2902fcd1
Reviewed-on: https://go-review.googlesource.com/c/go/+/204800
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
CL 200958 adds skipping empty init function feature without any tests
for it. A codegen test sounds ideal, but it's unlikely that we can make
one for now, so use a program to manipulate runtime/proc.go:initTask
directly.
Updates #34869
Change-Id: I2683b9a1ace36af6861af02a3a9fb18b3110b282
Reviewed-on: https://go-review.googlesource.com/c/go/+/204217
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This adds signal-based preemption to preemptone.
Since STW and forEachP ultimately use preemptone, this also makes
these work with async preemption.
This also makes freezetheworld more robust so tracebacks from fatal
panics should be far less likely to report "goroutine running on other
thread; stack unavailable".
For #10958, #24543. (This doesn't fix it yet because asynchronous
preemption only works on POSIX platforms on 386 and amd64 right now.)
Change-Id: If776181dd5a9b3026a7b89a1b5266521b95a5f61
Reviewed-on: https://go-review.googlesource.com/c/go/+/201762
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
I was doing some testing with GODEBUG=schedtrace=1,scheddetail=1 and I
noticed that the program hung after a throw with "all goroutines are
asleep". This is because when doing a throw or fatal panic with schedtrace
the panic code does a final schedtrace, which needs to acquire the
scheduler lock. The checkdead function is always called with the scheduler
lock held. So checkdead would throw with the scheduler lock held, then
the panic code would call schedtrace, which would block trying to acquire
the scheduler lock.
This problem will only happen for people debugging the runtime, but
it's easy to avoid by having checkdead unlock the scheduler lock before
it throws. I only did this for the throws that can happen for a normal
program, not for throws that indicate some corruption in the scheduler data.
Change-Id: Ic62277b3ca6bee6f0fca8d5eb516c59cb67855cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/204778
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
On some platforms (currently ARM and ARM64), when calling into
VDSO we store the G to the gsignal stack, if there is one, so if
we receive a signal during VDSO we can find the G.
When an M exits, it frees the gsignal stack. But m.gsignal.stack
still points to that stack. When we call nanotime on this M, we
will write to the already freed gsignal stack, which is bad.
Prevent this by unlinking the freed stack from the M.
Should fix#35235.
Change-Id: I338b1fc8ec62aae036f38afaca3484687e11a40d
Reviewed-on: https://go-review.googlesource.com/c/go/+/204158
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
We check whether an M is preemptible in a surprising number of places.
Put it in one function.
For #10958, #24543.
Change-Id: I305090fdb1ea7f7a55ffe25851c1e35012d0d06c
Reviewed-on: https://go-review.googlesource.com/c/go/+/201439
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently, gcscanvalid is used to resolve a race between attempts to
scan a stack. Now that there's a clear owner of the stack scan
operation, there's no longer any danger of racing or attempting to
scan a stack more than once, so this CL eliminates gcscanvalid.
I double-checked my reasoning by first adding a throw if gcscanvalid
was set in scanstack and verifying that all.bash still passed.
For #10958, #24543.
Fixes#24363.
Change-Id: I76794a5fcda325ed7cfc2b545e2a839b8b3bc713
Reviewed-on: https://go-review.googlesource.com/c/go/+/201139
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently, the process of suspending a goroutine is tied to stack
scanning. In preparation for non-cooperative preemption, this CL
abstracts this into general purpose suspendG/resumeG functions.
suspendG and resumeG closely follow the existing scang and restartg
functions with one exception: the addition of a _Gpreempted status.
Currently, preemption tasks (stack scanning) are carried out by the
target goroutine if it's in _Grunning. In this new approach, the task
is always carried out by the goroutine that called suspendG. Thus, we
need a reliable way to drive the target goroutine out of _Grunning
until the requesting goroutine is ready to resume it. The new
_Gpreempted state provides the handshake: when a runnable goroutine
responds to a preemption request, it now parks itself and enters
_Gpreempted. The requesting goroutine races to put it in _Gwaiting,
which gives it ownership, but also the responsibility to start it
again.
This CL adds several TODOs about improving the synchronization on the
G status. The existing code already has these problems; we're just
taking note of them.
The next CL will remove the now-dead scang and preemptscan.
For #10958, #24543.
Change-Id: I16dbf87bea9d50399cc86719c156f48e67198f16
Reviewed-on: https://go-review.googlesource.com/c/go/+/201137
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
We already claim on the documentation for _Grunning that this is case,
but execute transitions to _Grunning before assigning g.m. Fix this
and make the documentation even more explicit.
For #10958, #24543, but also a good cleanup.
Change-Id: I1eb0108e7762f55cfb0282aca624af1c0a15fe56
Reviewed-on: https://go-review.googlesource.com/c/go/+/201440
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
On ARM and ARM64, during a VDSO call, the g register may be
temporarily clobbered by the VDSO code. If a signal is received
during the execution of VDSO code, we may not find a valid g
reading the g register. In CL 192937, we conservatively assume
g is nil. But this approach has a problem: we cannot handle
the signal in this case. Further, if the signal is not a
profiling signal, we'll call badsignal, which calls needm, which
wants to get an extra m, but we don't have one in a non-cgo
binary, which cuases the program to hang.
This is even more of a problem with async preemption, where we
will receive more signals than before. I ran into this problem
while working on async preemption support on ARM64.
In this CL, before making a VDSO call, we save the g on the
gsignal stack. When we receive a signal, we will be running on
the gsignal stack, so we can fetch the g from there and move on.
We probably want to do the same for PPC64. Currently we rely on
that the VDSO code doesn't actually clobber the g register, but
this is not guaranteed and we don't have control with.
Idea from discussion with Dan Cross and Austin.
Should fix#34391.
Change-Id: Idbefc5e4c2f4373192c2be797be0140ae08b26e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/202759
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Since the new timers run on g0, which does not have a race context,
we add a race context field to the P, and use that for timer functions.
This works since all timer functions are in the standard library.
Updates #27707
Change-Id: I8a5b727b4ddc8ca6fc60eb6d6f5e9819245e395b
Reviewed-on: https://go-review.googlesource.com/c/go/+/171882
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Since timers are now on a P, rather than having a G running timerproc,
timejump changes to return a P rather than a G.
Updates #27707
Change-Id: I3d05af2d664409a0fd906e709fdecbbcbe00b9a7
Reviewed-on: https://go-review.googlesource.com/c/go/+/171880
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
This adds a new field to P, adjustTimers, that tells the P that one of
its existing timers was modified to be earlier, and that it therefore
needs to resort them.
Updates #27707
Change-Id: I4c5f5b51ed116f1d898d3f87cdddfa1b552337f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/171832
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
When we put timers on P's, the wasm code will not be able to rely on
the timer goroutine. Use the beforeIdle hook to schedule a wakeup.
Updates #6239
Updates #27707
Change-Id: Idf6309944778b8c3d7178f5d09431940843ea233
Reviewed-on: https://go-review.googlesource.com/c/go/+/171827
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.
Updates #25819.
Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Add support to the main scheduler loop for handling timers on P's.
This is not used yet, as timers are not yet put on P's.
Updates #6239
Updates #27707
Change-Id: I6a359df408629f333a9232142ce19e8be8496dae
Reviewed-on: https://go-review.googlesource.com/c/go/+/171826
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
To permit ssa-level optimization, this change introduces an amd64 intrinsic
that generates the VFMADD231SD instruction for the fused-multiply-add
operation on systems that support it. System support is detected via
cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic
("Fma") to VFMADD231SD.
The benchmark compares the software implementation (old) with the intrinsic
(new).
name old time/op new time/op delta
Fma-4 27.2ns ± 1% 1.0ns ± 9% -96.48% (p=0.008 n=5+5)
Updates #25819.
Change-Id: I966655e5f96817a5d06dff5942418a3915b09584
Reviewed-on: https://go-review.googlesource.com/c/go/+/137156
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This new facility will be used by future CLs in this series.
Change the only blocking call to netpoll to do the right thing when
netpoll returns an empty list.
Updates #6239
Updates #27707
Change-Id: I58b3c2903eda61a3698b1a4729ed0e81382bb1ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/171821
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Change the comment to make more conformable to the function implementation.
Change-Id: I8461e2f09824c50e16223a27d0f61070f04bd21b
GitHub-Last-Rev: c25a8493d3938b38e2c318f7a7b94c9f2eb11bb4
GitHub-Pull-Request: golang/go#27404
Reviewed-on: https://go-review.googlesource.com/c/go/+/132477
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
This change makes it so that worldsema isn't held across the mark phase.
This means that various operations like ReadMemStats may now stop the
world during the mark phase, reducing latency on such operations.
Only three such operations are still no longer allowed to occur during
marking: GOMAXPROCS, StartTrace, and StopTrace.
For the former it's because any change to GOMAXPROCS impacts GC mark
background worker scheduling and the details there are tricky.
For the latter two it's because tracing needs to observe consistent GC
start and GC end events, and if StartTrace or StopTrace may stop the
world during marking, then it's possible for it to see a GC end event
without a start or GC start event without an end, respectively.
To ensure that GOMAXPROCS and StartTrace/StopTrace cannot proceed until
marking is complete, the runtime now holds a new semaphore, gcsema,
across the mark phase just like it used to with worldsema.
Fixes#19812.
Change-Id: I15d43ed184f711b3d104e8f267fb86e335f86bf9
Reviewed-on: https://go-review.googlesource.com/c/go/+/182657
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The existing condition is long and repetitive. Using select/case with
multiple values in the expression list is more concise and clearer.
Change-Id: I43f8abcf958e433468728f1d89ff1436332b29da
Reviewed-on: https://go-review.googlesource.com/c/go/+/188519
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Use efaceOf to safely convert from *interface{} to *_eface, and to
make it clearer what the pointer arithmetic is computing.
Incidentally, remove a spurious unsafe.Pointer->*uint8->unsafe.Pointer
round trip conversion in newproc.
No behavior change.
Change-Id: I2ad9d791d35d8bd008ef43b03dad1589713c5fd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/190457
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
CL 42652 changed the profile handler for mips/mipsle to
avoid recording a profile when in atomic functions, for fear
of interrupting the 32-bit simulation of a 64-bit atomic with
a lock. The profile logger itself uses 64-bit atomics and might
deadlock (#20146).
The change was to accumulate a count of dropped profile events
and then send the count when the next ordinary event was sent:
if prof.hz != 0 {
+ if (GOARCH == "mips" || GOARCH == "mipsle") && lostAtomic64Count > 0 {
+ cpuprof.addLostAtomic64(lostAtomic64Count)
+ lostAtomic64Count = 0
+ }
cpuprof.add(gp, stk[:n])
}
CL 117057 extended this behavior to include GOARCH == "arm".
Unfortunately, the inserted cpuprof.addLostAtomic64 differs from
the original cpuprof.add in that it neglects to acquire the lock
protecting the profile buffer.
This has caused a steady stream of flakes on the arm builders
for the past 12 months, ever since CL 117057 landed.
This CL moves the lostAtomic count into the profile buffer and
then lets the existing addExtra calls take care of it, instead of
duplicating the locking logic.
Fixes#24991.
Change-Id: Ia386c40034fcf46b31f080ce18f2420df4bb8004
Reviewed-on: https://go-review.googlesource.com/c/go/+/184164
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This marks all Go symbols called from assembly in other packages with
"go:linkname" directives to ensure they get ABI wrappers.
Now that we have this go:linkname convention, this also removes the
abi0Syms definition in the runtime, which was used to give morestackc
an ABI0 wrapper. Instead, we now just mark morestackc with a
go:linkname directive.
This was tested with buildall.bash in the default configuration, with
-race, and with -gcflags=all=-d=ssa/intrinsics/off. Since I couldn't
test cgo on non-Linux configurations, I manually grepped for runtime
symbols in runtime/cgo.
Updates #31230.
Change-Id: I6c8aa56be2ca6802dfa2bf159e49c411b9071bf1
Reviewed-on: https://go-review.googlesource.com/c/go/+/179862
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>