Message ID | 20221101073630.2797-1-dtcccc@linux.alibaba.com |
---|---|
State | New |
Series | sched: Clear ttwu_pending after enqueue_task |
Commit Message
Tianchen Ding
Nov. 1, 2022, 7:36 a.m. UTC
We found a long tail latency in schbench when m*t is close to nr_cpus
(e.g., "schbench -m 2 -t 16" on a machine with 32 cpus).

This is because when the wakee cpu is idle, rq->ttwu_pending is cleared
too early, so idle_cpu() will falsely return true until the wakee task
is actually enqueued. This misleads the waker when it selects an idle
cpu, and can wake multiple worker threads on the same wakee cpu. The
situation is amplified by commit f3dd3f674555 ("sched: Remove the
limitation of WF_ON_CPU on wakelist if wakee cpu is idle") because that
commit makes wakeups prefer the wakelist path.
Here is the result of "schbench -m 2 -t 16" on a VM with 32 vcpus
(Intel(R) Xeon(R) Platinum 8369B):
Latency percentiles (usec):
                 base   base+revert_f3dd3f674555   base+this_patch
  50.0000th:        9                         13                 9
  75.0000th:       12                         19                12
  90.0000th:       15                         22                15
  95.0000th:       18                         24                17
 *99.0000th:       27                         31                24
  99.5000th:     3364                         33                27
  99.9000th:    12560                         36                30

Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
---
kernel/sched/core.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
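
For context on the race described above: the waker side decides whether a
cpu is idle via idle_cpu(), which consults nr_running as well as
ttwu_pending. The following is a paraphrased sketch of that check, based on
kernel/sched/core.c around v6.0 (not part of this patch; details may
differ across kernel versions):

int idle_cpu(int cpu)
{
	struct rq *rq = cpu_rq(cpu);

	/* The cpu must currently be running the idle task... */
	if (rq->curr != rq->idle)
		return 0;

	/* ...with nothing on its runqueue... */
	if (rq->nr_running)
		return 0;

#ifdef CONFIG_SMP
	/* ...and no remote wakeups queued against it. */
	if (rq->ttwu_pending)
		return 0;
#endif

	return 1;
}

Clearing ttwu_pending before the woken tasks are enqueued opens a window
in which both nr_running and ttwu_pending read 0, so idle_cpu() reports
the wakee cpu as idle even though wakeups are still in flight.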
Comments
On Tue, Nov 01, 2022 at 03:36:30PM +0800, Tianchen Ding wrote:
> We found a long tail latency in schbench when m*t is close to nr_cpus
> (e.g., "schbench -m 2 -t 16" on a machine with 32 cpus).
>
> This is because when the wakee cpu is idle, rq->ttwu_pending is cleared
> too early, so idle_cpu() will falsely return true until the wakee task
> is actually enqueued. This misleads the waker when it selects an idle
> cpu, and can wake multiple worker threads on the same wakee cpu. The
> situation is amplified by commit f3dd3f674555 ("sched: Remove the
> limitation of WF_ON_CPU on wakelist if wakee cpu is idle") because that
> commit makes wakeups prefer the wakelist path.
>
> Here is the result of "schbench -m 2 -t 16" on a VM with 32 vcpus
> (Intel(R) Xeon(R) Platinum 8369B):
>
> Latency percentiles (usec):
>                  base   base+revert_f3dd3f674555   base+this_patch
>   50.0000th:        9                         13                 9
>   75.0000th:       12                         19                12
>   90.0000th:       15                         22                15
>   95.0000th:       18                         24                17
>  *99.0000th:       27                         31                24
>   99.5000th:     3364                         33                27
>   99.9000th:    12560                         36                30

Nice; but have you also run other benchmarks and confirmed it doesn't
negatively affect those?

If so, mentioning that is very helpful. If not, best go do so :-)

> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
> ---
>  kernel/sched/core.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 87c9cdf37a26..b07de1753be5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3739,13 +3739,6 @@ void sched_ttwu_pending(void *arg)
>  	if (!llist)
>  		return;
>
> -	/*
> -	 * rq::ttwu_pending racy indication of out-standing wakeups.
> -	 * Races such that false-negatives are possible, since they
> -	 * are shorter lived that false-positives would be.
> -	 */
> -	WRITE_ONCE(rq->ttwu_pending, 0);
> -
>  	rq_lock_irqsave(rq, &rf);
>  	update_rq_clock(rq);
>

Could you try the below instead? Also note the comment; since you did
the work to figure out why -- best record that for posterity.

@@ -3737,6 +3730,13 @@ void sched_ttwu_pending(void *arg)
 		set_task_cpu(p, cpu_of(rq));

 		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
+		/*
+		 * Must be after enqueueing at least one task such that
+		 * idle_cpu() does not observe a false-negative -- if it does,
+		 * it is possible for select_idle_siblings() to stack a number
+		 * of tasks on this CPU during that window.
+		 */
+		WRITE_ONCE(rq->ttwu_pending, 0);
 	}

 	rq_unlock_irqrestore(rq, &rf);
On 2022-11-01 at 11:34:04 +0100, Peter Zijlstra wrote:
> On Tue, Nov 01, 2022 at 03:36:30PM +0800, Tianchen Ding wrote:
> > We found a long tail latency in schbench when m*t is close to nr_cpus
> > (e.g., "schbench -m 2 -t 16" on a machine with 32 cpus).
> >
> > This is because when the wakee cpu is idle, rq->ttwu_pending is
> > cleared too early, so idle_cpu() will falsely return true until the
> > wakee task is actually enqueued. This misleads the waker when it
> > selects an idle cpu, and can wake multiple worker threads on the same
> > wakee cpu. The situation is amplified by commit f3dd3f674555 ("sched:
> > Remove the limitation of WF_ON_CPU on wakelist if wakee cpu is idle")
> > because that commit makes wakeups prefer the wakelist path.
> >
> > Here is the result of "schbench -m 2 -t 16" on a VM with 32 vcpus
> > (Intel(R) Xeon(R) Platinum 8369B):
> >
> > Latency percentiles (usec):
> >                  base   base+revert_f3dd3f674555   base+this_patch
> >   50.0000th:        9                         13                 9
> >   75.0000th:       12                         19                12
> >   90.0000th:       15                         22                15
> >   95.0000th:       18                         24                17
> >  *99.0000th:       27                         31                24
> >   99.5000th:     3364                         33                27
> >   99.9000th:    12560                         36                30
>
> Nice; but have you also run other benchmarks and confirmed it doesn't
> negatively affect those?
>
> If so, mentioning that is very helpful. If not, best go do so :-)
>
> > Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
> > ---
> >  kernel/sched/core.c | 8 +-------
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 87c9cdf37a26..b07de1753be5 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3739,13 +3739,6 @@ void sched_ttwu_pending(void *arg)
> >  	if (!llist)
> >  		return;
> >
> > -	/*
> > -	 * rq::ttwu_pending racy indication of out-standing wakeups.
> > -	 * Races such that false-negatives are possible, since they
> > -	 * are shorter lived that false-positives would be.
> > -	 */
> > -	WRITE_ONCE(rq->ttwu_pending, 0);
> > -
> >  	rq_lock_irqsave(rq, &rf);
> >  	update_rq_clock(rq);
> >
>
> Could you try the below instead? Also note the comment; since you did
> the work to figure out why -- best record that for posterity.
>
> @@ -3737,6 +3730,13 @@ void sched_ttwu_pending(void *arg)
>  		set_task_cpu(p, cpu_of(rq));
>
>  		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
> +		/*
> +		 * Must be after enqueueing at least one task such that
> +		 * idle_cpu() does not observe a false-negative -- if it does,
> +		 * it is possible for select_idle_siblings() to stack a number
> +		 * of tasks on this CPU during that window.
> +		 */
> +		WRITE_ONCE(rq->ttwu_pending, 0);

Just curious, why do we put the above code inside the
llist_for_each_entry_safe() loop? My understanding is that once one task
is queued, select_idle_cpu() would not treat this rq as idle anymore
because nr_running is not 0. But wouldn't this bring overhead by writing
rq->ttwu_pending multiple times? Or do I miss something?

thanks,
Chenyu

>  	}
>
>  	rq_unlock_irqrestore(rq, &rf);
On Tue, Nov 01, 2022 at 09:51:25PM +0800, Chen Yu wrote:
> > Could you try the below instead? Also note the comment; since you did
> > the work to figure out why -- best record that for posterity.
> >
> > @@ -3737,6 +3730,13 @@ void sched_ttwu_pending(void *arg)
> >  		set_task_cpu(p, cpu_of(rq));
> >
> >  		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
> > +		/*
> > +		 * Must be after enqueueing at least one task such that
> > +		 * idle_cpu() does not observe a false-negative -- if it does,
> > +		 * it is possible for select_idle_siblings() to stack a number
> > +		 * of tasks on this CPU during that window.
> > +		 */
> > +		WRITE_ONCE(rq->ttwu_pending, 0);
>
> Just curious, why do we put the above code inside the
> llist_for_each_entry_safe() loop? My understanding is that once one task
> is queued, select_idle_cpu() would not treat this rq as idle anymore
> because nr_running is not 0. But wouldn't this bring overhead by writing
> rq->ttwu_pending multiple times? Or do I miss something?

So the consideration is that by clearing it late, you might also clear a
next set; consider something like:

	cpu0			cpu1			cpu2

	ttwu_queue()
	  ->ttwu_pending = 1;
	  llist_add()

				sched_ttwu_pending()
				  llist_del_all()
				  ... long ...
							ttwu_queue()
							  ->ttwu_pending = 1
							  llist_add()

				  ... time ...
				  ->ttwu_pending = 0

Which leaves you with a non-empty list but with ttwu_pending == 0.

But I suppose that's not actually better with my variant, since it keeps
writing 0s. We can make it more complicated again, but perhaps it
doesn't matter and your version is good enough.

But please update with a comment on why it needs to be after
ttwu_do_activate().
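
For reference, the producer side of the timeline above is ttwu_queue()
reaching __ttwu_queue_wakelist(), which sets the flag before queueing the
task on the remote cpu's wakelist. A sketch paraphrased from
kernel/sched/core.c of that era (details may differ across versions):

static void __ttwu_queue_wakelist(struct task_struct *p, int cpu,
				  int wake_flags)
{
	struct rq *rq = cpu_rq(cpu);

	p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);

	/* Mark the wakeup pending before handing the task to the remote llist. */
	WRITE_ONCE(rq->ttwu_pending, 1);
	__smp_call_single_queue(cpu, &p->wake_entry.llist);
}

Since this store is not serialized against the consumer in
sched_ttwu_pending(), a late clear can overwrite a concurrent set, exactly
as the timeline shows.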
On 2022-11-01 at 15:59:25 +0100, Peter Zijlstra wrote:
> On Tue, Nov 01, 2022 at 09:51:25PM +0800, Chen Yu wrote:
>
> > > Could you try the below instead? Also note the comment; since you did
> > > the work to figure out why -- best record that for posterity.
> > >
> > > @@ -3737,6 +3730,13 @@ void sched_ttwu_pending(void *arg)
> > >  		set_task_cpu(p, cpu_of(rq));
> > >
> > >  		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
> > > +		/*
> > > +		 * Must be after enqueueing at least one task such that
> > > +		 * idle_cpu() does not observe a false-negative -- if it does,
> > > +		 * it is possible for select_idle_siblings() to stack a number
> > > +		 * of tasks on this CPU during that window.
> > > +		 */
> > > +		WRITE_ONCE(rq->ttwu_pending, 0);
> > Just curious, why do we put the above code inside the
> > llist_for_each_entry_safe() loop?
>
> > My understanding is that once one task is queued, select_idle_cpu()
> > would not treat this rq as idle anymore because nr_running is not 0.
> > But wouldn't this bring overhead by writing rq->ttwu_pending multiple
> > times? Or do I miss something?
>
> So the consideration is that by clearing it late, you might also clear a
> next set; consider something like:
>
> 	cpu0			cpu1			cpu2
>
> 	ttwu_queue()
> 	  ->ttwu_pending = 1;
> 	  llist_add()
>
> 				sched_ttwu_pending()
> 				  llist_del_all()
> 				  ... long ...
> 							ttwu_queue()
> 							  ->ttwu_pending = 1
> 							  llist_add()
>
> 				  ... time ...
> 				  ->ttwu_pending = 0
>
> Which leaves you with a non-empty list but with ttwu_pending == 0.
>
Thanks for the explanation. In theory the race window could be shrunk but
not closed, because ttwu_pending is not protected by a lock in
ttwu_queue() -> __ttwu_queue_wakelist(), I suppose.

> But I suppose that's not actually better with my variant, since it keeps
> writing 0s. We can make it more complicated again, but perhaps it
> doesn't matter and your version is good enough.
>
I see, although I'm not the author of this patch :)

thanks,
Chenyu

> But please update with a comment on why it needs to be after
> ttwu_do_activate().
On 2022/11/1 18:34, Peter Zijlstra wrote:
> On Tue, Nov 01, 2022 at 03:36:30PM +0800, Tianchen Ding wrote:
>> We found a long tail latency in schbench when m*t is close to nr_cpus
>> (e.g., "schbench -m 2 -t 16" on a machine with 32 cpus).
>>
>> This is because when the wakee cpu is idle, rq->ttwu_pending is
>> cleared too early, so idle_cpu() will falsely return true until the
>> wakee task is actually enqueued. This misleads the waker when it
>> selects an idle cpu, and can wake multiple worker threads on the same
>> wakee cpu. The situation is amplified by commit f3dd3f674555 ("sched:
>> Remove the limitation of WF_ON_CPU on wakelist if wakee cpu is idle")
>> because that commit makes wakeups prefer the wakelist path.
>>
>> Here is the result of "schbench -m 2 -t 16" on a VM with 32 vcpus
>> (Intel(R) Xeon(R) Platinum 8369B):
>>
>> Latency percentiles (usec):
>>                  base   base+revert_f3dd3f674555   base+this_patch
>>   50.0000th:        9                         13                 9
>>   75.0000th:       12                         19                12
>>   90.0000th:       15                         22                15
>>   95.0000th:       18                         24                17
>>  *99.0000th:       27                         31                24
>>   99.5000th:     3364                         33                27
>>   99.9000th:    12560                         36                30
>
> Nice; but have you also run other benchmarks and confirmed it doesn't
> negatively affect those?
>
> If so, mentioning that is very helpful. If not, best go do so :-)
>

Thanks for the review.

We've tested with unixbench and hackbench (they report average scores),
and the performance results show no difference. We don't mention that in
the changelog because what we found is a specific case in schbench
(where m*t == nr_cpus). It only affects long tail latency, so the
problem and the fix should take effect only in this case, not in the
average scores.

>> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
>> ---
>>  kernel/sched/core.c | 8 +-------
>>  1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 87c9cdf37a26..b07de1753be5 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -3739,13 +3739,6 @@ void sched_ttwu_pending(void *arg)
>>  	if (!llist)
>>  		return;
>>
>> -	/*
>> -	 * rq::ttwu_pending racy indication of out-standing wakeups.
>> -	 * Races such that false-negatives are possible, since they
>> -	 * are shorter lived that false-positives would be.
>> -	 */
>> -	WRITE_ONCE(rq->ttwu_pending, 0);
>> -
>>  	rq_lock_irqsave(rq, &rf);
>>  	update_rq_clock(rq);
>>
>
> Could you try the below instead? Also note the comment; since you did
> the work to figure out why -- best record that for posterity.
>

It works well for me. But I have the same thought as Chen Yu, and will
explain in detail in my next reply. Thanks.

> @@ -3737,6 +3730,13 @@ void sched_ttwu_pending(void *arg)
>  		set_task_cpu(p, cpu_of(rq));
>
>  		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
> +		/*
> +		 * Must be after enqueueing at least one task such that
> +		 * idle_cpu() does not observe a false-negative -- if it does,
> +		 * it is possible for select_idle_siblings() to stack a number
> +		 * of tasks on this CPU during that window.
> +		 */
> +		WRITE_ONCE(rq->ttwu_pending, 0);
>  	}
>
>  	rq_unlock_irqrestore(rq, &rf);
On 2022/11/1 22:59, Peter Zijlstra wrote:
> On Tue, Nov 01, 2022 at 09:51:25PM +0800, Chen Yu wrote:
>
>>> Could you try the below instead? Also note the comment; since you did
>>> the work to figure out why -- best record that for posterity.
>>>
>>> @@ -3737,6 +3730,13 @@ void sched_ttwu_pending(void *arg)
>>>  		set_task_cpu(p, cpu_of(rq));
>>>
>>>  		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
>>> +		/*
>>> +		 * Must be after enqueueing at least one task such that
>>> +		 * idle_cpu() does not observe a false-negative -- if it does,
>>> +		 * it is possible for select_idle_siblings() to stack a number
>>> +		 * of tasks on this CPU during that window.
>>> +		 */
>>> +		WRITE_ONCE(rq->ttwu_pending, 0);
>> Just curious, why do we put the above code inside the
>> llist_for_each_entry_safe() loop?
>
>> My understanding is that once one task is queued, select_idle_cpu()
>> would not treat this rq as idle anymore because nr_running is not 0.
>> But wouldn't this bring overhead by writing rq->ttwu_pending multiple
>> times? Or do I miss something?
>
> So the consideration is that by clearing it late, you might also clear a
> next set; consider something like:
>
> 	cpu0			cpu1			cpu2
>
> 	ttwu_queue()
> 	  ->ttwu_pending = 1;
> 	  llist_add()
>
> 				sched_ttwu_pending()
> 				  llist_del_all()
> 				  ... long ...
> 							ttwu_queue()
> 							  ->ttwu_pending = 1
> 							  llist_add()
>
> 				  ... time ...
> 				  ->ttwu_pending = 0
>
> Which leaves you with a non-empty list but with ttwu_pending == 0.
>
> But I suppose that's not actually better with my variant, since it keeps
> writing 0s. We can make it more complicated again, but perhaps it
> doesn't matter and your version is good enough.
>

Yeah. Since your version repeatedly writes 0 to ttwu_pending, it finally
reaches the same effect as mine. Although the performance results in my
tests show no difference, it may still bring more overhead.

IMO, according to the latest linux-next code, all callers querying
rq->ttwu_pending only care about whether the cpu is idle, because they
always combine it with querying nr_running. Actually, no one cares
whether wake_entry.llist is empty. So for the purpose of checking the
cpu idle state, moving rq->ttwu_pending = 0 to after the tasks are
enqueued fully covers the whole state.

For your case, although ttwu_pending is set to 0 while some tasks are
still pending, at that time nr_running is sure to be > 0, so callers who
query both ttwu_pending and nr_running will know this cpu is not idle.
(The callers querying these two values are lockless, so there may be a
race in a really small window? But this case is extremely rare, and I
think we should not make it more complicated.)

> But please update with a comment on why it needs to be after
> ttwu_do_activate().

OK. Should I send v2, or will you directly add the comment?

Thanks.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 87c9cdf37a26..b07de1753be5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3739,13 +3739,6 @@ void sched_ttwu_pending(void *arg)
 	if (!llist)
 		return;

-	/*
-	 * rq::ttwu_pending racy indication of out-standing wakeups.
-	 * Races such that false-negatives are possible, since they
-	 * are shorter lived that false-positives would be.
-	 */
-	WRITE_ONCE(rq->ttwu_pending, 0);
-
 	rq_lock_irqsave(rq, &rf);
 	update_rq_clock(rq);

@@ -3760,6 +3753,7 @@ void sched_ttwu_pending(void *arg)
 	}

 	rq_unlock_irqrestore(rq, &rf);
+	WRITE_ONCE(rq->ttwu_pending, 0);
 }

 void send_call_function_single_ipi(int cpu)
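
Applied, the two hunks leave sched_ttwu_pending() looking roughly like the
following. The surrounding body is reconstructed from the diff context and
the kernel source of that era, not part of the posted patch:

void sched_ttwu_pending(void *arg)
{
	struct llist_node *llist = arg;
	struct rq *rq = this_rq();
	struct task_struct *p, *t;
	struct rq_flags rf;

	if (!llist)
		return;

	rq_lock_irqsave(rq, &rf);
	update_rq_clock(rq);

	llist_for_each_entry_safe(p, t, llist, wake_entry.llist) {
		if (WARN_ON_ONCE(p->on_cpu))
			smp_cond_load_acquire(&p->on_cpu, !VAL);

		if (WARN_ON_ONCE(task_cpu(p) != cpu_of(rq)))
			set_task_cpu(p, cpu_of(rq));

		/* Enqueue the woken task; nr_running becomes non-zero here. */
		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
	}

	rq_unlock_irqrestore(rq, &rf);
	/*
	 * Cleared only after all woken tasks are enqueued, so idle_cpu()
	 * cannot observe nr_running == 0 && ttwu_pending == 0 while
	 * wakeups are still in flight (this is the v1 placement; the
	 * comment Peter requested would go here).
	 */
	WRITE_ONCE(rq->ttwu_pending, 0);
}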