Message ID | 20221114120453.3233-1-xuewen.yan@unisoc.com |
---|---|
State | New |
Headers |
From: Xuewen Yan <xuewen.yan@unisoc.com>
To: peterz@infradead.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org
Cc: dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, ke.wang@unisoc.com, linux-kernel@vger.kernel.org
Subject: [PATCH] sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop
Date: Mon, 14 Nov 2022 20:04:53 +0800
Message-ID: <20221114120453.3233-1-xuewen.yan@unisoc.com> |
Series | sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop |
Commit Message
Xuewen Yan
Nov. 14, 2022, 12:04 p.m. UTC
We are performing stress tests related to cpu hotplug. When there are only two cpus left in the system (for example, cpu0 and cpu1) and cpu1 goes offline, the following infinite loop may occur:

When cpu0 contains more than one rt task, it will push the extra rt tasks to other cpus for execution. If cpu1 is going through the hotplug process at this time, it would start a stop_machine which is responsible for migrating the tasks on cpu1 to other online cpus, and which prevents any task from migrating to this cpu. As a result, when cpu0 migrates an rt task to cpu1, the stop_machine re-migrates the rt task back to cpu0, which causes rq0's rt.pushable_tasks list to never become empty. The result is that these rt tasks are repeatedly migrated between cpu0 and cpu1.

In order to prevent the above situation from happening, use cpu_active_mask to prevent find_lowest_rq() from selecting an inactive cpu. At the same time, when there is only one active cpu in the system, rto_push_irq_work should be terminated in advance.

Co-developed-by: Ke Wang <ke.wang@unisoc.com>
Signed-off-by: Ke Wang <ke.wang@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
---
 kernel/sched/cpupri.c | 1 +
 kernel/sched/rt.c     | 6 ++++++
 2 files changed, 7 insertions(+)
Comments
On Mon, 14 Nov 2022 20:04:53 +0800
Xuewen Yan <xuewen.yan@unisoc.com> wrote:

> +++ b/kernel/sched/rt.c
> @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
> {
> 	int next;
> 	int cpu;
> +	struct cpumask tmp_cpumask;

If you have a machine with thousands of CPUs, this will likely kill the
stack.

>
> 	/*
> 	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
> @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
> 		/* When rto_cpu is -1 this acts like cpumask_first() */
> 		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
>
> +		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> +		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> +		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> +			break;
> +

Kill the above.

> 		rd->rto_cpu = cpu;
>
> 		if (cpu < nr_cpu_ids) {

Why not just add here:

		if (!cpumask_test_cpu(cpu, cpu_active_mask))
			continue;
		return cpu;
	}

?

-- Steve
On Fri, Nov 18, 2022 at 6:16 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 14 Nov 2022 20:04:53 +0800
> Xuewen Yan <xuewen.yan@unisoc.com> wrote:
>
> > +++ b/kernel/sched/rt.c
> > @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
> > {
> > 	int next;
> > 	int cpu;
> > +	struct cpumask tmp_cpumask;
>
> If you have a machine with thousands of CPUs, this will likely kill the
> stack.

Ha, I did not take it into account. Thanks!

>
> > 	/*
> > 	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
> > @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
> > 		/* When rto_cpu is -1 this acts like cpumask_first() */
> > 		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
> >
> > +		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> > +		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> > +		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> > +			break;
> > +
>
> Kill the above.
>
> > 		rd->rto_cpu = cpu;
> >
> > 		if (cpu < nr_cpu_ids) {
>
> Why not just add here:
>
> 		if (!cpumask_test_cpu(cpu, cpu_active_mask))
> 			continue;
> 		return cpu;
> 	}
>
> ?

Let's consider this scenario: the online cpu mask is 0x03 (cpu0/cpu1), the active cpu mask is 0x01 (cpu0), the rto cpu is cpu0, the rto_mask is 0x01, and the irq work runs on cpu0. On the first loop, rto_cpu would be set to -1; but because loop < rto_loop_next there is another loop, and since rto_cpu is -1, the next rto cpu would still be cpu0. As a result, cpu0 would push rt tasks to cpu1 (an inactive cpu) while running in the irq_work.

So we should judge whether the current cpu (the only active cpu) is the next loop's cpu.

Thanks!

> -- Steve
On Fri, 18 Nov 2022 20:08:54 +0800
Xuewen Yan <xuewen.yan94@gmail.com> wrote:

> Let's consider this scenario: the online cpu mask is 0x03 (cpu0/cpu1),
> the active cpu mask is 0x01 (cpu0), the rto cpu is cpu0, the rto_mask
> is 0x01, and the irq work runs on cpu0. On the first loop, rto_cpu
> would be set to -1; but because loop < rto_loop_next there is another
> loop, and since rto_cpu is -1, the next rto cpu would still be cpu0.
> As a result, cpu0 would push rt tasks to cpu1 (an inactive cpu) while
> running in the irq_work.
>
> So we should judge whether the current cpu (the only active cpu) is
> the next loop's cpu.

Wait, I'm confused by what you are trying to do here. The rto_next_cpu() only sends an IPI to the next CPU that has more than one RT task queued on it, where the RT tasks can migrate.

If we send CPU0 an IPI, let CPU0 figure out to only push to the active mask. Why are you trying to prevent sending the IPI to an active CPU?

Will the first part of your patch, the one that modifies cpupri, not keep it from pushing to the non-active CPU?

-- Steve
On 11/17/22 17:00, Steven Rostedt wrote:
> On Mon, 14 Nov 2022 20:04:53 +0800
> Xuewen Yan <xuewen.yan@unisoc.com> wrote:
>
> > +++ b/kernel/sched/rt.c
> > @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
> > {
> > 	int next;
> > 	int cpu;
> > +	struct cpumask tmp_cpumask;
>
> If you have a machine with thousands of CPUs, this will likely kill the
> stack.
>
> > 	/*
> > 	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
> > @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
> > 		/* When rto_cpu is -1 this acts like cpumask_first() */
> > 		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
> >
> > +		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> > +		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> > +		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> > +			break;
> > +
>
> Kill the above.
>
> > 		rd->rto_cpu = cpu;
> >
> > 		if (cpu < nr_cpu_ids) {
>
> Why not just add here:
>
> 		if (!cpumask_test_cpu(cpu, cpu_active_mask))
> 			continue;
> 		return cpu;
> 	}
>
> ?

Or this?

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ed2a47e4ddae..5073e06e34c3 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2236,7 +2236,7 @@ static int rto_next_cpu(struct root_domain *rd)
 	for (;;) {

 		/* When rto_cpu is -1 this acts like cpumask_first() */
-		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
+		cpu = cpumask_next_and(rd->rto_cpu, rd->rto_mask, cpu_active_mask);

 		rd->rto_cpu = cpu;

I think we might be circumventing the root cause though. It looks like there are only 2 cpus online in the system and one of them is going offline (cpu1) while the other is overloaded (cpu0), ending in ping-ponging the tasks?

If that's the case, it seems to me the rto state needs to be cleared for cpu0 so it stops attempting to push tasks. Which I'd assume usually happens, but there's a race that prevents it from realizing this.

Maybe something like this would be better? Assuming I understood the problem correctly, of course.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ed2a47e4ddae..d80072cc2596 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -334,6 +334,10 @@ static inline void rt_set_overload(struct rq *rq)
 	if (!rq->online)
 		return;

+	/* Overload is meaningless if we shrank to become a UP system */
+	if (cpumask_weight(cpu_online_mask) == 1)
+		return;
+
 	cpumask_set_cpu(rq->cpu, rq->rd->rto_mask);
 	/*
 	 * Make sure the mask is visible before we set

This could be working around the problem too, not sure. But the bug IMHO is that if we shrink to a UP system, the whole push-pull mechanism should retire temporarily. Which I assume usually happens, but there's a race somewhere.

Thanks

--
Qais Yousef
diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index a286e726eb4b..42c40cfdf836 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -101,6 +101,7 @@ static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,

 	if (lowest_mask) {
 		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
+		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);

 		/*
 		 * We have to ensure that we have at least one bit
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ed2a47e4ddae..9433696dae50 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
 {
 	int next;
 	int cpu;
+	struct cpumask tmp_cpumask;

 	/*
 	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
@@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
 		/* When rto_cpu is -1 this acts like cpumask_first() */
 		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);

+		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
+		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
+		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
+			break;
+
 		rd->rto_cpu = cpu;

 		if (cpu < nr_cpu_ids)