Message ID | 20230921161634.4063233-1-mark.rutland@arm.com |
---|---|
State | New |
Headers | From: Mark Rutland <mark.rutland@arm.com> To: linux-kernel@vger.kernel.org Cc: dianders@chromium.org, keescook@chromium.org, mark.rutland@arm.com, sumit.garg@linaro.org, swboyd@chromium.org Subject: [PATCH v2] lkdtm/bugs: add test for panic() with stuck secondary CPUs Date: Thu, 21 Sep 2023 17:16:34 +0100 Message-Id: <20230921161634.4063233-1-mark.rutland@arm.com> |
Series | [v2] lkdtm/bugs: add test for panic() with stuck secondary CPUs |
Commit Message
Mark Rutland
Sept. 21, 2023, 4:16 p.m. UTC
Upon a panic() the kernel will use either smp_send_stop() or
crash_smp_send_stop() to attempt to stop secondary CPUs via an IPI,
which may or may not be an NMI. Generally it's preferable that this is an
NMI so that CPUs can be stopped in as many situations as possible, but
it's not always possible to provide an NMI, and there are cases where
CPUs may be unable to handle the NMI regardless.

This patch adds a test for panic() where all other CPUs are stuck with
interrupts disabled, which can be used to check whether the kernel
gracefully handles CPUs failing to respond to a stop, and whether NMIs
actually work to stop CPUs.

For example, on arm64 *without* an NMI, this results in:

| # echo PANIC_STOP_IRQOFF > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry PANIC_STOP_IRQOFF
| Kernel panic - not syncing: panic stop irqoff test
| CPU: 2 PID: 24 Comm: migration/2 Not tainted 6.5.0-rc3-00077-ge6c782389895-dirty #4
| Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
| Stopper: multi_cpu_stop+0x0/0x1a0 <- stop_machine_cpuslocked+0x158/0x1a4
| Call trace:
|  dump_backtrace+0x94/0xec
|  show_stack+0x18/0x24
|  dump_stack_lvl+0x74/0xc0
|  dump_stack+0x18/0x24
|  panic+0x358/0x3e8
|  lkdtm_PANIC+0x0/0x18
|  multi_cpu_stop+0x9c/0x1a0
|  cpu_stopper_thread+0x84/0x118
|  smpboot_thread_fn+0x224/0x248
|  kthread+0x114/0x118
|  ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| SMP: failed to stop secondary CPUs 0-3
| Kernel Offset: 0x401cf3490000 from 0xffff800080000000
| PHYS_OFFSET: 0x40000000
| CPU features: 0x00000000,68c167a1,cce6773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: panic stop irqoff test ]---

Note the "failed to stop secondary CPUs 0-3" message.

On arm64 *with* an NMI, this results in:

| # echo PANIC_STOP_IRQOFF > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry PANIC_STOP_IRQOFF
| Kernel panic - not syncing: panic stop irqoff test
| CPU: 1 PID: 19 Comm: migration/1 Not tainted 6.5.0-rc3-00077-ge6c782389895-dirty #4
| Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
| Stopper: multi_cpu_stop+0x0/0x1a0 <- stop_machine_cpuslocked+0x158/0x1a4
| Call trace:
|  dump_backtrace+0x94/0xec
|  show_stack+0x18/0x24
|  dump_stack_lvl+0x74/0xc0
|  dump_stack+0x18/0x24
|  panic+0x358/0x3e8
|  lkdtm_PANIC+0x0/0x18
|  multi_cpu_stop+0x9c/0x1a0
|  cpu_stopper_thread+0x84/0x118
|  smpboot_thread_fn+0x224/0x248
|  kthread+0x114/0x118
|  ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x55a9c0bc0000 from 0xffff800080000000
| PHYS_OFFSET: 0x40000000
| CPU features: 0x00000000,68c167a1,fce6773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: panic stop irqoff test ]---

Note the absence of a "failed to stop secondary CPUs" message, since we
don't log anything when secondary CPUs are successfully stopped.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
---
 drivers/misc/lkdtm/bugs.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

Since v1 [1]:
* Improve commit message
* Clarify comment in panic_stop_irqoff_fn()
* Drop cpus_read_{lock,unlock}()
* Fold in tags

[1] https://lore.kernel.org/all/20230831101026.3122590-1-mark.rutland@arm.com/
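For anyone scripting this test, the two outcomes in the commit message differ only in whether the "failed to stop secondary CPUs" line shows up after the panic banner. The following is a minimal sketch of that check, assuming the console output has already been captured into a string. The names classify_panic_log() and enum stop_result are hypothetical and are not part of lkdtm or the kernel selftests.

```c
#include <assert.h>
#include <string.h>

/* Strings taken from the example logs in the commit message. */
#define PANIC_MSG       "Kernel panic - not syncing: panic stop irqoff test"
#define STOP_FAILED_MSG "SMP: failed to stop secondary CPUs"

/* Hypothetical result type; not part of lkdtm. */
enum stop_result {
	NO_PANIC_SEEN,        /* the test's panic banner is absent */
	SECONDARIES_STOPPED,  /* panic seen, no failure message logged */
	SECONDARIES_STUCK,    /* panic seen, secondaries did not respond */
};

/*
 * Classify captured console output. The kernel logs nothing on a
 * successful stop, so success is inferred from the absence of the
 * failure message once the panic banner has been seen.
 */
enum stop_result classify_panic_log(const char *log)
{
	if (!strstr(log, PANIC_MSG))
		return NO_PANIC_SEEN;
	if (strstr(log, STOP_FAILED_MSG))
		return SECONDARIES_STUCK;
	return SECONDARIES_STOPPED;
}
```

Because success is inferred from a missing line, a truncated capture could be misread as a successful stop. A real harness would probably also want to confirm the "end Kernel panic" trailer was seen.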
Comments
Quoting Mark Rutland (2023-09-21 09:16:34)
> Upon a panic() the kernel will use either smp_send_stop() or
> crash_smp_send_stop() to attempt to stop secondary CPUs via an IPI,
> which may or may not be an NMI. Generally it's preferable that this is an
> NMI so that CPUs can be stopped in as many situations as possible, but
> it's not always possible to provide an NMI, and there are cases where
> CPUs may be unable to handle the NMI regardless.
>
> [...]

Reviewed-by: Stephen Boyd <swboyd@chromium.org>
On Thu, 21 Sep 2023 17:16:34 +0100, Mark Rutland wrote:
> Upon a panic() the kernel will use either smp_send_stop() or
> crash_smp_send_stop() to attempt to stop secondary CPUs via an IPI,
> which may or may not be an NMI. Generally it's preferable that this is an
> NMI so that CPUs can be stopped in as many situations as possible, but
> it's not always possible to provide an NMI, and there are cases where
> CPUs may be unable to handle the NMI regardless.
>
> [...]

I added a line to tests.tst and applied this to for-next/hardening, thanks!

[1/1] lkdtm/bugs: add test for panic() with stuck secondary CPUs
      https://git.kernel.org/kees/c/5fb07db970cf

Take care,
diff --git a/drivers/misc/lkdtm/bugs.c b/drivers/misc/lkdtm/bugs.c
index c66cc05a68c45..b080eb2335eba 100644
--- a/drivers/misc/lkdtm/bugs.c
+++ b/drivers/misc/lkdtm/bugs.c
@@ -6,12 +6,14 @@
  * test source files.
  */
 #include "lkdtm.h"
+#include <linux/cpu.h>
 #include <linux/list.h>
 #include <linux/sched.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/task_stack.h>
-#include <linux/uaccess.h>
 #include <linux/slab.h>
+#include <linux/stop_machine.h>
+#include <linux/uaccess.h>
 
 #if IS_ENABLED(CONFIG_X86_32) && !IS_ENABLED(CONFIG_UML)
 #include <asm/desc.h>
@@ -73,6 +75,31 @@ static void lkdtm_PANIC(void)
 	panic("dumptest");
 }
 
+static int panic_stop_irqoff_fn(void *arg)
+{
+	atomic_t *v = arg;
+
+	/*
+	 * As stop_machine() disables interrupts, all CPUs within this function
+	 * have interrupts disabled and cannot take a regular IPI.
+	 *
+	 * The last CPU which enters here will trigger a panic, and as all CPUs
+	 * cannot take a regular IPI, we'll only be able to stop secondaries if
+	 * smp_send_stop() or crash_smp_send_stop() uses an NMI.
+	 */
+	if (atomic_inc_return(v) == num_online_cpus())
+		panic("panic stop irqoff test");
+
+	for (;;)
+		cpu_relax();
+}
+
+static void lkdtm_PANIC_STOP_IRQOFF(void)
+{
+	atomic_t v = ATOMIC_INIT(0);
+	stop_machine(panic_stop_irqoff_fn, &v, cpu_online_mask);
+}
+
 static void lkdtm_BUG(void)
 {
 	BUG();
@@ -638,6 +665,7 @@ static noinline void lkdtm_CORRUPT_PAC(void)
 
 static struct crashtype crashtypes[] = {
 	CRASHTYPE(PANIC),
+	CRASHTYPE(PANIC_STOP_IRQOFF),
 	CRASHTYPE(BUG),
 	CRASHTYPE(WARNING),
 	CRASHTYPE(WARNING_MESSAGE),
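The "last arrival panics, everyone else spins" pattern in panic_stop_irqoff_fn() can be tried outside the kernel with a rough userspace analogy. The sketch below is not kernel code and makes several assumptions. POSIX threads stand in for CPUs. C11 atomic_fetch_add() plus one stands in for atomic_inc_return(). A shared stop flag stands in for a stop request that the spinning threads are able to observe (the NMI case). The names struct rendezvous, worker() and run_rendezvous() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Hypothetical userspace stand-in for the state shared between CPUs. */
struct rendezvous {
	atomic_int arrived;    /* plays the role of the atomic_t counter */
	atomic_int panickers;  /* threads that saw themselves arrive last */
	atomic_bool stop;      /* stand-in for a stop request spinners can see */
	int nr_workers;        /* plays the role of num_online_cpus() */
};

static void *worker(void *arg)
{
	struct rendezvous *r = arg;

	/* fetch_add returns the old value, so +1 mirrors atomic_inc_return(). */
	if (atomic_fetch_add(&r->arrived, 1) + 1 == r->nr_workers) {
		/* Last arrival: the kernel test would call panic() here. */
		atomic_fetch_add(&r->panickers, 1);
		atomic_store(&r->stop, true);
		return NULL;
	}

	/* Everyone else spins, like the for (;;) cpu_relax() loop. */
	while (!atomic_load(&r->stop))
		sched_yield();
	return NULL;
}

/* Returns how many threads took the "panic" branch, or -1 on error. */
int run_rendezvous(int nr_workers)
{
	pthread_t threads[MAX_WORKERS];
	struct rendezvous r;
	int created = 0;

	if (nr_workers < 1 || nr_workers > MAX_WORKERS)
		return -1;

	atomic_init(&r.arrived, 0);
	atomic_init(&r.panickers, 0);
	atomic_init(&r.stop, false);
	r.nr_workers = nr_workers;

	while (created < nr_workers &&
	       pthread_create(&threads[created], NULL, worker, &r) == 0)
		created++;

	/* If creation failed part-way, release the spinners before joining. */
	if (created < nr_workers)
		atomic_store(&r.stop, true);

	for (int i = 0; i < created; i++)
		pthread_join(threads[i], NULL);

	return created == nr_workers ? atomic_load(&r.panickers) : -1;
}
```

Calling run_rendezvous(4) should report exactly one thread taking the panic branch, since only one increment can produce the final count. The analogy breaks down in one important way. The spinners here poll a flag voluntarily, whereas CPUs stuck in the kernel test have interrupts disabled and can only be stopped if the stop IPI is delivered as an NMI.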