Message ID | 20230202014053.3604176-1-zengheng4@huawei.com |
---|---|
State | New |
Headers |
From: Zeng Heng <zengheng4@huawei.com> To: <mingo@redhat.com>, <bp@alien8.de>, <jroedel@suse.de>, <vbabka@suse.cz>, <hpa@zytor.com>, <tglx@linutronix.de>, <eric.devolder@oracle.com>, <bhe@redhat.com>, <tiwai@suse.de>, <keescook@chromium.org>, <dave.hansen@linux.intel.com> CC: <linux-kernel@vger.kernel.org>, <x86@kernel.org>, <liwei391@huawei.com>, <xiexiuqi@huawei.com> Subject: [PATCH v3] x86/kdump: Handle blocked NMIs interrupt to avoid kdump crashes Date: Thu, 2 Feb 2023 09:40:53 +0800 Message-ID: <20230202014053.3604176-1-zengheng4@huawei.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII List-ID: <linux-kernel.vger.kernel.org> |
Series | [v3] x86/kdump: Handle blocked NMIs interrupt to avoid kdump crashes |
Commit Message
Zeng Heng
Feb. 2, 2023, 1:40 a.m. UTC
If the CPU panics within NMI context, there may be
unhandled NMIs pending in the background: the processor
blocks further NMIs until the next IRET instruction
executes, which prevents nested execution of the NMI
handler.
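As a rough illustration of that blocking rule, here is a toy C model, not kernel code; `struct cpu_model`, `raise_nmi()` and `iret()` are hypothetical names. It assumes the simplified rule that, once an NMI is delivered, further NMIs stay blocked until the next IRET and at most one of them is latched in the meantime.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code: once an NMI is delivered, further NMIs
 * are blocked until the next IRET, and at most one is latched meanwhile.
 */
struct cpu_model {
	bool nmi_blocked;   /* set on NMI delivery, cleared by IRET */
	bool nmi_latched;   /* the single pending NMI, if any */
	int  delivered;     /* NMIs that reached the handler */
};

/* Entering the NMI handler blocks further NMIs. */
static void deliver_nmi(struct cpu_model *c)
{
	c->nmi_blocked = true;
	c->delivered++;
}

/* An NMI source fires. */
static void raise_nmi(struct cpu_model *c)
{
	if (c->nmi_blocked)
		c->nmi_latched = true;  /* extra NMIs collapse into one pending NMI */
	else
		deliver_nmi(c);
}

/* IRET unblocks NMIs; a latched NMI is taken immediately. */
static void iret(struct cpu_model *c)
{
	/* invariant: an NMI can only be latched while NMIs are blocked */
	assert(!c->nmi_latched || c->nmi_blocked);
	c->nmi_blocked = false;
	if (c->nmi_latched) {
		c->nmi_latched = false;
		deliver_nmi(c);
	}
}
```

For example, three `raise_nmi()` calls on a zeroed `struct cpu_model` reach the handler only once and leave one NMI latched; it is taken only when an `iret()` finally runs. A handler that panics and never returns therefore leaves that NMI pending indefinitely.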
If an IRET executes during the kdump reboot while no
proper NMI handler is registered (for example, in the
EFI loader), the pending NMI is delivered with nothing
to handle it and kdump crashes. These blocked NMIs
therefore need to be handled in advance.
Because asm_exc_nmi() can handle nested NMIs, call
iret_to_self() here to execute an IRET instruction,
so that any blocked NMI is triggered and handled
before the IDT is invalidated.
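The ordering argument can be sketched with a toy C model (hypothetical names throughout, not kernel code). It assumes the CPU panicked inside an NMI handler that never returns, that one watchdog NMI is latched during the panic, and that the first IRET executed after the NMI handler is gone is what delivers it. Calling `iret()` once before the teardown plays the role of `iret_to_self()`; the model ignores the NMI-stack bookkeeping that lets asm_exc_nmi() cope with the nested case.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (not kernel code) of a CPU during kdump shutdown. */
enum kdump_outcome { KDUMP_OK, KDUMP_CRASHED };

struct cpu_model {
	bool nmi_blocked;   /* NMIs held off until the next IRET */
	bool nmi_latched;   /* at most one NMI pending while blocked */
	bool idt_valid;     /* false once no usable NMI handler is installed */
	bool crashed;
	int  handled;       /* NMIs handled by a real handler */
};

/* Deliver an NMI; a valid handler runs and returns via its own IRET. */
static void deliver_nmi(struct cpu_model *c)
{
	if (!c->idt_valid) {
		c->crashed = true;   /* nothing to handle it */
		return;
	}
	c->handled++;
}

static void raise_nmi(struct cpu_model *c)
{
	if (c->nmi_blocked)
		c->nmi_latched = true;
	else
		deliver_nmi(c);
}

/* IRET unblocks NMIs, so a latched NMI is delivered right away. */
static void iret(struct cpu_model *c)
{
	assert(!c->nmi_latched || c->nmi_blocked);
	c->nmi_blocked = false;
	if (c->nmi_latched) {
		c->nmi_latched = false;
		deliver_nmi(c);
	}
}

static enum kdump_outcome simulate_kdump(bool drain_before_teardown)
{
	struct cpu_model c = { .idt_valid = true };

	c.nmi_blocked = true;   /* panic inside an NMI handler that never returns */
	raise_nmi(&c);          /* watchdog NMI during the panic: latched only */

	if (drain_before_teardown)
		iret(&c);       /* plays the role of iret_to_self() */

	c.idt_valid = false;    /* e.g. EFI loader without a proper NMI handler */
	iret(&c);               /* first IRET executed after the teardown */

	return c.crashed ? KDUMP_CRASHED : KDUMP_OK;
}
```

In this model `simulate_kdump(false)` crashes while `simulate_kdump(true)` does not: draining once while a handler still exists is enough only because exactly one NMI was latched.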
The following test case reproduces the issue:
1. # cat uncorrected
CPU 1 BANK 4
STATUS uncorrected 0xc0
MCGSTATUS EIPV MCIP
ADDR 0x1234
RIP 0xdeadbabe
RAISINGCPU 0
MCGCAP SER CMCI TES 0x6
2. # modprobe mce_inject
3. # mce-inject uncorrected
mce-inject triggers a kernel panic in NMI context. In
addition, another NMI (for example, from the watchdog)
must be raised during the panic process. Set a suitable
watchdog threshold and/or add an artificial delay so
that the watchdog NMI fires during the panic and the
issue occurs.
Fixes: ca0e22d4f011 ("x86/boot/compressed/64: Always switch to own page table")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
---
v1: add dummy NMI interrupt handler in EFI loader
v2: tidy up changelog, add comments (by Ingo Molnar)
v3: add iret_to_self() to deal with blocked NMIs in advance
arch/x86/kernel/crash.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
--
2.25.1
Comments
On Thu, Feb 02, 2023 at 09:40:53AM +0800, Zeng Heng wrote:
> [...]
> @@ -143,6 +144,19 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> [...]
> +	iret_to_self();
> +
>  	/*
>  	 * VMCLEAR VMCSs loaded on this cpu if needed.
>  	 */

I never remember the shutdown paths -- do we force wipe the PMU
registers somewhere before this?
On 2023/2/2 17:09, Peter Zijlstra wrote:
> On Thu, Feb 02, 2023 at 09:40:53AM +0800, Zeng Heng wrote:
>> [...]
>> +	iret_to_self();
>> +
>>  	/*
>>  	 * VMCLEAR VMCSs loaded on this cpu if needed.
>>  	 */
> I never remember the shutdown paths -- do we force wipe the PMU
> registers somewhere before this?

I have checked the panic process, and there is no wipe operation for PMU
registers, which causes the watchdog bites.

Do you mean we should directly disable PMU registers instead of calling
`iret_to_self` to consume blocked NMI interrupts?

Best regards,
zeng heng
On Tue, Feb 14, 2023 at 05:30:46PM +0800, Zeng Heng wrote:
> > I never remember the shutdown paths -- do we force wipe the PMU
> > registers somewhere before this?
>
> I have checked the panic process, and there is no wipe operation for PMU
> registers, which causes the watchdog bites.
>
> Do you mean we should directly disable PMU registers instead of calling
> `iret_to_self` to consume blocked NMI interrupts?

If you don't wipe the PMU, there will be many and continuous NMIs, a
single IRET-to-SELF isn't going to save you.

Anyway, I had a bit of a grep around and I find we have:

  kernel/events/core.c:	register_reboot_notifier(&perf_reboot_notifier);

which should end up killing all the PMU activity. Somewhere around there
there's also a CONFIG_KEXEC_CORE ifdef, so I'm thinking it gets called
on the panic->crash-kernel path too?

If not, someone should look at doing something there.
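Peter's objection can be illustrated with a toy C model (hypothetical names, not kernel code). It assumes a still-enabled PMU that raises a fresh NMI on every simulated tick: a single drain IRET before teardown handles the NMI that is already latched, but the next PMU NMI arrives after the handler is gone. Clearing the model's `pmu_enabled` flag before draining is only a stand-in for wiping the PMU, not the kernel's actual mechanism.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code. */
struct cpu_model {
	bool nmi_blocked;   /* NMIs held off until the next IRET */
	bool nmi_latched;   /* at most one NMI pending while blocked */
	bool idt_valid;     /* false once no usable NMI handler is installed */
	bool pmu_enabled;   /* hypothetical: PMU overflow keeps raising NMIs */
	bool crashed;
};

/* Deliver an NMI; a valid handler runs and returns via its own IRET. */
static void deliver_nmi(struct cpu_model *c)
{
	if (!c->idt_valid)
		c->crashed = true;
}

static void raise_nmi(struct cpu_model *c)
{
	if (c->nmi_blocked)
		c->nmi_latched = true;
	else
		deliver_nmi(c);
}

static void iret(struct cpu_model *c)
{
	assert(!c->nmi_latched || c->nmi_blocked);
	c->nmi_blocked = false;
	if (c->nmi_latched) {
		c->nmi_latched = false;
		deliver_nmi(c);
	}
}

/* One step of time: a live PMU overflows and raises another NMI. */
static void pmu_tick(struct cpu_model *c)
{
	if (c->pmu_enabled)
		raise_nmi(c);
}

/* Returns true if the simulated kdump crashes. */
static bool simulate_kdump_with_pmu(bool stop_pmu_first)
{
	struct cpu_model c = { .idt_valid = true, .pmu_enabled = true };

	c.nmi_blocked = true;          /* panicked inside an NMI handler */
	pmu_tick(&c);                  /* PMU NMI is latched during the panic */

	if (stop_pmu_first)
		c.pmu_enabled = false; /* hypothetical stand-in for wiping the PMU */
	iret(&c);                      /* single drain, like iret_to_self() */

	c.idt_valid = false;           /* kdump hands over; no NMI handler left */
	for (int t = 0; t < 3; t++) {
		pmu_tick(&c);          /* a live PMU keeps firing... */
		iret(&c);              /* ...and later code still executes IRETs */
	}
	return c.crashed;
}
```

With the PMU left running, `simulate_kdump_with_pmu(false)` crashes on the first tick after teardown even though the drain succeeded; with the source silenced first, nothing further arrives.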
Add kexec list to CC.

On 02/14/23 at 10:49am, Peter Zijlstra wrote:
> On Tue, Feb 14, 2023 at 05:30:46PM +0800, Zeng Heng wrote:
> [...]
> Anyway, I had a bit of a grep around and I find we have:
>
>   kernel/events/core.c:	register_reboot_notifier(&perf_reboot_notifier);
>
> which should end up killing all the PMU activity. Somewhere around there
> there's also a CONFIG_KEXEC_CORE ifdef, so I'm thinking it gets called
> on the panic->crash-kernel path too?

No, reboot_notifier_list is only handled in kexec reboot/reboot path,
please see kernel_restart_prepare() invocation. The kdump path only shuts
down key components such as the CPUs and the interrupt controller.

> If not, someone should look at doing something there.
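The distinction drawn here can be pictured as a notifier list that only one of the two paths walks. The sketch below is a hypothetical C model, not the kernel's notifier API: `register_reboot_cb()`, `restart_prepare_model()` and `crash_shutdown_model()` are illustrative stand-ins for register_reboot_notifier(), kernel_restart_prepare() and the kdump shutdown path, and `perf_reboot_cb()` stands in for perf_reboot_notifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; not the kernel's notifier API. */
typedef void (*reboot_cb)(void);

#define MAX_REBOOT_CBS 8

static reboot_cb reboot_chain[MAX_REBOOT_CBS];
static size_t nr_reboot_cbs;

/* Stand-in for "the PMU is still generating NMIs". */
static bool pmu_active = true;

/* Plays the role of perf_reboot_notifier: stop all PMU activity. */
static void perf_reboot_cb(void)
{
	pmu_active = false;
}

/* Plays the role of register_reboot_notifier(). */
static void register_reboot_cb(reboot_cb cb)
{
	assert(nr_reboot_cbs < MAX_REBOOT_CBS);
	reboot_chain[nr_reboot_cbs++] = cb;
}

/* Normal or kexec reboot: walks the chain, like kernel_restart_prepare(). */
static void restart_prepare_model(void)
{
	for (size_t i = 0; i < nr_reboot_cbs; i++)
		reboot_chain[i]();
}

/*
 * kdump path: shuts down other CPUs and the interrupt controller
 * (not modeled here) but never walks the reboot chain, so the PMU
 * is left running.
 */
static void crash_shutdown_model(void)
{
}
```

After registering `perf_reboot_cb`, calling `crash_shutdown_model()` leaves `pmu_active` set, while `restart_prepare_model()` clears it.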
On 2023/2/15 9:01, Baoquan He wrote:
> Add kexec list to CC.
>
> On 02/14/23 at 10:49am, Peter Zijlstra wrote:
>> [...]
>> which should end up killing all the PMU activity. Somewhere around there
>> there's also a CONFIG_KEXEC_CORE ifdef, so I'm thinking it gets called
>> on the panic->crash-kernel path too?
> No, reboot_notifier_list is only handled in kexec reboot/reboot path,
> please see kernel_restart_prepare() invocation.
> [...]

I would replace iret_to_self() with perf_event_exit_cpu() in the kdump
shutdown path (in native_machine_crash_shutdown()). After testing, I will
send v4.

Thanks all,
Zeng Heng
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 305514431f26..3aaca680a639 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -41,6 +41,7 @@
 #include <asm/intel_pt.h>
 #include <asm/crash.h>
 #include <asm/cmdline.h>
+#include <asm/sync_core.h>
 
 /* Used while preparing memory map entries for second kernel */
 struct crash_memmap_data {
@@ -143,6 +144,19 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 
 	crash_smp_send_stop();
 
+	/*
+	 * If the cpu panics within the NMI interrupt context,
+	 * there may be unhandled NMI interrupts which are
+	 * blocked by processor until next IRET instruction
+	 * executes.
+	 *
+	 * In case of IRET execution during kdump reboot and
+	 * no proper NMIs handler registered at that point,
+	 * we trigger and handle blocked NMIs in advance to
+	 * avoid kdump crashes.
+	 */
+	iret_to_self();
+
 	/*
 	 * VMCLEAR VMCSs loaded on this cpu if needed.
 	 */