Message ID | 20230217120604.435608-1-zengheng4@huawei.com |
---|---|
State | New |
Headers |
From: Zeng Heng <zengheng4@huawei.com> To: <alexander.shishkin@linux.intel.com>, <tglx@linutronix.de>, <peterz@infradead.org>, <tiwai@suse.de>, <jolsa@kernel.org>, <vbabka@suse.cz>, <keescook@chromium.org>, <mingo@redhat.com>, <acme@kernel.org>, <namhyung@kernel.org>, <bp@alien8.de>, <bhe@redhat.com>, <eric.devolder@oracle.com>, <hpa@zytor.com>, <jroedel@suse.de>, <dave.hansen@linux.intel.com> CC: <linux-perf-users@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <liwei391@huawei.com>, <x86@kernel.org>, <xiexiuqi@huawei.com> Subject: [RFC PATCH v4] x86/kdump: terminate watchdog NMI interrupt to avoid kdump crashes Date: Fri, 17 Feb 2023 20:06:04 +0800 Message-ID: <20230217120604.435608-1-zengheng4@huawei.com> X-Mailer: git-send-email 2.25.1 List-ID: <linux-kernel.vger.kernel.org> |
Series |
[RFC,v4] x86/kdump: terminate watchdog NMI interrupt to avoid kdump crashes
|
|
Commit Message
Zeng Heng
Feb. 17, 2023, 12:06 p.m. UTC
If the CPU panics in NMI context, further NMIs may be pending in the
background: the processor blocks them until the next IRET instruction
executes, which is how it prevents nested execution of the NMI handler.
If an IRET executes during the kdump reboot while no proper NMI handler
is registered (for example, in the EFI loader), a blocked NMI is then
delivered and kdump crashes. We therefore need to make sure the watchdog
is no longer running, so call perf_event_exit_cpu() at the very last
moment of the panic shutdown.
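The blocking behaviour described above can be illustrated with a small
userspace toy model. This is only a sketch under simplifying assumptions:
it is not kernel code, every name in it (struct toy_cpu, toy_raise_nmi(),
toy_iret(), toy_kdump_crashes()) is hypothetical, and it models just three
things: an NMI that is taken blocks further NMIs, one NMI can stay latched
while blocked, and IRET unblocks and delivers the latched NMI.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU's NMI state. All names are hypothetical. */
struct toy_cpu {
	bool nmi_blocked;     /* set when an NMI is taken, cleared by IRET */
	bool nmi_pending;     /* one NMI latched while delivery is blocked */
	bool handler_present; /* is a usable NMI handler installed?        */
	bool crashed;         /* an NMI arrived with no handler            */
};

/* Deliver an NMI: without a handler the toy machine "crashes". */
static void toy_deliver(struct toy_cpu *c)
{
	if (!c->handler_present) {
		c->crashed = true;
		return;
	}
	c->nmi_blocked = true;
}

/* Raise an NMI: latch it if NMIs are blocked, else deliver it now. */
static void toy_raise_nmi(struct toy_cpu *c)
{
	if (c->nmi_blocked)
		c->nmi_pending = true;
	else
		toy_deliver(c);
}

/* IRET unblocks NMIs and immediately delivers a latched one. */
static void toy_iret(struct toy_cpu *c)
{
	c->nmi_blocked = false;
	if (c->nmi_pending) {
		c->nmi_pending = false;
		toy_deliver(c);
	}
}

/*
 * Walk through the scenario from the changelog. If stop_watchdog is
 * true, the watchdog is quiesced before it can fire again (this stands
 * in for the perf_event_exit_cpu() call proposed by the patch).
 * Returns true if the toy machine crashes.
 */
static bool toy_kdump_crashes(bool stop_watchdog)
{
	struct toy_cpu c = { .handler_present = true };
	bool watchdog_armed = true;

	toy_raise_nmi(&c);          /* panic happens inside this NMI */
	assert(c.nmi_blocked);

	if (stop_watchdog)
		watchdog_armed = false;
	if (watchdog_armed)
		toy_raise_nmi(&c);  /* watchdog fires: NMI is latched */

	c.handler_present = false;  /* environment without NMI handler */
	toy_iret(&c);               /* e.g. return from a page fault   */

	return c.crashed;
}
```

In this model toy_kdump_crashes(false) reports a crash, because the
latched watchdog NMI is delivered on the first IRET after the handler is
gone, while toy_kdump_crashes(true) does not. As noted later in the
thread, an external NMI source would not be covered by stopping the
watchdog, and the model does not try to represent that case.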
!! I know that calling perf_event_exit_cpu() in NMI context is not
allowed, because of mutex_lock(), smp_call_function() and so on.
Does anyone know of a similar function that may be called in atomic
context? (In my tests, neither x86_pmu_disable() nor
x86_pmu_disable_all() works.)
Thank you in advance.
Here is one test case that reproduces the issue:
1. # cat uncorrected
CPU 1 BANK 4
STATUS uncorrected 0xc0
MCGSTATUS EIPV MCIP
ADDR 0x1234
RIP 0xdeadbabe
RAISINGCPU 0
MCGCAP SER CMCI TES 0x6
2. # modprobe mce_inject
3. # mce-inject uncorrected
mce-inject triggers a kernel panic in NMI context. In addition, another
NMI (for example, from the watchdog) must be raised while the panic is in
progress. Set a suitable watchdog threshold and/or add an artificial
delay to make sure the watchdog NMI fires during the panic procedure;
the issue then occurs.
Fixes: ca0e22d4f011 ("x86/boot/compressed/64: Always switch to own page table")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
---
v1: add dummy NMI interrupt handler in EFI loader
v2: tidy up changelog, add comments (by Ingo Molnar)
v3: add iret_to_self() to deal with blocked NMIs in advance
v4: call perf_event_exit_cpu() to terminate watchdog in panic shutdown
arch/x86/kernel/crash.c | 10 ++++++++++
1 file changed, 10 insertions(+)
--
2.25.1
Comments
On Fri, Feb 17, 2023 at 08:06:04PM +0800, Zeng Heng wrote:
> If the cpu panics within the NMI interrupt context, there could be
> unhandled NMI interrupts in the background which are blocked by processor
> until next IRET instruction executes. Since that, it prevents nested
> NMI handler execution.
>
> In case of IRET execution during kdump reboot and no proper NMIs handler
> registered at that point (such as during EFI loader), we need to ensure
> watchdog no work any more, or kdump would crash later. So call
> perf_event_exit_cpu() at the very last moment in the panic shutdown.
>
> !! Here I know it's not allowed to call perf_event_exit_cpu() within nmi
> context, because of mutex_lock, smp_call_function and so on.
> Is there any experts know about the similar function which allowed to call
> within atomic context (Neither x86_pmu_disable() nor x86_pmu_disable_all()
> do work after my practice)?
>
> Thank you in advance.
>
> Here provide one of test case to reproduce the concerned issue:
> 1. # cat uncorrected
> CPU 1 BANK 4
> STATUS uncorrected 0xc0
> MCGSTATUS EIPV MCIP
> ADDR 0x1234
> RIP 0xdeadbabe
> RAISINGCPU 0
> MCGCAP SER CMCI TES 0x6
> 2. # modprobe mce_inject
> 3. # mce-inject uncorrected
>
> Mce-inject would trigger kernel panic under NMI interrupt context. In
> addition, we need another NMI interrupt raise (such as from watchdog)
> during panic process. Set proper watchdog threshold value and/or add an
> artificial delay to make sure watchdog interrupt raise during the panic
> procedure and the involved issue would occur.
>
> Fixes: ca0e22d4f011 ("x86/boot/compressed/64: Always switch to own page table")
> Signed-off-by: Zeng Heng <zengheng4@huawei.com>
> ---
> v1: add dummy NMI interrupt handler in EFI loader
> v2: tidy up changelog, add comments (by Ingo Molnar)
> v3: add iret_to_self() to deal with blocked NMIs in advance
> v4: call perf_event_exit_cpu() to terminate watchdog in panic shutdown
>
> arch/x86/kernel/crash.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 305514431f26..f46df94bbdad 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -25,6 +25,7 @@
>  #include <linux/slab.h>
>  #include <linux/vmalloc.h>
>  #include <linux/memblock.h>
> +#include <linux/perf_event.h>
>
>  #include <asm/processor.h>
>  #include <asm/hardirq.h>
> @@ -170,6 +171,15 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>  #ifdef CONFIG_HPET_TIMER
>  	hpet_disable();
>  #endif
> +
> +	/*
> +	 * If the cpu panics within the NMI interrupt context,
> +	 * we need to ensure no more NMI interrupts blocked by
> +	 * processor. In case of IRET execution during kdump
> +	 * path and no proper NMIs handler registered at that
> +	 * point, here terminate watchdog in panic shutdown.
> +	 */
> +	perf_event_exit_cpu(smp_processor_id());

This kills all of perf, including but not limited to the hardware
watchdog. However, it does nothing to external NMI sources like the NMI
button found on some HP machines.

Still I suppose it is sufficient for the normal case.

>  	crash_save_cpu(regs, safe_smp_processor_id());
>  }
>
> --
> 2.25.1
>
Peter Zijlstra <peterz@infradead.org> writes:

> On Fri, Feb 17, 2023 at 08:06:04PM +0800, Zeng Heng wrote:
>> If the cpu panics within the NMI interrupt context, there could be
>> unhandled NMI interrupts in the background which are blocked by processor
>> until next IRET instruction executes. Since that, it prevents nested
>> NMI handler execution.
>>
>> In case of IRET execution during kdump reboot and no proper NMIs handler
>> registered at that point (such as during EFI loader)

EFI loader? kexec on panic is supposed to be kernel to kernel.
If someone is getting EFI involved that is a bug.

>>, we need to ensure
>> watchdog no work any more, or kdump would crash later. So call
>> perf_event_exit_cpu() at the very last moment in the panic shutdown.

Why can't the crash recovery kernel handle this?

Sometimes we very much do have cases where the crash recovery kernel can
not handle it and we can in the dying kernel. But every line of code
that is added to the code path the crashing kernel takes increases the
probability that something will go wrong and a crash will not be
captured.

>> !! Here I know it's not allowed to call perf_event_exit_cpu() within nmi
>> context, because of mutex_lock, smp_call_function and so on.
>> Is there any experts know about the similar function which allowed to call
>> within atomic context (Neither x86_pmu_disable() nor x86_pmu_disable_all()
>> do work after my practice)?
>>
>> Thank you in advance.
>>
>> Here provide one of test case to reproduce the concerned issue:
>> 1. # cat uncorrected
>> CPU 1 BANK 4
>> STATUS uncorrected 0xc0
>> MCGSTATUS EIPV MCIP
>> ADDR 0x1234
>> RIP 0xdeadbabe
>> RAISINGCPU 0
>> MCGCAP SER CMCI TES 0x6
>> 2. # modprobe mce_inject
>> 3. # mce-inject uncorrected
>>
>> Mce-inject would trigger kernel panic under NMI interrupt context. In
>> addition, we need another NMI interrupt raise (such as from watchdog)
>> during panic process. Set proper watchdog threshold value and/or add an
>> artificial delay to make sure watchdog interrupt raise during the panic
>> procedure and the involved issue would occur.
>>
>> Fixes: ca0e22d4f011 ("x86/boot/compressed/64: Always switch to own page table")
>> Signed-off-by: Zeng Heng <zengheng4@huawei.com>
>> ---
>> v1: add dummy NMI interrupt handler in EFI loader
>> v2: tidy up changelog, add comments (by Ingo Molnar)
>> v3: add iret_to_self() to deal with blocked NMIs in advance
>> v4: call perf_event_exit_cpu() to terminate watchdog in panic shutdown
>>
>> arch/x86/kernel/crash.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 305514431f26..f46df94bbdad 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -25,6 +25,7 @@
>>  #include <linux/slab.h>
>>  #include <linux/vmalloc.h>
>>  #include <linux/memblock.h>
>> +#include <linux/perf_event.h>
>>
>>  #include <asm/processor.h>
>>  #include <asm/hardirq.h>
>> @@ -170,6 +171,15 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>>  #ifdef CONFIG_HPET_TIMER
>>  	hpet_disable();
>>  #endif
>> +
>> +	/*
>> +	 * If the cpu panics within the NMI interrupt context,
>> +	 * we need to ensure no more NMI interrupts blocked by
>> +	 * processor. In case of IRET execution during kdump
>> +	 * path and no proper NMIs handler registered at that
>> +	 * point, here terminate watchdog in panic shutdown.
>> +	 */
>> +	perf_event_exit_cpu(smp_processor_id());
>
> This kills all of perf, including but not limited to the hardware
> watchdog. However, it does nothing to external NMI sources like the NMI
> button found on some HP machines.
>
> Still I suppose it is sufficient for the normal case.

Except the architecture appears to be wrong.

I don't see any explanation and I can't think of one why we don't just leave
NMIs deliberately disabled
until the crash recover kernel figured out how to enable them safely.

Eric

>>  	crash_save_cpu(regs, safe_smp_processor_id());
>>  }
>>
>> --
>> 2.25.1
>>
On 2023/2/23 2:39, Eric W. Biederman wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
>
>> On Fri, Feb 17, 2023 at 08:06:04PM +0800, Zeng Heng wrote:
>>> If the cpu panics within the NMI interrupt context, there could be
>>> unhandled NMI interrupts in the background which are blocked by processor
>>> until next IRET instruction executes. Since that, it prevents nested
>>> NMI handler execution.
>>>
>>> In case of IRET execution during kdump reboot and no proper NMIs handler
>>> registered at that point (such as during EFI loader)
> EFI loader? kexec on panic is supposed to be kernel to kernel.
> If someone is getting EFI involved that is a bug.

In the kdump path, kexec starts purgatory, which verifies the second
kernel with SHA-256. If verification passes, control is handed to the
EFI loader, which then starts the second kernel to capture the
environment as a vmcore file.

As the mail said, if the panic happens in NMI context, we never leave
that context until the EFI loader handles a page-fault exception and
executes an IRET instruction on return from the page-fault handler. At
that moment, the processor allows the blocked NMI to be raised.

>> This kills all of perf, including but not limited to the hardware
>> watchdog. However, it does nothing to external NMI sources like the NMI
>> button found on some HP machines.
>>
>> Still I suppose it is sufficient for the normal case.
> I can't think of one why we don't just leave
> NMIs deliberately disabled

How can we just leave NMIs disabled? Could you explain in more detail?

Zeng Heng

> until the crash recover kernel figured out how to enable them safely.
>
On 2023/2/23 10:29, Zeng Heng wrote:
>
> On 2023/2/23 2:39, Eric W. Biederman wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>>
>>> On Fri, Feb 17, 2023 at 08:06:04PM +0800, Zeng Heng wrote:
>>>> If the cpu panics within the NMI interrupt context, there could be
>>>> unhandled NMI interrupts in the background which are blocked by
>>>> processor
>>>> until next IRET instruction executes. Since that, it prevents nested
>>>> NMI handler execution.
>>>>
>>>> In case of IRET execution during kdump reboot and no proper NMIs
>>>> handler
>>>> registered at that point (such as during EFI loader)
>> EFI loader? kexec on panic is supposed to be kernel to kernel.
>> If someone is getting EFI involved that is a bug.
>
> In kdump path, kexec would start purgatory to verify the secondary
> kernel by sha256. If verify passed, it would turn the control to EFI
> loader, and call the second kernel to capture the environment as
> vmcore file.
>
> As the mail said, if panic appears within NMI context, we never exit
> from that until EFI loader handles page fault exception and executes
> IRET instruction when exit from PF. At this moment, processor would
> allow the blocked NMI interrupt raise.
>
>>> This kills all of perf, including but not limited to the hardware
>>> watchdog. However, it does nothing to external NMI sources like the NMI
>>> button found on some HP machines.
>>>
>>> Still I suppose it is sufficient for the normal case.
>> I can't think of one why we don't just leave
>> NMIs deliberately disabled
>

native_machine_crash_shutdown() has already called lapic_shutdown() to
disable every kind of IRQ, but the EFI loader assumes that no residual
NMIs remain in the background.

Here is the first version of the patch for this issue:
https://lore.kernel.org/all/20230110102745.2514694-1-zengheng4@huawei.com/

Zeng Heng

>
>> until the crash recover kernel figured out how to enable them safely.
>>
>
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 305514431f26..f46df94bbdad 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -25,6 +25,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/memblock.h>
+#include <linux/perf_event.h>
 
 #include <asm/processor.h>
 #include <asm/hardirq.h>
@@ -170,6 +171,15 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 #ifdef CONFIG_HPET_TIMER
 	hpet_disable();
 #endif
+
+	/*
+	 * If the cpu panics within the NMI interrupt context,
+	 * we need to ensure no more NMI interrupts blocked by
+	 * processor. In case of IRET execution during kdump
+	 * path and no proper NMIs handler registered at that
+	 * point, here terminate watchdog in panic shutdown.
+	 */
+	perf_event_exit_cpu(smp_processor_id());
 	crash_save_cpu(regs, safe_smp_processor_id());
 }