From patchwork Mon Nov 13 09:11:50 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: tip-bot2 for Thomas Gleixner
X-Patchwork-Id: 164381
Date: Mon, 13 Nov 2023 09:11:50 -0000
From: "tip-bot2 for Zhiquan Li"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: ras/core] x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
Cc: Youquan Song, Zhiquan Li, "Borislav Petkov (AMD)", Naoya Horiguchi, x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20231014051754.3759099-1-zhiquan1.li@intel.com>
References: <20231014051754.3759099-1-zhiquan1.li@intel.com>
Message-ID: <169986671058.3135.14638395012955463403.tip-bot2@tip-bot2>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     9f3b130048bfa2e44a8cfb1b616f826d9d5d8188
Gitweb:        https://git.kernel.org/tip/9f3b130048bfa2e44a8cfb1b616f826d9d5d8188
Author:        Zhiquan Li
AuthorDate:    Thu, 26 Oct 2023 08:39:03 +08:00
Committer:     Borislav Petkov (AMD)
CommitterDate: Mon, 13 Nov 2023 09:53:15 +01:00

x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel

Memory errors don't happen very often, especially fatal ones. However,
in large-scale scenarios such as data centers, that probability
increases with the amount of machines present.

When a fatal machine check happens, mce_panic() is called based on the
severity grading of that error. The page containing the error is not
marked as poison.

However, when kexec is enabled, tools like makedumpfile understand when
pages are marked as poison and do not touch them so as not to cause
a fatal machine check exception again while dumping the previous
kernel's memory.

Therefore, mark the page containing the error as poisoned so that the
kexec'ed kernel can avoid accessing the page.

  [ bp: Rewrite commit message and comment. ]

Co-developed-by: Youquan Song
Signed-off-by: Youquan Song
Signed-off-by: Zhiquan Li
Signed-off-by: Borislav Petkov (AMD)
Reviewed-by: Naoya Horiguchi
Link: https://lore.kernel.org/r/20231014051754.3759099-1-zhiquan1.li@intel.com
---
 arch/x86/kernel/cpu/mce/core.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 7b39737..df8d25e 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -44,6 +44,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -233,6 +234,7 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	struct llist_node *pending;
 	struct mce_evt_llist *l;
 	int apei_err = 0;
+	struct page *p;
 
 	/*
 	 * Allow instrumentation around external facilities usage. Not that it
@@ -286,6 +288,20 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	if (!fake_panic) {
 		if (panic_timeout == 0)
 			panic_timeout = mca_cfg.panic_timeout;
+
+		/*
+		 * Kdump skips the poisoned page in order to avoid
+		 * touching the error bits again. Poison the page even
+		 * if the error is fatal and the machine is about to
+		 * panic.
+		 */
+		if (kexec_crash_loaded()) {
+			if (final && (final->status & MCI_STATUS_ADDRV)) {
+				p = pfn_to_online_page(final->addr >> PAGE_SHIFT);
+				if (p)
+					SetPageHWPoison(p);
+			}
+		}
 		panic(msg);
 	} else
 		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
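
For readers who want to trace the new check outside the kernel, here is a
minimal user-space sketch of the decision the added hunk makes. It is not
kernel code. Every name prefixed with sketch_ or SKETCH_ is hypothetical and
introduced only for illustration. The sketch assumes 4 KiB pages
(PAGE_SHIFT of 12) and takes ADDRV to be bit 58 of the MCi_STATUS register.
It leaves out the pfn_to_online_page() lookup and the SetPageHWPoison() call,
which only exist inside the kernel, and reports which page frame number the
patch would try to poison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: ADDRV is bit 58 of MCi_STATUS (address register is valid). */
#define SKETCH_MCI_STATUS_ADDRV (1ULL << 58)
/* Assumption: 4 KiB pages, so a physical address maps to a PFN by >> 12. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for the two struct mce fields the patch reads. */
struct sketch_mce {
	uint64_t status;
	uint64_t addr;
};

/*
 * Mirrors the nesting of the added checks: a crash kernel must be loaded,
 * a final MCE record must exist, and its address must be flagged valid.
 * On success, stores the page frame number in *pfn and returns true.
 * The real code would additionally require pfn_to_online_page() to return
 * a non-NULL page before setting the poison flag.
 */
bool sketch_pfn_to_poison(const struct sketch_mce *final,
			  bool crash_kernel_loaded, uint64_t *pfn)
{
	if (!crash_kernel_loaded)
		return false;
	if (!final || !(final->status & SKETCH_MCI_STATUS_ADDRV))
		return false;

	*pfn = final->addr >> SKETCH_PAGE_SHIFT;
	return true;
}
```

With ADDRV set and an address of 0x12345678, the helper yields page frame
number 0x12345. It returns false when no crash kernel is loaded, when the
record pointer is NULL, or when ADDRV is clear, which matches the cases
where the patch leaves the page untouched and proceeds straight to panic().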