From patchwork Mon Oct 23 12:55:54 2023
X-Patchwork-Submitter: tip-bot2 for Thomas Gleixner
X-Patchwork-Id: 156862
Date: Mon, 23 Oct 2023 12:55:54 -0000
From: "tip-bot2 for Zhiquan Li"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: ras/core] x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
Cc: Youquan Song , Zhiquan Li , "Borislav Petkov (AMD)" , Naoya Horiguchi , x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20231014051754.3759099-1-zhiquan1.li@intel.com>
References: <20231014051754.3759099-1-zhiquan1.li@intel.com>
Message-ID: <169806575453.3135.14265994538914502918.tip-bot2@tip-bot2>

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     1d11b153d23b5fd131d4ea125ff23c9e8ebc98ab
Gitweb:        https://git.kernel.org/tip/1d11b153d23b5fd131d4ea125ff23c9e8ebc98ab
Author:        Zhiquan Li
AuthorDate:    Mon, 23 Oct 2023 12:22:37 +08:00
Committer:     Borislav Petkov (AMD)
CommitterDate: Mon, 23 Oct 2023 14:53:13 +02:00

x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel

Memory errors don't happen very often, especially fatal ones. However, in
large-scale scenarios such as data centers, that probability increases
with the number of machines present.

When a fatal machine check happens, mce_panic() is called based on the
severity grading of that error. The page containing the error is not
marked as poison.

However, when a crash kernel has been loaded with kexec, tools like
makedumpfile understand when pages are marked as poison and do not touch
them, so as not to cause a fatal machine check exception again while
dumping the previous kernel's memory.

Therefore, mark the page containing the error as poisoned so that the
kexec'ed kernel can avoid accessing the page.

[ bp: Rewrite commit message and comment.
]

Co-developed-by: Youquan Song
Signed-off-by: Youquan Song
Signed-off-by: Zhiquan Li
Signed-off-by: Borislav Petkov (AMD)
Reviewed-by: Naoya Horiguchi
Link: https://lore.kernel.org/r/20231014051754.3759099-1-zhiquan1.li@intel.com
---
 arch/x86/kernel/cpu/mce/core.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0214d42..a25e692 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -44,6 +44,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -233,6 +234,7 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	struct llist_node *pending;
 	struct mce_evt_llist *l;
 	int apei_err = 0;
+	struct page *p;
 
 	/*
 	 * Allow instrumentation around external facilities usage. Not that it
@@ -286,6 +288,18 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	if (!fake_panic) {
 		if (panic_timeout == 0)
 			panic_timeout = mca_cfg.panic_timeout;
+
+		/*
+		 * Kdump skips the poisoned page in order to avoid
+		 * touching the error bits again. Poison the page even
+		 * if the error is fatal and the machine is about to
+		 * panic.
+		 */
+		if (kexec_crash_loaded() && final && (final->status & MCI_STATUS_ADDRV)) {
+			p = pfn_to_online_page(final->addr >> PAGE_SHIFT);
+			if (p)
+				SetPageHWPoison(p);
+		}
 		panic(msg);
 	} else
 		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
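
The decision made in mce_panic() can be modelled in user space to show the order of checks. This is a sketch, not kernel code: `struct model_mce`, `model_pfn_online()` and `model_pfn_to_poison()` are hypothetical stand-ins for `struct mce`, `pfn_to_online_page()` and the hunk's condition, and a PFN counts as "online" here simply when it is below an assumed limit. The constants assume x86 values (MCi_STATUS.ADDRV is bit 58; 4 KiB pages, so the page shift is 12). The record is tested for NULL and for ADDRV before its address is read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86 values: MCi_STATUS.ADDRV is bit 58, pages are 4 KiB. */
#define MODEL_MCI_STATUS_ADDRV (1ULL << 58)
#define MODEL_PAGE_SHIFT       12

/* Hypothetical, trimmed-down record: only the two fields the hunk reads. */
struct model_mce {
	uint64_t status;
	uint64_t addr;
};

/* Hypothetical stand-in for pfn_to_online_page(): PFNs below a limit are "online". */
static bool model_pfn_online(uint64_t pfn, uint64_t max_online_pfn)
{
	return pfn < max_online_pfn;
}

/*
 * Store the PFN to poison in *pfn_out and return true only if a crash
 * kernel is loaded, a record exists, its address is valid (ADDRV) and
 * the PFN is online. 'final' is checked before it is dereferenced.
 */
static bool model_pfn_to_poison(bool crash_kernel_loaded,
				const struct model_mce *final,
				uint64_t max_online_pfn, uint64_t *pfn_out)
{
	uint64_t pfn;

	assert(pfn_out != NULL);
	if (!crash_kernel_loaded)
		return false;
	if (!final || !(final->status & MODEL_MCI_STATUS_ADDRV))
		return false;

	pfn = final->addr >> MODEL_PAGE_SHIFT;
	if (!model_pfn_online(pfn, max_online_pfn))
		return false;

	*pfn_out = pfn;
	return true;
}
```

For example, a record with ADDRV set and `addr = 0x12345678` yields PFN `0x12345`, while a NULL record, a record without ADDRV, or a call with no crash kernel loaded yields no page to poison.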
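
The consumer side described in the commit message, a dump tool that leaves poisoned pages alone, can be pictured with a toy loop. This is an assumption-laden sketch and not makedumpfile's implementation: `struct model_page`, `MODEL_PAGE_HWPOISON` and `model_dump()` are hypothetical, and a real dump tool reads page flags from the crashed kernel's memory map rather than from a flat array. It only shows that a page flagged as poisoned is never read, so dumping it cannot re-trigger the machine check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: one flag bit and a 4 KiB page, not the kernel's struct page. */
#define MODEL_PAGE_HWPOISON 0x1u
#define MODEL_PAGE_SIZE     4096u

struct model_page {
	unsigned int flags;
	unsigned char data[MODEL_PAGE_SIZE];
};

/*
 * Copy each page of the crashed kernel's memory to the same offset in
 * 'out', but never read a page flagged as hardware-poisoned: its slot in
 * 'out' is left untouched and dumped[i] is set to false.
 * Returns the number of pages skipped.
 */
static size_t model_dump(const struct model_page *mem, size_t npages,
			 unsigned char *out, bool *dumped)
{
	size_t skipped = 0;

	assert(mem != NULL && out != NULL && dumped != NULL);
	for (size_t i = 0; i < npages; i++) {
		if (mem[i].flags & MODEL_PAGE_HWPOISON) {
			/* Touching this page could trigger another machine check. */
			dumped[i] = false;
			skipped++;
			continue;
		}
		memcpy(out + i * MODEL_PAGE_SIZE, mem[i].data, MODEL_PAGE_SIZE);
		dumped[i] = true;
	}
	return skipped;
}
```

With three pages where the middle one is flagged, `model_dump()` copies pages 0 and 2, reports one skipped page, and leaves the middle slot of the output buffer as it was.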