From patchwork Thu Oct 27 04:24:45 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shuai Xue
X-Patchwork-Id: 11541
From: Shuai Xue
To: rafael@kernel.org, lenb@kernel.org, james.morse@arm.com,
 tony.luck@intel.com, bp@alien8.de, dave.hansen@linux.intel.com,
 jarkko@kernel.org, naoya.horiguchi@nec.com, linmiaohe@huawei.com,
 akpm@linux-foundation.org
Cc: stable@vger.kernel.org, linux-acpi@vger.kernel.org,
 linux-kernel@vger.kernel.org, cuibixuan@linux.alibaba.com,
 baolin.wang@linux.alibaba.com, zhuo.song@linux.alibaba.com,
 xueshuai@linux.alibaba.com
Subject: [PATCH] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED
 on action required events
Date: Thu, 27 Oct 2022 12:24:45 +0800
Message-Id: <20221027042445.60108-1-xueshuai@linux.alibaba.com>
X-Mailer: git-send-email 2.34.1
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

There are two major types of uncorrected (UC) error:

- Action Required: the error is detected and the processor has already
  consumed the corrupted memory. The OS must take action (for example,
  offline the failing page or kill the failing thread) to recover from
  the uncorrectable error.

- Action Optional: the error is detected outside the processor's
  execution context. Some data in memory is corrupted, but it has not
  been consumed. The OS may optionally take action to recover from the
  uncorrectable error.

On x86 platforms, the two types are easily distinguished by the MCA
bank. On arm64 platforms, however, the memory failure flags for every
UC error whose severity is GHES_SEV_RECOVERABLE are currently set to 0,
i.e. Action Optional.

If a UC error is detected by a background scrubber, it is clearly an
Action Optional error. All other errors should conservatively be
treated as Action Required. cper_sec_mem_err::error_type identifies the
type of error that occurred if CPER_MEM_VALID_ERROR_TYPE is set. So,
set the memory failure flags to 0 for Scrub Uncorrected Error (type 14)
and to MF_ACTION_REQUIRED otherwise.

Signed-off-by: Shuai Xue
---
 drivers/acpi/apei/ghes.c | 10 ++++++++--
 include/linux/cper.h     |  3 +++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 80ad530583c9..6c03059cbfc6 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -474,8 +474,14 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
 	if (sec_sev == GHES_SEV_CORRECTED &&
 	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
 		flags = MF_SOFT_OFFLINE;
-	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
-		flags = 0;
+	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE) {
+		if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
+			flags = mem_err->error_type == CPER_MEM_SCRUB_UC ?
+					0 :
+					MF_ACTION_REQUIRED;
+		else
+			flags = MF_ACTION_REQUIRED;
+	}
 
 	if (flags != -1)
 		return ghes_do_memory_failure(mem_err->physical_addr, flags);
diff --git a/include/linux/cper.h b/include/linux/cper.h
index eacb7dd7b3af..b77ab7636614 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -235,6 +235,9 @@ enum {
 #define CPER_MEM_VALID_BANK_ADDRESS		0x100000
 #define CPER_MEM_VALID_CHIP_ID			0x200000
 
+#define CPER_MEM_SCRUB_CE			13
+#define CPER_MEM_SCRUB_UC			14
+
 #define CPER_MEM_EXT_ROW_MASK			0x3
 #define CPER_MEM_EXT_ROW_SHIFT			16
 
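
For illustration only, the flag selection that the commit message
describes for recoverable UC errors can be sketched as a small userspace
C function. This is a rough sketch, not the kernel code: every demo_* and
DEMO_* name is hypothetical, and the numeric values of
DEMO_MF_ACTION_REQUIRED and DEMO_MEM_VALID_ERROR_TYPE are assumed
stand-ins rather than values taken from kernel headers. Only the scrub
UC type value (14) comes from the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in values for illustration; not copied from kernel headers. */
#define DEMO_MF_ACTION_REQUIRED   0x2     /* hypothetical flag bit */
#define DEMO_MEM_VALID_ERROR_TYPE 0x4000  /* hypothetical "error_type is valid" bit */
#define DEMO_MEM_SCRUB_UC         14      /* Scrub Uncorrected Error, per the patch */

/* Hypothetical, trimmed-down stand-in for the memory error record. */
struct demo_mem_err {
	uint64_t validation_bits;
	uint8_t error_type;
};

/*
 * Pick the memory failure flags for a recoverable UC error:
 * 0 (Action Optional) only when the record carries a valid error type
 * and that type is "scrub uncorrected"; otherwise be conservative and
 * report Action Required.
 */
static int demo_uc_flags(const struct demo_mem_err *err)
{
	assert(err != NULL);

	if ((err->validation_bits & DEMO_MEM_VALID_ERROR_TYPE) &&
	    err->error_type == DEMO_MEM_SCRUB_UC)
		return 0;

	return DEMO_MF_ACTION_REQUIRED;
}
```

Under these assumptions, a record with a valid error type of 14 yields
0, while a record with any other type, or with no valid error type at
all, yields DEMO_MF_ACTION_REQUIRED. The single combined condition is
equivalent to the nested if/else in the patch.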