From patchwork Tue Dec 6 15:33:53 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shuai Xue
X-Patchwork-Id: 30387
From: Shuai Xue
To: rafael@kernel.org, lenb@kernel.org, james.morse@arm.com,
 tony.luck@intel.com, bp@alien8.de, dave.hansen@linux.intel.com,
 jarkko@kernel.org, naoya.horiguchi@nec.com, linmiaohe@huawei.com,
 akpm@linux-foundation.org
Cc: linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
 cuibixuan@linux.alibaba.com, baolin.wang@linux.alibaba.com,
 zhuo.song@linux.alibaba.com, xueshuai@linux.alibaba.com
Subject: [RFC PATCH 1/2] ACPI: APEI: set memory failure flags as
 MF_ACTION_REQUIRED on synchronous events
Date: Tue, 6 Dec 2022 23:33:53 +0800
Message-Id: <20221206153354.92394-2-xueshuai@linux.alibaba.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20221027042445.60108-1-xueshuai@linux.alibaba.com>
References: <20221027042445.60108-1-xueshuai@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org

There are two major types of uncorrected error (UC):

- Action Required: The error is detected and the processor has already
  consumed the corrupted memory. The OS must take action (for example,
  offline the failing page or kill the failing thread) to recover from
  this uncorrectable error.

- Action Optional: The error is detected outside of the processor's
  execution context. Some data in memory is corrupted, but it has not
  been consumed. The OS may optionally take action to recover from this
  uncorrectable error.

On x86 platforms, these two types are easily distinguished based on the
MCA bank. On arm64 platforms, however, the memory failure flags for all
UCs whose severity is GHES_SEV_RECOVERABLE are currently set to 0, i.e.
Action Optional.

Set the memory failure flags to MF_ACTION_REQUIRED on synchronous events.

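
The flag selection described above can be sketched as a standalone C
function. This is only an illustration under stated assumptions, not the
kernel code: ghes_uc_flags() is a hypothetical helper, the constant values
are stand-ins that should be checked against the kernel headers, and the
flag word is modelled as a uint32_t without verifying the width of the
real gdata->flags field.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values for illustration; the kernel headers are authoritative. */
#define GHES_SEV_RECOVERABLE	0x2
#define CPER_SEC_LATENT_ERROR	0x0020
#define CPER_SEC_SYNC		0x0100	/* bit proposed by this patch */
#define MF_ACTION_REQUIRED	(1 << 1)

/*
 * Hypothetical helper mirroring the changed branch: for a recoverable UC,
 * return MF_ACTION_REQUIRED when the section is marked synchronous and 0
 * (Action Optional) otherwise. Return -1 when the branch does not apply.
 */
static int ghes_uc_flags(int sev, int sec_sev, uint32_t sec_flags)
{
	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
		return -1;

	return (sec_flags & CPER_SEC_SYNC) ? MF_ACTION_REQUIRED : 0;
}

/* Small self-check covering the three cases discussed in the text. */
static void ghes_uc_flags_selftest(void)
{
	/* Synchronous recoverable UC: action required. */
	assert(ghes_uc_flags(GHES_SEV_RECOVERABLE, GHES_SEV_RECOVERABLE,
			     CPER_SEC_SYNC) == MF_ACTION_REQUIRED);
	/* Asynchronous recoverable UC: action optional. */
	assert(ghes_uc_flags(GHES_SEV_RECOVERABLE, GHES_SEV_RECOVERABLE,
			     CPER_SEC_LATENT_ERROR) == 0);
	/* Other severities are not handled by this branch. */
	assert(ghes_uc_flags(1, GHES_SEV_RECOVERABLE, CPER_SEC_SYNC) == -1);
}
```

Calling ghes_uc_flags_selftest() from a test harness exercises the
synchronous, asynchronous, and non-recoverable cases. The sketch leaves out
the MF_SOFT_OFFLINE path for corrected errors, which the patch does not
change.
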
Signed-off-by: Shuai Xue
---
 drivers/acpi/apei/ghes.c |  2 +-
 include/linux/cper.h     | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 9952f3a792ba..a420759fce2d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -475,7 +475,7 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
 	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
 		flags = MF_SOFT_OFFLINE;
 	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
-		flags = 0;
+		flags = (gdata->flags & CPER_SEC_SYNC) ? MF_ACTION_REQUIRED : 0;
 
 	if (flags != -1)
 		return ghes_do_memory_failure(mem_err->physical_addr, flags);
diff --git a/include/linux/cper.h b/include/linux/cper.h
index eacb7dd7b3af..a3571fa8a73d 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -144,6 +144,28 @@ enum {
  * corrective action before the data is consumed
  */
 #define CPER_SEC_LATENT_ERROR			0x0020
+/*
+ * If set, the section is to be associated with an error that has been
+ * propagated due to hardware poisoning. This implies the error is a symptom of
+ * another error. It is not always possible to ascertain whether this is the
+ * case for an error, therefore if the flag is not set, it is unknown whether
+ * the error was propagated. this helps determining FRU when dealing with HW
+ * failures
+ */
+#define CPER_SEC_PROPAGATED			0x0040
+/*
+ * If set this flag indicates the firmware has detected an overflow of
+ * buffers/queues that are used to accumulate, collect, or report errors (e.g.
+ * the error status control block exposed to the OS). When this occurs, some
+ * error records may be lost.
+ */
+#define CPER_SEC_OVERFLOW			0x0080
+/*
+ * If set, it indicates that this event record is synchronous(e.g. cpu core
+ * consumes poison data, then cause instruction/data abort); if not set,
+ * this event record is asynchronous.
+ */
+#define CPER_SEC_SYNC				0x00100
 
 /*
  * Section type definitions, used in section_type field in struct