From patchwork Sat Sep 16 13:03:07 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shuai Xue
X-Patchwork-Id: 14104
From: Shuai Xue
To: keescook@chromium.org, tony.luck@intel.com, gpiccoli@igalia.com,
 rafael@kernel.org, lenb@kernel.org, james.morse@arm.com, bp@alien8.de,
 tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com,
 x86@kernel.org, hpa@zytor.com, ardb@kernel.org, robert.moore@intel.com
Cc: linux-hardening@vger.kernel.org, linux-acpi@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
 linux-efi@vger.kernel.org, acpica-devel@lists.linuxfoundation.org,
 xueshuai@linux.alibaba.com, baolin.wang@linux.alibaba.com
Subject: [RFC PATCH 0/9] Use ERST for persistent storage of MCE and APEI errors
Date: Sat, 16 Sep 2023 21:03:07 +0800
Message-Id: <20230916130316.65815-1-xueshuai@linux.alibaba.com>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

In certain scenarios (i.e. hosts/guests with root filesystems on NFS/iSCSI
where networking software and/or hardware fails, and thus kdump fails), it
is necessary to serialize hardware error information so that it is
available for post-mortem debugging. Saving the hardware error log into
flash via ERST before the panic allows the log to be retrieved from flash
after the system boots successfully again, which is very useful in
production.

On the x86 platform, the kernel already supports serializing and
deserializing MCE error records, introduced by commit 482908b49ebf ("ACPI,
APEI, Use ERST for persistent storage of MCE").
The process involves two steps:

- MCE Producer: When a hardware error is detected, an MCE is raised and its
  handler writes the MCE error record into flash via ERST before the panic.
- MCE Consumer: After the system reboots, /sbin/mcelog runs and reads
  /dev/mcelog to check flash, via ERST, for error records from the previous
  boot.

After the /dev/mcelog character device was deprecated by commit
5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver"),
the serialized MCE error record of the previous boot in persistent storage
is no longer collected via APEI ERST.

This patch set consists of two parts:

- PATCH 1-3: rework apei_{read,write}_mce to use the pstore data structure
  and emit the mce_record tracepoint, enabling the collection of MCE
  records by the rasdaemon tool.
- PATCH 4-9: use ERST for persistent storage of APEI errors, and emit
  tracepoints for CPER sections, enabling the collection of MCE records by
  the rasdaemon tool.

Shuai Xue (9):
  pstore: move pstore creator id, section type and record struct to common header
  ACPI: APEI: Use common ERST struct to read/write serialized MCE record
  ACPI: APEI: ERST: Emit the mce_record tracepoint
  ACPI: tables: change section_type of generic error data as guid_t
  ACPI: APEI: GHES: Use ERST to serialize APEI generic error before panic
  ACPI: APEI: GHES: export ghes_report_chain
  ACPI: APEI: ESRT: kick ghes_report_chain notifier to report serialized memory errors
  ACPI: APEI: ESRT: print AER to report serialized PCIe errors
  ACPI: APEI: ESRT: log ARM processor error

 arch/x86/kernel/cpu/mce/apei.c | 82 +++++++++++++++-------------------
 drivers/acpi/acpi_extlog.c     |  2 +-
 drivers/acpi/apei/erst.c       | 51 ++++++++++++---------
 drivers/acpi/apei/ghes.c       | 48 +++++++++++++++++++-
 drivers/firmware/efi/cper.c    |  2 +-
 fs/pstore/platform.c           |  3 ++
 include/acpi/actbl1.h          |  5 ++-
 include/acpi/ghes.h            |  2 +-
 include/linux/pstore.h         | 29 ++++++++++++
 9 files changed, 150 insertions(+), 74 deletions(-)