Message ID | 20230214064509.3622044-3-quan@os.amperecomputing.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp2803057wrn; Mon, 13 Feb 2023 22:50:26 -0800 (PST) X-Google-Smtp-Source: AK7set/2P8W8gTxsfAouUcJ8kfH634TKiNKtQ2cAEW17JwPx5632dFMHuSN8W6bHxEyW7VSjnw3f X-Received: by 2002:a17:907:a4e:b0:885:5682:7e52 with SMTP id be14-20020a1709070a4e00b0088556827e52mr2275619ejc.13.1676357425965; Mon, 13 Feb 2023 22:50:25 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1676357425; cv=pass; d=google.com; s=arc-20160816; b=Lw1uNd80+PBPg6SeWyTFNraS5F3GL9NH1rSW69Q3i3ymD3kmXLT5e6Vapr6uQZ1EFI toNbjolyujbRolV1nDOZqEZlUZRT1/XJMTcNaovqtuG6WQooz9dSpsBiowBVvo6ZVBgj F4qcp7Mw3Vy4qPs54vdzXalfhlDwfetJNuW9PFI8kWQJd+i6RvBz0MLq/npR4CkvVJGQ ke41EtuabL1axOqCK0dLD7ATAU4dKESJku8TOGxS+29Zo1aFd1Ylimjsa8XzrRam4cdA qLsBhyYkzsY2UfO4TjXx9mtC0QwaqAUQT6/3eiTtP91jrCM5ilfk9enooXpkdwjeX+80 tUUA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:content-transfer-encoding :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=AWAUChPo/u4uLIMKqytJT9gJ5W5sswUJY4XizDq0Y3Q=; b=jNhaPuODFg4oRMhey1r6F8hhhgMZRqGb5uZnuTimkhwd9u4r47E3fSQggiGg6e3ACr b9RMk/tSgHTQY77yT7n8A7JQRjlqrUOk3A5ilGmf6f7WhKxIJnZlo7qtgUjnlZcuXRG5 JgfeI215NP+vpPxhEFnkJZPUNutervAjmC3ylPDxgae9SBWIBcvxq2rw4vlc/rBO3iVO lKFX+nynWiA5w9ByreZCyb48pEKbes8zWRB5fAjKdXsp3QthLtdguUP3HwvQweky7e5x XHbDbFEh7joxehQr0FTpZHfBscLaRlbtmtdXyYlVI1rP6046mMvcC53uUcwtsp+pgsc5 F1gA== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@os.amperecomputing.com header.s=selector2 header.b="JxD/d924"; arc=pass (i=1 spf=pass spfdomain=os.amperecomputing.com dkim=pass dkdomain=os.amperecomputing.com dmarc=pass fromdomain=os.amperecomputing.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amperecomputing.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id mt31-20020a170907619f00b008b11939568dsi4526597ejc.365.2023.02.13.22.50.03; Mon, 13 Feb 2023 22:50:25 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@os.amperecomputing.com header.s=selector2 header.b="JxD/d924"; arc=pass (i=1 spf=pass spfdomain=os.amperecomputing.com dkim=pass dkdomain=os.amperecomputing.com dmarc=pass fromdomain=os.amperecomputing.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amperecomputing.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231683AbjBNGps (ORCPT <rfc822;tebrre53rla2o@gmail.com> + 99 others); Tue, 14 Feb 2023 01:45:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38424 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231190AbjBNGpp (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 14 Feb 2023 01:45:45 -0500 Received: from NAM12-BN8-obe.outbound.protection.outlook.com (mail-bn8nam12on2100.outbound.protection.outlook.com [40.107.237.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2B23F1CAE0 for <linux-kernel@vger.kernel.org>; Mon, 13 Feb 2023 22:45:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=TicLpZ1Y5XMSmnuLLDHh1oVnQsL8Wol9nJvGtEx/WoIBrk+mItC3xGkDfmUM0xvQ5DGs1vC5CJgadfpDrgwwSJ0quJsdCtN58+vzzkADPq3sF7IBaMBkb7U78VZ+MrlyBFTeRLjjWJl9ryNHLpcM3Ru658eRzb0tQoFmvP6cqr3BRNmFWuYUqnVOv8m3EYtnEylz6HEJ7jaGz4V38mM+Ph6gcO9VDamSpk5BtQGOh91Ab3i8TnnX5Qaj5P3BinEt77ueIDJgenDqFOsq8pdGxfb7MJJlqgKmKAtYak6jY+voxhj40/aVMJVTID/R0Ww0jnhpzyP80ktWt86OfQl94A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=AWAUChPo/u4uLIMKqytJT9gJ5W5sswUJY4XizDq0Y3Q=; b=Kmthv1hxloVFY2OowDrYc+sEbCbDqJhmB8sRPf17ucc8LA/Itj6SVl4GgY+pkGhTzpoQwEPY5MWCOnKYy/v9kflkDWNXi9VvxgdBUsR/G+xkr8IjE/Tt6kbOglCYciWLF1fyoXgVbjeD3EcXGEHjeFhOkihWwPrOnvVvBbBl+3FhmO/5NdZ75akxkEIgBzmduMR3xAuVKBZfsrcKBV8DGOcNWKgZvB49/pe3GAcYz2uvU4pQ1uPPMqPwjWa5egtQ88HVIxTRcvpRpEZCQCAbddWUPoSrfVK02Qz3BDliqYEDy08RDnKBJWbGtXKWaACU0aLkll6hRIUm68uaPSFilQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none header.from=os.amperecomputing.com; dkim=pass header.d=os.amperecomputing.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=os.amperecomputing.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=AWAUChPo/u4uLIMKqytJT9gJ5W5sswUJY4XizDq0Y3Q=; b=JxD/d924eLnbj1EjpOIAmvObOCEE1GiKlcUjVXuYlqA+i9XBExIhQgk3RTBo8xA36oadt4J+/o4WwHaDDbRmlok1BMbQeoRZ5S644bLnhCktGDL+d70D0isU+pl9Ks+tLajwKrn7+Wwsc4XA5cDpwTjYH84HOPa9ecWtSGsMbyA= Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=os.amperecomputing.com; Received: from SN4PR01MB7455.prod.exchangelabs.com (2603:10b6:806:202::11) by DM4PR01MB7738.prod.exchangelabs.com (2603:10b6:8:66::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6086.26; Tue, 14 Feb 2023 06:45:38 +0000 Received: from SN4PR01MB7455.prod.exchangelabs.com ([fe80::bf6:8038:9fe8:1588]) by SN4PR01MB7455.prod.exchangelabs.com ([fe80::bf6:8038:9fe8:1588%4]) with mapi id 15.20.6086.023; Tue, 14 Feb 2023 06:45:38 +0000 From: Quan Nguyen <quan@os.amperecomputing.com> To: Arnd Bergmann <arnd@arndb.de>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Joel Stanley <joel@jms.id.au>, Andrew Jeffery <andrew@aj.id.au>, linux-kernel@vger.kernel.org, openbmc@lists.ozlabs.org, Open Source Submission <patches@amperecomputing.com> Cc: Phong Vo <phong@os.amperecomputing.com>, Thang Nguyen <thang@os.amperecomputing.com>, Quan Nguyen <quan@os.amperecomputing.com> Subject: [PATCH 2/2] misc: smpro-errmon: Add dimm training failure syndrome Date: Tue, 14 Feb 2023 13:45:09 +0700 Message-Id: <20230214064509.3622044-3-quan@os.amperecomputing.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230214064509.3622044-1-quan@os.amperecomputing.com> References: <20230214064509.3622044-1-quan@os.amperecomputing.com> Content-Transfer-Encoding: 8bit Content-Type: text/plain X-ClientProxiedBy: SI2PR01CA0023.apcprd01.prod.exchangelabs.com (2603:1096:4:192::17) To SN4PR01MB7455.prod.exchangelabs.com (2603:10b6:806:202::11) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: SN4PR01MB7455:EE_|DM4PR01MB7738:EE_ X-MS-Office365-Filtering-Correlation-Id: 0c2b7dc1-3261-40b9-ebfb-08db0e571508 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: G0meULAQiqfM9LcqRl1E3ZCDNGdyomJ5M26be3I1qKyStnebbgPlCrcCRtIBNDv9xNqedaWh7BqH9Jof1f0867l/KPmzB+Au2eZHy1+CTyT65WQC/MATW/0SxaWDRNNFHZzxXx2gL/zU2y8yrbZ8ReP4ISrE/tPmW50Ax7nBKSl8wog+9TcgX4JDuLU58XfLh7n40P0zsbe9LKLYKvQLP0s4h9cxe+2Lfl6Q55a/GIrusPtqM7+Iy8yX8EqhQxme4BHT+IstLVqTMdI5ZFAqUSl91/WebEAiO6xMgQIbKaLQB3cBH3axZJEg5/a/xvfbuqjcotMkd00Rj+3v5qNcGn0x9+H/Gj9VZIc/19AzAZ/J03BzZaS0mS/hMQZIxph2HYLNJN+aNVrQ1RlbjhdLT3eEw9cetOQb7XRG4ghv0V2XBuJsPdSSZQy9ERzqbSzE2AniiL/6aL3MNUuF4MTPCnvmrT9VvQzE+QiEcllNiHXb+lZmnrTOe8OjVF3Y9W0Nzcdgp3cZIP60diXAL2bmqexmNfSUzMe0tkFC5nQdrneDL0l2xAj+NPe8BLdZiiJSPvlCvj9LBLWec7Ow5zDYV9etmRgz0qjMm1yP9gCuq/s4WxTcZnqwNxtYCmEgt7LDFBMUUNs3vxeaCP8/ntQoRMvNNAjGz25r8bg8K8svViTWyB53C5hnXUYpvHZx8fKhExLOLt6qXuNmZCbcVxvUgg== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:SN4PR01MB7455.prod.exchangelabs.com;PTR:;CAT:NONE;SFS:(13230025)(4636009)(366004)(136003)(346002)(396003)(376002)(39850400004)(451199018)(2906002)(8936002)(83380400001)(5660300002)(41300700001)(478600001)(6666004)(107886003)(6486002)(1076003)(26005)(186003)(66946007)(52116002)(6512007)(2616005)(6506007)(38100700002)(38350700002)(86362001)(4326008)(8676002)(66476007)(66556008)(54906003)(110136005)(316002);DIR:OUT;SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: HpXTjnsuOKLkt93t3uK/hNBD4MPAZvrQDL6PirhdfVZ7Lx0pXsr52eSc5hBw4VvxBXsXh2D69/G05dS3zyI6h26a+SSwYpZSqhImTDXetEhzhuFNYaZEiDp/vSGrMNQj1sIxXbBqOKUCjDzld0GFLnIzb68XN1p9xza/J8W0F8U/z3hRDpgg+DXmeQpqmE4cfZnQBS9bQc22yVO66e2+Lfk/L3Vm6XBa92E2ApafE8QQTLiUs6XGHMWZ4JhxjQw++/3oW+FRLoYHb2yd95DwyRAVC45fg3tXvb2zfB6ITnubiCEhcoa/+0i05NUKLldPrFSEVIQ9b/+ao2G6NFRzjDaolgsJemHOXtB78hVNYa0Xg6COM3XWIEaqWqFDVbt9fLuU1ymM7DVJvSljtAGm7aaHNvHhBP9XsvG7sXq0NXPSHt3QKYDDzh3ogLBGalBn42bLiJQ0CvI78GNRZwEu7J6AOgkRpRbqsNeoDXwO40f4ZAC0AYsYzvuCbxjVkmp1dK6C6MoFGaoWFIGGA/w5uhJu0XyxPZzSvD1ODG7Kzv/VyFdQKLqbTZMPY7aVRYVquAHQMbEmOelTIJ7GUEr35D07tCbao73blJo+UuDQiM/n9cukdqj01E9dLS3UyWz0Rkzj9ruENKSY8pGXyKJxGJFNuE89amNwALN+VBCAQON3ZbvsJWVyiftxNDbu02IidOMrzLwABM/iFoVE6B2Lw+34pgVOvXFe2xQLWT00/69s1THvg54oS4tLb8RE4P8u8ry7dvdTw6FDA8X9aXzNYhgifs9qptB99ZMGEeryhmOzg2nTzZ9cyAgTQUONgims07JEP6ovdHbEU87p98M4OfQsRiBKdJTjl4KMF4XFZhy5bxWtWoICI5kJxOdZ7JhTZfiiQtaf214qhAdjM549BNDF4NpBiStT8EopkVfs5a+fHOYN+IcdCNXIKU1tvp4y+C3OJhXeXD4+nKRwdBkLnblCrYw2HSywHVrwqiw9oFEsyHccapZ8paJj3nKAj2P5MK0PKz0R4+8tIdJSz8S+6C9vtBKfFmn0YyEBjOn3v11uxpBb972Sdz0l5F2vJV0FKljnwkE4lZNDdK4CkBtl5W+PByL8xQK+TsG0GdC4mBOPNiRQ8i6hA0y/B0v/oT6QIV93IHpiOysJExaN9SS+9fWaAR0uFacc10Qanp0Ks1D5806vU65uvwJL25MV3CcrZutAq8QUU3XlRVGHJIxduedyCQ5Lo6wysThoXmlFoUAOeqwJPgtIdKjal+OHAq31rSWOze1UcAI7HIeJodg5JMEmG2nGgRBnaTEKqOsCwKtGOcJCDPoXPGXck7JV9hZoVnFsa6AZH7fw5iX35zHBa1uhua+u0qF0FiQrd07ACdBrtt20+gMKAQjO6oPAZ6W6jiOBxB6/nSxzTri9/xvFtMQwedUpSM3+yxkecWPeSoLJWZgqOehbEBuIjl+DO2tR3iAqI0xxIhIb65FMpl92EFVZL2eU3POHvF4TqfaxbvHCVmqwvKaqIp6tSl+ufIaV2F78imS+R+RQmWds1ufrMhGcK4E+1q9FowiWNxEPhlsO5jHS69gDIcOh3bKyj7VyJU+GPdZ12yHuziR7es8fI+kVqj7s2sEAOKc+JbNJu6g= X-OriginatorOrg: os.amperecomputing.com X-MS-Exchange-CrossTenant-Network-Message-Id: 0c2b7dc1-3261-40b9-ebfb-08db0e571508 X-MS-Exchange-CrossTenant-AuthSource: SN4PR01MB7455.prod.exchangelabs.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Feb 2023 06:45:38.2762 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7 X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: Eo0RfXqXuxXyr34Heigbm8FfdqfX+pcsKnMrKnk+yQlXakxvsDBUX/eUql0omiM8BmQ7438lca3cvOrCKLu1NEEiuljogpCCWPgn1tEo/38= X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM4PR01MB7738 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2, SPF_HELO_PASS,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1757788164438286563?= X-GMAIL-MSGID: =?utf-8?q?1757788164438286563?= |
Series |
Add DIMM 2x refresh event and failure syndrome
|
|
Commit Message
Quan Nguyen
Feb. 14, 2023, 6:45 a.m. UTC
Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome
to BMC when DIMM training failed.
Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
---
.../sysfs-bus-platform-devices-ampere-smpro | 10 +++
drivers/misc/smpro-errmon.c | 77 +++++++++++++++++++
2 files changed, 87 insertions(+)
Comments
Dear Quan, Thank you for your patch. Am 14.02.23 um 07:45 schrieb Quan Nguyen: > Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome > to BMC when DIMM training failed. Where you able to verify that it works? Out of curiosity, how? > Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com> > --- > .../sysfs-bus-platform-devices-ampere-smpro | 10 +++ > drivers/misc/smpro-errmon.c | 77 +++++++++++++++++++ > 2 files changed, 87 insertions(+) > > diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro > index d4e3f308c451..c35f1d45e656 100644 > --- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro > +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro > @@ -265,6 +265,16 @@ Description: > For more details, see section `5.7 GPI Status Registers and 5.9 Memory Error Register Definitions, > Altra Family Soc BMC Interface Specification`. > > +What: /sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome > +KernelVersion: 6.1 Should it be 6.2, as it probably won’t make it into 6.1? > +Contact: Quan Nguyen <quan@os.amperecomputing.com> > +Description: > + (RO) The sysfs returns the 2-byte DIMM failure syndrome data for slot > + 0-15 if it failed to initialized. to initialize > + > + For more details, see section `5.11 Boot Stage Register Definitions, > + Altra Family Soc BMC Interface Specification`. > + > What: /sys/bus/platform/devices/smpro-misc.*/boot_progress > KernelVersion: 6.1 > Contact: Quan Nguyen <quan@os.amperecomputing.com> > diff --git a/drivers/misc/smpro-errmon.c b/drivers/misc/smpro-errmon.c > index 1635e881aefb..3e8570cbb740 100644 > --- a/drivers/misc/smpro-errmon.c > +++ b/drivers/misc/smpro-errmon.c > @@ -47,6 +47,12 @@ > #define WARN_PMPRO_INFO_LO 0xAC > #define WARN_PMPRO_INFO_HI 0xAD > > +/* Boot Stage Register */ > +#define BOOTSTAGE 0xB0 > +#define DIMM_SYNDROME_SEL 0xB4 > +#define DIMM_SYNDROME_ERR 0xB5 > +#define DIMM_SYNDROME_STAGE 4 > + > /* PCIE Error Registers */ > #define PCIE_CE_ERR_CNT 0xC0 > #define PCIE_CE_ERR_LEN 0xC1 > @@ -468,6 +474,61 @@ EVENT_RO(vrd_hot, VRD_HOT_EVENT); > EVENT_RO(dimm_hot, DIMM_HOT_EVENT); > EVENT_RO(dimm_2x_refresh, DIMM_2X_REFRESH_EVENT); > > +static ssize_t smpro_dimm_syndrome_read(struct device *dev, struct device_attribute *da, > + char *buf, int slot) Could `slot` be passed as `unsigned int`? > +{ > + struct smpro_errmon *errmon = dev_get_drvdata(dev); > + s32 data; > + int ret; > + > + ret = regmap_read(errmon->regmap, BOOTSTAGE, &data); The function signature is: int regmap_read(struct regmap *map, unsigned int reg, unsigned int *val) So why not use unsigned int as data type for `data`? > + if (ret) > + return ret; > + > + /* check for valid stage */ > + data = (data >> 8) & 0xff; > + if (data != DIMM_SYNDROME_STAGE) > + return ret; Isn’t now success returned? Should a debug message be printed? > + > + /* Write the slot ID to retrieve Error Syndrome */ > + ret = regmap_write(errmon->regmap, DIMM_SYNDROME_SEL, slot); > + if (ret) > + return ret; > + > + /* Read the Syndrome error */ > + ret = regmap_read(errmon->regmap, DIMM_SYNDROME_ERR, &data); > + if (ret || !data) > + return ret; > + > + return sysfs_emit(buf, "%04x\n", data); > +} > + > +#define EVENT_DIMM_SYNDROME(_slot) \ > + static ssize_t event_dimm##_slot##_syndrome_show(struct device *dev, \ > + struct device_attribute *da, \ > + char *buf) \ > + { \ > + return smpro_dimm_syndrome_read(dev, da, buf, _slot); \ > + } \ > + static DEVICE_ATTR_RO(event_dimm##_slot##_syndrome) > + > +EVENT_DIMM_SYNDROME(0); > +EVENT_DIMM_SYNDROME(1); > +EVENT_DIMM_SYNDROME(2); > +EVENT_DIMM_SYNDROME(3); > +EVENT_DIMM_SYNDROME(4); > +EVENT_DIMM_SYNDROME(5); > +EVENT_DIMM_SYNDROME(6); > +EVENT_DIMM_SYNDROME(7); > +EVENT_DIMM_SYNDROME(8); > +EVENT_DIMM_SYNDROME(9); > +EVENT_DIMM_SYNDROME(10); > +EVENT_DIMM_SYNDROME(11); > +EVENT_DIMM_SYNDROME(12); > +EVENT_DIMM_SYNDROME(13); > +EVENT_DIMM_SYNDROME(14); > +EVENT_DIMM_SYNDROME(15); > + > static struct attribute *smpro_errmon_attrs[] = { > &dev_attr_overflow_core_ce.attr, > &dev_attr_overflow_core_ue.attr, > @@ -493,6 +554,22 @@ static struct attribute *smpro_errmon_attrs[] = { > &dev_attr_event_vrd_hot.attr, > &dev_attr_event_dimm_hot.attr, > &dev_attr_event_dimm_2x_refresh.attr, > + &dev_attr_event_dimm0_syndrome.attr, > + &dev_attr_event_dimm1_syndrome.attr, > + &dev_attr_event_dimm2_syndrome.attr, > + &dev_attr_event_dimm3_syndrome.attr, > + &dev_attr_event_dimm4_syndrome.attr, > + &dev_attr_event_dimm5_syndrome.attr, > + &dev_attr_event_dimm6_syndrome.attr, > + &dev_attr_event_dimm7_syndrome.attr, > + &dev_attr_event_dimm8_syndrome.attr, > + &dev_attr_event_dimm9_syndrome.attr, > + &dev_attr_event_dimm10_syndrome.attr, > + &dev_attr_event_dimm11_syndrome.attr, > + &dev_attr_event_dimm12_syndrome.attr, > + &dev_attr_event_dimm13_syndrome.attr, > + &dev_attr_event_dimm14_syndrome.attr, > + &dev_attr_event_dimm15_syndrome.attr, > NULL > }; Kind regards, Paul
On 15/02/2023 14:33, Paul Menzel wrote: > Dear Quan, > > > Thank you for your patch. > Thanks Paul for the review. > Am 14.02.23 um 07:45 schrieb Quan Nguyen: >> Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome >> to BMC when DIMM training failed. > > Where you able to verify that it works? Out of curiosity, how? > Yes, we verified it by injecting DIMM errors and confirm that errors was reported correctly via sysfs. For about how to do error injection, we may need to refer to section 3.2 Memory Error Group in Altra Family RAS Error Injection User Manual. It is shared in our Ampere Customer Connect [1]. The latest version is v1.00_20220329. [1] https://connect.amperecomputing.com >> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com> >> --- >> .../sysfs-bus-platform-devices-ampere-smpro | 10 +++ >> drivers/misc/smpro-errmon.c | 77 +++++++++++++++++++ >> 2 files changed, 87 insertions(+) >> >> diff --git >> a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro >> b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro >> index d4e3f308c451..c35f1d45e656 100644 >> --- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro >> +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro >> @@ -265,6 +265,16 @@ Description: >> For more details, see section `5.7 GPI Status Registers and >> 5.9 Memory Error Register Definitions, >> Altra Family Soc BMC Interface Specification`. >> +What: >> /sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome >> +KernelVersion: 6.1 > > Should it be 6.2, as it probably won’t make it into 6.1? > Thanks for the catch. Will fix in next version. >> +Contact: Quan Nguyen <quan@os.amperecomputing.com> >> +Description: >> + (RO) The sysfs returns the 2-byte DIMM failure syndrome data >> for slot >> + 0-15 if it failed to initialized. > > to initialize > Will fix in next version. >> + >> + For more details, see section `5.11 Boot Stage Register >> Definitions, >> + Altra Family Soc BMC Interface Specification`. >> + >> What: /sys/bus/platform/devices/smpro-misc.*/boot_progress >> KernelVersion: 6.1 >> Contact: Quan Nguyen <quan@os.amperecomputing.com> >> diff --git a/drivers/misc/smpro-errmon.c b/drivers/misc/smpro-errmon.c >> index 1635e881aefb..3e8570cbb740 100644 >> --- a/drivers/misc/smpro-errmon.c >> +++ b/drivers/misc/smpro-errmon.c >> @@ -47,6 +47,12 @@ >> #define WARN_PMPRO_INFO_LO 0xAC >> #define WARN_PMPRO_INFO_HI 0xAD >> +/* Boot Stage Register */ >> +#define BOOTSTAGE 0xB0 >> +#define DIMM_SYNDROME_SEL 0xB4 >> +#define DIMM_SYNDROME_ERR 0xB5 >> +#define DIMM_SYNDROME_STAGE 4 >> + >> /* PCIE Error Registers */ >> #define PCIE_CE_ERR_CNT 0xC0 >> #define PCIE_CE_ERR_LEN 0xC1 >> @@ -468,6 +474,61 @@ EVENT_RO(vrd_hot, VRD_HOT_EVENT); >> EVENT_RO(dimm_hot, DIMM_HOT_EVENT); >> EVENT_RO(dimm_2x_refresh, DIMM_2X_REFRESH_EVENT); >> +static ssize_t smpro_dimm_syndrome_read(struct device *dev, struct >> device_attribute *da, >> + char *buf, int slot) > > Could `slot` be passed as `unsigned int`? > Yes, I will update in next version. Indeed, the "slot" will be passed as the last param to regmap_write() which is in unsigned int as below. int regmap_write(struct regmap *map, unsigned int reg, unsigned int val); So, change to "unsigned int" is corrected. >> +{ >> + struct smpro_errmon *errmon = dev_get_drvdata(dev); >> + s32 data; >> + int ret; >> + >> + ret = regmap_read(errmon->regmap, BOOTSTAGE, &data); > > The function signature is: > > int regmap_read(struct regmap *map, unsigned int reg, unsigned int > *val) > > So why not use unsigned int as data type for `data`? > Agree, you comment is correct. Will change data to "unsigned int" in new version. >> + if (ret) >> + return ret; >> + >> + /* check for valid stage */ >> + data = (data >> 8) & 0xff; >> + if (data != DIMM_SYNDROME_STAGE) >> + return ret; > > Isn’t now success returned? Should a debug message be printed? > Yes, this is a success case, there is no dimm training error and has nothing to return in sysfs. If no dimm training error, host will never stay in DIMM_SYNDROME_STAGE. I dont think printing a debug message in this case is good because printing out something like "No dimm traning error" is not necessary as I think. >> + >> + /* Write the slot ID to retrieve Error Syndrome */ >> + ret = regmap_write(errmon->regmap, DIMM_SYNDROME_SEL, slot); >> + if (ret) >> + return ret; >> + >> + /* Read the Syndrome error */ >> + ret = regmap_read(errmon->regmap, DIMM_SYNDROME_ERR, &data); >> + if (ret || !data) >> + return ret; >> + >> + return sysfs_emit(buf, "%04x\n", data); >> +} >> + >> +#define EVENT_DIMM_SYNDROME(_slot) \ >> + static ssize_t event_dimm##_slot##_syndrome_show(struct device >> *dev, \ >> + struct device_attribute *da, \ >> + char *buf) \ >> + >> { \ >> + return smpro_dimm_syndrome_read(dev, da, buf, >> _slot); \ >> + >> } \ >> + static DEVICE_ATTR_RO(event_dimm##_slot##_syndrome) >> + >> +EVENT_DIMM_SYNDROME(0); >> +EVENT_DIMM_SYNDROME(1); >> +EVENT_DIMM_SYNDROME(2); >> +EVENT_DIMM_SYNDROME(3); >> +EVENT_DIMM_SYNDROME(4); >> +EVENT_DIMM_SYNDROME(5); >> +EVENT_DIMM_SYNDROME(6); >> +EVENT_DIMM_SYNDROME(7); >> +EVENT_DIMM_SYNDROME(8); >> +EVENT_DIMM_SYNDROME(9); >> +EVENT_DIMM_SYNDROME(10); >> +EVENT_DIMM_SYNDROME(11); >> +EVENT_DIMM_SYNDROME(12); >> +EVENT_DIMM_SYNDROME(13); >> +EVENT_DIMM_SYNDROME(14); >> +EVENT_DIMM_SYNDROME(15); >> + >> static struct attribute *smpro_errmon_attrs[] = { >> &dev_attr_overflow_core_ce.attr, >> &dev_attr_overflow_core_ue.attr, >> @@ -493,6 +554,22 @@ static struct attribute *smpro_errmon_attrs[] = { >> &dev_attr_event_vrd_hot.attr, >> &dev_attr_event_dimm_hot.attr, >> &dev_attr_event_dimm_2x_refresh.attr, >> + &dev_attr_event_dimm0_syndrome.attr, >> + &dev_attr_event_dimm1_syndrome.attr, >> + &dev_attr_event_dimm2_syndrome.attr, >> + &dev_attr_event_dimm3_syndrome.attr, >> + &dev_attr_event_dimm4_syndrome.attr, >> + &dev_attr_event_dimm5_syndrome.attr, >> + &dev_attr_event_dimm6_syndrome.attr, >> + &dev_attr_event_dimm7_syndrome.attr, >> + &dev_attr_event_dimm8_syndrome.attr, >> + &dev_attr_event_dimm9_syndrome.attr, >> + &dev_attr_event_dimm10_syndrome.attr, >> + &dev_attr_event_dimm11_syndrome.attr, >> + &dev_attr_event_dimm12_syndrome.attr, >> + &dev_attr_event_dimm13_syndrome.attr, >> + &dev_attr_event_dimm14_syndrome.attr, >> + &dev_attr_event_dimm15_syndrome.attr, >> NULL >> }; > > > Kind regards, > > Paul
On Thu, Feb 16, 2023 at 10:22:14AM +0700, Quan Nguyen wrote: > > > On 15/02/2023 14:33, Paul Menzel wrote: > > Dear Quan, > > > > > > Thank you for your patch. > > > > Thanks Paul for the review. > > > Am 14.02.23 um 07:45 schrieb Quan Nguyen: > > > Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome > > > to BMC when DIMM training failed. > > > > Where you able to verify that it works? Out of curiosity, how? > > > > Yes, we verified it by injecting DIMM errors and confirm that errors was > reported correctly via sysfs. > For about how to do error injection, we may need to refer to section 3.2 > Memory Error Group in Altra Family RAS Error Injection User Manual. It is > shared in our Ampere Customer Connect [1]. The latest version is > v1.00_20220329. > > [1] https://connect.amperecomputing.com > > > > Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com> > > > --- > > > .../sysfs-bus-platform-devices-ampere-smpro | 10 +++ > > > drivers/misc/smpro-errmon.c | 77 +++++++++++++++++++ > > > 2 files changed, 87 insertions(+) > > > > > > diff --git > > > a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro > > > b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro > > > index d4e3f308c451..c35f1d45e656 100644 > > > --- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro > > > +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro > > > @@ -265,6 +265,16 @@ Description: > > > For more details, see section `5.7 GPI Status Registers > > > and 5.9 Memory Error Register Definitions, > > > Altra Family Soc BMC Interface Specification`. > > > +What: > > > /sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome > > > +KernelVersion: 6.1 > > > > Should it be 6.2, as it probably won’t make it into 6.1? > > > > Thanks for the catch. Will fix in next version. Should be 6.3, it's missed the 6.2 merge window cycle, sorry. thanks, greg k-h
On 16/02/2023 19:30, Greg Kroah-Hartman wrote: > On Thu, Feb 16, 2023 at 10:22:14AM +0700, Quan Nguyen wrote: >> >> >> On 15/02/2023 14:33, Paul Menzel wrote: >>> Dear Quan, >>> >>> >>> Thank you for your patch. >>> >> >> Thanks Paul for the review. >> >>> Am 14.02.23 um 07:45 schrieb Quan Nguyen: >>>> Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome >>>> to BMC when DIMM training failed. >>> >>> Where you able to verify that it works? Out of curiosity, how? >>> >> >> Yes, we verified it by injecting DIMM errors and confirm that errors was >> reported correctly via sysfs. >> For about how to do error injection, we may need to refer to section 3.2 >> Memory Error Group in Altra Family RAS Error Injection User Manual. It is >> shared in our Ampere Customer Connect [1]. The latest version is >> v1.00_20220329. >> >> [1] https://connect.amperecomputing.com >> >>>> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com> >>>> --- >>>> .../sysfs-bus-platform-devices-ampere-smpro | 10 +++ >>>> drivers/misc/smpro-errmon.c | 77 +++++++++++++++++++ >>>> 2 files changed, 87 insertions(+) >>>> >>>> diff --git >>>> a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro >>>> b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro >>>> index d4e3f308c451..c35f1d45e656 100644 >>>> --- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro >>>> +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro >>>> @@ -265,6 +265,16 @@ Description: >>>> For more details, see section `5.7 GPI Status Registers >>>> and 5.9 Memory Error Register Definitions, >>>> Altra Family Soc BMC Interface Specification`. >>>> +What: >>>> /sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome >>>> +KernelVersion: 6.1 >>> >>> Should it be 6.2, as it probably won’t make it into 6.1? >>> >> >> Thanks for the catch. Will fix in next version. > > Should be 6.3, it's missed the 6.2 merge window cycle, sorry. > > thanks, > Thanks Greg, Will update to 6.3
diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro index d4e3f308c451..c35f1d45e656 100644 --- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro @@ -265,6 +265,16 @@ Description: For more details, see section `5.7 GPI Status Registers and 5.9 Memory Error Register Definitions, Altra Family Soc BMC Interface Specification`. +What: /sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome +KernelVersion: 6.1 +Contact: Quan Nguyen <quan@os.amperecomputing.com> +Description: + (RO) The sysfs returns the 2-byte DIMM failure syndrome data for slot + 0-15 if it failed to initialized. + + For more details, see section `5.11 Boot Stage Register Definitions, + Altra Family Soc BMC Interface Specification`. + What: /sys/bus/platform/devices/smpro-misc.*/boot_progress KernelVersion: 6.1 Contact: Quan Nguyen <quan@os.amperecomputing.com> diff --git a/drivers/misc/smpro-errmon.c b/drivers/misc/smpro-errmon.c index 1635e881aefb..3e8570cbb740 100644 --- a/drivers/misc/smpro-errmon.c +++ b/drivers/misc/smpro-errmon.c @@ -47,6 +47,12 @@ #define WARN_PMPRO_INFO_LO 0xAC #define WARN_PMPRO_INFO_HI 0xAD +/* Boot Stage Register */ +#define BOOTSTAGE 0xB0 +#define DIMM_SYNDROME_SEL 0xB4 +#define DIMM_SYNDROME_ERR 0xB5 +#define DIMM_SYNDROME_STAGE 4 + /* PCIE Error Registers */ #define PCIE_CE_ERR_CNT 0xC0 #define PCIE_CE_ERR_LEN 0xC1 @@ -468,6 +474,61 @@ EVENT_RO(vrd_hot, VRD_HOT_EVENT); EVENT_RO(dimm_hot, DIMM_HOT_EVENT); EVENT_RO(dimm_2x_refresh, DIMM_2X_REFRESH_EVENT); +static ssize_t smpro_dimm_syndrome_read(struct device *dev, struct device_attribute *da, + char *buf, int slot) +{ + struct smpro_errmon *errmon = dev_get_drvdata(dev); + s32 data; + int ret; + + ret = regmap_read(errmon->regmap, BOOTSTAGE, &data); + if (ret) + return ret; + + /* check for valid stage */ + data = (data >> 8) & 0xff; + if (data != DIMM_SYNDROME_STAGE) + return ret; + + /* Write the slot ID to retrieve Error Syndrome */ + ret = regmap_write(errmon->regmap, DIMM_SYNDROME_SEL, slot); + if (ret) + return ret; + + /* Read the Syndrome error */ + ret = regmap_read(errmon->regmap, DIMM_SYNDROME_ERR, &data); + if (ret || !data) + return ret; + + return sysfs_emit(buf, "%04x\n", data); +} + +#define EVENT_DIMM_SYNDROME(_slot) \ + static ssize_t event_dimm##_slot##_syndrome_show(struct device *dev, \ + struct device_attribute *da, \ + char *buf) \ + { \ + return smpro_dimm_syndrome_read(dev, da, buf, _slot); \ + } \ + static DEVICE_ATTR_RO(event_dimm##_slot##_syndrome) + +EVENT_DIMM_SYNDROME(0); +EVENT_DIMM_SYNDROME(1); +EVENT_DIMM_SYNDROME(2); +EVENT_DIMM_SYNDROME(3); +EVENT_DIMM_SYNDROME(4); +EVENT_DIMM_SYNDROME(5); +EVENT_DIMM_SYNDROME(6); +EVENT_DIMM_SYNDROME(7); +EVENT_DIMM_SYNDROME(8); +EVENT_DIMM_SYNDROME(9); +EVENT_DIMM_SYNDROME(10); +EVENT_DIMM_SYNDROME(11); +EVENT_DIMM_SYNDROME(12); +EVENT_DIMM_SYNDROME(13); +EVENT_DIMM_SYNDROME(14); +EVENT_DIMM_SYNDROME(15); + static struct attribute *smpro_errmon_attrs[] = { &dev_attr_overflow_core_ce.attr, &dev_attr_overflow_core_ue.attr, @@ -493,6 +554,22 @@ static struct attribute *smpro_errmon_attrs[] = { &dev_attr_event_vrd_hot.attr, &dev_attr_event_dimm_hot.attr, &dev_attr_event_dimm_2x_refresh.attr, + &dev_attr_event_dimm0_syndrome.attr, + &dev_attr_event_dimm1_syndrome.attr, + &dev_attr_event_dimm2_syndrome.attr, + &dev_attr_event_dimm3_syndrome.attr, + &dev_attr_event_dimm4_syndrome.attr, + &dev_attr_event_dimm5_syndrome.attr, + &dev_attr_event_dimm6_syndrome.attr, + &dev_attr_event_dimm7_syndrome.attr, + &dev_attr_event_dimm8_syndrome.attr, + &dev_attr_event_dimm9_syndrome.attr, + &dev_attr_event_dimm10_syndrome.attr, + &dev_attr_event_dimm11_syndrome.attr, + &dev_attr_event_dimm12_syndrome.attr, + &dev_attr_event_dimm13_syndrome.attr, + &dev_attr_event_dimm14_syndrome.attr, + &dev_attr_event_dimm15_syndrome.attr, NULL };