Message ID | 20230731185103.18436-1-mario.limonciello@amd.com |
---|---|
State | New |
Headers | From: Mario Limonciello <mario.limonciello@amd.com> To: <kbusch@kernel.org>, <axboe@fb.com>, <hch@lst.de>, <sagi@grimberg.me> CC: <linux-nvme@lists.infradead.org>, <linux-kernel@vger.kernel.org>, <nilskruse97@gmail.com>, <git@augustwikerfors.se>, <David.Chang@amd.com>, Mario Limonciello <mario.limonciello@amd.com> Subject: [PATCH] nvme: Don't fail to resume if NSIDs change Date: Mon, 31 Jul 2023 13:51:03 -0500 Message-ID: <20230731185103.18436-1-mario.limonciello@amd.com> X-Mailer: git-send-email 2.34.1 List-ID: <linux-kernel.vger.kernel.org> |
Series |
nvme: Don't fail to resume if NSIDs change
|
|
Commit Message
Mario Limonciello
July 31, 2023, 6:51 p.m. UTC
Samsung PM9B1 has problems after resume because NSID has changed.
This has been reported in the past on OEM varieties of PM9B1 parts
and fixed by firmware updates on 'some' of those parts.

However this same issue also happens on 'retail' PM9B1 parts which
Samsung has not released firmware updates for.

As the check has been relaxed at startup for multiple disks with
duplicate NSIDs with commit ac522fc6c3165 ("nvme: don't reject
probe due to duplicate IDs for single-ported PCIe devices") also
relax the check that runs on resume for NSIDs and mark them bogus
if this occurs on resume.

Fixes: 1d5df6af8c74 ("nvme: don't blindly overwrite identifiers on disk revalidate")
Cc: stable@vger.kernel.org # 6.1+
Cc: Nils Kruse <nilskruse97@gmail.com>
Cc: August Wikerfors <git@augustwikerfors.se>
Cc: David Chang <David.Chang@amd.com>
Link: https://github.com/tomsom/yoga-linux/issues/9
Link: https://lore.kernel.org/linux-nvme/b99a5149-c3d6-2a9b-1298-576a1b4b22c1@gmail.com/
Link: https://lore.kernel.org/all/20221116171727.4083-1-git@augustwikerfors.se/t/
Link: https://lore.kernel.org/all/d0ce0f3b-9407-9207-73a4-3536f0948653@augustwikerfors.se/
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
drivers/nvme/host/core.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
Comments
On Mon, Jul 31, 2023 at 01:51:03PM -0500, Mario Limonciello wrote:
> Samsung PM9B1 has problems after resume because NSID has changed.
> This has been reported in the past on OEM varieties of PM9B1 parts
> and fixed by firmware updates on 'some' of those parts.
>
> However this same issue also happens on 'retail' PM9B1 parts which
> Samsung has not released firmware updates for.
>
> As the check has been relaxed at startup for multiple disks with
> duplicate NSIDs with commit ac522fc6c3165 ("nvme: don't reject
> probe due to duplicate IDs for single-ported PCIe devices") also
> relax the check that runs on resume for NSIDs and mark them bogus
> if this occurs on resume.

How could the driver tell the difference between the device needing a
quirk compared to a rapid delete-create-attach namespace sequence?
Proceeding with the namespace now may get dirty writes intended for the
previous namespace, corrupting the new one.

The commit you mentioned tries to constrain allowing duplication where
we can reasonably assume the quirk is needed. If we need to do similar
for this condition, one possible constraint might be that the device
doesn't report OACS bit 3 (Namespace Management).

On 2023-07-31 20:51, Mario Limonciello wrote:
> Samsung PM9B1 has problems after resume because NSID has changed.
> This has been reported in the past on OEM varieties of PM9B1 parts
> and fixed by firmware updates on 'some' of those parts.
>
> However this same issue also happens on 'retail' PM9B1 parts which
> Samsung has not released firmware updates for.
>
> As the check has been relaxed at startup for multiple disks with
> duplicate NSIDs with commit ac522fc6c3165 ("nvme: don't reject
> probe due to duplicate IDs for single-ported PCIe devices") also
> relax the check that runs on resume for NSIDs and mark them bogus
> if this occurs on resume.
>
> Fixes: 1d5df6af8c74 ("nvme: don't blindly overwrite identifiers on disk revalidate")
> Cc: stable@vger.kernel.org # 6.1+
> Cc: Nils Kruse <nilskruse97@gmail.com>
> Cc: August Wikerfors <git@augustwikerfors.se>
> Cc: David Chang <David.Chang@amd.com>
> Link: https://github.com/tomsom/yoga-linux/issues/9
> Link: https://lore.kernel.org/linux-nvme/b99a5149-c3d6-2a9b-1298-576a1b4b22c1@gmail.com/
> Link: https://lore.kernel.org/all/20221116171727.4083-1-git@augustwikerfors.se/t/
> Link: https://lore.kernel.org/all/d0ce0f3b-9407-9207-73a4-3536f0948653@augustwikerfors.se/
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>

Tested-by: August Wikerfors <git@augustwikerfors.se>

Thanks!

On Mon, Jul 31, 2023 at 01:10:11PM -0600, Keith Busch wrote:
> > As the check has been relaxed at startup for multiple disks with
> > duplicate NSIDs with commit ac522fc6c3165 ("nvme: don't reject
> > probe due to duplicate IDs for single-ported PCIe devices") also
> > relax the check that runs on resume for NSIDs and mark them bogus
> > if this occurs on resume.
>
> How could the driver tell the difference between the device needing a
> quirk compared to a rapid delete-create-attach namespace sequence?
> Proceeding with the namespace now may get dirty writes intended for the
> previous namespace, corrupting the new one.
>
> The commit you mentioned tries to constrain allowing duplication where
> we can reasonably assume the quirk is needed. If we need to do similar
> for this condition, one possible constraint might be that the device
> doesn't report OACS bit 3 (Namespace Management).

Yes, this patch as-is looks really dangerous. I don't think we should
just ignore the fact that IDs change when queried again.

On 2023-07-31 21:10, Keith Busch wrote:
> On Mon, Jul 31, 2023 at 01:51:03PM -0500, Mario Limonciello wrote:
>> Samsung PM9B1 has problems after resume because NSID has changed.
>> This has been reported in the past on OEM varieties of PM9B1 parts
>> and fixed by firmware updates on 'some' of those parts.
>>
>> However this same issue also happens on 'retail' PM9B1 parts which
>> Samsung has not released firmware updates for.
>>
>> As the check has been relaxed at startup for multiple disks with
>> duplicate NSIDs with commit ac522fc6c3165 ("nvme: don't reject
>> probe due to duplicate IDs for single-ported PCIe devices") also
>> relax the check that runs on resume for NSIDs and mark them bogus
>> if this occurs on resume.
>
> How could the driver tell the difference between the device needing a
> quirk compared to a rapid delete-create-attach namespace sequence?
> Proceeding with the namespace now may get dirty writes intended for the
> previous namespace, corrupting the new one.
>
> The commit you mentioned tries to constrain allowing duplication where
> we can reasonably assume the quirk is needed. If we need to do similar
> for this condition, one possible constraint might be that the device
> doesn't report OACS bit 3 (Namespace Management).

It looks like that would work for the PM9B1:

> $ sudo nvme id-ctrl -H /dev/nvme0
> [...]
> oacs      : 0x17
>   [10:10] : 0    Lockdown Command and Feature Not Supported
>   [9:9]   : 0    Get LBA Status Capability Not Supported
>   [8:8]   : 0    Doorbell Buffer Config Not Supported
>   [7:7]   : 0    Virtualization Management Not Supported
>   [6:6]   : 0    NVMe-MI Send and Receive Not Supported
>   [5:5]   : 0    Directives Not Supported
>   [4:4]   : 0x1  Device Self-test Supported
>   [3:3]   : 0    NS Management and Attachment Not Supported
>   [2:2]   : 0x1  FW Commit and Download Supported
>   [1:1]   : 0x1  Format NVM Supported
>   [0:0]   : 0x1  Security Send and Receive Supported

Regards,
August Wikerfors

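For readers following along, the OACS test being discussed can be sketched in userspace C. This is only an illustration of the bit check, not driver code: the helper names `oacs_bit_set` and `oacs_has_ns_mgmt` and the macro `OACS_NS_MGMT_BIT` are made up for this sketch, and the value 0x17 is taken from the `nvme id-ctrl` output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * OACS bit 3 is the "NS Management and Attachment" capability shown in the
 * id-ctrl output. The macro name is local to this sketch.
 */
#define OACS_NS_MGMT_BIT 3

/* Return true if the given bit is set in a 16-bit OACS value. */
static bool oacs_bit_set(uint16_t oacs, unsigned int bit)
{
	assert(bit < 16);
	return (oacs >> bit) & 1u;
}

/*
 * Hypothetical helper: does the controller advertise in-band namespace
 * management? For the PM9B1 value 0x17 (binary 10111) bit 3 is clear.
 */
static bool oacs_has_ns_mgmt(uint16_t oacs)
{
	return oacs_bit_set(oacs, OACS_NS_MGMT_BIT);
}
```

Note that a clear bit only says the controller does not offer namespace management in band; it says nothing about namespaces being changed by other means.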
On 7/31/2023 2:54 PM, August Wikerfors wrote:
> On 2023-07-31 21:10, Keith Busch wrote:
>> On Mon, Jul 31, 2023 at 01:51:03PM -0500, Mario Limonciello wrote:
>>> Samsung PM9B1 has problems after resume because NSID has changed.
>>> This has been reported in the past on OEM varieties of PM9B1 parts
>>> and fixed by firmware updates on 'some' of those parts.
>>>
>>> However this same issue also happens on 'retail' PM9B1 parts which
>>> Samsung has not released firmware updates for.
>>>
>>> As the check has been relaxed at startup for multiple disks with
>>> duplicate NSIDs with commit ac522fc6c3165 ("nvme: don't reject
>>> probe due to duplicate IDs for single-ported PCIe devices") also
>>> relax the check that runs on resume for NSIDs and mark them bogus
>>> if this occurs on resume.
>>
>> How could the driver tell the difference between the device needing a
>> quirk compared to a rapid delete-create-attach namespace sequence?
>> Proceeding with the namespace now may get dirty writes intended for the
>> previous namespace, corrupting the new one.
>>
>> The commit you mentioned tries to constrain allowing duplication where
>> we can reasonably assume the quirk is needed. If we need to do similar
>> for this condition, one possible constraint might be that the device
>> doesn't report OACS bit 3 (Namespace Management).
>
> It looks like that would work for the PM9B1:
>> $ sudo nvme id-ctrl -H /dev/nvme0
>> [...]
>> oacs      : 0x17
>>   [10:10] : 0    Lockdown Command and Feature Not Supported
>>   [9:9]   : 0    Get LBA Status Capability Not Supported
>>   [8:8]   : 0    Doorbell Buffer Config Not Supported
>>   [7:7]   : 0    Virtualization Management Not Supported
>>   [6:6]   : 0    NVMe-MI Send and Receive Not Supported
>>   [5:5]   : 0    Directives Not Supported
>>   [4:4]   : 0x1  Device Self-test Supported
>>   [3:3]   : 0    NS Management and Attachment Not Supported
>>   [2:2]   : 0x1  FW Commit and Download Supported
>>   [1:1]   : 0x1  Format NVM Supported
>>   [0:0]   : 0x1  Security Send and Receive Supported
>
> Regards,
> August Wikerfors

So is it reasonable to just add a check for

ctrl->oacs & NVME_CTRL_OACS_NS_MNGT_SUPP

in the same error handling path as this patch?

On Mon, Jul 31, 2023 at 03:09:08PM -0500, Limonciello, Mario wrote:
> So is it reasonable to just add a check for
>
> ctrl->oacs & NVME_CTRL_OACS_NS_MNGT_SUPP
>
> In the same error handling path as this patch?

No. There are tons of NVMe devices that only support creating and
deleting namespace out of band, especially in virtualized and cloud
setups.

On 7/31/2023 3:10 PM, Christoph Hellwig wrote:
> On Mon, Jul 31, 2023 at 03:09:08PM -0500, Limonciello, Mario wrote:
>> So is it reasonable to just add a check for
>>
>> ctrl->oacs & NVME_CTRL_OACS_NS_MNGT_SUPP
>>
>> In the same error handling path as this patch?
>
> No. There are tons of NVMe devices that only support creating and
> deleting namespace out of band, especially in virtualized and cloud
> setups.

Even if it's only checked in the error handling path?

If you don't want more changes or heuristics on the error handling path for
this case, I think the best solution is probably to pick up

https://lore.kernel.org/all/20221116171727.4083-1-git@augustwikerfors.se/t/

instead then and hopefully we don't end up with more disks like this.

On Mon, Jul 31, 2023 at 03:14:54PM -0500, Limonciello, Mario wrote:
>> No. There are tons of NVMe devices that only support creating and
>> deleting namespace out of band, especially in virtualized and cloud
>> setups.
>
> Even if it's only checked in the error handling path?

Do you mean nvme_validate_ns with the error code? I wouldn't really call
that an error case, that's the function called to check namespaces are
still the same after we did a rescan (either manually or triggered by the
AEN).

> If you don't want more changes or heuristics on the error handling path for
> this case, I think the best solution is probably to pick up
>
> https://lore.kernel.org/all/20221116171727.4083-1-git@augustwikerfors.se/t/
>
> instead then and hopefully we don't end up with more disks like this.

That's probably the better idea. I know at least one of the early
quirked devices also had IDs that changed for subsequent identify calls.

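To make the revalidation check under discussion concrete, here is a minimal userspace sketch of comparing a cached identifier set with a freshly read one after a rescan. It is not the driver's code: `struct ns_ids`, `ns_ids_equal`, `ns_ids_clear`, `revalidate` and the result enum are hypothetical stand-ins. The sketch assumes, as I understand the existing quirk, that a device flagged with a bogus-NID quirk has its reported IDs ignored (treated as all zeroes) both at first scan and on every later rescan, so the comparison cannot fail for that device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the identifiers tracked per namespace. */
struct ns_ids {
	uint8_t eui64[8];
	uint8_t nguid[16];
	uint8_t uuid[16];
};

/* Two identifier sets match only if every field matches byte for byte. */
static bool ns_ids_equal(const struct ns_ids *a, const struct ns_ids *b)
{
	return memcmp(a->eui64, b->eui64, sizeof(a->eui64)) == 0 &&
	       memcmp(a->nguid, b->nguid, sizeof(a->nguid)) == 0 &&
	       memcmp(a->uuid, b->uuid, sizeof(a->uuid)) == 0;
}

/* Zero all identifiers, like the memset() calls in the patch. */
static void ns_ids_clear(struct ns_ids *ids)
{
	memset(ids, 0, sizeof(*ids));
}

enum revalidate_result { NS_KEEP, NS_REMOVE };

/*
 * Sketch of the rescan decision: with the quirk set, the freshly read IDs
 * are discarded before comparing (assumption: the cached copy was cleared
 * the same way at first scan). Without it, any change removes the namespace.
 */
static enum revalidate_result revalidate(const struct ns_ids *cached,
					 struct ns_ids *fresh,
					 bool bogus_nid_quirk)
{
	assert(cached && fresh);
	if (bogus_nid_quirk)
		ns_ids_clear(fresh);
	return ns_ids_equal(cached, fresh) ? NS_KEEP : NS_REMOVE;
}
```

The point of the design is that the decision to distrust the IDs is made per device up front, rather than inferred at rescan time from the fact that the IDs changed.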
On 8/1/23 06:24, Christoph Hellwig wrote:
> On Mon, Jul 31, 2023 at 03:14:54PM -0500, Limonciello, Mario wrote:
>>> No. There are tons of NVMe devices that only support creating and
>>> deleting namespace out of band, especially in virtualized and cloud
>>> setups.
>>
>> Even if it's only checked in the error handling path?
>
> Do you mean nvme_validate_ns with the error code? I wouldn't really call
> that an error case, that's the function called to check namespaces are
> still the same after we did a rescan (either manually or triggered by the
> AEN).
>
>> If you don't want more changes or heuristics on the error handling path for
>> this case, I think the best solution is probably to pick up
>>
>> https://lore.kernel.org/all/20221116171727.4083-1-git@augustwikerfors.se/t/
>>
>> instead then and hopefully we don't end up with more disks like this.
>
> That's probably the better idea. I know at least one of the early
> quirked devices also had IDs that changed for subsequent identify calls.

Do you want that re-sent? Or can you just pick up from that lore link?

On Tue, Aug 01, 2023 at 06:30:52AM -0500, Mario Limonciello wrote:
>
> Do you want that re-sent? Or can you just pick up from that lore link?

I got it, and applied to nvme-6.5 now.

On 8/1/2023 15:29, Keith Busch wrote:
> On Tue, Aug 01, 2023 at 06:30:52AM -0500, Mario Limonciello wrote:
>>
>> Do you want that re-sent? Or can you just pick up from that lore link?
>
> I got it, and applied to nvme-6.5 now.

Thanks! If you can still change it before sending out can you add a stable tag
as well?

On 2023-08-01 22:34, Mario Limonciello wrote:
> If you can still change it before sending out can you add a stable tag
> as well?

This didn't get added in time, so, stable team, please backport:

688b419c57c1 ("nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and 512G")

Regards,
August Wikerfors

On Fri, Aug 11, 2023 at 10:19:35PM +0200, August Wikerfors wrote:
> On 2023-08-01 22:34, Mario Limonciello wrote:
> > If you can still change it before sending out can you add a stable tag
> > as well?
>
> This didn't get added in time, so, stable team, please backport:
>
> 688b419c57c1 ("nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and 512G")

Perhaps bad form on my end for relying on it, but in my experience, the
stable bot has a great record on auto selecting nvme quirks.

On Fri, Aug 11, 2023 at 10:19:35PM +0200, August Wikerfors wrote:
> On 2023-08-01 22:34, Mario Limonciello wrote:
> > If you can still change it before sending out can you add a stable tag
> > as well?
>
> This didn't get added in time, so, stable team, please backport:
>
> 688b419c57c1 ("nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and 512G")

Now queued up to 6.4.y and 6.1.y, thanks.

greg k-h

On Fri, Aug 11, 2023 at 02:59:33PM -0600, Keith Busch wrote:
> On Fri, Aug 11, 2023 at 10:19:35PM +0200, August Wikerfors wrote:
> > On 2023-08-01 22:34, Mario Limonciello wrote:
> > > If you can still change it before sending out can you add a stable tag
> > > as well?
> >
> > This didn't get added in time, so, stable team, please backport:
> >
> > 688b419c57c1 ("nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and 512G")
>
> Perhaps bad form on my end for relying on it, but in my experience, the
> stable bot has a great record on auto selecting nvme quirks.

It's better for everyone if you mark it for stable, so you know it will
get reviewed, otherwise you are at the mercy of our scripts and free
time to dig for patches :)

thanks,

greg k-h

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 37b6fa7466620..fc85b4cd11fa2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3423,6 +3423,16 @@ static int nvme_global_check_duplicate_ids(struct nvme_subsystem *this,
 	return ret;
 }
 
+static void nvme_mark_nid_bogus(struct nvme_ns *ns, struct nvme_ns_info *info)
+{
+	dev_warn(ns->ctrl->device,
+		"use of /dev/disk/by-id/ may cause data corruption\n");
+	memset(&info->ids.nguid, 0, sizeof(info->ids.nguid));
+	memset(&info->ids.uuid, 0, sizeof(info->ids.uuid));
+	memset(&info->ids.eui64, 0, sizeof(info->ids.eui64));
+	ns->ctrl->quirks |= NVME_QUIRK_BOGUS_NID;
+}
+
 static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
 {
 	struct nvme_ctrl *ctrl = ns->ctrl;
@@ -3459,12 +3469,7 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
 
 		dev_err(ctrl->device,
 			"clearing duplicate IDs for nsid %d\n", info->nsid);
-		dev_err(ctrl->device,
-			"use of /dev/disk/by-id/ may cause data corruption\n");
-		memset(&info->ids.nguid, 0, sizeof(info->ids.nguid));
-		memset(&info->ids.uuid, 0, sizeof(info->ids.uuid));
-		memset(&info->ids.eui64, 0, sizeof(info->ids.eui64));
-		ctrl->quirks |= NVME_QUIRK_BOGUS_NID;
+		nvme_mark_nid_bogus(ns, info);
 	}
 
 	mutex_lock(&ctrl->subsys->lock);
@@ -3706,14 +3711,14 @@ static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
 {
 	int ret = NVME_SC_INVALID_NS | NVME_SC_DNR;
 
-	if (!nvme_ns_ids_equal(&ns->head->ids, &info->ids)) {
+	if (!nvme_ns_ids_equal(&ns->head->ids, &info->ids) &&
+	    !(ns->ctrl->quirks & NVME_QUIRK_BOGUS_NID)) {
 		dev_err(ns->ctrl->device,
 			"identifiers changed for nsid %d\n", ns->head->ns_id);
-		goto out;
+		nvme_mark_nid_bogus(ns, info);
 	}
 
 	ret = nvme_update_ns_info(ns, info);
-out:
 	/*
 	 * Only remove the namespace if we got a fatal error back from the
 	 * device, otherwise ignore the error and just move on.