Message ID | 20230418210526.36514-3-Smita.KoralahalliChannabasappa@amd.com |
---|---|
State | New |
Headers |
From: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
To: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>, oohall@gmail.com, Mahesh J Salgaonkar <mahesh@linux.ibm.com>, Lukas Wunner <lukas@wunner.de>, Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>, Yazen Ghannam <yazen.ghannam@amd.com>, Fontenot Nathan <Nathan.Fontenot@amd.com>, Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Subject: [PATCH v2 2/2] PCI: pciehp: Clear the optional capabilities in DEVCTL2 on a hot-plug
Date: Tue, 18 Apr 2023 21:05:26 +0000
Message-ID: <20230418210526.36514-3-Smita.KoralahalliChannabasappa@amd.com>
In-Reply-To: <20230418210526.36514-1-Smita.KoralahalliChannabasappa@amd.com>
References: <20230418210526.36514-1-Smita.KoralahalliChannabasappa@amd.com>
Series | PCI: pciehp: Add support for native AER and DPC handling on async remove |
Commit Message
Smita Koralahalli
April 18, 2023, 9:05 p.m. UTC
Clear all capabilities in the Device Control 2 register as they are optional
and it is not determined whether the next device inserted will support
these capabilities. Moreover, Section 6.13 of the PCIe Base
Specification [1] recommends clearing the ARI Forwarding Enable bit on
a hot-plug event as it's not guaranteed that the newly added component
is in fact an ARI device.
[1] PCI Express Base Specification Revision 6.0, Dec 16 2021.
https://members.pcisig.com/wg/PCI-SIG/document/16609
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
v2:
Clear all optional capabilities in Device Control 2 register
instead of individually clearing ARI Forwarding Enable,
AtomicOp Requestor Enable and 10-bit Tag Requestor Enable.
---
drivers/pci/hotplug/pciehp_pci.c | 1 +
1 file changed, 1 insertion(+)
Comments
On Tue, Apr 18, 2023 at 09:05:26PM +0000, Smita Koralahalli wrote:
> Clear all capabilities in Device Control 2 register as they are optional
> and it is not determined whether the next device inserted will support
> these capabilities. Moreover, Section 6.13 of the PCIe Base
> Specification [1], recommends clearing the ARI Forwarding Enable bit on
> a hot-plug event as its not guaranteed that the newly added component
> is in fact an ARI device.

Clearing ARI Forwarding Enable sounds reasonable, but I don't see
why all the other bits in the Device Control 2 register need to be
cleared. If there isn't a reason to clear them, I'd be in favor of
leaving them alone.

As for clearing ARI Forwarding Enable, it seems commit b0cc6020e1cc
("PCI: Enable ARI if dev and upstream bridge support it; disable
otherwise") already solved this problem. Quoth its commit message:

"Currently, we enable ARI in a device's upstream bridge if the bridge and
the device support it. But we never disable ARI, even if the device is
removed and replaced with a device that doesn't support ARI.

This means that if we hot-remove an ARI device and replace it with a
non-ARI multi-function device, we find only function 0 of the new device
because the upstream bridge still has ARI enabled, and next_ari_fn()
only returns function 0 for the new non-ARI device.

This patch disables ARI in the upstream bridge if the device doesn't
support ARI. See the PCIe spec, r3.0, sec 6.13."

My superficial understanding of that patch is that we do find function 0,
while enumerating it we clear the ARI Forwarding Enable bit and thus the
remaining functions become accessible and are subsequently enumerated.

Are you seeing issues when removing an ARI-capable endpoint from a
hotplug slot and replacing it with a non-ARI-capable device?
If you do, the question is why the above-quoted commit doesn't avoid them.

> --- a/drivers/pci/hotplug/pciehp_pci.c
> +++ b/drivers/pci/hotplug/pciehp_pci.c
> @@ -104,6 +104,7 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
>  	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
>  					 bus_list) {
>  		pci_dev_get(dev);
> +		pcie_capability_clear_word(dev, PCI_EXP_DEVCTL2, 0xffff);
>  		pci_stop_and_remove_bus_device(dev);
>  		/*
>  		 * Ensure that no new Requests will be generated from

This clears the Device Control 2 register on the hotplugged device,
but to clear ARI Forwarding Enable, you'd have to clear the register
of the hotplug port, i.e. the *parent* of the hotplugged device.

Also, this patch doesn't apply cleanly to v6.4-rc1 because of a context
change by f5eff5591b8f ("PCI: pciehp: Fix AB-BA deadlock between
reset_lock and device_lock").

Thanks,

Lukas
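For illustration, a minimal sketch of the correction Lukas describes: ARI Forwarding Enable lives in the Device Control 2 register of the hotplug port, so it is the parent bridge, not the removed endpoint, whose register would need clearing. The helper below is hypothetical and not part of the posted series; it only uses existing interfaces (pcie_capability_clear_word() and the PCI_EXP_DEVCTL2_ARI define).

/* Hypothetical sketch -- clear ARI Forwarding Enable on the parent port. */
static void clear_parent_ari_forwarding(struct pci_dev *dev)
{
        struct pci_dev *bridge = dev->bus->self;        /* hotplug port */

        if (bridge)
                pcie_capability_clear_word(bridge, PCI_EXP_DEVCTL2,
                                           PCI_EXP_DEVCTL2_ARI);
}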
Hi Lukas,

Sorry for the delay in replying to this patch. We had some internal
discussions after the review comments. Please find the responses inline..

On 5/11/2023 4:19 AM, Lukas Wunner wrote:
> On Tue, Apr 18, 2023 at 09:05:26PM +0000, Smita Koralahalli wrote:
>> Clear all capabilities in Device Control 2 register as they are optional
>> and it is not determined whether the next device inserted will support
>> these capabilities. Moreover, Section 6.13 of the PCIe Base
>> Specification [1], recommends clearing the ARI Forwarding Enable bit on
>> a hot-plug event as its not guaranteed that the newly added component
>> is in fact an ARI device.
>
> Clearing ARI Forwarding Enable sounds reasonable, but I don't see
> why all the other bits in the Device Control 2 register need to be
> cleared. If there isn't a reason to clear them, I'd be in favor of
> leaving them alone.

I understand. The SPEC doesn't "clearly" state to clear the other bits
except ARI on a hot-plug event.

But, we came across issues when a device with 10-bit tags was removed and
replaced with a device that didn't support 10-bit tags. The device became
inaccessible and the port was not able to be recovered without a system
reset. So, we thought it would be better to cherry pick all bits that were
negotiated between endpoint and root port and decided that we should clear
them all on removal. Hence, my first revision of this patch series had
aimed to clear only ARI, AtomicOp Requestor and 10 bit tags as these were
the negotiated settings between endpoint and root port. Ideally, these
settings should be re-negotiated and set up for optimal operation on a
hot add.

We had some internal discussions to understand if SPEC has it documented
somewhere. And we could see in Section 2.2.6.2, it implies that:

[i] If a Requester sends a 10-Bit Tag Request to a Completer that lacks
10-Bit Completer capability, the returned Completion(s) will have Tags
with Tag[9:8] equal to 00b. Since the Requester is forbidden to generate
these Tag values for 10-Bit Tags, such Completions will be handled as
Unexpected Completions, which by default are Advisory Non-Fatal Errors.
The Requester must follow standard PCI Express error handling requirements.

[ii] In configurations where a Requester with 10-Bit Tag Requester
capability needs to target multiple Completers, one needs to ensure that
the Requester sends 10-Bit Tag Requests only to Completers that have
10-Bit Tag Completer capability.

Now, we might wonder, why clear them (especially 10-bit tags and AtomicOps)
if Linux hasn't enabled them at all as the "10-Bit Tag Requester Enable"
bit is not defined in Linux currently. But, these features might be enabled
by Platform FW for "performance reasons" if the endpoint supports and now
it is the responsibility of the operating system to disable it on a hot
remove event.

According to implementation notes in 2.2.6.2: "For platforms where the RC
supports 10-Bit Tag Completer capability, it is highly recommended for
platform firmware or operating software that configures PCIe hierarchies
to Set the 10-Bit Tag Requester Enable bit automatically in Endpoints with
10-Bit Tag Requester capability. This enables the important class of
10-Bit Tag capable adapters that send Memory Read Requests only to host
memory." So if the endpoint and root port both support 10-bit tags BIOS
is enabling it at boot time..

I ran a quick check to see how DEV_CTL2 registers for root port look on a
10-bit tag supported NVMe device.

pci 0000:80:05.1: DEVCTL2 0x1726 (Bit 12: 10-bit tag is enabled..)
pci 0000:80:05.1: DEVCAP2 0x7f19ff

So, it seems like BIOS has enabled 10-bit tags at boot time even though
Linux hasn't enabled it.

Some couple of ways we think could be:
[1] Check if these bits are enabled by Platform at boot time, clear them
only it is set during hotplug flow.
[2] Clear them unconditionally as I did..
[3] Enable 10-bits tags in Linux when a device is probed just like how we
do for ARI..

Similarly call pci_enable_atomic_ops_to_root() during a hot add..

Let me know what you think..

> As for clearing ARI Forwarding Enable, it seems commit b0cc6020e1cc
> ("PCI: Enable ARI if dev and upstream bridge support it; disable
> otherwise") already solved this problem. Quoth its commit message:
>
> "Currently, we enable ARI in a device's upstream bridge if the bridge and
> the device support it. But we never disable ARI, even if the device is
> removed and replaced with a device that doesn't support ARI.
>
> This means that if we hot-remove an ARI device and replace it with a
> non-ARI multi-function device, we find only function 0 of the new device
> because the upstream bridge still has ARI enabled, and next_ari_fn()
> only returns function 0 for the new non-ARI device.
>
> This patch disables ARI in the upstream bridge if the device doesn't
> support ARI. See the PCIe spec, r3.0, sec 6.13."
>
> My superficial understanding of that patch is that we do find function 0,
> while enumerating it we clear the ARI Forwarding Enable bit and thus the
> remaining functions become accessible and are subsequently enumerated.
>
> Are you seeing issues when removing an ARI-capable endpoint from a
> hotplug slot and replacing it with a non-ARI-capable device?
> If you do, the question is why the above-quoted commit doesn't avoid them.

Yeah! Sorry I missed this. ARI is already checked and enabled during
device initialization.

>> --- a/drivers/pci/hotplug/pciehp_pci.c
>> +++ b/drivers/pci/hotplug/pciehp_pci.c
>> @@ -104,6 +104,7 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
>>  	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
>>  					 bus_list) {
>>  		pci_dev_get(dev);
>> +		pcie_capability_clear_word(dev, PCI_EXP_DEVCTL2, 0xffff);
>>  		pci_stop_and_remove_bus_device(dev);
>>  		/*
>>  		 * Ensure that no new Requests will be generated from
>
> This clears the Device Control 2 register on the hotplugged device,
> but to clear ARI Forwarding Enable, you'd have to clear the register
> of the hotplug port, i.e. the *parent* of the hotplugged device.

I agree!

Thanks,
Smita

> Also, this patch doesn't apply cleanly to v6.4-rc1 because of a context
> change by f5eff5591b8f ("PCI: pciehp: Fix AB-BA deadlock between
> reset_lock and device_lock").
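As a side note, option [1] above (clear only what the platform enabled) amounts to testing bit 12 of the port's DEVCTL2 before touching it. The rough sketch below assumes the mask has to be defined locally, since pci_regs.h had no 10-Bit Tag Requester Enable define at the time of this thread; 0x1726 & 0x1000 is non-zero, matching the register dump quoted above.

#define DEVCTL2_10BIT_TAG_REQ_EN	0x1000	/* DEVCTL2 bit 12, per PCIe spec */

/* Sketch: has the platform (e.g. BIOS) set 10-Bit Tag Requester Enable on this port? */
static bool port_has_10bit_tag_req_en(struct pci_dev *port)
{
        u16 devctl2 = 0;

        pcie_capability_read_word(port, PCI_EXP_DEVCTL2, &devctl2);
        return devctl2 & DEVCTL2_10BIT_TAG_REQ_EN;
}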
Hi Smita,

My apologies for the delay!

On Mon, May 22, 2023 at 03:23:31PM -0700, Smita Koralahalli wrote:
> On 5/11/2023 4:19 AM, Lukas Wunner wrote:
> > On Tue, Apr 18, 2023 at 09:05:26PM +0000, Smita Koralahalli wrote:
> > > Clear all capabilities in Device Control 2 register as they are optional
> > > and it is not determined whether the next device inserted will support
> > > these capabilities. Moreover, Section 6.13 of the PCIe Base
> > > Specification [1], recommends clearing the ARI Forwarding Enable bit on
> > > a hot-plug event as its not guaranteed that the newly added component
> > > is in fact an ARI device.
> >
> > Clearing ARI Forwarding Enable sounds reasonable, but I don't see
> > why all the other bits in the Device Control 2 register need to be
> > cleared. If there isn't a reason to clear them, I'd be in favor of
> > leaving them alone.
>
> I understand. The SPEC doesn't "clearly" state to clear the other bits
> except ARI on a hot-plug event.
>
> But, we came across issues when a device with 10-bit tags was removed and
> replaced with a device that didn't support 10-bit tags. The device became
> inaccessible and the port was not able to be recovered without a system
> reset. So, we thought it would be better to cherry pick all bits that were
> negotiated between endpoint and root port and decided that we should clear
> them all on removal. Hence, my first revision of this patch series had aimed
> to clear only ARI, AtomicOp Requestor and 10 bit tags as these were the
> negotiated settings between endpoint and root port. Ideally, these settings
> should be re-negotiated and set up for optimal operation on a hot add.

Makes total sense. I like the approach of clearing only these three
bits, as you did in v1 of the patch. I also appreciate the detailed
explanation that you've provided. Much of your e-mail can be copy-pasted
to the commit message, in my opinion it's valuable information to any
reviewer and future reader of the commit.

> We had some internal discussions to understand if SPEC has it documented
> somewhere. And we could see in Section 2.2.6.2, it implies that:
> [i] If a Requester sends a 10-Bit Tag Request to a Completer that lacks
> 10-Bit Completer capability, the returned Completion(s) will have Tags with
> Tag[9:8] equal to 00b. Since the Requester is forbidden to generate these
> Tag values for 10-Bit Tags, such Completions will be handled as Unexpected
> Completions, which by default are Advisory Non-Fatal Errors. The Requester
> must follow standard PCI Express error handling requirements.
> [ii] In configurations where a Requester with 10-Bit Tag Requester
> capability needs to target multiple Completers, one needs to ensure that the
> Requester sends 10-Bit Tag Requests only to Completers that have 10-Bit Tag
> Completer capability.
>
> Now, we might wonder, why clear them (especially 10-bit tags and AtomicOps)
> if Linux hasn't enabled them at all as the "10-Bit Tag Requester Enable" bit
> is not defined in Linux currently. But, these features might be enabled by
> Platform FW for "performance reasons" if the endpoint supports and now it is
> the responsibility of the operating system to disable it on a hot remove
> event.

Again, makes total sense.

> According to implementation notes in 2.2.6.2: "For platforms where the RC
> supports 10-Bit Tag Completer capability, it is highly recommended for
> platform firmware or operating software that configures PCIe hierarchies to
> Set the 10-Bit Tag Requester Enable bit automatically in Endpoints with
> 10-Bit Tag Requester capability. This enables the important class of 10-Bit
> Tag capable adapters that send Memory Read Requests only to host memory." So
> if the endpoint and root port both support 10-bit tags BIOS is enabling it
> at boot time..
>
> I ran a quick check to see how DEV_CTL2 registers for root port look on a
> 10-bit tag supported NVMe device.
>
> pci 0000:80:05.1: DEVCTL2 0x1726 (Bit 12: 10-bit tag is enabled..)
> pci 0000:80:05.1: DEVCAP2 0x7f19ff
>
> So, it seems like BIOS has enabled 10-bit tags at boot time even though
> Linux hasn't enabled it.
>
> Some couple of ways we think could be:
> [1] Check if these bits are enabled by Platform at boot time, clear them
> only it is set during hotplug flow.
> [2] Clear them unconditionally as I did..
> [3] Enable 10-bits tags in Linux when a device is probed just like how we do
> for ARI..
>
> Similarly call pci_enable_atomic_ops_to_root() during a hot add..

Personally I'm fine with option [2]. If you or Bjorn prefer option [3],
I'm fine with that as well.

> > As for clearing ARI Forwarding Enable, it seems commit b0cc6020e1cc
> > ("PCI: Enable ARI if dev and upstream bridge support it; disable
> > otherwise") already solved this problem. Quoth its commit message:
[...]
> > My superficial understanding of that patch is that we do find function 0,
> > while enumerating it we clear the ARI Forwarding Enable bit and thus the
> > remaining functions become accessible and are subsequently enumerated.
> >
> > Are you seeing issues when removing an ARI-capable endpoint from a
> > hotplug slot and replacing it with a non-ARI-capable device?
> > If you do, the question is why the above-quoted commit doesn't avoid them.
>
> Yeah! Sorry I missed this. ARI is already checked and enabled during device
> initialization.

It doesn't hurt to additionally clear on hot-removal.

Thanks,

Lukas
Hi Lukas,

Thanks for reviewing my patch

On 6/16/2023 11:24 AM, Lukas Wunner wrote:
> Hi Smita,
>
> My apologies for the delay!
>
> On Mon, May 22, 2023 at 03:23:31PM -0700, Smita Koralahalli wrote:
>> On 5/11/2023 4:19 AM, Lukas Wunner wrote:
>>> On Tue, Apr 18, 2023 at 09:05:26PM +0000, Smita Koralahalli wrote:
>>>> Clear all capabilities in Device Control 2 register as they are optional
>>>> and it is not determined whether the next device inserted will support
>>>> these capabilities. Moreover, Section 6.13 of the PCIe Base
>>>> Specification [1], recommends clearing the ARI Forwarding Enable bit on
>>>> a hot-plug event as its not guaranteed that the newly added component
>>>> is in fact an ARI device.
>>>
>>> Clearing ARI Forwarding Enable sounds reasonable, but I don't see
>>> why all the other bits in the Device Control 2 register need to be
>>> cleared. If there isn't a reason to clear them, I'd be in favor of
>>> leaving them alone.
>>
>> I understand. The SPEC doesn't "clearly" state to clear the other bits
>> except ARI on a hot-plug event.
>>
>> But, we came across issues when a device with 10-bit tags was removed and
>> replaced with a device that didn't support 10-bit tags. The device became
>> inaccessible and the port was not able to be recovered without a system
>> reset. So, we thought it would be better to cherry pick all bits that were
>> negotiated between endpoint and root port and decided that we should clear
>> them all on removal. Hence, my first revision of this patch series had aimed
>> to clear only ARI, AtomicOp Requestor and 10 bit tags as these were the
>> negotiated settings between endpoint and root port. Ideally, these settings
>> should be re-negotiated and set up for optimal operation on a hot add.
>
> Makes total sense. I like the approach of clearing only these three
> bits, as you did in v1 of the patch. I also appreciate the detailed
> explanation that you've provided. Much of your e-mail can be copy-pasted
> to the commit message, in my opinion it's valuable information to any
> reviewer and future reader of the commit.
>
>> We had some internal discussions to understand if SPEC has it documented
>> somewhere. And we could see in Section 2.2.6.2, it implies that:
>> [i] If a Requester sends a 10-Bit Tag Request to a Completer that lacks
>> 10-Bit Completer capability, the returned Completion(s) will have Tags with
>> Tag[9:8] equal to 00b. Since the Requester is forbidden to generate these
>> Tag values for 10-Bit Tags, such Completions will be handled as Unexpected
>> Completions, which by default are Advisory Non-Fatal Errors. The Requester
>> must follow standard PCI Express error handling requirements.
>> [ii] In configurations where a Requester with 10-Bit Tag Requester
>> capability needs to target multiple Completers, one needs to ensure that the
>> Requester sends 10-Bit Tag Requests only to Completers that have 10-Bit Tag
>> Completer capability.
>>
>> Now, we might wonder, why clear them (especially 10-bit tags and AtomicOps)
>> if Linux hasn't enabled them at all as the "10-Bit Tag Requester Enable" bit
>> is not defined in Linux currently. But, these features might be enabled by
>> Platform FW for "performance reasons" if the endpoint supports and now it is
>> the responsibility of the operating system to disable it on a hot remove
>> event.
>
> Again, makes total sense.
>
>> According to implementation notes in 2.2.6.2: "For platforms where the RC
>> supports 10-Bit Tag Completer capability, it is highly recommended for
>> platform firmware or operating software that configures PCIe hierarchies to
>> Set the 10-Bit Tag Requester Enable bit automatically in Endpoints with
>> 10-Bit Tag Requester capability. This enables the important class of 10-Bit
>> Tag capable adapters that send Memory Read Requests only to host memory." So
>> if the endpoint and root port both support 10-bit tags BIOS is enabling it
>> at boot time..
>>
>> I ran a quick check to see how DEV_CTL2 registers for root port look on a
>> 10-bit tag supported NVMe device.
>>
>> pci 0000:80:05.1: DEVCTL2 0x1726 (Bit 12: 10-bit tag is enabled..)
>> pci 0000:80:05.1: DEVCAP2 0x7f19ff
>>
>> So, it seems like BIOS has enabled 10-bit tags at boot time even though
>> Linux hasn't enabled it.
>>
>> Some couple of ways we think could be:
>> [1] Check if these bits are enabled by Platform at boot time, clear them
>> only it is set during hotplug flow.
>> [2] Clear them unconditionally as I did..
>> [3] Enable 10-bits tags in Linux when a device is probed just like how we do
>> for ARI..
>>
>> Similarly call pci_enable_atomic_ops_to_root() during a hot add..
>
> Personally I'm fine with option [2]. If you or Bjorn prefer option [3],
> I'm fine with that as well.

Looking forward for Bjorn comments!

Thanks,
Smita

>>> As for clearing ARI Forwarding Enable, it seems commit b0cc6020e1cc
>>> ("PCI: Enable ARI if dev and upstream bridge support it; disable
>>> otherwise") already solved this problem. Quoth its commit message:
> [...]
>>> My superficial understanding of that patch is that we do find function 0,
>>> while enumerating it we clear the ARI Forwarding Enable bit and thus the
>>> remaining functions become accessible and are subsequently enumerated.
>>>
>>> Are you seeing issues when removing an ARI-capable endpoint from a
>>> hotplug slot and replacing it with a non-ARI-capable device?
>>> If you do, the question is why the above-quoted commit doesn't avoid them.
>>
>> Yeah! Sorry I missed this. ARI is already checked and enabled during device
>> initialization.
>
> It doesn't hurt to additionally clear on hot-removal.
>
> Thanks,
>
> Lukas
On Fri, Jun 16, 2023 at 04:34:27PM -0700, Smita Koralahalli wrote:
> On 6/16/2023 11:24 AM, Lukas Wunner wrote:
> > On Mon, May 22, 2023 at 03:23:31PM -0700, Smita Koralahalli wrote:
> > > On 5/11/2023 4:19 AM, Lukas Wunner wrote:
> > > > On Tue, Apr 18, 2023 at 09:05:26PM +0000, Smita Koralahalli wrote:
> > > Some couple of ways we think could be:
> > > [1] Check if these bits are enabled by Platform at boot time, clear them
> > > only it is set during hotplug flow.
> > > [2] Clear them unconditionally as I did..
> > > [3] Enable 10-bits tags in Linux when a device is probed just like how
> > > we do for ARI..
> > >
> > > Similarly call pci_enable_atomic_ops_to_root() during a hot add..
> >
> > Personally I'm fine with option [2]. If you or Bjorn prefer option [3],
> > I'm fine with that as well.
>
> Looking forward for Bjorn comments!

You may want to consider first doing [2], i.e. clear the DevCtl2 bits
on hot removal, and then in a separate step do [3], i.e. add support
for enabling 10 bit tags and atomic ops in the kernel. Having that
would certainly be useful, but it's more complex than just clearing
the DevCtl2 bits on unplug. So you may want to do the latter as a
stop-gap.

Thanks,

Lukas
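A sketch of what option [3] might look like, loosely modeled on pci_configure_ari() in drivers/pci/probe.c. Everything here is hypothetical rather than code from the series: the function name and the 10-Bit Tag masks are defined locally (DEVCAP2 bits 16/17 and DEVCTL2 bit 12 per the PCIe spec), and a complete implementation would also have to honor the Section 2.2.6.2 note about every Completer the requester can target.

#define DEVCAP2_10BIT_TAG_COMP		0x00010000	/* 10-Bit Tag Completer Supported */
#define DEVCAP2_10BIT_TAG_REQ		0x00020000	/* 10-Bit Tag Requester Supported */
#define DEVCTL2_10BIT_TAG_REQ_EN	0x00001000	/* 10-Bit Tag Requester Enable */

/* Hypothetical: enable 10-Bit Tags on probe/hot-add if both ends support them. */
static void pci_configure_10bit_tags(struct pci_dev *dev)
{
        struct pci_dev *bridge = pci_upstream_bridge(dev);
        u32 dev_cap = 0, bridge_cap = 0;

        if (!pci_is_pcie(dev) || !bridge)
                return;

        pcie_capability_read_dword(dev, PCI_EXP_DEVCAP2, &dev_cap);
        pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &bridge_cap);

        if ((dev_cap & DEVCAP2_10BIT_TAG_REQ) &&
            (bridge_cap & DEVCAP2_10BIT_TAG_COMP))
                pcie_capability_set_word(dev, PCI_EXP_DEVCTL2,
                                         DEVCTL2_10BIT_TAG_REQ_EN);
        else
                pcie_capability_clear_word(dev, PCI_EXP_DEVCTL2,
                                           DEVCTL2_10BIT_TAG_REQ_EN);
}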
On 6/21/2023 12:12 AM, Lukas Wunner wrote:
> On Fri, Jun 16, 2023 at 04:34:27PM -0700, Smita Koralahalli wrote:
>> On 6/16/2023 11:24 AM, Lukas Wunner wrote:
>>> On Mon, May 22, 2023 at 03:23:31PM -0700, Smita Koralahalli wrote:
>>>> On 5/11/2023 4:19 AM, Lukas Wunner wrote:
>>>>> On Tue, Apr 18, 2023 at 09:05:26PM +0000, Smita Koralahalli wrote:
>>>> Some couple of ways we think could be:
>>>> [1] Check if these bits are enabled by Platform at boot time, clear them
>>>> only it is set during hotplug flow.
>>>> [2] Clear them unconditionally as I did..
>>>> [3] Enable 10-bits tags in Linux when a device is probed just like how
>>>> we do for ARI..
>>>>
>>>> Similarly call pci_enable_atomic_ops_to_root() during a hot add..
>>>
>>> Personally I'm fine with option [2]. If you or Bjorn prefer option [3],
>>> I'm fine with that as well.
>>
>> Looking forward for Bjorn comments!
>
> You may want to consider first doing [2], i.e. clear the DevCtl2 bits
> on hot removal, and then in a separate step do [3], i.e. add support
> for enabling 10 bit tags and atomic ops in the kernel. Having that
> would certainly be useful, but it's more complex than just clearing
> the DevCtl2 bits on unplug. So you may want to do the latter as a
> stop-gap.

Okay I just sent v3 now. I will take care of this in v4 along with other
comments in v3.

Thanks,
Smita

> Thanks,
>
> Lukas
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index d17f3bf36f70..aabf7884ff30 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -104,6 +104,7 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
 					 bus_list) {
 		pci_dev_get(dev);
+		pcie_capability_clear_word(dev, PCI_EXP_DEVCTL2, 0xffff);
 		pci_stop_and_remove_bus_device(dev);
 		/*
 		 * Ensure that no new Requests will be generated from
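For reference, a hedged sketch of the direction the review discussion points toward, not the v2 hunk above and not necessarily what a later revision does: clear only the bits negotiated for the removed device (ARI Forwarding Enable, AtomicOp Requester Enable, 10-Bit Tag Requester Enable) and clear them in the hotplug port's Device Control 2 register. The helper name and the 10-Bit Tag mask are assumptions; PCI_EXP_DEVCTL2_ARI and PCI_EXP_DEVCTL2_ATOMIC_REQ are existing defines, and ctrl->pcie->port is how pciehp reaches its port device.

#define DEVCTL2_10BIT_TAG_REQ_EN	0x1000	/* bit 12; no pci_regs.h define at the time */

/* Hypothetical helper, e.g. called from pciehp_unconfigure_device(). */
static void pciehp_clear_negotiated_caps(struct controller *ctrl)
{
        struct pci_dev *port = ctrl->pcie->port;        /* the hotplug port itself */

        pcie_capability_clear_word(port, PCI_EXP_DEVCTL2,
                                   PCI_EXP_DEVCTL2_ARI |
                                   PCI_EXP_DEVCTL2_ATOMIC_REQ |
                                   DEVCTL2_10BIT_TAG_REQ_EN);
}

On a subsequent hot add, these settings would then be re-negotiated (or re-enabled by firmware/kernel) for the newly inserted device, which is the behavior the thread above argues for.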