Message ID: 20221109184244.7032-1-ajit.khaparde@broadcom.com
Series: Add Auxiliary driver support
Message
Ajit Khaparde
Nov. 9, 2022, 6:42 p.m. UTC
Add auxiliary device driver for Broadcom devices. The bnxt_en driver will register and initialize an aux device if RDMA is enabled in the underlying device. The bnxt_re driver will then probe and initialize the RoCE interfaces with the infiniband stack.

We got rid of the bnxt_en_ops which the bnxt_re driver used to communicate with bnxt_en. Similarly, we have tried to clean up most of the bnxt_ulp_ops. In most cases we used the functions and entry points provided by the auxiliary bus driver framework, and what remains are the minimal functions needed to support the functionality. We will try to get rid of the remaining ones if we find another viable option in the future.

v1->v2:
- Incorporated review comments, including usage of ulp_id and complex function indirections.
- Used function calls provided by the auxiliary bus interface instead of proprietary calls.
- Refactored code to remove the RoCE driver's access to the bnxt structure.

v2->v3:
- Addressed review comments, including cleanup of some unnecessary wrappers.
- Fixed warnings seen during cross compilation.

v3->v4:
- Cleaned up bnxt_ulp.c and bnxt_ulp.h further.
- Removed some more dead code.
- Sending the patchset as a standalone series.

Please apply. Thanks.

Ajit Khaparde (5):
  bnxt_en: Add auxiliary driver support
  RDMA/bnxt_re: Use auxiliary driver interface
  bnxt_en: Remove usage of ulp_id
  bnxt_en: Use direct API instead of indirection
  bnxt_en: Use auxiliary bus calls over proprietary calls

Hongguang Gao (1):
  bnxt_en: Remove struct bnxt access from RoCE driver

 drivers/infiniband/hw/bnxt_re/bnxt_re.h       |   9 +-
 drivers/infiniband/hw/bnxt_re/main.c          | 578 +++++++-----------
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  10 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   8 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 388 ++++++------
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h |  52 +-
 6 files changed, 463 insertions(+), 582 deletions(-)
Comments
On Wed, Nov 09, 2022 at 10:42:38AM -0800, Ajit Khaparde wrote:
> Add auxiliary device driver for Broadcom devices.
> The bnxt_en driver will register and initialize an aux device
> if RDMA is enabled in the underlying device.
> The bnxt_re driver will then probe and initialize the
> RoCE interfaces with the infiniband stack.
>
> We got rid of the bnxt_en_ops which the bnxt_re driver used to
> communicate with bnxt_en.
> Similarly We have tried to clean up most of the bnxt_ulp_ops.
> In most of the cases we used the functions and entry points provided
> by the auxiliary bus driver framework.
> And now these are the minimal functions needed to support the functionality.
>
> We will try to work on getting rid of the remaining if we find any
> other viable option in future.

I still see extra checks for something that was already checked in upper
functions; for example, in bnxt_re_register_netdev() you check rdev, which
you already checked before.

However, the most important part is that bnxt_ulp_ops still exists, and it
contains a complete no-go: SR-IOV config in RDMA code.

302 static struct bnxt_ulp_ops bnxt_re_ulp_ops = {
303         .ulp_sriov_config = bnxt_re_sriov_config,
304         .ulp_irq_stop = bnxt_re_stop_irq,
305         .ulp_irq_restart = bnxt_re_start_irq
306 };

All PCI management logic and interfaces need to live inside the eth part of
your driver, and only that part should implement SR-IOV config. Once the
user enables SR-IOV, the PCI driver should create auxiliary devices for
each VF. These devices will have RDMA capabilities, which will trigger the
RDMA driver to bind to them.

Thanks
Leon,
We appreciate your valuable feedback. Please see inline.

On Thu, Nov 10, 2022 at 2:53 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Wed, Nov 09, 2022 at 10:42:38AM -0800, Ajit Khaparde wrote:
> > Add auxiliary device driver for Broadcom devices.
> > The bnxt_en driver will register and initialize an aux device
> > if RDMA is enabled in the underlying device.
> > The bnxt_re driver will then probe and initialize the
> > RoCE interfaces with the infiniband stack.
> >
> > We got rid of the bnxt_en_ops which the bnxt_re driver used to
> > communicate with bnxt_en.
> > Similarly We have tried to clean up most of the bnxt_ulp_ops.
> > In most of the cases we used the functions and entry points provided
> > by the auxiliary bus driver framework.
> > And now these are the minimal functions needed to support the functionality.
> >
> > We will try to work on getting rid of the remaining if we find any
> > other viable option in future.
>
> I still see extra checks for something that was already checked in upper
> functions, for example in bnxt_re_register_netdev() you check rdev, which
> you already checked before.

Sure. I will do another sweep and clean up.

> However, the most important part is still existence of bnxt_ulp_ops,
> which shows completely no-go thing - SR-IOV config in RDMA code.
>
> 302 static struct bnxt_ulp_ops bnxt_re_ulp_ops = {
> 303         .ulp_sriov_config = bnxt_re_sriov_config,
> 304         .ulp_irq_stop = bnxt_re_stop_irq,
> 305         .ulp_irq_restart = bnxt_re_start_irq
> 306 };
>
> All PCI management logic and interfaces are needed to be inside eth part
> of your driver and only that part should implement SR-IOV config. Once
> user enabled SR-IOV, the PCI driver should create auxiliary devices for
> each VF. These device will have RDMA capabilities and it will trigger RDMA
> driver to bind to them.

I agree, and once the PF creates the auxiliary devices for the VFs, the RoCE
VFs indeed get probed and created. But the twist in the bnxt_en/bnxt_re
design is that the RoCE driver is responsible for making adjustments to the
RoCE resources. So once the VFs are created and the bnxt_en driver enables
SR-IOV, adjusts the NIC resources for the VFs, and so on, it tries to call
into the bnxt_re driver for the same purpose.

1. We could do something like this to the auxiliary_driver structure:

diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h
index de21d9d24a95..4e581fbf458f 100644
--- a/include/linux/auxiliary_bus.h
+++ b/include/linux/auxiliary_bus.h
@@ -148,6 +148,7 @@ struct auxiliary_device {
  * @shutdown: Called at shut-down time to quiesce the device.
  * @suspend: Called to put the device to sleep mode. Usually to a power state.
  * @resume: Called to bring a device from sleep mode.
+ * @sriov_configure: Called to allow configuration of VFs.
  * @name: Driver name.
  * @driver: Core driver structure.
  * @id_table: Table of devices this driver should match on the bus.
@@ -183,6 +184,7 @@ struct auxiliary_driver {
 	void (*shutdown)(struct auxiliary_device *auxdev);
 	int (*suspend)(struct auxiliary_device *auxdev, pm_message_t state);
 	int (*resume)(struct auxiliary_device *auxdev);
+	int (*sriov_configure)(struct auxiliary_device *auxdev, int num_vfs); /* On PF */
 	const char *name;
 	struct device_driver driver;
 	const struct auxiliary_device_id *id_table;

Then the bnxt_en driver could call into bnxt_re via that interface.
Please let me know what you think.

2. While it may take care of the first function in the ulp_ops, that leaves
us with two. And that is where I will need some input on whether we
absolutely need to get rid of the ops.

2a. We may be able to use the auxiliary_device suspend & resume with a
private flag in the driver's aux_dev pointer.
2b. Or, just like (1) above, add another hook to auxiliary_driver:
	void (*restart)(struct auxiliary_device *auxdev);
and then use the auxiliary_driver shutdown & restart with a private flag.

Note that we may get creative right now and get rid of the ulp_ops. But the
bnxt_en driver having a need to update the bnxt_re driver is part of the
design. So it will help if we can consider beyond ulp_irq_stop and
ulp_irq_restart.

2c. Maybe keep the bnxt_ulp_ops for that reason?

Thank you for your time.

Thanks
Ajit

> Thanks
On Mon, Nov 14, 2022 at 04:47:31PM -0800, Ajit Khaparde wrote:
> Leon,
> We appreciate your valuable feedback.
> Please see inline.
>
> On Thu, Nov 10, 2022 at 2:53 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Nov 09, 2022 at 10:42:38AM -0800, Ajit Khaparde wrote:
> > > Add auxiliary device driver for Broadcom devices.
> > > The bnxt_en driver will register and initialize an aux device
> > > if RDMA is enabled in the underlying device.
> > > The bnxt_re driver will then probe and initialize the
> > > RoCE interfaces with the infiniband stack.
> > >
> > > We got rid of the bnxt_en_ops which the bnxt_re driver used to
> > > communicate with bnxt_en.
> > > Similarly We have tried to clean up most of the bnxt_ulp_ops.
> > > In most of the cases we used the functions and entry points provided
> > > by the auxiliary bus driver framework.
> > > And now these are the minimal functions needed to support the functionality.
> > >
> > > We will try to work on getting rid of the remaining if we find any
> > > other viable option in future.
> >
> > I still see extra checks for something that was already checked in upper
> > functions, for example in bnxt_re_register_netdev() you check rdev, which
> > you already checked before.
> Sure. I will do another sweep and clean up.
>
> > However, the most important part is still existence of bnxt_ulp_ops,
> > which shows completely no-go thing - SR-IOV config in RDMA code.
> >
> > 302 static struct bnxt_ulp_ops bnxt_re_ulp_ops = {
> > 303         .ulp_sriov_config = bnxt_re_sriov_config,
> > 304         .ulp_irq_stop = bnxt_re_stop_irq,
> > 305         .ulp_irq_restart = bnxt_re_start_irq
> > 306 };
> >
> > All PCI management logic and interfaces are needed to be inside eth part
> > of your driver and only that part should implement SR-IOV config. Once
> > user enabled SR-IOV, the PCI driver should create auxiliary devices for
> > each VF. These device will have RDMA capabilities and it will trigger RDMA
> > driver to bind to them.
> I agree and once the PF creates the auxiliary devices for the VF, the RoCE
> Vf indeed get probed and created. But the twist in bnxt_en/bnxt_re
> design is that
> the RoCE driver is responsible for making adjustments to the RoCE resources.

You can still do these adjustments by checking the type of function that
called into the RDMA .probe. The PCI core exposes some functions to help
distinguish between PFs and VFs.

> So once the VF's are created and the bnxt_en driver enables SRIOV adjusts the
> NIC resources for the VF, and such, it tries to call into the bnxt_re
> driver for the
> same purpose.

If I read the code correctly, all these resources are for one PCI function.

Something like this:

bnxt_re_probe()
{
	...
	if (is_virtfn(p))
		bnxt_re_sriov_config(p);
	...
}

> 1. We do something like this to the auxiliary_device structure:
>
> diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h
> index de21d9d24a95..4e581fbf458f 100644
> --- a/include/linux/auxiliary_bus.h
> +++ b/include/linux/auxiliary_bus.h
> @@ -148,6 +148,7 @@ struct auxiliary_device {
>  * @shutdown: Called at shut-down time to quiesce the device.
>  * @suspend: Called to put the device to sleep mode. Usually to a power state.
>  * @resume: Called to bring a device from sleep mode.
> + * @sriov_configure: Called to allow configuration of VFs.
>  * @name: Driver name.
>  * @driver: Core driver structure.
>  * @id_table: Table of devices this driver should match on the bus.
> @@ -183,6 +184,7 @@ struct auxiliary_driver {
>  	void (*shutdown)(struct auxiliary_device *auxdev);
>  	int (*suspend)(struct auxiliary_device *auxdev, pm_message_t state);
>  	int (*resume)(struct auxiliary_device *auxdev);
> +	int (*sriov_configure)(struct auxiliary_device *auxdev, int num_vfs); /* On PF */
>  	const char *name;
>  	struct device_driver driver;
>  	const struct auxiliary_device_id *id_table;
>
> Then the bnxt_en driver could call into bnxt_re via that interface.

The @sriov_configure callback is PCI specific and doesn't belong to aux
devices.

> Please let me know what you feel.
>
> 2. While it may take care of the first function in the ulp_ops, it
> leaves us with two.
> And that is where I will need some input if we need to absolutely get
> rid of the ops.
>
> 2a. We may be able to use the auxiliary_device suspend & resume with a
> private flag in the driver's aux_dev pointer.
> 2b. Or just like (1) above, add another hook to auxiliary_driver
> 	void (*restart)(struct auxiliary_device *auxdev);
> And then use the auxiliary_driver shutdown & restart with a private flag.
>
> Note that we may get creative right now and get rid of the ulp_ops.
> But the bnxt_en driver having a need to update the bnxt_re driver is a
> part of the design. So it will help if we can consider beyond the
> ulp_irq_stop, ulp_irq_restart.
> 2c. Maybe keep the bnxt_ulp_ops for that reason?

It is nicer to get rid of bnxt_ulp_ops completely, but it is not a must.
Getting rid of sriov_configure is the most important comment here.

Thanks

> Thank you for your time.
>
> Thanks
> Ajit
>
> >
> > Thanks
On Wed, Nov 16, 2022 at 5:22 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> ::snip::
> > > All PCI management logic and interfaces are needed to be inside eth part
> > > of your driver and only that part should implement SR-IOV config. Once
> > > user enabled SR-IOV, the PCI driver should create auxiliary devices for
> > > each VF. These device will have RDMA capabilities and it will trigger RDMA
> > > driver to bind to them.
> > I agree and once the PF creates the auxiliary devices for the VF, the RoCE
> > Vf indeed get probed and created. But the twist in bnxt_en/bnxt_re
> > design is that
> > the RoCE driver is responsible for making adjustments to the RoCE resources.
>
> You can still do these adjustments by checking type of function that
> called to RDMA .probe. PCI core exposes some functions to help distinguish between
> PF and VFs.
>
> > So once the VF's are created and the bnxt_en driver enables SRIOV adjusts the
> > NIC resources for the VF, and such, it tries to call into the bnxt_re
> > driver for the
> > same purpose.
>
> If I read code correctly, all these resources are for one PCI function.
>
> Something like this:
>
> bnxt_re_probe()
> {
> 	...
> 	if (is_virtfn(p))
> 		bnxt_re_sriov_config(p);
> 	...
> }

I understand what you are suggesting. But what I want is a way to do this
in the context of the PF, preferably before the VFs are probed. So we are
trying to call bnxt_re_sriov_config in the context of handling the PF's
sriov_configure implementation. Having the ulp_ops allows us to avoid
resource wastage and assumptions in the bnxt_re driver.

::snip::
On Tue, Nov 22, 2022 at 07:02:45AM -0800, Ajit Khaparde wrote:
> On Wed, Nov 16, 2022 at 5:22 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> ::snip::
> > > > All PCI management logic and interfaces are needed to be inside eth part
> > > > of your driver and only that part should implement SR-IOV config. Once
> > > > user enabled SR-IOV, the PCI driver should create auxiliary devices for
> > > > each VF. These device will have RDMA capabilities and it will trigger RDMA
> > > > driver to bind to them.
> > > I agree and once the PF creates the auxiliary devices for the VF, the RoCE
> > > Vf indeed get probed and created. But the twist in bnxt_en/bnxt_re
> > > design is that
> > > the RoCE driver is responsible for making adjustments to the RoCE resources.
> >
> > You can still do these adjustments by checking type of function that
> > called to RDMA .probe. PCI core exposes some functions to help distinguish between
> > PF and VFs.
> >
> > > So once the VF's are created and the bnxt_en driver enables SRIOV adjusts the
> > > NIC resources for the VF, and such, it tries to call into the bnxt_re
> > > driver for the
> > > same purpose.
> >
> > If I read code correctly, all these resources are for one PCI function.
> >
> > Something like this:
> >
> > bnxt_re_probe()
> > {
> > 	...
> > 	if (is_virtfn(p))
> > 		bnxt_re_sriov_config(p);
> > 	...
> > }
> I understand what you are suggesting.
> But what I want is a way to do this in the context of the PF
> preferably before the VFs are probed.

I don't understand the last sentence. You call this sriov_config in the
bnxt_re driver without any protection from the VFs being probed.

> So we are trying to call the
> bnxt_re_sriov_config in the context of handling the PF's
> sriov_configure implementation. Having the ulp_ops is allowing us to
> avoid resource wastage and assumptions in the bnxt_re driver.

To which resource wastage are you referring? It makes no difference
whether the same limits are set in the bnxt_en driver when the RDMA bnxt
device is created, or in bnxt_re, which will be called once the RDMA
device is created.

Thanks

> ::snip::
On Tue, Nov 22, 2022 at 10:59 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Nov 22, 2022 at 07:02:45AM -0800, Ajit Khaparde wrote:
> > On Wed, Nov 16, 2022 at 5:22 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > ::snip::
> > > > > All PCI management logic and interfaces are needed to be inside eth part
> > > > > of your driver and only that part should implement SR-IOV config. Once
> > > > > user enabled SR-IOV, the PCI driver should create auxiliary devices for
> > > > > each VF. These device will have RDMA capabilities and it will trigger RDMA
> > > > > driver to bind to them.
> > > > I agree and once the PF creates the auxiliary devices for the VF, the RoCE
> > > > Vf indeed get probed and created. But the twist in bnxt_en/bnxt_re
> > > > design is that
> > > > the RoCE driver is responsible for making adjustments to the RoCE resources.
> > >
> > > You can still do these adjustments by checking type of function that
> > > called to RDMA .probe. PCI core exposes some functions to help distinguish between
> > > PF and VFs.
> > >
> > > > So once the VF's are created and the bnxt_en driver enables SRIOV adjusts the
> > > > NIC resources for the VF, and such, it tries to call into the bnxt_re
> > > > driver for the
> > > > same purpose.
> > >
> > > If I read code correctly, all these resources are for one PCI function.
> > >
> > > Something like this:
> > >
> > > bnxt_re_probe()
> > > {
> > > 	...
> > > 	if (is_virtfn(p))
> > > 		bnxt_re_sriov_config(p);
> > > 	...
> > > }
> > I understand what you are suggesting.
> > But what I want is a way to do this in the context of the PF
> > preferably before the VFs are probed.
>
> I don't understand the last sentence. You call to this sriov_config in
> bnxt_re driver without any protection from VFs being probed,

Let me elaborate. When a user sets num_vfs to a non-zero number, the PCI
driver hook sriov_configure calls bnxt_sriov_configure(). Once
pci_enable_sriov() succeeds, bnxt_ulp_sriov_cfg() is issued under
bnxt_sriov_configure(). All this happens under bnxt_en.
bnxt_ulp_sriov_cfg() ultimately calls into the bnxt_re driver. Since
bnxt_sriov_configure() is called only for PFs, bnxt_ulp_sriov_cfg() is
called for PFs only.

Once bnxt_ulp_sriov_cfg() calls into bnxt_re via the ulp_ops, it adjusts
the QPs, SRQs, CQs, MRs, GIDs and so on.

> > So we are trying to call the
> > bnxt_re_sriov_config in the context of handling the PF's
> > sriov_configure implementation. Having the ulp_ops is allowing us to
> > avoid resource wastage and assumptions in the bnxt_re driver.
>
> To which resource wastage are you referring?

Essentially, the PF driver reserves a set of the above resources for the
PF and divides the remaining resources among the VFs. If the calculation
is based on sriov_totalvfs instead of sriov_numvfs, there can be a
difference in the resources provisioned per VF, because a user may create
a subset of VFs instead of the total VFs allowed in the PCI SR-IOV
capability register. I was referring to the resource wastage in that
deployment scenario.

Thanks
Ajit

>
> There are no differences if same limits will be in bnxt_en driver when
> RDMA bnxt device is created or in bnxt_re which will be called once RDMA
> device is created.
>
> Thanks
>
> > ::snip::
On Mon, Nov 28, 2022 at 06:01:13PM -0800, Ajit Khaparde wrote:
> On Tue, Nov 22, 2022 at 10:59 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Nov 22, 2022 at 07:02:45AM -0800, Ajit Khaparde wrote:
> > > On Wed, Nov 16, 2022 at 5:22 AM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > ::snip::
> > > > > > All PCI management logic and interfaces are needed to be inside eth part
> > > > > > of your driver and only that part should implement SR-IOV config. Once
> > > > > > user enabled SR-IOV, the PCI driver should create auxiliary devices for
> > > > > > each VF. These device will have RDMA capabilities and it will trigger RDMA
> > > > > > driver to bind to them.
> > > > > I agree and once the PF creates the auxiliary devices for the VF, the RoCE
> > > > > Vf indeed get probed and created. But the twist in bnxt_en/bnxt_re
> > > > > design is that
> > > > > the RoCE driver is responsible for making adjustments to the RoCE resources.
> > > >
> > > > You can still do these adjustments by checking type of function that
> > > > called to RDMA .probe. PCI core exposes some functions to help distinguish between
> > > > PF and VFs.
> > > >
> > > > > So once the VF's are created and the bnxt_en driver enables SRIOV adjusts the
> > > > > NIC resources for the VF, and such, it tries to call into the bnxt_re
> > > > > driver for the
> > > > > same purpose.
> > > >
> > > > If I read code correctly, all these resources are for one PCI function.
> > > >
> > > > Something like this:
> > > >
> > > > bnxt_re_probe()
> > > > {
> > > > 	...
> > > > 	if (is_virtfn(p))
> > > > 		bnxt_re_sriov_config(p);
> > > > 	...
> > > > }
> > > I understand what you are suggesting.
> > > But what I want is a way to do this in the context of the PF
> > > preferably before the VFs are probed.
> >
> > I don't understand the last sentence. You call to this sriov_config in
> > bnxt_re driver without any protection from VFs being probed,
>
> Let me elaborate -
> When a user sets num_vfs to a non-zero number, the PCI driver hook
> sriov_configure calls bnxt_sriov_configure(). Once pci_enable_sriov()
> succeeds, bnxt_ulp_sriov_cfg() is issued under bnxt_sriov_configure().
> All this happens under bnxt_en.
> bnxt_ulp_sriov_cfg() ultimately calls into the bnxt_re driver.
> Since bnxt_sriov_configure() is called only for PFs, bnxt_ulp_sriov_cfg()
> is called for PFs only.
>
> Once bnxt_ulp_sriov_cfg() calls into the bnxt_re via the ulp_ops,
> it adjusts the QPs, SRQs, CQs, MRs, GIDs and such.

Once you have called pci_enable_sriov(), the PCI core creates sysfs
entries, which triggers udev rules and VF probing. Because you are calling
it in bnxt_sriov_configure(), you get inherent protection for the PF via
the PCI lock, but not for the VFs.

> > > So we are trying to call the
> > > bnxt_re_sriov_config in the context of handling the PF's
> > > sriov_configure implementation. Having the ulp_ops is allowing us to
> > > avoid resource wastage and assumptions in the bnxt_re driver.
> >
> > To which resource wastage are you referring?
> Essentially the PF driver reserves a set of above resources for the PF,
> and divides the remaining resources among the VFs.
> If the calculation is based on sriov_totalvfs instead of sriov_numvfs,
> there can be a difference in the resources provisioned for a VF.
> And that is because a user may create a subset of VFs instead of the
> total VFs allowed in the PCI SR-IOV capability register.
> I was referring to the resource wastage in that deployment scenario.

It is ok; set all the needed limits in bnxt_en. You don't need to call
into bnxt_re for that.

> Thanks
> Ajit
>
> > There are no differences if same limits will be in bnxt_en driver when
> > RDMA bnxt device is created or in bnxt_re which will be called once RDMA
> > device is created.
> >
> > Thanks
> >
> > > ::snip::
On Tue, Nov 29, 2022 at 1:13 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Nov 28, 2022 at 06:01:13PM -0800, Ajit Khaparde wrote:
> > On Tue, Nov 22, 2022 at 10:59 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> ::: snip :::
>
> > > > So we are trying to call the
> > > > bnxt_re_sriov_config in the context of handling the PF's
> > > > sriov_configure implementation. Having the ulp_ops is allowing us to
> > > > avoid resource wastage and assumptions in the bnxt_re driver.
> > >
> > > To which resource wastage are you referring?
> > Essentially the PF driver reserves a set of above resources for the PF,
> > and divides the remaining resources among the VFs.
> > If the calculation is based on sriov_totalvfs instead of sriov_numvfs,
> > there can be a difference in the resources provisioned for a VF.
> > And that is because a user may create a subset of VFs instead of the
> > total VFs allowed in the PCI SR-IOV capability register.
> > I was referring to the resource wastage in that deployment scenario.
>
> It is ok, set all needed limits in bnxt_en. You don't need to call to
> bnxt_re for that.

Thanks. We have removed the sriov_config callback. I will send the fresh
patchset in a few minutes.

> >
> > Thanks
> > Ajit
> >
> > > There are no differences if same limits will be in bnxt_en driver when
> > > RDMA bnxt device is created or in bnxt_re which will be called once RDMA
> > > device is created.
> > >
> > > Thanks
> > >
> > > > ::snip::