Message ID | 1687823827-15850-1-git-send-email-longli@linuxonhyperv.com |
---|---|
State | New |
Headers | From: longli@linuxonhyperv.com; To: Jason Gunthorpe <jgg@ziepe.ca>, Leon Romanovsky <leon@kernel.org>, Ajay Sharma <sharmaajay@microsoft.com>, Dexuan Cui <decui@microsoft.com>, "K. Y. Srinivasan" <kys@microsoft.com>, Haiyang Zhang <haiyangz@microsoft.com>, Wei Liu <wei.liu@kernel.org>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>; Cc: linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Long Li <longli@microsoft.com>, stable@vger.kernel.org; Subject: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets; Date: Mon, 26 Jun 2023 16:57:07 -0700; Message-Id: <1687823827-15850-1-git-send-email-longli@linuxonhyperv.com> |
Series | [v3] net: mana: Batch ringing RX queue doorbell on receiving packets |
Commit Message
longli@linuxonhyperv.com
June 26, 2023, 11:57 p.m. UTC
From: Long Li <longli@microsoft.com>

It's inefficient to ring the doorbell page every time a WQE is posted to
the receive queue. Excessive MMIO writes result in the CPU spending more
time waiting on LOCK instructions (atomic operations), resulting in poor
scaling performance.

Move the code for ringing the doorbell page to after all WQEs have been
posted to the receive queue during a callback from napi_poll().

With this change, tests showed an improvement from 120Gb/s to 160Gb/s on
a 200Gb physical link, with 16 or 32 hardware queues.

Tests showed no regression in network latency benchmarks on single
connections.

While we are making changes in this code path, change the doorbell code
to set WQE_COUNT to 0 for the Receive Queue, as the hardware
specification requires. Although the hardware doesn't currently enforce
this check, future releases may.

Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
---
Change log:
v2: Check for comp_read > 0 as it might be negative on completion error.
    Set rq.wqe_cnt to 0 according to the BNIC spec.
v3: Add details to the commit message on the reason for the performance
    increase and the test numbers.
    Add details to the commit message on why rq.wqe_cnt should be set to
    0 according to the hardware spec.
    Add "Reviewed-by" from Haiyang and Dexuan.

 drivers/net/ethernet/microsoft/mana/gdma_main.c |  5 ++++-
 drivers/net/ethernet/microsoft/mana/mana_en.c   | 10 ++++++++--
 2 files changed, 12 insertions(+), 3 deletions(-)
Comments
On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
> From: Long Li <longli@microsoft.com>
>
> It's inefficient to ring the doorbell page every time a WQE is posted to
> the received queue. Excessive MMIO writes result in CPU spending more
> time waiting on LOCK instructions (atomic operations), resulting in
> poor scaling performance.
>
> Move the code for ringing doorbell page to where after we have posted all
> WQEs to the receive queue during a callback from napi_poll().
>
> With this change, tests showed an improvement from 120G/s to 160G/s on a
> 200G physical link, with 16 or 32 hardware queues.
>
> Tests showed no regression in network latency benchmarks on single
> connection.
>
> While we are making changes in this code path, change the code for
> ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
> hardware specification specifies that it should set to 0. Although
> currently the hardware doesn't enforce the check, in the future releases
> it may do.
>
> Cc: stable@vger.kernel.org
> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")

Uhmmm... this looks like a performance improvement to me, more suitable
for the net-next tree ?!? (Note that net-next is closed now).

In any case you must avoid empty lines in the tag area.

If you really intend targeting the -net tree, please repost fixing the
above and explicitly specifying the target tree in the subj prefix.

thanks!

Paolo
On Thu, 29 Jun 2023 10:42:34 +0200 Paolo Abeni wrote:
> > While we are making changes in this code path, change the code for
> > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
> > hardware specification specifies that it should set to 0. Although
> > currently the hardware doesn't enforce the check, in the future releases
> > it may do.

And please split this cleanup into a separate patch, it doesn't sound
like it has to be done as part of the optimization.
> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Thursday, June 29, 2023 4:43 AM
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
>
> On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
> > From: Long Li <longli@microsoft.com>
> [...]
>
> Uhmmm... this looks like a performance improvement to me, more suitable
> for the net-next tree ?!? (Note that net-next is closed now).

This web page shows the net-next is "open":
http://vger.kernel.org/~davem/net-next.html

Is this still the right place to check net-next status?

- Haiyang
On Thu, 29 Jun 2023 16:53:42 +0000 Haiyang Zhang wrote:
> This web page shows the net-next is "open":
> http://vger.kernel.org/~davem/net-next.html
>
> Is this still the right place to check net-next status?

We're working on fixing it. Unfortunately it's a private page and most
of the netdev maintainers don't have access to changing it :(
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
>
> On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
> > From: Long Li <longli@microsoft.com>
> [...]
> > Cc: stable@vger.kernel.org
> > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure
> > Network Adapter (MANA)")
>
> Uhmmm... this looks like a performance improvement to me, more suitable for
> the net-next tree ?!? (Note that net-next is closed now).

This issue is a blocker for usage on 200G physical link. I think it can
be categorized as a fix.

> In any case you must avoid empty lines in the tag area.
>
> If you really intend targeting the -net tree, please repost fixing the above and
> explicitly specifying the target tree in the subj prefix.

Will do, thank you.

Long

> thanks!
>
> Paolo
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
>
> On Thu, 29 Jun 2023 10:42:34 +0200 Paolo Abeni wrote:
> > > While we are making changes in this code path, change the code for
> > > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
> > > hardware specification specifies that it should set to 0. Although
> > > currently the hardware doesn't enforce the check, in the future
> > > releases it may do.
>
> And please split this cleanup into a separate patch, it doesn't sound like it has to
> be done as part of the optimization.

Will do, thanks.

Long
On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote:
> > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
> >
> > On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
> > > From: Long Li <longli@microsoft.com>
> > [...]
> >
> > Uhmmm... this looks like a performance improvement to me, more
> > suitable for the net-next tree ?!? (Note that net-next is closed now).
>
> This issue is a blocker for usage on 200G physical link. I think it
> can be categorized as a fix.

Let me ask the question the other way around: is there any specific
reason to have this fix into 6.5 and all the way back to 5.13?
Especially the latest bit (CC-ing stable) looks at least debatable.

Thanks,

Paolo
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
>
> On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote:
> > [...]
> > This issue is a blocker for usage on 200G physical link. I think it
> > can be categorized as a fix.
>
> Let me ask the question the other way around: is there any specific reason to
> have this fix into 6.5 and all the way back to 5.13?
> Especially the latest bit (CC-ing stable) looks at least debatable.

There are many deployed Linux distributions with the MANA driver on
kernel 5.15 and kernel 6.1 (those kernels are longterm). They need this
fix to achieve the performance target.

Thanks,

Long
On Fri, Jun 30, 2023 at 05:31:48PM +0000, Long Li wrote:
> > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on
> > receiving packets
> >
> > On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote:
> > [...]
> > > This issue is a blocker for usage on 200G physical link. I think it
> > > can be categorized as a fix.
> >
> > Let me ask the question the other way around: is there any specific reason to
> > have this fix into 6.5 and all the way back to 5.13?
> > Especially the latest bit (CC-ing stable) looks at least debatable.
>
> There are many deployed Linux distributions with MANA driver on kernel 5.15
> and kernel 6.1. (those kernels are longterm) They need this fix to achieve the
> performance target.

Why can't they be upgraded to get that performance target, and all the
other goodness that those kernels have? We don't normally backport new
features, right?

thanks,

greg k-h
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
>
> On Fri, Jun 30, 2023 at 05:31:48PM +0000, Long Li wrote:
> > [...]
> > There are many deployed Linux distributions with MANA driver on kernel
> > 5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve
> > the performance target.
>
> Why can't they be upgraded to get that performance target, and all the other
> goodness that those kernels have? We don't normally backport new features,
> right?

I think this should be considered as a fix, not a new feature.

MANA was designed to be 200Gb full duplex from the start. Due to lack of
hardware testing capability at the early stage of the project, we could
only test 100Gb for the Linux driver. When the hardware became fully
capable of reaching the designed spec, this bug in the Linux driver
showed up.

Thanks,

Long

> thanks,
>
> greg k-h
On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote:
> > > 5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve
> > > the performance target.
> >
> > Why can't they be upgraded to get that performance target, and all the other
> > goodness that those kernels have? We don't normally backport new features,
> > right?
>
> I think this should be considered as a fix, not a new feature.
>
> MANA is designed to be 200GB full duplex at the start. Due to lack of
> hardware testing capability at early stage of the project, we could only test 100GB
> for the Linux driver. When hardware is fully capable of reaching designed spec,
> this bug in the Linux driver shows up.

That part we understand.

If I were you I'd try to convince Greg and Paolo that the change is
small and significant for user experience. And answer Greg's question
why upgrading the kernel past 6.1 is a challenge in your environment.
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
>
> On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote:
> > [...]
> > I think this should be considered as a fix, not a new feature.
> > [...]
>
> That part we understand.
>
> If I were you I'd try to convince Greg and Paolo that the change is small and
> significant for user experience. And answer Greg's question why upgrading the
> kernel past 6.1 is a challenge in your environment.

I was under the impression that this patch was considered to be a
feature, not a bug fix. I was trying to justify that the "Fixes:" tag
was needed.

I apologize for misunderstanding this.

Without this fix, it's not possible to run a typical workload designed
for 200Gb physical link speed.

We see a large number of customers and Linux distributions committed to
the 5.15 and 6.1 kernels. They planned their product cycles and
certification processes around these longterm kernel versions. It's
difficult for them to upgrade to newer kernel versions.

Thanks,

Long
On Sun, 2023-07-02 at 20:18 +0000, Long Li wrote:
> [...]
> I was under the impression that this patch was considered to be a
> feature, not a bug fix. I was trying to justify that the "Fixes:" tag
> was needed.
>
> I apologize for misunderstanding this.
>
> Without this fix, it's not possible to run a typical workload designed
> for 200Gb physical link speed.
>
> We see a large number of customers and Linux distributions committed
> on 5.15 and 6.1 kernels. They planned the product cycles and
> certification processes around these longterm kernel versions. It's
> difficult for them to upgrade to newer kernel versions.

I think there are some misunderstandings WRT distros and stable kernels.
(Commercial) distros will backport the patch as needed, regardless of
whether such a patch lands in the 5.15 upstream tree or not. Individual
users running their own vanilla 5.15 kernel can't expect the performance
improvement landing there.

All in all I feel undecided. I would endorse this change going through
net-next (without the stable tag). I would feel less torn with this
change targeting -net without the stable tag. Targeting -net with the
stable tag sounds a bit too much to me.

Cheers,

Paolo
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
>
> On Sun, 2023-07-02 at 20:18 +0000, Long Li wrote:
> [...]
>
> I think there are some misunderstanding WRT distros and stable kernels.
> (Commercial) distros will backport the patch as needed, regardless such patch
> landing in the 5.15 upstream tree or not. Individual users running their own
> vanilla 5.15 kernel can't expect performance improvement landing there.
>
> All in all I feel undecided. I would endorse this change going trough net-next
> (without the stable tag). I would feel less torn with this change targeting -net
> without the stable tag. Targeting -net with the stable tag sounds a bit too much
> to me.
>
> Cheers,
> Paolo

I'm sending this patch to net-next without the stable tag.

Thanks,

Long
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 8f3f78b68592..3765d3389a9a 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -300,8 +300,11 @@ static void mana_gd_ring_doorbell(struct gdma_context *gc, u32 db_index,
 
 void mana_gd_wq_ring_doorbell(struct gdma_context *gc, struct gdma_queue *queue)
 {
+	/* Hardware Spec specifies that software client should set 0 for
+	 * wqe_cnt for Receive Queues. This value is not used in Send Queues.
+	 */
 	mana_gd_ring_doorbell(gc, queue->gdma_dev->doorbell, queue->type,
-			      queue->id, queue->head * GDMA_WQE_BU_SIZE, 1);
+			      queue->id, queue->head * GDMA_WQE_BU_SIZE, 0);
 }
 
 void mana_gd_ring_cq(struct gdma_queue *cq, u8 arm_bit)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index cd4d5ceb9f2d..1d8abe63fcb8 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1383,8 +1383,8 @@ static void mana_post_pkt_rxq(struct mana_rxq *rxq)
 
 	recv_buf_oob = &rxq->rx_oobs[curr_index];
 
-	err = mana_gd_post_and_ring(rxq->gdma_rq, &recv_buf_oob->wqe_req,
-				    &recv_buf_oob->wqe_inf);
+	err = mana_gd_post_work_request(rxq->gdma_rq, &recv_buf_oob->wqe_req,
+					&recv_buf_oob->wqe_inf);
 	if (WARN_ON_ONCE(err))
 		return;
 
@@ -1654,6 +1654,12 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
 		mana_process_rx_cqe(rxq, cq, &comp[i]);
 	}
 
+	if (comp_read > 0) {
+		struct gdma_context *gc = rxq->gdma_rq->gdma_dev->gdma_context;
+
+		mana_gd_wq_ring_doorbell(gc, rxq->gdma_rq);
+	}
+
 	if (rxq->xdp_flush)
 		xdp_do_flush();
 }