[v3] net: mana: Batch ringing RX queue doorbell on receiving packets

Message ID 1687823827-15850-1-git-send-email-longli@linuxonhyperv.com
State New
Headers
Series [v3] net: mana: Batch ringing RX queue doorbell on receiving packets |

Commit Message

longli@linuxonhyperv.com June 26, 2023, 11:57 p.m. UTC
  From: Long Li <longli@microsoft.com>

It's inefficient to ring the doorbell page every time a WQE is posted to
the received queue. Excessive MMIO writes result in CPU spending more
time waiting on LOCK instructions (atomic operations), resulting in
poor scaling performance.

Move the code for ringing doorbell page to where after we have posted all
WQEs to the receive queue during a callback from napi_poll().

With this change, tests showed an improvement from 120G/s to 160G/s on a
200G physical link, with 16 or 32 hardware queues.

Tests showed no regression in network latency benchmarks on single
connection.

While we are making changes in this code path, change the code for
ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
hardware specification specifies that it should set to 0. Although
currently the hardware doesn't enforce the check, in the future releases
it may do.

Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
---
Change log:
v2:
Check for comp_read > 0 as it might be negative on completion error.
Set rq.wqe_cnt to 0 according to BNIC spec.

v3:
Add details in the commit on the reason of performance increase and test numbers.
Add details in the commit on why rq.wqe_cnt should be set to 0 according to hardware spec.
Add "Reviewed-by" from Haiyang and Dexuan.

 drivers/net/ethernet/microsoft/mana/gdma_main.c |  5 ++++-
 drivers/net/ethernet/microsoft/mana/mana_en.c   | 10 ++++++++--
 2 files changed, 12 insertions(+), 3 deletions(-)
  

Comments

Paolo Abeni June 29, 2023, 8:42 a.m. UTC | #1
On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
> From: Long Li <longli@microsoft.com>
> 
> It's inefficient to ring the doorbell page every time a WQE is posted to
> the received queue. Excessive MMIO writes result in CPU spending more
> time waiting on LOCK instructions (atomic operations), resulting in
> poor scaling performance.
> 
> Move the code for ringing doorbell page to where after we have posted all
> WQEs to the receive queue during a callback from napi_poll().
> 
> With this change, tests showed an improvement from 120G/s to 160G/s on a
> 200G physical link, with 16 or 32 hardware queues.
> 
> Tests showed no regression in network latency benchmarks on single
> connection.
> 
> While we are making changes in this code path, change the code for
> ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
> hardware specification specifies that it should set to 0. Although
> currently the hardware doesn't enforce the check, in the future releases
> it may do.
> 
> Cc: stable@vger.kernel.org
> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")

Uhmmm... this looks like a performance improvement to me, more suitable
for the net-next tree ?!? (Note that net-next is closed now).

In any case you must avoid empty lines in the tag area.

If you really intend targeting the -net tree, please repost fixing the
above and explicitly specifying the target tree in the subj prefix.

thanks!

Paolo
  
Jakub Kicinski June 29, 2023, 4:11 p.m. UTC | #2
On Thu, 29 Jun 2023 10:42:34 +0200 Paolo Abeni wrote:
> > While we are making changes in this code path, change the code for
> > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
> > hardware specification specifies that it should set to 0. Although
> > currently the hardware doesn't enforce the check, in the future releases
> > it may do.

And please split this cleanup into a separate patch, it doesn't sound
like it has to be done as part of the optimization.
  
Haiyang Zhang June 29, 2023, 4:53 p.m. UTC | #3
> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Thursday, June 29, 2023 4:43 AM
> To: longli@linuxonhyperv.com; Jason Gunthorpe <jgg@ziepe.ca>; Leon
> Romanovsky <leon@kernel.org>; Ajay Sharma <sharmaajay@microsoft.com>;
> Dexuan Cui <decui@microsoft.com>; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>; Wei Liu <wei.liu@kernel.org>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>
> Cc: linux-rdma@vger.kernel.org; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Long Li
> <longli@microsoft.com>; stable@vger.kernel.org
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving
> packets
> 
> On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > It's inefficient to ring the doorbell page every time a WQE is posted to
> > the received queue. Excessive MMIO writes result in CPU spending more
> > time waiting on LOCK instructions (atomic operations), resulting in
> > poor scaling performance.
> >
> > Move the code for ringing doorbell page to where after we have posted all
> > WQEs to the receive queue during a callback from napi_poll().
> >
> > With this change, tests showed an improvement from 120G/s to 160G/s on a
> > 200G physical link, with 16 or 32 hardware queues.
> >
> > Tests showed no regression in network latency benchmarks on single
> > connection.
> >
> > While we are making changes in this code path, change the code for
> > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
> > hardware specification specifies that it should set to 0. Although
> > currently the hardware doesn't enforce the check, in the future releases
> > it may do.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network
> Adapter (MANA)")
> 
> Uhmmm... this looks like a performance improvement to me, more suitable
> for the net-next tree ?!? (Note that net-next is closed now).

This web page shows the net-next is "open":
http://vger.kernel.org/~davem/net-next.html

Is this still the right place to check net-next status?

- Haiyang
  
Jakub Kicinski June 29, 2023, 5:06 p.m. UTC | #4
On Thu, 29 Jun 2023 16:53:42 +0000 Haiyang Zhang wrote:
> This web page shows the net-next is "open":
> http://vger.kernel.org/~davem/net-next.html
> 
> Is this still the right place to check net-next status?

We're working on fixing it. Unfortunately it's a private page and most
of the netdev maintainers don't have access to changing it :(
  
Long Li June 29, 2023, 6:18 p.m. UTC | #5
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving
> packets
> 
> On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > It's inefficient to ring the doorbell page every time a WQE is posted
> > to the received queue. Excessive MMIO writes result in CPU spending
> > more time waiting on LOCK instructions (atomic operations), resulting
> > in poor scaling performance.
> >
> > Move the code for ringing doorbell page to where after we have posted
> > all WQEs to the receive queue during a callback from napi_poll().
> >
> > With this change, tests showed an improvement from 120G/s to 160G/s on
> > a 200G physical link, with 16 or 32 hardware queues.
> >
> > Tests showed no regression in network latency benchmarks on single
> > connection.
> >
> > While we are making changes in this code path, change the code for
> > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
> > hardware specification specifies that it should set to 0. Although
> > currently the hardware doesn't enforce the check, in the future
> > releases it may do.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure
> > Network Adapter (MANA)")
> 
> Uhmmm... this looks like a performance improvement to me, more suitable for
> the net-next tree ?!? (Note that net-next is closed now).

This issue is a blocker for usage on 200G physical link. I think it can be categorized as a fix.

> 
> In any case you must avoid empty lines in the tag area.
> 
> If you really intend targeting the -net tree, please repost fixing the above and
> explicitly specifying the target tree in the subj prefix.

Will do, thank you.

Long

> 
> thanks!
> 
> Paolo
  
Long Li June 29, 2023, 6:19 p.m. UTC | #6
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving
> packets
> 
> On Thu, 29 Jun 2023 10:42:34 +0200 Paolo Abeni wrote:
> > > While we are making changes in this code path, change the code for
> > > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
> > > hardware specification specifies that it should set to 0. Although
> > > currently the hardware doesn't enforce the check, in the future
> > > releases it may do.
> 
> And please split this cleanup into a separate patch, it doesn't sound like it has to
> be done as part of the optimization.

Will do, thanks.

Long
  
Paolo Abeni June 30, 2023, 10:36 a.m. UTC | #7
On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote:
> > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell
> > on receiving
> > packets
> > 
> > On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
> > > From: Long Li <longli@microsoft.com>
> > > 
> > > It's inefficient to ring the doorbell page every time a WQE is
> > > posted
> > > to the received queue. Excessive MMIO writes result in CPU
> > > spending
> > > more time waiting on LOCK instructions (atomic operations),
> > > resulting
> > > in poor scaling performance.
> > > 
> > > Move the code for ringing doorbell page to where after we have
> > > posted
> > > all WQEs to the receive queue during a callback from napi_poll().
> > > 
> > > With this change, tests showed an improvement from 120G/s to
> > > 160G/s on
> > > a 200G physical link, with 16 or 32 hardware queues.
> > > 
> > > Tests showed no regression in network latency benchmarks on
> > > single
> > > connection.
> > > 
> > > While we are making changes in this code path, change the code
> > > for
> > > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
> > > hardware specification specifies that it should set to 0.
> > > Although
> > > currently the hardware doesn't enforce the check, in the future
> > > releases it may do.
> > > 
> > > Cc: stable@vger.kernel.org
> > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure
> > > Network Adapter (MANA)")
> > 
> > Uhmmm... this looks like a performance improvement to me, more
> > suitable for
> > the net-next tree ?!? (Note that net-next is closed now).
> 
> This issue is a blocker for usage on 200G physical link. I think it
> can be categorized as a fix.

Let me ask the question the other way around: is there any specific
reason to have this fix into 6.5 and all the way back to 5.13?
Especially the latest bit (CC-ing stable) looks at least debatable.

Thanks,

Paolo
  
Long Li June 30, 2023, 5:31 p.m. UTC | #8
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on
> receiving packets
> 
> On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote:
> > > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell
> > > on receiving packets
> > >
> > > On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
> > > > From: Long Li <longli@microsoft.com>
> > > >
> > > > It's inefficient to ring the doorbell page every time a WQE is
> > > > posted to the received queue. Excessive MMIO writes result in CPU
> > > > spending more time waiting on LOCK instructions (atomic
> > > > operations), resulting in poor scaling performance.
> > > >
> > > > Move the code for ringing doorbell page to where after we have
> > > > posted all WQEs to the receive queue during a callback from
> > > > napi_poll().
> > > >
> > > > With this change, tests showed an improvement from 120G/s to
> > > > 160G/s on a 200G physical link, with 16 or 32 hardware queues.
> > > >
> > > > Tests showed no regression in network latency benchmarks on single
> > > > connection.
> > > >
> > > > While we are making changes in this code path, change the code for
> > > > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
> > > > hardware specification specifies that it should set to 0.
> > > > Although
> > > > currently the hardware doesn't enforce the check, in the future
> > > > releases it may do.
> > > >
> > > > Cc: stable@vger.kernel.org
> > > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure
> > > > Network Adapter (MANA)")
> > >
> > > Uhmmm... this looks like a performance improvement to me, more
> > > suitable for the net-next tree ?!? (Note that net-next is closed
> > > now).
> >
> > This issue is a blocker for usage on 200G physical link. I think it
> > can be categorized as a fix.
> 
> Let me ask the question the other way around: is there any specific reason to
> have this fix into 6.5 and all the way back to 5.13?
> Especially the latest bit (CC-ing stable) looks at least debatable.

There are many deployed Linux distributions with MANA driver on kernel 5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve the performance target.

Thanks,

Long
  
Greg KH June 30, 2023, 7:46 p.m. UTC | #9
On Fri, Jun 30, 2023 at 05:31:48PM +0000, Long Li wrote:
> > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on
> > receiving packets
> > 
> > On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote:
> > > > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell
> > > > on receiving packets
> > > >
> > > > On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
> > > > > From: Long Li <longli@microsoft.com>
> > > > >
> > > > > It's inefficient to ring the doorbell page every time a WQE is
> > > > > posted to the received queue. Excessive MMIO writes result in CPU
> > > > > spending more time waiting on LOCK instructions (atomic
> > > > > operations), resulting in poor scaling performance.
> > > > >
> > > > > Move the code for ringing doorbell page to where after we have
> > > > > posted all WQEs to the receive queue during a callback from
> > > > > napi_poll().
> > > > >
> > > > > With this change, tests showed an improvement from 120G/s to
> > > > > 160G/s on a 200G physical link, with 16 or 32 hardware queues.
> > > > >
> > > > > Tests showed no regression in network latency benchmarks on single
> > > > > connection.
> > > > >
> > > > > While we are making changes in this code path, change the code for
> > > > > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
> > > > > hardware specification specifies that it should set to 0.
> > > > > Although
> > > > > currently the hardware doesn't enforce the check, in the future
> > > > > releases it may do.
> > > > >
> > > > > Cc: stable@vger.kernel.org
> > > > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure
> > > > > Network Adapter (MANA)")
> > > >
> > > > Uhmmm... this looks like a performance improvement to me, more
> > > > suitable for the net-next tree ?!? (Note that net-next is closed
> > > > now).
> > >
> > > This issue is a blocker for usage on 200G physical link. I think it
> > > can be categorized as a fix.
> > 
> > Let me ask the question the other way around: is there any specific reason to
> > have this fix into 6.5 and all the way back to 5.13?
> > Especially the latest bit (CC-ing stable) looks at least debatable.
> 
> There are many deployed Linux distributions with MANA driver on kernel 5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve the performance target.

Why can't they be upgraded to get that performance target, and all the
other goodness that those kernels have?  We don't normally backport new
features, right?

thanks,

greg k-h
  
Long Li June 30, 2023, 8:42 p.m. UTC | #10
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on
> receiving packets
> 
> On Fri, Jun 30, 2023 at 05:31:48PM +0000, Long Li wrote:
> > > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell
> > > on receiving packets
> > >
> > > On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote:
> > > > > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue
> > > > > doorbell on receiving packets
> > > > >
> > > > > On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com
> wrote:
> > > > > > From: Long Li <longli@microsoft.com>
> > > > > >
> > > > > > It's inefficient to ring the doorbell page every time a WQE is
> > > > > > posted to the received queue. Excessive MMIO writes result in
> > > > > > CPU spending more time waiting on LOCK instructions (atomic
> > > > > > operations), resulting in poor scaling performance.
> > > > > >
> > > > > > Move the code for ringing doorbell page to where after we have
> > > > > > posted all WQEs to the receive queue during a callback from
> > > > > > napi_poll().
> > > > > >
> > > > > > With this change, tests showed an improvement from 120G/s to
> > > > > > 160G/s on a 200G physical link, with 16 or 32 hardware queues.
> > > > > >
> > > > > > Tests showed no regression in network latency benchmarks on
> > > > > > single connection.
> > > > > >
> > > > > > While we are making changes in this code path, change the code
> > > > > > for ringing doorbell to set the WQE_COUNT to 0 for Receive
> > > > > > Queue. The hardware specification specifies that it should set to 0.
> > > > > > Although
> > > > > > currently the hardware doesn't enforce the check, in the
> > > > > > future releases it may do.
> > > > > >
> > > > > > Cc: stable@vger.kernel.org
> > > > > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft
> > > > > > Azure Network Adapter (MANA)")
> > > > >
> > > > > Uhmmm... this looks like a performance improvement to me, more
> > > > > suitable for the net-next tree ?!? (Note that net-next is closed
> > > > > now).
> > > >
> > > > This issue is a blocker for usage on 200G physical link. I think
> > > > it can be categorized as a fix.
> > >
> > > Let me ask the question the other way around: is there any specific
> > > reason to have this fix into 6.5 and all the way back to 5.13?
> > > Especially the latest bit (CC-ing stable) looks at least debatable.
> >
> > There are many deployed Linux distributions with MANA driver on kernel
> 5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve
> the performance target.
> 
> Why can't they be upgraded to get that performance target, and all the other
> goodness that those kernels have?  We don't normally backport new features,
> right?

I think this should be considered as a fix, not a new feature.

MANA is designed to be 200GB full duplex at the start.  Due to lack of
hardware testing capability at early stage of the project, we could only test 100GB
for the Linux driver. When hardware is fully capable of reaching designed spec,
this bug in the Linux driver shows up.

Thanks,

Long

> 
> thanks,
> 
> greg k-h
  
Jakub Kicinski June 30, 2023, 11:38 p.m. UTC | #11
On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote:
> > > 5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve
> > > the performance target.
> > 
> > Why can't they be upgraded to get that performance target, and all the other
> > goodness that those kernels have?  We don't normally backport new features,
> > right?  
> 
> I think this should be considered as a fix, not a new feature.
> 
> MANA is designed to be 200GB full duplex at the start.  Due to lack of
> hardware testing capability at early stage of the project, we could only test 100GB
> for the Linux driver. When hardware is fully capable of reaching designed spec,
> this bug in the Linux driver shows up.

That part we understand.

If I were you I'd try to convince Greg and Paolo that the change is
small and significant for user experience. And answer Greg's question
why upgrading the kernel past 6.1 is a challenge in your environment.
  
Long Li July 2, 2023, 8:18 p.m. UTC | #12
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving
> packets
> 
> On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote:
> > > > 5.15 and kernel 6.1. (those kernels are longterm) They need this
> > > > fix to achieve the performance target.
> > >
> > > Why can't they be upgraded to get that performance target, and all
> > > the other goodness that those kernels have?  We don't normally
> > > backport new features, right?
> >
> > I think this should be considered as a fix, not a new feature.
> >
> > MANA is designed to be 200GB full duplex at the start.  Due to lack of
> > hardware testing capability at early stage of the project, we could
> > only test 100GB for the Linux driver. When hardware is fully capable
> > of reaching designed spec, this bug in the Linux driver shows up.
> 
> That part we understand.
> 
> If I were you I'd try to convince Greg and Paolo that the change is small and
> significant for user experience. And answer Greg's question why upgrading the
> kernel past 6.1 is a challenge in your environment.

I was under the impression that this patch was considered to be a feature, 
not a bug fix. I was trying to justify that the "Fixes:" tag was needed. 

I apologize for misunderstanding this.

Without this fix, it's not possible to run a typical workload designed for 200Gb
physical link speed.

We see a large number of customers and Linux distributions committed on 5.15 
and 6.1 kernels. They planned the product cycles and certification processes 
around these longterm kernel versions. It's difficult for them to upgrade to newer
kernel versions.

Thanks,

Long
  
Paolo Abeni July 3, 2023, 10:14 a.m. UTC | #13
On Sun, 2023-07-02 at 20:18 +0000, Long Li wrote:
> > > > > > > > Subject: Re: [Patch v3] net: mana: Batch ringing RX
> > > > > > > > queue
> > > > > > > > doorbell
> > > > > > > > on receiving
> > > > > > > > packets
> > > > > > > > 
> > > > > > > > On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote:
> > > > > > > > > > > > > > > > > > > > 5.15 and kernel 6.1. (those
> > > > > > > > > > > > > > > > > > > > kernels are longterm)
> > > > > > > > > > > > > > > > > > > > They need
> > > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > fix to achieve the performance
> > > > > > > > > > > > > > > > > > > > target.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Why can't they be upgraded to get that
> > > > > > > > > > > > > > > > performance
> > > > > > > > > > > > > > > > target, and
> > > > > > > > > > > > > > > > all
> > > > > > > > > > > > > > > > the other goodness that those kernels
> > > > > > > > > > > > > > > > have? We don't
> > > > > > > > > > > > > > > > normally
> > > > > > > > > > > > > > > > backport new features, right?
> > > > > > > > > > > > 
> > > > > > > > > > > > I think this should be considered as a fix, not
> > > > > > > > > > > > a new
> > > > > > > > > > > > feature.
> > > > > > > > > > > > 
> > > > > > > > > > > > MANA is designed to be 200GB full duplex at the
> > > > > > > > > > > > start. Due
> > > > > > > > > > > > to
> > > > > > > > > > > > lack of
> > > > > > > > > > > > hardware testing capability at early stage of
> > > > > > > > > > > > the project,
> > > > > > > > > > > > we
> > > > > > > > > > > > could
> > > > > > > > > > > > only test 100GB for the Linux driver. When
> > > > > > > > > > > > hardware is
> > > > > > > > > > > > fully
> > > > > > > > > > > > capable
> > > > > > > > > > > > of reaching designed spec, this bug in the
> > > > > > > > > > > > Linux driver
> > > > > > > > > > > > shows up.
> > > > > > > > 
> > > > > > > > That part we understand.
> > > > > > > > 
> > > > > > > > If I were you I'd try to convince Greg and Paolo that
> > > > > > > > the
> > > > > > > > change is
> > > > > > > > small and
> > > > > > > > significant for user experience. And answer Greg's
> > > > > > > > question why
> > > > > > > > upgrading the
> > > > > > > > kernel past 6.1 is a challenge in your environment.
> > > > 
> > > > I was under the impression that this patch was considered to be
> > > > a
> > > > feature, 
> > > > not a bug fix. I was trying to justify that the "Fixes:" tag
> > > > was
> > > > needed. 
> > > > 
> > > > I apologize for misunderstanding this.
> > > > 
> > > > Without this fix, it's not possible to run a typical workload
> > > > designed for 200Gb
> > > > physical link speed.
> > > > 
> > > > We see a large number of customers and Linux distributions
> > > > committed
> > > > on 5.15 
> > > > and 6.1 kernels. They planned the product cycles and
> > > > certification
> > > > processes 
> > > > around these longterm kernel versions. It's difficult for them
> > > > to
> > > > upgrade to newer
> > > > kernel versions.

I think there are some misunderstanding WRT distros and stable kernels.
(Commercial) distros will backport the patch as needed, regardless such
patch landing in the 5.15 upstream tree or not. Individual users
running their own vanilla 5.15 kernel can't expect performance
improvement landing there.

All in all I feel undecided. I would endorse this change going trough
net-next (without the stable tag). I would feel less torn with this
change targeting -net without the stable tag. Targeting -net with the
stable tag sounds a bit too much to me.

Cheers,
Paolo
  
Long Li July 6, 2023, 5:51 p.m. UTC | #14
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving
> packets
> 
> On Sun, 2023-07-02 at 20:18 +0000, Long Li wrote:
> > > > > > > > > Subject: Re: [Patch v3] net: mana: Batch ringing RX
> > > > > > > > > queue doorbell on receiving packets
> > > > > > > > >
> > > > > > > > > On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote:
> > > > > > > > > > > > > > > > > > > > > 5.15 and kernel 6.1. (those
> > > > > > > > > > > > > > > > > > > > > kernels are longterm) They need
> > > > > > > > > > > > > > > > > > > > > this fix to achieve the
> > > > > > > > > > > > > > > > > > > > > performance target.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Why can't they be upgraded to get that
> > > > > > > > > > > > > > > > > performance target, and all the other
> > > > > > > > > > > > > > > > > goodness that those kernels have? We
> > > > > > > > > > > > > > > > > don't normally backport new features,
> > > > > > > > > > > > > > > > > right?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think this should be considered as a fix, not
> > > > > > > > > > > > > a new feature.
> > > > > > > > > > > > >
> > > > > > > > > > > > > MANA is designed to be 200GB full duplex at the
> > > > > > > > > > > > > start. Due to lack of hardware testing
> > > > > > > > > > > > > capability at early stage of the project, we
> > > > > > > > > > > > > could only test 100GB for the Linux driver. When
> > > > > > > > > > > > > hardware is fully capable of reaching designed
> > > > > > > > > > > > > spec, this bug in the Linux driver shows up.
> > > > > > > > >
> > > > > > > > > That part we understand.
> > > > > > > > >
> > > > > > > > > If I were you I'd try to convince Greg and Paolo that
> > > > > > > > > the change is small and significant for user experience.
> > > > > > > > > And answer Greg's question why upgrading the kernel past
> > > > > > > > > 6.1 is a challenge in your environment.
> > > > >
> > > > > I was under the impression that this patch was considered to be
> > > > > a feature, not a bug fix. I was trying to justify that the
> > > > > "Fixes:" tag was needed.
> > > > >
> > > > > I apologize for misunderstanding this.
> > > > >
> > > > > Without this fix, it's not possible to run a typical workload
> > > > > designed for 200Gb physical link speed.
> > > > >
> > > > > We see a large number of customers and Linux distributions
> > > > > committed on 5.15 and 6.1 kernels. They planned the product
> > > > > cycles and certification processes around these longterm kernel
> > > > > versions. It's difficult for them to upgrade to newer kernel
> > > > > versions.
> 
> I think there are some misunderstanding WRT distros and stable kernels.
> (Commercial) distros will backport the patch as needed, regardless such patch
> landing in the 5.15 upstream tree or not. Individual users running their own
> vanilla 5.15 kernel can't expect performance improvement landing there.
> 
> All in all I feel undecided. I would endorse this change going trough net-next
> (without the stable tag). I would feel less torn with this change targeting -net
> without the stable tag. Targeting -net with the stable tag sounds a bit too much to
> me.
> 
> Cheers,
> Paolo

I'm sending this patch to net-next without stable tag.

Thanks,

Long
  

Patch

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 8f3f78b68592..3765d3389a9a 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -300,8 +300,11 @@  static void mana_gd_ring_doorbell(struct gdma_context *gc, u32 db_index,
 
 void mana_gd_wq_ring_doorbell(struct gdma_context *gc, struct gdma_queue *queue)
 {
+	/* Hardware Spec specifies that software client should set 0 for
+	 * wqe_cnt for Receive Queues. This value is not used in Send Queues.
+	 */
 	mana_gd_ring_doorbell(gc, queue->gdma_dev->doorbell, queue->type,
-			      queue->id, queue->head * GDMA_WQE_BU_SIZE, 1);
+			      queue->id, queue->head * GDMA_WQE_BU_SIZE, 0);
 }
 
 void mana_gd_ring_cq(struct gdma_queue *cq, u8 arm_bit)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index cd4d5ceb9f2d..1d8abe63fcb8 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1383,8 +1383,8 @@  static void mana_post_pkt_rxq(struct mana_rxq *rxq)
 
 	recv_buf_oob = &rxq->rx_oobs[curr_index];
 
-	err = mana_gd_post_and_ring(rxq->gdma_rq, &recv_buf_oob->wqe_req,
-				    &recv_buf_oob->wqe_inf);
+	err = mana_gd_post_work_request(rxq->gdma_rq, &recv_buf_oob->wqe_req,
+					&recv_buf_oob->wqe_inf);
 	if (WARN_ON_ONCE(err))
 		return;
 
@@ -1654,6 +1654,12 @@  static void mana_poll_rx_cq(struct mana_cq *cq)
 		mana_process_rx_cqe(rxq, cq, &comp[i]);
 	}
 
+	if (comp_read > 0) {
+		struct gdma_context *gc = rxq->gdma_rq->gdma_dev->gdma_context;
+
+		mana_gd_wq_ring_doorbell(gc, rxq->gdma_rq);
+	}
+
 	if (rxq->xdp_flush)
 		xdp_do_flush();
 }