[v5,0/4] UIO_MEM_DMA_COHERENT for cnic/bnx2/bnx2x

Message ID 20240201233400.3394996-1-cleech@redhat.com
Headers
Series UIO_MEM_DMA_COHERENT for cnic/bnx2/bnx2x |

Message

Chris Leech Feb. 1, 2024, 11:33 p.m. UTC
  During bnx2i iSCSI testing we ran into page refcounting issues in the
uio mmaps exported from cnic to the iscsiuio process, and bisected back
to the removal of the __GFP_COMP flag from dma_alloc_coherent calls.

The cnic uio interface also has issues running with an iommu enabled,
which these changes correct.

In order to fix these drivers to be able to mmap dma coherent memory via
a uio device, introduce a new uio mmap type backed by dma_mmap_coherent.

While I understand some complaints about how these drivers have been
structured, I also don't like letting support bitrot when there's a
reasonable alternative to re-architecting an existing driver. I believe
this to be the most sane way to restore these drivers to functioning
properly.

There are two other uio drivers which are mmaping dma_alloc_coherent
memory as UIO_MEM_PHYS, uio_dmem_genirq and uio_pruss.
These drivers are converted in the later patches of this series.

v5:
- convert uio_pruss and uio_dmem_genirq
- added dev_warn and comment about not adding more users
- put some PAGE_ALIGNs back in cnic to keep checks in
  uio_mmap_dma_coherent matched with uio_mmap_physical.
- dropped the Fixes trailer
v4:
- re-introduce the dma_device member to uio_map,
  it needs to be passed to dma_mmap_coherent somehow
- drop patch 3 to focus only on the uio interface,
  explicit page alignment isn't needed
- re-add the v1 mail recipients,
  this isn't something to be handled through linux-scsi
v3 (Nilesh Javali <njavali@marvell.com>):
- fix warnings reported by kernel test robot
  and added base commit
v2 (Nilesh Javali <njavali@marvell.com>):
- expose only the dma_addr within uio and cnic.
- Cleanup newly added unions comprising virtual_addr
  and struct device

previous threads:
v1: https://lore.kernel.org/all/20230929170023.1020032-1-cleech@redhat.com/
attempt at an alternative change: https://lore.kernel.org/all/20231219055514.12324-1-njavali@marvell.com/
v2: https://lore.kernel.org/all/20240103091137.27142-1-njavali@marvell.com/
v3: https://lore.kernel.org/all/20240109121458.26475-1-njavali@marvell.com/
v4: https://lore.kernel.org/all/20240131191732.3247996-1-cleech@redhat.com/

Chris Leech (4):
  uio: introduce UIO_MEM_DMA_COHERENT type
  cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
  uio_pruss: UIO_MEM_DMA_COHERENT conversion
  uio_dmem_genirq: UIO_MEM_DMA_COHERENT conversion

 drivers/net/ethernet/broadcom/bnx2.c          |  1 +
 .../net/ethernet/broadcom/bnx2x/bnx2x_main.c  |  2 +
 drivers/net/ethernet/broadcom/cnic.c          | 25 ++++++----
 drivers/net/ethernet/broadcom/cnic.h          |  1 +
 drivers/net/ethernet/broadcom/cnic_if.h       |  1 +
 drivers/uio/uio.c                             | 47 +++++++++++++++++++
 drivers/uio/uio_dmem_genirq.c                 | 22 ++++-----
 drivers/uio/uio_pruss.c                       |  6 ++-
 include/linux/uio_driver.h                    |  8 ++++
 9 files changed, 89 insertions(+), 24 deletions(-)


base-commit: 861c0981648f5b64c86fd028ee622096eb7af05a
  

Comments

Alexander Lobakin Feb. 5, 2024, 4:57 p.m. UTC | #1
From: Chris Leech <cleech@redhat.com>
Date: Thu,  1 Feb 2024 15:33:56 -0800

> During bnx2i iSCSI testing we ran into page refcounting issues in the
> uio mmaps exported from cnic to the iscsiuio process, and bisected back
> to the removal of the __GFP_COMP flag from dma_alloc_coherent calls.

IIRC Jakub mentioned some time ago that he doesn't want to see
third-party userspace <-> kernel space communication in the networking
drivers, to me this looks exactly like that :z

Thanks,
Olek
  
Chris Leech Feb. 5, 2024, 7:51 p.m. UTC | #2
On Mon, Feb 05, 2024 at 05:57:58PM +0100, Alexander Lobakin wrote:
> From: Chris Leech <cleech@redhat.com>
> Date: Thu,  1 Feb 2024 15:33:56 -0800
> 
> > During bnx2i iSCSI testing we ran into page refcounting issues in the
> > uio mmaps exported from cnic to the iscsiuio process, and bisected back
> > to the removal of the __GFP_COMP flag from dma_alloc_coherent calls.
> 
> IIRC Jakub mentioned some time ago that he doesn't want to see
> third-party userspace <-> kernel space communication in the networking
> drivers, to me this looks exactly like that :z

This isn't something anyone likes, but it's an interface that's been in
the kernel and in use since 2009.  I'm trying to see if it can be fixed
"enough" to keep existing users functioning.  If not, maybe the cnic
interface and the stacking protocol drivers (bnx2i/bnx2fc) should be
marked as broken.

- Chris
  
Jakub Kicinski Feb. 6, 2024, 3:54 p.m. UTC | #3
On Mon, 5 Feb 2024 11:51:00 -0800 Chris Leech wrote:
> > IIRC Jakub mentioned some time ago that he doesn't want to see
> > third-party userspace <-> kernel space communication in the networking
> > drivers, to me this looks exactly like that :z  
> 
> This isn't something anyone likes, but it's an interface that's been in
> the kernel and in use since 2009.  I'm trying to see if it can be fixed
> "enough" to keep existing users functioning.  If not, maybe the cnic
> interface and the stacking protocol drivers (bnx2i/bnx2fc) should be
> marked as broken.

Yeah, is this one of the "converged Ethernet" monstrosities from
the 2000s. All the companies which went deep into this stuff are
now defunct AFAIK, and we're left holding the bag.

Yay.
  
Lee Duncan Feb. 6, 2024, 8:16 p.m. UTC | #4
On Tue, Feb 6, 2024 at 7:54 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 5 Feb 2024 11:51:00 -0800 Chris Leech wrote:
> > > IIRC Jakub mentioned some time ago that he doesn't want to see
> > > third-party userspace <-> kernel space communication in the networking
> > > drivers, to me this looks exactly like that :z
> >
> > This isn't something anyone likes, but it's an interface that's been in
> > the kernel and in use since 2009.  I'm trying to see if it can be fixed
> > "enough" to keep existing users functioning.  If not, maybe the cnic
> > interface and the stacking protocol drivers (bnx2i/bnx2fc) should be
> > marked as broken.
>
> Yeah, is this one of the "converged Ethernet" monstrosities from
> the 2000s. All the companies which went deep into this stuff are
> now defunct AFAIK, and we're left holding the bag.
>
> Yay.

Actually, Marvel is around but seems loath to invest in re-architecting this
driver since it's so long in the tooth.
  
Chris Leech Feb. 21, 2024, 6:28 p.m. UTC | #5
I think all the feedback on these has been addressed, so I'm asking
once more if these UIO additions can be considered for inclusion.

Thanks,
- Chris

On Thu, Feb 1, 2024 at 3:34 PM Chris Leech <cleech@redhat.com> wrote:
>
> During bnx2i iSCSI testing we ran into page refcounting issues in the
> uio mmaps exported from cnic to the iscsiuio process, and bisected back
> to the removal of the __GFP_COMP flag from dma_alloc_coherent calls.
>
> The cnic uio interface also has issues running with an iommu enabled,
> which these changes correct.
>
> In order to fix these drivers to be able to mmap dma coherent memory via
> a uio device, introduce a new uio mmap type backed by dma_mmap_coherent.
>
> While I understand some complaints about how these drivers have been
> structured, I also don't like letting support bitrot when there's a
> reasonable alternative to re-architecting an existing driver. I believe
> this to be the most sane way to restore these drivers to functioning
> properly.
>
> There are two other uio drivers which are mmaping dma_alloc_coherent
> memory as UIO_MEM_PHYS, uio_dmem_genirq and uio_pruss.
> These drivers are converted in the later patches of this series.
>
> v5:
> - convert uio_pruss and uio_dmem_genirq
> - added dev_warn and comment about not adding more users
> - put some PAGE_ALIGNs back in cnic to keep checks in
>   uio_mmap_dma_coherent matched with uio_mmap_physical.
> - dropped the Fixes trailer
> v4:
> - re-introduce the dma_device member to uio_map,
>   it needs to be passed to dma_mmap_coherent somehow
> - drop patch 3 to focus only on the uio interface,
>   explicit page alignment isn't needed
> - re-add the v1 mail recipients,
>   this isn't something to be handled through linux-scsi
> v3 (Nilesh Javali <njavali@marvell.com>):
> - fix warnings reported by kernel test robot
>   and added base commit
> v2 (Nilesh Javali <njavali@marvell.com>):
> - expose only the dma_addr within uio and cnic.
> - Cleanup newly added unions comprising virtual_addr
>   and struct device
>
> previous threads:
> v1: https://lore.kernel.org/all/20230929170023.1020032-1-cleech@redhat.com/
> attempt at an alternative change: https://lore.kernel.org/all/20231219055514.12324-1-njavali@marvell.com/
> v2: https://lore.kernel.org/all/20240103091137.27142-1-njavali@marvell.com/
> v3: https://lore.kernel.org/all/20240109121458.26475-1-njavali@marvell.com/
> v4: https://lore.kernel.org/all/20240131191732.3247996-1-cleech@redhat.com/
>
> Chris Leech (4):
>   uio: introduce UIO_MEM_DMA_COHERENT type
>   cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
>   uio_pruss: UIO_MEM_DMA_COHERENT conversion
>   uio_dmem_genirq: UIO_MEM_DMA_COHERENT conversion
>
>  drivers/net/ethernet/broadcom/bnx2.c          |  1 +
>  .../net/ethernet/broadcom/bnx2x/bnx2x_main.c  |  2 +
>  drivers/net/ethernet/broadcom/cnic.c          | 25 ++++++----
>  drivers/net/ethernet/broadcom/cnic.h          |  1 +
>  drivers/net/ethernet/broadcom/cnic_if.h       |  1 +
>  drivers/uio/uio.c                             | 47 +++++++++++++++++++
>  drivers/uio/uio_dmem_genirq.c                 | 22 ++++-----
>  drivers/uio/uio_pruss.c                       |  6 ++-
>  include/linux/uio_driver.h                    |  8 ++++
>  9 files changed, 89 insertions(+), 24 deletions(-)
>
>
> base-commit: 861c0981648f5b64c86fd028ee622096eb7af05a
> --
> 2.43.0
>
  
Lee Duncan Feb. 28, 2024, 6:20 p.m. UTC | #6
Is this series stalled?

I believe the main objections came from Greg earlier in the series,
but I'd gotten the impression Greg accepted the latest version.

On Thu, Feb 1, 2024 at 3:34 PM Chris Leech <cleech@redhat.com> wrote:
>
> During bnx2i iSCSI testing we ran into page refcounting issues in the
> uio mmaps exported from cnic to the iscsiuio process, and bisected back
> to the removal of the __GFP_COMP flag from dma_alloc_coherent calls.
>
> The cnic uio interface also has issues running with an iommu enabled,
> which these changes correct.
>
> In order to fix these drivers to be able to mmap dma coherent memory via
> a uio device, introduce a new uio mmap type backed by dma_mmap_coherent.
>
> While I understand some complaints about how these drivers have been
> structured, I also don't like letting support bitrot when there's a
> reasonable alternative to re-architecting an existing driver. I believe
> this to be the most sane way to restore these drivers to functioning
> properly.
>
> There are two other uio drivers which are mmaping dma_alloc_coherent
> memory as UIO_MEM_PHYS, uio_dmem_genirq and uio_pruss.
> These drivers are converted in the later patches of this series.
>
> v5:
> - convert uio_pruss and uio_dmem_genirq
> - added dev_warn and comment about not adding more users
> - put some PAGE_ALIGNs back in cnic to keep checks in
>   uio_mmap_dma_coherent matched with uio_mmap_physical.
> - dropped the Fixes trailer
> v4:
> - re-introduce the dma_device member to uio_map,
>   it needs to be passed to dma_mmap_coherent somehow
> - drop patch 3 to focus only on the uio interface,
>   explicit page alignment isn't needed
> - re-add the v1 mail recipients,
>   this isn't something to be handled through linux-scsi
> v3 (Nilesh Javali <njavali@marvell.com>):
> - fix warnings reported by kernel test robot
>   and added base commit
> v2 (Nilesh Javali <njavali@marvell.com>):
> - expose only the dma_addr within uio and cnic.
> - Cleanup newly added unions comprising virtual_addr
>   and struct device
>
> previous threads:
> v1: https://lore.kernel.org/all/20230929170023.1020032-1-cleech@redhat.com/
> attempt at an alternative change: https://lore.kernel.org/all/20231219055514.12324-1-njavali@marvell.com/
> v2: https://lore.kernel.org/all/20240103091137.27142-1-njavali@marvell.com/
> v3: https://lore.kernel.org/all/20240109121458.26475-1-njavali@marvell.com/
> v4: https://lore.kernel.org/all/20240131191732.3247996-1-cleech@redhat.com/
>
> Chris Leech (4):
>   uio: introduce UIO_MEM_DMA_COHERENT type
>   cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
>   uio_pruss: UIO_MEM_DMA_COHERENT conversion
>   uio_dmem_genirq: UIO_MEM_DMA_COHERENT conversion
>
>  drivers/net/ethernet/broadcom/bnx2.c          |  1 +
>  .../net/ethernet/broadcom/bnx2x/bnx2x_main.c  |  2 +
>  drivers/net/ethernet/broadcom/cnic.c          | 25 ++++++----
>  drivers/net/ethernet/broadcom/cnic.h          |  1 +
>  drivers/net/ethernet/broadcom/cnic_if.h       |  1 +
>  drivers/uio/uio.c                             | 47 +++++++++++++++++++
>  drivers/uio/uio_dmem_genirq.c                 | 22 ++++-----
>  drivers/uio/uio_pruss.c                       |  6 ++-
>  include/linux/uio_driver.h                    |  8 ++++
>  9 files changed, 89 insertions(+), 24 deletions(-)
>
>
> base-commit: 861c0981648f5b64c86fd028ee622096eb7af05a
> --
> 2.43.0
>
  
Greg KH Feb. 29, 2024, 6:10 a.m. UTC | #7
On Wed, Feb 28, 2024 at 10:20:58AM -0800, Lee Duncan wrote:
> Is this series stalled?
> 
> I believe the main objections came from Greg earlier in the series,
> but I'd gotten the impression Greg accepted the latest version.

I'm behind in patch review, should get to it in the next few days...

thanks,

greg k-h