[v2,net-next,0/10] pds_core: Various improvements and AQ race condition cleanup

Message ID 20240126174255.17052-1-brett.creeley@amd.com
Series pds_core: Various improvements and AQ race condition cleanup

Message

Brett Creeley Jan. 26, 2024, 5:42 p.m. UTC
This series includes the following changes:

- There can be many users of the pds_core adminq. This includes
  pds_core's own uses and any clients that depend on it. When the pds_core
  device goes through a reset for any reason, the adminq is freed
  and reconfigured. There are some gaps in the current implementation
  that will cause crashes during reset if any of the previously mentioned
  users of the adminq attempt to use it after it's been freed.

- Fixes for issues around how resets are handled, specifically regarding
  the driver's error handlers.

- Some general cleanups.

v1:
https://lore.kernel.org/netdev/20240104171221.31399-1-brett.creeley@amd.com/

v2:
- Combined the RCT clean-ups with an incorrect goto label fix
- Added a couple more patches related to reset flows
- Slightly updated the cover letter to mention the extra patches that
  were added
- Changed a function used only once to be static

Brett Creeley (10):
  pds_core: Prevent health thread from running during reset/remove
  pds_core: Cancel AQ work on teardown
  pds_core: Use struct pdsc for the pdsc_adminq_isr private data
  pds_core: Prevent race issues involving the adminq
  pds_core: Clear BARs on reset
  pds_core: Don't assign interrupt index/bound_intr to notifyq
  pds_core: Unmask adminq interrupt in work thread
  pds_core: Fix up some minor issues
  pds_core: Rework teardown/setup flow to be more common
  pds_core: Clean up init/uninit flows to be more readable

 drivers/net/ethernet/amd/pds_core/adminq.c  |  74 +++++++----
 drivers/net/ethernet/amd/pds_core/core.c    | 130 ++++++++++++--------
 drivers/net/ethernet/amd/pds_core/core.h    |   3 +-
 drivers/net/ethernet/amd/pds_core/debugfs.c |  12 +-
 drivers/net/ethernet/amd/pds_core/dev.c     |  30 +++--
 drivers/net/ethernet/amd/pds_core/devlink.c |   3 +-
 drivers/net/ethernet/amd/pds_core/fw.c      |   3 +
 drivers/net/ethernet/amd/pds_core/main.c    |  26 +++-
 8 files changed, 187 insertions(+), 94 deletions(-)
  

Comments

Jakub Kicinski Jan. 27, 2024, 4:44 a.m. UTC | #1
On Fri, 26 Jan 2024 09:42:45 -0800 Brett Creeley wrote:
> This series includes the following changes:
> 
> There can be many users of the pds_core's adminq. This includes
> pds_core's uses and any clients that depend on it. When the pds_core
> device goes through a reset for any reason the adminq is freed
> and reconfigured. There are some gaps in the current implementation
> that will cause crashes during reset if any of the previously mentioned
> users of the adminq attempt to use it after it's been freed.
> 
> Issues around how resets are handled, specifically regarding the driver's
> error handlers.

Patches 1, 2 and 4 look like fixes. Is there any reason these are
targeting net-next? If someone deploys this device at scale rare
things will happen a lot..
  
Brett Creeley Jan. 29, 2024, 5:27 p.m. UTC | #2
On 1/26/2024 8:44 PM, Jakub Kicinski wrote:
> On Fri, 26 Jan 2024 09:42:45 -0800 Brett Creeley wrote:
>> [...]
> 
> Patches 1, 2 and 4 look like fixes. Is there any reason these are
> targeting net-next? If someone deploys this device at scale rare
> things will happen a lot..

No reason, just an oversight on my part. I actually think patches 1, 2,
3, 4, 5, and 9 could all go to net. Unfortunately some of these patches
are intertwined (e.g. patch 10 depends on patch 9).

If I push the previously mentioned patches to net and they get accepted, 
how soon are fixes typically added to the net-next tree so I can 
rebase/re-push the remaining patches?

Thanks for the review,

Brett
  
Jakub Kicinski Jan. 29, 2024, 8:05 p.m. UTC | #3
On Mon, 29 Jan 2024 09:27:21 -0800 Brett Creeley wrote:
> > On Fri, 26 Jan 2024 09:42:45 -0800 Brett Creeley wrote:  
> >> [...]
> > 
> > Patches 1, 2 and 4 look like fixes. Is there any reason these are
> > targeting net-next? If someone deploys this device at scale rare
> > things will happen a lot..  
> 
> No reason, just an oversight on my part. I actually think patches 1, 2, 
> 3, 4, 5, and 9 could all go to net. Unfortunately some of these patches 
> are intertwined (i.e. patch 10 depends on patch 9).
> 
> If I push the previously mentioned patches to net and they get accepted, 
> how soon are fixes typically added to the net-next tree so I can 
> rebase/re-push the remaining patches?

net gets merged into net-next every Thursday; the exact timing depends
on how quickly Linus pulls from us.
  
Brett Creeley Jan. 29, 2024, 9:12 p.m. UTC | #4
On 1/29/2024 12:05 PM, Jakub Kicinski wrote:
> net gets merged into net-next every Thursday; the exact timing depends
> on how quickly Linus pulls from us.

Okay, then I will work on splitting this series up between net and net-next.

Thanks again,

Brett