[v3,00/17] enable nvmet-fc for blktests

Message ID 20231218153105.12717-1-dwagner@suse.de

Message

Daniel Wagner Dec. 18, 2023, 3:30 p.m. UTC
Another update on getting nvmet-fc ready for blktests. The main change here is
that I tried to make sense of the ref count taking in nvmet-fc. When running
blktests with the auto connect udev rule activated, the additional connect
attempts etc. made nvmet-fc explode and choke everywhere. After a lot of poking
and pondering I decided to change the rules for how the ref counts are taken for
the ctrl, association, target port and host port. This made a big difference and
I am now able to get blktests to pass the tests.
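
For readers less familiar with the pattern, here is a minimal userspace sketch
of get/put reference counting, where a caller pins an object before a teardown
path drops the owner's reference. This is only an illustration under stated
assumptions: struct port, port_get(), port_put() and owner_teardown() are
hypothetical names, not the nvmet-fc structures or functions, and the sketch
does not reproduce the actual rules this series implements.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object; not a real nvmet-fc structure. */
struct port {
	atomic_int ref;
	bool *released;		/* set to true by the release path */
};

static struct port *port_alloc(bool *released)
{
	struct port *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	atomic_init(&p->ref, 1);	/* creator holds the first reference */
	p->released = released;
	*released = false;
	return p;
}

/* Take an additional reference; caller must already hold one. */
static void port_get(struct port *p)
{
	atomic_fetch_add(&p->ref, 1);
}

/* Drop one reference; the last put frees the object. */
static void port_put(struct port *p)
{
	if (atomic_fetch_sub(&p->ref, 1) == 1) {
		*p->released = true;
		free(p);
	}
}

/* Teardown path that drops the reference held by the owner. */
static void owner_teardown(struct port *p)
{
	port_put(p);
}

/*
 * Pin the port before teardown so it stays valid until we are done.
 * Returns true if the port survived teardown and was released only by
 * our final put.
 */
static bool demo_pinned_teardown(void)
{
	bool released;
	bool alive_after_teardown;
	struct port *p = port_alloc(&released);

	if (!p)
		return false;
	port_get(p);			/* ref == 2: pinned across teardown */
	owner_teardown(p);		/* ref == 1: object is still valid */
	alive_after_teardown = !released;
	port_put(p);			/* ref == 0: released here */
	return alive_after_teardown && released;
}
```

Without the extra port_get(), owner_teardown() would drop the last reference
and any later access to the object would be a use-after-free, which is the
class of problem KASAN reports.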

Also KASAN was reporting a lot of UAFs. There are still some problems left, as I
can still observe hangs when running blktests in a loop for a while. But it
doesn't explode immediately, so I consider this a win.

Apropos KASAN, it still reports the problem from [1], so anyone who wants to run
this series needs to revert ee6fdc5055e9 ("nvme-fc: fix race between error
recovery and creating association").

The first four patches are independent of the rest and could go in sooner.

[1] https://lore.kernel.org/linux-nvme/hkhl56n665uvc6t5d6h3wtx7utkcorw4xlwi7d2t2bnonavhe6@xaan6pu43ap6/

changes:
v3:
  - collected all patches into one series
  - updated ref counting in nvmet-fc

v2:
  - added RBs
  - added new patches
  - https://lore.kernel.org/linux-nvme/20230620133711.22840-1-dwagner@suse.de/

v1:
  - https://lore.kernel.org/linux-nvme/20230615094356.14878-1-dwagner@suse.de/


Daniel Wagner (16):
  nvmet: report ioccsz and iorcsz for disc ctrl
  nvmet-fc: remove unnecessary bracket
  nvmet-trace: avoid dereferencing pointer too early
  nvmet-trace: null terminate device name string correctly
  nvmet-fcloop: Remove remote port from list when unlinking
  nvme-fc: Do not wait in vain when unloading module
  nvmet-fc: Release reference on target port
  nvmet-fc: untangle cross refcounting objects
  nvmet-fc: free queue and assoc directly
  nvmet-fc: hold reference on hostport match
  nvmet-fc: remove null hostport pointer check
  nvmet-fc: do not tack refs on tgtports from assoc
  nvmet-fc: abort command if when there is binding
  nvmet-fc: free hostport after release reference to tgtport
  nvmet-fc: avoid deadlock on delete association path
  nvmet-fc: take ref count on tgtport before delete assoc

 drivers/nvme/host/fc.c          |  20 +++--
 drivers/nvme/target/discovery.c |  13 +++
 drivers/nvme/target/fc.c        | 153 ++++++++++++++++++--------------
 drivers/nvme/target/fcloop.c    |   7 +-
 drivers/nvme/target/trace.c     |   6 +-
 drivers/nvme/target/trace.h     |  33 ++++---
 6 files changed, 135 insertions(+), 97 deletions(-)
  

Comments

Maurizio Lombardi Dec. 18, 2023, 4:10 p.m. UTC | #1
On Mon, Dec 18, 2023 at 16:34, Daniel Wagner <dwagner@suse.de> wrote:
>
> Apropos KASAN, it still reports the problem from [1], so anyone who wants to run
> this series needs to revert ee6fdc5055e9 ("nvme-fc: fix race between error
> recovery and creating association").

We hit this regression in RHEL too and were forced to revert that commit.
It is obviously buggy because it calls blocking functions with interrupts
disabled. Please revert it.

Note: I have tried to fix it and close the race condition, but it all
became a bit too complex. Also, I haven't had the opportunity to test it yet.
http://bsdbackstore.eu/misc/0001-nvme-fc-fix-races-and-scheduling-while-atomic-bugs.patch

Maurizio
  
Keith Busch Jan. 2, 2024, 9:36 p.m. UTC | #2
On Mon, Dec 18, 2023 at 04:30:48PM +0100, Daniel Wagner wrote:
> Another update on getting nvmet-fc ready for blktests. The main change here is
> that I tried to make sense of the ref count taking in nvmet-fc. When running
> blktests with the auto connect udev rule activated, the additional connect
> attempts etc. made nvmet-fc explode and choke everywhere. After a lot of poking
> and pondering I decided to change the rules for how the ref counts are taken for
> the ctrl, association, target port and host port. This made a big difference and
> I am now able to get blktests to pass the tests.
> 
> Also KASAN was reporting a lot of UAFs. There are still some problems left, as I
> can still observe hangs when running blktests in a loop for a while. But it
> doesn't explode immediately, so I consider this a win.
> 
> Apropos KASAN, it still reports the problem from [1], so anyone who wants to run
> this series needs to revert ee6fdc5055e9 ("nvme-fc: fix race between error
> recovery and creating association").
> 
> The first four patches are independent of the rest and could go in sooner.

Applied patches 2-5 to nvme-6.8.