[v3,06/16] nvme-fc: Do not wait in vain when unloading module
Commit Message
The module unload code will wait for a controller to be deleted even when
there is no controller, and then we wait forever for the completion.
Thus only wait for the completion when there is a controller which
needs to be removed.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
drivers/nvme/host/fc.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
Comments
On Mon, Dec 18, 2023 at 04:30:54PM +0100, Daniel Wagner wrote:
> The module unload code will wait for a controller to be deleted even when
> there is no controller, and then we wait forever for the completion.
> Thus only wait for the completion when there is a controller which
> needs to be removed.
This whole code looks fishy to me, and I suspect this patch only papers
over it. Why do we do this wait to start with? If we've found that out and
documented it, the code really should be using a wait_event variant that
checks for the actual condition (no more controllers), because without
that you might still have a race otherwise.
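
To make the suggestion above concrete, here is a minimal userspace sketch
of waiting on the actual condition ("no more controllers") rather than on
a one-shot completion. It is not the nvme-fc code: struct ctrl_tracker and
the tracker_*() helpers are hypothetical names, and a pthread condition
variable stands in for a kernel wait queue with wait_event(). The point is
only that the waiter re-checks the predicate under the lock, so it returns
immediately when there is nothing to wait for and cannot miss a wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the transport's controller bookkeeping. */
struct ctrl_tracker {
	pthread_mutex_t lock;
	pthread_cond_t changed;	/* signalled whenever nr_ctrls changes */
	int nr_ctrls;		/* controllers that still exist */
};

static void tracker_init(struct ctrl_tracker *t, int nr_ctrls)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->changed, NULL);
	t->nr_ctrls = nr_ctrls;
}

/* Called once a single controller has been fully torn down. */
static void tracker_ctrl_gone(struct ctrl_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	assert(t->nr_ctrls > 0);
	t->nr_ctrls--;
	pthread_cond_broadcast(&t->changed);
	pthread_mutex_unlock(&t->lock);
}

/*
 * Rough analogue of wait_event(wq, nr_ctrls == 0): the predicate is
 * evaluated under the lock before sleeping and after every wakeup.
 */
static void tracker_wait_idle(struct ctrl_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	while (t->nr_ctrls != 0)
		pthread_cond_wait(&t->changed, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

/* Simulated asynchronous teardown: removes controllers until none are left. */
static void *tracker_teardown_thread(void *arg)
{
	struct ctrl_tracker *t = arg;

	for (;;) {
		bool done;

		pthread_mutex_lock(&t->lock);
		done = (t->nr_ctrls == 0);
		pthread_mutex_unlock(&t->lock);
		if (done)
			break;
		tracker_ctrl_gone(t);
	}
	return NULL;
}
```

With zero controllers tracker_wait_idle() returns at once, which is the
case the patch handles by skipping the wait; with controllers present it
blocks until the last one is gone.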
On 12/18/23 16:30, Daniel Wagner wrote:
> The module unload code will wait for a controller to be deleted even when
> there is no controller, and then we wait forever for the completion.
> Thus only wait for the completion when there is a controller which
> needs to be removed.
>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
> drivers/nvme/host/fc.c | 20 +++++++++++++-------
> 1 file changed, 13 insertions(+), 7 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
On Tue, Dec 19, 2023 at 05:35:14AM +0100, Christoph Hellwig wrote:
> On Mon, Dec 18, 2023 at 04:30:54PM +0100, Daniel Wagner wrote:
> > The module unload code will wait for a controller to be deleted even when
> > there is no controller, and then we wait forever for the completion.
> > Thus only wait for the completion when there is a controller which
> > needs to be removed.
>
> This whole code looks fishy to me, and I suspect this patch only papers
> over it. Why do we do this wait to start with? If we've found that out and
> documented it, the code really should be using a wait_event variant that
> checks for the actual condition (no more controllers), because without
> that you might still have a race otherwise.
The synchronization here does feel off, but Daniel's change looks
correct for the current implementation and is a minimal diff to fix it.
Do you want to see this re-worked with a better wait condition or can we
proceed with this?
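
For readers following along, the decision the patch makes can be modelled
in plain userspace C. This is only a sketch under simplifying assumptions:
the model_* types and functions are hypothetical, controllers are reduced
to a counter, and the decrement stands in for nvme_delete_ctrl(), which in
the real driver only starts an asynchronous deletion. What it shows is that
the "need to wait" flag is derived from whether any controller was actually
found, not from whether the lport list is non-empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for nvme_fc_rport and nvme_fc_lport. */
struct model_rport {
	int nr_ctrls;			/* controllers attached to this remote port */
};

struct model_lport {
	struct model_rport *rports;
	size_t nr_rports;
};

/* Delete every controller on one rport; report whether any existed. */
static bool model_delete_controllers(struct model_rport *rport)
{
	bool cleanup = false;

	assert(rport != NULL);
	while (rport->nr_ctrls > 0) {
		rport->nr_ctrls--;	/* stands in for nvme_delete_ctrl() */
		cleanup = true;
	}
	return cleanup;
}

/* Walk all ports; true only if at least one deletion was started. */
static bool model_cleanup_for_unload(struct model_lport *lports, size_t nr_lports)
{
	bool cleanup = false;
	size_t i, j;

	for (i = 0; i < nr_lports; i++) {
		for (j = 0; j < lports[i].nr_rports; j++) {
			if (model_delete_controllers(&lports[i].rports[j]))
				cleanup = true;
		}
	}
	return cleanup;
}

/* Decide whether module exit has to block on the completion. */
static bool model_exit_must_wait(struct model_lport *lports, size_t nr_lports)
{
	if (nr_lports == 0)
		return false;	/* empty lport list: nothing to clean up */
	return model_cleanup_for_unload(lports, nr_lports);
}
```

An lport whose rports carry no controllers yields false here, which is
exactly the case where the old code set need_cleanup and then waited for a
completion that nobody would ever signal.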
@@ -3947,10 +3947,11 @@ static int __init nvme_fc_init_module(void)
 	return ret;
 }
 
-static void
+static bool
 nvme_fc_delete_controllers(struct nvme_fc_rport *rport)
 {
 	struct nvme_fc_ctrl *ctrl;
+	bool cleanup = false;
 
 	spin_lock(&rport->lock);
 	list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) {
@@ -3958,21 +3959,28 @@ nvme_fc_delete_controllers(struct nvme_fc_rport *rport)
 			"NVME-FC{%d}: transport unloading: deleting ctrl\n",
 			ctrl->cnum);
 		nvme_delete_ctrl(&ctrl->ctrl);
+		cleanup = true;
 	}
 	spin_unlock(&rport->lock);
+
+	return cleanup;
 }
 
-static void
+static bool
 nvme_fc_cleanup_for_unload(void)
 {
 	struct nvme_fc_lport *lport;
 	struct nvme_fc_rport *rport;
+	bool cleanup = false;
 
 	list_for_each_entry(lport, &nvme_fc_lport_list, port_list) {
 		list_for_each_entry(rport, &lport->endp_list, endp_list) {
-			nvme_fc_delete_controllers(rport);
+			if (nvme_fc_delete_controllers(rport))
+				cleanup = true;
 		}
 	}
+
+	return cleanup;
 }
 
 static void __exit nvme_fc_exit_module(void)
@@ -3982,10 +3990,8 @@ static void __exit nvme_fc_exit_module(void)
 
 	spin_lock_irqsave(&nvme_fc_lock, flags);
 	nvme_fc_waiting_to_unload = true;
-	if (!list_empty(&nvme_fc_lport_list)) {
-		need_cleanup = true;
-		nvme_fc_cleanup_for_unload();
-	}
+	if (!list_empty(&nvme_fc_lport_list))
+		need_cleanup = nvme_fc_cleanup_for_unload();
 	spin_unlock_irqrestore(&nvme_fc_lock, flags);
 	if (need_cleanup) {
 		pr_info("%s: waiting for ctlr deletes\n", __func__);