USB:UAS:return ENODEV when submit urbs fail with device not attached.

Message ID 20240222165441.6148-1-WeitaoWang-oc@zhaoxin.com
State New
Series USB:UAS:return ENODEV when submit urbs fail with device not attached.

Commit Message

Weitao Wang Feb. 22, 2024, 4:54 p.m. UTC
  When the system enters hibernation with a udisk attached, the udisk may
disappear or fail to resume during the thaw phase of hibernation, in which
case its state is set to NOTATTACHED. However, usb_hub_wq has already been
frozen at that point and cannot handle the disconnect event. Later, in the
poweroff phase of hibernation, a sync cache SCSI command is sent to this
udisk, so uas_submit_urbs is called to submit URBs to the sense/data/cmd
pipes. usb_submit_urb returns -ENODEV because the device is in the
NOTATTACHED state, but uas_submit_urbs always returns
SCSI_MLQUEUE_DEVICE_BUSY regardless of why the submission failed. As a
result, the SCSI layer keeps retrying the command in an endless loop and
the system fails to enter hibernation.
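
The retry loop described above can be modelled in userspace. The sketch
below is only an illustration under stated assumptions: fake_usb_submit_urb,
old_submit and dispatch_attempts are hypothetical names, the dispatch loop is
a toy stand-in for the SCSI midlayer's requeue behaviour, and the constant
mirrors the value in include/scsi/scsi.h rather than including it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in mirroring include/scsi/scsi.h; kernel code uses the real header. */
#define SCSI_MLQUEUE_DEVICE_BUSY 0x1056

/* Hypothetical stand-in for usb_submit_urb(): fails with -ENODEV once the
 * device has been marked NOTATTACHED. */
static int fake_usb_submit_urb(bool attached)
{
	return attached ? 0 : -ENODEV;
}

/* Pre-patch behaviour: every submission failure is reported as "busy",
 * so the reason (-ENODEV) is lost. */
static int old_submit(bool attached)
{
	return fake_usb_submit_urb(attached) ? SCSI_MLQUEUE_DEVICE_BUSY : 0;
}

/* Toy dispatch loop: requeue for as long as the driver reports busy.
 * Returns the attempt on which the command was accepted, or -1 if the
 * cap was reached without progress. */
static int dispatch_attempts(bool attached, int max_attempts)
{
	assert(max_attempts > 0);
	for (int i = 1; i <= max_attempts; i++) {
		if (old_submit(attached) != SCSI_MLQUEUE_DEVICE_BUSY)
			return i;
	}
	return -1;
}
```

With an attached device the command is accepted on the first attempt. With a
NOTATTACHED device no number of retries helps, because a detached device
never stops being "busy" in this model.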

To fix this issue, make uas_submit_urbs return the real error, -ENODEV,
when URB submission fails because the device is in the NOTATTACHED state.
In the error check in uas_queuecommand_lck, reporting DID_ERROR causes the
device poweroff to fail and the system to shut down instead of entering
hibernation, so replace DID_ERROR with DID_NO_CONNECT when reporting to the
SCSI upper layer.
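
The intended error translation can be sketched as a small userspace model.
This is an illustration, not the driver code: uas_map_submit_err,
queuecommand_tail and struct fake_cmnd are hypothetical names, the host-byte
update imitates what set_host_byte does to bits 16-23 of the result, and the
constants mirror the kernel's SCSI headers instead of including them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins mirroring the kernel's SCSI headers. */
#define SCSI_MLQUEUE_DEVICE_BUSY 0x1056
#define DID_NO_CONNECT           0x01

/* Hypothetical helper: classify a usb_submit_urb() error as the patch does.
 * -ENODEV is kept distinct; anything else is treated as a transient busy. */
static int uas_map_submit_err(int err)
{
	if (err == 0)
		return 0;
	return (err == -ENODEV) ? -ENODEV : SCSI_MLQUEUE_DEVICE_BUSY;
}

/* Hypothetical miniature of struct scsi_cmnd: result word plus a flag
 * recording whether the command was completed. */
struct fake_cmnd {
	unsigned int result;
	int done;
};

/* Simplified model of the tail of uas_queuecommand_lck: on device loss the
 * command is completed with DID_NO_CONNECT and 0 is returned, so -ENODEV
 * itself never reaches the SCSI layer. */
static int queuecommand_tail(struct fake_cmnd *cmnd, int submit_err)
{
	int err = uas_map_submit_err(submit_err);

	assert(cmnd != NULL);
	if (err == -ENODEV) {
		cmnd->result = (cmnd->result & 0xff00ffffu) |
			       ((unsigned int)DID_NO_CONNECT << 16);
		cmnd->done = 1;
		return 0;
	}
	return err;
}
```

In this model only 0 or SCSI_MLQUEUE_DEVICE_BUSY is ever returned to the
caller, and device loss is visible solely through the host byte of the
completed command.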

Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
---
 drivers/usb/storage/uas.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)
  

Comments

Oliver Neukum Feb. 22, 2024, 9:47 a.m. UTC | #1
On 22.02.24 17:54, Weitao Wang wrote:
> In the scenario of entering hibernation with udisk in the system, if the
> udisk was gone or resume fail in the thaw phase of hibernation. Its state
> will be set to NOTATTACHED. However, usb_hub_wq was already freezed and
> can't not handle disconnect event. Then, sync cache SCSI command will be
> sent to this udisk on the poweroff phase of hibernation, that will cause

Wait, this seems like a contradiction. Are we in thaw or are we powering off?

> uas_submit_urbs to be called to submit URB to sense/data/cmd pipe. Then,
> usb_submit_urb return value -ENODEV when device was set to NOTATTACHED
> state. However, uas_submit_urbs always return "SCSI_MLQUEUE_DEVICE_BUSY"
> regardless of the reason for submission failure.That will lead the SCSI
> layer go into an ugly loop and system fail to go into hibernation.

The thing is that the SCSI documentation explicitly tells us to return
either SCSI_MLQUEUE_DEVICE_BUSY or SCSI_MLQUEUE_HOST_BUSY. Now, it makes
sense to tell the SCSI layer that a device or host is gone for good,
if we know that. But we cannot just introduce new error returns on our own.

This needs to be addressed. That means that the SCSI layer or at the
very least the documentation needs to be fixed. Frankly, this is not strictly
speaking a UAS issue. Anything hot-unpluggable should have this issue.

	Regards
		Oliver
  
Weitao Wang Feb. 22, 2024, 8:06 p.m. UTC | #2
On 2024/2/22 17:47, Oliver Neukum wrote:
> 

> On 22.02.24 17:54, Weitao Wang wrote:
>> In the scenario of entering hibernation with udisk in the system, if the
>> udisk was gone or resume fail in the thaw phase of hibernation. Its state
>> will be set to NOTATTACHED. However, usb_hub_wq was already freezed and
>> can't not handle disconnect event. Then, sync cache SCSI command will be
>> sent to this udisk on the poweroff phase of hibernation, that will cause
> 
> Wait, this seems like a contradiction. Are we in thaw or are we powering off?

This failure appears in the poweroff phase of hibernation.

>> uas_submit_urbs to be called to submit URB to sense/data/cmd pipe. Then,
>> usb_submit_urb return value -ENODEV when device was set to NOTATTACHED
>> state. However, uas_submit_urbs always return "SCSI_MLQUEUE_DEVICE_BUSY"
>> regardless of the reason for submission failure.That will lead the SCSI
>> layer go into an ugly loop and system fail to go into hibernation.
> 
> The thing is that the SCSI documentation explicitly tells us to return
> either SCSI_MLQUEUE_DEVICE_BUSY or SCSI_MLQUEUE_HOST_BUSY. Now, it makes
> sense to tell the SCSI layer that a device or host is gone for good,
> if we know that. But we cannot just introduce new error returns on our own.
> 
> This needs to be addressed. That means that the SCSI layer or at the
> very least the documentation needs to be fixed. Frankly, this is not strictly
> speaking a UAS issue. Anything hot-unpluggable should have this issue.
> 

Maybe my description was not accurate enough. The patch does not add a new
return value for the SCSI layer; it only adds a case inside the uas driver
to indicate that the device is gone, and the -ENODEV error code is not
returned to the SCSI layer. The SCSI layer is notified of the device loss
only through the DID_NO_CONNECT host byte. The intent is to fix this issue
inside the uas driver.

Thanks and best regards,
weitao
  
Oliver Neukum Feb. 27, 2024, 9:05 a.m. UTC | #3
On 22.02.24 21:06, WeitaoWang-oc@zhaoxin.com wrote:

> Maybe, my description was not accurate enough, here not add new return
> value to scsi layer,it just add a case to tell device is gone in the uas
> driver internal and the ENODEV error code not return to scsi layer.
> Here just notify SCSI layer of device loss through flag DID_NO_CONNECT.
> This is also hope to fix this issue in the uas driver internal.

Hi,

sorry for the delay. OK, I see what you are aiming at. Could you redo
the patch with a better description, like:

We need to translate -ENODEV to DID_NO_CONNECT for the SCSI layer.

	Regards
		Oliver
  
Weitao Wang Feb. 27, 2024, 9:35 p.m. UTC | #4
On 2024/2/27 17:05, Oliver Neukum wrote:
> 

> On 22.02.24 21:06, WeitaoWang-oc@zhaoxin.com wrote:
> 
>> Maybe, my description was not accurate enough, here not add new return
>> value to scsi layer,it just add a case to tell device is gone in the uas
>> driver internal and the ENODEV error code not return to scsi layer.
>> Here just notify SCSI layer of device loss through flag DID_NO_CONNECT.
>> This is also hope to fix this issue in the uas driver internal.
> 
> Hi,
> 
> sorry for the delay. OK, I see what you are aiming at. Could you redo
> the patch with a better description, like:
> 
> We need to translate -ENODEV to DID_NO_CONNECT for the SCSI layer.
> 
Okay, thanks for your suggestion. I'll improve the patch description in
the next version.

Thanks & Best regards,
Weitao
  

Patch

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index 9707f53cfda9..967f18db525a 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -533,7 +533,7 @@  static struct urb *uas_alloc_cmd_urb(struct uas_dev_info *devinfo, gfp_t gfp,
  * daft to me.
  */
 
-static struct urb *uas_submit_sense_urb(struct scsi_cmnd *cmnd, gfp_t gfp)
+static int uas_submit_sense_urb(struct scsi_cmnd *cmnd, gfp_t gfp)
 {
 	struct uas_dev_info *devinfo = cmnd->device->hostdata;
 	struct urb *urb;
@@ -541,16 +541,15 @@  static struct urb *uas_submit_sense_urb(struct scsi_cmnd *cmnd, gfp_t gfp)
 
 	urb = uas_alloc_sense_urb(devinfo, gfp, cmnd);
 	if (!urb)
-		return NULL;
+		return -ENOMEM;
 	usb_anchor_urb(urb, &devinfo->sense_urbs);
 	err = usb_submit_urb(urb, gfp);
 	if (err) {
 		usb_unanchor_urb(urb);
 		uas_log_cmd_state(cmnd, "sense submit err", err);
 		usb_free_urb(urb);
-		return NULL;
 	}
-	return urb;
+	return err;
 }
 
 static int uas_submit_urbs(struct scsi_cmnd *cmnd,
@@ -562,9 +561,9 @@  static int uas_submit_urbs(struct scsi_cmnd *cmnd,
 
 	lockdep_assert_held(&devinfo->lock);
 	if (cmdinfo->state & SUBMIT_STATUS_URB) {
-		urb = uas_submit_sense_urb(cmnd, GFP_ATOMIC);
-		if (!urb)
-			return SCSI_MLQUEUE_DEVICE_BUSY;
+		err = uas_submit_sense_urb(cmnd, GFP_ATOMIC);
+		if (err)
+			return (err == -ENODEV) ? -ENODEV : SCSI_MLQUEUE_DEVICE_BUSY;
 		cmdinfo->state &= ~SUBMIT_STATUS_URB;
 	}
 
@@ -582,7 +581,7 @@  static int uas_submit_urbs(struct scsi_cmnd *cmnd,
 		if (err) {
 			usb_unanchor_urb(cmdinfo->data_in_urb);
 			uas_log_cmd_state(cmnd, "data in submit err", err);
-			return SCSI_MLQUEUE_DEVICE_BUSY;
+			return (err == -ENODEV) ? -ENODEV : SCSI_MLQUEUE_DEVICE_BUSY;
 		}
 		cmdinfo->state &= ~SUBMIT_DATA_IN_URB;
 		cmdinfo->state |= DATA_IN_URB_INFLIGHT;
@@ -602,7 +601,7 @@  static int uas_submit_urbs(struct scsi_cmnd *cmnd,
 		if (err) {
 			usb_unanchor_urb(cmdinfo->data_out_urb);
 			uas_log_cmd_state(cmnd, "data out submit err", err);
-			return SCSI_MLQUEUE_DEVICE_BUSY;
+			return (err == -ENODEV) ? -ENODEV : SCSI_MLQUEUE_DEVICE_BUSY;
 		}
 		cmdinfo->state &= ~SUBMIT_DATA_OUT_URB;
 		cmdinfo->state |= DATA_OUT_URB_INFLIGHT;
@@ -621,7 +620,7 @@  static int uas_submit_urbs(struct scsi_cmnd *cmnd,
 		if (err) {
 			usb_unanchor_urb(cmdinfo->cmd_urb);
 			uas_log_cmd_state(cmnd, "cmd submit err", err);
-			return SCSI_MLQUEUE_DEVICE_BUSY;
+			return (err == -ENODEV) ? -ENODEV : SCSI_MLQUEUE_DEVICE_BUSY;
 		}
 		cmdinfo->cmd_urb = NULL;
 		cmdinfo->state &= ~SUBMIT_CMD_URB;
@@ -698,7 +697,7 @@  static int uas_queuecommand_lck(struct scsi_cmnd *cmnd)
 	 * of queueing, no matter how fatal the error
 	 */
 	if (err == -ENODEV) {
-		set_host_byte(cmnd, DID_ERROR);
+		set_host_byte(cmnd, DID_NO_CONNECT);
 		scsi_done(cmnd);
 		goto zombie;
 	}