[RFC] ata: libata-eh: Retry the cmnd when normal complete occurrd after scsi timeout

Message ID 1672220216-46938-1-git-send-email-wubo40@huawei.com
State New
Headers
Series [RFC] ata: libata-eh: Retry the cmnd when normal complete occurrd after scsi timeout |

Commit Message

Wu Bo Dec. 28, 2022, 9:36 a.m. UTC
  From: wubo <wubo40@huawei.com>

Hi,

Now SCSI middle layer EH and normal IO handler can only choose one of them,
after the SCSI command is completed normally after scsi timeout period,
Should this scenario be given a chance to retry?

Signed-off-by: wubo <wubo40@huawei.com>
---
 drivers/ata/libata-eh.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)
  

Comments

Niklas Cassel Dec. 28, 2022, 5:59 p.m. UTC | #1
On Wed, Dec 28, 2022 at 05:36:56PM +0800, Wu Bo wrote:
> From: wubo <wubo40@huawei.com>
> 
> Hi,
> 
> Now SCSI middle layer EH and normal IO handler can only choose one of them,
> after the SCSI command is completed normally after scsi timeout period,
> Should this scenario be given a chance to retry?
> 
> Signed-off-by: wubo <wubo40@huawei.com>
> ---
>  drivers/ata/libata-eh.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 34303ce..8d1856f 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -617,14 +617,8 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
>  					qc->flags |= ATA_QCFLAG_FAILED;
>  					nr_timedout++;
>  				}
> -			} else {
> -				/* Normal completion occurred after
> -				 * SCSI timeout but before this point.
> -				 * Successfully complete it.
> -				 */
> -				scmd->retries = scmd->allowed;
> +			} else
>  				scsi_eh_finish_cmd(scmd, &ap->eh_done_q);
> -			}
>  		}
>

Hello Wu,


The function comment looks like this:
	/* For new EH, all qcs are finished in one of three ways -
	 * normal completion, error completion, and SCSI timeout.
	 * Both completions can race against SCSI timeout.  When normal
	 * completion wins, the qc never reaches EH.  When error
	 * completion wins, the qc has ATA_QCFLAG_FAILED set.
	 *
	 * When SCSI timeout wins, things are a bit more complex.
	 * Normal or error completion can occur after the timeout but
	 * before this point.  In such cases, both types of
	 * completions are honored.  A scmd is determined to have
	 * timed out iff its associated qc is active and not failed.
	 */

And the code looks like this:

			if (i < ATA_MAX_QUEUE) {
				/* the scmd has an associated qc */
				if (!(qc->flags & ATA_QCFLAG_FAILED)) {
					/* which hasn't failed yet, timeout */
					qc->err_mask |= AC_ERR_TIMEOUT;
					qc->flags |= ATA_QCFLAG_FAILED;
					nr_timedout++;
				}
			} else {
				/* Normal completion occurred after
				 * SCSI timeout but before this point.
				 * Successfully complete it.
				 */
				scmd->retries = scmd->allowed;
				scsi_eh_finish_cmd(scmd, &ap->eh_done_q);
			}



If SCSI timeout wins, but there came an error completion after,
then we will go into the if (i < ATA_MAX_QUEUE) case, but we will
not enter the if !(qc->flags & ATA_QCFLAG_FAILED), as ATA_QCFLAG_FAILED
will already be set by the irq handler. This QC will be completed
by the ata_scsi_port_error_handler(), which gets to run just after
this function has returned:
https://github.com/torvalds/linux/blob/v6.2-rc1/drivers/ata/libata-eh.c#L546



The else clause you are modifying however is for the case where SCSI timeout
wins, but there came a normal completion occurred after the SCSI timeout.

In more detail, what happens first is that scsi_timeout() gets called,
and if scsi timeout wins, it sets SCMD_STATE_COMPLETE:
https://github.com/torvalds/linux/blob/v6.2-rc1/drivers/scsi/scsi_error.c#L355
and then schedules EH for that command using scsi_eh_scmd_add().

What happens next in this specific case is that the IRQ handler is called,
takes the ap->lock (which is also taken is this function so that it can run
at the same time as the IRQ handler), then the IRQ handler calls
__ata_qc_complete() for the QC, however, when scsi_done() is finally called
in ata_qc_done() (from IRQ context), it will not be a no-op, since
SCMD_STATE_COMPLETE is already set:
https://github.com/torvalds/linux/blob/v6.2-rc1/drivers/scsi/scsi_lib.c#L1623

Since scsi_done() never finished the scsi_cmd, we need to finish it here,
in the else clause, by calling scsi_eh_finish_cmd().

When the EH queue is flushed, it will check if scsi_cmd_retry_allowed()
and if it is, the command will be retried, otherwise it will call scsi_finish()
on the command:
https://github.com/torvalds/linux/blob/v6.2-rc1/drivers/scsi/scsi_error.c#L2150

Considering that we want to only finish the scmd here, libata sets
scmd->retries = scmd->allowed; such that the check:
	return ++cmd->retries <= cmd->allowed;
in scsi_cmd_retry_allowed() will evaulate to false.


So TL;DR:
It is absolutely essential to set scmd->retries = scmd->allowed;
in this else clause, as that is the only reason why this command will
be finished instead of retried.
Since this else clause is for a command that timed out, but got completed
successfully via the IRQ handler after timing out (so the QC got freed),
we only need to finish the scmd. Retrying the scmd is wrong in this case.


Kind regards,
Niklas
  
Wu Bo Dec. 29, 2022, 3:17 a.m. UTC | #2
On 2022/12/29 1:59, Niklas Cassel wrote:
> On Wed, Dec 28, 2022 at 05:36:56PM +0800, Wu Bo wrote:
>> From: wubo <wubo40@huawei.com>
>>
>> Hi,
>>
>> Now SCSI middle layer EH and normal IO handler can only choose one of them,
>> after the SCSI command is completed normally after scsi timeout period,
>> Should this scenario be given a chance to retry?
>>
>> Signed-off-by: wubo <wubo40@huawei.com>
>> ---
>>   drivers/ata/libata-eh.c | 8 +-------
>>   1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
>> index 34303ce..8d1856f 100644
>> --- a/drivers/ata/libata-eh.c
>> +++ b/drivers/ata/libata-eh.c
>> @@ -617,14 +617,8 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
>>   					qc->flags |= ATA_QCFLAG_FAILED;
>>   					nr_timedout++;
>>   				}
>> -			} else {
>> -				/* Normal completion occurred after
>> -				 * SCSI timeout but before this point.
>> -				 * Successfully complete it.
>> -				 */
>> -				scmd->retries = scmd->allowed;
>> +			} else
>>   				scsi_eh_finish_cmd(scmd, &ap->eh_done_q);
>> -			}
>>   		}
>>
> 
> Hello Wu,
> 
> 
> The function comment looks like this:
> 	/* For new EH, all qcs are finished in one of three ways -
> 	 * normal completion, error completion, and SCSI timeout.
> 	 * Both completions can race against SCSI timeout.  When normal
> 	 * completion wins, the qc never reaches EH.  When error
> 	 * completion wins, the qc has ATA_QCFLAG_FAILED set.
> 	 *
> 	 * When SCSI timeout wins, things are a bit more complex.
> 	 * Normal or error completion can occur after the timeout but
> 	 * before this point.  In such cases, both types of
> 	 * completions are honored.  A scmd is determined to have
> 	 * timed out iff its associated qc is active and not failed.
> 	 */
> 
> And the code looks like this:
> 
> 			if (i < ATA_MAX_QUEUE) {
> 				/* the scmd has an associated qc */
> 				if (!(qc->flags & ATA_QCFLAG_FAILED)) {
> 					/* which hasn't failed yet, timeout */
> 					qc->err_mask |= AC_ERR_TIMEOUT;
> 					qc->flags |= ATA_QCFLAG_FAILED;
> 					nr_timedout++;
> 				}
> 			} else {
> 				/* Normal completion occurred after
> 				 * SCSI timeout but before this point.
> 				 * Successfully complete it.
> 				 */
> 				scmd->retries = scmd->allowed;
> 				scsi_eh_finish_cmd(scmd, &ap->eh_done_q);
> 			}
> 
> 
> 
> If SCSI timeout wins, but there came an error completion after,
> then we will go into the if (i < ATA_MAX_QUEUE) case, but we will
> not enter the if !(qc->flags & ATA_QCFLAG_FAILED), as ATA_QCFLAG_FAILED
> will already be set by the irq handler. This QC will be completed
> by the ata_scsi_port_error_handler(), which gets to run just after
> this function has returned:
> https://github.com/torvalds/linux/blob/v6.2-rc1/drivers/ata/libata-eh.c#L546
> 
> 
> 
> The else clause you are modifying however is for the case where SCSI timeout
> wins, but there came a normal completion occurred after the SCSI timeout.
> 
> In more detail, what happens first is that scsi_timeout() gets called,
> and if scsi timeout wins, it sets SCMD_STATE_COMPLETE:
> https://github.com/torvalds/linux/blob/v6.2-rc1/drivers/scsi/scsi_error.c#L355
> and then schedules EH for that command using scsi_eh_scmd_add().
> 
> What happens next in this specific case is that the IRQ handler is called,
> takes the ap->lock (which is also taken is this function so that it can run
> at the same time as the IRQ handler), then the IRQ handler calls
> __ata_qc_complete() for the QC, however, when scsi_done() is finally called
> in ata_qc_done() (from IRQ context), it will not be a no-op, since
> SCMD_STATE_COMPLETE is already set:
> https://github.com/torvalds/linux/blob/v6.2-rc1/drivers/scsi/scsi_lib.c#L1623
> 
> Since scsi_done() never finished the scsi_cmd, we need to finish it here,
> in the else clause, by calling scsi_eh_finish_cmd().
> 
> When the EH queue is flushed, it will check if scsi_cmd_retry_allowed()
> and if it is, the command will be retried, otherwise it will call scsi_finish()
> on the command:
> https://github.com/torvalds/linux/blob/v6.2-rc1/drivers/scsi/scsi_error.c#L2150
> 
> Considering that we want to only finish the scmd here, libata sets
> scmd->retries = scmd->allowed; such that the check:
> 	return ++cmd->retries <= cmd->allowed;
> in scsi_cmd_retry_allowed() will evaulate to false.
> 
> 
> So TL;DR:
> It is absolutely essential to set scmd->retries = scmd->allowed;
> in this else clause, as that is the only reason why this command will
> be finished instead of retried.
> Since this else clause is for a command that timed out, but got completed
> successfully via the IRQ handler after timing out (so the QC got freed),
> we only need to finish the scmd. Retrying the scmd is wrong in this case.
> 
> 
> Kind regards,
> Niklas.

Hi Niklas,

Thanks for your detailed reply.

The case where SCSI timeout wins,
but there came a normal completion occurred after the SCSI timeout.
in this scenario, The EH process cannot get the command status
from the IRQ handler process.

If the command is not retried, it can only be returned to the upper 
layer with timeout state.

Log as fllows:
sd 1:0:0:0: [sda] tag#30 scsi_eh_1: flush retry cmd, scmd->retries:3, 
scmd->allowed:2
sd 1:0:0:0: [sda] tag#30 FAILED Result: hostbyte=DID_OK 
driverbyte=DRIVER_TIMEOUT
sd 1:0:0:0: [sda] tag#30 CDB: Read(10) 28 00 07 5e e6 7b 00 00 01 00
print_req_error: I/O error, dev sda, sector 123659899

In fact, The I/O can be completed normally, but it takes a little 
longer. In this scenario, do we give the opportunity as much as possible
so that the IO can return successfully?
  
Niklas Cassel Dec. 29, 2022, 9:21 a.m. UTC | #3
On Thu, Dec 29, 2022 at 11:17:31AM +0800, Wu Bo wrote:
> 
> Hi Niklas,
> 
> Thanks for your detailed reply.
> 
> The case where SCSI timeout wins,
> but there came a normal completion occurred after the SCSI timeout.
> in this scenario, The EH process cannot get the command status
> from the IRQ handler process.
> 
> If the command is not retried, it can only be returned to the upper layer
> with timeout state.

So the possible scenarios are:
1) command completed normally via IRQ handler
2) command completed with error via the IRQ hander

3) SCSI timeout, which is by default 30 seconds. No IRQ came for this specific
   QC for 30 seconds. You should always get a completion for a command
   even if it is error. If the HBA did not send an IRQ for this specific QC
   for 30 seconds, it is obviously misbehaving. libata will reset the HBA,
   and outstanding QCs will be retried.
4) SCSI timeout, so no IRQ came for this specific QC for 30 seconds, so EH gets
   scheduled for this QC, however it takes some milliseconds before the EH
   thread starts running, and then it needs to call the EH ->eh_strategy_handler()
   (ata_scsi_error()) which takes the ap->lock.
   Between the time EH gets scheduled and the ap->lock gets acquired,
   there comes a completion via the IRQ handler that completes the
   command normally.
5) Same as 4) but the IRQ handler completes the command with error.

Note that 4) and 5) are very rare cases.
3) is way more common than 4) and 5).

> 
> Log as fllows:
> sd 1:0:0:0: [sda] tag#30 scsi_eh_1: flush retry cmd, scmd->retries:3,
> scmd->allowed:2
> sd 1:0:0:0: [sda] tag#30 FAILED Result: hostbyte=DID_OK
> driverbyte=DRIVER_TIMEOUT
> sd 1:0:0:0: [sda] tag#30 CDB: Read(10) 28 00 07 5e e6 7b 00 00 01 00
> print_req_error: I/O error, dev sda, sector 123659899

How do you know that this is for scenario 4) ?
You have no prints of the IRQ handler, and the QCs it completed.

To me, it actually looks like scenario 3),
especially since you have "flush retry cmd".

In scenario 3), timed out commands will be retried after the HBA is reset.
Do you see that the HBA gets reset in your log?

If it was scenario 4) or 5) I would have expected a "flush finish cmd"
print instead of a "flush retry cmd" print.


Kind regards,
Niklas
  

Patch

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 34303ce..8d1856f 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -617,14 +617,8 @@  void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
 					qc->flags |= ATA_QCFLAG_FAILED;
 					nr_timedout++;
 				}
-			} else {
-				/* Normal completion occurred after
-				 * SCSI timeout but before this point.
-				 * Successfully complete it.
-				 */
-				scmd->retries = scmd->allowed;
+			} else
 				scsi_eh_finish_cmd(scmd, &ap->eh_done_q);
-			}
 		}
 
 		/* If we have timed out qcs.  They belong to EH from