[7/7] s390/vfio_ap: always clean up IRQ resources

Message ID 20221213154437.15480-8-akrowiak@linux.ibm.com
State New
Headers
Series improve AP queue reset processing |

Commit Message

Anthony Krowiak Dec. 13, 2022, 3:44 p.m. UTC
  Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an error
not handled by a case statement.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Harald Freudenberger Dec. 15, 2022, 10:58 a.m. UTC | #1
On 2022-12-13 16:44, Tony Krowiak wrote:
> Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an 
> error
> not handled by a case statement.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
> b/drivers/s390/crypto/vfio_ap_ops.c
> index e80c5a6b91be..2dd8db9ddb39 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1676,7 +1676,7 @@ static int vfio_ap_mdev_reset_queue(struct
> vfio_ap_queue *q)
>  		     "PQAP/ZAPQ for %02x.%04x failed with invalid rc=%u\n",
>  		     AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
>  		     status.response_code);
> -		return -EIO;
> +		break;
>  	}
> 
>  	vfio_ap_free_aqic_resources(q);

Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
  
Halil Pasic Dec. 19, 2022, 2:10 p.m. UTC | #2
On Tue, 13 Dec 2022 10:44:37 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an error
> not handled by a case statement.

Why?

I'm afraid this is a step in the wrong direction...

> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index e80c5a6b91be..2dd8db9ddb39 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1676,7 +1676,7 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>  		     "PQAP/ZAPQ for %02x.%04x failed with invalid rc=%u\n",
>  		     AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
>  		     status.response_code);
> -		return -EIO;
> +		break;
>  	}
>  
>  	vfio_ap_free_aqic_resources(q);
  
Anthony Krowiak Dec. 20, 2022, 2:33 p.m. UTC | #3
On 12/19/22 9:10 AM, Halil Pasic wrote:
> On Tue, 13 Dec 2022 10:44:37 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an error
>> not handled by a case statement.
> Why?


If the ZAPQ failed, then instructions submitted to the same queue will 
likewise fail. Are you saying it's not safe to assume, therefore, that 
interrupts will not be occurring?


>
> I'm afraid this is a step in the wrong direction...


Please explain why.


>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index e80c5a6b91be..2dd8db9ddb39 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -1676,7 +1676,7 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>>   		     "PQAP/ZAPQ for %02x.%04x failed with invalid rc=%u\n",
>>   		     AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
>>   		     status.response_code);
>> -		return -EIO;
>> +		break;
>>   	}
>>   
>>   	vfio_ap_free_aqic_resources(q);
  
Halil Pasic Dec. 20, 2022, 5:24 p.m. UTC | #4
On Tue, 20 Dec 2022 09:33:03 -0500
Anthony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 12/19/22 9:10 AM, Halil Pasic wrote:
> > On Tue, 13 Dec 2022 10:44:37 -0500
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >  
> >> Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an error
> >> not handled by a case statement.  
> > Why?  
> 
> 
> If the ZAPQ failed, then instructions submitted to the same queue will 
> likewise fail. Are you saying it's not safe to assume, therefore, that 
> interrupts will not be occurring?

Right. We are talking about the default branch here, and I suppose, the
codes where we know that it is safe to assume that no reset is needed
handled separately (AP_RESPONSE_DECONFIGURED).

I'm not convinced that if we take the default branch we can safely
assume, that we won't see any interrupts.

For example consider hot-unplug as done by KVM. We modify the
CRYCB/APCB with all vCPUS take out of SIE, but we don't keep
the vCPUs out of SIE until the resets of the unpugged queues
are done, and we don't do any extra interrupt disablement
with all vCPUs keept out of SIE. So I believe currently there
may be a window where the guest can observe a 01 but the
interrupts are still live. That may be a bug, but IMHO it ain't clear
cut.

But it is not just about interrupts. Before we returned an error
code, which gets propagated to the userspace if this reset was
triggered via the ioctl.

With this change, ret seems to be uninitialized when returned 
if we take the code path which you change here. So we would
end up logging a warning and returning garbage?

One could also debate, whether RCs introduced down the road
can affect the logic here (even if the statement "if we
see an RC other that 00 and 02, we don't need to pursue a
reset any further, and interrpts are disabled" were to be
guaranteed to be true now, new RCs could theoretically mess
this up).

 
> 
> 
> >
> > I'm afraid this is a step in the wrong direction...  
> 
> 
> Please explain why.
> 

Sorry, I kept this brief because IMHO it is your job to tell us why
this needs to be changed. But I gave in, as you see.

Regards,
Halil
  
Anthony Krowiak Jan. 9, 2023, 7:40 p.m. UTC | #5
On 12/20/22 12:24 PM, Halil Pasic wrote:
> On Tue, 20 Dec 2022 09:33:03 -0500
> Anthony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On 12/19/22 9:10 AM, Halil Pasic wrote:
>>> On Tue, 13 Dec 2022 10:44:37 -0500
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>   
>>>> Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an error
>>>> not handled by a case statement.
>>> Why?
>>
>> If the ZAPQ failed, then instructions submitted to the same queue will
>> likewise fail. Are you saying it's not safe to assume, therefore, that
>> interrupts will not be occurring?
> Right. We are talking about the default branch here, and I suppose, the
> codes where we know that it is safe to assume that no reset is needed
> handled separately (AP_RESPONSE_DECONFIGURED).
>
> I'm not convinced that if we take the default branch we can safely
> assume, that we won't see any interrupts.
>
> For example consider hot-unplug as done by KVM. We modify the
> CRYCB/APCB with all vCPUS take out of SIE, but we don't keep
> the vCPUs out of SIE until the resets of the unpugged queues
> are done, and we don't do any extra interrupt disablement
> with all vCPUs keept out of SIE. So I believe currently there
> may be a window where the guest can observe a 01 but the
> interrupts are still live. That may be a bug, but IMHO it ain't clear
> cut.
>
> But it is not just about interrupts. Before we returned an error
> code, which gets propagated to the userspace if this reset was
> triggered via the ioctl.
>
> With this change, ret seems to be uninitialized when returned
> if we take the code path which you change here. So we would
> end up logging a warning and returning garbage?


That was an oversight. The -EIO value was returned previously, so the 
ret = -EIO should be set in the default case.


>
> One could also debate, whether RCs introduced down the road
> can affect the logic here (even if the statement "if we
> see an RC other that 00 and 02, we don't need to pursue a
> reset any further, and interrpts are disabled" were to be
> guaranteed to be true now, new RCs could theoretically mess
> this up).


I think that would be the case regardless of this change. If new RCs are 
introduced, this function ought to be revisited anyway and appropriate 
changes made.


>
>   
>>
>>> I'm afraid this is a step in the wrong direction...
>>
>> Please explain why.
>>
> Sorry, I kept this brief because IMHO it is your job to tell us why
> this needs to be changed. But I gave in, as you see.
>
> Regards,
> Halil
  

Patch

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e80c5a6b91be..2dd8db9ddb39 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1676,7 +1676,7 @@  static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
 		     "PQAP/ZAPQ for %02x.%04x failed with invalid rc=%u\n",
 		     AP_QID_CARD(q->apqn), AP_QID_QUEUE(q->apqn),
 		     status.response_code);
-		return -EIO;
+		break;
 	}
 
 	vfio_ap_free_aqic_resources(q);