[03/17] vfio/pci: Consistently acquire mutex for interrupt management

Message ID e7d35d7730f3f83417e757bc264a470f8c2671ed.1706849424.git.reinette.chatre@intel.com
State New
Series vfio/pci: Remove duplicate code and logic from VFIO PCI interrupt management

Commit Message

Reinette Chatre Feb. 2, 2024, 4:56 a.m. UTC
  vfio_pci_set_irqs_ioctl() is the entrypoint for interrupt management
via the VFIO_DEVICE_SET_IRQS ioctl(). The igate mutex is obtained
before calling vfio_pci_set_irqs_ioctl() for management of all interrupt
types to protect against concurrent changes to the eventfds associated
with device request notification and error interrupts.
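
The race the mutex guards against can be modeled in userspace. The sketch below is a hypothetical analogue, not kernel code. `struct req_model`, `req_model_set_trigger()` and `req_model_notify()` are invented names, a pthread mutex stands in for igate, and a Linux eventfd(2) descriptor stands in for the kernel's eventfd context. One path replaces the request eventfd, as VFIO_DEVICE_SET_IRQS does. The other path signals it. Because both paths hold the same mutex, the notifier can never signal a descriptor that is being closed or swapped.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for the igate/req_trigger pair in the vdev. */
struct req_model {
	pthread_mutex_t igate;	/* plays the role of vdev->igate */
	int req_trigger;	/* eventfd, or -1 when no trigger is set */
};

/* SET_IRQS-like path: install (or clear, with fd == -1) the trigger. */
static void req_model_set_trigger(struct req_model *m, int fd)
{
	pthread_mutex_lock(&m->igate);
	if (m->req_trigger >= 0 && m->req_trigger != fd)
		close(m->req_trigger);	/* drop the old trigger under the lock */
	m->req_trigger = fd;
	pthread_mutex_unlock(&m->igate);
}

/*
 * Notification path: signal the current trigger, if any, under igate.
 * Returns 1 if the eventfd was signalled, 0 otherwise.
 */
static int req_model_notify(struct req_model *m)
{
	int sent = 0;

	pthread_mutex_lock(&m->igate);
	if (m->req_trigger >= 0)
		sent = (eventfd_write(m->req_trigger, 1) == 0);
	pthread_mutex_unlock(&m->igate);
	return sent;
}
```

In the kernel, the device request path (vfio_pci_core_request()) similarly takes igate before checking and signalling vdev->req_trigger.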

The igate mutex is not acquired consistently. vfio_pci_ioctl_set_irqs()
always acquires the mutex (for all interrupt types) before calling
vfio_pci_set_irqs_ioctl(), but vfio_pci_core_disable() calls
vfio_pci_set_irqs_ioctl() without holding the mutex. The latter is
expected to be correct as long as the code flow guarantees that the
provided interrupt type is not a device request notification or error
interrupt.

Move igate mutex acquire and release into vfio_pci_set_irqs_ioctl()
to make the locking consistent irrespective of interrupt type.
This is one step toward containing the interrupt management locking
internals within the interrupt management code, so that the VFIO PCI
core can trigger management of the eventfds associated with device
request notification and error interrupts without needing to access
and manipulate VFIO interrupt management locks and data.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Note to maintainers:
Originally formed part of the IMS submission below, but is not
specific to IMS.
https://lore.kernel.org/lkml/cover.1696609476.git.reinette.chatre@intel.com

 drivers/vfio/pci/vfio_pci_core.c  |  3 ---
 drivers/vfio/pci/vfio_pci_intrs.c | 10 ++++++++--
 2 files changed, 8 insertions(+), 5 deletions(-)
  

Comments

Alex Williamson Feb. 5, 2024, 10:34 p.m. UTC | #1
On Thu,  1 Feb 2024 20:56:57 -0800
Reinette Chatre <reinette.chatre@intel.com> wrote:

> vfio_pci_set_irqs_ioctl() is the entrypoint for interrupt management
> via the VFIO_DEVICE_SET_IRQS ioctl(). The igate mutex is obtained
> before calling vfio_pci_set_irqs_ioctl() for management of all interrupt
> types to protect against concurrent changes to the eventfds associated
> with device request notification and error interrupts.
> 
> The igate mutex is not acquired consistently. The mutex is always
> (for all interrupt types) acquired from within vfio_pci_ioctl_set_irqs()
> before calling vfio_pci_set_irqs_ioctl(), but vfio_pci_set_irqs_ioctl() is
> called via vfio_pci_core_disable() without the mutex held. The latter
> is expected to be correct if the code flow can be guaranteed that
> the provided interrupt type is not a device request notification or error
> interrupt.

The latter is correct because it's always a physical interrupt type
(INTx/MSI/MSIX), vdev->irq_type dictates this, and the interrupt code
prevents the handler from being called after the interrupt is disabled.
It's intentional that we don't acquire igate here since we only need to
prevent a race with concurrent user access, which cannot occur in the
fd release path.  The igate mutex is acquired consistently, where it's
required. 
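
Alex's argument rests on an invariant of the release path. That path only ever tears down vdev->irq_type, which is INTx, MSI, MSI-X, or a sentinel meaning that no interrupt type is enabled (VFIO_PCI_NUM_IRQS in the kernel). It therefore never reaches the error or request eventfds. The userspace sketch below encodes that invariant as an assertion. The enum mirrors the order of the VFIO_PCI_*_IRQ_INDEX values in include/uapi/linux/vfio.h under a hypothetical `MODEL_` prefix, and `model_disable_irqs()` is an illustrative helper, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mirrors the VFIO_PCI_*_IRQ_INDEX order in include/uapi/linux/vfio.h. */
enum model_irq_index {
	MODEL_INTX_IRQ_INDEX,
	MODEL_MSI_IRQ_INDEX,
	MODEL_MSIX_IRQ_INDEX,
	MODEL_ERR_IRQ_INDEX,
	MODEL_REQ_IRQ_INDEX,
	MODEL_NUM_IRQS,		/* doubles as "no interrupt type enabled" */
};

static bool model_is_physical_irq(int index)
{
	return index == MODEL_INTX_IRQ_INDEX ||
	       index == MODEL_MSI_IRQ_INDEX ||
	       index == MODEL_MSIX_IRQ_INDEX;
}

/*
 * Hypothetical release-path teardown. It runs without igate, which is
 * only safe because irq_type can never name the ERR or REQ eventfds and
 * no concurrent user access is possible while the device fd is released.
 */
static int model_disable_irqs(int irq_type)
{
	assert(model_is_physical_irq(irq_type) || irq_type == MODEL_NUM_IRQS);

	if (irq_type == MODEL_NUM_IRQS)
		return -ENOTTY;	/* nothing enabled: no handler is selected */
	return 0;		/* would disable INTx/MSI/MSI-X here */
}
```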

It would be more forthcoming to describe that potential future emulated
device interrupts don't make the same guarantees, but if that's true,
why can't they?

> Move igate mutex acquire and release into vfio_pci_set_irqs_ioctl()
> to make the locking consistent irrespective of interrupt type.
> This is one step closer to contain the interrupt management locking
> internals within the interrupt management code so that the VFIO PCI
> core can trigger management of the eventfds associated with device
> request notification and error interrupts without needing to access
> and manipulate VFIO interrupt management locks and data.

If all we want to do is move the mutex into vfio_pci_intrs.c then we
could rename it to __vfio_pci_set_irqs_ioctl() and create a wrapper
around it that grabs the mutex.  The disable path could use the lockless
version and we wouldn't need to clutter the exit path unlocking the
mutex as done below.  Thanks,
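
The alternative splits the entry point in two. A lockless core serves the disable path directly, and a thin wrapper takes igate for the ioctl path, so the dispatch code keeps its early returns. Below is a minimal userspace sketch of that split. It assumes a pthread mutex in place of igate and a single hypothetical MSI handler. `struct irq_model`, `irq_model_set_irqs_nolock()` (standing in for the proposed __vfio_pci_set_irqs_ioctl()) and `irq_model_set_irqs()` are illustrative names only.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct vfio_pci_core_device. */
struct irq_model {
	pthread_mutex_t igate;		/* plays the role of vdev->igate */
	unsigned int msi_count;		/* state touched by the handler */
};

static int irq_model_set_msi(struct irq_model *m, unsigned int start,
			     unsigned int count)
{
	(void)start;
	m->msi_count = count;
	return 0;
}

/*
 * Lockless core, the role of the proposed __vfio_pci_set_irqs_ioctl().
 * Callers must hold igate unless, like the fd release path, they can
 * rule out concurrent user access.
 */
static int irq_model_set_irqs_nolock(struct irq_model *m, unsigned int index,
				     unsigned int start, unsigned int count)
{
	int (*func)(struct irq_model *, unsigned int, unsigned int) = NULL;

	switch (index) {
	case 1:				/* MSI in the VFIO index numbering */
		func = irq_model_set_msi;
		break;
	}

	if (!func)
		return -ENOTTY;		/* early return needs no unlock */

	return func(m, start, count);
}

/* Locked wrapper for the ioctl path: acquire, delegate, release. */
static int irq_model_set_irqs(struct irq_model *m, unsigned int index,
			      unsigned int start, unsigned int count)
{
	int ret;

	pthread_mutex_lock(&m->igate);
	ret = irq_model_set_irqs_nolock(m, index, start, count);
	pthread_mutex_unlock(&m->igate);

	return ret;
}
```

With this split, the dispatch function keeps its original `return -ENOTTY;` and needs no out_unlock label. The trade-off is that every caller must pick the correct variant, which is why documenting the lockless variant's precondition matters.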

Alex

> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Note to maintainers:
> Originally formed part of the IMS submission below, but is not
> specific to IMS.
> https://lore.kernel.org/lkml/cover.1696609476.git.reinette.chatre@intel.com
> 
>  drivers/vfio/pci/vfio_pci_core.c  |  3 ---
>  drivers/vfio/pci/vfio_pci_intrs.c | 10 ++++++++--
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 1cbc990d42e0..d2847ca2f0cb 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1214,12 +1214,9 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
>  			return PTR_ERR(data);
>  	}
>  
> -	mutex_lock(&vdev->igate);
> -
>  	ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index, hdr.start,
>  				      hdr.count, data);
>  
> -	mutex_unlock(&vdev->igate);
>  	kfree(data);
>  
>  	return ret;
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 69ab11863282..97a3bb22b186 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -793,7 +793,9 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
>  	int (*func)(struct vfio_pci_core_device *vdev, unsigned int index,
>  		    unsigned int start, unsigned int count, uint32_t flags,
>  		    void *data) = NULL;
> +	int ret = -ENOTTY;
>  
> +	mutex_lock(&vdev->igate);
>  	switch (index) {
>  	case VFIO_PCI_INTX_IRQ_INDEX:
>  		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> @@ -838,7 +840,11 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
>  	}
>  
>  	if (!func)
> -		return -ENOTTY;
> +		goto out_unlock;
> +
> +	ret = func(vdev, index, start, count, flags, data);
> +out_unlock:
> +	mutex_unlock(&vdev->igate);
> +	return ret;
>  
> -	return func(vdev, index, start, count, flags, data);
>  }
  
Reinette Chatre Feb. 6, 2024, 9:44 p.m. UTC | #2
Hi Alex,

On 2/5/2024 2:34 PM, Alex Williamson wrote:
> On Thu,  1 Feb 2024 20:56:57 -0800
> Reinette Chatre <reinette.chatre@intel.com> wrote:
> 
>> vfio_pci_set_irqs_ioctl() is the entrypoint for interrupt management
>> via the VFIO_DEVICE_SET_IRQS ioctl(). The igate mutex is obtained
>> before calling vfio_pci_set_irqs_ioctl() for management of all interrupt
>> types to protect against concurrent changes to the eventfds associated
>> with device request notification and error interrupts.
>>
>> The igate mutex is not acquired consistently. The mutex is always
>> (for all interrupt types) acquired from within vfio_pci_ioctl_set_irqs()
>> before calling vfio_pci_set_irqs_ioctl(), but vfio_pci_set_irqs_ioctl() is
>> called via vfio_pci_core_disable() without the mutex held. The latter
>> is expected to be correct if the code flow can be guaranteed that
>> the provided interrupt type is not a device request notification or error
>> interrupt.
> 
> The latter is correct because it's always a physical interrupt type
> (INTx/MSI/MSIX), vdev->irq_type dictates this, and the interrupt code
> prevents the handler from being called after the interrupt is disabled.

Thank you for confirming. 

> It's intentional that we don't acquire igate here since we only need to
> prevent a race with concurrent user access, which cannot occur in the
> fd release path.  The igate mutex is acquired consistently, where it's
> required. 

Thank you. I do think it will be helpful to document some of this
in the code to help newcomers distinguish the scenarios (more below).

> It would be more forthcoming to describe that potential future emulated
> device interrupts don't make the same guarantees, but if that's true,
> why can't they?

As I understand it, an emulated interrupt will be triggered by the VFIO
PCI driver as a result of, for example, an MMIO write from user space. I
thus expect locking similar to that of the existing device request
notification and error interrupts. I would like to keep this series
focused on the existing flows, though.

>> Move igate mutex acquire and release into vfio_pci_set_irqs_ioctl()
>> to make the locking consistent irrespective of interrupt type.
>> This is one step closer to contain the interrupt management locking
>> internals within the interrupt management code so that the VFIO PCI
>> core can trigger management of the eventfds associated with device
>> request notification and error interrupts without needing to access
>> and manipulate VFIO interrupt management locks and data.
> 
> If all we want to do is move the mutex into vfio_pci_intr.c then we
> could rename to __vfio_pci_set_irqs_ioctl() and create a wrapper around
> it that grabs the mutex.  The disable path could use the lockless
> version and we wouldn't need to clutter the exit path unlocking the
> mutex as done below.  Thanks,

Will do. This creates an opportunity to document the flows involving
the mutex (essentially adding comments that include your description
above).

Reinette
  

Patch

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1cbc990d42e0..d2847ca2f0cb 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1214,12 +1214,9 @@  static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
 			return PTR_ERR(data);
 	}
 
-	mutex_lock(&vdev->igate);
-
 	ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index, hdr.start,
 				      hdr.count, data);
 
-	mutex_unlock(&vdev->igate);
 	kfree(data);
 
 	return ret;
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 69ab11863282..97a3bb22b186 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -793,7 +793,9 @@  int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
 	int (*func)(struct vfio_pci_core_device *vdev, unsigned int index,
 		    unsigned int start, unsigned int count, uint32_t flags,
 		    void *data) = NULL;
+	int ret = -ENOTTY;
 
+	mutex_lock(&vdev->igate);
 	switch (index) {
 	case VFIO_PCI_INTX_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
@@ -838,7 +840,11 @@  int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
 	}
 
 	if (!func)
-		return -ENOTTY;
+		goto out_unlock;
+
+	ret = func(vdev, index, start, count, flags, data);
+out_unlock:
+	mutex_unlock(&vdev->igate);
+	return ret;
 
-	return func(vdev, index, start, count, flags, data);
 }