[1/2] iommufd/selftest: Use a fwnode to distinguish devices

Message ID e365c08b21a8d0b60e6f5d1411be6701c1a06a53.1701165201.git.robin.murphy@arm.com
State New
Series iommufd/selftest: Fix and cleanup for bus ops

Commit Message

Robin Murphy Nov. 28, 2023, 10:42 a.m. UTC
With bus ops gone, the trick of registering against a specific bus no
longer really works, and we start getting given devices from other buses
to probe. On arm64 this leads to spurious groups for devices with no
IOMMU, while on Intel, AMD or s390 it may inadvertently steal devices
from the real IOMMU. Driver coexistence is based on the fwspec mechanism,
so register with a non-NULL fwnode and give mock devices a corresponding
fwspec, to let the IOMMU core distinguish things correctly for us.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/iommufd/selftest.c | 9 +++++++++
 1 file changed, 9 insertions(+)
  

Comments

Jason Gunthorpe Nov. 28, 2023, 2:43 p.m. UTC | #1
On Tue, Nov 28, 2023 at 10:42:11AM +0000, Robin Murphy wrote:
> With bus ops gone, the trick of registering against a specific bus no
> longer really works, and we start getting given devices from other buses
> to probe,

Makes sense.

> which leads to spurious groups for devices with no IOMMU on
> arm64, 

I'm not sure I'm fully understanding what this means?

I guess that the mock driver is matching random things once it starts
being called all the time because this is missing:

 static struct iommu_device *mock_probe_device(struct device *dev)
 {
+       if (dev->bus != &iommufd_mock_bus_type)
+               return ERR_PTR(-ENODEV);
        return &mock_iommu_device;
 }

Is that sufficient to solve the problem?
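
A probe callback that returns a pointer reports failure by encoding the errno in the pointer itself, which is what the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers do. The following is only a userspace sketch of that convention applied to a bus check like the one above. The lowercase helpers, the simplified structs and the names mock_bus, other_bus and mock_probe are stand-ins invented for illustration, not the selftest's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
static inline void *err_ptr(long err)
{
	/* Only small negative errno values may be encoded as pointers. */
	assert(err < 0 && err >= -MAX_ERRNO);
	return (void *)(intptr_t)err;
}

static inline int is_err(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Simplified, hypothetical versions of the structures involved. */
struct bus_type { const char *name; };
struct device { const struct bus_type *bus; };
struct iommu_device { int unused; };

static const struct bus_type mock_bus = { "iommufd_mock" };
static const struct bus_type other_bus = { "platform" };
static struct iommu_device mock_iommu;

/* Claim only devices on the mock bus; reject everything else. */
static struct iommu_device *mock_probe(const struct device *dev)
{
	if (dev->bus != &mock_bus)
		return err_ptr(-ENODEV);
	return &mock_iommu;
}
```

A caller would test the result with is_err() before dereferencing it and recover the errno with ptr_err().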

> but may inadvertently steal devices from the real IOMMU on Intel,
> AMD or S390. 

AMD/Intel/S390 drivers already reject buses they don't understand.

Intel's device_to_iommu() will fail because
for_each_active_dev_scope() will never match the mock device.

AMD fails because check_device() -> get_device_sbdf_id() fails: the
device is not PCI and get_acpihid_device_id() finds no match.

s390 fails because !dev_is_pci(dev).

The fwspec drivers should all fail if the device has no fwspec, and
mock bus devices should not have one since the mock bus doesn't
implement dma_configure.

Jason
  
Robin Murphy Nov. 28, 2023, 4:02 p.m. UTC | #2
On 28/11/2023 2:43 pm, Jason Gunthorpe wrote:
> On Tue, Nov 28, 2023 at 10:42:11AM +0000, Robin Murphy wrote:
>> With bus ops gone, the trick of registering against a specific bus no
>> longer really works, and we start getting given devices from other buses
>> to probe,
> 
> Make sense
> 
>> which leads to spurious groups for devices with no IOMMU on
>> arm64,
> 
> I'm not sure I'm fully understanding what this means?

It means on my arm64 ACPI system, random platform devices which are 
created after iommufd_test_init() has run get successfully probed by the 
mock driver, unexpectedly:

root@crazy-taxi:~# ls /sys/kernel/iommu_groups/*/devices
/sys/kernel/iommu_groups/0/devices:
0000:07:00.0

/sys/kernel/iommu_groups/1/devices:
'Fixed MDIO bus.0'

/sys/kernel/iommu_groups/10/devices:
0001:00:00.0

/sys/kernel/iommu_groups/2/devices:
0000:04:05.0

/sys/kernel/iommu_groups/3/devices:
0000:08:00.0

/sys/kernel/iommu_groups/4/devices:
0000:09:00.0

/sys/kernel/iommu_groups/5/devices:
0001:01:00.0

/sys/kernel/iommu_groups/6/devices:
alarmtimer.2.auto

/sys/kernel/iommu_groups/7/devices:
psci-cpuidle

/sys/kernel/iommu_groups/8/devices:
snd-soc-dummy

/sys/kernel/iommu_groups/9/devices:
0000:00:00.0  0000:01:00.0  0000:02:08.0  0000:02:10.0  0000:02:11.0 
0000:02:12.0  0000:02:13.0  0000:02:14.0  0000:03:00.0
root@crazy-taxi:~# cat /sys/kernel/iommu_groups/*/type
DMA
blocked
DMA
DMA
DMA
DMA
DMA
blocked
blocked
blocked
DMA

> I guess that the mock driver is matching random things once it starts
> being called all the time because this is missing:
> 
>   static struct iommu_device *mock_probe_device(struct device *dev)
>   {
> +       if (dev->bus != &iommufd_mock_bus_type)
> +               return -ENODEV;
>          return &mock_iommu_device;
>   }
> 
> Is that sufficient to solve the problem?

Unfortunately not...

>> but may inadvertently steal devices from the real IOMMU on Intel,
>> AMD or S390.
> 
> AMD/Intel/S390 drivers already reject bus's they don't understand.
> 
> Intel's device_to_iommu() will fail because
> for_each_active_dev_scope() will never match the mock device.
> 
> amd fails because check_device() -> get_device_sbdf_id() fails due to
> no PCI and not get_acpihid_device_id().
> 
> s390 fails because !dev_is_pci(dev).

Indeed, but then when such probes do fail, they've failed for good. We 
don't have any way to somehow dig up the mock driver's ops and try 
again, so the selftest ends up broken (i.e. the real driver "steals" the 
mock devices, in the inverse of the case I was concerned about if the 
mock driver somehow manages to register first).

The assumption was as commented in the code, that there would only ever 
be one driver per system *not* using fwnodes, but as I say I missed the 
mock driver when considering that. To be fair, I'm not sure it even 
existed when I *first* wrote that code :)

I did intend coexistence to work on x86 too, where the "other" driver 
would be virtio-iommu using fwnodes, so aligning the mock driver that 
way seemed far neater than any more special-case hacks in core code.
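
To make the coexistence argument concrete, here is a small userspace model of the lookup. It is a sketch only, assuming the core picks a driver by matching the device's fwspec fwnode against registered instances, with NULL standing for "no fwspec" (mirroring the iommu_device_from_fwnode(NULL) fallback visible in the core diff later in this thread). The struct layouts and the helper names lookup_by_fwnode and pick_ops are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical versions of the kernel structures. */
struct fwnode_handle { int unused; };
struct iommu_ops { const char *name; };

struct iommu_device {
	const struct iommu_ops *ops;
	const struct fwnode_handle *fwnode;	/* NULL for a "global" driver */
};

struct device {
	/* Stand-in for the device's fwspec; NULL when it has none. */
	const struct fwnode_handle *iommu_fwnode;
};

/* Return the first registered instance whose fwnode matches, or NULL. */
static const struct iommu_device *
lookup_by_fwnode(const struct iommu_device *const *list, size_t n,
		 const struct fwnode_handle *fwnode)
{
	assert(list || n == 0);
	for (size_t i = 0; i < n; i++)
		if (list[i]->fwnode == fwnode)
			return list[i];
	return NULL;
}

/* Pick the ops the core would hand this device to, or NULL if none. */
static const struct iommu_ops *
pick_ops(const struct iommu_device *const *list, size_t n,
	 const struct device *dev)
{
	const struct iommu_device *iommu =
		lookup_by_fwnode(list, n, dev->iommu_fwnode);

	return iommu ? iommu->ops : NULL;
}
```

In this model, if a real driver and the mock driver both register with a NULL fwnode, a mock device without a fwspec resolves to whichever registered first, and there is no second attempt if that driver rejects it. If the mock driver is the only one registered, every fwspec-less platform device resolves to it, which corresponds to the spurious groups above. Giving the mock instance its own fwnode and the mock devices a matching fwspec makes both cases resolve as intended.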

> The fwspec drivers should all fail if they don't have a fwspec, and
> they shouldn't for mock bus devices since it doesn't implement
> dma_configure.

Right, the selftests still work fine on my arm64 system (and the 
spurious groups happen to be benign since those aren't real DMA-capable 
devices anyway), but I expect they're busted on x86/s390 with today's -next.

Thanks,
Robin.
  
Jason Gunthorpe Nov. 28, 2023, 4:33 p.m. UTC | #3
On Tue, Nov 28, 2023 at 04:02:42PM +0000, Robin Murphy wrote:
> On 28/11/2023 2:43 pm, Jason Gunthorpe wrote:
> > On Tue, Nov 28, 2023 at 10:42:11AM +0000, Robin Murphy wrote:
> > > With bus ops gone, the trick of registering against a specific bus no
> > > longer really works, and we start getting given devices from other buses
> > > to probe,
> > 
> > Make sense
> > 
> > > which leads to spurious groups for devices with no IOMMU on
> > > arm64,
> > 
> > I'm not sure I'm fully understanding what this means?
> 
> It means on my arm64 ACPI system, random platform devices which are created
> after iommufd_test_init() has run get successfully probed by the mock
> driver, unexpectedly:

Okay that is what I guessed

> > I guess that the mock driver is matching random things once it starts
> > being called all the time because this is missing:
> > 
> >   static struct iommu_device *mock_probe_device(struct device *dev)
> >   {
> > +       if (dev->bus != &iommufd_mock_bus_type)
> > +               return -ENODEV;
> >          return &mock_iommu_device;
> >   }
> > 
> > Is that sufficient to solve the problem?
> 
> Unfortunately not...

I see, so we create the other problem: without bus ops we don't get to
have two 'global' drivers, and with the above the mock driver won't
probe on x86.

> I did intend coexistence to work on x86 too, where the "other" driver would
> be virtio-iommu using fwnodes, so aligning the mock driver that way seemed
> far neater than any more special-case hacks in core code.

Let's just do the above and what I suggested earlier. This is from a
WIP tree I have, it shows the idea but needs other stuff to work. If
you agree I'll pull its parts out and post a clean version of them.

commit 51c9a54cc111b4b31af6a0527015db82e782e1d3
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date:   Tue Nov 28 11:54:47 2023 -0400

    iommu: Call all drivers if there is no fwspec
    
    Real systems only have one ops, so this effectively invokes the single op
    in the system to probe each device. If there are multiple ops we invoke
    each one once, and drivers that don't understand the struct device should
    return -ENODEV.
    
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7468a64778931b..54e3f14429b3b4 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -241,6 +241,26 @@ static int remove_iommu_group(struct device *dev, void *data)
 	return 0;
 }
 
+static void iommu_device_add(struct iommu_device *iommu)
+{
+	struct iommu_device *cur;
+
+	/*
+	 * Keep the iommu_device_list grouped by ops so that
+	 * iommu_find_init_device() works efficiently.
+	 */
+	mutex_lock(&iommu_probe_device_lock);
+	list_for_each_entry(cur, &iommu_device_list, list) {
+		if (cur->ops == iommu->ops) {
+			list_add(&iommu->list, &cur->list);
+			goto out;
+		}
+	}
+	list_add(&iommu->list, &iommu_device_list);
+out:
+	mutex_unlock(&iommu_probe_device_lock);
+}
+
 /**
  * iommu_device_register() - Register an IOMMU hardware instance
  * @iommu: IOMMU handle for the instance
@@ -262,9 +282,7 @@ int iommu_device_register(struct iommu_device *iommu,
 	if (hwdev)
 		iommu->fwnode = dev_fwnode(hwdev);
 
-	mutex_lock(&iommu_probe_device_lock);
-	list_add_tail(&iommu->list, &iommu_device_list);
-	mutex_unlock(&iommu_probe_device_lock);
+	iommu_device_add(iommu);
 
 	for (int i = 0; i < ARRAY_SIZE(iommu_buses) && !err; i++)
 		err = bus_iommu_probe(iommu_buses[i]);
@@ -502,6 +520,29 @@ static void iommu_deinit_device(struct device *dev)
 
 DEFINE_MUTEX(iommu_probe_device_lock);
 
+static int iommu_find_init_device(struct iommu_probe_info *pinf)
+{
+	const struct iommu_ops *ops = NULL;
+	struct iommu_device *iommu;
+	int ret;
+
+	lockdep_assert_held(&iommu_probe_device_lock);
+
+	/*
+	 * Each unique ops gets a chance to claim the device, -ENODEV means the
+	 * driver does not support the device.
+	 */
+	list_for_each_entry(iommu, &iommu_device_list, list) {
+		if (iommu->ops != ops) {
+			ops = iommu->ops;
+			ret = iommu_init_device(pinf, iommu->ops);
+			if (ret != -ENODEV)
+				return ret;
+		}
+	}
+	return -ENODEV;
+}
+
 static int __iommu_probe_device(struct iommu_probe_info *pinf)
 {
 	struct device *dev = pinf->dev;
@@ -524,13 +565,6 @@ static int __iommu_probe_device(struct iommu_probe_info *pinf)
 		ops = fwspec->ops;
 		if (!ops)
 			return -ENODEV;
-	} else {
-		struct iommu_device *iommu;
-
-		iommu = iommu_device_from_fwnode(NULL);
-		if (!iommu)
-			return -ENODEV;
-		ops = iommu->ops;
 	}
 
 	/*
@@ -546,7 +580,10 @@ static int __iommu_probe_device(struct iommu_probe_info *pinf)
 	if (dev->iommu_group)
 		return 0;
 
-	ret = iommu_init_device(pinf, ops);
+	if (ops)
+		ret = iommu_init_device(pinf, ops);
+	else
+		ret = iommu_find_init_device(pinf);
 	if (ret)
 		return ret;
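
The two halves of the diff above work together: registration keeps instances with the same ops adjacent, so the probe loop can call each distinct ops exactly once by comparing against the previously seen ops. A userspace sketch of that idea follows. It uses a singly linked list instead of list_head, and the handles_bus and calls fields plus the function names are invented purely so the behaviour can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct device { int bus_id; };

struct iommu_ops {
	int handles_bus;	/* the only bus id this model driver accepts */
	int calls;		/* how many times the driver was asked */
};

struct iommu_device {
	struct iommu_ops *ops;
	struct iommu_device *next;
};

/* Model of a driver probe: 0 to claim the device, -ENODEV to decline. */
static int init_device(struct iommu_ops *ops, const struct device *dev)
{
	ops->calls++;
	return dev->bus_id == ops->handles_bus ? 0 : -ENODEV;
}

/* Insert right after an instance with the same ops, else at the head. */
static void device_add(struct iommu_device **head, struct iommu_device *iommu)
{
	assert(iommu->ops);
	for (struct iommu_device *cur = *head; cur; cur = cur->next) {
		if (cur->ops == iommu->ops) {
			iommu->next = cur->next;
			cur->next = iommu;
			return;
		}
	}
	iommu->next = *head;
	*head = iommu;
}

/* Give each distinct ops one chance; stop at the first non-ENODEV result. */
static int find_init_device(struct iommu_device *head, const struct device *dev)
{
	struct iommu_ops *ops = NULL;

	for (struct iommu_device *iommu = head; iommu; iommu = iommu->next) {
		if (iommu->ops != ops) {
			int ret;

			ops = iommu->ops;
			ret = init_device(ops, dev);
			if (ret != -ENODEV)
				return ret;
		}
	}
	return -ENODEV;
}
```

The "previous ops" comparison only deduplicates correctly because the list is grouped; with an ungrouped list the same driver could be asked more than once.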
  
Robin Murphy Nov. 28, 2023, 5:36 p.m. UTC | #4
On 28/11/2023 4:33 pm, Jason Gunthorpe wrote:
> On Tue, Nov 28, 2023 at 04:02:42PM +0000, Robin Murphy wrote:
>> On 28/11/2023 2:43 pm, Jason Gunthorpe wrote:
>>> On Tue, Nov 28, 2023 at 10:42:11AM +0000, Robin Murphy wrote:
>>>> With bus ops gone, the trick of registering against a specific bus no
>>>> longer really works, and we start getting given devices from other buses
>>>> to probe,
>>>
>>> Make sense
>>>
>>>> which leads to spurious groups for devices with no IOMMU on
>>>> arm64,
>>>
>>> I'm not sure I'm fully understanding what this means?
>>
>> It means on my arm64 ACPI system, random platform devices which are created
>> after iommufd_test_init() has run get successfully probed by the mock
>> driver, unexpectedly:
> 
> Okay that is what I guessed
> 
>>> I guess that the mock driver is matching random things once it starts
>>> being called all the time because this is missing:
>>>
>>>    static struct iommu_device *mock_probe_device(struct device *dev)
>>>    {
>>> +       if (dev->bus != &iommufd_mock_bus_type)
>>> +               return -ENODEV;
>>>           return &mock_iommu_device;
>>>    }
>>>
>>> Is that sufficient to solve the problem?
>>
>> Unfortunately not...
> 
> I see, so we create the other problem that without bus ops we don't
> get to have two 'global' drivers and with the above mock won't probe
> on x86.
> 
>> I did intend coexistence to work on x86 too, where the "other" driver would
>> be virtio-iommu using fwnodes, so aligning the mock driver that way seemed
>> far neater than any more special-case hacks in core code.
> 
> Lets just do the above and what I suggested earlier. This is from a
> WIP tree I have, it shows the idea but needs other stuff to work. If
> you agree I'll pull its parts out and post a clean version of them.
> 
> commit 51c9a54cc111b4b31af6a0527015db82e782e1d3
> Author: Jason Gunthorpe <jgg@ziepe.ca>
> Date:   Tue Nov 28 11:54:47 2023 -0400
> 
>      iommu: Call all drivers if there is no fwspec
>      
>      Real systems only have one ops, so this effectively invokes the single op
>      in the system to probe each device. If there are multiple ops we invoke
>      each one once, and drivers that don't understand the struct device should
>      return -ENODEV.

You see this is exactly the kind of complexity I *don't* want, since the 
only thing it would foreseeably benefit is the one special case of the 
IOMMUFD selftest, which can far more trivially just adopt the other of 
the two "standard" usage models we have. I've been trying to get *away* 
from having to have boilerplate checks in all the drivers, and this 
would require bringing back a load of the ones I've just removed :(

As I said before, I really want to avoid the perf_event_init model of 
calling round every driver saying "hey, do you want this?" since it's 
also error-prone if any of those drivers doesn't get the boilerplate 
exactly right and inadvertently fails to reject something it should 
have. The difference with perf is that it has the notion of generic 
events which *can* be handled by more than one driver. We do not, and 
conceivably never will, have that for IOMMU client devices, so we can 
realistically make the core code responsible for calling the right 
driver by construction, and since we're now mostly there already, it 
seems by far the most sensible thing to continue in that direction.

Thanks,
Robin.


>      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 7468a64778931b..54e3f14429b3b4 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -241,6 +241,26 @@ static int remove_iommu_group(struct device *dev, void *data)
>   	return 0;
>   }
>   
> +static void iommu_device_add(struct iommu_device *iommu)
> +{
> +	struct iommu_device *cur;
> +
> +	/*
> +	 * Keep the iommu_device_list grouped by ops so that
> +	 * iommu_find_init_device() works efficiently.
> +	 */
> +	mutex_lock(&iommu_probe_device_lock);
> +	list_for_each_entry(cur, &iommu_device_list, list) {
> +		if (cur->ops == iommu->ops) {
> +			list_add(&iommu->list, &cur->list);
> +			goto out;
> +		}
> +	}
> +	list_add(&iommu->list, &iommu_device_list);
> +out:
> +	mutex_unlock(&iommu_probe_device_lock);
> +}
> +
>   /**
>    * iommu_device_register() - Register an IOMMU hardware instance
>    * @iommu: IOMMU handle for the instance
> @@ -262,9 +282,7 @@ int iommu_device_register(struct iommu_device *iommu,
>   	if (hwdev)
>   		iommu->fwnode = dev_fwnode(hwdev);
>   
> -	mutex_lock(&iommu_probe_device_lock);
> -	list_add_tail(&iommu->list, &iommu_device_list);
> -	mutex_unlock(&iommu_probe_device_lock);
> +	iommu_device_add(iommu);
>   
>   	for (int i = 0; i < ARRAY_SIZE(iommu_buses) && !err; i++)
>   		err = bus_iommu_probe(iommu_buses[i]);
> @@ -502,6 +520,29 @@ static void iommu_deinit_device(struct device *dev)
>   
>   DEFINE_MUTEX(iommu_probe_device_lock);
>   
> +static int iommu_find_init_device(struct iommu_probe_info *pinf)
> +{
> +	const struct iommu_ops *ops = NULL;
> +	struct iommu_device *iommu;
> +	int ret;
> +
> +	lockdep_assert_held(&iommu_probe_device_lock);
> +
> +	/*
> +	 * Each unique ops gets a chance to claim the device, -ENODEV means the
> +	 * driver does not support the device.
> +	 */
> +	list_for_each_entry(iommu, &iommu_device_list, list) {
> +		if (iommu->ops != ops) {
> +			ops = iommu->ops;
> +			ret = iommu_init_device(pinf, iommu->ops);
> +			if (ret != -ENODEV)
> +				return ret;
> +		}
> +	}
> +	return -ENODEV;
> +}
> +
>   static int __iommu_probe_device(struct iommu_probe_info *pinf)
>   {
>   	struct device *dev = pinf->dev;
> @@ -524,13 +565,6 @@ static int __iommu_probe_device(struct iommu_probe_info *pinf)
>   		ops = fwspec->ops;
>   		if (!ops)
>   			return -ENODEV;
> -	} else {
> -		struct iommu_device *iommu;
> -
> -		iommu = iommu_device_from_fwnode(NULL);
> -		if (!iommu)
> -			return -ENODEV;
> -		ops = iommu->ops;
>   	}
>   
>   	/*
> @@ -546,7 +580,10 @@ static int __iommu_probe_device(struct iommu_probe_info *pinf)
>   	if (dev->iommu_group)
>   		return 0;
>   
> -	ret = iommu_init_device(pinf, ops);
> +	if (ops)
> +		ret = iommu_init_device(pinf, ops);
> +	else
> +		ret = iommu_find_init_device(pinf);
>   	if (ret)
>   		return ret;
>
  
Jason Gunthorpe Nov. 28, 2023, 7:07 p.m. UTC | #5
On Tue, Nov 28, 2023 at 05:36:33PM +0000, Robin Murphy wrote:

> You see this is exactly the kind of complexity I *don't* want, since the
> only thing it would foreseeably benefit is the one special case of the
> IOMMUFD selftest, which can far more trivially just adopt the other of the
> two "standard" usage models we have. I've been trying to get *away* from
> having to have boilerplate checks in all the drivers, and this would require
> bringing back a load of the ones I've just removed :(

I don't think we need to bring back the fwspec checks you removed, the
loop just needs to keep the NULL check:

 +	list_for_each_entry(iommu, &iommu_device_list, list) {
 +		if (iommu->ops != ops && !iommu->fwnode) {
 +			ops = iommu->ops;
 +			ret = iommu_init_device(pinf, iommu->ops);
 +			if (ret != -ENODEV)
 +				return ret;
 +		}
 +	}

Iterate over all the global driver ops only. Drivers with a fwnode
will never be called without a fwspec.

Also, does omap have problems now too? omap seems to set fwnode but
does some slightly different open coded non-fwspec parsing that worked
at bus time? Is it still OK? Does fwspec even find ops in omap's FW
description (i.e. it looks like it makes iommu-cells optional or
something)?

> As I said before, I really want to avoid the perf_event_init model of
> calling round every driver saying "hey, do you want this?" since it's also
> error-prone if any of those drivers doesn't get the boilerplate exactly
> right and inadvertently fails to reject something it should have. 

The core missed an API that every driver needs: give me the struct
iommu_driver* the FW has referenced.

Instead every driver open codes something like
arm_smmu_get_by_fwnode(), or much worse.

If we force the drivers to say

 iommu_driver = iommu_fw_give_me_my_driver(dev, ops)

Then we automatically have a place to do all the rejection checks we
need, and drivers can't inadvertently skip this because they really
can't work without the iommu_driver at all.

Anyhow, I completed the series I talked about yesterday. It turned out
really nice I think, especially the driver facing API is much
cleaner. I'm just going through the last bits before I share it.

Jason
  

Patch

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 5d93434003d8..f46ce0f8808d 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -432,7 +432,11 @@  static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
 	return false;
 }
 
+static struct fwnode_handle mock_fwnode = {
+};
+
 static struct iommu_device mock_iommu_device = {
+	.fwnode = &mock_fwnode,
 };
 
 static struct iommu_device *mock_probe_device(struct device *dev)
@@ -569,12 +573,17 @@  static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 	if (rc)
 		goto err_put;
 
+	rc = iommu_fwspec_init(&mdev->dev, &mock_fwnode, &mock_ops);
+	if (rc)
+		goto err_put;
+
 	rc = device_add(&mdev->dev);
 	if (rc)
 		goto err_put;
 	return mdev;
 
 err_put:
+	iommu_fwspec_free(&mdev->dev);
 	put_device(&mdev->dev);
 	return ERR_PTR(rc);
 }