[1/1] iommufd/selftest: Use right iommu_ops for mock device

Message ID 20240111073213.180020-1-baolu.lu@linux.intel.com
State New
Series [1/1] iommufd/selftest: Use right iommu_ops for mock device

Commit Message

Baolu Lu Jan. 11, 2024, 7:32 a.m. UTC
  In the iommu probe device path, __iommu_probe_device() gets the iommu_ops
for the device from dev->iommu->fwspec if this field has been initialized
before probing. Otherwise, it will look up the global iommu device list
and use the iommu_ops of the first iommu device which has no
dev->iommu->fwspec. This causes the wrong iommu_ops to be used for the mock
device on x86 platforms where dev->iommu->fwspec is not used.

Preallocate the fwspec for the mock device so that the right iommu ops can
be used.

Fixes: 17de3f5fdd35 ("iommu: Retire bus ops")
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommufd/selftest.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
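
To make the selection logic described in the commit message concrete, here
is a minimal userspace model in C. It is only a sketch: the structures are
stripped-down stand-ins for the kernel's, and global_ops, register_global()
and select_ops() are hypothetical names used for illustration, not kernel
APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct iommu_ops { const char *name; };
struct iommu_fwspec { const struct iommu_ops *ops; };
struct dev_iommu { struct iommu_fwspec *fwspec; };
struct device { struct dev_iommu *iommu; };

/* Hypothetical registration list of "global" (non-fwspec) IOMMU drivers. */
#define MAX_GLOBAL 4
static const struct iommu_ops *global_ops[MAX_GLOBAL];
static size_t nr_global;

static void register_global(const struct iommu_ops *ops)
{
	assert(nr_global < MAX_GLOBAL);
	global_ops[nr_global++] = ops;
}

/*
 * Model of the described behaviour: use the ops from a pre-populated
 * fwspec if there is one, otherwise fall back to the first registered
 * global driver, whichever that happens to be.
 */
static const struct iommu_ops *select_ops(const struct device *dev)
{
	if (dev->iommu && dev->iommu->fwspec && dev->iommu->fwspec->ops)
		return dev->iommu->fwspec->ops;
	return nr_global ? global_ops[0] : NULL;
}
```

In this model, if a hardware driver is registered before the mock driver, a
device without a fwspec resolves to the hardware driver's ops, while a
device whose fwspec was filled in beforehand resolves to the ops stored
there.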
  

Comments

Jason Gunthorpe Jan. 11, 2024, 2:48 p.m. UTC | #1
On Thu, Jan 11, 2024 at 03:32:13PM +0800, Lu Baolu wrote:
> In the iommu probe device path, __iommu_probe_device() gets the iommu_ops
> for the device from dev->iommu->fwspec if this field has been initialized
> before probing. Otherwise, it will look up the global iommu device list
> and use the iommu_ops of the first iommu device which has no
> dev->iommu->fwspec. This causes the wrong iommu_ops to be used for the mock
> device on x86 platforms where dev->iommu->fwspec is not used.
> 
> Preallocate the fwspec for the mock device so that the right iommu ops can
> be used.

I really don't like this.

The lifecycle model for fwspec is already a bit confusing. Introducing
a new case where a driver pre-allocates the fwspec is making it worse,
not better.

eg iommu_init_device() error unwind will free this allocated fwspec
leaving the device broken. We don't have the concept of a fwspec that
is owned by the device, it is really owned by the probing code.

The fundamental issue is we now have a special kind of driver:

	fwspec = dev_iommu_fwspec_get(dev);
	if (fwspec && fwspec->ops)
		ops = fwspec->ops;
	else
		ops = iommu_ops_from_fwnode(NULL);
                                           ^^^^^^^^

Which represents a "global" non-fwspec using driver that will only
bind to devices that didn't parse into a fwspec.

The code above supports only one of these drivers at a time, but allows
more than one to be registered - it is inconsistent.

I think the right/easy answer is to iterate over all the "global"
drivers and call their probe instead of just the first one.

Especially since my approach over here migrates the whole thing to work
by iterating:

https://lore.kernel.org/all/0-v2-f82a05539a64+5109-iommu_fwspec_p2_jgg@nvidia.com/

And this patch:

https://lore.kernel.org/all/28-v2-f82a05539a64+5109-iommu_fwspec_p2_jgg@nvidia.com/

Is how I made the iterating logic, it could be pulled out and tidied a
bit.

Jason
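
The "iterate over all the global drivers" idea from the message above can
be sketched in userspace C as follows. This is an illustration under stated
assumptions, not the code from the linked series: probe_device() is
simplified to return 0 or -ENODEV (the kernel callback has a different
signature), and hw_probe(), mock_probe(), the bus names and
probe_global_drivers() are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins; the real kernel types carry far more state. */
struct device {
	const char *bus_name;
};

struct iommu_ops {
	const char *name;
	/* Simplified: returns 0 to claim the device, -ENODEV to decline. */
	int (*probe_device)(struct device *dev);
};

/* Hypothetical hardware driver: only claims devices on the "pci" bus. */
static int hw_probe(struct device *dev)
{
	return strcmp(dev->bus_name, "pci") == 0 ? 0 : -ENODEV;
}

/* Hypothetical mock driver: only claims devices on the mock bus. */
static int mock_probe(struct device *dev)
{
	return strcmp(dev->bus_name, "iommufd_mock") == 0 ? 0 : -ENODEV;
}

static const struct iommu_ops hw_ops = { "hw", hw_probe };
static const struct iommu_ops mock_ops = { "mock", mock_probe };

/* Try every registered global driver in order; the first to accept wins. */
static const struct iommu_ops *
probe_global_drivers(struct device *dev,
		     const struct iommu_ops *const *drivers, size_t n)
{
	assert(dev && drivers);
	for (size_t i = 0; i < n; i++) {
		if (drivers[i]->probe_device(dev) == 0)
			return drivers[i];
	}
	return NULL;
}
```

With this shape, registration order no longer decides which ops a device
gets: each driver declines devices it does not own, so a mock device ends
up with the mock ops even when a hardware driver was registered first.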
  
Robin Murphy Jan. 11, 2024, 3:50 p.m. UTC | #2
On 11/01/2024 2:48 pm, Jason Gunthorpe wrote:
> On Thu, Jan 11, 2024 at 03:32:13PM +0800, Lu Baolu wrote:
>> In the iommu probe device path, __iommu_probe_device() gets the iommu_ops
>> for the device from dev->iommu->fwspec if this field has been initialized
>> before probing. Otherwise, it will look up the global iommu device list
>> and use the iommu_ops of the first iommu device which has no
>> dev->iommu->fwspec. This causes the wrong iommu_ops to be used for the mock
>> device on x86 platforms where dev->iommu->fwspec is not used.
>>
>> Preallocate the fwspec for the mock device so that the right iommu ops can
>> be used.
> 
> I really don't like this.
> 
> The lifecycle model for fwspec is already a bit confusing. Introducing
> a new case where a driver pre-allocates the fwspec is making it worse,
> not better.
> 
> eg iommu_init_device() error unwind will free this allocated fwspec
> leaving the device broken. We don't have the concept of a fwspec that
> is owned by the device, it is really owned by the probing code.

As I've tried to explain before, this is in fact the correct use of 
fwspec as originally designed, i.e. being set up by *bus code* before 
device_add() (remember this is not the "IOMMU driver" part of selftest.c).

Indeed for perfect symmetry the bus code would free the fwspec after the 
corresponding device_del() returns, but there's no harm in that being 
factored into iommu_release_device() since the notifier call occurs 
sufficiently late in device_del() itself as to make no practical difference.

I'm working to get things back to that model (wherein the dev_iommu and 
fwspec lifecycles become trivial), just with the slight tweak that these 
days it's going to make more sense to have the initialisation factored 
into device_add() itself (via iommu_probe_device()), rather than beforehand.

Thanks,
Robin.

> The fundamental issue is we now have a special kind of driver:
> 
> 	fwspec = dev_iommu_fwspec_get(dev);
> 	if (fwspec && fwspec->ops)
> 		ops = fwspec->ops;
> 	else
> 		ops = iommu_ops_from_fwnode(NULL);
>                                             ^^^^^^^^
> 
> Which represents a "global" non-fwspec using driver that will only
> bind to devices that didn't parse into a fwspec.
> 
> The code above supports only one of these drivers at a time, but allows
> more than one to be registered - it is inconsistent.
> 
> I think the right/easy answer is to iterate over all the "global"
> drivers and call their probe instead of just the first one.
> 
> Especially since my approach over here migrates the whole thing to work
> by iterating:
> 
> https://lore.kernel.org/all/0-v2-f82a05539a64+5109-iommu_fwspec_p2_jgg@nvidia.com/
> 
> And this patch:
> 
> https://lore.kernel.org/all/28-v2-f82a05539a64+5109-iommu_fwspec_p2_jgg@nvidia.com/
> 
> Is how I made the iterating logic, it could be pulled out and tidied a
> bit.
> 
> Jason
  
Jason Gunthorpe Jan. 11, 2024, 3:56 p.m. UTC | #3
On Thu, Jan 11, 2024 at 03:50:51PM +0000, Robin Murphy wrote:
> On 11/01/2024 2:48 pm, Jason Gunthorpe wrote:
> > On Thu, Jan 11, 2024 at 03:32:13PM +0800, Lu Baolu wrote:
> > > In the iommu probe device path, __iommu_probe_device() gets the iommu_ops
> > > for the device from dev->iommu->fwspec if this field has been initialized
> > > before probing. Otherwise, it will look up the global iommu device list
> > > and use the iommu_ops of the first iommu device which has no
> > > dev->iommu->fwspec. This causes the wrong iommu_ops to be used for the mock
> > > device on x86 platforms where dev->iommu->fwspec is not used.
> > > 
> > > Preallocate the fwspec for the mock device so that the right iommu ops can
> > > be used.
> > 
> > I really don't like this.
> > 
> > The lifecycle model for fwspec is already a bit confusing. Introducing
> > a new case where a driver pre-allocates the fwspec is making it worse,
> > not better.
> > 
> > eg iommu_init_device() error unwind will free this allocated fwspec
> > leaving the device broken. We don't have the concept of a fwspec that
> > is owned by the device, it is really owned by the probing code.
> 
> As I've tried to explain before, this is in fact the correct use of fwspec
> as originally designed, i.e. being set up by *bus code* before device_add()
> (remember this is not the "IOMMU driver" part of selftest.c).

I understand it was the intention, but it doesn't really match how the
code works today..

> Indeed for perfect symmetry the bus code would free the fwspec after the
> corresponding device_del() returns, but there's no harm in that being
> factored into iommu_release_device() since the notifier call occurs
> sufficiently late in device_del() itself as to make no practical difference.

IIRC there were issues with leaking the dev_iommu :(

> I'm working to get things back to that model (wherein the dev_iommu and
> fwspec lifecycles become trivial), just with the slight tweak that these
> days it's going to make more sense to have the initialisation factored into
> device_add() itself (via iommu_probe_device()), rather than beforehand.

I would prefer to simply remove fwspec as I've already shown patches
for. You should give some comment on them.

My main complaint is there is no full vision to remove the 'global
drivers', we will always have some drivers doing FW parsing in probe
and then this different fwspec thing on the side for other drivers.

Jason
  
Robin Murphy Jan. 16, 2024, 6:19 p.m. UTC | #4
On 11/01/2024 3:56 pm, Jason Gunthorpe wrote:
> On Thu, Jan 11, 2024 at 03:50:51PM +0000, Robin Murphy wrote:
>> On 11/01/2024 2:48 pm, Jason Gunthorpe wrote:
>>> On Thu, Jan 11, 2024 at 03:32:13PM +0800, Lu Baolu wrote:
>>>> In the iommu probe device path, __iommu_probe_device() gets the iommu_ops
>>>> for the device from dev->iommu->fwspec if this field has been initialized
>>>> before probing. Otherwise, it will look up the global iommu device list
>>>> and use the iommu_ops of the first iommu device which has no
>>>> dev->iommu->fwspec. This causes the wrong iommu_ops to be used for the mock
>>>> device on x86 platforms where dev->iommu->fwspec is not used.
>>>>
>>>> Preallocate the fwspec for the mock device so that the right iommu ops can
>>>> be used.
>>>
>>> I really don't like this.
>>>
>>> The lifecycle model for fwspec is already a bit confusing. Introducing
>>> a new case where a driver pre-allocates the fwspec is making it worse,
>>> not better.
>>>
>>> eg iommu_init_device() error unwind will free this allocated fwspec
>>> leaving the device broken. We don't have the concept of a fwspec that
>>> is owned by the device, it is really owned by the probing code.
>>
>> As I've tried to explain before, this is in fact the correct use of fwspec
>> as originally designed, i.e. being set up by *bus code* before device_add()
>> (remember this is not the "IOMMU driver" part of selftest.c).
> 
> I understand it was the intention, but it doesn't really match how the
> code works today..

The fact that some things aren't following the pattern, and are broken 
and problematic in several ways as a result, does not mean that other 
things that *can* follow the pattern correctly shouldn't.

>> Indeed for perfect symmetry the bus code would free the fwspec after the
>> corresponding device_del() returns, but there's no harm in that being
>> factored into iommu_release_device() since the notifier call occurs
>> sufficiently late in device_del() itself as to make no practical difference.
> 
> IIRC there were issues with leaking the dev_iommu :(

AFAICS there was only an issue introduced last year when some unrelated 
stuff added an erroneous early return to iommu_release_device() if no 
group was assigned, thus subtly broke the existing code (and it did end 
up getting fixed in a roundabout manner a couple of months later).

>> I'm working to get things back to that model (wherein the dev_iommu and
>> fwspec lifecycles become trivial), just with the slight tweak that these
>> days it's going to make more sense to have the initialisation factored into
>> device_add() itself (via iommu_probe_device()), rather than beforehand.
> 
> I would prefer to simply remove fwspec as I've already shown patches
> for. You should give some comment on them.

You mean the 1600 lines of churn which did nothing to address any real 
problem (but did at least acknowledge so in the cover letter)? I thought 
I had responded to that, but it must have been one of the many drafts 
which end up getting deleted out of utter exasperation. Needless to say, 
the response was a NAK. For the last time, any fwspec lifetime issues 
are a *symptom* of a well-understood problem which exists, and not a 
problem in themselves. Yes, due to the evolution of the API there is 
also now some stuff being carried around in iommu_fwspec that really 
shouldn't need to be, but once probing is properly fixed it will get 
stripped back down to the useful shared abstraction of stored firmware 
data that has always been its true spirit. In the meantime, adding a 
load more complexity to unabstract it and support 2 or 3 different ways 
of drivers all individually open-coding storage of the same data is not 
helpful now, and even less helpful in future.

> My main complaint is there is no full vision to remove the 'global
> drivers', we will always have some drivers doing FW parsing in probe
> and then this different fwspec thing on the side for other drivers.

Honestly I would love to see the DMAR/IVRS parsing decoupled a bit more 
from the Intel/AMD drivers, not least in the hope that it might allow 
cleaner separation of the IRQ remapping drivers from the IOMMU API 
drivers. However I don't have my hopes up since in practice it's 
probably a non-trivial amount of work with no real functional benefit in 
the end, and it's certainly not something I'd ever have the time or 
inclination to attempt myself. The SoC drivers doing their own weird 
things to parse DT bindings will get cleaned up once arch/arm 
understands groups, and that *is* all on my to-do list (and as for the 
arm-smmu legacy binding, if it still gets in the way at all by that 
point I'll be inclined to call it obsolete and drop support).

Thanks,
Robin.
  
Jason Gunthorpe Jan. 18, 2024, 7:36 p.m. UTC | #5
On Tue, Jan 16, 2024 at 06:19:00PM +0000, Robin Murphy wrote:

> > > As I've tried to explain before, this is in fact the correct use of fwspec
> > > as originally designed, i.e. being set up by *bus code* before device_add()
> > > (remember this is not the "IOMMU driver" part of selftest.c).
> > 
> > I understand it was the intention, but it doesn't really match how the
> > code works today..
> 
> The fact that some things aren't following the pattern, and are broken and
> problematic in several ways as a result, does not mean that other things
> that *can* follow the pattern correctly shouldn't.

What pattern? fwspec was never set up "by bus code" before
device_add(). I'm not even sure I see how that will be possible since
fwspec relies on the iommu driver being present to parse the FW tables
to create the fwspec in the first place.

The main tension is that the information the bus code needs to supply
to parse the FW has to go into the driver to accomplish the parse and
then be discarded once the parsing is done. Why would we attach
temporary data to the struct device prior to device_add and waste that
memory when we can generate it on the fly with a bus op callback?

> > > I'm working to get things back to that model (wherein the dev_iommu and
> > > fwspec lifecycles become trivial), just with the slight tweak that these
> > > days it's going to make more sense to have the initialisation factored into
> > > device_add() itself (via iommu_probe_device()), rather than beforehand.
> > 
> > I would prefer to simply remove fwspec as I've already shown patches
> > for. You should give some comment on them.
> 
> You mean the 1600 lines of churn which did nothing to address any real
> problem (but did at least acknowledge so in the cover letter)?

Sometimes it takes cleanup before you can solve the real problem. Just
constantly hacking around the edges often creates an architectural
mess.

Indeed it is not quite nothing - it did solve a lot of the wonky lifetime
issues throughout and replaces an incomplete abstraction with a
complete one.

Look, put aside your aesthetic distaste and point to something that is
actually fundamentally wrong and can't ever work with what I've
done. I showed everything, so if there is some issue it should be
visible.

Otherwise you should admit it is technically sound, even if you don't
like how it looks.

> I thought I had responded to that, but it must have been one of the
> many drafts which end up getting deleted out of utter
> exasperation. Needless to say, the response was a NAK.

Nope, no reply!

I think you should take a careful and thoughtful look. Given it is
right there ready to go we can be done with this probe mis-ordering
saga in a couple of months.

My plan is to break up the three parsing ways into smaller series and
go ahead with them in stages.

> For the last time, any fwspec lifetime issues are a *symptom* of a
> well-understood problem which exists, and not a problem in
> themselves.

There is more wrong than just the lifetime issues.

> some stuff being carried around in iommu_fwspec that really
> shouldn't need to be, but once probing is properly fixed it will get
> stripped back down to the useful shared abstraction of stored
> firmware data that has always been its true spirit. 

Which is what exactly? What can you put in the fwspec that is actually
shared by every, or even most, drivers? From my complete analysis the
answer is pretty much nothing.

> In the meantime, adding a load more complexity to unabstract it and
> support 2 or 3 different ways of drivers all individually
> open-coding storage of the same data is not helpful now, and even
> less helpful in future.

I think you should look at the series much more closely, because I
don't think that impression can really be justified.

Most drivers had a net LOC reduction and many had a significant
reduction in their probe-time complexity, like apple-dart. Many little
bugs and missing checks went away because the shared common code was
now actually doing what the drivers need.

Like it or not we actually have 4-5 different ways the drivers do
things! Explicitly supporting those ways and factoring common logic
into common code is *good design*. fwspec doesn't do that, it
supports *one way* and everything else gets no good support.

We already had several different open coded ways for ID storage. That
isn't going away as far as I can see. The one common method, the u32
array of ids, is shorter, faster, and uses less memory. At no
significant complexity cost either.

The cleaner layering between FW parsing and IOMMU driver parsing, I
maintain, makes the whole thing easier to understand as the FW code
layer has a clear API boundary now instead of being messily co-mingled
with iommu code.

Jason
  

Patch

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index cf3e9fed039e..4eca67b8a5c6 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -611,6 +611,8 @@  static void mock_dev_release(struct device *dev)
 
 static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 {
+	struct iommu_fwspec *fwspec;
+	struct dev_iommu *param;
 	struct mock_dev *mdev;
 	int rc;
 
@@ -621,10 +623,28 @@  static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 	if (!mdev)
 		return ERR_PTR(-ENOMEM);
 
+	/* fwspec and param will be freed in the iommu core */
+	fwspec = kzalloc(sizeof(*fwspec), GFP_KERNEL);
+	if (!fwspec) {
+		kfree(mdev);
+		return ERR_PTR(-ENOMEM);
+	}
+	fwspec->ops = &mock_ops;
+
+	param = kzalloc(sizeof(*param), GFP_KERNEL);
+	if (!param) {
+		kfree(mdev);
+		kfree(fwspec);
+		return ERR_PTR(-ENOMEM);
+	}
+	mutex_init(&param->lock);
+	param->fwspec = fwspec;
+
 	device_initialize(&mdev->dev);
 	mdev->flags = dev_flags;
 	mdev->dev.release = mock_dev_release;
 	mdev->dev.bus = &iommufd_mock_bus_type.bus;
+	mdev->dev.iommu = param;
 
 	rc = dev_set_name(&mdev->dev, "iommufd_mock%u",
 			  atomic_inc_return(&mock_dev_num));
@@ -638,6 +658,8 @@  static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 
 err_put:
 	put_device(&mdev->dev);
+	kfree(param);
+	kfree(fwspec);
 	return ERR_PTR(rc);
 }