[REGRESSION] iommu: Only allocate FQ domains for IOMMUs that support them

Message ID 20230922-iommu-type-regression-v1-1-1ed3825b2c38@marcan.st
State New

Commit Message

Hector Martin Sept. 22, 2023, 1:40 p.m. UTC
Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
introduced in iommu_dma_init_domain() to fall back if not supported, but
this check runs too late: by that point, devices have been attached to
the IOMMU, and the IOMMU driver might not expect FQ domains at
ops->attach_dev() time.

Ensure that we immediately clamp FQ domains to plain DMA if not
supported by the driver at device attach time, not later.

This regressed apple-dart in v6.5.

Cc: regressions@lists.linux.dev
Cc: stable@vger.kernel.org
Fixes: a4fdd9762272 ("iommu: Use flush queue capability")
Signed-off-by: Hector Martin <marcan@marcan.st>
---
 drivers/iommu/iommu.c | 9 +++++++++
 1 file changed, 9 insertions(+)


---
base-commit: ce9ecca0238b140b88f43859b211c9fdfd8e5b70
change-id: 20230922-iommu-type-regression-25b4f43df770


Comments

Hector Martin Sept. 22, 2023, 2:41 p.m. UTC | #1
On 22/09/2023 23.21, Robin Murphy wrote:
> On 22/09/2023 2:40 pm, Hector Martin wrote:
>> Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
>> IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
>> introduced in iommu_dma_init_domain() to fall back if not supported, but
>> this check runs too late: by that point, devices have been attached to
>> the IOMMU, and the IOMMU driver might not expect FQ domains at
>> ops->attach_dev() time.
>>
>> Ensure that we immediately clamp FQ domains to plain DMA if not
>> supported by the driver at device attach time, not later.
>>
>> This regressed apple-dart in v6.5.
> 
> Apologies, I missed that apple-dart was doing something unusual here. 
> However, could we just fix that directly instead?
> 
> diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
> index 2082081402d3..0b8927508427 100644
> --- a/drivers/iommu/apple-dart.c
> +++ b/drivers/iommu/apple-dart.c
> @@ -671,8 +671,7 @@ static int apple_dart_attach_dev(struct iommu_domain *domain,
>   		return ret;
> 
>   	switch (domain->type) {
> -	case IOMMU_DOMAIN_DMA:
> -	case IOMMU_DOMAIN_UNMANAGED:
> +	default:
>   		ret = apple_dart_domain_add_streams(dart_domain, cfg);
>   		if (ret)
>   			return ret;
> 
> 
> That's pretty much where we're headed with the domain_alloc_paging 
> redesign anyway - at the driver level, operations on a paging domain 
> should not need to know about the higher-level usage intent of that 
> domain. Ideally, blocking and identity domains should have their own 
> distinct ops now as well, but that might be a bit too big a change for 
> an immediate fix here.

Sure, but it sounded like if there's a capability for this the core
should probably use it and not expose the type at all to drivers that
can't support it :)

If you think defaulting to that branch in DART is correctly future-proof
I can make that change. It's not the only driver checking the domain
type in attach_dev(), but it might be the only one enumerating all the
options instead of checking for specific cases only (e.g. intel checks
for IOMMU_DOMAIN_IDENTITY).

- Hector
  
Jason Gunthorpe Sept. 22, 2023, 2:42 p.m. UTC | #2
On Fri, Sep 22, 2023 at 03:21:17PM +0100, Robin Murphy wrote:
> On 22/09/2023 2:40 pm, Hector Martin wrote:
> > Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
> > IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
> > introduced in iommu_dma_init_domain() to fall back if not supported, but
> > this check runs too late: by that point, devices have been attached to
> > the IOMMU, and the IOMMU driver might not expect FQ domains at
> > ops->attach_dev() time.
> > 
> > Ensure that we immediately clamp FQ domains to plain DMA if not
> > supported by the driver at device attach time, not later.
> > 
> > This regressed apple-dart in v6.5.
> 
> Apologies, I missed that apple-dart was doing something unusual here.
> However, could we just fix that directly instead?
> 
> diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
> index 2082081402d3..0b8927508427 100644
> --- a/drivers/iommu/apple-dart.c
> +++ b/drivers/iommu/apple-dart.c
> @@ -671,8 +671,7 @@ static int apple_dart_attach_dev(struct iommu_domain *domain,
>  		return ret;
> 
>  	switch (domain->type) {
> -	case IOMMU_DOMAIN_DMA:
> -	case IOMMU_DOMAIN_UNMANAGED:
> +	default:
>  		ret = apple_dart_domain_add_streams(dart_domain, cfg);
>  		if (ret)
>  			return ret;

Yes, I much prefer this to the original patch please. Drivers should
not be testing DMA_FQ at all.

I already wrote a series to convert DART to domain_alloc_paging() that
fixes this inadvertently.

Robin's suggestion is good for a temporary -rc fix.

Removing the switch is slightly more robust:

if (domain->type & __IOMMU_DOMAIN_PAGING) {
  [..]
  return 0;
}

if (domain->type == IOMMU_DOMAIN_BLOCKED) {
  ..
}

return -EOPNOTSUPP;

But not so worthwhile since I deleted all this anyhow...
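
For illustration only, the flag test above can be tried out in userspace.
This is a rough sketch, not the apple-dart code and not the eventual fix.
The FLAG_* and TYPE_* constants are local stand-ins whose bit layout is
assumed to mirror the __IOMMU_DOMAIN_* and IOMMU_DOMAIN_* definitions in
include/linux/iommu.h; sketch_attach_dev() and enum attach_path are
hypothetical names that exist only in this example.

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins for the __IOMMU_DOMAIN_* flag bits (layout is an assumption). */
#define FLAG_PAGING   (1U << 0)
#define FLAG_DMA_API  (1U << 1)
#define FLAG_PT       (1U << 2)
#define FLAG_DMA_FQ   (1U << 3)

/* Stand-ins for the IOMMU_DOMAIN_* types built from those bits. */
#define TYPE_BLOCKED   0U
#define TYPE_IDENTITY  FLAG_PT
#define TYPE_UNMANAGED FLAG_PAGING
#define TYPE_DMA       (FLAG_PAGING | FLAG_DMA_API)
#define TYPE_DMA_FQ    (FLAG_PAGING | FLAG_DMA_API | FLAG_DMA_FQ)

/* An FQ domain is still a paging domain, which is what the test relies on. */
static_assert((TYPE_DMA_FQ & FLAG_PAGING) != 0, "DMA_FQ must be paging");

/* Hypothetical: records which branch an attach call would have taken. */
enum attach_path { PATH_NONE, PATH_PAGING, PATH_BLOCKED };

static int sketch_attach_dev(unsigned int type, enum attach_path *path)
{
	*path = PATH_NONE;

	/* Any paging domain (UNMANAGED, DMA, DMA_FQ) takes the same branch. */
	if (type & FLAG_PAGING) {
		*path = PATH_PAGING;
		return 0;
	}

	if (type == TYPE_BLOCKED) {
		*path = PATH_BLOCKED;
		return 0;
	}

	/* Identity is left unhandled here, as in the sketch above. */
	return -EOPNOTSUPP;
}
```

With this shape the driver never names DMA_FQ, so a new paging-domain
flavour added by the core does not need a matching case in the driver.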

I'll send out the dart series, it can't go to -rc, so a patch is still needed.

Thanks,
Jason
  
Linux regression tracking (Thorsten Leemhuis) Sept. 24, 2023, 7:49 a.m. UTC | #3
[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 22.09.23 15:40, Hector Martin wrote:
> Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
> IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
> introduced in iommu_dma_init_domain() to fall back if not supported, but
> this check runs too late: by that point, devices have been attached to
> the IOMMU, and the IOMMU driver might not expect FQ domains at
> ops->attach_dev() time.
> 
> Ensure that we immediately clamp FQ domains to plain DMA if not
> supported by the driver at device attach time, not later.
> 
> This regressed apple-dart in v6.5.
> [...]


Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced a4fdd9762272
#regzbot title iommu: apple-dart regressed
#regzbot monitor:
https://lore.kernel.org/all/20230922-iommu-type-regression-v2-1-689b2ba9b673@marcan.st/
#regzbot fix: iommu/apple-dart: Handle DMA_FQ domains in attach_dev()
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
  

Patch

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3bfc56df4f78..12464eaa8d91 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2039,6 +2039,15 @@  static int __iommu_attach_device(struct iommu_domain *domain,
 	if (unlikely(domain->ops->attach_dev == NULL))
 		return -ENODEV;
 
+	/*
+	 * Ensure we do not try to attach devices to FQ domains if the
+	 * IOMMU does not support them. We can safely fall back to
+	 * non-FQ.
+	 */
+	if (domain->type == IOMMU_DOMAIN_DMA_FQ &&
+	    !device_iommu_capable(dev, IOMMU_CAP_DEFERRED_FLUSH))
+		domain->type = IOMMU_DOMAIN_DMA;
+
 	ret = domain->ops->attach_dev(domain, dev);
 	if (ret)
 		return ret;
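
The ordering that the hunk above enforces can be modelled in userspace.
This is a minimal sketch under stated assumptions, not kernel code: every
sketch_* name is hypothetical, the deferred_flush field stands in for
device_iommu_capable(dev, IOMMU_CAP_DEFERRED_FLUSH), and the mock driver
rejecting FQ domains with -EINVAL is an assumption made for the example
rather than a description of what apple-dart actually does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_domain_type { SKETCH_DMA, SKETCH_DMA_FQ };

struct sketch_domain {
	enum sketch_domain_type type;
};

struct sketch_dev {
	bool deferred_flush;	/* stand-in for IOMMU_CAP_DEFERRED_FLUSH */
	bool attached;
};

/* Mock driver callback: only accepts FQ domains if it supports them. */
static int sketch_driver_attach(struct sketch_domain *domain,
				struct sketch_dev *dev)
{
	if (domain->type == SKETCH_DMA_FQ && !dev->deferred_flush)
		return -EINVAL;

	dev->attached = true;
	return 0;
}

/* Mock core path: clamp FQ to plain DMA before the driver sees the type. */
static int sketch_attach_device(struct sketch_domain *domain,
				struct sketch_dev *dev)
{
	assert(domain != NULL && dev != NULL);

	if (domain->type == SKETCH_DMA_FQ && !dev->deferred_flush)
		domain->type = SKETCH_DMA;

	return sketch_driver_attach(domain, dev);
}
```

Calling sketch_driver_attach() directly with an FQ domain and a device
lacking deferred flush fails, while going through sketch_attach_device()
succeeds with the domain downgraded to plain DMA.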