[1/8] iommu: Decouple iommu_present() from bus ops

Message ID 1fb168b22cbbb5c24162d29d2a9aca339cda2c72.1673978700.git.robin.murphy@arm.com
State New
Headers
Series iommu: The early demise of bus ops |

Commit Message

Robin Murphy Jan. 19, 2023, 7:18 p.m. UTC
  Much as I'd like to remove iommu_present(), the final remaining users
are proving stubbornly difficult to clean up, so kick that can down
the road and just rework it to preserve the current behaviour without
depending on bus ops.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/iommu.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
  

Comments

Baolu Lu Jan. 26, 2023, 1:13 p.m. UTC | #1
On 2023/1/20 3:18, Robin Murphy wrote:
> Much as I'd like to remove iommu_present(), the final remaining users
> are proving stubbornly difficult to clean up, so kick that can down
> the road and just rework it to preserve the current behaviour without
> depending on bus ops.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>   drivers/iommu/iommu.c | 17 ++++++++++++++++-
>   1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index b189ed345057..a77d58e1b976 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1871,9 +1871,24 @@ int bus_iommu_probe(struct bus_type *bus)
>   	return ret;
>   }
>   
> +static int __iommu_present(struct device *dev, void *unused)
> +{
> +	return device_iommu_mapped(dev);
> +}

/**
  * device_iommu_mapped - Returns true when the device DMA is translated
  *                       by an IOMMU
  * @dev: Device to perform the check on
  */
static inline bool device_iommu_mapped(struct device *dev)
{
         return (dev->iommu_group != NULL);
}

Perhaps device_iommu_mapped() should be improved. In some cases, the
device has an iommu_group filled is not enough to indicate that the
device has IOMMU hardware for DMA translation.

For example, VFIO could allocate an iommu_group and add a device into
the iommu_group even there's no IOMMU hardware in
vfio_noiommu_group_alloc().

Basically iommu_group_add_device() doesn't check the presence of an
IOMMU.

> +
> +/**
> + * iommu_present() - make platform-specific assumptions about an IOMMU
> + * @bus: bus to check
> + *
> + * Do not use this function. You want device_iommu_mapped() instead.
> + *
> + * Return: true if some IOMMU is present for some device on the given bus. In
> + * general it may not be the only IOMMU, and it may not be for the device you
> + * are ultimately interested in.
> + */
>   bool iommu_present(struct bus_type *bus)
>   {
> -	return bus->iommu_ops != NULL;
> +	return bus_for_each_dev(bus, NULL, NULL, __iommu_present) > 0;
>   }
>   EXPORT_SYMBOL_GPL(iommu_present);
>   

--
Best regards,
baolu
  
Robin Murphy Jan. 26, 2023, 2:21 p.m. UTC | #2
On 2023-01-26 13:13, Baolu Lu wrote:
> On 2023/1/20 3:18, Robin Murphy wrote:
>> Much as I'd like to remove iommu_present(), the final remaining users
>> are proving stubbornly difficult to clean up, so kick that can down
>> the road and just rework it to preserve the current behaviour without
>> depending on bus ops.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/iommu/iommu.c | 17 ++++++++++++++++-
>>   1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index b189ed345057..a77d58e1b976 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1871,9 +1871,24 @@ int bus_iommu_probe(struct bus_type *bus)
>>       return ret;
>>   }
>> +static int __iommu_present(struct device *dev, void *unused)
>> +{
>> +    return device_iommu_mapped(dev);
>> +}
> 
> /**
>   * device_iommu_mapped - Returns true when the device DMA is translated
>   *                       by an IOMMU
>   * @dev: Device to perform the check on
>   */
> static inline bool device_iommu_mapped(struct device *dev)
> {
>          return (dev->iommu_group != NULL);
> }
> 
> Perhaps device_iommu_mapped() should be improved. In some cases, the
> device has an iommu_group filled is not enough to indicate that the
> device has IOMMU hardware for DMA translation.
> 
> For example, VFIO could allocate an iommu_group and add a device into
> the iommu_group even there's no IOMMU hardware in
> vfio_noiommu_group_alloc().
> 
> Basically iommu_group_add_device() doesn't check the presence of an
> IOMMU.

/**
  * iommu_group_add_device [...]
  *
  * This function is called by an iommu driver [...]
  */

The "check" is inherent in the fact that it's been called at all. VFIO 
noiommu *is* an IOMMU driver in the sense that it provides a bare 
minimum of IOMMU API functionality (i.e. creating groups), sufficient to 
support (careful) usage by VFIO drivers. There would not seem to be a 
legitimate reason for some *other* driver to be specifically querying a 
device while it is already bound to a VFIO driver (and thus may have a 
noiommu group).

In terms of this patch, I'm confident that nobody is using VFIO noiommu 
on old Tegra SoCs; I'm even more confident that they wouldn't be doing 
it with platform devices; and I'm supremely confident that they're not 
loading the GPU drivers while already in the middle of using noiommu 
vfio_platform. Basically "not using VFIO noiommu" is one of the inherent 
platform-specific assumptions. If anyone else now ignores the first 
sentence of the documentation and tries to use iommu_present() somewhere 
that assumption might not hold, returning a meaningless wrong answer is 
the documented behaviour :)

Cheers,
Robin.

> 
>> +
>> +/**
>> + * iommu_present() - make platform-specific assumptions about an IOMMU
>> + * @bus: bus to check
>> + *
>> + * Do not use this function. You want device_iommu_mapped() instead.
>> + *
>> + * Return: true if some IOMMU is present for some device on the given 
>> bus. In
>> + * general it may not be the only IOMMU, and it may not be for the 
>> device you
>> + * are ultimately interested in.
>> + */
>>   bool iommu_present(struct bus_type *bus)
>>   {
>> -    return bus->iommu_ops != NULL;
>> +    return bus_for_each_dev(bus, NULL, NULL, __iommu_present) > 0;
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_present);
> 
> -- 
> Best regards,
> baolu
  
Jason Gunthorpe Jan. 26, 2023, 2:41 p.m. UTC | #3
On Thu, Jan 26, 2023 at 02:21:29PM +0000, Robin Murphy wrote:

> The "check" is inherent in the fact that it's been called at all. VFIO
> noiommu *is* an IOMMU driver in the sense that it provides a bare minimum of
> IOMMU API functionality (i.e. creating groups), sufficient to support
> (careful) usage by VFIO drivers. There would not seem to be a legitimate
> reason for some *other* driver to be specifically querying a device while it
> is already bound to a VFIO driver (and thus may have a noiommu group).

Yes, the devices that VFIO assigns to its internal groups never leak
outside VFIO's control during their assignment - ie they are
continuously bound to VFIO never another driver.

So no other driver can ever see the internal groups unless it is
messing around with devices it is not bound to :)

Jason
  
Baolu Lu Jan. 27, 2023, 1:50 p.m. UTC | #4
On 2023/1/26 22:41, Jason Gunthorpe wrote:
> On Thu, Jan 26, 2023 at 02:21:29PM +0000, Robin Murphy wrote:
> 
>> The "check" is inherent in the fact that it's been called at all. VFIO
>> noiommu*is*  an IOMMU driver in the sense that it provides a bare minimum of
>> IOMMU API functionality (i.e. creating groups), sufficient to support
>> (careful) usage by VFIO drivers. There would not seem to be a legitimate
>> reason for some*other*  driver to be specifically querying a device while it
>> is already bound to a VFIO driver (and thus may have a noiommu group).
> Yes, the devices that VFIO assigns to its internal groups never leak
> outside VFIO's control during their assignment - ie they are
> continuously bound to VFIO never another driver.
> 
> So no other driver can ever see the internal groups unless it is
> messing around with devices it is not bound to 😄

Fair enough. I was thinking that probably we could make it like below:

/**
  * device_iommu_mapped - Returns true when the device DMA is translated
  *                       by an IOMMU
  * @dev: Device to perform the check on
  */
static inline bool device_iommu_mapped(struct device *dev)
{
         return (dev->iommu && dev->iommu->iommu_dev);
}

The iommu probe device code guarantees that dev->iommu->iommu_dev is
valid only after the IOMMU driver's .probe_device returned successfully.

Any thoughts?

Best regards,
baolu
  
Jason Gunthorpe Jan. 27, 2023, 1:59 p.m. UTC | #5
On Fri, Jan 27, 2023 at 09:50:29PM +0800, Baolu Lu wrote:
> On 2023/1/26 22:41, Jason Gunthorpe wrote:
> > On Thu, Jan 26, 2023 at 02:21:29PM +0000, Robin Murphy wrote:
> > 
> > > The "check" is inherent in the fact that it's been called at all. VFIO
> > > noiommu*is*  an IOMMU driver in the sense that it provides a bare minimum of
> > > IOMMU API functionality (i.e. creating groups), sufficient to support
> > > (careful) usage by VFIO drivers. There would not seem to be a legitimate
> > > reason for some*other*  driver to be specifically querying a device while it
> > > is already bound to a VFIO driver (and thus may have a noiommu group).
> > Yes, the devices that VFIO assigns to its internal groups never leak
> > outside VFIO's control during their assignment - ie they are
> > continuously bound to VFIO never another driver.
> > 
> > So no other driver can ever see the internal groups unless it is
> > messing around with devices it is not bound to 😄
> 
> Fair enough. I was thinking that probably we could make it like below:
> 
> /**
>  * device_iommu_mapped - Returns true when the device DMA is translated
>  *                       by an IOMMU
>  * @dev: Device to perform the check on
>  */
> static inline bool device_iommu_mapped(struct device *dev)
> {
>         return (dev->iommu && dev->iommu->iommu_dev);
> }
> 
> The iommu probe device code guarantees that dev->iommu->iommu_dev is
> valid only after the IOMMU driver's .probe_device returned successfully.
> 
> Any thoughts?

I find the above much clearer if it can work

Jason
  
Robin Murphy Jan. 27, 2023, 3:19 p.m. UTC | #6
On 2023-01-27 13:50, Baolu Lu wrote:
> On 2023/1/26 22:41, Jason Gunthorpe wrote:
>> On Thu, Jan 26, 2023 at 02:21:29PM +0000, Robin Murphy wrote:
>>
>>> The "check" is inherent in the fact that it's been called at all. VFIO
>>> noiommu*is*  an IOMMU driver in the sense that it provides a bare 
>>> minimum of
>>> IOMMU API functionality (i.e. creating groups), sufficient to support
>>> (careful) usage by VFIO drivers. There would not seem to be a legitimate
>>> reason for some*other*  driver to be specifically querying a device 
>>> while it
>>> is already bound to a VFIO driver (and thus may have a noiommu group).
>> Yes, the devices that VFIO assigns to its internal groups never leak
>> outside VFIO's control during their assignment - ie they are
>> continuously bound to VFIO never another driver.
>>
>> So no other driver can ever see the internal groups unless it is
>> messing around with devices it is not bound to 😄
> 
> Fair enough. I was thinking that probably we could make it like below:
> 
> /**
>   * device_iommu_mapped - Returns true when the device DMA is translated
>   *                       by an IOMMU
>   * @dev: Device to perform the check on
>   */
> static inline bool device_iommu_mapped(struct device *dev)
> {
>          return (dev->iommu && dev->iommu->iommu_dev);
> }
> 
> The iommu probe device code guarantees that dev->iommu->iommu_dev is
> valid only after the IOMMU driver's .probe_device returned successfully.
> 
> Any thoughts?

Heh, I actually wrote that helper yesterday for v2, but as an internal 
check for valid ops :)

The current implementation of device_iommu_mapped() just dates back to 
when dev->iommu_group was the only per-device thing we had, so in 
principle I don't have any conceptual objection to redefining it in 
terms of "device has ops" rather than "device has a group", but as 
things stand you'd still have to do something about PPC first (I know 
Jason had been pushing on that, but I've not kept track of where it got to).

Thanks,
Robin.
  
Jason Gunthorpe Jan. 27, 2023, 3:43 p.m. UTC | #7
On Fri, Jan 27, 2023 at 03:19:55PM +0000, Robin Murphy wrote:

> The current implementation of device_iommu_mapped() just dates back to when
> dev->iommu_group was the only per-device thing we had, so in principle I
> don't have any conceptual objection to redefining it in terms of "device has
> ops" rather than "device has a group", but as things stand you'd still have
> to do something about PPC first (I know Jason had been pushing on that, but
> I've not kept track of where it got to).

PPC hasn't moved at all, AFAICT. In a few more months I'm going to
suggest we delete the special VFIO support due to it being broken,
distros already having turned it off and nobody caring enough to fix
it..

What does device_iommu_mapped() even really mean?

Looking at usages..

These are fixing SOC HW bugs/issues - the desire seems to be "is the SOC's
IOMMU enabled"

drivers/char/agp/intel-gtt.c:           device_iommu_mapped(&intel_private.pcidev->dev));
drivers/dma/sh/rcar-dmac.c:     if (device_iommu_mapped(&pdev->dev))
drivers/gpu/drm/i915/i915_utils.c:      if (device_iommu_mapped(i915->drm.dev))
?
drivers/usb/dwc3/dwc3-xilinx.c: if (of_dma_is_coherent(dev->of_node) || device_iommu_mapped(dev)) {
drivers/usb/host/xhci.c:        if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev))
drivers/crypto/qat/qat_common/adf_sriov.c:      if (!device_iommu_mapped(&pdev->dev))
?

These seem to be trying to decide if iommu_domain's can be used (and
they can't be on power):

drivers/gpu/drm/msm/msm_drv.c:  if (device_iommu_mapped(mdp_dev))
drivers/gpu/drm/msm/msm_drv.c:          device_iommu_mapped(dev->dev) ||
drivers/gpu/drm/msm/msm_drv.c:          device_iommu_mapped(dev->dev->parent);
drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c:     if (device_iommu_mapped(dev)) {
drivers/gpu/drm/rockchip/rockchip_drm_drv.c:    if (!device_iommu_mapped(dev))
drivers/gpu/drm/tegra/uapi.c:   if (device_iommu_mapped(client->base.dev) && client->ops->can_use_memory_ctx) {
drivers/gpu/host1x/context.c:           if (!fwspec || !device_iommu_mapped(&ctx->dev)) {
drivers/infiniband/hw/usnic/usnic_ib_main.c:    if (!device_iommu_mapped(&pdev->dev)) {

Yikes, trying to map DMA addresses programmed into devices back to CPU addresses:

drivers/misc/habanalabs/common/debugfs.c: if (!user_address || device_iommu_mapped(&hdev->pdev->dev)) {
drivers/misc/habanalabs/gaudi2/gaudi2.c:                if (!device_iommu_mapped(&hdev->pdev->dev))

And then sequencing the call to iommu_probe_device() which doesn't
apply to power:

drivers/acpi/scan.c:    if (!err && dev->bus && !device_iommu_mapped(dev))
drivers/iommu/of_iommu.c:       if (!err && dev->bus && !device_iommu_mapped(dev))

Leaving these:

arch/powerpc/kernel/eeh.c:      if (device_iommu_mapped(dev)) {

This is only used to support eeh_iommu_group_to_pe which is only
caleld by vfio_iommu_spapr_tce.c. Since power vfio doesn't work right
now this is uncallable, and when power is fixed this will work
properly.

arch/powerpc/kernel/iommu.c:    if (device_iommu_mapped(dev)) {
arch/powerpc/kernel/iommu.c:    if (!device_iommu_mapped(dev)) {

These should both be replaced with some kind of 'device has iommu group', since
it is really driving ppc unique group logic.

So, I'd say Baolu's approach is the right thing, just replace the
above two in ppc with something else.

Jason
  
Baolu Lu Jan. 28, 2023, 8:49 a.m. UTC | #8
On 2023/1/27 23:43, Jason Gunthorpe wrote:
> On Fri, Jan 27, 2023 at 03:19:55PM +0000, Robin Murphy wrote:
> 
>> The current implementation of device_iommu_mapped() just dates back to when
>> dev->iommu_group was the only per-device thing we had, so in principle I
>> don't have any conceptual objection to redefining it in terms of "device has
>> ops" rather than "device has a group", but as things stand you'd still have
>> to do something about PPC first (I know Jason had been pushing on that, but
>> I've not kept track of where it got to).
> PPC hasn't moved at all, AFAICT. In a few more months I'm going to
> suggest we delete the special VFIO support due to it being broken,
> distros already having turned it off and nobody caring enough to fix
> it..
> 
> What does device_iommu_mapped() even really mean?
> 
> Looking at usages..
> 
> These are fixing SOC HW bugs/issues - the desire seems to be "is the SOC's
> IOMMU enabled"
> 
> drivers/char/agp/intel-gtt.c:           device_iommu_mapped(&intel_private.pcidev->dev));
> drivers/dma/sh/rcar-dmac.c:     if (device_iommu_mapped(&pdev->dev))
> drivers/gpu/drm/i915/i915_utils.c:      if (device_iommu_mapped(i915->drm.dev))
> ?
> drivers/usb/dwc3/dwc3-xilinx.c: if (of_dma_is_coherent(dev->of_node) || device_iommu_mapped(dev)) {
> drivers/usb/host/xhci.c:        if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev))
> drivers/crypto/qat/qat_common/adf_sriov.c:      if (!device_iommu_mapped(&pdev->dev))
> ?
> 
> These seem to be trying to decide if iommu_domain's can be used (and
> they can't be on power):
> 
> drivers/gpu/drm/msm/msm_drv.c:  if (device_iommu_mapped(mdp_dev))
> drivers/gpu/drm/msm/msm_drv.c:          device_iommu_mapped(dev->dev) ||
> drivers/gpu/drm/msm/msm_drv.c:          device_iommu_mapped(dev->dev->parent);
> drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c:     if (device_iommu_mapped(dev)) {
> drivers/gpu/drm/rockchip/rockchip_drm_drv.c:    if (!device_iommu_mapped(dev))
> drivers/gpu/drm/tegra/uapi.c:   if (device_iommu_mapped(client->base.dev) && client->ops->can_use_memory_ctx) {
> drivers/gpu/host1x/context.c:           if (!fwspec || !device_iommu_mapped(&ctx->dev)) {
> drivers/infiniband/hw/usnic/usnic_ib_main.c:    if (!device_iommu_mapped(&pdev->dev)) {
> 
> Yikes, trying to map DMA addresses programmed into devices back to CPU addresses:
> 
> drivers/misc/habanalabs/common/debugfs.c: if (!user_address || device_iommu_mapped(&hdev->pdev->dev)) {
> drivers/misc/habanalabs/gaudi2/gaudi2.c:                if (!device_iommu_mapped(&hdev->pdev->dev))
> 
> And then sequencing the call to iommu_probe_device() which doesn't
> apply to power:
> 
> drivers/acpi/scan.c:    if (!err && dev->bus && !device_iommu_mapped(dev))
> drivers/iommu/of_iommu.c:       if (!err && dev->bus && !device_iommu_mapped(dev))
> 
> Leaving these:
> 
> arch/powerpc/kernel/eeh.c:      if (device_iommu_mapped(dev)) {
> 
> This is only used to support eeh_iommu_group_to_pe which is only
> caleld by vfio_iommu_spapr_tce.c. Since power vfio doesn't work right
> now this is uncallable, and when power is fixed this will work
> properly.
> 
> arch/powerpc/kernel/iommu.c:    if (device_iommu_mapped(dev)) {
> arch/powerpc/kernel/iommu.c:    if (!device_iommu_mapped(dev)) {
> 
> These should both be replaced with some kind of 'device has iommu group', since
> it is really driving ppc unique group logic.
> 
> So, I'd say Baolu's approach is the right thing, just replace the
> above two in ppc with something else.

Thank you both. I will follow up a series later.

Best regards,
baolu
  
Robin Murphy Jan. 30, 2023, 1:49 p.m. UTC | #9
On 2023-01-28 08:49, Baolu Lu wrote:
> On 2023/1/27 23:43, Jason Gunthorpe wrote:
>> On Fri, Jan 27, 2023 at 03:19:55PM +0000, Robin Murphy wrote:
>>
>>> The current implementation of device_iommu_mapped() just dates back 
>>> to when
>>> dev->iommu_group was the only per-device thing we had, so in principle I
>>> don't have any conceptual objection to redefining it in terms of 
>>> "device has
>>> ops" rather than "device has a group", but as things stand you'd 
>>> still have
>>> to do something about PPC first (I know Jason had been pushing on 
>>> that, but
>>> I've not kept track of where it got to).
>> PPC hasn't moved at all, AFAICT. In a few more months I'm going to
>> suggest we delete the special VFIO support due to it being broken,
>> distros already having turned it off and nobody caring enough to fix
>> it..
>>
>> What does device_iommu_mapped() even really mean?
>>
>> Looking at usages..
>>
>> These are fixing SOC HW bugs/issues - the desire seems to be "is the 
>> SOC's
>> IOMMU enabled"
>>
>> drivers/char/agp/intel-gtt.c:           
>> device_iommu_mapped(&intel_private.pcidev->dev));
>> drivers/dma/sh/rcar-dmac.c:     if (device_iommu_mapped(&pdev->dev))
>> drivers/gpu/drm/i915/i915_utils.c:      if 
>> (device_iommu_mapped(i915->drm.dev))
>> ?
>> drivers/usb/dwc3/dwc3-xilinx.c: if (of_dma_is_coherent(dev->of_node) 
>> || device_iommu_mapped(dev)) {
>> drivers/usb/host/xhci.c:        if (!(xhci->quirks & 
>> XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev))
>> drivers/crypto/qat/qat_common/adf_sriov.c:      if 
>> (!device_iommu_mapped(&pdev->dev))
>> ?
>>
>> These seem to be trying to decide if iommu_domain's can be used (and
>> they can't be on power):
>>
>> drivers/gpu/drm/msm/msm_drv.c:  if (device_iommu_mapped(mdp_dev))
>> drivers/gpu/drm/msm/msm_drv.c:          device_iommu_mapped(dev->dev) ||
>> drivers/gpu/drm/msm/msm_drv.c:          
>> device_iommu_mapped(dev->dev->parent);
>> drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c:     if 
>> (device_iommu_mapped(dev)) {
>> drivers/gpu/drm/rockchip/rockchip_drm_drv.c:    if 
>> (!device_iommu_mapped(dev))
>> drivers/gpu/drm/tegra/uapi.c:   if 
>> (device_iommu_mapped(client->base.dev) && 
>> client->ops->can_use_memory_ctx) {
>> drivers/gpu/host1x/context.c:           if (!fwspec || 
>> !device_iommu_mapped(&ctx->dev)) {
>> drivers/infiniband/hw/usnic/usnic_ib_main.c:    if 
>> (!device_iommu_mapped(&pdev->dev)) {
>>
>> Yikes, trying to map DMA addresses programmed into devices back to CPU 
>> addresses:
>>
>> drivers/misc/habanalabs/common/debugfs.c: if (!user_address || 
>> device_iommu_mapped(&hdev->pdev->dev)) {
>> drivers/misc/habanalabs/gaudi2/gaudi2.c:                if 
>> (!device_iommu_mapped(&hdev->pdev->dev))
>>
>> And then sequencing the call to iommu_probe_device() which doesn't
>> apply to power:
>>
>> drivers/acpi/scan.c:    if (!err && dev->bus && 
>> !device_iommu_mapped(dev))
>> drivers/iommu/of_iommu.c:       if (!err && dev->bus && 
>> !device_iommu_mapped(dev))
>>
>> Leaving these:
>>
>> arch/powerpc/kernel/eeh.c:      if (device_iommu_mapped(dev)) {
>>
>> This is only used to support eeh_iommu_group_to_pe which is only
>> caleld by vfio_iommu_spapr_tce.c. Since power vfio doesn't work right
>> now this is uncallable, and when power is fixed this will work
>> properly.

Oh wow, I should have looked at more context... Even better, this one is 
already just an elaborate "if (true)" - it has been impossible for 
dev_has_iommu_table() to return 0 since at least 2015 :D
>> arch/powerpc/kernel/iommu.c:    if (device_iommu_mapped(dev)) {
>> arch/powerpc/kernel/iommu.c:    if (!device_iommu_mapped(dev)) {
>>
>> These should both be replaced with some kind of 'device has iommu 
>> group', since
>> it is really driving ppc unique group logic.

And in fact those appear to mostly serve for printing debug messages; if 
we made iommu_group_add_device() return -EBUSY for duplicate calls (not 
necessarily a bad idea anyway vs. relying on the noisy failure of 
sysfs_create_link()), they could arguably just go.

All in all, it's only actually the habanalabs ones that I'm slightly 
wary of, since they're effectively (mis)using device_iommu_mapped() to 
infer the DMA ops implementation, which could potentially go wrong (or 
at least *more* wrong) on POWER with this change. I guess the saving 
grace is that although they are available on PCIe-interfaced modules, 
the userspace driver stack seems to be implicitly x86_64-only - as far 
as I could tell from a quick poke around their site and documentation, 
which doesn't appear to acknowledge the concept of CPU architectures at 
all - so the chances of anyone actually trying to use the kernel drivers 
in anger on POWER seem minimal.

Thanks,
Robin.

>> So, I'd say Baolu's approach is the right thing, just replace the
>> above two in ppc with something else.
> 
> Thank you both. I will follow up a series later.
> 
> Best regards,
> baolu
  
Jason Gunthorpe Jan. 30, 2023, 1:53 p.m. UTC | #10
On Mon, Jan 30, 2023 at 01:49:20PM +0000, Robin Murphy wrote:

> All in all, it's only actually the habanalabs ones that I'm slightly wary
> of, since they're effectively (mis)using device_iommu_mapped() to infer the
> DMA ops implementation, which could potentially go wrong (or at least *more*
> wrong) on POWER with this change. I guess the saving grace is that
> although

IMHO habana is not using the DMA API abstraction properly. If it
doesn't work on some archs is their bugs to deal with - we don't need
to complexify the core code to tiptoe around around such an abuse in
an obscure driver.

Jason
  
Oded Gabbay Jan. 30, 2023, 2:22 p.m. UTC | #11
On Mon, Jan 30, 2023 at 3:53 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Mon, Jan 30, 2023 at 01:49:20PM +0000, Robin Murphy wrote:
>
> > All in all, it's only actually the habanalabs ones that I'm slightly wary
> > of, since they're effectively (mis)using device_iommu_mapped() to infer the
> > DMA ops implementation, which could potentially go wrong (or at least *more*
> > wrong) on POWER with this change. I guess the saving grace is that
> > although
>
> IMHO habana is not using the DMA API abstraction properly. If it
> doesn't work on some archs is their bugs to deal with - we don't need
> to complexify the core code to tiptoe around around such an abuse in
> an obscure driver.
>
> Jason
Agreed, feel free to change the kapi as you see fit. Do the right
thing for the kernel.
In any case, we limit ourselves to x86-64 arch in the 6.3 merge cycle.

Oded
  

Patch

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b189ed345057..a77d58e1b976 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1871,9 +1871,24 @@  int bus_iommu_probe(struct bus_type *bus)
 	return ret;
 }
 
+static int __iommu_present(struct device *dev, void *unused)
+{
+	return device_iommu_mapped(dev);
+}
+
+/**
+ * iommu_present() - make platform-specific assumptions about an IOMMU
+ * @bus: bus to check
+ *
+ * Do not use this function. You want device_iommu_mapped() instead.
+ *
+ * Return: true if some IOMMU is present for some device on the given bus. In
+ * general it may not be the only IOMMU, and it may not be for the device you
+ * are ultimately interested in.
+ */
 bool iommu_present(struct bus_type *bus)
 {
-	return bus->iommu_ops != NULL;
+	return bus_for_each_dev(bus, NULL, NULL, __iommu_present) > 0;
 }
 EXPORT_SYMBOL_GPL(iommu_present);