Message ID | 1fb168b22cbbb5c24162d29d2a9aca339cda2c72.1673978700.git.robin.murphy@arm.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp502821wrn; Thu, 19 Jan 2023 11:21:23 -0800 (PST) X-Google-Smtp-Source: AMrXdXuY1hgBZJR2UKGk6x0ybhtBrODFC/sJANXHe5SPfPQQk1oQDkX0Sj5c/Hl5AQL5kRbMR4qS X-Received: by 2002:a17:90a:f3cf:b0:229:557:8a49 with SMTP id ha15-20020a17090af3cf00b0022905578a49mr30453574pjb.7.1674156083317; Thu, 19 Jan 2023 11:21:23 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674156083; cv=none; d=google.com; s=arc-20160816; b=jB9nprq72LFVr1e/xlVQnZuiXm4T5W15s8OfQ95pMujYdpf4xNcXQFY1dQOZMGK9Vi 6cc4PLEbPHeysfmbTGWqKztvtEaS9+OKy1cc3JbnkAN95Cc1eznXZDdKWfcmzWT9+8SK 3FEYB6RoMt7WEkE10EcnfluXFIWONlx7RPwiZ7NF3fNgCUJgXqu0M3hamYyVFPLJracj IoNMSpYp/buU9w7SqininRWd3PokXA1ebGKdzqjPDHZitoCGsJ9T/gTuTDmGTXqxVnSI 51W1vSO3p8YJCl1No9lIbKfkKBfXkkq6Qw5Y2LEreKzR8LLcuMVKGxLBGUpHtZ5APJ12 4lUA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=Q5TlwsA784Lxz3jyBYMQbqT6OTzKspMVWlP2LfiCt7c=; b=jUNUiZM/s6DyFF2pqQ5ZD6ZCq5JG3RyFNaTLHYz6UCo7mMR2pfZdMAHPZcx5jkIMff MznoqC5VAndZoFbIbzB2h9rvyQv2wenpohx30yEUvYy3v0GhCqYEOCXi4LFV2aGlr1nj RGp1bxFvGxYPGvlc6Xi8H9LnXoXsLKtUoJV0CzIdHNAaCRUIzKnEK3CRoHJ4NO23velD yqZY0ycZb9jqkw1yRRmCiM5k6795jSRbJmYQuQe1WL3XZWp8ZdoC0g6XoglxhJeP21LP Dami9QNz5Hl3nJsXY5UiHF1fA2V7XgCeexUOHPXOdq1HPKg3g2EY0yY/A+bXI9HRH/ll uJEg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d9-20020a631d09000000b004c5936dd6e1si21024293pgd.201.2023.01.19.11.21.11; Thu, 19 Jan 2023 11:21:23 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231394AbjASTUX (ORCPT <rfc822;pavtiger@gmail.com> + 99 others); Thu, 19 Jan 2023 14:20:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59976 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231565AbjASTUD (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 19 Jan 2023 14:20:03 -0500 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 3752F9EE07 for <linux-kernel@vger.kernel.org>; Thu, 19 Jan 2023 11:19:11 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7B3E41758; Thu, 19 Jan 2023 11:19:23 -0800 (PST) Received: from e121345-lin.cambridge.arm.com (e121345-lin.cambridge.arm.com [10.1.196.40]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 312DD3F67D; Thu, 19 Jan 2023 11:18:41 -0800 (PST) From: Robin Murphy <robin.murphy@arm.com> To: joro@8bytes.org, will@kernel.org Cc: hch@lst.de, jgg@nvidia.com, iommu@lists.linux.dev, linux-kernel@vger.kernel.org Subject: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops Date: Thu, 19 Jan 2023 19:18:19 +0000 Message-Id: <1fb168b22cbbb5c24162d29d2a9aca339cda2c72.1673978700.git.robin.murphy@arm.com> X-Mailer: git-send-email 2.36.1.dirty In-Reply-To: <cover.1673978700.git.robin.murphy@arm.com> References: <cover.1673978700.git.robin.murphy@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755479889109587454?= X-GMAIL-MSGID: =?utf-8?q?1755479889109587454?= |
Series |
iommu: The early demise of bus ops
|
|
Commit Message
Robin Murphy
Jan. 19, 2023, 7:18 p.m. UTC
Much as I'd like to remove iommu_present(), the final remaining users
are proving stubbornly difficult to clean up, so kick that can down
the road and just rework it to preserve the current behaviour without
depending on bus ops.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
drivers/iommu/iommu.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
Comments
On 2023/1/20 3:18, Robin Murphy wrote: > Much as I'd like to remove iommu_present(), the final remaining users > are proving stubbornly difficult to clean up, so kick that can down > the road and just rework it to preserve the current behaviour without > depending on bus ops. > > Signed-off-by: Robin Murphy <robin.murphy@arm.com> > --- > drivers/iommu/iommu.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index b189ed345057..a77d58e1b976 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -1871,9 +1871,24 @@ int bus_iommu_probe(struct bus_type *bus) > return ret; > } > > +static int __iommu_present(struct device *dev, void *unused) > +{ > + return device_iommu_mapped(dev); > +} /** * device_iommu_mapped - Returns true when the device DMA is translated * by an IOMMU * @dev: Device to perform the check on */ static inline bool device_iommu_mapped(struct device *dev) { return (dev->iommu_group != NULL); } Perhaps device_iommu_mapped() should be improved. In some cases, the device has an iommu_group filled is not enough to indicate that the device has IOMMU hardware for DMA translation. For example, VFIO could allocate an iommu_group and add a device into the iommu_group even there's no IOMMU hardware in vfio_noiommu_group_alloc(). Basically iommu_group_add_device() doesn't check the presence of an IOMMU. > + > +/** > + * iommu_present() - make platform-specific assumptions about an IOMMU > + * @bus: bus to check > + * > + * Do not use this function. You want device_iommu_mapped() instead. > + * > + * Return: true if some IOMMU is present for some device on the given bus. In > + * general it may not be the only IOMMU, and it may not be for the device you > + * are ultimately interested in. > + */ > bool iommu_present(struct bus_type *bus) > { > - return bus->iommu_ops != NULL; > + return bus_for_each_dev(bus, NULL, NULL, __iommu_present) > 0; > } > EXPORT_SYMBOL_GPL(iommu_present); > -- Best regards, baolu
On 2023-01-26 13:13, Baolu Lu wrote: > On 2023/1/20 3:18, Robin Murphy wrote: >> Much as I'd like to remove iommu_present(), the final remaining users >> are proving stubbornly difficult to clean up, so kick that can down >> the road and just rework it to preserve the current behaviour without >> depending on bus ops. >> >> Signed-off-by: Robin Murphy <robin.murphy@arm.com> >> --- >> drivers/iommu/iommu.c | 17 ++++++++++++++++- >> 1 file changed, 16 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >> index b189ed345057..a77d58e1b976 100644 >> --- a/drivers/iommu/iommu.c >> +++ b/drivers/iommu/iommu.c >> @@ -1871,9 +1871,24 @@ int bus_iommu_probe(struct bus_type *bus) >> return ret; >> } >> +static int __iommu_present(struct device *dev, void *unused) >> +{ >> + return device_iommu_mapped(dev); >> +} > > /** > * device_iommu_mapped - Returns true when the device DMA is translated > * by an IOMMU > * @dev: Device to perform the check on > */ > static inline bool device_iommu_mapped(struct device *dev) > { > return (dev->iommu_group != NULL); > } > > Perhaps device_iommu_mapped() should be improved. In some cases, the > device has an iommu_group filled is not enough to indicate that the > device has IOMMU hardware for DMA translation. > > For example, VFIO could allocate an iommu_group and add a device into > the iommu_group even there's no IOMMU hardware in > vfio_noiommu_group_alloc(). > > Basically iommu_group_add_device() doesn't check the presence of an > IOMMU. /** * iommu_group_add_device [...] * * This function is called by an iommu driver [...] */ The "check" is inherent in the fact that it's been called at all. VFIO noiommu *is* an IOMMU driver in the sense that it provides a bare minimum of IOMMU API functionality (i.e. creating groups), sufficient to support (careful) usage by VFIO drivers. There would not seem to be a legitimate reason for some *other* driver to be specifically querying a device while it is already bound to a VFIO driver (and thus may have a noiommu group). In terms of this patch, I'm confident that nobody is using VFIO noiommu on old Tegra SoCs; I'm even more confident that they wouldn't be doing it with platform devices; and I'm supremely confident that they're not loading the GPU drivers while already in the middle of using noiommu vfio_platform. Basically "not using VFIO noiommu" is one of the inherent platform-specific assumptions. If anyone else now ignores the first sentence of the documentation and tries to use iommu_present() somewhere that assumption might not hold, returning a meaningless wrong answer is the documented behaviour :) Cheers, Robin. > >> + >> +/** >> + * iommu_present() - make platform-specific assumptions about an IOMMU >> + * @bus: bus to check >> + * >> + * Do not use this function. You want device_iommu_mapped() instead. >> + * >> + * Return: true if some IOMMU is present for some device on the given >> bus. In >> + * general it may not be the only IOMMU, and it may not be for the >> device you >> + * are ultimately interested in. >> + */ >> bool iommu_present(struct bus_type *bus) >> { >> - return bus->iommu_ops != NULL; >> + return bus_for_each_dev(bus, NULL, NULL, __iommu_present) > 0; >> } >> EXPORT_SYMBOL_GPL(iommu_present); > > -- > Best regards, > baolu
On Thu, Jan 26, 2023 at 02:21:29PM +0000, Robin Murphy wrote: > The "check" is inherent in the fact that it's been called at all. VFIO > noiommu *is* an IOMMU driver in the sense that it provides a bare minimum of > IOMMU API functionality (i.e. creating groups), sufficient to support > (careful) usage by VFIO drivers. There would not seem to be a legitimate > reason for some *other* driver to be specifically querying a device while it > is already bound to a VFIO driver (and thus may have a noiommu group). Yes, the devices that VFIO assigns to its internal groups never leak outside VFIO's control during their assignment - ie they are continuously bound to VFIO never another driver. So no other driver can ever see the internal groups unless it is messing around with devices it is not bound to :) Jason
On 2023/1/26 22:41, Jason Gunthorpe wrote: > On Thu, Jan 26, 2023 at 02:21:29PM +0000, Robin Murphy wrote: > >> The "check" is inherent in the fact that it's been called at all. VFIO >> noiommu*is* an IOMMU driver in the sense that it provides a bare minimum of >> IOMMU API functionality (i.e. creating groups), sufficient to support >> (careful) usage by VFIO drivers. There would not seem to be a legitimate >> reason for some*other* driver to be specifically querying a device while it >> is already bound to a VFIO driver (and thus may have a noiommu group). > Yes, the devices that VFIO assigns to its internal groups never leak > outside VFIO's control during their assignment - ie they are > continuously bound to VFIO never another driver. > > So no other driver can ever see the internal groups unless it is > messing around with devices it is not bound to 😄 Fair enough. I was thinking that probably we could make it like below: /** * device_iommu_mapped - Returns true when the device DMA is translated * by an IOMMU * @dev: Device to perform the check on */ static inline bool device_iommu_mapped(struct device *dev) { return (dev->iommu && dev->iommu->iommu_dev); } The iommu probe device code guarantees that dev->iommu->iommu_dev is valid only after the IOMMU driver's .probe_device returned successfully. Any thoughts? Best regards, baolu
On Fri, Jan 27, 2023 at 09:50:29PM +0800, Baolu Lu wrote: > On 2023/1/26 22:41, Jason Gunthorpe wrote: > > On Thu, Jan 26, 2023 at 02:21:29PM +0000, Robin Murphy wrote: > > > > > The "check" is inherent in the fact that it's been called at all. VFIO > > > noiommu*is* an IOMMU driver in the sense that it provides a bare minimum of > > > IOMMU API functionality (i.e. creating groups), sufficient to support > > > (careful) usage by VFIO drivers. There would not seem to be a legitimate > > > reason for some*other* driver to be specifically querying a device while it > > > is already bound to a VFIO driver (and thus may have a noiommu group). > > Yes, the devices that VFIO assigns to its internal groups never leak > > outside VFIO's control during their assignment - ie they are > > continuously bound to VFIO never another driver. > > > > So no other driver can ever see the internal groups unless it is > > messing around with devices it is not bound to 😄 > > Fair enough. I was thinking that probably we could make it like below: > > /** > * device_iommu_mapped - Returns true when the device DMA is translated > * by an IOMMU > * @dev: Device to perform the check on > */ > static inline bool device_iommu_mapped(struct device *dev) > { > return (dev->iommu && dev->iommu->iommu_dev); > } > > The iommu probe device code guarantees that dev->iommu->iommu_dev is > valid only after the IOMMU driver's .probe_device returned successfully. > > Any thoughts? I find the above much clearer if it can work Jason
On 2023-01-27 13:50, Baolu Lu wrote: > On 2023/1/26 22:41, Jason Gunthorpe wrote: >> On Thu, Jan 26, 2023 at 02:21:29PM +0000, Robin Murphy wrote: >> >>> The "check" is inherent in the fact that it's been called at all. VFIO >>> noiommu*is* an IOMMU driver in the sense that it provides a bare >>> minimum of >>> IOMMU API functionality (i.e. creating groups), sufficient to support >>> (careful) usage by VFIO drivers. There would not seem to be a legitimate >>> reason for some*other* driver to be specifically querying a device >>> while it >>> is already bound to a VFIO driver (and thus may have a noiommu group). >> Yes, the devices that VFIO assigns to its internal groups never leak >> outside VFIO's control during their assignment - ie they are >> continuously bound to VFIO never another driver. >> >> So no other driver can ever see the internal groups unless it is >> messing around with devices it is not bound to 😄 > > Fair enough. I was thinking that probably we could make it like below: > > /** > * device_iommu_mapped - Returns true when the device DMA is translated > * by an IOMMU > * @dev: Device to perform the check on > */ > static inline bool device_iommu_mapped(struct device *dev) > { > return (dev->iommu && dev->iommu->iommu_dev); > } > > The iommu probe device code guarantees that dev->iommu->iommu_dev is > valid only after the IOMMU driver's .probe_device returned successfully. > > Any thoughts? Heh, I actually wrote that helper yesterday for v2, but as an internal check for valid ops :) The current implementation of device_iommu_mapped() just dates back to when dev->iommu_group was the only per-device thing we had, so in principle I don't have any conceptual objection to redefining it in terms of "device has ops" rather than "device has a group", but as things stand you'd still have to do something about PPC first (I know Jason had been pushing on that, but I've not kept track of where it got to). Thanks, Robin.
On Fri, Jan 27, 2023 at 03:19:55PM +0000, Robin Murphy wrote: > The current implementation of device_iommu_mapped() just dates back to when > dev->iommu_group was the only per-device thing we had, so in principle I > don't have any conceptual objection to redefining it in terms of "device has > ops" rather than "device has a group", but as things stand you'd still have > to do something about PPC first (I know Jason had been pushing on that, but > I've not kept track of where it got to). PPC hasn't moved at all, AFAICT. In a few more months I'm going to suggest we delete the special VFIO support due to it being broken, distros already having turned it off and nobody caring enough to fix it.. What does device_iommu_mapped() even really mean? Looking at usages.. These are fixing SOC HW bugs/issues - the desire seems to be "is the SOC's IOMMU enabled" drivers/char/agp/intel-gtt.c: device_iommu_mapped(&intel_private.pcidev->dev)); drivers/dma/sh/rcar-dmac.c: if (device_iommu_mapped(&pdev->dev)) drivers/gpu/drm/i915/i915_utils.c: if (device_iommu_mapped(i915->drm.dev)) ? drivers/usb/dwc3/dwc3-xilinx.c: if (of_dma_is_coherent(dev->of_node) || device_iommu_mapped(dev)) { drivers/usb/host/xhci.c: if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev)) drivers/crypto/qat/qat_common/adf_sriov.c: if (!device_iommu_mapped(&pdev->dev)) ? These seem to be trying to decide if iommu_domain's can be used (and they can't be on power): drivers/gpu/drm/msm/msm_drv.c: if (device_iommu_mapped(mdp_dev)) drivers/gpu/drm/msm/msm_drv.c: device_iommu_mapped(dev->dev) || drivers/gpu/drm/msm/msm_drv.c: device_iommu_mapped(dev->dev->parent); drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c: if (device_iommu_mapped(dev)) { drivers/gpu/drm/rockchip/rockchip_drm_drv.c: if (!device_iommu_mapped(dev)) drivers/gpu/drm/tegra/uapi.c: if (device_iommu_mapped(client->base.dev) && client->ops->can_use_memory_ctx) { drivers/gpu/host1x/context.c: if (!fwspec || !device_iommu_mapped(&ctx->dev)) { drivers/infiniband/hw/usnic/usnic_ib_main.c: if (!device_iommu_mapped(&pdev->dev)) { Yikes, trying to map DMA addresses programmed into devices back to CPU addresses: drivers/misc/habanalabs/common/debugfs.c: if (!user_address || device_iommu_mapped(&hdev->pdev->dev)) { drivers/misc/habanalabs/gaudi2/gaudi2.c: if (!device_iommu_mapped(&hdev->pdev->dev)) And then sequencing the call to iommu_probe_device() which doesn't apply to power: drivers/acpi/scan.c: if (!err && dev->bus && !device_iommu_mapped(dev)) drivers/iommu/of_iommu.c: if (!err && dev->bus && !device_iommu_mapped(dev)) Leaving these: arch/powerpc/kernel/eeh.c: if (device_iommu_mapped(dev)) { This is only used to support eeh_iommu_group_to_pe which is only caleld by vfio_iommu_spapr_tce.c. Since power vfio doesn't work right now this is uncallable, and when power is fixed this will work properly. arch/powerpc/kernel/iommu.c: if (device_iommu_mapped(dev)) { arch/powerpc/kernel/iommu.c: if (!device_iommu_mapped(dev)) { These should both be replaced with some kind of 'device has iommu group', since it is really driving ppc unique group logic. So, I'd say Baolu's approach is the right thing, just replace the above two in ppc with something else. Jason
On 2023/1/27 23:43, Jason Gunthorpe wrote: > On Fri, Jan 27, 2023 at 03:19:55PM +0000, Robin Murphy wrote: > >> The current implementation of device_iommu_mapped() just dates back to when >> dev->iommu_group was the only per-device thing we had, so in principle I >> don't have any conceptual objection to redefining it in terms of "device has >> ops" rather than "device has a group", but as things stand you'd still have >> to do something about PPC first (I know Jason had been pushing on that, but >> I've not kept track of where it got to). > PPC hasn't moved at all, AFAICT. In a few more months I'm going to > suggest we delete the special VFIO support due to it being broken, > distros already having turned it off and nobody caring enough to fix > it.. > > What does device_iommu_mapped() even really mean? > > Looking at usages.. > > These are fixing SOC HW bugs/issues - the desire seems to be "is the SOC's > IOMMU enabled" > > drivers/char/agp/intel-gtt.c: device_iommu_mapped(&intel_private.pcidev->dev)); > drivers/dma/sh/rcar-dmac.c: if (device_iommu_mapped(&pdev->dev)) > drivers/gpu/drm/i915/i915_utils.c: if (device_iommu_mapped(i915->drm.dev)) > ? > drivers/usb/dwc3/dwc3-xilinx.c: if (of_dma_is_coherent(dev->of_node) || device_iommu_mapped(dev)) { > drivers/usb/host/xhci.c: if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev)) > drivers/crypto/qat/qat_common/adf_sriov.c: if (!device_iommu_mapped(&pdev->dev)) > ? > > These seem to be trying to decide if iommu_domain's can be used (and > they can't be on power): > > drivers/gpu/drm/msm/msm_drv.c: if (device_iommu_mapped(mdp_dev)) > drivers/gpu/drm/msm/msm_drv.c: device_iommu_mapped(dev->dev) || > drivers/gpu/drm/msm/msm_drv.c: device_iommu_mapped(dev->dev->parent); > drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c: if (device_iommu_mapped(dev)) { > drivers/gpu/drm/rockchip/rockchip_drm_drv.c: if (!device_iommu_mapped(dev)) > drivers/gpu/drm/tegra/uapi.c: if (device_iommu_mapped(client->base.dev) && client->ops->can_use_memory_ctx) { > drivers/gpu/host1x/context.c: if (!fwspec || !device_iommu_mapped(&ctx->dev)) { > drivers/infiniband/hw/usnic/usnic_ib_main.c: if (!device_iommu_mapped(&pdev->dev)) { > > Yikes, trying to map DMA addresses programmed into devices back to CPU addresses: > > drivers/misc/habanalabs/common/debugfs.c: if (!user_address || device_iommu_mapped(&hdev->pdev->dev)) { > drivers/misc/habanalabs/gaudi2/gaudi2.c: if (!device_iommu_mapped(&hdev->pdev->dev)) > > And then sequencing the call to iommu_probe_device() which doesn't > apply to power: > > drivers/acpi/scan.c: if (!err && dev->bus && !device_iommu_mapped(dev)) > drivers/iommu/of_iommu.c: if (!err && dev->bus && !device_iommu_mapped(dev)) > > Leaving these: > > arch/powerpc/kernel/eeh.c: if (device_iommu_mapped(dev)) { > > This is only used to support eeh_iommu_group_to_pe which is only > caleld by vfio_iommu_spapr_tce.c. Since power vfio doesn't work right > now this is uncallable, and when power is fixed this will work > properly. > > arch/powerpc/kernel/iommu.c: if (device_iommu_mapped(dev)) { > arch/powerpc/kernel/iommu.c: if (!device_iommu_mapped(dev)) { > > These should both be replaced with some kind of 'device has iommu group', since > it is really driving ppc unique group logic. > > So, I'd say Baolu's approach is the right thing, just replace the > above two in ppc with something else. Thank you both. I will follow up a series later. Best regards, baolu
On 2023-01-28 08:49, Baolu Lu wrote: > On 2023/1/27 23:43, Jason Gunthorpe wrote: >> On Fri, Jan 27, 2023 at 03:19:55PM +0000, Robin Murphy wrote: >> >>> The current implementation of device_iommu_mapped() just dates back >>> to when >>> dev->iommu_group was the only per-device thing we had, so in principle I >>> don't have any conceptual objection to redefining it in terms of >>> "device has >>> ops" rather than "device has a group", but as things stand you'd >>> still have >>> to do something about PPC first (I know Jason had been pushing on >>> that, but >>> I've not kept track of where it got to). >> PPC hasn't moved at all, AFAICT. In a few more months I'm going to >> suggest we delete the special VFIO support due to it being broken, >> distros already having turned it off and nobody caring enough to fix >> it.. >> >> What does device_iommu_mapped() even really mean? >> >> Looking at usages.. >> >> These are fixing SOC HW bugs/issues - the desire seems to be "is the >> SOC's >> IOMMU enabled" >> >> drivers/char/agp/intel-gtt.c: >> device_iommu_mapped(&intel_private.pcidev->dev)); >> drivers/dma/sh/rcar-dmac.c: if (device_iommu_mapped(&pdev->dev)) >> drivers/gpu/drm/i915/i915_utils.c: if >> (device_iommu_mapped(i915->drm.dev)) >> ? >> drivers/usb/dwc3/dwc3-xilinx.c: if (of_dma_is_coherent(dev->of_node) >> || device_iommu_mapped(dev)) { >> drivers/usb/host/xhci.c: if (!(xhci->quirks & >> XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev)) >> drivers/crypto/qat/qat_common/adf_sriov.c: if >> (!device_iommu_mapped(&pdev->dev)) >> ? >> >> These seem to be trying to decide if iommu_domain's can be used (and >> they can't be on power): >> >> drivers/gpu/drm/msm/msm_drv.c: if (device_iommu_mapped(mdp_dev)) >> drivers/gpu/drm/msm/msm_drv.c: device_iommu_mapped(dev->dev) || >> drivers/gpu/drm/msm/msm_drv.c: >> device_iommu_mapped(dev->dev->parent); >> drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c: if >> (device_iommu_mapped(dev)) { >> drivers/gpu/drm/rockchip/rockchip_drm_drv.c: if >> (!device_iommu_mapped(dev)) >> drivers/gpu/drm/tegra/uapi.c: if >> (device_iommu_mapped(client->base.dev) && >> client->ops->can_use_memory_ctx) { >> drivers/gpu/host1x/context.c: if (!fwspec || >> !device_iommu_mapped(&ctx->dev)) { >> drivers/infiniband/hw/usnic/usnic_ib_main.c: if >> (!device_iommu_mapped(&pdev->dev)) { >> >> Yikes, trying to map DMA addresses programmed into devices back to CPU >> addresses: >> >> drivers/misc/habanalabs/common/debugfs.c: if (!user_address || >> device_iommu_mapped(&hdev->pdev->dev)) { >> drivers/misc/habanalabs/gaudi2/gaudi2.c: if >> (!device_iommu_mapped(&hdev->pdev->dev)) >> >> And then sequencing the call to iommu_probe_device() which doesn't >> apply to power: >> >> drivers/acpi/scan.c: if (!err && dev->bus && >> !device_iommu_mapped(dev)) >> drivers/iommu/of_iommu.c: if (!err && dev->bus && >> !device_iommu_mapped(dev)) >> >> Leaving these: >> >> arch/powerpc/kernel/eeh.c: if (device_iommu_mapped(dev)) { >> >> This is only used to support eeh_iommu_group_to_pe which is only >> caleld by vfio_iommu_spapr_tce.c. Since power vfio doesn't work right >> now this is uncallable, and when power is fixed this will work >> properly. Oh wow, I should have looked at more context... Even better, this one is already just an elaborate "if (true)" - it has been impossible for dev_has_iommu_table() to return 0 since at least 2015 :D >> arch/powerpc/kernel/iommu.c: if (device_iommu_mapped(dev)) { >> arch/powerpc/kernel/iommu.c: if (!device_iommu_mapped(dev)) { >> >> These should both be replaced with some kind of 'device has iommu >> group', since >> it is really driving ppc unique group logic. And in fact those appear to mostly serve for printing debug messages; if we made iommu_group_add_device() return -EBUSY for duplicate calls (not necessarily a bad idea anyway vs. relying on the noisy failure of sysfs_create_link()), they could arguably just go. All in all, it's only actually the habanalabs ones that I'm slightly wary of, since they're effectively (mis)using device_iommu_mapped() to infer the DMA ops implementation, which could potentially go wrong (or at least *more* wrong) on POWER with this change. I guess the saving grace is that although they are available on PCIe-interfaced modules, the userspace driver stack seems to be implicitly x86_64-only - as far as I could tell from a quick poke around their site and documentation, which doesn't appear to acknowledge the concept of CPU architectures at all - so the chances of anyone actually trying to use the kernel drivers in anger on POWER seem minimal. Thanks, Robin. >> So, I'd say Baolu's approach is the right thing, just replace the >> above two in ppc with something else. > > Thank you both. I will follow up a series later. > > Best regards, > baolu
On Mon, Jan 30, 2023 at 01:49:20PM +0000, Robin Murphy wrote: > All in all, it's only actually the habanalabs ones that I'm slightly wary > of, since they're effectively (mis)using device_iommu_mapped() to infer the > DMA ops implementation, which could potentially go wrong (or at least *more* > wrong) on POWER with this change. I guess the saving grace is that > although IMHO habana is not using the DMA API abstraction properly. If it doesn't work on some archs is their bugs to deal with - we don't need to complexify the core code to tiptoe around around such an abuse in an obscure driver. Jason
On Mon, Jan 30, 2023 at 3:53 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Mon, Jan 30, 2023 at 01:49:20PM +0000, Robin Murphy wrote: > > > All in all, it's only actually the habanalabs ones that I'm slightly wary > > of, since they're effectively (mis)using device_iommu_mapped() to infer the > > DMA ops implementation, which could potentially go wrong (or at least *more* > > wrong) on POWER with this change. I guess the saving grace is that > > although > > IMHO habana is not using the DMA API abstraction properly. If it > doesn't work on some archs is their bugs to deal with - we don't need > to complexify the core code to tiptoe around around such an abuse in > an obscure driver. > > Jason Agreed, feel free to change the kapi as you see fit. Do the right thing for the kernel. In any case, we limit ourselves to x86-64 arch in the 6.3 merge cycle. Oded
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index b189ed345057..a77d58e1b976 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1871,9 +1871,24 @@ int bus_iommu_probe(struct bus_type *bus) return ret; } +static int __iommu_present(struct device *dev, void *unused) +{ + return device_iommu_mapped(dev); +} + +/** + * iommu_present() - make platform-specific assumptions about an IOMMU + * @bus: bus to check + * + * Do not use this function. You want device_iommu_mapped() instead. + * + * Return: true if some IOMMU is present for some device on the given bus. In + * general it may not be the only IOMMU, and it may not be for the device you + * are ultimately interested in. + */ bool iommu_present(struct bus_type *bus) { - return bus->iommu_ops != NULL; + return bus_for_each_dev(bus, NULL, NULL, __iommu_present) > 0; } EXPORT_SYMBOL_GPL(iommu_present);