[1/4] iommu: Add a broken_unmanaged_domain flag in iommu_ops

Message ID 0875479d24a53670e17db8a11945664a6bb4a25b.1674849118.git.nicolinc@nvidia.com
State New
Headers
Series iommu: Reject drivers with broken_unmanaged_domain |

Commit Message

Nicolin Chen Jan. 27, 2023, 8:04 p.m. UTC
  Both IOMMU_DOMAIN_UNMANAGED and IOMMU_DOMAIN_DMA require the support
of __IOMMU_DOMAIN_PAGING capability, i.e. iommu_map/unmap. However,
some older iommu drivers do not fully support that, and these drivers
also do not advertise support for dma-iommu.c via IOMMU_DOMAIN_DMA,
or use arm_iommu_create_mapping(), so largely their implementations
of IOMMU_DOMAIN_UNMANAGED are untested. This means that a user like
vfio/iommufd does not likely work with them.

Several of them have obvious problems:
  * fsl_pamu_domain.c
    Without map/unmap ops in the default_domain_ops, it isn't an
    unmanaged domain at all.
  * mtk_iommu_v1.c
    With a fixed 4M "pagetable", it can only map exactly 4G of
    memory, but doesn't set the aperture.
  * tegra-gart.c
    Its notion of attach/detach and groups has to be a complete lie to
    get around all the other API expectations.

Some others might work but have never been tested with vfio/iommufd:
  * msm_iommu.c
  * omap-iommu.c
  * tegra-smmu.c

Thus, mark all these drivers as having "broken" UNAMANGED domains and
add a new device_iommu_unmanaged_supported() API for vfio/iommufd and
dma-iommu to refuse to work with these drivers.

Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/fsl_pamu_domain.c |  1 +
 drivers/iommu/iommu.c           | 24 ++++++++++++++++++++++++
 drivers/iommu/msm_iommu.c       |  1 +
 drivers/iommu/mtk_iommu_v1.c    |  1 +
 drivers/iommu/omap-iommu.c      |  1 +
 drivers/iommu/tegra-gart.c      |  1 +
 drivers/iommu/tegra-smmu.c      |  1 +
 include/linux/iommu.h           | 11 +++++++++++
 8 files changed, 41 insertions(+)
  

Comments

Robin Murphy Jan. 27, 2023, 9:58 p.m. UTC | #1
On 2023-01-27 20:04, Nicolin Chen wrote:
> Both IOMMU_DOMAIN_UNMANAGED and IOMMU_DOMAIN_DMA require the support
> of __IOMMU_DOMAIN_PAGING capability, i.e. iommu_map/unmap. However,
> some older iommu drivers do not fully support that, and these drivers
> also do not advertise support for dma-iommu.c via IOMMU_DOMAIN_DMA,
> or use arm_iommu_create_mapping(), so largely their implementations
> of IOMMU_DOMAIN_UNMANAGED are untested. This means that a user like
> vfio/iommufd does not likely work with them.
> 
> Several of them have obvious problems:
>    * fsl_pamu_domain.c
>      Without map/unmap ops in the default_domain_ops, it isn't an
>      unmanaged domain at all.
>    * mtk_iommu_v1.c
>      With a fixed 4M "pagetable", it can only map exactly 4G of
>      memory, but doesn't set the aperture.

The aperture is easily fixed (one could argue that what's broken there 
are the ARM DMA ops for assuming every IOMMU has a 32-bit IOVA space and 
not checking).

>    * tegra-gart.c
>      Its notion of attach/detach and groups has to be a complete lie to
>      get around all the other API expectations.

That's true, and the domain is tiny and not isolated from the rest of 
the address space outside the aperture, but the one thing it does do is 
support iommu_map/unmap just fine, which is what this flag is documented 
as saying it doesn't.

> Some others might work but have never been tested with vfio/iommufd:
>    * msm_iommu.c
>    * omap-iommu.c
>    * tegra-smmu.c

And yet they all have other in-tree users (GPUs on MSM and Tegra, 
remoteproc on OMAP) that allocate unmanaged domains and use 
iommu_map/unmap just fine, so they're clearly not broken either.

On the flipside, you're also missing cases like apple-dart, which can 
have broken unmanaged domains by any definition, but only under certain 
conditions (at least it "fails safe" and they will refuse attempts to 
attach anything). I'd also question sprd-iommu, which hardly has a 
generally-useful domain size, and has only just recently gained the 
ability to unmap anything successfully. TBH none of the SoC IOMMUs are 
likely to ever be of interest to VFIO or IOMMUFD, since the only things 
they could assign to userspace are the individual devices - usually 
graphics and media engines - that they're coupled to, whose useful 
functionality tends to depend on clocks, phys, and random other 
low-level stuff that would be somewhere between impractical and 
downright unsafe to attempt to somehow expose as well.

> Thus, mark all these drivers as having "broken" UNAMANGED domains and
> add a new device_iommu_unmanaged_supported() API for vfio/iommufd and
> dma-iommu to refuse to work with these drivers.
> 
> Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>

[...]

> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 46e1347bfa22..919a5dbad75b 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -245,6 +245,10 @@ struct iommu_iotlb_gather {
>    *                    pasid, so that any DMA transactions with this pasid
>    *                    will be blocked by the hardware.
>    * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @broken_unmanaged_domain: IOMMU_DOMAIN_UNMANAGED is not fully functional; the
> + *                           driver does not really support iommu_map/unmap, but
> + *                           uses UNMANAGED domains for the IOMMU API, called by
> + *                           other SOC drivers.

"uses UNMANAGED domains for the IOMMU API" is literally the definition 
of unmanaged domains :/

Some "other SOC drivers" use more of the IOMMU API than VFIO does :/

Please just add IOMMU_CAP_IOMMUFD to represent whatever the nebulous 
requirements of IOMMUFD actually are (frankly it's no less informative 
than calling domains "broken"), handle that in the drivers you care 
about and have tested, and use device_iommu_capable(). What you're 
describing in this series is a capability, and we have a perfectly good 
API for drivers to express those already. Plus, as demonstrated above, a 
positive capability based on empirical testing will be infinitely more 
robust than a negative one based on guessing.

Thanks,
Robin.
  
Nicolin Chen Jan. 27, 2023, 11:39 p.m. UTC | #2
Hi Robin.

On Fri, Jan 27, 2023 at 09:58:46PM +0000, Robin Murphy wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 2023-01-27 20:04, Nicolin Chen wrote:
> > Both IOMMU_DOMAIN_UNMANAGED and IOMMU_DOMAIN_DMA require the support
> > of __IOMMU_DOMAIN_PAGING capability, i.e. iommu_map/unmap. However,
> > some older iommu drivers do not fully support that, and these drivers
> > also do not advertise support for dma-iommu.c via IOMMU_DOMAIN_DMA,
> > or use arm_iommu_create_mapping(), so largely their implementations
> > of IOMMU_DOMAIN_UNMANAGED are untested. This means that a user like
> > vfio/iommufd does not likely work with them.
> > 
> > Several of them have obvious problems:
> >    * fsl_pamu_domain.c
> >      Without map/unmap ops in the default_domain_ops, it isn't an
> >      unmanaged domain at all.
> >    * mtk_iommu_v1.c
> >      With a fixed 4M "pagetable", it can only map exactly 4G of
> >      memory, but doesn't set the aperture.
> 
> The aperture is easily fixed (one could argue that what's broken there
> are the ARM DMA ops for assuming every IOMMU has a 32-bit IOVA space and
> not checking).
>
> >    * tegra-gart.c
> >      Its notion of attach/detach and groups has to be a complete lie to
> >      get around all the other API expectations.
> 
> That's true, and the domain is tiny and not isolated from the rest of
> the address space outside the aperture, but the one thing it does do is
> support iommu_map/unmap just fine, which is what this flag is documented
> as saying it doesn't.
> 
> > Some others might work but have never been tested with vfio/iommufd:
> >    * msm_iommu.c
> >    * omap-iommu.c
> >    * tegra-smmu.c
> 
> And yet they all have other in-tree users (GPUs on MSM and Tegra,
> remoteproc on OMAP) that allocate unmanaged domains and use
> iommu_map/unmap just fine, so they're clearly not broken either.
> 
> On the flipside, you're also missing cases like apple-dart, which can
> have broken unmanaged domains by any definition, but only under certain
> conditions (at least it "fails safe" and they will refuse attempts to
> attach anything). I'd also question sprd-iommu, which hardly has a
> generally-useful domain size, and has only just recently gained the
> ability to unmap anything successfully. TBH none of the SoC IOMMUs are
> likely to ever be of interest to VFIO or IOMMUFD, since the only things
> they could assign to userspace are the individual devices - usually
> graphics and media engines - that they're coupled to, whose useful
> functionality tends to depend on clocks, phys, and random other
> low-level stuff that would be somewhere between impractical and
> downright unsafe to attempt to somehow expose as well.

Thanks for all the inputs.

> > Thus, mark all these drivers as having "broken" UNAMANGED domains and
> > add a new device_iommu_unmanaged_supported() API for vfio/iommufd and
> > dma-iommu to refuse to work with these drivers.
> > 
> > Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> 
> [...]
> 
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 46e1347bfa22..919a5dbad75b 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -245,6 +245,10 @@ struct iommu_iotlb_gather {
> >    *                    pasid, so that any DMA transactions with this pasid
> >    *                    will be blocked by the hardware.
> >    * @pgsize_bitmap: bitmap of all possible supported page sizes
> > + * @broken_unmanaged_domain: IOMMU_DOMAIN_UNMANAGED is not fully functional; the
> > + *                           driver does not really support iommu_map/unmap, but
> > + *                           uses UNMANAGED domains for the IOMMU API, called by
> > + *                           other SOC drivers.
> 
> "uses UNMANAGED domains for the IOMMU API" is literally the definition
> of unmanaged domains :/
> 
> Some "other SOC drivers" use more of the IOMMU API than VFIO does :/
> 
> Please just add IOMMU_CAP_IOMMUFD to represent whatever the nebulous
> requirements of IOMMUFD actually are (frankly it's no less informative
> than calling domains "broken"), handle that in the drivers you care
> about and have tested, and use device_iommu_capable(). What you're
> describing in this series is a capability, and we have a perfectly good
> API for drivers to express those already. Plus, as demonstrated above, a
> positive capability based on empirical testing will be infinitely more
> robust than a negative one based on guessing.

OK. I can change to IOMMU_CAP_IOMMUFD, and add to the drivers that
are tested. And an IOMMU driver that wants to use IOMMUFD can add
such a CAP later whenever it's ready.

Yet, "IOMMU_CAP_IOMMUFD" would make the VFIO change suspicious, so
perhaps the next version is just one CAP patch + one IOMMUFD patch.
@Jason, any concern?

Thank you
Nicolin
  
Jason Gunthorpe Jan. 27, 2023, 11:54 p.m. UTC | #3
On Fri, Jan 27, 2023 at 09:58:46PM +0000, Robin Murphy wrote:

> Please just add IOMMU_CAP_IOMMUFD to represent whatever the nebulous
> requirements of IOMMUFD actually are (frankly it's no less informative than
> calling domains "broken"), handle that in the drivers you care about
> and

I don't want to tie this to iommufd, that isn't the point.

We clearly have drivers that don't implement the iommu kernel API
properly, because their only use is between the iommu driver and some
other same-SOC driver.

As a user of the iommu API iommufd and VFIO want to avoid these
drivers.

We have that acknowledgment already via the IOMMU_DOMAIN_DMA stuff
protecting the dma_iommu.c from those same drivers.

So, how about this below instead. Imagine it is followed by something along
the lines of my other sketch:

https://lore.kernel.org/linux-iommu/Y4%2FLsZKmR3iWFphU@nvidia.com/

And we completely delete IOMMU_DOMAIN_DMA/_FQ and get dma-iommu.c
mostly out of driver code.

iommufd/vfio would refuse to work with drivers that don't indicate
they support dma_iommu.c, that is good enough.

We'd eventually rename IOMMU_DOMAIN_UNMANAGED to IOMMU_DOMAIN_MAPPING.

Subject: [PATCH] iommu: Remove IOMMU_DOMAIN_DMA from drivers

The IOMMU_DOMAIN_DMA is exactly the same driver functionality as
IOMMU_DOMAIN_UNMANAGED, except it also implies that dma_iommu.c should
be able to use this driver.

As a first step to removing IOMMU_DOMAIN_DMA remove all references to
it from the drivers. Two simple ops flags are enough to indicate if
the driver is compatible with dma_iommu.c and if the driver will allow
the FQ mode to be selected. When dma-iommu.c needs an a domain it will
request an IOMMU_DOMAIN_UNMANAGED.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/apple-dart.c                  |  3 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  4 +--
 drivers/iommu/arm/arm-smmu/arm-smmu.c       | 11 ++++----
 drivers/iommu/arm/arm-smmu/qcom_iommu.c     |  3 ++-
 drivers/iommu/exynos-iommu.c                |  3 ++-
 drivers/iommu/intel/iommu.c                 |  3 +--
 drivers/iommu/iommu.c                       | 29 ++++++++++++++++-----
 drivers/iommu/ipmmu-vmsa.c                  |  3 ++-
 drivers/iommu/mtk_iommu.c                   |  3 ++-
 drivers/iommu/rockchip-iommu.c              |  3 ++-
 drivers/iommu/sprd-iommu.c                  |  3 ++-
 drivers/iommu/sun50i-iommu.c                |  4 +--
 drivers/iommu/virtio-iommu.c                |  2 +-
 include/linux/iommu.h                       |  2 ++
 14 files changed, 50 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 4f4a323be0d0ff..0eb3eb857d7e9e 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -580,7 +580,7 @@ static struct iommu_domain *apple_dart_domain_alloc(unsigned int type)
 {
 	struct apple_dart_domain *dart_domain;
 
-	if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED &&
+	if (type != IOMMU_DOMAIN_UNMANAGED &&
 	    type != IOMMU_DOMAIN_IDENTITY && type != IOMMU_DOMAIN_BLOCKED)
 		return NULL;
 
@@ -769,6 +769,7 @@ static void apple_dart_get_resv_regions(struct device *dev,
 }
 
 static const struct iommu_ops apple_dart_iommu_ops = {
+	.use_dma_iommu = true,
 	.domain_alloc = apple_dart_domain_alloc,
 	.probe_device = apple_dart_probe_device,
 	.release_device = apple_dart_release_device,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ab160198edd6b1..2253c5598092d0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2013,8 +2013,6 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 		return arm_smmu_sva_domain_alloc();
 
 	if (type != IOMMU_DOMAIN_UNMANAGED &&
-	    type != IOMMU_DOMAIN_DMA &&
-	    type != IOMMU_DOMAIN_DMA_FQ &&
 	    type != IOMMU_DOMAIN_IDENTITY)
 		return NULL;
 
@@ -2844,6 +2842,8 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 }
 
 static struct iommu_ops arm_smmu_ops = {
+	.use_dma_iommu		= true,
+	.allow_dma_fq		= true,
 	.capable		= arm_smmu_capable,
 	.domain_alloc		= arm_smmu_domain_alloc,
 	.probe_device		= arm_smmu_probe_device,
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 719fbca1fe52a0..7bb160bdc4b594 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -855,11 +855,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_IDENTITY) {
-		if (using_legacy_binding ||
-		    (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_DMA_FQ))
-			return NULL;
-	}
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_IDENTITY)
+		return NULL;
+
 	/*
 	 * Allocate the domain and initialise some of its data structures.
 	 * We can't really do anything meaningful until we've added a
@@ -1555,6 +1553,8 @@ static int arm_smmu_def_domain_type(struct device *dev)
 }
 
 static struct iommu_ops arm_smmu_ops = {
+	.use_dma_iommu		= true,
+	.allow_dma_fq		= true,
 	.capable		= arm_smmu_capable,
 	.domain_alloc		= arm_smmu_domain_alloc,
 	.probe_device		= arm_smmu_probe_device,
@@ -1982,6 +1982,7 @@ static int arm_smmu_device_dt_probe(struct arm_smmu_device *smmu,
 				  IS_ENABLED(CONFIG_ARM_SMMU_LEGACY_DT_BINDINGS) ? "DMA API" : "SMMU");
 		}
 		using_legacy_binding = true;
+		arm_smmu_ops.use_dma_iommu = false;
 	} else if (!legacy_binding && !using_legacy_binding) {
 		using_generic_binding = true;
 	} else {
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 270c3d9128bab8..babd20108f6a17 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -323,7 +323,7 @@ static struct iommu_domain *qcom_iommu_domain_alloc(unsigned type)
 {
 	struct qcom_iommu_domain *qcom_domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+	if (type != IOMMU_DOMAIN_UNMANAGED)
 		return NULL;
 	/*
 	 * Allocate the domain and initialise some of its data structures.
@@ -575,6 +575,7 @@ static int qcom_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
 }
 
 static const struct iommu_ops qcom_iommu_ops = {
+	.use_dma_iommu  = true,
 	.capable	= qcom_iommu_capable,
 	.domain_alloc	= qcom_iommu_domain_alloc,
 	.probe_device	= qcom_iommu_probe_device,
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index b0cde22119875e..824350551e5933 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -825,7 +825,7 @@ static struct iommu_domain *exynos_iommu_domain_alloc(unsigned type)
 	/* Check if correct PTE offsets are initialized */
 	BUG_ON(PG_ENT_SHIFT < 0 || !dma_dev);
 
-	if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED)
+	if (type != IOMMU_DOMAIN_UNMANAGED)
 		return NULL;
 
 	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
@@ -1396,6 +1396,7 @@ static int exynos_iommu_of_xlate(struct device *dev,
 }
 
 static const struct iommu_ops exynos_iommu_ops = {
+	.use_dma_iommu = true,
 	.domain_alloc = exynos_iommu_domain_alloc,
 	.device_group = generic_device_group,
 	.probe_device = exynos_iommu_probe_device,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 59df7e42fd533c..bb34d3f641f17b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4165,8 +4165,6 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 	switch (type) {
 	case IOMMU_DOMAIN_BLOCKED:
 		return &blocking_domain;
-	case IOMMU_DOMAIN_DMA:
-	case IOMMU_DOMAIN_DMA_FQ:
 	case IOMMU_DOMAIN_UNMANAGED:
 		dmar_domain = alloc_domain(type);
 		if (!dmar_domain) {
@@ -4761,6 +4759,7 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 }
 
 const struct iommu_ops intel_iommu_ops = {
+	.use_dma_iommu		= true,
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
 	.probe_device		= intel_iommu_probe_device,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index de91dd88705bd3..877ef0a58b07f4 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1618,17 +1618,32 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus,
 {
 	struct iommu_domain *dom;
 
-	dom = __iommu_domain_alloc(bus, type);
-	if (!dom && type != IOMMU_DOMAIN_DMA) {
-		dom = __iommu_domain_alloc(bus, IOMMU_DOMAIN_DMA);
-		if (dom)
+	/*
+	 * Keep the DMA domain type out of the drivers, eventually it will go
+	 * away completely.
+	 */
+	if (type == IOMMU_DOMAIN_IDENTITY) {
+		dom = __iommu_domain_alloc(bus, IOMMU_DOMAIN_IDENTITY);
+		if (!dom)
+			return -ENOMEM;
+	} else if (type == IOMMU_DOMAIN_DMA_FQ || type == IOMMU_DOMAIN_DMA) {
+		if (!bus->iommu_ops->use_dma_iommu)
+			return -EINVAL;
+
+		dom = __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED);
+		if (!dom)
+			return -ENOMEM;
+		if (type == IOMMU_DOMAIN_DMA_FQ &&
+		    !bus->iommu_ops->allow_dma_fq) {
 			pr_warn("Failed to allocate default IOMMU domain of type %u for group %s - Falling back to IOMMU_DOMAIN_DMA",
 				type, group->name);
+			type = IOMMU_DOMAIN_DMA;
+		}
+		dom->type = type;
+	} else {
+		return -EINVAL;
 	}
 
-	if (!dom)
-		return -ENOMEM;
-
 	group->default_domain = dom;
 	if (!group->domain)
 		group->domain = dom;
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index a003bd5fc65c13..a22df69f7f4775 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -571,7 +571,7 @@ static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
 {
 	struct ipmmu_vmsa_domain *domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+	if (type != IOMMU_DOMAIN_UNMANAGED)
 		return NULL;
 
 	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
@@ -866,6 +866,7 @@ static struct iommu_group *ipmmu_find_group(struct device *dev)
 }
 
 static const struct iommu_ops ipmmu_ops = {
+	.use_dma_iommu = true,
 	.domain_alloc = ipmmu_domain_alloc,
 	.probe_device = ipmmu_probe_device,
 	.release_device = ipmmu_release_device,
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 2badd6acfb23d6..e27cb428bd9583 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -636,7 +636,7 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
 {
 	struct mtk_iommu_domain *dom;
 
-	if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED)
+	if (type != IOMMU_DOMAIN_UNMANAGED)
 		return NULL;
 
 	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
@@ -936,6 +936,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev,
 }
 
 static const struct iommu_ops mtk_iommu_ops = {
+	.use_dma_iommu  = true,
 	.domain_alloc	= mtk_iommu_domain_alloc,
 	.probe_device	= mtk_iommu_probe_device,
 	.release_device	= mtk_iommu_release_device,
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index a68eadd64f38db..fa3ec38e5244db 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1061,7 +1061,7 @@ static struct iommu_domain *rk_iommu_domain_alloc(unsigned type)
 {
 	struct rk_iommu_domain *rk_domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+	if (type != IOMMU_DOMAIN_UNMANAGED)
 		return NULL;
 
 	if (!dma_dev)
@@ -1184,6 +1184,7 @@ static int rk_iommu_of_xlate(struct device *dev,
 }
 
 static const struct iommu_ops rk_iommu_ops = {
+	.use_dma_iommu = true,
 	.domain_alloc = rk_iommu_domain_alloc,
 	.probe_device = rk_iommu_probe_device,
 	.release_device = rk_iommu_release_device,
diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
index 219bfa11f7f48f..fbff1831c16e78 100644
--- a/drivers/iommu/sprd-iommu.c
+++ b/drivers/iommu/sprd-iommu.c
@@ -136,7 +136,7 @@ static struct iommu_domain *sprd_iommu_domain_alloc(unsigned int domain_type)
 {
 	struct sprd_iommu_domain *dom;
 
-	if (domain_type != IOMMU_DOMAIN_DMA && domain_type != IOMMU_DOMAIN_UNMANAGED)
+	if (domain_type != IOMMU_DOMAIN_UNMANAGED)
 		return NULL;
 
 	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
@@ -406,6 +406,7 @@ static int sprd_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
 
 
 static const struct iommu_ops sprd_iommu_ops = {
+	.use_dma_iommu  = true,
 	.domain_alloc	= sprd_iommu_domain_alloc,
 	.probe_device	= sprd_iommu_probe_device,
 	.device_group	= sprd_iommu_device_group,
diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
index 5b585eace3d46f..de2a033f65197a 100644
--- a/drivers/iommu/sun50i-iommu.c
+++ b/drivers/iommu/sun50i-iommu.c
@@ -671,8 +671,7 @@ static struct iommu_domain *sun50i_iommu_domain_alloc(unsigned type)
 {
 	struct sun50i_iommu_domain *sun50i_domain;
 
-	if (type != IOMMU_DOMAIN_DMA &&
-	    type != IOMMU_DOMAIN_UNMANAGED)
+	if (type != IOMMU_DOMAIN_UNMANAGED)
 		return NULL;
 
 	sun50i_domain = kzalloc(sizeof(*sun50i_domain), GFP_KERNEL);
@@ -827,6 +826,7 @@ static int sun50i_iommu_of_xlate(struct device *dev,
 }
 
 static const struct iommu_ops sun50i_iommu_ops = {
+	.use_dma_iommu  = true,
 	.pgsize_bitmap	= SZ_4K,
 	.device_group	= sun50i_iommu_device_group,
 	.domain_alloc	= sun50i_iommu_domain_alloc,
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 5b8fe9bfa9a5b9..8afa21a9c0b9d6 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -642,7 +642,6 @@ static struct iommu_domain *viommu_domain_alloc(unsigned type)
 	struct viommu_domain *vdomain;
 
 	if (type != IOMMU_DOMAIN_UNMANAGED &&
-	    type != IOMMU_DOMAIN_DMA &&
 	    type != IOMMU_DOMAIN_IDENTITY)
 		return NULL;
 
@@ -1018,6 +1017,7 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap)
 }
 
 static struct iommu_ops viommu_ops = {
+	.use_dma_iommu		= true,
 	.capable		= viommu_capable,
 	.domain_alloc		= viommu_domain_alloc,
 	.probe_device		= viommu_probe_device,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 46e1347bfa2286..954db8e77491c7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -277,6 +277,8 @@ struct iommu_ops {
 
 	const struct iommu_domain_ops *default_domain_ops;
 	unsigned long pgsize_bitmap;
+	u8 use_dma_iommu : 1;
+	u8 allow_dma_fq : 1;
 	struct module *owner;
 };
  
Tian, Kevin Jan. 29, 2023, 7:54 a.m. UTC | #4
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, January 28, 2023 4:04 AM
> 
> Both IOMMU_DOMAIN_UNMANAGED and IOMMU_DOMAIN_DMA require
> the support
> of __IOMMU_DOMAIN_PAGING capability, i.e. iommu_map/unmap.
> However,
> some older iommu drivers do not fully support that, and these drivers
> also do not advertise support for dma-iommu.c via IOMMU_DOMAIN_DMA,
> or use arm_iommu_create_mapping(), so largely their implementations
> of IOMMU_DOMAIN_UNMANAGED are untested. This means that a user like
> vfio/iommufd does not likely work with them.
> 
> Several of them have obvious problems:
>   * fsl_pamu_domain.c
>     Without map/unmap ops in the default_domain_ops, it isn't an
>     unmanaged domain at all.
>   * mtk_iommu_v1.c
>     With a fixed 4M "pagetable", it can only map exactly 4G of
>     memory, but doesn't set the aperture.
>   * tegra-gart.c
>     Its notion of attach/detach and groups has to be a complete lie to
>     get around all the other API expectations.
> 
> Some others might work but have never been tested with vfio/iommufd:
>   * msm_iommu.c
>   * omap-iommu.c
>   * tegra-smmu.c
> 

Do we have a link where all drivers tested with vfio/iommufd have been
listed?

In a quick glance at least exynos-iommu.c and apple-dart.c both support
UNMANAGED with map/unmap ops. They are not mentioned in above
list but I doubt they are tested for vfio/iommufd.
  
Tian, Kevin Jan. 29, 2023, 8:11 a.m. UTC | #5
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, January 28, 2023 7:54 AM
> 
> On Fri, Jan 27, 2023 at 09:58:46PM +0000, Robin Murphy wrote:
> 
> > Please just add IOMMU_CAP_IOMMUFD to represent whatever the
> nebulous
> > requirements of IOMMUFD actually are (frankly it's no less informative
> than
> > calling domains "broken"), handle that in the drivers you care about
> > and
> 
> I don't want to tie this to iommufd, that isn't the point.
> 
> We clearly have drivers that don't implement the iommu kernel API
> properly, because their only use is between the iommu driver and some
> other same-SOC driver.
> 
> As a user of the iommu API iommufd and VFIO want to avoid these
> drivers.
> 
> We have that acknowledgment already via the IOMMU_DOMAIN_DMA stuff
> protecting the dma_iommu.c from those same drivers.
> 
> So, how about this below instead. Imagine it is followed by something along
> the lines of my other sketch:
> 
> https://lore.kernel.org/linux-iommu/Y4%2FLsZKmR3iWFphU@nvidia.com/
> 
> And we completely delete IOMMU_DOMAIN_DMA/_FQ and get dma-
> iommu.c
> mostly out of driver code.
> 
> iommufd/vfio would refuse to work with drivers that don't indicate
> they support dma_iommu.c, that is good enough.

this is a good idea. Just one doubt. Robin mentioned that sprd-iommu
might have a broken unmanaged domains:

" I'd also question sprd-iommu, which hardly has a generally-useful
domain size, and has only just recently gained the ability to unmap
anything successfully."

Want to understand why that restriction is not a problem for DMA API.

what Jason proposed here assumes that existing driver support for
DMA API can be safely applied to vfio/iommufd. But above looks
warning certain exception?

> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 59df7e42fd533c..bb34d3f641f17b 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4165,8 +4165,6 @@ static struct iommu_domain
> *intel_iommu_domain_alloc(unsigned type)
>  	switch (type) {
>  	case IOMMU_DOMAIN_BLOCKED:
>  		return &blocking_domain;
> -	case IOMMU_DOMAIN_DMA:
> -	case IOMMU_DOMAIN_DMA_FQ:
>  	case IOMMU_DOMAIN_UNMANAGED:
>  		dmar_domain = alloc_domain(type);
>  		if (!dmar_domain) {
> @@ -4761,6 +4759,7 @@ static void intel_iommu_remove_dev_pasid(struct
> device *dev, ioasid_t pasid)
>  }
> 
>  const struct iommu_ops intel_iommu_ops = {
> +	.use_dma_iommu		= true,

missed:
+	.allow_dma_fq			= true,
  
Jason Gunthorpe Jan. 30, 2023, 1:33 p.m. UTC | #6
On Sun, Jan 29, 2023 at 08:11:48AM +0000, Tian, Kevin wrote:

> " I'd also question sprd-iommu, which hardly has a generally-useful
> domain size, and has only just recently gained the ability to unmap
> anything successfully."

So long as it has a correct kernel API and exposes the right (small)
aperture then it is OK.

The device will not be useful for qemu, but it would run some dpdk
configurations just fine.

All the VFIO world is not VMs.

> > @@ -4761,6 +4759,7 @@ static void intel_iommu_remove_dev_pasid(struct
> > device *dev, ioasid_t pasid)
> >  }
> > 
> >  const struct iommu_ops intel_iommu_ops = {
> > +	.use_dma_iommu		= true,
> 
> missed:
> +	.allow_dma_fq			= true,

Yep

Jason
  
Tian, Kevin Feb. 1, 2023, 3:14 a.m. UTC | #7
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, January 30, 2023 9:33 PM
> 
> On Sun, Jan 29, 2023 at 08:11:48AM +0000, Tian, Kevin wrote:
> 
> > " I'd also question sprd-iommu, which hardly has a generally-useful
> > domain size, and has only just recently gained the ability to unmap
> > anything successfully."
> 
> So long as it has a correct kernel API and exposes the right (small)
> aperture then it is OK.
> 
> The device will not be useful for qemu, but it would run some dpdk
> configurations just fine.

I still didn't get the restriction here. Can you elaborate why it works
with dpdk but not qemu?

Can qemu verify this restriction via existing path or need new uAPI
flag to communicate?
  
Jason Gunthorpe Feb. 1, 2023, 2:55 p.m. UTC | #8
On Wed, Feb 01, 2023 at 03:14:03AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Monday, January 30, 2023 9:33 PM
> > 
> > On Sun, Jan 29, 2023 at 08:11:48AM +0000, Tian, Kevin wrote:
> > 
> > > " I'd also question sprd-iommu, which hardly has a generally-useful
> > > domain size, and has only just recently gained the ability to unmap
> > > anything successfully."
> > 
> > So long as it has a correct kernel API and exposes the right (small)
> > aperture then it is OK.
> > 
> > The device will not be useful for qemu, but it would run some dpdk
> > configurations just fine.
> 
> I still didn't get the restriction here. Can you elaborate why it works
> with dpdk but not qemu?

dpdk needs like, say, 64M of aperture and doesn't care what the IOVAs
are

qemu needs the entire guest memory of aperture and must have IOVAs
that are 1:1 with the GPA.

So aperture size and location can exclude qemu

> Can qemu verify this restriction via existing path or need new uAPI
> flag to communicate?

It already happens, the aperture/etc is convayed to qemu through
IOMMUFD_CMD_IOAS_IOVA_RANGES and if qemu cannot get the IOVA's it
needs to create the guest it should fail.

Jason
  
Tian, Kevin Feb. 2, 2023, 3:50 a.m. UTC | #9
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, February 1, 2023 10:56 PM
> 
> On Wed, Feb 01, 2023 at 03:14:03AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Monday, January 30, 2023 9:33 PM
> > >
> > > On Sun, Jan 29, 2023 at 08:11:48AM +0000, Tian, Kevin wrote:
> > >
> > > > " I'd also question sprd-iommu, which hardly has a generally-useful
> > > > domain size, and has only just recently gained the ability to unmap
> > > > anything successfully."
> > >
> > > So long as it has a correct kernel API and exposes the right (small)
> > > aperture then it is OK.
> > >
> > > The device will not be useful for qemu, but it would run some dpdk
> > > configurations just fine.
> >
> > I still didn't get the restriction here. Can you elaborate why it works
> > with dpdk but not qemu?
> 
> dpdk needs like, say, 64M of aperture and doesn't care what the IOVAs
> are
> 
> qemu needs the entire guest memory of aperture and must have IOVAs
> that are 1:1 with the GPA.
> 
> So aperture size and location can exclude qemu
> 
> > Can qemu verify this restriction via existing path or need new uAPI
> > flag to communicate?
> 
> It already happens, the aperture/etc is convayed to qemu through
> IOMMUFD_CMD_IOAS_IOVA_RANGES and if qemu cannot get the IOVA's it
> needs to create the guest it should fail.
> 

Make sense.
  

Patch

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 4408ac3c49b6..1f3a4c8c78ad 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -448,6 +448,7 @@  static struct iommu_device *fsl_pamu_probe_device(struct device *dev)
 }
 
 static const struct iommu_ops fsl_pamu_ops = {
+	.broken_unmanaged_domain = true,
 	.capable	= fsl_pamu_capable,
 	.domain_alloc	= fsl_pamu_domain_alloc,
 	.probe_device	= fsl_pamu_probe_device,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5f6a85aea501..648fc04143b8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1897,6 +1897,30 @@  bool device_iommu_capable(struct device *dev, enum iommu_cap cap)
 }
 EXPORT_SYMBOL_GPL(device_iommu_capable);
 
+/**
+ * device_iommu_unmanaged_supported() - full support of IOMMU_DOMAIN_UNMANAGED
+ * @dev: device that is behind the iommu
+ *
+ * In general, all IOMMU drivers can allocate IOMMU_DOMAIN_UNMANAGED domains.
+ * However, some of them set the broken_unmanaged_domain, indicating something
+ * is wrong about its support of IOMMU_DOMAIN_UNMANAGED/__IOMMU_DOMAIN_PAGING.
+ * This can happen, when a driver lies about the support to get around with the
+ * IOMMU API, merely for other drivers to use, or when a driver has never been
+ * tested with vfio/iommufd that need a full support of IOMMU_DOMAIN_UNMANAGED.
+ *
+ * Return: true if an IOMMU is present and broken_unmanaged_domain is not set,
+ *         otherwise, false.
+ */
+bool device_iommu_unmanaged_supported(struct device *dev)
+{
+	if (dev->iommu && dev->iommu->iommu_dev &&
+	    !dev_iommu_ops(dev)->broken_unmanaged_domain)
+		return true;
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(device_iommu_unmanaged_supported);
+
 /**
  * iommu_set_fault_handler() - set a fault handler for an iommu domain
  * @domain: iommu domain
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index c60624910872..9c0bd5aee10b 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -675,6 +675,7 @@  irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
 }
 
 static struct iommu_ops msm_iommu_ops = {
+	.broken_unmanaged_domain = true,
 	.domain_alloc = msm_iommu_domain_alloc,
 	.probe_device = msm_iommu_probe_device,
 	.device_group = generic_device_group,
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index ca581ff1c769..c2ecbff6592c 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -578,6 +578,7 @@  static int mtk_iommu_v1_hw_init(const struct mtk_iommu_v1_data *data)
 }
 
 static const struct iommu_ops mtk_iommu_v1_ops = {
+	.broken_unmanaged_domain = true,
 	.domain_alloc	= mtk_iommu_v1_domain_alloc,
 	.probe_device	= mtk_iommu_v1_probe_device,
 	.probe_finalize = mtk_iommu_v1_probe_finalize,
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 2fd7702c6709..17ed392b9f63 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1733,6 +1733,7 @@  static struct iommu_group *omap_iommu_device_group(struct device *dev)
 }
 
 static const struct iommu_ops omap_iommu_ops = {
+	.broken_unmanaged_domain = true,
 	.domain_alloc	= omap_iommu_domain_alloc,
 	.probe_device	= omap_iommu_probe_device,
 	.release_device	= omap_iommu_release_device,
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index ed53279d1106..9af56d2ec6c1 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -267,6 +267,7 @@  static void gart_iommu_sync(struct iommu_domain *domain,
 }
 
 static const struct iommu_ops gart_iommu_ops = {
+	.broken_unmanaged_domain = true,
 	.domain_alloc	= gart_iommu_domain_alloc,
 	.probe_device	= gart_iommu_probe_device,
 	.device_group	= generic_device_group,
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 5b1af40221ec..d1e4c4825d74 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -962,6 +962,7 @@  static int tegra_smmu_of_xlate(struct device *dev,
 }
 
 static const struct iommu_ops tegra_smmu_ops = {
+	.broken_unmanaged_domain = true,
 	.domain_alloc = tegra_smmu_domain_alloc,
 	.probe_device = tegra_smmu_probe_device,
 	.device_group = tegra_smmu_device_group,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 46e1347bfa22..919a5dbad75b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -245,6 +245,10 @@  struct iommu_iotlb_gather {
  *                    pasid, so that any DMA transactions with this pasid
  *                    will be blocked by the hardware.
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @broken_unmanaged_domain: IOMMU_DOMAIN_UNMANAGED is not fully functional; the
+ *                           driver does not really support iommu_map/unmap, but
+ *                           uses UNMANAGED domains for the IOMMU API, called by
+ *                           other SOC drivers.
  * @owner: Driver module providing these ops
  */
 struct iommu_ops {
@@ -277,6 +281,7 @@  struct iommu_ops {
 
 	const struct iommu_domain_ops *default_domain_ops;
 	unsigned long pgsize_bitmap;
+	bool broken_unmanaged_domain;
 	struct module *owner;
 };
 
@@ -455,6 +460,7 @@  static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
 extern int bus_iommu_probe(struct bus_type *bus);
 extern bool iommu_present(struct bus_type *bus);
 extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap);
+extern bool device_iommu_unmanaged_supported(struct device *dev);
 extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
 extern struct iommu_group *iommu_group_get_by_id(int id);
 extern void iommu_domain_free(struct iommu_domain *domain);
@@ -742,6 +748,11 @@  static inline bool device_iommu_capable(struct device *dev, enum iommu_cap cap)
 	return false;
 }
 
+static inline bool device_iommu_unmanaged_supported(struct device *dev)
+{
+	return false;
+}
+
 static inline struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
 {
 	return NULL;