[v1,0/8] Add IO page table replacement support

Message ID cover.1675320212.git.nicolinc@nvidia.com
Headers
Series Add IO page table replacement support |

Message

Nicolin Chen Feb. 2, 2023, 7:05 a.m. UTC
  Hi all,

The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
   table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
   to an S1 domain, or when switching between relevant S1 domains.

This series introduces a new iommu_group_replace_domain() for that. And
add corresponding support throughout the uAPI. So user space can do such
a REPLACE ioctl reusing the existing VFIO_DEVICE_ATTACH_IOMMUFD_PT. This
means that user space needs to be aware whether the device is attached or
not: an unattached device calling VFIO_DEVICE_ATTACH_IOMMUFD_PT means a
regular ATTACH; an attached device calling VFIO_DEVICE_ATTACH_IOMMUFD_PT
on the other hand means a REPLACE.

QEMU with this feature should have the vIOMMU maintain a cache of the
guest io page table addresses and assign a unique IOAS to each unique
guest page table.

As the guest writes the page table address to the HW registers qemu should
then use the 'replace domain' operation on VFIO to assign the VFIO device
to the correct de-duplicated page table.

The algorithm where QEMU uses one VFIO container per-device and removes
all the mappings to change the assignment should ideally not be used with
iommufd.

To apply this series, please rebase on top of the following patches:
1) [PATCH 00/13] Add vfio_device cdev for iommufd support
   https://lore.kernel.org/kvm/20230117134942.101112-1-yi.l.liu@intel.com/
2) (Merged) [PATCH v5 0/5] iommu: Retire detach_dev callback
   https://lore.kernel.org/linux-iommu/20230110025408.667767-1-baolu.lu@linux.intel.com/
3) (Merged) [PATCH] selftests: iommu: Fix test_cmd_destroy_access() call in user_copy
   https://lore.kernel.org/lkml/20230120074204.1368-1-nicolinc@nvidia.com/

Or you can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v1

Thank you
Nicolin Chen

Nicolin Chen (7):
  iommu: Introduce a new iommu_group_replace_domain() API
  iommufd: Create access in vfio_iommufd_emulated_bind()
  iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_SET_IOAS coverage
  iommufd: Add replace support in iommufd_access_set_ioas()
  iommufd/selftest: Add coverage for access->ioas replacement
  iommufd/device: Use iommu_group_replace_domain()
  vfio-iommufd: Support IO page table replacement

Yi Liu (1):
  iommu: Move dev_iommu_ops() to private header

 drivers/iommu/iommu-priv.h                    |  22 +++
 drivers/iommu/iommu.c                         |  32 ++++
 drivers/iommu/iommufd/device.c                | 150 +++++++++++++++---
 drivers/iommu/iommufd/iommufd_private.h       |   4 +
 drivers/iommu/iommufd/iommufd_test.h          |   4 +
 drivers/iommu/iommufd/selftest.c              |  25 ++-
 drivers/vfio/iommufd.c                        |  33 ++--
 include/linux/iommu.h                         |  11 --
 include/linux/iommufd.h                       |   4 +-
 tools/testing/selftests/iommu/iommufd.c       |  29 +++-
 tools/testing/selftests/iommu/iommufd_utils.h |  22 ++-
 11 files changed, 273 insertions(+), 63 deletions(-)
 create mode 100644 drivers/iommu/iommu-priv.h
  

Comments

Tian, Kevin Feb. 3, 2023, 8:09 a.m. UTC | #1
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, February 2, 2023 3:05 PM
> 
> QEMU with this feature should have the vIOMMU maintain a cache of the
> guest io page table addresses and assign a unique IOAS to each unique
> guest page table.

I didn't get why we impose such requirement to userspace.

> 
> To apply this series, please rebase on top of the following patches:
> 1) [PATCH 00/13] Add vfio_device cdev for iommufd support
>    https://lore.kernel.org/kvm/20230117134942.101112-1-yi.l.liu@intel.com/
> 2) (Merged) [PATCH v5 0/5] iommu: Retire detach_dev callback
>    https://lore.kernel.org/linux-iommu/20230110025408.667767-1-
> baolu.lu@linux.intel.com/
> 3) (Merged) [PATCH] selftests: iommu: Fix test_cmd_destroy_access() call in
> user_copy
>    https://lore.kernel.org/lkml/20230120074204.1368-1-nicolinc@nvidia.com/
> 

No need to list merged work as dependency. In concept every commit
merged in the version which this series is rebased to is required. 😊
  
Jason Gunthorpe Feb. 3, 2023, 3:04 p.m. UTC | #2
On Fri, Feb 03, 2023 at 08:09:30AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Thursday, February 2, 2023 3:05 PM
> > 
> > QEMU with this feature should have the vIOMMU maintain a cache of the
> > guest io page table addresses and assign a unique IOAS to each unique
> > guest page table.
> 
> I didn't get why we impose such requirement to userspace.

I read this as implementation guidance for qemu. qemu can do what it
wants of course

Jason
  
Tian, Kevin Feb. 6, 2023, 6:39 a.m. UTC | #3
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, February 3, 2023 11:04 PM
> 
> On Fri, Feb 03, 2023 at 08:09:30AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Thursday, February 2, 2023 3:05 PM
> > >
> > > QEMU with this feature should have the vIOMMU maintain a cache of the
> > > guest io page table addresses and assign a unique IOAS to each unique
> > > guest page table.
> >
> > I didn't get why we impose such requirement to userspace.
> 
> I read this as implementation guidance for qemu. qemu can do what it
> wants of course
> 

sure but I still didn't get why this is a guidance specific to the
new replace cmd...
  
Jason Gunthorpe Feb. 6, 2023, 1:26 p.m. UTC | #4
On Mon, Feb 06, 2023 at 06:39:29AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, February 3, 2023 11:04 PM
> > 
> > On Fri, Feb 03, 2023 at 08:09:30AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Thursday, February 2, 2023 3:05 PM
> > > >
> > > > QEMU with this feature should have the vIOMMU maintain a cache of the
> > > > guest io page table addresses and assign a unique IOAS to each unique
> > > > guest page table.
> > >
> > > I didn't get why we impose such requirement to userspace.
> > 
> > I read this as implementation guidance for qemu. qemu can do what it
> > wants of course
> > 
> 
> sure but I still didn't get why this is a guidance specific to the
> new replace cmd...

I think the guidance is about the change to VFIO uAPI where it is now
possible to change the domain attached, previously that was not
possible

Jason
  
Tian, Kevin Feb. 7, 2023, 12:34 a.m. UTC | #5
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, February 6, 2023 9:26 PM
> 
> On Mon, Feb 06, 2023 at 06:39:29AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, February 3, 2023 11:04 PM
> > >
> > > On Fri, Feb 03, 2023 at 08:09:30AM +0000, Tian, Kevin wrote:
> > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > Sent: Thursday, February 2, 2023 3:05 PM
> > > > >
> > > > > QEMU with this feature should have the vIOMMU maintain a cache of
> the
> > > > > guest io page table addresses and assign a unique IOAS to each
> unique
> > > > > guest page table.
> > > >
> > > > I didn't get why we impose such requirement to userspace.
> > >
> > > I read this as implementation guidance for qemu. qemu can do what it
> > > wants of course
> > >
> >
> > sure but I still didn't get why this is a guidance specific to the
> > new replace cmd...
> 
> I think the guidance is about the change to VFIO uAPI where it is now
> possible to change the domain attached, previously that was not
> possible
> 

that is fine. I just didn't get why the original description emphasized
the cache and unique IOAS aspects in Qemu.