[00/17] Add Intel VT-d nested translation

Message ID 20230209043153.14964-1-yi.l.liu@intel.com

Message

Yi Liu Feb. 9, 2023, 4:31 a.m. UTC
Nested translation uses two stages of address translation to get the final
physical address. Taking Intel VT-d as an example, the first stage translation
structure is the I/O page table. As the diagram below shows, the guest I/O page
table pointer, in GPA (guest physical address), is passed to the host to do the
first stage translation. In addition, guest modifications to present
mappings in the first stage page table must be followed by an iotlb invalidation
to sync the host iotlb.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  |  FS for GIOVA->GPA      |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
 - FS = First stage page tables
 - SS = Second stage page tables
<Intel VT-d Nested translation>
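
To make the two-stage walk and the need for iotlb invalidation concrete, here
is a minimal toy model in C. It is only a sketch: the single-level tables, the
8-page address space and every toy_* name are assumptions made for
illustration, not the VT-d page table format or any code from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NPAGES     8		/* toy address space: 8 pages per stage */
#define INVALID    UINT64_MAX

/* Toy single-level tables indexed by page frame number (hypothetical). */
struct toy_iommu {
	uint64_t fs[NPAGES];	/* first stage:  GIOVA pfn -> GPA pfn (guest-owned) */
	uint64_t ss[NPAGES];	/* second stage: GPA pfn -> HPA pfn (host-owned) */
	uint64_t iotlb[NPAGES];	/* cached GIOVA pfn -> HPA pfn, INVALID if empty */
};

static void toy_init(struct toy_iommu *m)
{
	for (size_t i = 0; i < NPAGES; i++)
		m->fs[i] = m->ss[i] = m->iotlb[i] = INVALID;
}

/* Nested walk: GIOVA -> GPA via first stage, then GPA -> HPA via second stage. */
static uint64_t toy_translate(struct toy_iommu *m, uint64_t giova)
{
	uint64_t pfn = giova >> PAGE_SHIFT;
	uint64_t off = giova & ((1ULL << PAGE_SHIFT) - 1);

	if (pfn >= NPAGES)
		return INVALID;
	if (m->iotlb[pfn] == INVALID) {
		uint64_t gpa_pfn = m->fs[pfn];

		if (gpa_pfn == INVALID || gpa_pfn >= NPAGES ||
		    m->ss[gpa_pfn] == INVALID)
			return INVALID;
		m->iotlb[pfn] = m->ss[gpa_pfn];	/* cache the combined result */
	}
	return (m->iotlb[pfn] << PAGE_SHIFT) | off;
}

/* Drop the cached translation for one GIOVA page. */
static void toy_iotlb_invalidate(struct toy_iommu *m, uint64_t giova)
{
	uint64_t pfn = giova >> PAGE_SHIFT;

	if (pfn < NPAGES)
		m->iotlb[pfn] = INVALID;
}

int toy_nested_demo(void)
{
	struct toy_iommu m;

	toy_init(&m);
	m.ss[2] = 5;	/* GPA page 2 -> HPA page 5 */
	m.ss[3] = 7;	/* GPA page 3 -> HPA page 7 */
	m.fs[1] = 2;	/* GIOVA page 1 -> GPA page 2 */

	assert(toy_translate(&m, 0x1010) == 0x5010);
	m.fs[1] = 3;	/* guest modifies a present mapping */
	assert(toy_translate(&m, 0x1010) == 0x5010);	/* stale until invalidated */
	toy_iotlb_invalidate(&m, 0x1010);
	assert(toy_translate(&m, 0x1010) == 0x7010);
	return 0;
}
```

In this model, calling toy_nested_demo() from a small main() exercises the
stale-then-synced sequence. The middle assertion is the point: until the
invalidation arrives, the cached translation still reflects the old first
stage mapping.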

Different platform vendors have different first stage translation formats,
so userspace should query the underlying iommu capability before setting up
first stage translation structures on the host.[1]
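
As a rough illustration of "query first, then set", the check could look like
the sketch below. The capability structure, the format identifiers and all
toy_* names are hypothetical and are not the uAPI proposed in [1] or in this
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical first stage format identifiers, one bit position each. */
enum toy_fs_format {
	TOY_FS_FORMAT_VTD = 0,
	TOY_FS_FORMAT_OTHER = 1,
};

/* Hypothetical capability report returned by a query step. */
struct toy_iommu_caps {
	uint32_t fs_formats;	/* bitmask of (1u << enum toy_fs_format) */
};

static bool toy_fs_format_supported(const struct toy_iommu_caps *caps,
				    enum toy_fs_format fmt)
{
	return (caps->fs_formats & (1u << fmt)) != 0;
}

/* Returns 0 if the format may be used, -1 if userspace must not set it. */
static int toy_set_first_stage(const struct toy_iommu_caps *caps,
			       enum toy_fs_format fmt)
{
	if (!toy_fs_format_supported(caps, fmt))
		return -1;
	/* A real VMM would pass the guest page table pointer to the host here. */
	return 0;
}

int toy_caps_demo(void)
{
	struct toy_iommu_caps caps = { .fs_formats = 1u << TOY_FS_FORMAT_VTD };

	assert(toy_set_first_stage(&caps, TOY_FS_FORMAT_VTD) == 0);
	assert(toy_set_first_stage(&caps, TOY_FS_FORMAT_OTHER) == -1);
	return 0;
}
```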

In the iommufd subsystem, I/O page tables are tracked by hw_pagetable objects.
The first stage page table is owned by userspace (the guest), while the second
stage page table is owned by the kernel for security. So first stage page tables
are tracked by user-managed hw_pagetables, and second stage page tables are
tracked by kernel-managed hw_pagetables.
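
The ownership split can be pictured with the small sketch below. It assumes
that a user-managed (first stage) object always nests on a kernel-managed
(second stage) parent, as in the diagram above. The struct layout and the
toy_* names are illustrative only and do not match the iommufd data
structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_hwpt_type {
	TOY_HWPT_KERNEL_MANAGED,	/* second stage, GPA -> HPA */
	TOY_HWPT_USER_MANAGED,		/* first stage, GIOVA -> GPA */
};

struct toy_hwpt {
	enum toy_hwpt_type type;
	/* Assumed: the second stage object that a first stage one nests on. */
	const struct toy_hwpt *parent;
};

/* A kernel-managed object stands alone; a user-managed one needs a parent. */
static bool toy_hwpt_is_valid(const struct toy_hwpt *hwpt)
{
	if (!hwpt)
		return false;
	if (hwpt->type == TOY_HWPT_KERNEL_MANAGED)
		return hwpt->parent == NULL;
	return hwpt->parent && hwpt->parent->type == TOY_HWPT_KERNEL_MANAGED;
}

int toy_hwpt_demo(void)
{
	struct toy_hwpt ss = { .type = TOY_HWPT_KERNEL_MANAGED, .parent = NULL };
	struct toy_hwpt fs = { .type = TOY_HWPT_USER_MANAGED, .parent = &ss };
	struct toy_hwpt orphan = { .type = TOY_HWPT_USER_MANAGED, .parent = NULL };

	assert(toy_hwpt_is_valid(&ss));
	assert(toy_hwpt_is_valid(&fs));
	assert(!toy_hwpt_is_valid(&orphan));
	return 0;
}
```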

This series first introduces a new iommu op for allocating domains for iommufd
and an op for syncing the iotlb after first stage page table modifications, then
adds the implementation of the new ops in the intel-iommu driver. After this
preparation, it adds kernel-managed and user-managed hw_pagetable allocation for
userspace. Last, it adds selftests for the new ioctls.

This series is based on "[PATCH 0/6] iommufd: Add iommu capability reporting"[1]
and Nicolin's "[PATCH v2 00/10] Add IO page table replacement support"[2]. The
complete code can be found in [3]. Draft QEMU code can be found in [4].

Basic testing was done with a DSA device on VT-d, where the guest has a vIOMMU
built with nested translation.

[1] https://lore.kernel.org/linux-iommu/20230209041642.9346-1-yi.l.liu@intel.com/
[2] https://lore.kernel.org/linux-iommu/cover.1675802050.git.nicolinc@nvidia.com/
[3] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting_vtd_v1
[4] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv3%2Bnesting

Regards,
	Yi Liu

Lu Baolu (5):
  iommu: Add new iommu op to create domains owned by userspace
  iommu: Add nested domain support
  iommu/vt-d: Extend dmar_domain to support nested domain
  iommu/vt-d: Add helper to setup pasid nested translation
  iommu/vt-d: Add nested domain support

Nicolin Chen (6):
  iommufd: Add/del hwpt to IOAS at alloc/destroy()
  iommufd/device: Move IOAS attaching and detaching operations into
    helpers
  iommufd/selftest: Add IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE test op
  iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC ioctl
  iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl

Yi Liu (6):
  iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
  iommufd: Split iommufd_hw_pagetable_alloc()
  iommufd: Add kernel-managed hw_pagetable allocation for userspace
  iommufd: Add infrastructure for user-managed hw_pagetable allocation
  iommufd: Add user-managed hw_pagetable allocation
  iommufd/device: Report supported stage-1 page table types

 drivers/iommu/intel/Makefile                  |   2 +-
 drivers/iommu/intel/iommu.c                   |  38 ++-
 drivers/iommu/intel/iommu.h                   |  50 +++-
 drivers/iommu/intel/nested.c                  | 143 +++++++++
 drivers/iommu/intel/pasid.c                   | 142 +++++++++
 drivers/iommu/intel/pasid.h                   |   2 +
 drivers/iommu/iommufd/device.c                | 117 ++++----
 drivers/iommu/iommufd/hw_pagetable.c          | 280 +++++++++++++++++-
 drivers/iommu/iommufd/iommufd_private.h       |  23 +-
 drivers/iommu/iommufd/iommufd_test.h          |  35 +++
 drivers/iommu/iommufd/main.c                  |  11 +
 drivers/iommu/iommufd/selftest.c              | 149 +++++++++-
 include/linux/iommu.h                         |  11 +
 include/uapi/linux/iommufd.h                  | 196 ++++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 124 +++++++-
 tools/testing/selftests/iommu/iommufd_utils.h | 106 +++++++
 16 files changed, 1329 insertions(+), 100 deletions(-)
 create mode 100644 drivers/iommu/intel/nested.c
  

Comments

Shameerali Kolothum Thodi Feb. 9, 2023, 10:11 a.m. UTC | #1
Hi Yi Liu,

Thanks for sending this out. I will go through this one. As I mentioned before, we keep
an internal branch based on your work and rebase a few patches on it to get ARM
SMMUv3 nesting support. The most recent one is based on your "iommufd-v6.2-rc4-nesting"
branch and is here:

https://github.com/hisilicon/kernel-dev/commits/iommufd-v6.2-rc4-nesting-arm

Just wondering, is there any chance the latest "Add SMMUv3 nesting support" series will
be sent out soon? Please let me know if you need any help with that.

Thanks,
Shameer
  
Nicolin Chen Feb. 9, 2023, 4:10 p.m. UTC | #2
Hi Shameer,

On Thu, Feb 09, 2023 at 10:11:42AM +0000, Shameerali Kolothum Thodi wrote:

> > This series first introduces new iommu op for allocating domains for
> > iommufd,
> > and op for syncing iotlb for first stage page table modifications, and then
> > add the implementation of the new ops in intel-iommu driver. After this
> > preparation, adds kernel-managed and user-managed hw_pagetable
> > allocation for
> > userspace. Last, add self-test for the new ioctls.
> >
> > This series is based on "[PATCH 0/6] iommufd: Add iommu capability
> > reporting"[1]
> > and Nicolin's "[PATCH v2 00/10] Add IO page table replacement support"[2].
> > Complete
> > code can be found in[3]. Draft Qemu code can be found in[4].
> >
> > Basic test done with DSA device on VT-d. Where the guest has a vIOMMU
> > built
> > with nested translation.

> Thanks for sending this out. Will go through this one. As I informed before we keep
> an internal branch based on your work and rebase few patches to get the ARM
> SMMUv3 nesting support. The recent one is based on your "iommufd-v6.2-rc4-nesting"
> branch and is here,
> 
> https://github.com/hisilicon/kernel-dev/commits/iommufd-v6.2-rc4-nesting-arm
> 
> Just wondering any chance the latest "Add SMMUv3 nesting support" series will
> be send out soon? Please let me know if you need any help with that.

I had an initial discussion with Robin/Jason regarding the SMMUv3
nesting series and received quite a lot of input, so I need to
finish reworking it before sending it out. Hopefully we can see it
on the mailing list in the coming weeks.

Thanks
Nic
  
Shameerali Kolothum Thodi Feb. 9, 2023, 4:16 p.m. UTC | #3
Thanks for that update. Sure, looking forward to it.

Shameer
  
Nicolin Chen Feb. 17, 2023, 6:20 p.m. UTC | #4
Hi Yi,

On Wed, Feb 08, 2023 at 08:31:36PM -0800, Yi Liu wrote:

> Nicolin Chen (6):
>   iommufd: Add/del hwpt to IOAS at alloc/destroy()
>   iommufd/device: Move IOAS attaching and detaching operations into
>     helpers
>   iommufd/selftest: Add IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE test op
>   iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC ioctl
>   iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
>   iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
> 
> Yi Liu (6):
>   iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
>   iommufd: Split iommufd_hw_pagetable_alloc()
>   iommufd: Add kernel-managed hw_pagetable allocation for userspace
>   iommufd: Add infrastructure for user-managed hw_pagetable allocation
>   iommufd: Add user-managed hw_pagetable allocation

Just want to let you know that Jason made some major changes
in device/hw_pagetable and selftest (mock_domain):
https://github.com/jgunthorpe/linux/commits/iommufd_hwpt

So most of the changes above need to be redone. I am now rebasing
the replace and the nesting series, and will probably finish
by early next week.

If you are reworking this series, perhaps you could hold off for a
few days until these changes are ready.

Thanks
Nic