[v2,1/5] iommufd: Add nesting related data structures for Intel VT-d

Message ID 20230309082207.612346-2-yi.l.liu@intel.com
State New
Series Add Intel VT-d nested translation

Commit Message

Yi Liu March 9, 2023, 8:22 a.m. UTC
Add the following data structures for the corresponding ioctls:
    iommu_hwpt_intel_vtd            => IOMMU_HWPT_ALLOC
    iommu_hw_info_vtd               => IOMMU_DEVICE_GET_HW_INFO
    iommu_hwpt_invalidate_intel_vtd => IOMMU_HWPT_INVALIDATE

Also, add IOMMU_HW_INFO_TYPE_INTEL_VTD and IOMMU_HWPT_TYPE_VTD_S1 to the
header and corresponding type/size arrays.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c |   7 +-
 drivers/iommu/iommufd/main.c         |   5 +
 include/uapi/linux/iommufd.h         | 136 +++++++++++++++++++++++++++
 3 files changed, 147 insertions(+), 1 deletion(-)
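
The type/size arrays mentioned in the commit message follow a table-lookup pattern, which can be sketched in standalone C as below. This is only an illustration: the enum and struct names are local stand-ins for the ones in the patch, the two helper functions are hypothetical, and the exact-length check is an assumption (the kernel code may accept other lengths).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the enums in the patch; values follow declaration order. */
enum hwpt_type { HWPT_TYPE_DEFAULT, HWPT_TYPE_VTD_S1, HWPT_TYPE_COUNT };
enum hw_info_type { HW_INFO_TYPE_DEFAULT, HW_INFO_TYPE_INTEL_VTD, HW_INFO_TYPE_COUNT };

/* Stand-in with the same field layout as struct iommu_hwpt_intel_vtd. */
struct vtd_s1_data {
	uint64_t flags;
	uint64_t pgtbl_addr;
	uint32_t pat;
	uint32_t emt;
	uint32_t addr_width;
	uint32_t reserved;
};

/* Size of the type-specific allocation data, indexed by hwpt type. */
static const size_t alloc_data_size[HWPT_TYPE_COUNT] = {
	[HWPT_TYPE_DEFAULT] = 0,
	[HWPT_TYPE_VTD_S1] = sizeof(struct vtd_s1_data),
};

/* Bitmap of hwpt types each hardware info type supports. */
static const uint64_t type_bitmaps[HW_INFO_TYPE_COUNT] = {
	[HW_INFO_TYPE_DEFAULT] = 1ULL << HWPT_TYPE_DEFAULT,
	[HW_INFO_TYPE_INTEL_VTD] = (1ULL << HWPT_TYPE_DEFAULT) |
				   (1ULL << HWPT_TYPE_VTD_S1),
};

/* Hypothetical helper: is this hwpt type usable with this hardware? */
static bool hwpt_type_supported(unsigned int hw_info, unsigned int type)
{
	if (hw_info >= HW_INFO_TYPE_COUNT || type >= HWPT_TYPE_COUNT)
		return false;
	return (type_bitmaps[hw_info] >> type) & 1;
}

/* Hypothetical helper: assumes the user length must match exactly. */
static bool hwpt_data_len_ok(unsigned int type, size_t user_len)
{
	if (type >= HWPT_TYPE_COUNT)
		return false;
	return user_len == alloc_data_size[type];
}
```

Bounds-checking the index before each lookup matters because both values originate from userspace.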
  

Comments

Liu, Jingqi March 15, 2023, 1:50 p.m. UTC | #1
On 3/9/2023 4:22 PM, Yi Liu wrote:
> Add the following data structures for corresponding ioctls:
>                 iommu_hwpt_intel_vtd => IOMMU_HWPT_ALLOC
>                iommu_hw_info_vtd => IOMMU_DEVICE_GET_HW_INFO
>      iommu_hwpt_invalidate_intel_vtd => IOMMU_HWPT_INVALIDATE
>
> Also, add IOMMU_HW_INFO_TYPE_INTEL_VTD and IOMMU_HWPT_TYPE_VTD_S1 to the
> header and corresponding type/size arrays.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/iommufd/hw_pagetable.c |   7 +-
>   drivers/iommu/iommufd/main.c         |   5 +
>   include/uapi/linux/iommufd.h         | 136 +++++++++++++++++++++++++++
>   3 files changed, 147 insertions(+), 1 deletion(-)
>
[...]

> +
> +/**
> + * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
> + *                               stage-1 page table info
> + * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
> + * @pgtbl_addr: The base address of the user-managed stage-1 page table.
> + * @pat: Page attribute table data to compute effective memory type
> + * @emt: Extended memory type
> + * @addr_width: The address width of the untranslated addresses that are
> + *              subjected to the user-managed stage-1 page table.
> + * @__reserved: Must be 0
> + *
> + * The Intel VT-d specific data for creating hw_pagetable to represent
> + * the user-managed stage-1 page table that is used in nested translation.
> + *
> + * In nested translation, the stage-1 page table locates in the address
> + * space that defined by the corresponding stage-2 page table. Hence the
> + * stage-1 page table base address value should not be higher than the
> + * maximum untranslated address of stage-2 page table.
> + *
> + * The paging level of the stage-1 page table should be compataible with

s/compataible/compatible

> + * the hardware iommu. Otherwise, the allocation would be failed.
> + */
> +struct iommu_hwpt_intel_vtd {
> +	__u64 flags;
> +	__u64 pgtbl_addr;
> +	__u32 pat;
> +	__u32 emt;
> +	__u32 addr_width;
> +	__u32 __reserved;
>   };
>   

[...]

> +
> +/**
> + * struct iommu_hwpt_invalidate_intel_vtd - Intel VT-d cache invalidation info
> + * @granularity: One of enum iommu_vtd_qi_granularity.
> + * @flags: Combination of enum iommu_hwpt_intel_vtd_invalidate_flags
> + * @__reserved: Must be 0
> + * @addr: The start address of the addresses to be invalidated.
> + * @granule_size: Page/block size of the mapping in bytes. It is used to
> + *                compute the invalidation range togehter with @nb_granules.

s/togehter/together

Thanks,
Jingqi
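
The range computation that the quoted @granule_size comment refers to can be sketched as follows. The helper name and the struct inv_range type are hypothetical; the alignment rule is taken from the kernel-doc in this patch, which asks that @addr be aligned to @granule_size * @nb_granules. The overflow and zero checks are assumptions added for safety, not something the patch specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: start address and byte length of the range. */
struct inv_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Hypothetical helper: compute the invalidation range for a
 * page-selective request. Returns false if the inputs are zero, the
 * product overflows, or addr is not aligned to the total size.
 */
static bool vtd_inv_range(uint64_t addr, uint64_t granule_size,
			  uint64_t nb_granules, struct inv_range *out)
{
	uint64_t size;

	if (granule_size == 0 || nb_granules == 0)
		return false;
	/* Reject granule_size * nb_granules overflowing 64 bits. */
	if (nb_granules > UINT64_MAX / granule_size)
		return false;
	size = granule_size * nb_granules;
	/* The patch's kernel-doc requires addr aligned to the total size. */
	if (addr % size != 0)
		return false;

	out->start = addr;
	out->size = size;
	return true;
}
```

For example, four 4KB granules at 0x4000 give a 16KB range, while the same request at 0x1000 is rejected as misaligned.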
  
Jason Gunthorpe March 20, 2023, 1:49 p.m. UTC | #2
On Thu, Mar 09, 2023 at 12:22:03AM -0800, Yi Liu wrote:

> +struct iommu_hwpt_invalidate_intel_vtd {
> +	__u8 granularity;
> +	__u8 padding[7];
> +	__u32 flags;
> +	__u32 __reserved;
> +	__u64 addr;
> +	__u64 granule_size;
> +	__u64 nb_granules;
> +};

Is there a reason this has such a weird layout? Put the granularity in
the __reserved slot?

Consider the discussion on ARM if you prefer to use the native HW
command structure instead?

Jason
  
Yi Liu March 21, 2023, 6:06 a.m. UTC | #3
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, March 20, 2023 9:50 PM
> 
> On Thu, Mar 09, 2023 at 12:22:03AM -0800, Yi Liu wrote:
> 
> > +struct iommu_hwpt_invalidate_intel_vtd {
> > +	__u8 granularity;
> > +	__u8 padding[7];
> > +	__u32 flags;
> > +	__u32 __reserved;
> > +	__u64 addr;
> > +	__u64 granule_size;
> > +	__u64 nb_granules;
> > +};
> 
> Is there a reason this has such a weird layout? Put the granularity in
> the __reserved slot?

No special reason. This layout was from the previous merged version.
Will modify it as you suggested.

> Consider the discussion on ARM if you prefer to use the native HW
> command structure instead?

Yes, will think about it. At least granule_size and nb_granules are not
necessary. They were added in the previous abstracted invalidation
uapi structure.

Regards,
Yi Liu
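
The layout question discussed above can be made concrete with a small standalone sketch. The first struct mirrors the v2 layout from the patch using stdint types; the second is only one possible reading of "put the granularity in the __reserved slot" and is not taken from any later revision. Both struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as posted in v2: a lone byte followed by 7 bytes of padding. */
struct vtd_inv_v2 {
	uint8_t granularity;
	uint8_t padding[7];
	uint32_t flags;
	uint32_t reserved;
	uint64_t addr;
	uint64_t granule_size;
	uint64_t nb_granules;
};

/*
 * One possible rearrangement (an assumption): granularity moves into
 * the 4 bytes after flags, so no leading padding is needed.
 */
struct vtd_inv_sketch {
	uint32_t flags;
	uint8_t granularity;
	uint8_t reserved[3];
	uint64_t addr;
	uint64_t granule_size;
	uint64_t nb_granules;
};

static_assert(sizeof(struct vtd_inv_v2) == 40, "v2 layout is 40 bytes");
static_assert(offsetof(struct vtd_inv_v2, flags) == 8, "flags after padding");
static_assert(sizeof(struct vtd_inv_sketch) == 32, "sketch saves 8 bytes");
static_assert(offsetof(struct vtd_inv_sketch, addr) == 8, "addr stays aligned");
```

Dropping granule_size and nb_granules, as the reply suggests may be possible, would shrink the structure further; that variant is not shown because the thread does not define it.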
  
Yi Liu March 21, 2023, 6:31 a.m. UTC | #4
> From: Liu, Jingqi <jingqi.liu@intel.com>
> Sent: Wednesday, March 15, 2023 9:51 PM
> 
> 
> On 3/9/2023 4:22 PM, Yi Liu wrote:
> > Add the following data structures for corresponding ioctls:
> >                 iommu_hwpt_intel_vtd => IOMMU_HWPT_ALLOC
> >                iommu_hw_info_vtd => IOMMU_DEVICE_GET_HW_INFO
> >      iommu_hwpt_invalidate_intel_vtd => IOMMU_HWPT_INVALIDATE
> >
> > Also, add IOMMU_HW_INFO_TYPE_INTEL_VTD and
> IOMMU_HWPT_TYPE_VTD_S1 to the
> > header and corresponding type/size arrays.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >   drivers/iommu/iommufd/hw_pagetable.c |   7 +-
> >   drivers/iommu/iommufd/main.c         |   5 +
> >   include/uapi/linux/iommufd.h         | 136
> +++++++++++++++++++++++++++
> >   3 files changed, 147 insertions(+), 1 deletion(-)
> >
> [...]
> 
> > +
> > +/**
> > + * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
> > + *                               stage-1 page table info
> > + * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
> > + * @pgtbl_addr: The base address of the user-managed stage-1 page
> table.
> > + * @pat: Page attribute table data to compute effective memory type
> > + * @emt: Extended memory type
> > + * @addr_width: The address width of the untranslated addresses that
> are
> > + *              subjected to the user-managed stage-1 page table.
> > + * @__reserved: Must be 0
> > + *
> > + * The Intel VT-d specific data for creating hw_pagetable to represent
> > + * the user-managed stage-1 page table that is used in nested translation.
> > + *
> > + * In nested translation, the stage-1 page table locates in the address
> > + * space that defined by the corresponding stage-2 page table. Hence
> the
> > + * stage-1 page table base address value should not be higher than the
> > + * maximum untranslated address of stage-2 page table.
> > + *
> > + * The paging level of the stage-1 page table should be compataible with
> 
> s/compataible/compatible
> 
> > + * the hardware iommu. Otherwise, the allocation would be failed.
> > + */
> > +struct iommu_hwpt_intel_vtd {
> > +	__u64 flags;
> > +	__u64 pgtbl_addr;
> > +	__u32 pat;
> > +	__u32 emt;
> > +	__u32 addr_width;
> > +	__u32 __reserved;
> >   };
> >
> 
> [...]
> 
> > +
> > +/**
> > + * struct iommu_hwpt_invalidate_intel_vtd - Intel VT-d cache
> invalidation info
> > + * @granularity: One of enum iommu_vtd_qi_granularity.
> > + * @flags: Combination of enum
> iommu_hwpt_intel_vtd_invalidate_flags
> > + * @__reserved: Must be 0
> > + * @addr: The start address of the addresses to be invalidated.
> > + * @granule_size: Page/block size of the mapping in bytes. It is used to
> > + *                compute the invalidation range togehter with @nb_granules.
> 
> s/togehter/together
> 

All above received. Thanks.

Regards,
Yi Liu
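
The constraints stated in the quoted kernel-doc for struct iommu_hwpt_intel_vtd can be sketched as a validation helper. This is a hedged illustration, not the driver's code: the function and the s2_max_addr parameter are hypothetical, the flag values are copied from the patch, and accepting only 48-bit and 57-bit widths (4-level and 5-level paging) is an assumption about what "compatible with the hardware iommu" means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits with the same values as enum iommu_hwpt_intel_vtd_flags. */
enum {
	VTD_PGTBL_SRE  = 1 << 0,
	VTD_PGTBL_EAFE = 1 << 1,
	VTD_PGTBL_PCD  = 1 << 2,
	VTD_PGTBL_PWT  = 1 << 3,
	VTD_PGTBL_EMTE = 1 << 4,
	VTD_PGTBL_CD   = 1 << 5,
	VTD_PGTBL_WPE  = 1 << 6,
	VTD_PGTBL_LAST = 1 << 7,
};

/* Stand-in with the same field layout as struct iommu_hwpt_intel_vtd. */
struct vtd_s1_data {
	uint64_t flags;
	uint64_t pgtbl_addr;
	uint32_t pat;
	uint32_t emt;
	uint32_t addr_width;
	uint32_t reserved;
};

/* Hypothetical check; s2_max_addr is the highest stage-2 input address. */
static bool vtd_s1_data_valid(const struct vtd_s1_data *d, uint64_t s2_max_addr)
{
	/* Only bits below VTD_PGTBL_LAST are defined. */
	if (d->flags & ~(uint64_t)(VTD_PGTBL_LAST - 1))
		return false;
	/* @__reserved must be 0. */
	if (d->reserved)
		return false;
	/* The table base must lie inside the stage-2 address space. */
	if (d->pgtbl_addr > s2_max_addr)
		return false;
	/* Assumption: only 4-level (48) and 5-level (57) widths are valid. */
	if (d->addr_width != 48 && d->addr_width != 57)
		return false;
	return true;
}
```

Rejecting unknown flag bits and a non-zero reserved field up front keeps room for extending the structure later without ambiguity.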
  

Patch

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 76ff228dfc1f..36e79db8a17d 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -172,6 +172,7 @@  iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
  */
 static const size_t iommufd_hwpt_alloc_data_size[] = {
 	[IOMMU_HWPT_TYPE_DEFAULT] = 0,
+	[IOMMU_HWPT_TYPE_VTD_S1] = sizeof(struct iommu_hwpt_intel_vtd),
 };
 
 /*
@@ -180,6 +181,8 @@  static const size_t iommufd_hwpt_alloc_data_size[] = {
  */
 const u64 iommufd_hwpt_type_bitmaps[] =  {
 	[IOMMU_HW_INFO_TYPE_DEFAULT] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
+	[IOMMU_HW_INFO_TYPE_INTEL_VTD] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT) |
+					 BIT_ULL(IOMMU_HWPT_TYPE_VTD_S1),
 };
 
 /* Return true if type is supported, otherwise false */
@@ -324,7 +327,9 @@  int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
  * size of page table type specific invalidate_info, indexed by
  * enum iommu_hwpt_type.
  */
-static const size_t iommufd_hwpt_invalidate_info_size[] = {};
+static const size_t iommufd_hwpt_invalidate_info_size[] = {
+	[IOMMU_HWPT_TYPE_VTD_S1] = sizeof(struct iommu_hwpt_invalidate_intel_vtd),
+};
 
 int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
 {
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 7ec3ceac01b3..514db4c26927 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -275,6 +275,11 @@  union ucmd_buffer {
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
+	/*
+	 * data_type specific structure used in the cache invalidation
+	 * path.
+	 */
+	struct iommu_hwpt_invalidate_intel_vtd vtd;
 };
 
 struct iommufd_ioctl_op {
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index e2eff9c56ab3..2a6c326391b2 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -351,9 +351,64 @@  struct iommu_vfio_ioas {
 /**
  * enum iommu_hwpt_type - IOMMU HWPT Type
  * @IOMMU_HWPT_TYPE_DEFAULT: default
+ * @IOMMU_HWPT_TYPE_VTD_S1: Intel VT-d stage-1 page table
  */
 enum iommu_hwpt_type {
 	IOMMU_HWPT_TYPE_DEFAULT,
+	IOMMU_HWPT_TYPE_VTD_S1,
+};
+
+/**
+ * enum iommu_hwpt_intel_vtd_flags - Intel VT-d stage-1 page
+ *				     table entry attributes
+ * @IOMMU_VTD_PGTBL_SRE: Supervisor request enable
+ * @IOMMU_VTD_PGTBL_EAFE: Extended accessed flag enable
+ * @IOMMU_VTD_PGTBL_PCD: Page-level cache disable
+ * @IOMMU_VTD_PGTBL_PWT: Page-level write through
+ * @IOMMU_VTD_PGTBL_EMTE: Extended mem type enable
+ * @IOMMU_VTD_PGTBL_CD: PASID-level cache disable
+ * @IOMMU_VTD_PGTBL_WPE: Write protect enable
+ */
+enum iommu_hwpt_intel_vtd_flags {
+	IOMMU_VTD_PGTBL_SRE = 1 << 0,
+	IOMMU_VTD_PGTBL_EAFE = 1 << 1,
+	IOMMU_VTD_PGTBL_PCD = 1 << 2,
+	IOMMU_VTD_PGTBL_PWT = 1 << 3,
+	IOMMU_VTD_PGTBL_EMTE = 1 << 4,
+	IOMMU_VTD_PGTBL_CD = 1 << 5,
+	IOMMU_VTD_PGTBL_WPE = 1 << 6,
+	IOMMU_VTD_PGTBL_LAST = 1 << 7,
+};
+
+/**
+ * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
+ *                               stage-1 page table info
+ * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
+ * @pgtbl_addr: The base address of the user-managed stage-1 page table.
+ * @pat: Page attribute table data to compute effective memory type
+ * @emt: Extended memory type
+ * @addr_width: The address width of the untranslated addresses that are
+ *              subjected to the user-managed stage-1 page table.
+ * @__reserved: Must be 0
+ *
+ * The Intel VT-d specific data for creating hw_pagetable to represent
+ * the user-managed stage-1 page table that is used in nested translation.
+ *
+ * In nested translation, the stage-1 page table is located in the address
+ * space that is defined by the corresponding stage-2 page table. Hence the
+ * stage-1 page table base address value should not be higher than the
+ * maximum untranslated address of the stage-2 page table.
+ *
+ * The paging level of the stage-1 page table should be compatible with
+ * the hardware iommu. Otherwise, the allocation fails.
+ */
+struct iommu_hwpt_intel_vtd {
+	__u64 flags;
+	__u64 pgtbl_addr;
+	__u32 pat;
+	__u32 emt;
+	__u32 addr_width;
+	__u32 __reserved;
 };
 
 /**
@@ -389,6 +444,8 @@  enum iommu_hwpt_type {
  * +------------------------------+-------------------------------------+-----------+
  * | IOMMU_HWPT_TYPE_DEFAULT      |               N/A                   |    IOAS   |
  * +------------------------------+-------------------------------------+-----------+
+ * | IOMMU_HWPT_TYPE_VTD_S1       |      struct iommu_hwpt_intel_vtd    |    HWPT   |
+ * +------------------------------+-------------------------------------+-----------+
  */
 struct iommu_hwpt_alloc {
 	__u32 size;
@@ -405,9 +462,32 @@  struct iommu_hwpt_alloc {
 
 /**
  * enum iommu_hw_info_type - IOMMU Hardware Info Types
+ * @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type
  */
 enum iommu_hw_info_type {
 	IOMMU_HW_INFO_TYPE_DEFAULT,
+	IOMMU_HW_INFO_TYPE_INTEL_VTD,
+};
+
+/**
+ * struct iommu_hw_info_vtd - Intel VT-d hardware information
+ *
+ * @flags: Must be set to 0
+ * @__reserved: Must be 0
+ *
+ * @cap_reg: Value of Intel VT-d capability register defined in VT-d spec
+ *           section 11.4.2 Capability Register.
+ * @ecap_reg: Value of Intel VT-d extended capability register defined in VT-d
+ *            spec section 11.4.3 Extended Capability Register.
+ *
+ * User needs to understand the Intel VT-d specification to decode the
+ * register value.
+ */
+struct iommu_hw_info_vtd {
+	__u32 flags;
+	__u32 __reserved;
+	__aligned_u64 cap_reg;
+	__aligned_u64 ecap_reg;
 };
 
 /**
@@ -457,6 +537,60 @@  struct iommu_hw_info {
 };
 #define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO)
 
+/**
+ * enum iommu_vtd_qi_granularity - Intel VT-d specific granularity of
+ *                                 queued invalidation
+ * @IOMMU_VTD_QI_GRAN_DOMAIN: domain-selective invalidation
+ * @IOMMU_VTD_QI_GRAN_ADDR: page-selective invalidation
+ */
+enum iommu_vtd_qi_granularity {
+	IOMMU_VTD_QI_GRAN_DOMAIN,
+	IOMMU_VTD_QI_GRAN_ADDR,
+};
+
+/**
+ * enum iommu_hwpt_intel_vtd_invalidate_flags - Flags for Intel VT-d
+ *                                              stage-1 page table cache
+ *                                              invalidation
+ * @IOMMU_VTD_QI_FLAGS_LEAF: The LEAF flag indicates whether only the
+ *                           leaf PTE caching needs to be invalidated
+ *                           and other paging structure caches can be
+ *                           preserved.
+ */
+enum iommu_hwpt_intel_vtd_invalidate_flags {
+	IOMMU_VTD_QI_FLAGS_LEAF = 1 << 0,
+};
+
+/**
+ * struct iommu_hwpt_invalidate_intel_vtd - Intel VT-d cache invalidation info
+ * @granularity: One of enum iommu_vtd_qi_granularity.
+ * @flags: Combination of enum iommu_hwpt_intel_vtd_invalidate_flags
+ * @__reserved: Must be 0
+ * @addr: The start address of the addresses to be invalidated.
+ * @granule_size: Page/block size of the mapping in bytes. It is used to
+ *                compute the invalidation range together with @nb_granules.
+ * @nb_granules: Number of contiguous granules to be invalidated.
+ *
+ * The Intel VT-d specific invalidation data for user-managed stage-1 cache
+ * invalidation under nested translation. Userspace uses this structure to
+ * tell host about the impacted caches after modifying the stage-1 page table.
+ *
+ * @addr, @granule_size and @nb_granules are meaningful when
+ * @granularity==IOMMU_VTD_QI_GRAN_ADDR. Intel VT-d currently only supports
+ * 4KB page size, so @granule_size should be 4KB. @addr should be aligned
+ * with @granule_size * @nb_granules, otherwise invalidation won't take
+ * effect.
+ */
+struct iommu_hwpt_invalidate_intel_vtd {
+	__u8 granularity;
+	__u8 padding[7];
+	__u32 flags;
+	__u32 __reserved;
+	__u64 addr;
+	__u64 granule_size;
+	__u64 nb_granules;
+};
+
 /**
  * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
  * @size: sizeof(struct iommu_hwpt_invalidate)
@@ -473,6 +607,8 @@  struct iommu_hw_info {
  * +==============================+========================================+
  * | @data_type                   |     Data structure in @data_uptr       |
  * +------------------------------+----------------------------------------+
+ * | IOMMU_HWPT_TYPE_VTD_S1       | struct iommu_hwpt_invalidate_intel_vtd |
+ * +------------------------------+----------------------------------------+
  */
 struct iommu_hwpt_invalidate {
 	__u32 size;