[3/6] iommufd: Add IOMMU_DEVICE_GET_INFO

Message ID 20230209041642.9346-4-yi.l.liu@intel.com
State New
Headers
Series iommufd: Add iommu capability reporting |

Commit Message

Yi Liu Feb. 9, 2023, 4:16 a.m. UTC
  Under nested IOMMU translation, userspace owns the stage-1 translation
structure (e.g. the stage-1 page table of Intel VT-d or the context table
of ARM SMMUv3, and etc.). Such stage-1 translation structures are vendor
specific, and needs to be compatiable with the underlying IOMMU hardware.
Hence, userspace should know the IOMMU hardware capability before creating
and configuring the stage-1 translation structure to kernel.

This adds ioctl: IOMMU_DEVICE_GET_INFO to query the IOMMU hardware
capability for a given device. The returned data is vendor specific,
userspace can tell it by the @out_device_type field.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          | 72 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 drivers/iommu/iommufd/main.c            |  3 ++
 include/uapi/linux/iommufd.h            | 36 +++++++++++++
 4 files changed, 112 insertions(+)
  

Comments

Tian, Kevin Feb. 10, 2023, 7:55 a.m. UTC | #1
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, February 9, 2023 12:17 PM
> 
> +int iommufd_device_get_info(struct iommufd_ucmd *ucmd)
> +{
> +	struct iommu_device_info *cmd = ucmd->cmd;
> +	struct iommufd_object *dev_obj;
> +	struct device *dev;
> +	const struct iommu_ops *ops;
> +	void *data;
> +	unsigned int length, data_len;
> +	int rc;
> +
> +	if (cmd->flags || cmd->__reserved || !cmd->data_len)
> +		return -EOPNOTSUPP;

do we want !cmd->data_len being a way to check how many bytes are
required in a following call to get the vendor info?

> +
> +	/*
> +	 * Zero the trailing bytes for userspace if the buffer is bigger
> +	 * than the data size kernel actually has.
> +	 */

"Zero the trailing bytes if the user buffer ..."

> +
> +/**
> + * struct iommu_device_info - ioctl(IOMMU_DEVICE_GET_INFO)
> + * @size: sizeof(struct iommu_device_info)
> + * @flags: Must be 0
> + * @dev_id: The device being attached to the IOMMU

iommufd

> + * @data_len: Input the type specific data buffer length in bytes

also is an output field

> + *
> + * Query the hardware iommu capability for given device which has been
> + * bound to iommufd. @data_len is set to be the size of the buffer to
> + * type specific data and the data will be filled. Trailing bytes are

"@data_len is the size of the buffer which captures iommu type specific data"

> + * zeroed if the user buffer is larger than the data kernel has.
> + *
> + * The type specific data would be used to sync capability between the
> + * vIOMMU and the hardware IOMMU, also for the availabillity checking of
> + * iommu hardware features like dirty page tracking in I/O page table.

It's fine to report format information related to stage-1 page table
which userspace manages.

but IMHO this should not be an interface to report which capability is
supported by iommufd. Having hardware supporting dirty bit 
doesn't mean the underlying iommu driver provides necessary support
to iommufd dirty tracking.

We either don't state this way if following what this series does which
simply dumps all raw iommu hw info to userspace, or explicitly require
iommu driver to only report information as required when a feature
is supported by iommufd.

> + *
> + * The @out_device_type will be filled if the ioctl succeeds. It would

s/will be/is/

> + * be used to decode the data filled in the buffer pointed by @data_ptr.

s/would be/is/

> + */
> +struct iommu_device_info {
> +	__u32 size;
> +	__u32 flags;
> +	__u32 dev_id;
> +	__u32 data_len;
> +	__aligned_u64 data_ptr;

moving forward iommu hw cap is not the only information reported
via this interface, e.g. it can be also used to report pasid mode.

from this angle it's better to rename above two fields to be iommu
specific, e.g.:

	__u32 iommu_data_len;
	__aligned_u64 iommu_data_ptr;

> +	__u32 out_device_type;
> +	__u32 __reserved;
> +};
> +#define IOMMU_DEVICE_GET_INFO _IO(IOMMUFD_TYPE,
> IOMMUFD_CMD_DEVICE_GET_INFO)
>  #endif
> --
> 2.34.1
  
Joao Martins Feb. 10, 2023, 11:10 a.m. UTC | #2
On 10/02/2023 07:55, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Thursday, February 9, 2023 12:17 PM
>> + * zeroed if the user buffer is larger than the data kernel has.
>> + *
>> + * The type specific data would be used to sync capability between the
>> + * vIOMMU and the hardware IOMMU, also for the availabillity checking of
>> + * iommu hardware features like dirty page tracking in I/O page table.
> 
> It's fine to report format information related to stage-1 page table
> which userspace manages.
> 
> but IMHO this should not be an interface to report which capability is
> supported by iommufd. Having hardware supporting dirty bit 
> doesn't mean the underlying iommu driver provides necessary support
> to iommufd dirty tracking.
> 

+1

In fact this shouldn't really be a way to check any capability as we are dumping
straight away the IOMMU hardware registers. By dumping raw IOMMU hardware data,
forces the application to understand IOMMU hardware specific formats.  Maybe
that's OK for hw nesting as there's a lot of info you need to know for the
vIOMMU pgtables, pasid and etc so both are tightly coupled. But it is a bit
disconnected from what really the software (IOMMUFD) and driver can use, without
getting onto assumptions.

[Calling it IOMMU_DEVICE_GET_HW_INFO would be bit more obvious if you're not
asking IOMMUFD support]

For capability checking, I think this really should be returning capabilities
that both IOMMU driver supports ... and that IOMMUFD understands and marshalled
on a format of its own defined by IOMMUFD. Maybe using IOMMU_CAP or some other
thing within the kernel (now that's per-device), or even simpler. That's at
least had written initially for the dirty tracking series.
  
Jason Gunthorpe Feb. 10, 2023, 8:58 p.m. UTC | #3
On Fri, Feb 10, 2023 at 11:10:34AM +0000, Joao Martins wrote:
> On 10/02/2023 07:55, Tian, Kevin wrote:
> >> From: Liu, Yi L <yi.l.liu@intel.com>
> >> Sent: Thursday, February 9, 2023 12:17 PM
> >> + * zeroed if the user buffer is larger than the data kernel has.
> >> + *
> >> + * The type specific data would be used to sync capability between the
> >> + * vIOMMU and the hardware IOMMU, also for the availabillity checking of
> >> + * iommu hardware features like dirty page tracking in I/O page table.
> > 
> > It's fine to report format information related to stage-1 page table
> > which userspace manages.
> > 
> > but IMHO this should not be an interface to report which capability is
> > supported by iommufd. Having hardware supporting dirty bit 
> > doesn't mean the underlying iommu driver provides necessary support
> > to iommufd dirty tracking.
> > 
> 
> +1
> 
> In fact this shouldn't really be a way to check any capability as we are dumping
> straight away the IOMMU hardware registers. By dumping raw IOMMU hardware data,
> forces the application to understand IOMMU hardware specific formats.  Maybe
> that's OK for hw nesting as there's a lot of info you need to know for the
> vIOMMU pgtables, pasid and etc so both are tightly coupled. But it is a bit
> disconnected from what really the software (IOMMUFD) and driver can use, without
> getting onto assumptions.
> 
> [Calling it IOMMU_DEVICE_GET_HW_INFO would be bit more obvious if you're not
> asking IOMMUFD support]
> 
> For capability checking, I think this really should be returning capabilities
> that both IOMMU driver supports ... and that IOMMUFD understands and marshalled
> on a format of its own defined by IOMMUFD. Maybe using IOMMU_CAP or some other
> thing within the kernel (now that's per-device), or even simpler. That's at
> least had written initially for the dirty tracking series.

Yes, the design of IOMMU_DEVICE_GET_INFO is already split. The HW
specific structure should only contain things related to operating the
HW specific paths (ie other HW specific structures in other APIs)

The enclosing struct should have general information about operating
the device through the abstract API - like dirty tracking.

But it may be that HW will individually report dirty tracking
capabilties related to special page table types - ie can the S2 page
table do dirty tracking

Jason
  
Jason Gunthorpe Feb. 10, 2023, 8:59 p.m. UTC | #4
On Fri, Feb 10, 2023 at 07:55:42AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, February 9, 2023 12:17 PM
> > 
> > +int iommufd_device_get_info(struct iommufd_ucmd *ucmd)
> > +{
> > +	struct iommu_device_info *cmd = ucmd->cmd;
> > +	struct iommufd_object *dev_obj;
> > +	struct device *dev;
> > +	const struct iommu_ops *ops;
> > +	void *data;
> > +	unsigned int length, data_len;
> > +	int rc;
> > +
> > +	if (cmd->flags || cmd->__reserved || !cmd->data_len)
> > +		return -EOPNOTSUPP;
> 
> do we want !cmd->data_len being a way to check how many bytes are
> required in a following call to get the vendor info?

There is no reason, if the userspace doesn't have a struct large
enough then it also doesn't know what the extra bytes should even be
used for. No reason to read the.

> > +struct iommu_device_info {
> > +	__u32 size;
> > +	__u32 flags;
> > +	__u32 dev_id;
> > +	__u32 data_len;
> > +	__aligned_u64 data_ptr;
> 
> moving forward iommu hw cap is not the only information reported
> via this interface, e.g. it can be also used to report pasid mode.
> 
> from this angle it's better to rename above two fields to be iommu
> specific, e.g.:
> 
> 	__u32 iommu_data_len;
> 	__aligned_u64 iommu_data_ptr;

maybe call it hw info

Jason
  
Tian, Kevin Feb. 13, 2023, 2:04 a.m. UTC | #5
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, February 11, 2023 5:00 AM
> 
> 
> > > +struct iommu_device_info {
> > > +	__u32 size;
> > > +	__u32 flags;
> > > +	__u32 dev_id;
> > > +	__u32 data_len;
> > > +	__aligned_u64 data_ptr;
> >
> > moving forward iommu hw cap is not the only information reported
> > via this interface, e.g. it can be also used to report pasid mode.
> >
> > from this angle it's better to rename above two fields to be iommu
> > specific, e.g.:
> >
> > 	__u32 iommu_data_len;
> > 	__aligned_u64 iommu_data_ptr;
> 
> maybe call it hw info
> 

that is fine given we already have 'struct iommu_device_info'.

probably this cmd should be named as IOMMU_DEVIEC_GET_IOMMU_INFO
to be more accurate.
  

Patch

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 8a9834fc129a..3b64aef24807 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -134,6 +134,78 @@  void iommufd_device_unbind(struct iommufd_device *idev)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
 
+static int iommufd_zero_fill_user(u64 ptr, int bytes)
+{
+	int index = 0;
+
+	for (; index < bytes; index++) {
+		if (put_user(0, (uint8_t __user *)(ptr + index)))
+			return -EFAULT;
+	}
+	return 0;
+}
+
+int iommufd_device_get_info(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_device_info *cmd = ucmd->cmd;
+	struct iommufd_object *dev_obj;
+	struct device *dev;
+	const struct iommu_ops *ops;
+	void *data;
+	unsigned int length, data_len;
+	int rc;
+
+	if (cmd->flags || cmd->__reserved || !cmd->data_len)
+		return -EOPNOTSUPP;
+
+	dev_obj = iommufd_get_object(ucmd->ictx, cmd->dev_id,
+				     IOMMUFD_OBJ_DEVICE);
+	if (IS_ERR(dev_obj))
+		return PTR_ERR(dev_obj);
+
+	dev = container_of(dev_obj, struct iommufd_device, obj)->dev;
+
+	ops = dev_iommu_ops(dev);
+	if (!ops || !ops->hw_info) {
+		rc = -EOPNOTSUPP;
+		goto out_put;
+	}
+
+	data = ops->hw_info(dev, &data_len);
+	if (IS_ERR(data)) {
+		rc = PTR_ERR(data);
+		goto out_put;
+	}
+
+	length = min(cmd->data_len, data_len);
+	if (copy_to_user(u64_to_user_ptr(cmd->data_ptr), data, length)) {
+		rc = -EFAULT;
+		goto out_free_data;
+	}
+
+	/*
+	 * Zero the trailing bytes for userspace if the buffer is bigger
+	 * than the data size kernel actually has.
+	 */
+	if (length < cmd->data_len) {
+		rc = iommufd_zero_fill_user(cmd->data_ptr + length,
+					    cmd->data_len - length);
+		if (rc)
+			goto out_free_data;
+	}
+
+	cmd->out_device_type = ops->driver_type;
+	cmd->data_len = data_len;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+
+out_free_data:
+	kfree(data);
+out_put:
+	iommufd_put_object(dev_obj);
+	return rc;
+}
+
 static int iommufd_device_setup_msi(struct iommufd_device *idev,
 				    struct iommufd_hw_pagetable *hwpt,
 				    phys_addr_t sw_msi_start)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 200c783800ad..4a0a1a7fdae1 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -258,6 +258,7 @@  iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
 
 void iommufd_device_destroy(struct iommufd_object *obj);
+int iommufd_device_get_info(struct iommufd_ucmd *ucmd);
 
 struct iommufd_access {
 	struct iommufd_object obj;
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 3fbe636c3d8a..59aa30ad1090 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -250,6 +250,7 @@  static int iommufd_option(struct iommufd_ucmd *ucmd)
 
 union ucmd_buffer {
 	struct iommu_destroy destroy;
+	struct iommu_device_info info;
 	struct iommu_ioas_alloc alloc;
 	struct iommu_ioas_allow_iovas allow_iovas;
 	struct iommu_ioas_copy ioas_copy;
@@ -281,6 +282,8 @@  struct iommufd_ioctl_op {
 	}
 static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id),
+	IOCTL_OP(IOMMU_DEVICE_GET_INFO, iommufd_device_get_info, struct iommu_device_info,
+		 __reserved),
 	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
 		 struct iommu_ioas_alloc, out_ioas_id),
 	IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index fda75c8450ee..6cfe102f26f3 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -45,6 +45,7 @@  enum {
 	IOMMUFD_CMD_IOAS_UNMAP,
 	IOMMUFD_CMD_OPTION,
 	IOMMUFD_CMD_VFIO_IOAS,
+	IOMMUFD_CMD_DEVICE_GET_INFO,
 };
 
 /**
@@ -371,4 +372,39 @@  struct iommu_device_info_vtd {
 	__aligned_u64 cap_reg;
 	__aligned_u64 ecap_reg;
 };
+
+/**
+ * struct iommu_device_info - ioctl(IOMMU_DEVICE_GET_INFO)
+ * @size: sizeof(struct iommu_device_info)
+ * @flags: Must be 0
+ * @dev_id: The device being attached to the IOMMU
+ * @data_len: Input the type specific data buffer length in bytes
+ * @data_ptr: Pointer to the type specific structure (e.g.
+ *	      struct iommu_device_info_vtd)
+ * @out_device_type: Output the underlying iommu hardware type, it is
+ *		   one of enum iommu_device_data_type.
+ * @__reserved: Must be 0
+ *
+ * Query the hardware iommu capability for given device which has been
+ * bound to iommufd. @data_len is set to be the size of the buffer to
+ * type specific data and the data will be filled. Trailing bytes are
+ * zeroed if the user buffer is larger than the data kernel has.
+ *
+ * The type specific data would be used to sync capability between the
+ * vIOMMU and the hardware IOMMU, also for the availabillity checking of
+ * iommu hardware features like dirty page tracking in I/O page table.
+ *
+ * The @out_device_type will be filled if the ioctl succeeds. It would
+ * be used to decode the data filled in the buffer pointed by @data_ptr.
+ */
+struct iommu_device_info {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 data_len;
+	__aligned_u64 data_ptr;
+	__u32 out_device_type;
+	__u32 __reserved;
+};
+#define IOMMU_DEVICE_GET_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_INFO)
 #endif