[RFC,v2,0/3] Add set_dev_data and unset_dev_data support

Message ID cover.1681976394.git.nicolinc@nvidia.com
Headers
Series Add set_dev_data and unset_dev_data support |

Message

Nicolin Chen April 20, 2023, 7:47 a.m. UTC
  This is a pair of new uAPI/ops for user space to set an iommu specific
device data for a passthrough device. This is primarily used by SMMUv3
driver for now, to link the vSID and the pSID of a device that's behind
the SMMU. The link (lookup table) will be used to verify any ATC_INV
command from the user space for that device, and then replace the SID
field (virtual SID) with the corresponding physical SID.

This series is available on Github:
https://github.com/nicolinc/iommufd/commits/set_dev_data-rfc-v2

Thanks!
Nicolin

Nicolin Chen (3):
  iommu: Add set/unset_dev_data_user ops
  iommufd: Add iommufd_device_set_data and iommufd_device_unset_data
    APIs
  vfio: Add dev_data_len/uptr in struct vfio_device_bind_iommufd

 drivers/iommu/iommufd/device.c | 65 ++++++++++++++++++++++++++++++++++
 drivers/vfio/device_cdev.c     | 19 ++++++++--
 drivers/vfio/iommufd.c         | 13 +++++++
 include/linux/iommu.h          |  6 ++++
 include/linux/iommufd.h        |  4 +++
 include/linux/vfio.h           |  2 ++
 include/uapi/linux/vfio.h      | 13 +++++++
 7 files changed, 120 insertions(+), 2 deletions(-)
  

Comments

Tian, Kevin April 21, 2023, 7:35 a.m. UTC | #1
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, April 20, 2023 3:48 PM
> 
> This is a pair of new uAPI/ops for user space to set an iommu specific
> device data for a passthrough device. This is primarily used by SMMUv3
> driver for now, to link the vSID and the pSID of a device that's behind
> the SMMU. The link (lookup table) will be used to verify any ATC_INV
> command from the user space for that device, and then replace the SID
> field (virtual SID) with the corresponding physical SID.
> 
> This series is available on Github:
> https://github.com/nicolinc/iommufd/commits/set_dev_data-rfc-v2
> 
> Thanks!
> Nicolin
> 

there is no changelog compared to v1.

Could you add some words why changing from passing the information
in an iommufd ioctl to bind_iommufd? My gut-feeling leans toward
the latter option...
  
Nicolin Chen April 21, 2023, 7:41 a.m. UTC | #2
On Fri, Apr 21, 2023 at 07:35:52AM +0000, Tian, Kevin wrote:
> External email: Use caution opening links or attachments
> 
> 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Thursday, April 20, 2023 3:48 PM
> >
> > This is a pair of new uAPI/ops for user space to set an iommu specific
> > device data for a passthrough device. This is primarily used by SMMUv3
> > driver for now, to link the vSID and the pSID of a device that's behind
> > the SMMU. The link (lookup table) will be used to verify any ATC_INV
> > command from the user space for that device, and then replace the SID
> > field (virtual SID) with the corresponding physical SID.
> >
> > This series is available on Github:
> > https://github.com/nicolinc/iommufd/commits/set_dev_data-rfc-v2
> >
> > Thanks!
> > Nicolin
> >
> 
> there is no changelog compared to v1.

Weird! How could it be missed during copy-n-paste..
I recalled that I had it but seemingly lost it after an update.

It is in the commit message of the cover-letter though:
https://github.com/nicolinc/iommufd/commit/5e17d270bfca2a5e3e7401d4bf58ae53eb7a8a55
--------------------------------------------------------
Changelog
v2:
 * Integrated the uAPI into VFIO_DEVICE_BIND_IOMMUFD call
 * Renamed the previous set_rid_user to set_dev_data, to decouple from
   the PCI regime.
v1:
 https://lore.kernel.org/all/cover.1680762112.git.nicolinc@nvidia.com/
--------------------------------------------------------

> Could you add some words why changing from passing the information
> in an iommufd ioctl to bind_iommufd? My gut-feeling leans toward
> the latter option...

Yea. Jason told me to decouple it from PCI. And merge it into
a general uAPI. So I picked the BIND ioctl.

Thanks
Nic
  
Tian, Kevin April 21, 2023, 7:47 a.m. UTC | #3
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, April 21, 2023 3:42 PM
> 
> On Fri, Apr 21, 2023 at 07:35:52AM +0000, Tian, Kevin wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Thursday, April 20, 2023 3:48 PM
> > >
> > > This is a pair of new uAPI/ops for user space to set an iommu specific
> > > device data for a passthrough device. This is primarily used by SMMUv3
> > > driver for now, to link the vSID and the pSID of a device that's behind
> > > the SMMU. The link (lookup table) will be used to verify any ATC_INV
> > > command from the user space for that device, and then replace the SID
> > > field (virtual SID) with the corresponding physical SID.
> > >
> > > This series is available on Github:
> > > https://github.com/nicolinc/iommufd/commits/set_dev_data-rfc-v2
> > >
> > > Thanks!
> > > Nicolin
> > >
> >
> > there is no changelog compared to v1.
> 
> Weird! How could it be missed during copy-n-paste..
> I recalled that I had it but seemingly lost it after an update.
> 
> It is in the commit message of the cover-letter though:
> https://github.com/nicolinc/iommufd/commit/5e17d270bfca2a5e3e7401d4b
> f58ae53eb7a8a55
> --------------------------------------------------------
> Changelog
> v2:
>  * Integrated the uAPI into VFIO_DEVICE_BIND_IOMMUFD call
>  * Renamed the previous set_rid_user to set_dev_data, to decouple from
>    the PCI regime.
> v1:
>  https://lore.kernel.org/all/cover.1680762112.git.nicolinc@nvidia.com/
> --------------------------------------------------------
> 
> > Could you add some words why changing from passing the information
> > in an iommufd ioctl to bind_iommufd? My gut-feeling leans toward
> > the latter option...
> 
> Yea. Jason told me to decouple it from PCI. And merge it into
> a general uAPI. So I picked the BIND ioctl.
> 

'decouple it from PCI' is kind of covered by renaming set_rid
to set_data. but I didn't get why this has to be merged with another
uAPI. Once iommufd_device is created we could have separate
ioctls to poke its attributes individually. What'd be broken if this
is not done at BIND time?
  
Nicolin Chen April 21, 2023, 7:56 a.m. UTC | #4
On Fri, Apr 21, 2023 at 07:47:13AM +0000, Tian, Kevin wrote:

> > It is in the commit message of the cover-letter though:
> > https://github.com/nicolinc/iommufd/commit/5e17d270bfca2a5e3e7401d4b
> > f58ae53eb7a8a55
> > --------------------------------------------------------
> > Changelog
> > v2:
> >  * Integrated the uAPI into VFIO_DEVICE_BIND_IOMMUFD call
> >  * Renamed the previous set_rid_user to set_dev_data, to decouple from
> >    the PCI regime.
> > v1:
> >  https://lore.kernel.org/all/cover.1680762112.git.nicolinc@nvidia.com/
> > --------------------------------------------------------
> >
> > > Could you add some words why changing from passing the information
> > > in an iommufd ioctl to bind_iommufd? My gut-feeling leans toward
> > > the latter option...
> >
> > Yea. Jason told me to decouple it from PCI. And merge it into
> > a general uAPI. So I picked the BIND ioctl.
> >
> 
> 'decouple it from PCI' is kind of covered by renaming set_rid
> to set_data. but I didn't get why this has to be merged with another
> uAPI. Once iommufd_device is created we could have separate
> ioctls to poke its attributes individually. What'd be broken if this
> is not done at BIND time?

Oh, sorry. He didn't literally told me to merge, but commented
"make sense" at my proposal of reusing BIND. So, I don't think
adding to the BIND is a must here.

The BIND is done in vfio_realize() where the RID (dev_data) is
available also. And the new uAPI in my v1 actually gets called
near the BIND. So, I feel we may just do it once? I am open to
a better idea.

Thanks
Nic
  
Tian, Kevin April 21, 2023, 8:07 a.m. UTC | #5
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, April 21, 2023 3:56 PM
> 
> On Fri, Apr 21, 2023 at 07:47:13AM +0000, Tian, Kevin wrote:
> 
> > > It is in the commit message of the cover-letter though:
> > >
> https://github.com/nicolinc/iommufd/commit/5e17d270bfca2a5e3e7401d4b
> > > f58ae53eb7a8a55
> > > --------------------------------------------------------
> > > Changelog
> > > v2:
> > >  * Integrated the uAPI into VFIO_DEVICE_BIND_IOMMUFD call
> > >  * Renamed the previous set_rid_user to set_dev_data, to decouple from
> > >    the PCI regime.
> > > v1:
> > >  https://lore.kernel.org/all/cover.1680762112.git.nicolinc@nvidia.com/
> > > --------------------------------------------------------
> > >
> > > > Could you add some words why changing from passing the information
> > > > in an iommufd ioctl to bind_iommufd? My gut-feeling leans toward
> > > > the latter option...
> > >
> > > Yea. Jason told me to decouple it from PCI. And merge it into
> > > a general uAPI. So I picked the BIND ioctl.
> > >
> >
> > 'decouple it from PCI' is kind of covered by renaming set_rid
> > to set_data. but I didn't get why this has to be merged with another
> > uAPI. Once iommufd_device is created we could have separate
> > ioctls to poke its attributes individually. What'd be broken if this
> > is not done at BIND time?
> 
> Oh, sorry. He didn't literally told me to merge, but commented
> "make sense" at my proposal of reusing BIND. So, I don't think
> adding to the BIND is a must here.
> 
> The BIND is done in vfio_realize() where the RID (dev_data) is
> available also. And the new uAPI in my v1 actually gets called
> near the BIND. So, I feel we may just do it once? I am open to
> a better idea.
> 

IMHO if this can be done within iommufd then that should be
the choice. vfio doesn't need to know this data at all and doing
so means vdpa or a 3rd driver also needs to implement similar 
logic in their uAPI...
  
Nicolin Chen April 21, 2023, 8:20 a.m. UTC | #6
On Fri, Apr 21, 2023 at 08:07:19AM +0000, Tian, Kevin wrote:
> External email: Use caution opening links or attachments
> 
> 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Friday, April 21, 2023 3:56 PM
> >
> > On Fri, Apr 21, 2023 at 07:47:13AM +0000, Tian, Kevin wrote:
> >
> > > > It is in the commit message of the cover-letter though:
> > > >
> > https://github.com/nicolinc/iommufd/commit/5e17d270bfca2a5e3e7401d4b
> > > > f58ae53eb7a8a55
> > > > --------------------------------------------------------
> > > > Changelog
> > > > v2:
> > > >  * Integrated the uAPI into VFIO_DEVICE_BIND_IOMMUFD call
> > > >  * Renamed the previous set_rid_user to set_dev_data, to decouple from
> > > >    the PCI regime.
> > > > v1:
> > > >  https://lore.kernel.org/all/cover.1680762112.git.nicolinc@nvidia.com/
> > > > --------------------------------------------------------
> > > >
> > > > > Could you add some words why changing from passing the information
> > > > > in an iommufd ioctl to bind_iommufd? My gut-feeling leans toward
> > > > > the latter option...
> > > >
> > > > Yea. Jason told me to decouple it from PCI. And merge it into
> > > > a general uAPI. So I picked the BIND ioctl.
> > > >
> > >
> > > 'decouple it from PCI' is kind of covered by renaming set_rid
> > > to set_data. but I didn't get why this has to be merged with another
> > > uAPI. Once iommufd_device is created we could have separate
> > > ioctls to poke its attributes individually. What'd be broken if this
> > > is not done at BIND time?
> >
> > Oh, sorry. He didn't literally told me to merge, but commented
> > "make sense" at my proposal of reusing BIND. So, I don't think
> > adding to the BIND is a must here.
> >
> > The BIND is done in vfio_realize() where the RID (dev_data) is
> > available also. And the new uAPI in my v1 actually gets called
> > near the BIND. So, I feel we may just do it once? I am open to
> > a better idea.
> >
> 
> IMHO if this can be done within iommufd then that should be
> the choice. vfio doesn't need to know this data at all and doing
> so means vdpa or a 3rd driver also needs to implement similar
> logic in their uAPI...

Reusing the VFIO ioctl is because the device is a VFIO device.
But doing it within iommufd could save us a lot of efforts, as
you said.

So...

+/**
+ * struct iommufd_device_set_data - ioctl(IOMMU_DEVICE_SET_DATA)
+ * @size: sizeof(struct iommufd_device_set_data)
+ * @dev_id: The device to set a device data
+ * @data_uptr: User pointer of the device user data.
+ * @data_len: Length of the device user data.
+ */
+struct iommufd_device_set_data {
+	__u32 size;
+	__u32 dev_id;
+	__aligned_u64   data_uptr;
+	__u32 data_len;
+};
+#define IOMMU_DEVICE_SET_DATA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_SET_DATA)
+
+/**
+ * struct iommufd_device_unset_data - ioctl(IOMMU_DEVICE_UNSET_DATA)
+ * @size: sizeof(struct iommufd_device_unset_data)
+ * @dev_id: The device to unset its device data
+ */
+struct iommufd_device_unset_data {
+	__u32 size;
+	__u32 dev_id;
+};
+#define IOMMU_DEVICE_UNSET_DATA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_UNSET_DATA)

Maybe just like this?

Thanks
Nic
  
Jason Gunthorpe April 21, 2023, 1:09 p.m. UTC | #7
On Fri, Apr 21, 2023 at 01:20:13AM -0700, Nicolin Chen wrote:

> +/**
> + * struct iommufd_device_set_data - ioctl(IOMMU_DEVICE_SET_DATA)
> + * @size: sizeof(struct iommufd_device_set_data)
> + * @dev_id: The device to set a device data
> + * @data_uptr: User pointer of the device user data.
> + * @data_len: Length of the device user data.
> + */
> +struct iommufd_device_set_data {
> +	__u32 size;
> +	__u32 dev_id;
> +	__aligned_u64   data_uptr;
> +	__u32 data_len;
> +};
> +#define IOMMU_DEVICE_SET_DATA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_SET_DATA)
> +
> +/**
> + * struct iommufd_device_unset_data - ioctl(IOMMU_DEVICE_UNSET_DATA)
> + * @size: sizeof(struct iommufd_device_unset_data)
> + * @dev_id: The device to unset its device data
> + */
> +struct iommufd_device_unset_data {
> +	__u32 size;
> +	__u32 dev_id;
> +};
> +#define IOMMU_DEVICE_UNSET_DATA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_UNSET_DATA)
> 
> Maybe just like this?

How would the iommu_ops backing this work?

Jason
  
Nicolin Chen April 21, 2023, 5:37 p.m. UTC | #8
On Fri, Apr 21, 2023 at 10:09:35AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 21, 2023 at 01:20:13AM -0700, Nicolin Chen wrote:
> 
> > +/**
> > + * struct iommufd_device_set_data - ioctl(IOMMU_DEVICE_SET_DATA)
> > + * @size: sizeof(struct iommufd_device_set_data)
> > + * @dev_id: The device to set a device data
> > + * @data_uptr: User pointer of the device user data.
> > + * @data_len: Length of the device user data.
> > + */
> > +struct iommufd_device_set_data {
> > +	__u32 size;
> > +	__u32 dev_id;
> > +	__aligned_u64   data_uptr;
> > +	__u32 data_len;
> > +};
> > +#define IOMMU_DEVICE_SET_DATA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_SET_DATA)
> > +
> > +/**
> > + * struct iommufd_device_unset_data - ioctl(IOMMU_DEVICE_UNSET_DATA)
> > + * @size: sizeof(struct iommufd_device_unset_data)
> > + * @dev_id: The device to unset its device data
> > + */
> > +struct iommufd_device_unset_data {
> > +	__u32 size;
> > +	__u32 dev_id;
> > +};
> > +#define IOMMU_DEVICE_UNSET_DATA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_UNSET_DATA)
> > 
> > Maybe just like this?
> 
> How would the iommu_ops backing this work?

How about the following piece? Needs a test with QEMU though..

static const size_t iommufd_device_data_size[] = {
	[IOMMU_HW_INFO_TYPE_NONE] = 0,
	[IOMMU_HW_INFO_TYPE_INTEL_VTD] = 0,
	[IOMMU_HW_INFO_TYPE_ARM_SMMUV3] =
		sizeof(struct iommu_device_data_arm_smmuv3),
};

int iommufd_device_set_data(struct iommufd_ucmd *ucmd)
{
	struct iommufd_device_set_data *cmd = ucmd->cmd;
	struct iommufd_device *idev;
	const struct iommu_ops *ops;
	void *data = NULL;
	u32 klen = 0;
	int rc;

	if (!cmd->data_uptr || !cmd->data_len)
		return -EINVAL;

	idev = iommufd_get_device(ucmd, cmd->dev_id);
	if (IS_ERR(idev))
		return PTR_ERR(idev);

	ops = dev_iommu_ops(idev->dev);
	if (!ops || !ops->set_dev_data_user || !ops->unset_dev_data_user ||
	    ops->hw_info_type >= ARRAY_SIZE(iommufd_device_data_size)) {
		rc = -EOPNOTSUPP;
		goto out_put_idev;
	}

	klen = iommufd_device_data_size[ops->hw_info_type];
	if (!klen) {
		rc = -EOPNOTSUPP;
		goto out_put_idev;
	}

	data = kzalloc(klen, GFP_KERNEL);
	if (!data) {
		rc = -ENOMEM;
		goto out_put_idev;
	}

	if (copy_struct_from_user(data, klen, u64_to_user_ptr(cmd->data_uptr),
				  cmd->data_len)) {
		rc = -EFAULT;
		goto out_free_data;
	}

	rc = ops->set_dev_data_user(idev->dev, data);
out_free_data:
	kfree(data);
out_put_idev:
	iommufd_put_object(&idev->obj);
	return rc;
}
  
Jason Gunthorpe April 21, 2023, 5:59 p.m. UTC | #9
On Fri, Apr 21, 2023 at 10:37:22AM -0700, Nicolin Chen wrote:

> How about the following piece? Needs a test with QEMU though..
> 
> static const size_t iommufd_device_data_size[] = {
> 	[IOMMU_HW_INFO_TYPE_NONE] = 0,
> 	[IOMMU_HW_INFO_TYPE_INTEL_VTD] = 0,
> 	[IOMMU_HW_INFO_TYPE_ARM_SMMUV3] =
> 		sizeof(struct iommu_device_data_arm_smmuv3),
> };

If we need more than one of these things we'll need a better
solution..

> 	rc = ops->set_dev_data_user(idev->dev, data);

Where will the iommu driver store the vsid to sid xarray from these
arguments?

Jason
  
Nicolin Chen April 21, 2023, 6:19 p.m. UTC | #10
On Fri, Apr 21, 2023 at 02:59:37PM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 21, 2023 at 10:37:22AM -0700, Nicolin Chen wrote:
> 
> > How about the following piece? Needs a test with QEMU though..
> > 
> > static const size_t iommufd_device_data_size[] = {
> > 	[IOMMU_HW_INFO_TYPE_NONE] = 0,
> > 	[IOMMU_HW_INFO_TYPE_INTEL_VTD] = 0,
> > 	[IOMMU_HW_INFO_TYPE_ARM_SMMUV3] =
> > 		sizeof(struct iommu_device_data_arm_smmuv3),
> > };
> 
> If we need more than one of these things we'll need a better
> solution..

How about adding ops->device_data_size to store the value?

And, since we have a few size arrays in hw_pagetable.c too,
perhaps a new structure in ops packing all these sizes can
clean up a bit things too? For example,

static struct iommu_user_data_size arm_smmu_user_data_size = {
	.device_data_size = sizeof(iommu_device_data_arm_smmuv3),
	.hwpt_alloc_data_size = sizeof(iommu_hwpt_alloc_arm_smmuv3),
	.hwpt_invalidate_data_size = sizeof(iommu_hwpt_invalidate_arm_smmuv3),
}

The hwpt_xxx_data_size might be in form of arrays for multi-
HWPT_TYPE support.

> > 	rc = ops->set_dev_data_user(idev->dev, data);
> 
> Where will the iommu driver store the vsid to sid xarray from these
> arguments?

The ARM structure packs a vsid. For example:

static int arm_smmu_set_data(struct device *dev, const void *user_data)
{
	const struct iommufd_device_data_arm_smmuv3 *data = user_data;
	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
	struct arm_smmu_stream *stream = &master->streams[0];
	struct arm_smmu_device *smmu = master->smmu;
	u32 sid_user = data->sid;
	int ret = 0;

	if (!sid_user)
		return -EINVAL;

	ret = xa_alloc(&smmu->streams_user, &sid_user, stream,
			XA_LIMIT(sid_user, sid_user), GFP_KERNEL_ACCOUNT);
	if (ret)
		return ret;
	stream->id_user = sid_user;
	return 0;
}

Thanks
Nic
  
Nicolin Chen April 23, 2023, 7:44 a.m. UTC | #11
On Fri, Apr 21, 2023 at 11:19:23AM -0700, Nicolin Chen wrote:
> On Fri, Apr 21, 2023 at 02:59:37PM -0300, Jason Gunthorpe wrote:
> > On Fri, Apr 21, 2023 at 10:37:22AM -0700, Nicolin Chen wrote:
> > 
> > > How about the following piece? Needs a test with QEMU though..
> > > 
> > > static const size_t iommufd_device_data_size[] = {
> > > 	[IOMMU_HW_INFO_TYPE_NONE] = 0,
> > > 	[IOMMU_HW_INFO_TYPE_INTEL_VTD] = 0,
> > > 	[IOMMU_HW_INFO_TYPE_ARM_SMMUV3] =
> > > 		sizeof(struct iommu_device_data_arm_smmuv3),
> > > };
> > 
> > If we need more than one of these things we'll need a better
> > solution..
> 
> How about adding ops->device_data_size to store the value?

https://lore.kernel.org/linux-iommu/cover.1682234302.git.nicolinc@nvidia.com/

I sent a v3 that includes this replacing the data_size array.
If it looks good, we can drop the other two data_size arrays
for hwpt in the nesting series too.

Thanks
Nic