Message ID | 20230309080910.607396-2-yi.l.liu@intel.com |
---|---|
State | New |
Headers |
From: Yi Liu <yi.l.liu@intel.com> To: joro@8bytes.org, alex.williamson@redhat.com, jgg@nvidia.com, kevin.tian@intel.com, robin.murphy@arm.com, baolu.lu@linux.intel.com Cc: cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, shameerali.kolothum.thodi@huawei.com, lulu@redhat.com, suravee.suthikulpanit@amd.com, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH 01/12] iommu: Add new iommu op to create domains owned by userspace Date: Thu, 9 Mar 2023 00:08:59 -0800 Message-Id: <20230309080910.607396-2-yi.l.liu@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230309080910.607396-1-yi.l.liu@intel.com> References: <20230309080910.607396-1-yi.l.liu@intel.com> |
Series |
iommufd: Add nesting infrastructure
|
|
Commit Message
Yi Liu
March 9, 2023, 8:08 a.m. UTC
From: Lu Baolu <baolu.lu@linux.intel.com>

Introduce a new iommu_domain op to create domains owned by userspace,
e.g. through iommufd. These domains have a few different properties
compared to kernel owned domains:

 - They may be UNMANAGED domains, but created with special parameters.
   For instance aperture size changes/number of levels, different
   IOPTE formats, or other things necessary to make a vIOMMU work

 - We have to track all the memory allocations with GFP_KERNEL_ACCOUNT
   to make the cgroup sandbox stronger

 - Device-specialty domains, such as NESTED domains can be created by
   iommufd.

The new op clearly says the domain is being created by IOMMUFD, that the
domain is intended for userspace use, and it provides a way to pass a
driver specific uAPI structure to customize the created domain to
exactly what the vIOMMU userspace driver requires.

iommu drivers that cannot support VFIO/IOMMUFD should not support this
op. This includes any driver that cannot provide a fully functional
UNMANAGED domain.

This op chooses to make the special parameters opaque to the core. This
suits the current usage model where accessing any of the IOMMU device
special parameters does require a userspace driver that matches the
kernel driver. If a need for common parameters, implemented similarly by
several drivers, arises then there is room in the design to grow a
generic parameter set as well.

This new op for now is only supposed to be used by iommufd, hence no
wrapper for it. iommufd would call the callback directly. As for domain
free, iommufd would use iommu_domain_free().

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/linux/iommu.h | 4 ++++
 1 file changed, 4 insertions(+)
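To illustrate the calling convention the commit message describes (no core wrapper, the caller invokes the op directly and treats user_data as opaque), here is a minimal userspace mock. It is only a sketch, not kernel code: the struct definitions are cut-down stand-ins, and mock_domain_alloc_user() and hwpt_alloc_user() are hypothetical names that do not appear in the series. It also assumes the op returns NULL on failure, as at this revision of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel types; only what the sketch needs. */
struct device { const char *name; };
struct iommu_domain { struct iommu_domain *parent; int user_owned; };

struct iommu_ops {
	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
						  struct iommu_domain *parent,
						  const void *user_data);
};

/* Hypothetical driver callback: user_data is driver specific and opaque
 * to the caller, so this mock simply ignores it. */
struct iommu_domain *mock_domain_alloc_user(struct device *dev,
					    struct iommu_domain *parent,
					    const void *user_data)
{
	struct iommu_domain *domain;

	(void)dev;
	(void)user_data;
	domain = calloc(1, sizeof(*domain));
	if (!domain)
		return NULL;
	domain->parent = parent;	/* non-NULL would mean a nested domain */
	domain->user_owned = 1;
	return domain;
}

/* Hypothetical iommufd-side caller: there is no core wrapper, so it
 * checks for the op and calls it directly. */
int hwpt_alloc_user(const struct iommu_ops *ops, struct device *dev,
		    struct iommu_domain *parent, const void *user_data,
		    struct iommu_domain **out)
{
	assert(ops && dev && out);
	if (!ops->domain_alloc_user)
		return -EOPNOTSUPP;	/* driver opted out of this op */
	*out = ops->domain_alloc_user(dev, parent, user_data);
	return *out ? 0 : -ENOMEM;	/* NULL is the only failure signal */
}
```

A driver that cannot provide a fully functional UNMANAGED domain would leave domain_alloc_user unset, and in this sketch the caller reports that as -EOPNOTSUPP.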
Comments
On Thu, Mar 09, 2023 at 12:08:59AM -0800, Yi Liu wrote:

> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 3ef84ee359d2..a269bc62a31c 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
>   *     after use. Return the data buffer if success, or ERR_PTR on
>   *     failure.
>   * @domain_alloc: allocate iommu domain
> + * @domain_alloc_user: allocate user iommu domain
>   * @probe_device: Add device to iommu driver handling
>   * @release_device: Remove device from iommu driver handling
>   * @probe_finalize: Do final setup work after the device is added to an IOMMU
> @@ -266,6 +267,9 @@ struct iommu_ops {
>
>  	/* Domain allocation and freeing by the iommu driver */
>  	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
> +	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
> +						  struct iommu_domain *parent,
> +						  const void *user_data);

Since the kernel does the copy from user and manages the zero fill
compat maybe this user_data have a union like Robin suggested.

But yes, this is the idea.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, March 10, 2023 8:56 AM
>
> On Thu, Mar 09, 2023 at 12:08:59AM -0800, Yi Liu wrote:
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 3ef84ee359d2..a269bc62a31c 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
> >   *     after use. Return the data buffer if success, or ERR_PTR on
> >   *     failure.
> >   * @domain_alloc: allocate iommu domain
> > + * @domain_alloc_user: allocate user iommu domain
> >   * @probe_device: Add device to iommu driver handling
> >   * @release_device: Remove device from iommu driver handling
> >   * @probe_finalize: Do final setup work after the device is added to an IOMMU
> > @@ -266,6 +267,9 @@ struct iommu_ops {
> >
> >  	/* Domain allocation and freeing by the iommu driver */
> >  	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
> > +	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
> > +						  struct iommu_domain *parent,
> > +						  const void *user_data);
>
> Since the kernel does the copy from user and manages the zero fill
> compat maybe this user_data have a union like Robin suggested.
>
> But yes, this is the idea.

OK, so it's a union like the one below. In this patch, maybe only an
empty union can be added, since struct iommu_hwpt_intel_vtd and
struct iommu_hwpt_arm_smmuv3 would be added by the vendor nesting
patch series.

	union iommu_hwpt_alloc_user_data {
		struct iommu_hwpt_intel_vtd vtd;
		struct iommu_hwpt_arm_smmuv3 smmuv3;
	};

Regards,
Yi Liu
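As a rough illustration of the "copy from user and zero fill" idea discussed above, the following userspace sketch fills such a union from a caller-supplied buffer that may be shorter (older userspace) or longer (newer userspace) than the kernel's view. The member layouts and the helper fill_user_data() are hypothetical placeholders, since the real structures come from the vendor nesting series, and memcpy() stands in for the actual user copy. In the kernel, copy_struct_from_user() provides comparable semantics.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts; the real ones are defined by the vendor series. */
struct iommu_hwpt_intel_vtd { uint64_t flags; uint64_t pgtbl_addr; };
struct iommu_hwpt_arm_smmuv3 { uint64_t flags; uint64_t ste[2]; };

union iommu_hwpt_alloc_user_data {
	struct iommu_hwpt_intel_vtd vtd;
	struct iommu_hwpt_arm_smmuv3 smmuv3;
};

/* Hypothetical helper: copy src_len bytes into the union, zero filling
 * whatever the caller did not supply. A longer buffer is accepted only
 * if the bytes beyond the union are all zero. */
int fill_user_data(union iommu_hwpt_alloc_user_data *dst,
		   const void *src, size_t src_len)
{
	const unsigned char *bytes = src;
	size_t copy_len = src_len < sizeof(*dst) ? src_len : sizeof(*dst);
	size_t i;

	assert(dst && (src || src_len == 0));
	for (i = sizeof(*dst); i < src_len; i++)
		if (bytes[i])
			return -E2BIG;	/* unknown trailing fields are set */

	memset(dst, 0, sizeof(*dst));	/* zero fill for short buffers */
	if (copy_len)
		memcpy(dst, src, copy_len);
	return 0;
}
```

With this shape the driver callback can still take a const pointer, because the zero-filled copy is prepared before the op is called.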
Hi Jason,

On Thu, Mar 09, 2023 at 08:56:06PM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 12:08:59AM -0800, Yi Liu wrote:
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 3ef84ee359d2..a269bc62a31c 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
> >   *     after use. Return the data buffer if success, or ERR_PTR on
> >   *     failure.
> >   * @domain_alloc: allocate iommu domain
> > + * @domain_alloc_user: allocate user iommu domain
> >   * @probe_device: Add device to iommu driver handling
> >   * @release_device: Remove device from iommu driver handling
> >   * @probe_finalize: Do final setup work after the device is added to an IOMMU
> > @@ -266,6 +267,9 @@ struct iommu_ops {
> >
> >  	/* Domain allocation and freeing by the iommu driver */
> >  	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
> > +	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
> > +						  struct iommu_domain *parent,
> > +						  const void *user_data);
>
> Since the kernel does the copy from user and manages the zero fill
> compat maybe this user_data have a union like Robin suggested.
>
> But yes, this is the idea.
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

We pass in a read-only data to this ->domain_alloc_user() while
it also returns NULL on failure, matching ->domain_alloc(). So,
there seems to be no error feedback pathway from the driver to
user space.

Robin remarked in the SMMU series that an STE configuration can
fail. So, a proper error feedback is required for this callback
too.

To return a driver/HW specific error, I think we could define a
"u8 out_error" in the user_data structure. So, we probably need
a non-const pass-in here. What do you think?

Thanks
Nic
On Wed, Apr 12, 2023 at 05:44:04PM -0700, Nicolin Chen wrote:
> Hi Jason,
>
> On Thu, Mar 09, 2023 at 08:56:06PM -0400, Jason Gunthorpe wrote:
> > On Thu, Mar 09, 2023 at 12:08:59AM -0800, Yi Liu wrote:
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > index 3ef84ee359d2..a269bc62a31c 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
> > >   *     after use. Return the data buffer if success, or ERR_PTR on
> > >   *     failure.
> > >   * @domain_alloc: allocate iommu domain
> > > + * @domain_alloc_user: allocate user iommu domain
> > >   * @probe_device: Add device to iommu driver handling
> > >   * @release_device: Remove device from iommu driver handling
> > >   * @probe_finalize: Do final setup work after the device is added to an IOMMU
> > > @@ -266,6 +267,9 @@ struct iommu_ops {
> > >
> > >  	/* Domain allocation and freeing by the iommu driver */
> > >  	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
> > > +	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
> > > +						  struct iommu_domain *parent,
> > > +						  const void *user_data);
> >
> > Since the kernel does the copy from user and manages the zero fill
> > compat maybe this user_data have a union like Robin suggested.
> >
> > But yes, this is the idea.
> >
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>
> We pass in a read-only data to this ->domain_alloc_user() while
> it also returns NULL on failure, matching ->domain_alloc(). So,
> there seems to be no error feedback pathway from the driver to
> user space.
>
> Robin remarked in the SMMU series that an STE configuration can
> fail. So, a proper error feedback is required for this callback
> too.
>
> To return a driver/HW specific error, I think we could define a
> "u8 out_error" in the user_data structure. So, we probably need
> a non-const pass-in here. What do you think?

What is wrong with err_ptr?

Jason
On Thu, Apr 13, 2023 at 08:37:14AM -0300, Jason Gunthorpe wrote:
> > > > @@ -266,6 +267,9 @@ struct iommu_ops {
> > > >
> > > >  	/* Domain allocation and freeing by the iommu driver */
> > > >  	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
> > > > +	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
> > > > +						  struct iommu_domain *parent,
> > > > +						  const void *user_data);
> > >
> > > Since the kernel does the copy from user and manages the zero fill
> > > compat maybe this user_data have a union like Robin suggested.
> > >
> > > But yes, this is the idea.
> > >
> > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> >
> > We pass in a read-only data to this ->domain_alloc_user() while
> > it also returns NULL on failure, matching ->domain_alloc(). So,
> > there seems to be no error feedback pathway from the driver to
> > user space.
> >
> > Robin remarked in the SMMU series that an STE configuration can
> > fail. So, a proper error feedback is required for this callback
> > too.
> >
> > To return a driver/HW specific error, I think we could define a
> > "u8 out_error" in the user_data structure. So, we probably need
> > a non-const pass-in here. What do you think?
>
> What is wrong with err_ptr?

I see. That could keep the "const" then. Will try that.

Thanks!
Nic
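The ERR_PTR approach suggested above can be sketched in userspace as follows. The kernel provides ERR_PTR(), PTR_ERR() and IS_ERR() in include/linux/err.h; the versions here are simplified re-creations so the example builds outside the kernel. struct mock_user_data, errptr_domain_alloc_user() and alloc_user_domain() are hypothetical names, and the specific error conditions are assumptions chosen for illustration. The point is that the driver can report distinct errno values through the returned pointer while user_data stays const.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified userspace re-creations of the kernel's err.h helpers. */
void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct device { int supports_nesting; };
struct iommu_domain { struct iommu_domain *parent; };
struct mock_user_data { uint32_t config; };	/* hypothetical uAPI data */

/* Hypothetical driver op: each failure gets its own errno, and the
 * user data is only read, so the pointer can remain const. */
struct iommu_domain *errptr_domain_alloc_user(struct device *dev,
					      struct iommu_domain *parent,
					      const void *user_data)
{
	const struct mock_user_data *cfg = user_data;
	struct iommu_domain *domain;

	if (!dev->supports_nesting)
		return ERR_PTR(-EOPNOTSUPP);
	if (!cfg || cfg->config == 0)
		return ERR_PTR(-EINVAL);	/* e.g. a rejected configuration */
	domain = calloc(1, sizeof(*domain));
	if (!domain)
		return ERR_PTR(-ENOMEM);
	domain->parent = parent;
	return domain;
}

/* Hypothetical caller: unwrap the pointer into an errno for userspace. */
int alloc_user_domain(struct device *dev, struct iommu_domain *parent,
		      const void *user_data, struct iommu_domain **out)
{
	struct iommu_domain *domain;

	assert(dev && out);
	domain = errptr_domain_alloc_user(dev, parent, user_data);
	if (IS_ERR(domain))
		return (int)PTR_ERR(domain);
	*out = domain;
	return 0;
}
```

Compared with an "out_error" field in the user data, this keeps the error path in the return value, so no writable pass-in is needed.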
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 3ef84ee359d2..a269bc62a31c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
  *     after use. Return the data buffer if success, or ERR_PTR on
  *     failure.
  * @domain_alloc: allocate iommu domain
+ * @domain_alloc_user: allocate user iommu domain
  * @probe_device: Add device to iommu driver handling
  * @release_device: Remove device from iommu driver handling
  * @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -266,6 +267,9 @@ struct iommu_ops {

 	/* Domain allocation and freeing by the iommu driver */
 	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
+	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
+						  struct iommu_domain *parent,
+						  const void *user_data);
 	struct iommu_device *(*probe_device)(struct device *dev);
 	void (*release_device)(struct device *dev);