Message ID | 20230306163138.587484-9-fenghua.yu@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1953439wrd; Mon, 6 Mar 2023 09:00:38 -0800 (PST) X-Google-Smtp-Source: AK7set8zImig0xIZV1i/Y4snFsrfdOfR9xP6PG25YXpT/UujHmmU/aEJTPXt8qFDB1Bn0KVjK2dU X-Received: by 2002:a17:90b:1c87:b0:234:d78:9b4c with SMTP id oo7-20020a17090b1c8700b002340d789b4cmr11905270pjb.18.1678122038531; Mon, 06 Mar 2023 09:00:38 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1678122038; cv=none; d=google.com; s=arc-20160816; b=ez8ct2fNdxBsHTdt+SmoLlT8T8Ayy+PXm4hQ1xEXr497PnwgixsETW8TdcnaiNErap FrwHMgAnHsmk2BqRziPy61SSkY5DKyaEVWRi+VxRW4bB6gHWv85XqrQR1TZVvXnG42X7 3GKxOMTVv1uEJ0x3LUsto0iiC/41M1LtsybQXngBP50IHVLbFVKMqxm7GWQmwP4CLMNu LdyfoUylCODWssGCxmMu0ifzZ1PAcMy+/7eqLu04ZzZnf/904YBPMivZg17Def2n0Mrb Nnhd8+GTdNe/xXBfg0efaZEFS/9LEVJkToTDsn4GO2C3Tlbb6cE5NQJaerFpBGD84axN Sf+Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=oRrwtq2dONR9JY19CWVJEZAijqDWxMjpAyssm53Ll34=; b=lmKHbRz42mMSD4qoXk9pYWWYTlnRhjyQUHPvNswnjewkh9uyQN8Z5KBO+LKTxaMg8q +gPTnTO3w/anKbCWIzN+v2jQWPSDxTWCZS/kF1hdku4BsxOg2nogkWL3cSxmbaoxdiQQ muwSEnl9QiMLD9yDx7mc/EOu9d2QjOlsyFvXPHWHai7MJvgwEBsX9/TZgfxEgJzRGqJF BS/deLfsA8V/e4CpioKTeAtybkVqD+NdCepDo2NHpqst94mL6Sm8XrmwTDOnm495jLQN 16Vklt+V0SevB6EmJIujaA9I28ODxrA8xXfsNXtEszroPAXPO3E7OQU7rHw1EfrEuDrY sHsw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="i5/XJRA/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c18-20020a17090abf1200b0022dae6bb6c8si12146309pjs.30.2023.03.06.09.00.26; Mon, 06 Mar 2023 09:00:38 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="i5/XJRA/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230444AbjCFQqE (ORCPT <rfc822;toshivichauhan@gmail.com> + 99 others); Mon, 6 Mar 2023 11:46:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34472 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230332AbjCFQoW (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 6 Mar 2023 11:44:22 -0500 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C7BB834F43; Mon, 6 Mar 2023 08:43:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1678121038; x=1709657038; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ktIv+Rh9KBAfkRLGGpvik7AxgJzeQXcUN3sTRinZTIs=; b=i5/XJRA/tL30phmJ4RnJgJoZanPN9SmehbxBYJju/jfwMFe4NZJQ2M4T V/zype3hX/hl2RDaoq8Rq01icD5kOdrNiSbh3daqYBYfMfLslTmtfqSHm 5I97d5fH6c16Dk24gjKxj5BqG7b/Xs1FB2BrMzC5iOpPQq34H7M1CYDjM jEz9sFHlR1d23ys0l77QJdG+CuqTWqp9NqP/9l4jC0cmKjXtIVg/dJKLh VHhq6FM7o84gJhoxMo8ORitvZVXZu7FPCehFWe00X6hwvfS2zVPgTGp3T HVHaRis7MdQ/FdDVH0dmQNNTFnGn4mYt0GDqNVzm45Cu79LshGGqcBCCL w==; X-IronPort-AV: E=McAfee;i="6500,9779,10641"; a="398181162" X-IronPort-AV: E=Sophos;i="5.98,238,1673942400"; d="scan'208";a="398181162" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Mar 2023 08:31:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10641"; a="669504527" X-IronPort-AV: E=Sophos;i="5.98,238,1673942400"; d="scan'208";a="669504527" Received: from fyu1.sc.intel.com ([172.25.103.126]) by orsmga007.jf.intel.com with ESMTP; 06 Mar 2023 08:31:57 -0800 From: Fenghua Yu <fenghua.yu@intel.com> To: "Vinod Koul" <vkoul@kernel.org>, "Dave Jiang" <dave.jiang@intel.com> Cc: dmaengine@vger.kernel.org, "linux-kernel" <linux-kernel@vger.kernel.org>, Fenghua Yu <fenghua.yu@intel.com>, Alistair Popple <apopple@nvidia.com>, Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>, Robin Murphy <robin.murphy@arm.com>, Lorenzo Stoakes <lstoakes@gmail.com>, Christoph Hellwig <hch@infradead.org>, iommu@lists.linux.dev Subject: [PATCH v2 08/16] iommu: define and export iommu_access_remote_vm() Date: Mon, 6 Mar 2023 08:31:30 -0800 Message-Id: <20230306163138.587484-9-fenghua.yu@intel.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20230306163138.587484-1-fenghua.yu@intel.com> References: <20230306163138.587484-1-fenghua.yu@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1759638494874040588?= X-GMAIL-MSGID: =?utf-8?q?1759638494874040588?= |
Series |
Enable DSA 2.0 Event Log and completion record faulting features
|
|
Commit Message
Fenghua Yu
March 6, 2023, 4:31 p.m. UTC
Define and export iommu_access_remote_vm() to allow IOMMU related
drivers to access user address space by PASID.
The IDXD driver would like to use it to write the user's completion
record that the hardware device is not able to write to due to user
page fault.
Without the API, it's complex for IDXD driver to copy completion record
to a process' fault address for two reasons:
1. access_remote_vm() is not exported and shouldn't be exported for
drivers because drivers may easily cause mm reference issue.
2. user frees fault address pages to trigger fault by IDXD device.
The driver has to call iommu_sva_find(), kthread_use_mm(), re-implement
majority of access_remote_vm() etc to access remote vm.
This IOMMU specific API hides these details and provides a clean interface
for idxd driver and potentially other IOMMU related drivers.
Suggested-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: iommu@lists.linux.dev
---
v2:
- Define and export iommu_access_remote_vm() for IDXD driver to write
completion record to user address space. This change removes
patch 8 and 9 in v1 (Alistair Popple)
drivers/iommu/iommu-sva.c | 35 +++++++++++++++++++++++++++++++++++
include/linux/iommu.h | 9 +++++++++
2 files changed, 44 insertions(+)
Comments
On 3/7/23 12:31 AM, Fenghua Yu wrote: > Define and export iommu_access_remote_vm() to allow IOMMU related > drivers to access user address space by PASID. > > The IDXD driver would like to use it to write the user's completion > record that the hardware device is not able to write to due to user > page fault. I don't quite follow here. Isn't I/O page fault already supported? Best regards, baolu
Hi Fenghua, On Mon, Mar 06, 2023 at 08:31:30AM -0800, Fenghua Yu wrote: > Define and export iommu_access_remote_vm() to allow IOMMU related > drivers to access user address space by PASID. > > The IDXD driver would like to use it to write the user's completion > record that the hardware device is not able to write to due to user > page fault. > > Without the API, it's complex for IDXD driver to copy completion record > to a process' fault address for two reasons: > 1. access_remote_vm() is not exported and shouldn't be exported for > drivers because drivers may easily cause mm reference issue. > 2. user frees fault address pages to trigger fault by IDXD device. > > The driver has to call iommu_sva_find(), kthread_use_mm(), re-implement > majority of access_remote_vm() etc to access remote vm. > > This IOMMU specific API hides these details and provides a clean interface > for idxd driver and potentially other IOMMU related drivers. > > Suggested-by: Alistair Popple <apopple@nvidia.com> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> > Cc: Joerg Roedel <joro@8bytes.org> > Cc: Will Deacon <will@kernel.org> > Cc: Robin Murphy <robin.murphy@arm.com> > Cc: Alistair Popple <apopple@nvidia.com> > Cc: Lorenzo Stoakes <lstoakes@gmail.com> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: iommu@lists.linux.dev > --- > v2: > - Define and export iommu_access_remote_vm() for IDXD driver to write > completion record to user address space. This change removes > patch 8 and 9 in v1 (Alistair Popple) > > drivers/iommu/iommu-sva.c | 35 +++++++++++++++++++++++++++++++++++ > include/linux/iommu.h | 9 +++++++++ > 2 files changed, 44 insertions(+) > > diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c > index 24bf9b2b58aa..1d7a0aee58f7 100644 > --- a/drivers/iommu/iommu-sva.c > +++ b/drivers/iommu/iommu-sva.c > @@ -71,6 +71,41 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid) > } > EXPORT_SYMBOL_GPL(iommu_sva_find); > > +/** > + * iommu_access_remote_vm - access another process' address space by PASID > + * @pasid: Process Address Space ID assigned to the mm > + * @addr: start address to access > + * @buf: source or destination buffer > + * @len: number of bytes to transfer > + * @gup_flags: flags modifying lookup behaviour > + * > + * Another process' address space is found by PASID. A reference on @mm > + * is taken and released inside the function. > + * > + * Return: number of bytes copied from source to destination. > + */ > +int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, void *buf, > + int len, unsigned int gup_flags) > +{ > + struct mm_struct *mm; > + int copied; > + > + mm = iommu_sva_find(pasid); The ability to find a mm by PASID is being removed, see https://lore.kernel.org/linux-iommu/20230301235646.2692846-4-jacob.jun.pan@linux.intel.com/ Thanks, Jean > + if (IS_ERR_OR_NULL(mm)) > + return 0; > + > + /* > + * A reference on @mm has been held by mmget_not_zero() > + * during iommu_sva_find(). > + */ > + copied = access_remote_vm(mm, addr, buf, len, gup_flags); > + /* The reference is released. */ > + mmput(mm); > + > + return copied; > +} > +EXPORT_SYMBOL_GPL(iommu_access_remote_vm); > + > /** > * iommu_sva_bind_device() - Bind a process address space to a device > * @dev: the device > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index 6595454d4f48..414a46a53799 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -1177,6 +1177,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, > struct mm_struct *mm); > void iommu_sva_unbind_device(struct iommu_sva *handle); > u32 iommu_sva_get_pasid(struct iommu_sva *handle); > +int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, void *buf, > + int len, unsigned int gup_flags); > #else > static inline struct iommu_sva * > iommu_sva_bind_device(struct device *dev, struct mm_struct *mm) > @@ -1192,6 +1194,13 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle) > { > return IOMMU_PASID_INVALID; > } > + > +static inline int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, > + void *buf, int len, > + unsigned int gup_flags) > +{ > + return 0; > +} > #endif /* CONFIG_IOMMU_SVA */ > > #endif /* __LINUX_IOMMU_H */ > -- > 2.37.1 > >
Hi, Jean, On 3/7/23 00:40, Jean-Philippe Brucker wrote: > Hi Fenghua, > > On Mon, Mar 06, 2023 at 08:31:30AM -0800, Fenghua Yu wrote: >> Define and export iommu_access_remote_vm() to allow IOMMU related >> drivers to access user address space by PASID. >> >> The IDXD driver would like to use it to write the user's completion >> record that the hardware device is not able to write to due to user >> page fault. >> >> Without the API, it's complex for IDXD driver to copy completion record >> to a process' fault address for two reasons: >> 1. access_remote_vm() is not exported and shouldn't be exported for >> drivers because drivers may easily cause mm reference issue. >> 2. user frees fault address pages to trigger fault by IDXD device. >> >> The driver has to call iommu_sva_find(), kthread_use_mm(), re-implement >> majority of access_remote_vm() etc to access remote vm. >> >> This IOMMU specific API hides these details and provides a clean interface >> for idxd driver and potentially other IOMMU related drivers. >> >> Suggested-by: Alistair Popple <apopple@nvidia.com> >> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> >> Cc: Joerg Roedel <joro@8bytes.org> >> Cc: Will Deacon <will@kernel.org> >> Cc: Robin Murphy <robin.murphy@arm.com> >> Cc: Alistair Popple <apopple@nvidia.com> >> Cc: Lorenzo Stoakes <lstoakes@gmail.com> >> Cc: Christoph Hellwig <hch@infradead.org> >> Cc: iommu@lists.linux.dev >> --- >> v2: >> - Define and export iommu_access_remote_vm() for IDXD driver to write >> completion record to user address space. This change removes >> patch 8 and 9 in v1 (Alistair Popple) >> >> drivers/iommu/iommu-sva.c | 35 +++++++++++++++++++++++++++++++++++ >> include/linux/iommu.h | 9 +++++++++ >> 2 files changed, 44 insertions(+) >> >> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c >> index 24bf9b2b58aa..1d7a0aee58f7 100644 >> --- a/drivers/iommu/iommu-sva.c >> +++ b/drivers/iommu/iommu-sva.c >> @@ -71,6 +71,41 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid) >> } >> EXPORT_SYMBOL_GPL(iommu_sva_find); >> >> +/** >> + * iommu_access_remote_vm - access another process' address space by PASID >> + * @pasid: Process Address Space ID assigned to the mm >> + * @addr: start address to access >> + * @buf: source or destination buffer >> + * @len: number of bytes to transfer >> + * @gup_flags: flags modifying lookup behaviour >> + * >> + * Another process' address space is found by PASID. A reference on @mm >> + * is taken and released inside the function. >> + * >> + * Return: number of bytes copied from source to destination. >> + */ >> +int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, void *buf, >> + int len, unsigned int gup_flags) >> +{ >> + struct mm_struct *mm; >> + int copied; >> + >> + mm = iommu_sva_find(pasid); > > The ability to find a mm by PASID is being removed, see > https://lore.kernel.org/linux-iommu/20230301235646.2692846-4-jacob.jun.pan@linux.intel.com/ > Thank you very much for pointing out this. I talked to Jacob just now. He will keep iommu_sva_find() function in his next version because this patch is still using the function. He agrees that I can still call iommu_sva_find() in this patch. -Fenghua
Hi, Baolu, On 3/6/23 17:41, Baolu Lu wrote: > On 3/7/23 12:31 AM, Fenghua Yu wrote: >> Define and export iommu_access_remote_vm() to allow IOMMU related >> drivers to access user address space by PASID. >> >> The IDXD driver would like to use it to write the user's completion >> record that the hardware device is not able to write to due to user >> page fault. > > I don't quite follow here. Isn't I/O page fault already supported? The following patch 9 in this series explains in details why IDXD device cannot use page fault to write to user memory: https://lore.kernel.org/dmaengine/20230306163138.587484-10-fenghua.yu@intel.com/ "DSA supports page fault handling through PRS. However, the DMA engine that's processing the descriptor is blocked until the PRS response is received. Other workqueues sharing the engine are also blocked. Page fault handing by the driver with PRS disabled can be used to mitigate the stalling. With PRS disabled while ATS remain enabled, DSA handles page faults on a completion record by reporting an event in the event log. In this instance, the descriptor is completed and the event log contains the completion record address and the contents of the completion record." That's why IDXD driver needs this IOMMU's helper iommu_access_remote_vm() to copy the completion record from event log buffer to user space. Thanks. -Fenghua
On 3/8/23 1:55 AM, Fenghua Yu wrote: > Hi, Baolu, > > On 3/6/23 17:41, Baolu Lu wrote: >> On 3/7/23 12:31 AM, Fenghua Yu wrote: >>> Define and export iommu_access_remote_vm() to allow IOMMU related >>> drivers to access user address space by PASID. >>> >>> The IDXD driver would like to use it to write the user's completion >>> record that the hardware device is not able to write to due to user >>> page fault. >> >> I don't quite follow here. Isn't I/O page fault already supported? > > The following patch 9 in this series explains in details why IDXD device > cannot use page fault to write to user memory: > https://lore.kernel.org/dmaengine/20230306163138.587484-10-fenghua.yu@intel.com/ > > "DSA supports page fault handling through PRS. However, the DMA engine > that's processing the descriptor is blocked until the PRS response is > received. Other workqueues sharing the engine are also blocked. > Page fault handing by the driver with PRS disabled can be used to > mitigate the stalling. Ah! Get your point now. Thanks for the explanation. > > With PRS disabled while ATS remain enabled, DSA handles page faults on > a completion record by reporting an event in the event log. In this > instance, the descriptor is completed and the event log contains the > completion record address and the contents of the completion record." > > That's why IDXD driver needs this IOMMU's helper > iommu_access_remote_vm() to copy the completion record from event log > buffer to user space. > > Thanks. > > -Fenghua Best regards, baolu
Hi, Jean and Jacob, On 3/7/23 08:33, Fenghua Yu wrote: > Hi, Jean, > > On 3/7/23 00:40, Jean-Philippe Brucker wrote: >> Hi Fenghua, >> >> On Mon, Mar 06, 2023 at 08:31:30AM -0800, Fenghua Yu wrote: >>> Define and export iommu_access_remote_vm() to allow IOMMU related >>> drivers to access user address space by PASID. >>> >>> The IDXD driver would like to use it to write the user's completion >>> record that the hardware device is not able to write to due to user >>> page fault. >>> >>> Without the API, it's complex for IDXD driver to copy completion record >>> to a process' fault address for two reasons: >>> 1. access_remote_vm() is not exported and shouldn't be exported for >>> drivers because drivers may easily cause mm reference issue. >>> 2. user frees fault address pages to trigger fault by IDXD device. >>> >>> The driver has to call iommu_sva_find(), kthread_use_mm(), re-implement >>> majority of access_remote_vm() etc to access remote vm. >>> >>> This IOMMU specific API hides these details and provides a clean >>> interface >>> for idxd driver and potentially other IOMMU related drivers. >>> >>> Suggested-by: Alistair Popple <apopple@nvidia.com> >>> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> >>> Cc: Joerg Roedel <joro@8bytes.org> >>> Cc: Will Deacon <will@kernel.org> >>> Cc: Robin Murphy <robin.murphy@arm.com> >>> Cc: Alistair Popple <apopple@nvidia.com> >>> Cc: Lorenzo Stoakes <lstoakes@gmail.com> >>> Cc: Christoph Hellwig <hch@infradead.org> >>> Cc: iommu@lists.linux.dev >>> --- >>> v2: >>> - Define and export iommu_access_remote_vm() for IDXD driver to write >>> completion record to user address space. This change removes >>> patch 8 and 9 in v1 (Alistair Popple) >>> >>> drivers/iommu/iommu-sva.c | 35 +++++++++++++++++++++++++++++++++++ >>> include/linux/iommu.h | 9 +++++++++ >>> 2 files changed, 44 insertions(+) >>> >>> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c >>> index 24bf9b2b58aa..1d7a0aee58f7 100644 >>> --- a/drivers/iommu/iommu-sva.c >>> +++ b/drivers/iommu/iommu-sva.c >>> @@ -71,6 +71,41 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid) >>> } >>> EXPORT_SYMBOL_GPL(iommu_sva_find); >>> +/** >>> + * iommu_access_remote_vm - access another process' address space by >>> PASID >>> + * @pasid: Process Address Space ID assigned to the mm >>> + * @addr: start address to access >>> + * @buf: source or destination buffer >>> + * @len: number of bytes to transfer >>> + * @gup_flags: flags modifying lookup behaviour >>> + * >>> + * Another process' address space is found by PASID. A reference on @mm >>> + * is taken and released inside the function. >>> + * >>> + * Return: number of bytes copied from source to destination. >>> + */ >>> +int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, void >>> *buf, >>> + int len, unsigned int gup_flags) >>> +{ >>> + struct mm_struct *mm; >>> + int copied; >>> + >>> + mm = iommu_sva_find(pasid); >> >> The ability to find a mm by PASID is being removed, see >> https://lore.kernel.org/linux-iommu/20230301235646.2692846-4-jacob.jun.pan@linux.intel.com/ >> >> > > Thank you very much for pointing out this. > > I talked to Jacob just now. He will keep iommu_sva_find() function > in his next version because this patch is still using the function. He > agrees that I can still call iommu_sva_find() in this patch. Further comment from Jason confirms that iommu_sva_find() will be removed (https://lore.kernel.org/lkml/ZAjSsm4%2FPDRqViwa@nvidia.com/). So cannot call iommu_sva_find() any more. Will maintain mm and find mm from PASID inside IDXD driver. And will implement accessing the remote mm inside IDXD driver although the implementation will have duplicate code as access_remote_vm(). Next version will only change IDXD driver code. There won't be IOMMU code change. Thanks. -Fenghua
On Tue, Mar 07, 2023 at 09:55:28AM -0800, Fenghua Yu wrote: > > > > I don't quite follow here. Isn't I/O page fault already supported? > > The following patch 9 in this series explains in details why IDXD device > cannot use page fault to write to user memory: https://lore.kernel.org/dmaengine/20230306163138.587484-10-fenghua.yu@intel.com/ > > "DSA supports page fault handling through PRS. However, the DMA engine > that's processing the descriptor is blocked until the PRS response is > received. Other workqueues sharing the engine are also blocked. > Page fault handing by the driver with PRS disabled can be used to > mitigate the stalling. > > With PRS disabled while ATS remain enabled, DSA handles page faults on > a completion record by reporting an event in the event log. In this > instance, the descriptor is completed and the event log contains the > completion record address and the contents of the completion record." This seems like a completely broken I/O model, and I don't think Linux should support this model when it requires operations like access_remote_vm.
Hi, Christoph, On 3/20/23 06:35, Christoph Hellwig wrote: > On Tue, Mar 07, 2023 at 09:55:28AM -0800, Fenghua Yu wrote: >>> >>> I don't quite follow here. Isn't I/O page fault already supported? >> >> The following patch 9 in this series explains in details why IDXD device >> cannot use page fault to write to user memory: https://lore.kernel.org/dmaengine/20230306163138.587484-10-fenghua.yu@intel.com/ >> >> "DSA supports page fault handling through PRS. However, the DMA engine >> that's processing the descriptor is blocked until the PRS response is >> received. Other workqueues sharing the engine are also blocked. >> Page fault handing by the driver with PRS disabled can be used to >> mitigate the stalling. >> >> With PRS disabled while ATS remain enabled, DSA handles page faults on >> a completion record by reporting an event in the event log. In this >> instance, the descriptor is completed and the event log contains the >> completion record address and the contents of the completion record." > > This seems like a completely broken I/O model, and I don't think Linux > should support this model when it requires operations like > access_remote_vm. This patch set have two parts: 1. Basic event log support and PRS disabling knob. 2. Completion record page fault fixup part. The current patch is the major patch in this part. It tries to fix up completion record page fault from user space. Since the fix up in part 2 is debatable and part 1 can be sent out independently, how about we send out the parts separately? In the new part 1, simply warn on completion record page fault and don't try to fix it up. Usually a process executes ENQCMD instruction and then keeps polling completion record periodically. So the completion record will be likely backed by page and won't generate page fault. If page fault is really an issue, sysadmin can enable PRS (which is enabled by default anyway) and let PRS to handle page fault. Then in the next step, we will send out a new part 2 to eliminate completion record page fault. One proposal is to mmap() kernel allocated completion record area. So there won't be completion record page fault and fix up(no access_remote_vm() of course). Is this OK for you? Thank you very much for your comment! -Fenghua
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c index 24bf9b2b58aa..1d7a0aee58f7 100644 --- a/drivers/iommu/iommu-sva.c +++ b/drivers/iommu/iommu-sva.c @@ -71,6 +71,41 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid) } EXPORT_SYMBOL_GPL(iommu_sva_find); +/** + * iommu_access_remote_vm - access another process' address space by PASID + * @pasid: Process Address Space ID assigned to the mm + * @addr: start address to access + * @buf: source or destination buffer + * @len: number of bytes to transfer + * @gup_flags: flags modifying lookup behaviour + * + * Another process' address space is found by PASID. A reference on @mm + * is taken and released inside the function. + * + * Return: number of bytes copied from source to destination. + */ +int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, void *buf, + int len, unsigned int gup_flags) +{ + struct mm_struct *mm; + int copied; + + mm = iommu_sva_find(pasid); + if (IS_ERR_OR_NULL(mm)) + return 0; + + /* + * A reference on @mm has been held by mmget_not_zero() + * during iommu_sva_find(). + */ + copied = access_remote_vm(mm, addr, buf, len, gup_flags); + /* The reference is released. */ + mmput(mm); + + return copied; +} +EXPORT_SYMBOL_GPL(iommu_access_remote_vm); + /** * iommu_sva_bind_device() - Bind a process address space to a device * @dev: the device diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 6595454d4f48..414a46a53799 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -1177,6 +1177,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm); void iommu_sva_unbind_device(struct iommu_sva *handle); u32 iommu_sva_get_pasid(struct iommu_sva *handle); +int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, void *buf, + int len, unsigned int gup_flags); #else static inline struct iommu_sva * iommu_sva_bind_device(struct device *dev, struct mm_struct *mm) @@ -1192,6 +1194,13 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle) { return IOMMU_PASID_INVALID; } + +static inline int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, + void *buf, int len, + unsigned int gup_flags) +{ + return 0; +} #endif /* CONFIG_IOMMU_SVA */ #endif /* __LINUX_IOMMU_H */