Message ID | 20230918-viommu-sync-map-v2-1-f33767f6cf7a@linux.ibm.com
---|---
State | New
Series | iommu/virtio: Enable IOMMU_CAP_DEFERRED_FLUSH
Commit Message
Niklas Schnelle
Sept. 18, 2023, 11:51 a.m. UTC
Pull out the sync operation from viommu_map_pages() by implementing
ops->iotlb_sync_map. This allows the common IOMMU code to map multiple
elements of an sg with a single sync (see iommu_map_sg()). Furthermore,
it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH.
Link: https://lore.kernel.org/lkml/20230726111433.1105665-1-schnelle@linux.ibm.com/
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
drivers/iommu/virtio-iommu.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
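
For readers less familiar with the hook: ops->iotlb_sync_map lets the common
code issue a single sync covering a whole mapped range instead of one sync per
map request. The following is a minimal, self-contained sketch of that
batching pattern; the struct and function names here are hypothetical
stand-ins, not the kernel's actual implementation.

#include <stddef.h>

/* Hypothetical stand-ins for the driver callbacks involved. */
struct toy_ops {
	int (*map_pages)(unsigned long iova, unsigned long paddr, size_t size);
	int (*iotlb_sync_map)(unsigned long iova, size_t size);
};

struct toy_seg {
	unsigned long paddr;
	size_t size;
};

/*
 * Map every element first, then issue exactly one sync covering the whole
 * range -- the pattern iommu_map_sg() relies on when the driver provides
 * ->iotlb_sync_map.
 */
static int toy_map_segments(const struct toy_ops *ops, unsigned long iova,
			    const struct toy_seg *segs, unsigned int nents)
{
	size_t mapped = 0;
	unsigned int i;
	int ret;

	for (i = 0; i < nents; i++) {
		ret = ops->map_pages(iova + mapped, segs[i].paddr,
				     segs[i].size);
		if (ret)
			return ret;	/* caller unwinds partial mappings */
		mapped += segs[i].size;
	}

	/* One costly sync (e.g. a hypervisor round trip) for all elements. */
	if (ops->iotlb_sync_map)
		return ops->iotlb_sync_map(iova, mapped);
	return 0;
}

For virtio-iommu each request sync is a virtqueue round trip to the host, so
collapsing N syncs into one is a real win for scatter-gather mappings.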
Comments
On Mon, 2023-09-18 at 17:37 +0100, Robin Murphy wrote:
> On 2023-09-18 12:51, Niklas Schnelle wrote:
> > Pull out the sync operation from viommu_map_pages() by implementing
> > ops->iotlb_sync_map. This allows the common IOMMU code to map multiple
> > elements of an sg with a single sync (see iommu_map_sg()). Furthermore,
> > it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH.
>
> Is it really a requirement? Deferred flush only deals with unmapping. Or
> are you just trying to say that it's not too worthwhile to try doing
> more for unmapping performance while obvious mapping performance is
> still left on the table?

You're right, there is no hard requirement. I somehow thought that
iommu_create_device_direct_map() relied on it because it does
flush_iotlb_all() and iommu_map(), but that doesn't seem to be true. If you
want I can resend with the last sentence removed.

> > Link: https://lore.kernel.org/lkml/20230726111433.1105665-1-schnelle@linux.ibm.com/
> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > ---
> >  drivers/iommu/virtio-iommu.c | 17 ++++++++++++++++-
> >  1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > index 17dcd826f5c2..3649586f0e5c 100644
> > --- a/drivers/iommu/virtio-iommu.c
> > +++ b/drivers/iommu/virtio-iommu.c
> > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
> >  	int ret;
> >  	unsigned long flags;
> >
> > +	/*
> > +	 * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
> > +	 * is initialized e.g. via iommu_create_device_direct_mappings()
> > +	 */
> > +	if (!viommu)
> > +		return 0;
>
> Minor nit: I'd be inclined to make that check explicitly in the places
> where it definitely is expected, rather than allowing *any* sync to
> silently do nothing if called incorrectly. Plus then they could use
> vdomain->nr_endpoints for consistency with the equivalent checks
> elsewhere (it did take me a moment to figure out how we could get to
> .iotlb_sync_map with a NULL viommu without viommu_map_pages() blowing up
> first...)
>
> Thanks,
> Robin.

That's what I had in v1. I think this is a matter of taste and
Jean-Philippe pointed me to moving the check into viommu_sync_req(). I
added a comment because it really is not entirely obvious.

> >  	spin_lock_irqsave(&viommu->request_lock, flags);
> >  	ret = __viommu_sync_req(viommu);
> >  	if (ret)
> > @@ -843,7 +849,7 @@ static int viommu_map_pages(struct iommu_domain *domain, unsigned long iova,
> >  		.flags = cpu_to_le32(flags),
> >  	};
> >
> > -	ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));
> > +	ret = viommu_add_req(vdomain->viommu, &map, sizeof(map));
> >  	if (ret) {
> >  		viommu_del_mappings(vdomain, iova, end);
> >  		return ret;
> > @@ -912,6 +918,14 @@ static void viommu_iotlb_sync(struct iommu_domain *domain,
> >  	viommu_sync_req(vdomain->viommu);
> >  }
> >
> > +static int viommu_iotlb_sync_map(struct iommu_domain *domain,
> > +				 unsigned long iova, size_t size)
> > +{
> > +	struct viommu_domain *vdomain = to_viommu_domain(domain);
> > +
> > +	return viommu_sync_req(vdomain->viommu);
> > +}
> > +
> >  static void viommu_get_resv_regions(struct device *dev, struct list_head *head)
> >  {
> >  	struct iommu_resv_region *entry, *new_entry, *msi = NULL;
> > @@ -1058,6 +1072,7 @@ static struct iommu_ops viommu_ops = {
> >  		.unmap_pages	= viommu_unmap_pages,
> >  		.iova_to_phys	= viommu_iova_to_phys,
> >  		.iotlb_sync	= viommu_iotlb_sync,
> > +		.iotlb_sync_map	= viommu_iotlb_sync_map,
> >  		.free		= viommu_domain_free,
> >  	}
> > };
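
For illustration, the caller-side placement Robin suggests (roughly what v1
of this patch did, reusing the driver's existing names) would look something
like the sketch below. This is not the code as posted, just the shape of the
alternative.

/*
 * Sketch of the alternative: keep viommu_sync_req() strict and have the
 * sync callback itself skip domains with no attached endpoints.
 */
static int viommu_iotlb_sync_map(struct iommu_domain *domain,
				 unsigned long iova, size_t size)
{
	struct viommu_domain *vdomain = to_viommu_domain(domain);

	/* Nothing was mapped through this domain yet, nothing to sync. */
	if (!vdomain->nr_endpoints)
		return 0;
	return viommu_sync_req(vdomain->viommu);
}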
On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:
> > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > index 17dcd826f5c2..3649586f0e5c 100644
> > --- a/drivers/iommu/virtio-iommu.c
> > +++ b/drivers/iommu/virtio-iommu.c
> > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
> >  	int ret;
> >  	unsigned long flags;
> >
> > +	/*
> > +	 * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
> > +	 * is initialized e.g. via iommu_create_device_direct_mappings()
> > +	 */
> > +	if (!viommu)
> > +		return 0;
>
> Minor nit: I'd be inclined to make that check explicitly in the places where
> it definitely is expected, rather than allowing *any* sync to silently do
> nothing if called incorrectly. Plus then they could use
> vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
> (it did take me a moment to figure out how we could get to .iotlb_sync_map
> with a NULL viommu without viommu_map_pages() blowing up first...)

They're not strictly equivalent: this check works around a temporary issue
with the IOMMU core, which calls map/unmap before the domain is finalized.
Once we merge domain_alloc() and finalize(), then this check disappears,
but we still need to test nr_endpoints in map/unmap to handle detached
domains (and we still need to fix the synchronization of nr_endpoints
against attach/detach). That's why I preferred doing this on viommu and
keeping it in one place.

Thanks,
Jean
On 2023-09-19 09:15, Jean-Philippe Brucker wrote:
> On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:
>>> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
>>> index 17dcd826f5c2..3649586f0e5c 100644
>>> --- a/drivers/iommu/virtio-iommu.c
>>> +++ b/drivers/iommu/virtio-iommu.c
>>> @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
>>>  	int ret;
>>>  	unsigned long flags;
>>>
>>> +	/*
>>> +	 * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
>>> +	 * is initialized e.g. via iommu_create_device_direct_mappings()
>>> +	 */
>>> +	if (!viommu)
>>> +		return 0;
>>
>> Minor nit: I'd be inclined to make that check explicitly in the places where
>> it definitely is expected, rather than allowing *any* sync to silently do
>> nothing if called incorrectly. Plus then they could use
>> vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
>> (it did take me a moment to figure out how we could get to .iotlb_sync_map
>> with a NULL viommu without viommu_map_pages() blowing up first...)
>
> They're not strictly equivalent: this check works around a temporary issue
> with the IOMMU core, which calls map/unmap before the domain is finalized.
> Once we merge domain_alloc() and finalize(), then this check disappears,
> but we still need to test nr_endpoints in map/unmap to handle detached
> domains (and we still need to fix the synchronization of nr_endpoints
> against attach/detach). That's why I preferred doing this on viommu and
> keeping it in one place.

Fair enough - it just seems to me that in both cases it's a detached
domain, so its previous history of whether it's ever been otherwise or not
shouldn't matter. Even once viommu is initialised, does it really make
sense to send sync commands for a mapping on a detached domain where we
haven't actually sent any map/unmap commands?

Thanks,
Robin.
On Tue, Sep 19, 2023 at 09:15:19AM +0100, Jean-Philippe Brucker wrote:
> On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:
> > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > > index 17dcd826f5c2..3649586f0e5c 100644
> > > --- a/drivers/iommu/virtio-iommu.c
> > > +++ b/drivers/iommu/virtio-iommu.c
> > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
> > >  	int ret;
> > >  	unsigned long flags;
> > >
> > > +	/*
> > > +	 * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
> > > +	 * is initialized e.g. via iommu_create_device_direct_mappings()
> > > +	 */
> > > +	if (!viommu)
> > > +		return 0;
> >
> > Minor nit: I'd be inclined to make that check explicitly in the places where
> > it definitely is expected, rather than allowing *any* sync to silently do
> > nothing if called incorrectly. Plus then they could use
> > vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
> > (it did take me a moment to figure out how we could get to .iotlb_sync_map
> > with a NULL viommu without viommu_map_pages() blowing up first...)

This makes more sense to me.

Ultimately this driver should reach a point where every iommu_domain
always has a non-null domain->viommu because it will be set during alloc.

But it can still have nr_endpoints == 0, doesn't it make sense to avoid
sync in this case?

(btw this driver is missing locking around vdomain->nr_endpoints)

> They're not strictly equivalent: this check works around a temporary issue
> with the IOMMU core, which calls map/unmap before the domain is
> finalized.

Where? The above points to iommu_create_device_direct_mappings() but it
doesn't because the pgsize_bitmap == 0:

static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
					       struct device *dev)
{
	struct iommu_resv_region *entry;
	struct list_head mappings;
	unsigned long pg_size;
	int ret = 0;

	pg_size = domain->pgsize_bitmap ? 1UL << __ffs(domain->pgsize_bitmap) : 0;
	INIT_LIST_HEAD(&mappings);

	if (WARN_ON_ONCE(iommu_is_dma_domain(domain) && !pg_size))

Indeed, the driver should be failing all maps until the domain is
finalized because it has no way to check the IOVA matches the eventual
aperture.

Jason
On Tue, Sep 19, 2023 at 09:28:08AM +0100, Robin Murphy wrote:
> On 2023-09-19 09:15, Jean-Philippe Brucker wrote:
> > On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:
> > > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > > > index 17dcd826f5c2..3649586f0e5c 100644
> > > > --- a/drivers/iommu/virtio-iommu.c
> > > > +++ b/drivers/iommu/virtio-iommu.c
> > > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
> > > >  	int ret;
> > > >  	unsigned long flags;
> > > >
> > > > +	/*
> > > > +	 * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
> > > > +	 * is initialized e.g. via iommu_create_device_direct_mappings()
> > > > +	 */
> > > > +	if (!viommu)
> > > > +		return 0;
> > >
> > > Minor nit: I'd be inclined to make that check explicitly in the places where
> > > it definitely is expected, rather than allowing *any* sync to silently do
> > > nothing if called incorrectly. Plus then they could use
> > > vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
> > > (it did take me a moment to figure out how we could get to .iotlb_sync_map
> > > with a NULL viommu without viommu_map_pages() blowing up first...)
> >
> > They're not strictly equivalent: this check works around a temporary issue
> > with the IOMMU core, which calls map/unmap before the domain is finalized.
> > Once we merge domain_alloc() and finalize(), then this check disappears,
> > but we still need to test nr_endpoints in map/unmap to handle detached
> > domains (and we still need to fix the synchronization of nr_endpoints
> > against attach/detach). That's why I preferred doing this on viommu and
> > keeping it in one place.
>
> Fair enough - it just seems to me that in both cases it's a detached domain,
> so its previous history of whether it's ever been otherwise or not shouldn't
> matter. Even once viommu is initialised, does it really make sense to send
> sync commands for a mapping on a detached domain where we haven't actually
> sent any map/unmap commands?

If no requests were added by map/unmap, then viommu_sync_req() is
essentially a nop because virtio doesn't use sync commands (and
virtqueue_kick() only kicks the host when the queue's not empty, if I
remember correctly). It still does a bit of work so is less efficient than
a preliminary check on nr_endpoints, but it feels nicer to streamline the
case where the domain is attached, since map/unmap on detached domains
happens rarely, if ever.

Either is fine by me. An extra test won't make much difference performance
wise, and I guess will look less confusing. Niklas, do you mind resending
the version with nr_endpoints check (and updated commit messages)?

Thanks,
Jean
On Tue, Sep 19, 2023 at 11:46:49AM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 19, 2023 at 09:15:19AM +0100, Jean-Philippe Brucker wrote:
> > On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:
> > > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> > > > index 17dcd826f5c2..3649586f0e5c 100644
> > > > --- a/drivers/iommu/virtio-iommu.c
> > > > +++ b/drivers/iommu/virtio-iommu.c
> > > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
> > > >  	int ret;
> > > >  	unsigned long flags;
> > > >
> > > > +	/*
> > > > +	 * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
> > > > +	 * is initialized e.g. via iommu_create_device_direct_mappings()
> > > > +	 */
> > > > +	if (!viommu)
> > > > +		return 0;
> > >
> > > Minor nit: I'd be inclined to make that check explicitly in the places where
> > > it definitely is expected, rather than allowing *any* sync to silently do
> > > nothing if called incorrectly. Plus then they could use
> > > vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
> > > (it did take me a moment to figure out how we could get to .iotlb_sync_map
> > > with a NULL viommu without viommu_map_pages() blowing up first...)
>
> This makes more sense to me
>
> Ultimately this driver should reach a point where every iommu_domain
> always has a non-null domain->viommu because it will be set during
> alloc.
>
> But it can still have nr_endpoints == 0, doesn't it make sense to
> avoid sync in this case?
>
> (btw this driver is missing locking around vdomain->nr_endpoints)

Yes, that's on my list

> > They're not strictly equivalent: this check works around a temporary issue
> > with the IOMMU core, which calls map/unmap before the domain is
> > finalized.
>
> Where? The above points to iommu_create_device_direct_mappings() but
> it doesn't because the pgsize_bitmap == 0:

__iommu_domain_alloc() sets pgsize_bitmap in this case:

	/*
	 * If not already set, assume all sizes by default; the driver
	 * may override this later
	 */
	if (!domain->pgsize_bitmap)
		domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;

Thanks,
Jean
On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote:
> > > They're not strictly equivalent: this check works around a temporary issue
> > > with the IOMMU core, which calls map/unmap before the domain is
> > > finalized.
> >
> > Where? The above points to iommu_create_device_direct_mappings() but
> > it doesn't because the pgsize_bitmap == 0:
>
> __iommu_domain_alloc() sets pgsize_bitmap in this case:
>
> 	/*
> 	 * If not already set, assume all sizes by default; the driver
> 	 * may override this later
> 	 */
> 	if (!domain->pgsize_bitmap)
> 		domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;

Drivers shouldn't do that.

The core code was fixed to try again with mapping reserved regions to
support these kinds of drivers.

Jason
On 22/09/2023 1:41 pm, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote:
>>>> They're not strictly equivalent: this check works around a temporary issue
>>>> with the IOMMU core, which calls map/unmap before the domain is
>>>> finalized.
>>>
>>> Where? The above points to iommu_create_device_direct_mappings() but
>>> it doesn't because the pgsize_bitmap == 0:
>>
>> __iommu_domain_alloc() sets pgsize_bitmap in this case:
>>
>> 	/*
>> 	 * If not already set, assume all sizes by default; the driver
>> 	 * may override this later
>> 	 */
>> 	if (!domain->pgsize_bitmap)
>> 		domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
>
> Drivers shouldn't do that.
>
> The core code was fixed to try again with mapping reserved regions to
> support these kinds of drivers.

This is still the "normal" code path, really; I think it's only AMD that
started initialising the domain bitmap "early" and warranted making it
conditional. However we *do* ultimately want all the drivers to do the
same, so we can get rid of ops->pgsize_bitmap, because it's already
pretty redundant and meaningless in the face of per-domain pagetable
formats.

Thanks,
Robin.
On Fri, Sep 22, 2023 at 02:13:18PM +0100, Robin Murphy wrote:
> On 22/09/2023 1:41 pm, Jason Gunthorpe wrote:
> > On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote:
> > > > > They're not strictly equivalent: this check works around a temporary issue
> > > > > with the IOMMU core, which calls map/unmap before the domain is
> > > > > finalized.
> > > >
> > > > Where? The above points to iommu_create_device_direct_mappings() but
> > > > it doesn't because the pgsize_bitmap == 0:
> > >
> > > __iommu_domain_alloc() sets pgsize_bitmap in this case:
> > >
> > > 	/*
> > > 	 * If not already set, assume all sizes by default; the driver
> > > 	 * may override this later
> > > 	 */
> > > 	if (!domain->pgsize_bitmap)
> > > 		domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
> >
> > Drivers shouldn't do that.
> >
> > The core code was fixed to try again with mapping reserved regions to
> > support these kinds of drivers.
>
> This is still the "normal" code path, really; I think it's only AMD that
> started initialising the domain bitmap "early" and warranted making it
> conditional.

My main point was that iommu_create_device_direct_mappings() should fail
for unfinalized domains, setting pgsize_bitmap to allow it to succeed is
not a nice hack, and not necessary now.

What do you think about something like this to replace
iommu_create_device_direct_mappings(), that does enforce things properly?

static int resv_cmp(void *priv, const struct list_head *llhs,
		    const struct list_head *lrhs)
{
	struct iommu_resv_region *lhs =
		list_entry(llhs, struct iommu_resv_region, list);
	struct iommu_resv_region *rhs =
		list_entry(lrhs, struct iommu_resv_region, list);

	if (lhs->start == rhs->start)
		return 0;
	if (lhs->start < rhs->start)
		return -1;
	return 1;
}

static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
					       struct device *dev)
{
	struct iommu_resv_region *entry;
	struct iommu_resv_region *tmp;
	struct list_head mappings;
	struct list_head direct;
	phys_addr_t cur = 0;
	int ret = 0;

	INIT_LIST_HEAD(&mappings);
	INIT_LIST_HEAD(&direct);

	iommu_get_resv_regions(dev, &mappings);

	list_for_each_entry_safe(entry, tmp, &mappings, list) {
		if (entry->type == IOMMU_RESV_DIRECT)
			dev->iommu->require_direct = 1;

		if ((domain->type & __IOMMU_DOMAIN_PAGING) &&
		    (entry->type == IOMMU_RESV_DIRECT ||
		     entry->type == IOMMU_RESV_DIRECT_RELAXABLE)) {
			if (domain->geometry.aperture_start > entry->start ||
			    domain->geometry.aperture_end == 0 ||
			    (domain->geometry.aperture_end - 1) <
				    (entry->start + entry->length - 1)) {
				ret = -EINVAL;
				goto out;
			}
			list_move(&entry->list, &direct);
		}
	}

	if (list_empty(&direct))
		goto out;

	/*
	 * FW can have overlapping ranges, sort the list by start address
	 * and map any duplicated IOVA only once.
	 */
	list_sort(NULL, &direct, resv_cmp);

	list_for_each_entry(entry, &direct, list) {
		phys_addr_t start_pfn = entry->start / PAGE_SIZE;
		phys_addr_t last_pfn =
			(entry->length - 1 + entry->start) / PAGE_SIZE;

		if (start_pfn < cur)
			start_pfn = cur;
		if (start_pfn <= last_pfn) {
			ret = iommu_map(domain, start_pfn * PAGE_SIZE,
					start_pfn * PAGE_SIZE,
					(last_pfn - start_pfn + 1) * PAGE_SIZE,
					entry->prot, GFP_KERNEL);
			if (ret)
				goto out;
			cur = last_pfn + 1;
		}
	}

out:
	list_splice(&direct, &mappings);
	iommu_put_resv_regions(dev, &mappings);
	return ret;
}
On 22/09/2023 5:27 pm, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 02:13:18PM +0100, Robin Murphy wrote:
>> On 22/09/2023 1:41 pm, Jason Gunthorpe wrote:
>>> On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote:
>>>>>> They're not strictly equivalent: this check works around a temporary issue
>>>>>> with the IOMMU core, which calls map/unmap before the domain is
>>>>>> finalized.
>>>>>
>>>>> Where? The above points to iommu_create_device_direct_mappings() but
>>>>> it doesn't because the pgsize_bitmap == 0:
>>>>
>>>> __iommu_domain_alloc() sets pgsize_bitmap in this case:
>>>>
>>>> 	/*
>>>> 	 * If not already set, assume all sizes by default; the driver
>>>> 	 * may override this later
>>>> 	 */
>>>> 	if (!domain->pgsize_bitmap)
>>>> 		domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
>>>
>>> Drivers shouldn't do that.
>>>
>>> The core code was fixed to try again with mapping reserved regions to
>>> support these kinds of drivers.
>>
>> This is still the "normal" code path, really; I think it's only AMD that
>> started initialising the domain bitmap "early" and warranted making it
>> conditional.
>
> My main point was that iommu_create_device_direct_mappings() should
> fail for unfinalized domains, setting pgsize_bitmap to allow it to
> succeed is not a nice hack, and not necessary now.

Sure, but it's the whole "unfinalised domains" and rewriting
domain->pgsize_bitmap after attach thing that is itself the massive hack.
AMD doesn't do that, and doesn't need to; it knows the appropriate format
at allocation time and can quite happily return a fully working domain
which allows map before attach, but the old ops->pgsize_bitmap mechanism
fundamentally doesn't work for multiple formats with different page
sizes. The only thing I'd accuse it of doing wrong is the weird
half-and-half thing of having one format as a default via one mechanism,
and the other as an override through the other, rather than setting both
explicitly.

virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
either; it sets it once it's discovered any instance, since apparently
it's assuming that all instances must support identical page sizes, and
thus once it's seen one it can work "normally" per the core code's
assumptions. It's also I think the only driver which has a "finalise"
bodge but *can* still properly support map-before-attach, by virtue of
having to replay mappings to every new endpoint anyway.

> What do you think about something like this to replace
> iommu_create_device_direct_mappings(), that does enforce things
> properly?

I fail to see how that would make any practical difference. Either the
mappings can be correctly set up in a pagetable *before* the relevant
device is attached to that pagetable, or they can't (if the driver
doesn't have enough information to be able to do so) and we just have to
really hope nothing blows up in the race window between attaching the
device to an empty pagetable and having a second try at
iommu_create_device_direct_mappings(). That's a driver-level issue and
has nothing to do with pgsize_bitmap either way.

Thanks,
Robin.
On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:
> virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
> either; it sets it once it's discovered any instance, since apparently it's
> assuming that all instances must support identical page sizes, and thus once
> it's seen one it can work "normally" per the core code's assumptions. It's
> also I think the only driver which has a "finalise" bodge but *can* still
> properly support map-before-attach, by virtue of having to replay mappings
> to every new endpoint anyway.

Well it can't quite do that since it doesn't know the geometry - it all
is sort of guessing and hoping it doesn't explode on replay. If it knows
the geometry it wouldn't need finalize...

> > What do you think about something like this to replace
> > iommu_create_device_direct_mappings(), that does enforce things
> > properly?
>
> I fail to see how that would make any practical difference. Either the
> mappings can be correctly set up in a pagetable *before* the relevant device
> is attached to that pagetable, or they can't (if the driver doesn't have
> enough information to be able to do so) and we just have to really hope
> nothing blows up in the race window between attaching the device to an empty
> pagetable and having a second try at iommu_create_device_direct_mappings().
> That's a driver-level issue and has nothing to do with pgsize_bitmap either
> way.

Except we don't detect this in the core code correctly, that is my point.
We should detect the aperture conflict, not pgsize_bitmap to check if it
is the first or second try.

Jason
On 9/23/23 7:33 AM, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:
>
>> virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
>> either; it sets it once it's discovered any instance, since apparently it's
>> assuming that all instances must support identical page sizes, and thus once
>> it's seen one it can work "normally" per the core code's assumptions. It's
>> also I think the only driver which has a "finalise" bodge but *can* still
>> properly support map-before-attach, by virtue of having to replay mappings
>> to every new endpoint anyway.
>
> Well it can't quite do that since it doesn't know the geometry - it
> all is sort of guessing and hoping it doesn't explode on replay. If it
> knows the geometry it wouldn't need finalize...

The ultimate solution to this problem seems to be to add a device pointer
to the parameter of ops->domain_alloc. So once the domain is allocated,
it is fully initialized. Attaching this domain to a device that is not
compatible will return -EINVAL, then the caller has to allocate a new
domain for this device.

I feel that this is not an AMD specific problem, other iommu drivers will
also encounter a similar problem sooner or later.

Best regards,
baolu
On Mon, Sep 25, 2023 at 10:48:21AM +0800, Baolu Lu wrote:
> On 9/23/23 7:33 AM, Jason Gunthorpe wrote:
> > On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:
> >
> > > virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
> > > either; it sets it once it's discovered any instance, since apparently it's
> > > assuming that all instances must support identical page sizes, and thus once
> > > it's seen one it can work "normally" per the core code's assumptions. It's
> > > also I think the only driver which has a "finalise" bodge but *can* still
> > > properly support map-before-attach, by virtue of having to replay mappings
> > > to every new endpoint anyway.
> >
> > Well it can't quite do that since it doesn't know the geometry - it
> > all is sort of guessing and hoping it doesn't explode on replay. If it
> > knows the geometry it wouldn't need finalize...
>
> The ultimate solution to this problem seems to be to add a device pointer
> to the parameter of ops->domain_alloc. So once the domain is allocated,
> it is fully initialized. Attaching this domain to a device that is not
> compatible will return -EINVAL, then the caller has to allocate a new
> domain for this device.

Sure, it looks like this, and it works already for default domain setup.

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 8599394c9157d7..1697f5a2370785 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -637,15 +637,10 @@ static void viommu_event_handler(struct virtqueue *vq)
 
 /* IOMMU API */
 
-static struct iommu_domain *viommu_domain_alloc(unsigned type)
+static struct viommu_domain *__viommu_domain_alloc(void)
 {
 	struct viommu_domain *vdomain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED &&
-	    type != IOMMU_DOMAIN_DMA &&
-	    type != IOMMU_DOMAIN_IDENTITY)
-		return NULL;
-
 	vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
 	if (!vdomain)
 		return NULL;
@@ -654,16 +649,32 @@ static struct iommu_domain *viommu_domain_alloc(unsigned type)
 	spin_lock_init(&vdomain->mappings_lock);
 	vdomain->mappings = RB_ROOT_CACHED;
 
+	return vdomain;
+}
+
+static struct iommu_domain *viommu_domain_alloc(unsigned type)
+{
+	struct viommu_domain *vdomain;
+
+	/*
+	 * viommu sometimes creates an identity domain out of a
+	 * paging domain, that can't use the global static.
+	 * During attach it will fill in an identity page table.
+	 */
+	if (type != IOMMU_DOMAIN_IDENTITY)
+		return NULL;
+	vdomain = __viommu_domain_alloc();
+	if (!vdomain)
+		return NULL;
 	return &vdomain->domain;
 }
 
 static int viommu_domain_finalise(struct viommu_endpoint *vdev,
-				  struct iommu_domain *domain)
+				  struct viommu_domain *vdomain)
 {
 	int ret;
 	unsigned long viommu_page_size;
 	struct viommu_dev *viommu = vdev->viommu;
-	struct viommu_domain *vdomain = to_viommu_domain(domain);
 
 	viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
 	if (viommu_page_size > PAGE_SIZE) {
@@ -680,13 +691,13 @@ static int viommu_domain_finalise(struct viommu_endpoint *vdev,
 
 	vdomain->id = (unsigned int)ret;
 
-	domain->pgsize_bitmap = viommu->pgsize_bitmap;
-	domain->geometry = viommu->geometry;
+	vdomain->domain.pgsize_bitmap = viommu->pgsize_bitmap;
+	vdomain->domain.geometry = viommu->geometry;
 
 	vdomain->map_flags = viommu->map_flags;
 	vdomain->viommu = viommu;
 
-	if (domain->type == IOMMU_DOMAIN_IDENTITY) {
+	if (vdomain->domain.type == IOMMU_DOMAIN_IDENTITY) {
 		if (virtio_has_feature(viommu->vdev,
 				       VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
 			vdomain->bypass = true;
@@ -717,6 +728,24 @@ static void viommu_domain_free(struct iommu_domain *domain)
 	kfree(vdomain);
 }
 
+static struct iommu_domain *viommu_domain_alloc_paging(struct device *dev)
+{
+	struct viommu_domain *vdomain;
+
+	vdomain = __viommu_domain_alloc();
+	if (!vdomain)
+		return NULL;
+
+	if (dev) {
+		struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
+
+		if (viommu_domain_finalise(vdev, vdomain)) {
+			viommu_domain_free(&vdomain->domain);
+			return NULL;
+		}
+	}
+	return &vdomain->domain;
+}
+
 static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
 	int i;
@@ -732,7 +761,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	 * Properly initialize the domain now that we know which viommu
 	 * owns it.
 	 */
-	ret = viommu_domain_finalise(vdev, domain);
+	ret = viommu_domain_finalise(vdev, vdomain);
 	} else if (vdomain->viommu != vdev->viommu) {
 		ret = -EINVAL;
 	}
@@ -1045,6 +1074,7 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap)
 static struct iommu_ops viommu_ops = {
 	.capable		= viommu_capable,
 	.domain_alloc		= viommu_domain_alloc,
+	.domain_alloc_paging	= viommu_domain_alloc_paging,
 	.probe_device		= viommu_probe_device,
 	.probe_finalize		= viommu_probe_finalize,
 	.release_device		= viommu_release_device,
On 2023-09-23 00:33, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:
>
>> virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
>> either; it sets it once it's discovered any instance, since apparently it's
>> assuming that all instances must support identical page sizes, and thus once
>> it's seen one it can work "normally" per the core code's assumptions. It's
>> also I think the only driver which has a "finalise" bodge but *can* still
>> properly support map-before-attach, by virtue of having to replay mappings
>> to every new endpoint anyway.
>
> Well it can't quite do that since it doesn't know the geometry - it
> all is sort of guessing and hoping it doesn't explode on replay. If it
> knows the geometry it wouldn't need finalize...

I think it's entirely reasonable to assume that any direct mappings
specified for a device are valid for that device and its IOMMU. However,
in the particular case of virtio, it really shouldn't ever have direct
mappings anyway, since even if the underlying hardware did have any, the
host can enforce the actual direct-mapping aspect itself, and just
present them as unusable regions to the guest.

>>> What do you think about something like this to replace
>>> iommu_create_device_direct_mappings(), that does enforce things
>>> properly?
>>
>> I fail to see how that would make any practical difference. Either the
>> mappings can be correctly set up in a pagetable *before* the relevant device
>> is attached to that pagetable, or they can't (if the driver doesn't have
>> enough information to be able to do so) and we just have to really hope
>> nothing blows up in the race window between attaching the device to an empty
>> pagetable and having a second try at iommu_create_device_direct_mappings().
>> That's a driver-level issue and has nothing to do with pgsize_bitmap either
>> way.
>
> Except we don't detect this in the core code correctly, that is my
> point. We should detect the aperture conflict, not pgsize_bitmap to
> check if it is the first or second try.

Again, that's irrelevant. It can only be about whether the actual
->map_pages call succeeds or not. A driver could well know up-front that
all instances support the same pgsize_bitmap and aperture, and set both
at ->domain_alloc time, yet still be unable to handle an actual mapping
without knowing which instance(s) that needs to interact with (e.g.
omap-iommu).

Thanks,
Robin.
On Mon, Sep 25, 2023 at 02:07:50PM +0100, Robin Murphy wrote:
> On 2023-09-23 00:33, Jason Gunthorpe wrote:
> > On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:
> >
> > > virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
> > > either; it sets it once it's discovered any instance, since apparently it's
> > > assuming that all instances must support identical page sizes, and thus once
> > > it's seen one it can work "normally" per the core code's assumptions. It's
> > > also I think the only driver which has a "finalise" bodge but *can* still
> > > properly support map-before-attach, by virtue of having to replay mappings
> > > to every new endpoint anyway.
> >
> > Well it can't quite do that since it doesn't know the geometry - it
> > all is sort of guessing and hoping it doesn't explode on replay. If it
> > knows the geometry it wouldn't need finalize...
>
> I think it's entirely reasonable to assume that any direct mappings
> specified for a device are valid for that device and its IOMMU. However, in
> the particular case of virtio, it really shouldn't ever have direct mappings
> anyway, since even if the underlying hardware did have any, the host can
> enforce the actual direct-mapping aspect itself, and just present them as
> unusable regions to the guest.

I assume this machinery is for the ARM GIC ITS page....

> Again, that's irrelevant. It can only be about whether the actual
> ->map_pages call succeeds or not. A driver could well know up-front that all
> instances support the same pgsize_bitmap and aperture, and set both at
> ->domain_alloc time, yet still be unable to handle an actual mapping without
> knowing which instance(s) that needs to interact with (e.g. omap-iommu).

I think this is a different issue. The domain is supposed to represent
the actual io pte storage, and the storage is supposed to exist even when
the domain is not attached to anything.

As we said with tegra-gart, it is a bug in the driver if all the mappings
disappear when the last device is detached from the domain. Driver bugs
like this turn into significant issues with vfio/iommufd as this will
result in warn_on's and memory leaking.

So, I disagree that this is something we should be allowing in the API
design. map_pages should succeed (memory allocation failures aside) if an
IOVA within the aperture and valid flags are presented. Regardless of the
attachment status. Calling map_pages with an IOVA outside the aperture
should be a caller bug.

It looks like omap is just mis-designed to store the pgd in the
omap_iommu, not the omap_iommu_domain :( pgd is clearly a per-domain
object in our API. And why does every instance need its own copy of the
identical pgd?

Jason
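
To make that contract concrete, here is an illustrative sketch of the
aperture check Jason describes. The geometry fields are the real ones from
struct iommu_domain in <linux/iommu.h>, but the helper itself and where it
would be called from are hypothetical, not existing core code.

/*
 * Sketch only: reject out-of-aperture IOVAs up front, independent of
 * attachment state. A map_pages implementation honoring the contract
 * could fail such calls with -EINVAL even on a detached domain.
 */
static bool iova_in_aperture(const struct iommu_domain *domain,
			     unsigned long iova, size_t size)
{
	return iova >= domain->geometry.aperture_start &&
	       iova + size - 1 <= domain->geometry.aperture_end;
}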
On 2023-09-25 14:29, Jason Gunthorpe wrote:
> On Mon, Sep 25, 2023 at 02:07:50PM +0100, Robin Murphy wrote:
>> On 2023-09-23 00:33, Jason Gunthorpe wrote:
>>> On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:
>>>
>>>> virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
>>>> either; it sets it once it's discovered any instance, since apparently it's
>>>> assuming that all instances must support identical page sizes, and thus once
>>>> it's seen one it can work "normally" per the core code's assumptions. It's
>>>> also I think the only driver which has a "finalise" bodge but *can* still
>>>> properly support map-before-attach, by virtue of having to replay mappings
>>>> to every new endpoint anyway.
>>>
>>> Well it can't quite do that since it doesn't know the geometry - it
>>> all is sort of guessing and hoping it doesn't explode on replay. If it
>>> knows the geometry it wouldn't need finalize...
>>
>> I think it's entirely reasonable to assume that any direct mappings
>> specified for a device are valid for that device and its IOMMU. However, in
>> the particular case of virtio, it really shouldn't ever have direct mappings
>> anyway, since even if the underlying hardware did have any, the host can
>> enforce the actual direct-mapping aspect itself, and just present them as
>> unusable regions to the guest.
>
> I assume this machinery is for the ARM GIC ITS page....
>
>> Again, that's irrelevant. It can only be about whether the actual
>> ->map_pages call succeeds or not. A driver could well know up-front that all
>> instances support the same pgsize_bitmap and aperture, and set both at
>> ->domain_alloc time, yet still be unable to handle an actual mapping without
>> knowing which instance(s) that needs to interact with (e.g. omap-iommu).
>
> I think this is a different issue. The domain is supposed to represent
> the actual io pte storage, and the storage is supposed to exist even
> when the domain is not attached to anything.
>
> As we said with tegra-gart, it is a bug in the driver if all the
> mappings disappear when the last device is detached from the domain.
> Driver bugs like this turn into significant issues with vfio/iommufd
> as this will result in warn_on's and memory leaking.
>
> So, I disagree that this is something we should be allowing in the API
> design. map_pages should succeed (memory allocation failures aside) if
> an IOVA within the aperture and valid flags are presented. Regardless
> of the attachment status. Calling map_pages with an IOVA outside the
> aperture should be a caller bug.
>
> It looks like omap is just mis-designed to store the pgd in the
> omap_iommu, not the omap_iommu_domain :( pgd is clearly a per-domain
> object in our API. And why does every instance need its own copy of the
> identical pgd?

The point wasn't that it was necessarily a good and justifiable example,
just that it is one that exists, to demonstrate that in general we have
no reasonable heuristic for guessing whether ->map_pages is going to
succeed or not other than by calling it and seeing if it succeeds or not.
And IMO it's a complete waste of time thinking about ways to make such a
heuristic possible instead of just getting on with fixing
iommu_domain_alloc() to make the problem disappear altogether. Once Joerg
pushes out the current queue I'll rebase and resend v4 of the bus ops
removal, then hopefully get back to despairing at the hideous pile of WIP
iommu_domain_alloc() patches I currently have on top of it...

Thanks,
Robin.
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 17dcd826f5c2..3649586f0e5c 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
 	int ret;
 	unsigned long flags;
 
+	/*
+	 * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
+	 * is initialized e.g. via iommu_create_device_direct_mappings()
+	 */
+	if (!viommu)
+		return 0;
 	spin_lock_irqsave(&viommu->request_lock, flags);
 	ret = __viommu_sync_req(viommu);
 	if (ret)
@@ -843,7 +849,7 @@ static int viommu_map_pages(struct iommu_domain *domain, unsigned long iova,
 		.flags = cpu_to_le32(flags),
 	};
 
-	ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));
+	ret = viommu_add_req(vdomain->viommu, &map, sizeof(map));
 	if (ret) {
 		viommu_del_mappings(vdomain, iova, end);
 		return ret;
@@ -912,6 +918,14 @@ static void viommu_iotlb_sync(struct iommu_domain *domain,
 	viommu_sync_req(vdomain->viommu);
 }
 
+static int viommu_iotlb_sync_map(struct iommu_domain *domain,
+				 unsigned long iova, size_t size)
+{
+	struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+	return viommu_sync_req(vdomain->viommu);
+}
+
 static void viommu_get_resv_regions(struct device *dev, struct list_head *head)
 {
 	struct iommu_resv_region *entry, *new_entry, *msi = NULL;
@@ -1058,6 +1072,7 @@ static struct iommu_ops viommu_ops = {
 		.unmap_pages		= viommu_unmap_pages,
 		.iova_to_phys		= viommu_iova_to_phys,
 		.iotlb_sync		= viommu_iotlb_sync,
+		.iotlb_sync_map		= viommu_iotlb_sync_map,
 		.free			= viommu_domain_free,
 	}
 };