Message ID | 20221116171656.4128212-5-schnelle@linux.ibm.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp263618wru; Wed, 16 Nov 2022 09:27:36 -0800 (PST) X-Google-Smtp-Source: AA0mqf4jT0jW0psU5+cRtMqSO3kf3m6g4sm0HX51okr+MZ0cOC1TqTSpH9732Pkl5CzMltnyd/DA X-Received: by 2002:a63:1802:0:b0:476:d22c:bdf7 with SMTP id y2-20020a631802000000b00476d22cbdf7mr6100609pgl.279.1668619656586; Wed, 16 Nov 2022 09:27:36 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668619656; cv=none; d=google.com; s=arc-20160816; b=gnN1FKbYzZPK1aaX5CvxiyGY1OnVCFWNxm7zPlGNOnGxpaN7vJ9LnSQetf3nK0P1uR X4BN8Nnu++3aJG9WCBgPMxbLz00GchWeHkxosUCu0akcZrbYPQKDGnvVSNx68WEugGmt MY0GV6Nma8XvkEnyiccvdB5clnsHt+xZiJUoCHJL1GuyvdHmgTC6d3wOT0G8Wwvdm+3M pcRp4hgtI5c4ZcbT6XFJEAxQtl548H1s71Q+p79GhKq35zFX4+drVxjpAHVb/2ljmTN1 TXeGpooiifaUTgjOD8N2gpgPbURvcX2eMqvwbFYT5F2wWx7jgKwQ5WX3wQ7COrZ8SnnN cMHg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=DesZm6hVauGJ5jEKwYVVgM1CsFnIzR6FUV6dd8h3oPU=; b=BFmi6kFgKnW8kGeIckzRdWTteFxFgHK1fy7G3qzhhFZ7bfInRLdJE7iQ+Q78Tw62wB lb/wAehN2Ao1tPzwagCzt5Ccncdemnhvy+zMKqF70Kf1bIqbUnQbr6YwEQqy95Hp5iXd Feeq95WWcos8Ry7RqTgAGsLoG3B2ag8hkTDerd5Z+sLFXE0lJW7SUQwHZFmvI22JU1jT 1WapwQrkDjactMkrW99hCVaAjisHZeDoeWWW3/zOgygCW+IYKisc5LPfOkCbLJnkkOEZ 60OpJj7UtKnBskhKILGs0nuQ8aJv2SpOOcVod0YzlBko52Fx5pefNXubfTkj7BTsO2gS n05A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b="Opmw8G/V"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q19-20020a056a00089300b00562331a3562si16917936pfj.130.2022.11.16.09.27.23; Wed, 16 Nov 2022 09:27:36 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b="Opmw8G/V"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233605AbiKPRRU (ORCPT <rfc822;just.gull.subs@gmail.com> + 99 others); Wed, 16 Nov 2022 12:17:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230377AbiKPRRT (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 16 Nov 2022 12:17:19 -0500 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AE50E4731C; Wed, 16 Nov 2022 09:17:18 -0800 (PST) Received: from pps.filterd (m0098404.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2AGHDJHH005207; Wed, 16 Nov 2022 17:17:05 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=DesZm6hVauGJ5jEKwYVVgM1CsFnIzR6FUV6dd8h3oPU=; b=Opmw8G/V3dRoyOOSGVdh+czxS74TKiVkIlbiAvymk1v5SGU2j7vxqY9dw7z8eA5cEo/l LFJXRolPGu46ib3tkLFCCAhkjCghXY7YryLjWADW4te3jHsOme0M9Y1hxEV7BGXtIQ2l N91Qs6PqvlYi2bLUud6/3HufTLz9crz+rJ9FMnePH5QIwI/N2VF0qeu5I79h8Fyd9Rnt JJTnGAjotrOcuCPcQ8IJ2n/1zc17QWHimsFEtqjigSVjzcu+Y/MXKOXQVrRfJ2rluM53 5to8DBwGv4zJXk7FoKRcFhG3eW5/DqbC4sk/YfyT+QUJUv9xPd6lTRmTmiYOP/kbw1XJ XQ== Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3kw3qe8say-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 16 Nov 2022 17:17:04 +0000 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2AGH5llf024480; Wed, 16 Nov 2022 17:17:02 GMT Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by ppma04ams.nl.ibm.com with ESMTP id 3kt348x9jy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 16 Nov 2022 17:17:02 +0000 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2AGHGxwi38928914 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 16 Nov 2022 17:16:59 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 41D3F5204F; Wed, 16 Nov 2022 17:16:59 +0000 (GMT) Received: from tuxmaker.boeblingen.de.ibm.com (unknown [9.152.85.9]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id AC39452054; Wed, 16 Nov 2022 17:16:58 +0000 (GMT) From: Niklas Schnelle <schnelle@linux.ibm.com> To: Matthew Rosato <mjrosato@linux.ibm.com>, Gerd Bayer <gbayer@linux.ibm.com>, iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>, Robin Murphy <robin.murphy@arm.com>, Jason Gunthorpe <jgg@nvidia.com>, Wenjia Zhang <wenjia@linux.ibm.com> Cc: Pierre Morel <pmorel@linux.ibm.com>, linux-s390@vger.kernel.org, borntraeger@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com, gerald.schaefer@linux.ibm.com, agordeev@linux.ibm.com, svens@linux.ibm.com, linux-kernel@vger.kernel.org, Julian Ruess <julianr@linux.ibm.com> Subject: [PATCH v2 4/7] iommu: Let iommu.strict override ops->def_domain_type Date: Wed, 16 Nov 2022 18:16:53 +0100 Message-Id: <20221116171656.4128212-5-schnelle@linux.ibm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221116171656.4128212-1-schnelle@linux.ibm.com> References: <20221116171656.4128212-1-schnelle@linux.ibm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: Gop84IsW_vCOuB5QmJucy4dagLgXcyIS X-Proofpoint-GUID: Gop84IsW_vCOuB5QmJucy4dagLgXcyIS X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-16_03,2022-11-16_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 spamscore=0 malwarescore=0 bulkscore=0 lowpriorityscore=0 adultscore=0 priorityscore=1501 impostorscore=0 phishscore=0 clxscore=1015 mlxscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2211160119 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1749674524708715786?= X-GMAIL-MSGID: =?utf-8?q?1749674524708715786?= |
Series |
iommu/dma: s390 DMA API conversion and optimized IOTLB flushing
|
|
Commit Message
Niklas Schnelle
Nov. 16, 2022, 5:16 p.m. UTC
When iommu.strict=1 is set or iommu_set_dma_strict() was called we
should use IOMMU_DOMAIN_DMA irrespective of ops->def_domain_type.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
drivers/iommu/iommu.c | 3 +++
1 file changed, 3 insertions(+)
Comments
On 2022/11/17 1:16, Niklas Schnelle wrote: > When iommu.strict=1 is set or iommu_set_dma_strict() was called we > should use IOMMU_DOMAIN_DMA irrespective of ops->def_domain_type. > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > --- > drivers/iommu/iommu.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 65a3b3d886dc..d9bf94d198df 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -1562,6 +1562,9 @@ static int iommu_get_def_domain_type(struct device *dev) > { > const struct iommu_ops *ops = dev_iommu_ops(dev); > > + if (iommu_dma_strict) > + return IOMMU_DOMAIN_DMA; If any quirky device must work in IOMMU identity mapping mode, this might introduce functional regression. At least for VT-d platforms, some devices do require IOMMU identity mapping mode for functionality. > + > if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted) > return IOMMU_DOMAIN_DMA; > Best regards, baolu
On Thu, 2022-11-17 at 09:55 +0800, Baolu Lu wrote: > On 2022/11/17 1:16, Niklas Schnelle wrote: > > When iommu.strict=1 is set or iommu_set_dma_strict() was called we > > should use IOMMU_DOMAIN_DMA irrespective of ops->def_domain_type. > > > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > > --- > > drivers/iommu/iommu.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > index 65a3b3d886dc..d9bf94d198df 100644 > > --- a/drivers/iommu/iommu.c > > +++ b/drivers/iommu/iommu.c > > @@ -1562,6 +1562,9 @@ static int iommu_get_def_domain_type(struct device *dev) > > { > > const struct iommu_ops *ops = dev_iommu_ops(dev); > > > > + if (iommu_dma_strict) > > + return IOMMU_DOMAIN_DMA; > > If any quirky device must work in IOMMU identity mapping mode, this > might introduce functional regression. At least for VT-d platforms, some > devices do require IOMMU identity mapping mode for functionality. That's a good point. How about instead of unconditionally returning IOMMU_DOMAIN_DMA we just do so if the domain type returned by ops- >def_domain_type uses a flush queue (i.e. the __IOMMU_DOMAIN_DMA_FQ bit is set). That way a device that only supports identity mapping gets to set that but iommu_dma_strict at least always prevents use of an IOVA flush queue. > > > + > > if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted) > > return IOMMU_DOMAIN_DMA; > > > > Best regards, > baolu
On 2022/11/28 19:10, Niklas Schnelle wrote: > On Thu, 2022-11-17 at 09:55 +0800, Baolu Lu wrote: >> On 2022/11/17 1:16, Niklas Schnelle wrote: >>> When iommu.strict=1 is set or iommu_set_dma_strict() was called we >>> should use IOMMU_DOMAIN_DMA irrespective of ops->def_domain_type. >>> >>> Signed-off-by: Niklas Schnelle<schnelle@linux.ibm.com> >>> --- >>> drivers/iommu/iommu.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >>> index 65a3b3d886dc..d9bf94d198df 100644 >>> --- a/drivers/iommu/iommu.c >>> +++ b/drivers/iommu/iommu.c >>> @@ -1562,6 +1562,9 @@ static int iommu_get_def_domain_type(struct device *dev) >>> { >>> const struct iommu_ops *ops = dev_iommu_ops(dev); >>> >>> + if (iommu_dma_strict) >>> + return IOMMU_DOMAIN_DMA; >> If any quirky device must work in IOMMU identity mapping mode, this >> might introduce functional regression. At least for VT-d platforms, some >> devices do require IOMMU identity mapping mode for functionality. > That's a good point. How about instead of unconditionally returning > IOMMU_DOMAIN_DMA we just do so if the domain type returned by ops- >> def_domain_type uses a flush queue (i.e. the __IOMMU_DOMAIN_DMA_FQ bit > is set). That way a device that only supports identity mapping gets to > set that but iommu_dma_strict at least always prevents use of an IOVA > flush queue. def_domain_type returns IOMMU_DOMAIN_DMA, IOMMU_DOMAIN_IDENTIRY or 0 (don't care). From a code perspective, you can force IOMMU_DOMAIN_DMA if def_domain_type() returns 0. *But* you need to document the relationship between strict mode and default domain type somewhere and get that agreed with Robin. Best regards, baolu
On Mon, Nov 28, 2022 at 12:10:39PM +0100, Niklas Schnelle wrote: > On Thu, 2022-11-17 at 09:55 +0800, Baolu Lu wrote: > > On 2022/11/17 1:16, Niklas Schnelle wrote: > > > When iommu.strict=1 is set or iommu_set_dma_strict() was called we > > > should use IOMMU_DOMAIN_DMA irrespective of ops->def_domain_type. > > > > > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > > > --- > > > drivers/iommu/iommu.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > > index 65a3b3d886dc..d9bf94d198df 100644 > > > --- a/drivers/iommu/iommu.c > > > +++ b/drivers/iommu/iommu.c > > > @@ -1562,6 +1562,9 @@ static int iommu_get_def_domain_type(struct device *dev) > > > { > > > const struct iommu_ops *ops = dev_iommu_ops(dev); > > > > > > + if (iommu_dma_strict) > > > + return IOMMU_DOMAIN_DMA; > > > > If any quirky device must work in IOMMU identity mapping mode, this > > might introduce functional regression. At least for VT-d platforms, some > > devices do require IOMMU identity mapping mode for functionality. > > That's a good point. How about instead of unconditionally returning > IOMMU_DOMAIN_DMA we just do so if the domain type returned by ops- > >def_domain_type uses a flush queue (i.e. the __IOMMU_DOMAIN_DMA_FQ bit > is set). That way a device that only supports identity mapping gets to > set that but iommu_dma_strict at least always prevents use of an IOVA > flush queue. I would prefer we create some formal caps in iommu_ops to describe whatever it is you are trying to do. Jason
On Mon, 2022-11-28 at 09:29 -0400, Jason Gunthorpe wrote: > On Mon, Nov 28, 2022 at 12:10:39PM +0100, Niklas Schnelle wrote: > > On Thu, 2022-11-17 at 09:55 +0800, Baolu Lu wrote: > > > On 2022/11/17 1:16, Niklas Schnelle wrote: > > > > When iommu.strict=1 is set or iommu_set_dma_strict() was called we > > > > should use IOMMU_DOMAIN_DMA irrespective of ops->def_domain_type. > > > > > > > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > > > > --- > > > > drivers/iommu/iommu.c | 3 +++ > > > > 1 file changed, 3 insertions(+) > > > > > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > > > index 65a3b3d886dc..d9bf94d198df 100644 > > > > --- a/drivers/iommu/iommu.c > > > > +++ b/drivers/iommu/iommu.c > > > > @@ -1562,6 +1562,9 @@ static int iommu_get_def_domain_type(struct device *dev) > > > > { > > > > const struct iommu_ops *ops = dev_iommu_ops(dev); > > > > > > > > + if (iommu_dma_strict) > > > > + return IOMMU_DOMAIN_DMA; > > > > > > If any quirky device must work in IOMMU identity mapping mode, this > > > might introduce functional regression. At least for VT-d platforms, some > > > devices do require IOMMU identity mapping mode for functionality. > > > > That's a good point. How about instead of unconditionally returning > > IOMMU_DOMAIN_DMA we just do so if the domain type returned by ops- > > > def_domain_type uses a flush queue (i.e. the __IOMMU_DOMAIN_DMA_FQ bit > > is set). That way a device that only supports identity mapping gets to > > set that but iommu_dma_strict at least always prevents use of an IOVA > > flush queue. > > I would prefer we create some formal caps in iommu_ops to describe > whatever it is you are trying to do. > > Jason I agree that there is currently a lack of distinction between what domain types can be used (capability) and which should be used as default (iommu.strict=<x>, iommu_set_...(), CONFIG_IOMMU_DEFAULT_DMA, ops->def_domain_type.). My case though is about the latter which I think has some undocumented and surprising precedences built in at the moment. With this series we can use all of IOMMU_DOMAIN_DMA(_FQ, _SQ) on any PCI device but we want to default to either IOMMU_DOMAIN_DMA_FQ or IOMMU_DOMAIN_SQ based on whether we're running in a paging hypervisor (z/VM or KVM) to get the best performance. From a semantic point of view I felt that this is a good match for ops->def_domain_type in that we pick a default but it's still possible to change the domain type e.g. via sysfs. Now this had the problem that ops->def_domain_type would cause IOMMU_DOMAIN_DMA_FQ to be used even if iommu_set_dma_strict() was called (via iommu.strict=1) because there is a undocumented override of ops- >def_domain_type over iommu_def_domain_type which I believe comes from the mixing of capability and default you also point at. I think ideally we need two separate mechanism to determine which domain types can be used for a particular device (capability) and for which one to default to with the latter part having a clear precedence between the options. Put together I think iommu.strict=1 should override a device's preference (ops->def_domain_type) and CONFIG_IOMMU_DEFAULT_DMA to use IOMMU_DOMAIN_DMA but of course only if the device is capable of that. Does that sound reasonable?
On Mon, Nov 28, 2022 at 04:54:03PM +0100, Niklas Schnelle wrote: > I agree that there is currently a lack of distinction between what > domain types can be used (capability) and which should be used as > default (iommu.strict=<x>, iommu_set_...(), CONFIG_IOMMU_DEFAULT_DMA, > ops->def_domain_type.). What I would like to get to is the drivers only expose UNMANAGED, IDENTITY and BLOCKING domains. Everything that the DMA/FQ/etc domains were doing should be handled as some kind of cap. Eg, after Lu's series: https://lore.kernel.org/linux-iommu/20221128064648.1934720-1-baolu.lu@linux.intel.com/ We should be able to remove IOMMU_DOMAIN_DMA and its related from the drivers entirely. Instead drivers will provide UNMANAGED domains and dma-iommu.c will operate the UNMANAGED domain to provide the dma api. We can detect if the driver supports this by set_platform_dma() being NULL. A statement that a driver performs better using SQ/FQ/none should be something that is queried from the UNMANAGED domain as a guidance to dma-iommu.c what configuration to pick if not overriden. So, I would say what you want is some option flag, perhaps on the domain ops: 'domain performs better with SQ or FQ' > My case though is about the latter which I think has some undocumented > and surprising precedences built in at the moment. With this series we > can use all of IOMMU_DOMAIN_DMA(_FQ, _SQ) on any PCI device but we want > to default to either IOMMU_DOMAIN_DMA_FQ or IOMMU_DOMAIN_SQ based on > whether we're running in a paging hypervisor (z/VM or KVM) to get the > best performance. From a semantic point of view I felt that this is a > good match for ops->def_domain_type in that we pick a default but it's > still possible to change the domain type e.g. via sysfs. Now this had > the problem that ops->def_domain_type would cause IOMMU_DOMAIN_DMA_FQ > to be used even if iommu_set_dma_strict() was called (via > iommu.strict=1) because there is a undocumented override of ops- > >def_domain_type over iommu_def_domain_type which I believe comes from > the mixing of capability and default you also point at. Yeah, this does sounds troubled. > I think ideally we need two separate mechanism to determine which > domain types can be used for a particular device (capability) and for > which one to default to with the latter part having a clear precedence > between the options. Put together I think iommu.strict=1 should > override a device's preference (ops->def_domain_type) and > CONFIG_IOMMU_DEFAULT_DMA to use IOMMU_DOMAIN_DMA but of course only if > the device is capable of that. Does that sound reasonable? Using the language above, if someone asks for strict then the infrastructure should try to allocate an UNMANAGED domain, not an identity domain, and not use the lazy flush algorithms in dma-iommu.c Similarly if sysfs asks for lazy flush or identity maps then it should get it, always. The driver should have no say in how dma-iommu.c works beyond if it provides the required ops functionalities, and hint(s) as to what gives best performance. So anything that moves closer to this direction seems like a good choice to me. Jason
On 2022-11-28 15:54, Niklas Schnelle wrote: > On Mon, 2022-11-28 at 09:29 -0400, Jason Gunthorpe wrote: >> On Mon, Nov 28, 2022 at 12:10:39PM +0100, Niklas Schnelle wrote: >>> On Thu, 2022-11-17 at 09:55 +0800, Baolu Lu wrote: >>>> On 2022/11/17 1:16, Niklas Schnelle wrote: >>>>> When iommu.strict=1 is set or iommu_set_dma_strict() was called we >>>>> should use IOMMU_DOMAIN_DMA irrespective of ops->def_domain_type. >>>>> >>>>> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> >>>>> --- >>>>> drivers/iommu/iommu.c | 3 +++ >>>>> 1 file changed, 3 insertions(+) >>>>> >>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >>>>> index 65a3b3d886dc..d9bf94d198df 100644 >>>>> --- a/drivers/iommu/iommu.c >>>>> +++ b/drivers/iommu/iommu.c >>>>> @@ -1562,6 +1562,9 @@ static int iommu_get_def_domain_type(struct device *dev) >>>>> { >>>>> const struct iommu_ops *ops = dev_iommu_ops(dev); >>>>> >>>>> + if (iommu_dma_strict) >>>>> + return IOMMU_DOMAIN_DMA; >>>> >>>> If any quirky device must work in IOMMU identity mapping mode, this >>>> might introduce functional regression. At least for VT-d platforms, some >>>> devices do require IOMMU identity mapping mode for functionality. >>> >>> That's a good point. How about instead of unconditionally returning >>> IOMMU_DOMAIN_DMA we just do so if the domain type returned by ops- >>>> def_domain_type uses a flush queue (i.e. the __IOMMU_DOMAIN_DMA_FQ bit >>> is set). That way a device that only supports identity mapping gets to >>> set that but iommu_dma_strict at least always prevents use of an IOVA >>> flush queue. >> >> I would prefer we create some formal caps in iommu_ops to describe >> whatever it is you are trying to do. >> >> Jason > > I agree that there is currently a lack of distinction between what > domain types can be used (capability) and which should be used as > default (iommu.strict=<x>, iommu_set_...(), CONFIG_IOMMU_DEFAULT_DMA, > ops->def_domain_type.). As far as I'm concerned, the purpose of .def_domain_type is really just for quirks where the device needs an identity mapping, based on knowledge that tends to be sufficiently platform-specific that we prefer to delegate it to drivers. What apple-dart is doing is really just a workaround for not being to indicate per-instance domain type support at the point of the .domain_alloc call, and IIRC what mtk_iommu_v1 is doing is a horrible giant hack around the arch/arm DMA ops that don't understand IOMMU groups. Both of those situations are on the cards to be cleaned up, so don't take too much from them. > My case though is about the latter which I think has some undocumented > and surprising precedences built in at the moment. With this series we > can use all of IOMMU_DOMAIN_DMA(_FQ, _SQ) on any PCI device but we want > to default to either IOMMU_DOMAIN_DMA_FQ or IOMMU_DOMAIN_SQ based on > whether we're running in a paging hypervisor (z/VM or KVM) to get the > best performance. From a semantic point of view I felt that this is a > good match for ops->def_domain_type in that we pick a default but it's > still possible to change the domain type e.g. via sysfs. Now this had > the problem that ops->def_domain_type would cause IOMMU_DOMAIN_DMA_FQ > to be used even if iommu_set_dma_strict() was called (via > iommu.strict=1) because there is a undocumented override of ops- >> def_domain_type over iommu_def_domain_type which I believe comes from > the mixing of capability and default you also point at. > > I think ideally we need two separate mechanism to determine which > domain types can be used for a particular device (capability) and for > which one to default to with the latter part having a clear precedence > between the options. Put together I think iommu.strict=1 should > override a device's preference (ops->def_domain_type) and > CONFIG_IOMMU_DEFAULT_DMA to use IOMMU_DOMAIN_DMA but of course only if > the device is capable of that. Does that sound reasonable? That sounds like essentially what we already have, though. The current logic should be thus: 1: If the device is untrusted, it gets strict translation, nothing else. If that won't actually work, tough. 2: If .def_domain_type returns a specific type, it is because any other type will not work correctly at all, so we must use that. 3: Otherwise, we compute the user's preferred default type based on kernel config and command line options. 4: Then we determine whether the IOMMU driver actually supports that type, by trying to allocate it. If allocation fails and the preferred type was more relaxed than IOMMU_DOMAIN_DMA, fall back to the stricter type and try one last time. AFAICS the distinction and priority of those steps is pretty clear: 1: Core requirements 2: Driver-specific requirements 3: Core preference 4: Driver-specific support Now, for step 4 we *could* potentially use static capability flags in place of the "try allocating different things until one succeeds", but that doesn't change anything other than saving the repetitive boilerplate in everyone's .domain_alloc implementations. The real moral of the story here is not to express a soft preference where it will be interpreted as a hard requirement. Thanks, Robin.
On 2022-11-28 16:35, Jason Gunthorpe wrote: > On Mon, Nov 28, 2022 at 04:54:03PM +0100, Niklas Schnelle wrote: > >> I agree that there is currently a lack of distinction between what >> domain types can be used (capability) and which should be used as >> default (iommu.strict=<x>, iommu_set_...(), CONFIG_IOMMU_DEFAULT_DMA, >> ops->def_domain_type.). > > What I would like to get to is the drivers only expose UNMANAGED, > IDENTITY and BLOCKING domains. Everything that the DMA/FQ/etc domains > were doing should be handled as some kind of cap. > > Eg, after Lu's series: > > https://lore.kernel.org/linux-iommu/20221128064648.1934720-1-baolu.lu@linux.intel.com/ > > We should be able to remove IOMMU_DOMAIN_DMA and its related from the > drivers entirely. Instead drivers will provide UNMANAGED domains and > dma-iommu.c will operate the UNMANAGED domain to provide the dma > api. We can detect if the driver supports this by set_platform_dma() > being NULL. > > A statement that a driver performs better using SQ/FQ/none should be > something that is queried from the UNMANAGED domain as a guidance to > dma-iommu.c what configuration to pick if not overriden. Ack, I'm sure it could be cleaner overall if the driver capabilities didn't come in right at the end of the process with the .domain_alloc dance. As I've said before, I would still like to keep the domain types in the core code (since they already work as a set of capability flags), but drivers not having to deal with them directly would be good. Maybe we dedicate .domain_alloc to paging domains, and have separate device ops for .get_{blocking,identity}_domain, given that optimised implementations of those are likely to be static or at least per-instance. > So, I would say what you want is some option flag, perhaps on the > domain ops: 'domain performs better with SQ or FQ' Although for something that's likely to be global based on whether running virtualised or not, I'd be inclined to try pulling that as far as reasonably possible towards core code. >> My case though is about the latter which I think has some undocumented >> and surprising precedences built in at the moment. With this series we >> can use all of IOMMU_DOMAIN_DMA(_FQ, _SQ) on any PCI device but we want >> to default to either IOMMU_DOMAIN_DMA_FQ or IOMMU_DOMAIN_SQ based on >> whether we're running in a paging hypervisor (z/VM or KVM) to get the >> best performance. From a semantic point of view I felt that this is a >> good match for ops->def_domain_type in that we pick a default but it's >> still possible to change the domain type e.g. via sysfs. Now this had >> the problem that ops->def_domain_type would cause IOMMU_DOMAIN_DMA_FQ >> to be used even if iommu_set_dma_strict() was called (via >> iommu.strict=1) because there is a undocumented override of ops- >>> def_domain_type over iommu_def_domain_type which I believe comes from >> the mixing of capability and default you also point at. > > Yeah, this does sounds troubled. The initial assumption about .def_domain_type is incorrect, though. From there it's a straightforward path to the conclusion that introducing inconsistency (by using the wrong mechanism) leads to the presence of inconsistency. >> I think ideally we need two separate mechanism to determine which >> domain types can be used for a particular device (capability) and for >> which one to default to with the latter part having a clear precedence >> between the options. Put together I think iommu.strict=1 should >> override a device's preference (ops->def_domain_type) and >> CONFIG_IOMMU_DEFAULT_DMA to use IOMMU_DOMAIN_DMA but of course only if >> the device is capable of that. Does that sound reasonable? > > Using the language above, if someone asks for strict then the > infrastructure should try to allocate an UNMANAGED domain, not an > identity domain, Careful, "iommu.strict" refers specifically to the invalidation policy for DMA API domains, and I've tried to be careful to document it as such. It has never been defined to have any impact on anything other than DMA API domains, so I don't think any should be assumed. Control of the basic domain type (identity vs. translation) on the command line has always been via separate parameters, which I think have always had higher priority anyway. With sysfs you can ask for anything, but you'll still only get it if it's safe and guaranteed to work. > and not use the lazy flush algorithms in dma-iommu.c > > Similarly if sysfs asks for lazy flush or identity maps then it should > get it, always. I'm hardly an advocate for trying to save users from themselves, but I honestly can't see any justifiable reason for not having sysfs respect iommu_get_def_domain_type(). If a privileged user wants to screw up the system they're hardly short of options already. Far worse, though, is that someone nefarious would only need to popularise a "make external dGPUs and/or other popular accelerators faster on laptops" udev rule that forces identity domains via sysfs, and bye bye Thunderclap mitigations. > The driver should have no say in how dma-iommu.c works beyond if it > provides the required ops functionalities, and hint(s) as to what > gives best performance. That should already be the case today, as outlined in my other mail. It's just somewhat more evolved than designed, so may not be so clear to everyone. Thanks, Robin.
On Mon, Nov 28, 2022 at 09:01:43PM +0000, Robin Murphy wrote: > I'm hardly an advocate for trying to save users from themselves, but I > honestly can't see any justifiable reason for not having sysfs respect > iommu_get_def_domain_type(). We really need to rename this value if it is not actually just an advisory "default" but a functional requirement .. > > The driver should have no say in how dma-iommu.c works beyond if it > > provides the required ops functionalities, and hint(s) as to what > > gives best performance. > > That should already be the case today, as outlined in my other mail. It's > just somewhat more evolved than designed, so may not be so clear to > everyone. It is close to being clear, once we get the last touches of dma-iommu stuff out of the drivers it should be quite clear Thanks, Jason
On 2022-11-29 17:33, Jason Gunthorpe wrote: > On Mon, Nov 28, 2022 at 09:01:43PM +0000, Robin Murphy wrote: > >> I'm hardly an advocate for trying to save users from themselves, but I >> honestly can't see any justifiable reason for not having sysfs respect >> iommu_get_def_domain_type(). > > We really need to rename this value if it is not actually just an > advisory "default" but a functional requirement .. It represents a required default domain type. As in, the type for the device's default domain. Not the default type for a domain. It's the iommu_def_domain_type variable that holds the *default* default domain type ;) Which reminds me I should finish that patch undoing my terrible ops->default_domain_ops idea, not least because they are misleadingly unrelated to default domains... >>> The driver should have no say in how dma-iommu.c works beyond if it >>> provides the required ops functionalities, and hint(s) as to what >>> gives best performance. >> >> That should already be the case today, as outlined in my other mail. It's >> just somewhat more evolved than designed, so may not be so clear to >> everyone. > > It is close to being clear, once we get the last touches of dma-iommu > stuff out of the drivers it should be quite clear Cool, some upheaval of .domain_alloc is next on my hitlist anyway, so that might be a good excuse to upheave it a bit more and streamline the type stuff along the way. Cheers, Robin.
On Tue, Nov 29, 2022 at 06:41:22PM +0000, Robin Murphy wrote: > On 2022-11-29 17:33, Jason Gunthorpe wrote: > > On Mon, Nov 28, 2022 at 09:01:43PM +0000, Robin Murphy wrote: > > > > > I'm hardly an advocate for trying to save users from themselves, but I > > > honestly can't see any justifiable reason for not having sysfs respect > > > iommu_get_def_domain_type(). > > > > We really need to rename this value if it is not actually just an > > advisory "default" but a functional requirement .. > > It represents a required default domain type. As in, the type for the > device's default domain. Not the default type for a domain. It's the > iommu_def_domain_type variable that holds the *default* default domain type > ;) I find the name "default domain" incredibly confusing at this point in time. I would like to call that the "dma-api domain" - its primary purpose is to be the domain that the DMA API uses to operate the IOMMU, there is little "default" about it. This meshes better with our apis talking about ownership and so forth. So, if the op was called get_dma_api_domain_type() It is pretty clear that it is the exact type of domain that should be created to support the DMA API, which is what I think you have been describing it is supposed to do? And with Lu's series we have the set_platform_dma() (Lu perhaps you should call this set_platform_dma_api() to re-enforce it is about the DMA API, not some nebulous DMA thing) Which is basically the other way to configure the DMA API for operation. And encapsulating more of the logic to setup and manage the DMA API's domain into dma-iommu.c would also be helpful to understanding. > Which reminds me I should finish that patch undoing my terrible > ops->default_domain_ops idea, not least because they are misleadingly > unrelated to default domains... :) > > It is close to being clear, once we get the last touches of dma-iommu > > stuff out of the drivers it should be quite clear > > Cool, some upheaval of .domain_alloc is next on my hitlist anyway, so that > might be a good excuse to upheave it a bit more and streamline the type > stuff along the way. Yes, I think so. I want to tidy things a bit so adding this "user space" domain concept is a little nicer Jason
On 2022/11/30 4:09, Jason Gunthorpe wrote: > On Tue, Nov 29, 2022 at 06:41:22PM +0000, Robin Murphy wrote: >> On 2022-11-29 17:33, Jason Gunthorpe wrote: >>> On Mon, Nov 28, 2022 at 09:01:43PM +0000, Robin Murphy wrote: >>> >>>> I'm hardly an advocate for trying to save users from themselves, but I >>>> honestly can't see any justifiable reason for not having sysfs respect >>>> iommu_get_def_domain_type(). >>> >>> We really need to rename this value if it is not actually just an >>> advisory "default" but a functional requirement .. >> >> It represents a required default domain type. As in, the type for the >> device's default domain. Not the default type for a domain. It's the >> iommu_def_domain_type variable that holds the *default* default domain type >> ;) > > I find the name "default domain" incredibly confusing at this point in > time. > > I would like to call that the "dma-api domain" - its primary purpose > is to be the domain that the DMA API uses to operate the IOMMU, there > is little "default" about it. This meshes better with our apis talking > about ownership and so forth. > > So, if the op was called > get_dma_api_domain_type() > > It is pretty clear that it is the exact type of domain that should be > created to support the DMA API, which is what I think you have been > describing it is supposed to do? > > And with Lu's series we have the set_platform_dma() (Lu perhaps you > should call this set_platform_dma_api() to re-enforce it is about the > DMA API, not some nebulous DMA thing) Sure thing. It's more specific. > > Which is basically the other way to configure the DMA API for > operation. > > And encapsulating more of the logic to setup and manage the DMA API's > domain into dma-iommu.c would also be helpful to understanding. > >> Which reminds me I should finish that patch undoing my terrible >> ops->default_domain_ops idea, not least because they are misleadingly >> unrelated to default domains... > > :) > >>> It is close to being clear, once we get the last touches of dma-iommu >>> stuff out of the drivers it should be quite clear >> >> Cool, some upheaval of .domain_alloc is next on my hitlist anyway, so that >> might be a good excuse to upheave it a bit more and streamline the type >> stuff along the way. > > Yes, I think so. I want to tidy things a bit so adding this "user > space" domain concept is a little nicer > > Jason > -- Best regards, baolu
On Tue, 2022-11-29 at 16:09 -0400, Jason Gunthorpe wrote: > On Tue, Nov 29, 2022 at 06:41:22PM +0000, Robin Murphy wrote: > > On 2022-11-29 17:33, Jason Gunthorpe wrote: > > > On Mon, Nov 28, 2022 at 09:01:43PM +0000, Robin Murphy wrote: > > > > > > > I'm hardly an advocate for trying to save users from themselves, but I > > > > honestly can't see any justifiable reason for not having sysfs respect > > > > iommu_get_def_domain_type(). > > > > > > We really need to rename this value if it is not actually just an > > > advisory "default" but a functional requirement .. > > > > It represents a required default domain type. As in, the type for the > > device's default domain. Not the default type for a domain. It's the > > iommu_def_domain_type variable that holds the *default* default domain type > > ;) > > I find the name "default domain" incredibly confusing at this point in > time. Yes it definitely confused me as evidenced by this patch. > > I would like to call that the "dma-api domain" - its primary purpose > is to be the domain that the DMA API uses to operate the IOMMU, there > is little "default" about it. This meshes better with our apis talking > about ownership and so forth. > > So, if the op was called > get_dma_api_domain_type() > > It is pretty clear that it is the exact type of domain that should be > created to support the DMA API, which is what I think you have been > describing it is supposed to do? > > And with Lu's series we have the set_platform_dma() (Lu perhaps you > should call this set_platform_dma_api() to re-enforce it is about the > DMA API, not some nebulous DMA thing) > > Which is basically the other way to configure the DMA API for > operation. > > And encapsulating more of the logic to setup and manage the DMA API's > domain into dma-iommu.c would also be helpful to understanding. > > > Which reminds me I should finish that patch undoing my terrible > > ops->default_domain_ops idea, not least because they are misleadingly > > unrelated to default domains... > > :) > > > > It is close to being clear, once we get the last touches of dma-iommu > > > stuff out of the drivers it should be quite clear > > > > Cool, some upheaval of .domain_alloc is next on my hitlist anyway, so that > > might be a good excuse to upheave it a bit more and streamline the type > > stuff along the way. > > Yes, I think so. I want to tidy things a bit so adding this "user > space" domain concept is a little nicer > > Jason Ok, so it sounds to me like there is going to be quite a bit of change in this area. Thus I'm a little unsure however how to proceed here. I think especially with the proposed multiple queue types in this series it makes sense for an IOMMU driver to express its preference of a particular domain type if it supports multiple which clearly isn't what iommu_get_def_domain_type() is currently supposed to do. Looking at the current code I don't see a trivial way to integrate this especially with a lot of reworks going on. At the moment the preference for IOMMU_DOMAIN_DMA_FQ over IOMMU_DOMAIN_DMA during initial boot is hard coded in iommu_subsys_init() before we even know what IOMMU driver will be used. so can't really access its ops there. On the other hand this decision may still be rolled back if iommu_dma_init_fq() fails in iommu_dma_init_domain() so maybe it would make sense to assume a DMA domain type of IOMMU_DOMAIN_DMA until that point and only then prefer IOMMU_DOMAIN_DMA_FQ or something else if the driver has a preference. Maybe it would also make sense not to mix the type of flush queue used with the domain and maybe the queue type could be passed in via iommu_setup_dma_ops() -> iommu_dma_init_domain() though I do remember that there is also a planned rework for that. Any suggestions? Thanks, Niklas
On Mon, Dec 05, 2022 at 04:34:08PM +0100, Niklas Schnelle wrote: > On Tue, 2022-11-29 at 16:09 -0400, Jason Gunthorpe wrote: > > On Tue, Nov 29, 2022 at 06:41:22PM +0000, Robin Murphy wrote: > > > On 2022-11-29 17:33, Jason Gunthorpe wrote: > > > > On Mon, Nov 28, 2022 at 09:01:43PM +0000, Robin Murphy wrote: > > > > > > > > > I'm hardly an advocate for trying to save users from themselves, but I > > > > > honestly can't see any justifiable reason for not having sysfs respect > > > > > iommu_get_def_domain_type(). > > > > > > > > We really need to rename this value if it is not actually just an > > > > advisory "default" but a functional requirement .. > > > > > > It represents a required default domain type. As in, the type for the > > > device's default domain. Not the default type for a domain. It's the > > > iommu_def_domain_type variable that holds the *default* default domain type > > > ;) > > > > I find the name "default domain" incredibly confusing at this point in > > time. > > Yes it definitely confused me as evidenced by this patch. I did some fiddling what this could rename could look like and came up with this very sketchy sketch. I think it looks a lot clearer in the end but it seems a bit tricky to break things up into nice patches. Jason diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index beb9f535d3d815..052caf21fee3dd 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -34,7 +34,12 @@ static struct kset *iommu_group_kset; static DEFINE_IDA(iommu_group_ida); -static unsigned int iommu_def_domain_type __read_mostly; +static enum dma_api_policy { + DMA_API_POLICY_IDENTITY, + DMA_API_POLICY_STRICT, + DMA_API_POLICY_LAZY, + DMA_API_POLICY_ANY, +} dma_api_default_policy __read_mostly; static bool iommu_dma_strict __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_DMA_STRICT); static u32 iommu_cmd_line __read_mostly; @@ -82,7 +87,7 @@ static const char * const iommu_group_resv_type_string[] = { static int iommu_bus_notifier(struct notifier_block *nb, unsigned long action, void *data); static int iommu_alloc_default_domain(struct iommu_group *group, - struct device *dev); + enum dma_api_policy policy); static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus, unsigned type); static int __iommu_attach_device(struct iommu_domain *domain, @@ -97,6 +102,9 @@ static struct iommu_group *iommu_group_get_for_dev(struct device *dev); static ssize_t iommu_group_store_type(struct iommu_group *group, const char *buf, size_t count); +static enum dma_api_policy +iommu_get_dma_api_mandatory_policy(struct device *dev); + #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ struct iommu_group_attribute iommu_group_attr_##_name = \ __ATTR(_name, _mode, _show, _store) @@ -125,26 +133,6 @@ static struct bus_type * const iommu_buses[] = { #endif }; -/* - * Use a function instead of an array here because the domain-type is a - * bit-field, so an array would waste memory. - */ -static const char *iommu_domain_type_str(unsigned int t) -{ - switch (t) { - case IOMMU_DOMAIN_BLOCKED: - return "Blocked"; - case IOMMU_DOMAIN_IDENTITY: - return "Passthrough"; - case IOMMU_DOMAIN_UNMANAGED: - return "Unmanaged"; - case IOMMU_DOMAIN_DMA: - return "Translated"; - default: - return "Unknown"; - } -} - static int __init iommu_subsys_init(void) { struct notifier_block *nb; @@ -161,19 +149,27 @@ static int __init iommu_subsys_init(void) } } - if (!iommu_default_passthrough() && !iommu_dma_strict) - iommu_def_domain_type = IOMMU_DOMAIN_DMA_FQ; + if (dma_api_default_policy == DMA_API_POLICY_LAZY && iommu_dma_strict) + dma_api_default_policy = DMA_API_POLICY_STRICT; - pr_info("Default domain type: %s %s\n", - iommu_domain_type_str(iommu_def_domain_type), - (iommu_cmd_line & IOMMU_CMD_LINE_DMA_API) ? - "(set via kernel command line)" : ""); - - if (!iommu_default_passthrough()) - pr_info("DMA domain TLB invalidation policy: %s mode %s\n", + switch (dma_api_default_policy) { + case DMA_API_POLICY_LAZY: + case DMA_API_POLICY_STRICT: + pr_info("DMA translated domain TLB invalidation policy: %s mode %s\n", iommu_dma_strict ? "strict" : "lazy", (iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ? - "(set via kernel command line)" : ""); + "(set via kernel command line)" : + ""); + break; + case DMA_API_POLICY_IDENTITY: + pr_info("Default domain type: Passthrough %s\n", + (iommu_cmd_line & IOMMU_CMD_LINE_DMA_API) ? + "(set via kernel command line)" : + ""); + break; + default: + break; + } nb = kcalloc(ARRAY_SIZE(iommu_buses), sizeof(*nb), GFP_KERNEL); if (!nb) @@ -353,7 +349,8 @@ int iommu_probe_device(struct device *dev) * checked. */ mutex_lock(&group->mutex); - iommu_alloc_default_domain(group, dev); + iommu_alloc_default_domain(group, + iommu_get_dma_api_mandatory_policy(dev)); /* * If device joined an existing group which has been claimed, don't @@ -436,8 +433,8 @@ early_param("iommu.strict", iommu_dma_setup); void iommu_set_dma_strict(void) { iommu_dma_strict = true; - if (iommu_def_domain_type == IOMMU_DOMAIN_DMA_FQ) - iommu_def_domain_type = IOMMU_DOMAIN_DMA; + if (dma_api_default_policy == DMA_API_POLICY_LAZY) + dma_api_default_policy = DMA_API_POLICY_STRICT; } static ssize_t iommu_group_attr_show(struct kobject *kobj, @@ -1557,53 +1554,100 @@ struct iommu_group *fsl_mc_device_group(struct device *dev) } EXPORT_SYMBOL_GPL(fsl_mc_device_group); -static int iommu_get_def_domain_type(struct device *dev) +static enum dma_api_policy +iommu_get_dma_api_mandatory_policy(struct device *dev) { const struct iommu_ops *ops = dev_iommu_ops(dev); if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted) - return IOMMU_DOMAIN_DMA; + return DMA_API_POLICY_STRICT; if (ops->def_domain_type) return ops->def_domain_type(dev); - - return 0; + return DMA_API_POLICY_ANY; } -static int iommu_group_alloc_default_domain(struct bus_type *bus, - struct iommu_group *group, - unsigned int type) +static int __iommu_get_dma_api_group_mandatory_policy(struct device *dev, + void *data) { - struct iommu_domain *dom; + enum dma_api_policy *policy = data; + enum dma_api_policy dev_policy = + iommu_get_dma_api_mandatory_policy(dev); - dom = __iommu_domain_alloc(bus, type); - if (!dom && type != IOMMU_DOMAIN_DMA) { - dom = __iommu_domain_alloc(bus, IOMMU_DOMAIN_DMA); - if (dom) - pr_warn("Failed to allocate default IOMMU domain of type %u for group %s - Falling back to IOMMU_DOMAIN_DMA", - type, group->name); + if (dev_policy == DMA_API_POLICY_ANY || *policy == dev_policy) + return 0; + if (*policy == DMA_API_POLICY_ANY) { + *policy = dev_policy; + return 0; } - if (!dom) - return -ENOMEM; + dev_warn( + dev, + "IOMMU driver is requesting different DMA API policies for devices in the same group %s.", + dev->iommu_group->name); - group->default_domain = dom; - if (!group->domain) - group->domain = dom; + /* + * Resolve the conflict by priority, the lower numbers in the enum are + * higher preference in case of driver problems. + */ + if (dev_policy < *policy) + *policy = dev_policy; return 0; } +static enum dma_api_policy +iommu_get_dma_api_group_mandatory_policy(struct iommu_group *group) +{ + enum dma_api_policy policy = DMA_API_POLICY_ANY; + + __iommu_group_for_each_dev(group, &policy, + __iommu_get_dma_api_group_mandatory_policy); + return policy; +} + static int iommu_alloc_default_domain(struct iommu_group *group, - struct device *dev) + enum dma_api_policy policy) { - unsigned int type; + struct iommu_domain *domain; + struct group_device *grp_dev = + list_first_entry(&group->devices, struct group_device, list); + struct bus_type *bus = grp_dev->dev->bus; + + lockdep_assert_held(group->mutex); if (group->default_domain) return 0; + if (policy == DMA_API_POLICY_ANY) + policy = dma_api_default_policy; + switch (policy) { + case DMA_API_POLICY_IDENTITY: + domain = __iommu_domain_alloc(bus, IOMMU_DOMAIN_IDENTITY); + break; + + case DMA_API_POLICY_LAZY: + domain = __iommu_domain_alloc(bus, IOMMU_DOMAIN_DMA); + if (domain && !domain->ops->prefer_dma_api_fq){ + pr_warn("Failed to allocate default lazy IOMMU domain for group %s - Falling back to strict", + group->name); + } + break; - type = iommu_get_def_domain_type(dev) ? : iommu_def_domain_type; + case DMA_API_POLICY_STRICT: + domain = __iommu_domain_alloc(bus, IOMMU_DOMAIN_DMA); + break; - return iommu_group_alloc_default_domain(dev->bus, group, type); + default: + WARN_ON(true); + domain = NULL; + } + + if (!domain) + return -ENOMEM; + + group->default_domain = domain; + if (!group->domain) + group->domain = domain; + return 0; } /** @@ -1688,52 +1732,6 @@ static int iommu_bus_notifier(struct notifier_block *nb, return 0; } -struct __group_domain_type { - struct device *dev; - unsigned int type; -}; - -static int probe_get_default_domain_type(struct device *dev, void *data) -{ - struct __group_domain_type *gtype = data; - unsigned int type = iommu_get_def_domain_type(dev); - - if (type) { - if (gtype->type && gtype->type != type) { - dev_warn(dev, "Device needs domain type %s, but device %s in the same iommu group requires type %s - using default\n", - iommu_domain_type_str(type), - dev_name(gtype->dev), - iommu_domain_type_str(gtype->type)); - gtype->type = 0; - } - - if (!gtype->dev) { - gtype->dev = dev; - gtype->type = type; - } - } - - return 0; -} - -static void probe_alloc_default_domain(struct bus_type *bus, - struct iommu_group *group) -{ - struct __group_domain_type gtype; - - memset(>ype, 0, sizeof(gtype)); - - /* Ask for default domain requirements of all devices in the group */ - __iommu_group_for_each_dev(group, >ype, - probe_get_default_domain_type); - - if (!gtype.type) - gtype.type = iommu_def_domain_type; - - iommu_group_alloc_default_domain(bus, group, gtype.type); - -} - static int iommu_group_do_dma_attach(struct device *dev, void *data) { struct iommu_domain *domain = data; @@ -1804,7 +1802,8 @@ int bus_iommu_probe(struct bus_type *bus) mutex_lock(&group->mutex); /* Try to allocate default domain */ - probe_alloc_default_domain(bus, group); + iommu_alloc_default_domain( + group, iommu_get_dma_api_group_mandatory_policy(group)); if (!group->default_domain) { mutex_unlock(&group->mutex); @@ -2600,19 +2599,19 @@ void iommu_set_default_passthrough(bool cmd_line) { if (cmd_line) iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; - iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; + dma_api_default_policy = DMA_API_POLICY_IDENTITY; } void iommu_set_default_translated(bool cmd_line) { if (cmd_line) iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; - iommu_def_domain_type = IOMMU_DOMAIN_DMA; + dma_api_default_policy = DMA_API_POLICY_STRICT; } bool iommu_default_passthrough(void) { - return iommu_def_domain_type == IOMMU_DOMAIN_IDENTITY; + return dma_api_default_policy == DMA_API_POLICY_IDENTITY; } EXPORT_SYMBOL_GPL(iommu_default_passthrough); @@ -2832,103 +2831,40 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid); * group's default domain type through /sys/kernel/iommu_groups/<grp_id>/type * Please take a closer look if intended to use for other purposes. */ -static int iommu_change_dev_def_domain(struct iommu_group *group, - struct device *prev_dev, int type) +static int iommu_change_group_dma_api_policy(struct iommu_group *group, + enum dma_api_policy policy) { - struct iommu_domain *prev_dom; - struct group_device *grp_dev; - int ret, dev_def_dom; - struct device *dev; - - mutex_lock(&group->mutex); - - if (group->default_domain != group->domain) { - dev_err_ratelimited(prev_dev, "Group not assigned to default domain\n"); - ret = -EBUSY; - goto out; - } - - /* - * iommu group wasn't locked while acquiring device lock in - * iommu_group_store_type(). So, make sure that the device count hasn't - * changed while acquiring device lock. - * - * Changing default domain of an iommu group with two or more devices - * isn't supported because there could be a potential deadlock. Consider - * the following scenario. T1 is trying to acquire device locks of all - * the devices in the group and before it could acquire all of them, - * there could be another thread T2 (from different sub-system and use - * case) that has already acquired some of the device locks and might be - * waiting for T1 to release other device locks. - */ - if (iommu_group_device_count(group) != 1) { - dev_err_ratelimited(prev_dev, "Cannot change default domain: Group has more than one device\n"); - ret = -EINVAL; - goto out; - } - - /* Since group has only one device */ - grp_dev = list_first_entry(&group->devices, struct group_device, list); - dev = grp_dev->dev; - - if (prev_dev != dev) { - dev_err_ratelimited(prev_dev, "Cannot change default domain: Device has been changed\n"); - ret = -EBUSY; - goto out; - } + enum dma_api_policy mandatory = + iommu_get_dma_api_group_mandatory_policy(group); + struct iommu_domain *old_domain; + int ret; - prev_dom = group->default_domain; - if (!prev_dom) { - ret = -EINVAL; - goto out; - } + if (mandatory != DMA_API_POLICY_ANY && policy != mandatory) + return -EINVAL; - dev_def_dom = iommu_get_def_domain_type(dev); - if (!type) { - /* - * If the user hasn't requested any specific type of domain and - * if the device supports both the domains, then default to the - * domain the device was booted with - */ - type = dev_def_dom ? : iommu_def_domain_type; - } else if (dev_def_dom && type != dev_def_dom) { - dev_err_ratelimited(prev_dev, "Device cannot be in %s domain\n", - iommu_domain_type_str(type)); - ret = -EINVAL; - goto out; - } + mutex_lock(&group->mutex); - /* - * Switch to a new domain only if the requested domain type is different - * from the existing default domain type - */ - if (prev_dom->type == type) { - ret = 0; - goto out; + /* Shortcut if FQ is being enabled */ + if (group->default_domain->type == IOMMU_DOMAIN_DMA && + policy == DMA_API_POLICY_LAZY) { + ret = iommu_dma_init_fq(group->default_domain); + if (!ret) { + group->default_domain->type = IOMMU_DOMAIN_DMA_FQ; + mutex_unlock(&group->mutex); + return 0; + } } - /* We can bring up a flush queue without tearing down the domain */ - if (type == IOMMU_DOMAIN_DMA_FQ && prev_dom->type == IOMMU_DOMAIN_DMA) { - ret = iommu_dma_init_fq(prev_dom); - if (!ret) - prev_dom->type = IOMMU_DOMAIN_DMA_FQ; - goto out; + /* Otherwise just replace the whole domain */ + old_domain = group->default_domain; + group->default_domain = NULL; + ret = iommu_alloc_default_domain(group, policy); + if (ret) { + group->default_domain = old_domain; + mutex_unlock(&group->mutex); + return ret; } - - /* Sets group->default_domain to the newly allocated domain */ - ret = iommu_group_alloc_default_domain(dev->bus, group, type); - if (ret) - goto out; - - ret = iommu_create_device_direct_mappings(group, dev); - if (ret) - goto free_new_domain; - - ret = __iommu_attach_device(group->default_domain, dev); - if (ret) - goto free_new_domain; - - group->domain = group->default_domain; + iommu_group_create_direct_mappings(group); /* * Release the mutex here because ops->probe_finalize() call-back of @@ -2938,20 +2874,15 @@ static int iommu_change_dev_def_domain(struct iommu_group *group, */ mutex_unlock(&group->mutex); + /* + * FIXME: How does iommu_setup_dma_ops() get called with the new domain + * on ARM? + */ + /* Make sure dma_ops is appropriatley set */ - iommu_group_do_probe_finalize(dev, group->default_domain); - iommu_domain_free(prev_dom); + __iommu_group_dma_finalize(group); + iommu_domain_free(old_domain); return 0; - -free_new_domain: - iommu_domain_free(group->default_domain); - group->default_domain = prev_dom; - group->domain = prev_dom; - -out: - mutex_unlock(&group->mutex); - - return ret; } /* @@ -2966,9 +2897,8 @@ static int iommu_change_dev_def_domain(struct iommu_group *group, static ssize_t iommu_group_store_type(struct iommu_group *group, const char *buf, size_t count) { - struct group_device *grp_dev; - struct device *dev; - int ret, req_type; + enum dma_api_policy policy; + int ret; if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)) return -EACCES; @@ -2977,77 +2907,30 @@ static ssize_t iommu_group_store_type(struct iommu_group *group, return -EINVAL; if (sysfs_streq(buf, "identity")) - req_type = IOMMU_DOMAIN_IDENTITY; + policy = DMA_API_POLICY_IDENTITY; else if (sysfs_streq(buf, "DMA")) - req_type = IOMMU_DOMAIN_DMA; + policy = DMA_API_POLICY_STRICT; else if (sysfs_streq(buf, "DMA-FQ")) - req_type = IOMMU_DOMAIN_DMA_FQ; + policy = DMA_API_POLICY_LAZY; else if (sysfs_streq(buf, "auto")) - req_type = 0; + policy = DMA_API_POLICY_ANY; else return -EINVAL; /* - * Lock/Unlock the group mutex here before device lock to - * 1. Make sure that the iommu group has only one device (this is a - * prerequisite for step 2) - * 2. Get struct *dev which is needed to lock device - */ - mutex_lock(&group->mutex); - if (iommu_group_device_count(group) != 1) { - mutex_unlock(&group->mutex); - pr_err_ratelimited("Cannot change default domain: Group has more than one device\n"); - return -EINVAL; - } - - /* Since group has only one device */ - grp_dev = list_first_entry(&group->devices, struct group_device, list); - dev = grp_dev->dev; - get_device(dev); - - /* - * Don't hold the group mutex because taking group mutex first and then - * the device lock could potentially cause a deadlock as below. Assume - * two threads T1 and T2. T1 is trying to change default domain of an - * iommu group and T2 is trying to hot unplug a device or release [1] VF - * of a PCIe device which is in the same iommu group. T1 takes group - * mutex and before it could take device lock assume T2 has taken device - * lock and is yet to take group mutex. Now, both the threads will be - * waiting for the other thread to release lock. Below, lock order was - * suggested. - * device_lock(dev); - * mutex_lock(&group->mutex); - * iommu_change_dev_def_domain(); - * mutex_unlock(&group->mutex); - * device_unlock(dev); - * - * [1] Typical device release path - * device_lock() from device/driver core code - * -> bus_notifier() - * -> iommu_bus_notifier() - * -> iommu_release_device() - * -> ops->release_device() vendor driver calls back iommu core code - * -> mutex_lock() from iommu core code + * Taking ownership disables the DMA API domain, prevents drivers from + * being attached, and switches to a blocking domain. Releasing + * ownership will put back the new or original DMA API domain. */ - mutex_unlock(&group->mutex); - - /* Check if the device in the group still has a driver bound to it */ - device_lock(dev); - if (device_is_bound(dev) && !(req_type == IOMMU_DOMAIN_DMA_FQ && - group->default_domain->type == IOMMU_DOMAIN_DMA)) { - pr_err_ratelimited("Device is still bound to driver\n"); - ret = -EBUSY; - goto out; - } - - ret = iommu_change_dev_def_domain(group, dev, req_type); - ret = ret ?: count; - -out: - device_unlock(dev); - put_device(dev); + ret = iommu_group_claim_dma_owner(group, &ret); + if (ret) + return ret; - return ret; + ret = iommu_change_group_dma_api_policy(group, policy); + iommu_group_release_dma_owner(group); + if (ret) + return ret; + return count; } static bool iommu_is_default_domain(struct iommu_group *group)
On 2022/12/7 7:09, Jason Gunthorpe wrote: > static ssize_t iommu_group_store_type(struct iommu_group *group, > const char *buf, size_t count) > { > - struct group_device *grp_dev; > - struct device *dev; > - int ret, req_type; > + enum dma_api_policy policy; > + int ret; > > if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)) > return -EACCES; > @@ -2977,77 +2907,30 @@ static ssize_t iommu_group_store_type(struct iommu_group *group, > return -EINVAL; > > if (sysfs_streq(buf, "identity")) > - req_type = IOMMU_DOMAIN_IDENTITY; > + policy = DMA_API_POLICY_IDENTITY; > else if (sysfs_streq(buf, "DMA")) > - req_type = IOMMU_DOMAIN_DMA; > + policy = DMA_API_POLICY_STRICT; > else if (sysfs_streq(buf, "DMA-FQ")) > - req_type = IOMMU_DOMAIN_DMA_FQ; > + policy = DMA_API_POLICY_LAZY; > else if (sysfs_streq(buf, "auto")) > - req_type = 0; > + policy = DMA_API_POLICY_ANY; > else > return -EINVAL; > > /* > - * Lock/Unlock the group mutex here before device lock to > - * 1. Make sure that the iommu group has only one device (this is a > - * prerequisite for step 2) > - * 2. Get struct *dev which is needed to lock device > - */ > - mutex_lock(&group->mutex); > - if (iommu_group_device_count(group) != 1) { > - mutex_unlock(&group->mutex); > - pr_err_ratelimited("Cannot change default domain: Group has more than one device\n"); > - return -EINVAL; > - } > - > - /* Since group has only one device */ > - grp_dev = list_first_entry(&group->devices, struct group_device, list); > - dev = grp_dev->dev; > - get_device(dev); > - > - /* > - * Don't hold the group mutex because taking group mutex first and then > - * the device lock could potentially cause a deadlock as below. Assume > - * two threads T1 and T2. T1 is trying to change default domain of an > - * iommu group and T2 is trying to hot unplug a device or release [1] VF > - * of a PCIe device which is in the same iommu group. T1 takes group > - * mutex and before it could take device lock assume T2 has taken device > - * lock and is yet to take group mutex. Now, both the threads will be > - * waiting for the other thread to release lock. Below, lock order was > - * suggested. > - * device_lock(dev); > - * mutex_lock(&group->mutex); > - * iommu_change_dev_def_domain(); > - * mutex_unlock(&group->mutex); > - * device_unlock(dev); > - * > - * [1] Typical device release path > - * device_lock() from device/driver core code > - * -> bus_notifier() > - * -> iommu_bus_notifier() > - * -> iommu_release_device() > - * -> ops->release_device() vendor driver calls back iommu core code > - * -> mutex_lock() from iommu core code > + * Taking ownership disables the DMA API domain, prevents drivers from > + * being attached, and switches to a blocking domain. Releasing > + * ownership will put back the new or original DMA API domain. > */ > - mutex_unlock(&group->mutex); > - > - /* Check if the device in the group still has a driver bound to it */ > - device_lock(dev); With device_lock() removed, this probably races with the iommu_release_device() path? group->mutex seems insufficient to avoid the race. Perhaps I missed anything. > - if (device_is_bound(dev) && !(req_type == IOMMU_DOMAIN_DMA_FQ && > - group->default_domain->type == IOMMU_DOMAIN_DMA)) { > - pr_err_ratelimited("Device is still bound to driver\n"); > - ret = -EBUSY; > - goto out; > - } > - > - ret = iommu_change_dev_def_domain(group, dev, req_type); > - ret = ret ?: count; > - > -out: > - device_unlock(dev); > - put_device(dev); > + ret = iommu_group_claim_dma_owner(group, &ret); > + if (ret) > + return ret; > > - return ret; > + ret = iommu_change_group_dma_api_policy(group, policy); > + iommu_group_release_dma_owner(group); > + if (ret) > + return ret; > + return count; > } > > static bool iommu_is_default_domain(struct iommu_group *group) Best regards, baolu
On Wed, Dec 07, 2022 at 09:18:19PM +0800, Baolu Lu wrote: > > - /* Check if the device in the group still has a driver bound to it */ > > - device_lock(dev); > > With device_lock() removed, this probably races with the > iommu_release_device() path? group->mutex seems insufficient to avoid > the race. Perhaps I missed anything. This path only deals with group, so there is no 'dev' and no race with removal. Later on we obtain the group mutex and then extract the first device from the group list as a representative device of the group - eg to perform iommu_domain allocation. Under the group mutex devices on the device list cannot become invalid. It is the same reasoning we use in other places that iterate over the group device list under lock. Jason
On 2022-12-07 13:23, Jason Gunthorpe wrote: > On Wed, Dec 07, 2022 at 09:18:19PM +0800, Baolu Lu wrote: > >>> - /* Check if the device in the group still has a driver bound to it */ >>> - device_lock(dev); >> >> With device_lock() removed, this probably races with the >> iommu_release_device() path? group->mutex seems insufficient to avoid >> the race. Perhaps I missed anything. > > This path only deals with group, so there is no 'dev' and no race with > removal. If we can now use the ownership mechanism to enforce the required constraints for change_dev_def_domain, that would be worthwhile (and a lot clearer) as a separate patch in its own right. Thanks, Robin. > Later on we obtain the group mutex and then extract the first device > from the group list as a representative device of the group - eg to > perform iommu_domain allocation. > > Under the group mutex devices on the device list cannot become > invalid. > > It is the same reasoning we use in other places that iterate over the > group device list under lock. > > Jason
On Wed, Dec 07, 2022 at 02:18:36PM +0000, Robin Murphy wrote: > On 2022-12-07 13:23, Jason Gunthorpe wrote: > > On Wed, Dec 07, 2022 at 09:18:19PM +0800, Baolu Lu wrote: > > > > > > - /* Check if the device in the group still has a driver bound to it */ > > > > - device_lock(dev); > > > > > > With device_lock() removed, this probably races with the > > > iommu_release_device() path? group->mutex seems insufficient to avoid > > > the race. Perhaps I missed anything. > > > > This path only deals with group, so there is no 'dev' and no race with > > removal. > > If we can now use the ownership mechanism to enforce the required > constraints for change_dev_def_domain, that would be worthwhile (and a lot > clearer) as a separate patch in its own right. Oh for sure, this is just a brain dump to share I have a few other patches streamlining things in this file, just need time... Jason
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 65a3b3d886dc..d9bf94d198df 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1562,6 +1562,9 @@ static int iommu_get_def_domain_type(struct device *dev) { const struct iommu_ops *ops = dev_iommu_ops(dev); + if (iommu_dma_strict) + return IOMMU_DOMAIN_DMA; + if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted) return IOMMU_DOMAIN_DMA;