Message ID | 0875479d24a53670e17db8a11945664a6bb4a25b.1674849118.git.nicolinc@nvidia.com |
---|---|
State | New |
Headers |
From: Nicolin Chen <nicolinc@nvidia.com>
To: <jgg@nvidia.com>, <kevin.tian@intel.com>, <robin.murphy@arm.com>, <joro@8bytes.org>, <will@kernel.org>, <agross@kernel.org>, <andersson@kernel.org>, <konrad.dybcio@linaro.org>, <yong.wu@mediatek.com>, <matthias.bgg@gmail.com>, <thierry.reding@gmail.com>, <alex.williamson@redhat.com>, <cohuck@redhat.com>
Cc: <vdumpa@nvidia.com>, <jonathanh@nvidia.com>, <iommu@lists.linux.dev>, <linux-kernel@vger.kernel.org>, <linux-arm-msm@vger.kernel.org>, <linux-mediatek@lists.infradead.org>, <linux-arm-kernel@lists.infradead.org>, <linux-tegra@vger.kernel.org>, <kvm@vger.kernel.org>
Subject: [PATCH 1/4] iommu: Add a broken_unmanaged_domain flag in iommu_ops
Date: Fri, 27 Jan 2023 12:04:17 -0800
Message-ID: <0875479d24a53670e17db8a11945664a6bb4a25b.1674849118.git.nicolinc@nvidia.com>
In-Reply-To: <cover.1674849118.git.nicolinc@nvidia.com>
References: <cover.1674849118.git.nicolinc@nvidia.com>
X-Mailer: git-send-email 2.39.1
List-ID: <linux-kernel.vger.kernel.org> |
Series | iommu: Reject drivers with broken_unmanaged_domain |
Commit Message
Nicolin Chen
Jan. 27, 2023, 8:04 p.m. UTC
Both IOMMU_DOMAIN_UNMANAGED and IOMMU_DOMAIN_DMA require support for the __IOMMU_DOMAIN_PAGING capability, i.e. iommu_map/unmap. However, some older iommu drivers do not fully support that, and these drivers also do not advertise support for dma-iommu.c via IOMMU_DOMAIN_DMA, or use arm_iommu_create_mapping(), so their implementations of IOMMU_DOMAIN_UNMANAGED are largely untested. This means that a user like vfio/iommufd is unlikely to work with them.

Several of them have obvious problems:
* fsl_pamu_domain.c
  Without map/unmap ops in the default_domain_ops, it isn't an
  unmanaged domain at all.
* mtk_iommu_v1.c
  With a fixed 4M "pagetable", it can only map exactly 4G of
  memory, but doesn't set the aperture.
* tegra-gart.c
  Its notion of attach/detach and groups has to be a complete lie to
  get around all the other API expectations.

Some others might work but have never been tested with vfio/iommufd:
* msm_iommu.c
* omap-iommu.c
* tegra-smmu.c

Thus, mark all these drivers as having "broken" UNMANAGED domains and add a new device_iommu_unmanaged_supported() API for vfio/iommufd and dma-iommu to refuse to work with these drivers.

Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/fsl_pamu_domain.c |  1 +
 drivers/iommu/iommu.c           | 24 ++++++++++++++++++++++++
 drivers/iommu/msm_iommu.c       |  1 +
 drivers/iommu/mtk_iommu_v1.c    |  1 +
 drivers/iommu/omap-iommu.c      |  1 +
 drivers/iommu/tegra-gart.c      |  1 +
 drivers/iommu/tegra-smmu.c      |  1 +
 include/linux/iommu.h           | 11 +++++++++++
 8 files changed, 41 insertions(+)
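To show the intended call pattern, here is a minimal consumer-side sketch: a user such as vfio or iommufd would call the new device_iommu_unmanaged_supported() before handing a device to userspace. The example_* function name and the error value are illustrative only and are not part of the posted series.

#include <linux/device.h>
#include <linux/errno.h>
#include <linux/iommu.h>

/*
 * Hypothetical consumer-side gate (the function name is made up): refuse to
 * hand a device to userspace when its IOMMU driver cannot back a fully
 * functional IOMMU_DOMAIN_UNMANAGED, i.e. it set broken_unmanaged_domain.
 */
static int example_claim_device_for_userspace(struct device *dev)
{
        if (!device_iommu_unmanaged_supported(dev))
                return -EINVAL;

        /* ... continue with iommu_domain_alloc() and attach as usual ... */
        return 0;
}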
Comments
On 2023-01-27 20:04, Nicolin Chen wrote:
> Both IOMMU_DOMAIN_UNMANAGED and IOMMU_DOMAIN_DMA require the support
> of __IOMMU_DOMAIN_PAGING capability, i.e. iommu_map/unmap. However,
> some older iommu drivers do not fully support that, and these drivers
> also do not advertise support for dma-iommu.c via IOMMU_DOMAIN_DMA,
> or use arm_iommu_create_mapping(), so largely their implementations
> of IOMMU_DOMAIN_UNMANAGED are untested. This means that a user like
> vfio/iommufd does not likely work with them.
>
> Several of them have obvious problems:
> * fsl_pamu_domain.c
>   Without map/unmap ops in the default_domain_ops, it isn't an
>   unmanaged domain at all.
> * mtk_iommu_v1.c
>   With a fixed 4M "pagetable", it can only map exactly 4G of
>   memory, but doesn't set the aperture.

The aperture is easily fixed (one could argue that what's broken there are the ARM DMA ops for assuming every IOMMU has a 32-bit IOVA space and not checking).

> * tegra-gart.c
>   Its notion of attach/detach and groups has to be a complete lie to
>   get around all the other API expectations.

That's true, and the domain is tiny and not isolated from the rest of the address space outside the aperture, but the one thing it does do is support iommu_map/unmap just fine, which is what this flag is documented as saying it doesn't.

> Some others might work but have never been tested with vfio/iommufd:
> * msm_iommu.c
> * omap-iommu.c
> * tegra-smmu.c

And yet they all have other in-tree users (GPUs on MSM and Tegra, remoteproc on OMAP) that allocate unmanaged domains and use iommu_map/unmap just fine, so they're clearly not broken either.

On the flipside, you're also missing cases like apple-dart, which can have broken unmanaged domains by any definition, but only under certain conditions (at least it "fails safe" and they will refuse attempts to attach anything). I'd also question sprd-iommu, which hardly has a generally-useful domain size, and has only just recently gained the ability to unmap anything successfully. TBH none of the SoC IOMMUs are likely to ever be of interest to VFIO or IOMMUFD, since the only things they could assign to userspace are the individual devices - usually graphics and media engines - that they're coupled to, whose useful functionality tends to depend on clocks, phys, and random other low-level stuff that would be somewhere between impractical and downright unsafe to attempt to somehow expose as well.

> Thus, mark all these drivers as having "broken" UNAMANGED domains and
> add a new device_iommu_unmanaged_supported() API for vfio/iommufd and
> dma-iommu to refuse to work with these drivers.
>
> Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>

[...]

> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 46e1347bfa22..919a5dbad75b 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -245,6 +245,10 @@ struct iommu_iotlb_gather {
>   *              pasid, so that any DMA transactions with this pasid
>   *              will be blocked by the hardware.
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @broken_unmanaged_domain: IOMMU_DOMAIN_UNMANAGED is not fully functional; the
> + *              driver does not really support iommu_map/unmap, but
> + *              uses UNMANAGED domains for the IOMMU API, called by
> + *              other SOC drivers.

"uses UNMANAGED domains for the IOMMU API" is literally the definition of unmanaged domains :/

Some "other SOC drivers" use more of the IOMMU API than VFIO does :/

Please just add IOMMU_CAP_IOMMUFD to represent whatever the nebulous requirements of IOMMUFD actually are (frankly it's no less informative than calling domains "broken"), handle that in the drivers you care about and have tested, and use device_iommu_capable(). What you're describing in this series is a capability, and we have a perfectly good API for drivers to express those already. Plus, as demonstrated above, a positive capability based on empirical testing will be infinitely more robust than a negative one based on guessing.

Thanks,
Robin.
Hi Robin. On Fri, Jan 27, 2023 at 09:58:46PM +0000, Robin Murphy wrote: > External email: Use caution opening links or attachments > > > On 2023-01-27 20:04, Nicolin Chen wrote: > > Both IOMMU_DOMAIN_UNMANAGED and IOMMU_DOMAIN_DMA require the support > > of __IOMMU_DOMAIN_PAGING capability, i.e. iommu_map/unmap. However, > > some older iommu drivers do not fully support that, and these drivers > > also do not advertise support for dma-iommu.c via IOMMU_DOMAIN_DMA, > > or use arm_iommu_create_mapping(), so largely their implementations > > of IOMMU_DOMAIN_UNMANAGED are untested. This means that a user like > > vfio/iommufd does not likely work with them. > > > > Several of them have obvious problems: > > * fsl_pamu_domain.c > > Without map/unmap ops in the default_domain_ops, it isn't an > > unmanaged domain at all. > > * mtk_iommu_v1.c > > With a fixed 4M "pagetable", it can only map exactly 4G of > > memory, but doesn't set the aperture. > > The aperture is easily fixed (one could argue that what's broken there > are the ARM DMA ops for assuming every IOMMU has a 32-bit IOVA space and > not checking). > > > * tegra-gart.c > > Its notion of attach/detach and groups has to be a complete lie to > > get around all the other API expectations. > > That's true, and the domain is tiny and not isolated from the rest of > the address space outside the aperture, but the one thing it does do is > support iommu_map/unmap just fine, which is what this flag is documented > as saying it doesn't. > > > Some others might work but have never been tested with vfio/iommufd: > > * msm_iommu.c > > * omap-iommu.c > > * tegra-smmu.c > > And yet they all have other in-tree users (GPUs on MSM and Tegra, > remoteproc on OMAP) that allocate unmanaged domains and use > iommu_map/unmap just fine, so they're clearly not broken either. > > On the flipside, you're also missing cases like apple-dart, which can > have broken unmanaged domains by any definition, but only under certain > conditions (at least it "fails safe" and they will refuse attempts to > attach anything). I'd also question sprd-iommu, which hardly has a > generally-useful domain size, and has only just recently gained the > ability to unmap anything successfully. TBH none of the SoC IOMMUs are > likely to ever be of interest to VFIO or IOMMUFD, since the only things > they could assign to userspace are the individual devices - usually > graphics and media engines - that they're coupled to, whose useful > functionality tends to depend on clocks, phys, and random other > low-level stuff that would be somewhere between impractical and > downright unsafe to attempt to somehow expose as well. Thanks for all the inputs. > > Thus, mark all these drivers as having "broken" UNAMANGED domains and > > add a new device_iommu_unmanaged_supported() API for vfio/iommufd and > > dma-iommu to refuse to work with these drivers. > > > > Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> > > [...] > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > > index 46e1347bfa22..919a5dbad75b 100644 > > --- a/include/linux/iommu.h > > +++ b/include/linux/iommu.h > > @@ -245,6 +245,10 @@ struct iommu_iotlb_gather { > > * pasid, so that any DMA transactions with this pasid > > * will be blocked by the hardware. 
> > * @pgsize_bitmap: bitmap of all possible supported page sizes > > + * @broken_unmanaged_domain: IOMMU_DOMAIN_UNMANAGED is not fully functional; the > > + * driver does not really support iommu_map/unmap, but > > + * uses UNMANAGED domains for the IOMMU API, called by > > + * other SOC drivers. > > "uses UNMANAGED domains for the IOMMU API" is literally the definition > of unmanaged domains :/ > > Some "other SOC drivers" use more of the IOMMU API than VFIO does :/ > > Please just add IOMMU_CAP_IOMMUFD to represent whatever the nebulous > requirements of IOMMUFD actually are (frankly it's no less informative > than calling domains "broken"), handle that in the drivers you care > about and have tested, and use device_iommu_capable(). What you're > describing in this series is a capability, and we have a perfectly good > API for drivers to express those already. Plus, as demonstrated above, a > positive capability based on empirical testing will be infinitely more > robust than a negative one based on guessing. OK. I can change to IOMMU_CAP_IOMMUFD, and add to the drivers that are tested. And an IOMMU driver that wants to use IOMMUFD can add such a CAP later whenever it's ready. Yet, "IOMMU_CAP_IOMMUFD" would make the VFIO change suspicious, so perhaps the next version is just one CAP patch + one IOMMUFD patch. @Jason, any concern? Thank you Nicolin
On Fri, Jan 27, 2023 at 09:58:46PM +0000, Robin Murphy wrote: > Please just add IOMMU_CAP_IOMMUFD to represent whatever the nebulous > requirements of IOMMUFD actually are (frankly it's no less informative than > calling domains "broken"), handle that in the drivers you care about > and I don't want to tie this to iommufd, that isn't the point. We clearly have drivers that don't implement the iommu kernel API properly, because their only use is between the iommu driver and some other same-SOC driver. As a user of the iommu API iommufd and VFIO want to avoid these drivers. We have that acknowledgment already via the IOMMU_DOMAIN_DMA stuff protecting the dma_iommu.c from those same drivers. So, how about this below instead. Imagine it is followed by something along the lines of my other sketch: https://lore.kernel.org/linux-iommu/Y4%2FLsZKmR3iWFphU@nvidia.com/ And we completely delete IOMMU_DOMAIN_DMA/_FQ and get dma-iommu.c mostly out of driver code. iommufd/vfio would refuse to work with drivers that don't indicate they support dma_iommu.c, that is good enough. We'd eventually rename IOMMU_DOMAIN_UNMANAGED to IOMMU_DOMAIN_MAPPING. Subject: [PATCH] iommu: Remove IOMMU_DOMAIN_DMA from drivers The IOMMU_DOMAIN_DMA is exactly the same driver functionality as IOMMU_DOMAIN_UNMANAGED, except it also implies that dma_iommu.c should be able to use this driver. As a first step to removing IOMMU_DOMAIN_DMA remove all references to it from the drivers. Two simple ops flags are enough to indicate if the driver is compatible with dma_iommu.c and if the driver will allow the FQ mode to be selected. When dma-iommu.c needs an a domain it will request an IOMMU_DOMAIN_UNMANAGED. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> --- drivers/iommu/apple-dart.c | 3 ++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 +-- drivers/iommu/arm/arm-smmu/arm-smmu.c | 11 ++++---- drivers/iommu/arm/arm-smmu/qcom_iommu.c | 3 ++- drivers/iommu/exynos-iommu.c | 3 ++- drivers/iommu/intel/iommu.c | 3 +-- drivers/iommu/iommu.c | 29 ++++++++++++++++----- drivers/iommu/ipmmu-vmsa.c | 3 ++- drivers/iommu/mtk_iommu.c | 3 ++- drivers/iommu/rockchip-iommu.c | 3 ++- drivers/iommu/sprd-iommu.c | 3 ++- drivers/iommu/sun50i-iommu.c | 4 +-- drivers/iommu/virtio-iommu.c | 2 +- include/linux/iommu.h | 2 ++ 14 files changed, 50 insertions(+), 26 deletions(-) diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c index 4f4a323be0d0ff..0eb3eb857d7e9e 100644 --- a/drivers/iommu/apple-dart.c +++ b/drivers/iommu/apple-dart.c @@ -580,7 +580,7 @@ static struct iommu_domain *apple_dart_domain_alloc(unsigned int type) { struct apple_dart_domain *dart_domain; - if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED && + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_IDENTITY && type != IOMMU_DOMAIN_BLOCKED) return NULL; @@ -769,6 +769,7 @@ static void apple_dart_get_resv_regions(struct device *dev, } static const struct iommu_ops apple_dart_iommu_ops = { + .use_dma_iommu = true, .domain_alloc = apple_dart_domain_alloc, .probe_device = apple_dart_probe_device, .release_device = apple_dart_release_device, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ab160198edd6b1..2253c5598092d0 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2013,8 +2013,6 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) return arm_smmu_sva_domain_alloc(); if (type != IOMMU_DOMAIN_UNMANAGED && - type != 
IOMMU_DOMAIN_DMA && - type != IOMMU_DOMAIN_DMA_FQ && type != IOMMU_DOMAIN_IDENTITY) return NULL; @@ -2844,6 +2842,8 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid) } static struct iommu_ops arm_smmu_ops = { + .use_dma_iommu = true, + .allow_dma_fq = true, .capable = arm_smmu_capable, .domain_alloc = arm_smmu_domain_alloc, .probe_device = arm_smmu_probe_device, diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 719fbca1fe52a0..7bb160bdc4b594 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -855,11 +855,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) { struct arm_smmu_domain *smmu_domain; - if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_IDENTITY) { - if (using_legacy_binding || - (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_DMA_FQ)) - return NULL; - } + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_IDENTITY) + return NULL; + /* * Allocate the domain and initialise some of its data structures. * We can't really do anything meaningful until we've added a @@ -1555,6 +1553,8 @@ static int arm_smmu_def_domain_type(struct device *dev) } static struct iommu_ops arm_smmu_ops = { + .use_dma_iommu = true, + .allow_dma_fq = true, .capable = arm_smmu_capable, .domain_alloc = arm_smmu_domain_alloc, .probe_device = arm_smmu_probe_device, @@ -1982,6 +1982,7 @@ static int arm_smmu_device_dt_probe(struct arm_smmu_device *smmu, IS_ENABLED(CONFIG_ARM_SMMU_LEGACY_DT_BINDINGS) ? "DMA API" : "SMMU"); } using_legacy_binding = true; + arm_smmu_ops.use_dma_iommu = false; } else if (!legacy_binding && !using_legacy_binding) { using_generic_binding = true; } else { diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c index 270c3d9128bab8..babd20108f6a17 100644 --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c @@ -323,7 +323,7 @@ static struct iommu_domain *qcom_iommu_domain_alloc(unsigned type) { struct qcom_iommu_domain *qcom_domain; - if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) + if (type != IOMMU_DOMAIN_UNMANAGED) return NULL; /* * Allocate the domain and initialise some of its data structures. 
@@ -575,6 +575,7 @@ static int qcom_iommu_of_xlate(struct device *dev, struct of_phandle_args *args) } static const struct iommu_ops qcom_iommu_ops = { + .use_dma_iommu = true, .capable = qcom_iommu_capable, .domain_alloc = qcom_iommu_domain_alloc, .probe_device = qcom_iommu_probe_device, diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index b0cde22119875e..824350551e5933 100644 --- a/drivers/iommu/exynos-iommu.c +++ b/drivers/iommu/exynos-iommu.c @@ -825,7 +825,7 @@ static struct iommu_domain *exynos_iommu_domain_alloc(unsigned type) /* Check if correct PTE offsets are initialized */ BUG_ON(PG_ENT_SHIFT < 0 || !dma_dev); - if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED) + if (type != IOMMU_DOMAIN_UNMANAGED) return NULL; domain = kzalloc(sizeof(*domain), GFP_KERNEL); @@ -1396,6 +1396,7 @@ static int exynos_iommu_of_xlate(struct device *dev, } static const struct iommu_ops exynos_iommu_ops = { + .use_dma_iommu = true, .domain_alloc = exynos_iommu_domain_alloc, .device_group = generic_device_group, .probe_device = exynos_iommu_probe_device, diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 59df7e42fd533c..bb34d3f641f17b 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4165,8 +4165,6 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) switch (type) { case IOMMU_DOMAIN_BLOCKED: return &blocking_domain; - case IOMMU_DOMAIN_DMA: - case IOMMU_DOMAIN_DMA_FQ: case IOMMU_DOMAIN_UNMANAGED: dmar_domain = alloc_domain(type); if (!dmar_domain) { @@ -4761,6 +4759,7 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) } const struct iommu_ops intel_iommu_ops = { + .use_dma_iommu = true, .capable = intel_iommu_capable, .domain_alloc = intel_iommu_domain_alloc, .probe_device = intel_iommu_probe_device, diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index de91dd88705bd3..877ef0a58b07f4 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1618,17 +1618,32 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus, { struct iommu_domain *dom; - dom = __iommu_domain_alloc(bus, type); - if (!dom && type != IOMMU_DOMAIN_DMA) { - dom = __iommu_domain_alloc(bus, IOMMU_DOMAIN_DMA); - if (dom) + /* + * Keep the DMA domain type out of the drivers, eventually it will go + * away completely. 
+ */ + if (type == IOMMU_DOMAIN_IDENTITY) { + dom = __iommu_domain_alloc(bus, IOMMU_DOMAIN_IDENTITY); + if (!dom) + return -ENOMEM; + } else if (type == IOMMU_DOMAIN_DMA_FQ || type == IOMMU_DOMAIN_DMA) { + if (!bus->iommu_ops->use_dma_iommu) + return -EINVAL; + + dom = __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED); + if (!dom) + return -ENOMEM; + if (type == IOMMU_DOMAIN_DMA_FQ && + !bus->iommu_ops->allow_dma_fq) { pr_warn("Failed to allocate default IOMMU domain of type %u for group %s - Falling back to IOMMU_DOMAIN_DMA", type, group->name); + type = IOMMU_DOMAIN_DMA; + } + dom->type = type; + } else { + return -EINVAL; } - if (!dom) - return -ENOMEM; - group->default_domain = dom; if (!group->domain) group->domain = dom; diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index a003bd5fc65c13..a22df69f7f4775 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -571,7 +571,7 @@ static struct iommu_domain *ipmmu_domain_alloc(unsigned type) { struct ipmmu_vmsa_domain *domain; - if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) + if (type != IOMMU_DOMAIN_UNMANAGED) return NULL; domain = kzalloc(sizeof(*domain), GFP_KERNEL); @@ -866,6 +866,7 @@ static struct iommu_group *ipmmu_find_group(struct device *dev) } static const struct iommu_ops ipmmu_ops = { + .use_dma_iommu = true, .domain_alloc = ipmmu_domain_alloc, .probe_device = ipmmu_probe_device, .release_device = ipmmu_release_device, diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 2badd6acfb23d6..e27cb428bd9583 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -636,7 +636,7 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type) { struct mtk_iommu_domain *dom; - if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED) + if (type != IOMMU_DOMAIN_UNMANAGED) return NULL; dom = kzalloc(sizeof(*dom), GFP_KERNEL); @@ -936,6 +936,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev, } static const struct iommu_ops mtk_iommu_ops = { + .use_dma_iommu = true, .domain_alloc = mtk_iommu_domain_alloc, .probe_device = mtk_iommu_probe_device, .release_device = mtk_iommu_release_device, diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index a68eadd64f38db..fa3ec38e5244db 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -1061,7 +1061,7 @@ static struct iommu_domain *rk_iommu_domain_alloc(unsigned type) { struct rk_iommu_domain *rk_domain; - if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) + if (type != IOMMU_DOMAIN_UNMANAGED) return NULL; if (!dma_dev) @@ -1184,6 +1184,7 @@ static int rk_iommu_of_xlate(struct device *dev, } static const struct iommu_ops rk_iommu_ops = { + .use_dma_iommu = true, .domain_alloc = rk_iommu_domain_alloc, .probe_device = rk_iommu_probe_device, .release_device = rk_iommu_release_device, diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c index 219bfa11f7f48f..fbff1831c16e78 100644 --- a/drivers/iommu/sprd-iommu.c +++ b/drivers/iommu/sprd-iommu.c @@ -136,7 +136,7 @@ static struct iommu_domain *sprd_iommu_domain_alloc(unsigned int domain_type) { struct sprd_iommu_domain *dom; - if (domain_type != IOMMU_DOMAIN_DMA && domain_type != IOMMU_DOMAIN_UNMANAGED) + if (domain_type != IOMMU_DOMAIN_UNMANAGED) return NULL; dom = kzalloc(sizeof(*dom), GFP_KERNEL); @@ -406,6 +406,7 @@ static int sprd_iommu_of_xlate(struct device *dev, struct of_phandle_args *args) static const struct iommu_ops sprd_iommu_ops = { + 
.use_dma_iommu = true, .domain_alloc = sprd_iommu_domain_alloc, .probe_device = sprd_iommu_probe_device, .device_group = sprd_iommu_device_group, diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c index 5b585eace3d46f..de2a033f65197a 100644 --- a/drivers/iommu/sun50i-iommu.c +++ b/drivers/iommu/sun50i-iommu.c @@ -671,8 +671,7 @@ static struct iommu_domain *sun50i_iommu_domain_alloc(unsigned type) { struct sun50i_iommu_domain *sun50i_domain; - if (type != IOMMU_DOMAIN_DMA && - type != IOMMU_DOMAIN_UNMANAGED) + if (type != IOMMU_DOMAIN_UNMANAGED) return NULL; sun50i_domain = kzalloc(sizeof(*sun50i_domain), GFP_KERNEL); @@ -827,6 +826,7 @@ static int sun50i_iommu_of_xlate(struct device *dev, } static const struct iommu_ops sun50i_iommu_ops = { + .use_dma_iommu = true, .pgsize_bitmap = SZ_4K, .device_group = sun50i_iommu_device_group, .domain_alloc = sun50i_iommu_domain_alloc, diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 5b8fe9bfa9a5b9..8afa21a9c0b9d6 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -642,7 +642,6 @@ static struct iommu_domain *viommu_domain_alloc(unsigned type) struct viommu_domain *vdomain; if (type != IOMMU_DOMAIN_UNMANAGED && - type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_IDENTITY) return NULL; @@ -1018,6 +1017,7 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap) } static struct iommu_ops viommu_ops = { + .use_dma_iommu = true, .capable = viommu_capable, .domain_alloc = viommu_domain_alloc, .probe_device = viommu_probe_device, diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 46e1347bfa2286..954db8e77491c7 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -277,6 +277,8 @@ struct iommu_ops { const struct iommu_domain_ops *default_domain_ops; unsigned long pgsize_bitmap; + u8 use_dma_iommu : 1; + u8 allow_dma_fq : 1; struct module *owner; };
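To complete the picture, the consumer-side check Jason describes ("iommufd/vfio would refuse to work with drivers that don't indicate they support dma_iommu.c") could be as small as the sketch below. The helper name is made up, and use_dma_iommu exists only in the draft patch above, not in mainline.

#include <linux/iommu.h>

/*
 * Hypothetical gate built on the sketched iommu_ops::use_dma_iommu bit:
 * a driver wired up for dma-iommu.c is taken as having a working
 * paging (UNMANAGED) domain implementation.
 */
static bool example_driver_has_working_paging_domain(struct device *dev)
{
        if (!dev->iommu || !dev->iommu->iommu_dev)
                return false;

        return dev_iommu_ops(dev)->use_dma_iommu;
}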
> From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Saturday, January 28, 2023 4:04 AM > > Both IOMMU_DOMAIN_UNMANAGED and IOMMU_DOMAIN_DMA require > the support > of __IOMMU_DOMAIN_PAGING capability, i.e. iommu_map/unmap. > However, > some older iommu drivers do not fully support that, and these drivers > also do not advertise support for dma-iommu.c via IOMMU_DOMAIN_DMA, > or use arm_iommu_create_mapping(), so largely their implementations > of IOMMU_DOMAIN_UNMANAGED are untested. This means that a user like > vfio/iommufd does not likely work with them. > > Several of them have obvious problems: > * fsl_pamu_domain.c > Without map/unmap ops in the default_domain_ops, it isn't an > unmanaged domain at all. > * mtk_iommu_v1.c > With a fixed 4M "pagetable", it can only map exactly 4G of > memory, but doesn't set the aperture. > * tegra-gart.c > Its notion of attach/detach and groups has to be a complete lie to > get around all the other API expectations. > > Some others might work but have never been tested with vfio/iommufd: > * msm_iommu.c > * omap-iommu.c > * tegra-smmu.c > Do we have a link where all drivers tested with vfio/iommufd have been listed? In a quick glance at least exynos-iommu.c and apple-dart.c both support UNMANAGED with map/unmap ops. They are not mentioned in above list but I doubt they are tested for vfio/iommufd.
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Saturday, January 28, 2023 7:54 AM > > On Fri, Jan 27, 2023 at 09:58:46PM +0000, Robin Murphy wrote: > > > Please just add IOMMU_CAP_IOMMUFD to represent whatever the > nebulous > > requirements of IOMMUFD actually are (frankly it's no less informative > than > > calling domains "broken"), handle that in the drivers you care about > > and > > I don't want to tie this to iommufd, that isn't the point. > > We clearly have drivers that don't implement the iommu kernel API > properly, because their only use is between the iommu driver and some > other same-SOC driver. > > As a user of the iommu API iommufd and VFIO want to avoid these > drivers. > > We have that acknowledgment already via the IOMMU_DOMAIN_DMA stuff > protecting the dma_iommu.c from those same drivers. > > So, how about this below instead. Imagine it is followed by something along > the lines of my other sketch: > > https://lore.kernel.org/linux-iommu/Y4%2FLsZKmR3iWFphU@nvidia.com/ > > And we completely delete IOMMU_DOMAIN_DMA/_FQ and get dma- > iommu.c > mostly out of driver code. > > iommufd/vfio would refuse to work with drivers that don't indicate > they support dma_iommu.c, that is good enough. this is a good idea. Just one doubt. Robin mentioned that sprd-iommu might have a broken unmanaged domains: " I'd also question sprd-iommu, which hardly has a generally-useful domain size, and has only just recently gained the ability to unmap anything successfully." Want to understand why that restriction is not a problem for DMA API. what Jason proposed here assumes that existing driver support for DMA API can be safely applied to vfio/iommufd. But above looks warning certain exception? > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > index 59df7e42fd533c..bb34d3f641f17b 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -4165,8 +4165,6 @@ static struct iommu_domain > *intel_iommu_domain_alloc(unsigned type) > switch (type) { > case IOMMU_DOMAIN_BLOCKED: > return &blocking_domain; > - case IOMMU_DOMAIN_DMA: > - case IOMMU_DOMAIN_DMA_FQ: > case IOMMU_DOMAIN_UNMANAGED: > dmar_domain = alloc_domain(type); > if (!dmar_domain) { > @@ -4761,6 +4759,7 @@ static void intel_iommu_remove_dev_pasid(struct > device *dev, ioasid_t pasid) > } > > const struct iommu_ops intel_iommu_ops = { > + .use_dma_iommu = true, missed: + .allow_dma_fq = true,
On Sun, Jan 29, 2023 at 08:11:48AM +0000, Tian, Kevin wrote: > " I'd also question sprd-iommu, which hardly has a generally-useful > domain size, and has only just recently gained the ability to unmap > anything successfully." So long as it has a correct kernel API and exposes the right (small) aperture then it is OK. The device will not be useful for qemu, but it would run some dpdk configurations just fine. All the VFIO world is not VMs. > > @@ -4761,6 +4759,7 @@ static void intel_iommu_remove_dev_pasid(struct > > device *dev, ioasid_t pasid) > > } > > > > const struct iommu_ops intel_iommu_ops = { > > + .use_dma_iommu = true, > > missed: > + .allow_dma_fq = true, Yep Jason
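For reference, "exposes the right (small) aperture" means the driver fills in the domain geometry when it allocates a paging domain, so users like iommufd can see and respect the usable IOVA window. A minimal sketch follows, with invented base/size values rather than sprd-iommu's real range.

#include <linux/iommu.h>
#include <linux/sizes.h>

/* Sketch: advertise a small, enforced IOVA window on a freshly allocated
 * paging domain.  The base and size below are made-up example numbers. */
static void example_set_domain_aperture(struct iommu_domain *domain)
{
        domain->geometry.aperture_start = SZ_256M;
        domain->geometry.aperture_end = SZ_256M + SZ_64M - 1;
        /* mappings outside the window must be rejected, not silently allowed */
        domain->geometry.force_aperture = true;
}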
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Monday, January 30, 2023 9:33 PM > > On Sun, Jan 29, 2023 at 08:11:48AM +0000, Tian, Kevin wrote: > > > " I'd also question sprd-iommu, which hardly has a generally-useful > > domain size, and has only just recently gained the ability to unmap > > anything successfully." > > So long as it has a correct kernel API and exposes the right (small) > aperture then it is OK. > > The device will not be useful for qemu, but it would run some dpdk > configurations just fine. I still didn't get the restriction here. Can you elaborate why it works with dpdk but not qemu? Can qemu verify this restriction via existing path or need new uAPI flag to communicate?
On Wed, Feb 01, 2023 at 03:14:03AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg@nvidia.com> > > Sent: Monday, January 30, 2023 9:33 PM > > > > On Sun, Jan 29, 2023 at 08:11:48AM +0000, Tian, Kevin wrote: > > > > > " I'd also question sprd-iommu, which hardly has a generally-useful > > > domain size, and has only just recently gained the ability to unmap > > > anything successfully." > > > > So long as it has a correct kernel API and exposes the right (small) > > aperture then it is OK. > > > > The device will not be useful for qemu, but it would run some dpdk > > configurations just fine. > > I still didn't get the restriction here. Can you elaborate why it works > with dpdk but not qemu? dpdk needs like, say, 64M of aperture and doesn't care what the IOVAs are qemu needs the entire guest memory of aperture and must have IOVAs that are 1:1 with the GPA. So aperture size and location can exclude qemu > Can qemu verify this restriction via existing path or need new uAPI > flag to communicate? It already happens, the aperture/etc is convayed to qemu through IOMMUFD_CMD_IOAS_IOVA_RANGES and if qemu cannot get the IOVA's it needs to create the guest it should fail. Jason
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Wednesday, February 1, 2023 10:56 PM > > On Wed, Feb 01, 2023 at 03:14:03AM +0000, Tian, Kevin wrote: > > > From: Jason Gunthorpe <jgg@nvidia.com> > > > Sent: Monday, January 30, 2023 9:33 PM > > > > > > On Sun, Jan 29, 2023 at 08:11:48AM +0000, Tian, Kevin wrote: > > > > > > > " I'd also question sprd-iommu, which hardly has a generally-useful > > > > domain size, and has only just recently gained the ability to unmap > > > > anything successfully." > > > > > > So long as it has a correct kernel API and exposes the right (small) > > > aperture then it is OK. > > > > > > The device will not be useful for qemu, but it would run some dpdk > > > configurations just fine. > > > > I still didn't get the restriction here. Can you elaborate why it works > > with dpdk but not qemu? > > dpdk needs like, say, 64M of aperture and doesn't care what the IOVAs > are > > qemu needs the entire guest memory of aperture and must have IOVAs > that are 1:1 with the GPA. > > So aperture size and location can exclude qemu > > > Can qemu verify this restriction via existing path or need new uAPI > > flag to communicate? > > It already happens, the aperture/etc is convayed to qemu through > IOMMUFD_CMD_IOAS_IOVA_RANGES and if qemu cannot get the IOVA's it > needs to create the guest it should fail. > Make sense.
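To make that concrete, a userspace driver such as QEMU can read the allowed IOVA ranges from iommufd and bail out when guest RAM cannot be mapped 1:1. The sketch below is a simplified illustration against the IOMMU_IOAS_IOVA_RANGES ioctl: it assumes a single contiguous guest RAM block and does no retry handling when more ranges exist than fit in the array.

#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/*
 * Check whether [gpa_base, gpa_base + size) fits entirely inside one of the
 * allowed IOVA ranges reported for an IOAS.  This is the 1:1 GPA requirement
 * QEMU has; dpdk only needs some aperture and does not care where it is.
 */
static bool iova_covers_guest_ram(int iommufd, uint32_t ioas_id,
                                  uint64_t gpa_base, uint64_t size)
{
        struct iommu_iova_range ranges[32];
        struct iommu_ioas_iova_ranges cmd = {
                .size = sizeof(cmd),
                .ioas_id = ioas_id,
                .num_iovas = 32,
                .allowed_iovas = (uintptr_t)ranges,
        };
        uint32_t i;

        if (ioctl(iommufd, IOMMU_IOAS_IOVA_RANGES, &cmd))
                return false;

        for (i = 0; i < cmd.num_iovas; i++)
                if (ranges[i].start <= gpa_base &&
                    ranges[i].last >= gpa_base + size - 1)
                        return true;

        return false;
}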
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 4408ac3c49b6..1f3a4c8c78ad 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -448,6 +448,7 @@ static struct iommu_device *fsl_pamu_probe_device(struct device *dev)
 }
 
 static const struct iommu_ops fsl_pamu_ops = {
+        .broken_unmanaged_domain = true,
         .capable = fsl_pamu_capable,
         .domain_alloc = fsl_pamu_domain_alloc,
         .probe_device = fsl_pamu_probe_device,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5f6a85aea501..648fc04143b8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1897,6 +1897,30 @@ bool device_iommu_capable(struct device *dev, enum iommu_cap cap)
 }
 EXPORT_SYMBOL_GPL(device_iommu_capable);
 
+/**
+ * device_iommu_unmanaged_supported() - full support of IOMMU_DOMAIN_UNMANAGED
+ * @dev: device that is behind the iommu
+ *
+ * In general, all IOMMU drivers can allocate IOMMU_DOMAIN_UNMANAGED domains.
+ * However, some of them set the broken_unmanaged_domain, indicating something
+ * is wrong about its support of IOMMU_DOMAIN_UNMANAGED/__IOMMU_DOMAIN_PAGING.
+ * This can happen, when a driver lies about the support to get around with the
+ * IOMMU API, merely for other drivers to use, or when a driver has never been
+ * tested with vfio/iommufd that need a full support of IOMMU_DOMAIN_UNMANAGED.
+ *
+ * Return: true if an IOMMU is present and broken_unmanaged_domain is not set,
+ * otherwise, false.
+ */
+bool device_iommu_unmanaged_supported(struct device *dev)
+{
+        if (dev->iommu && dev->iommu->iommu_dev &&
+            !dev_iommu_ops(dev)->broken_unmanaged_domain)
+                return true;
+
+        return false;
+}
+EXPORT_SYMBOL_GPL(device_iommu_unmanaged_supported);
+
 /**
  * iommu_set_fault_handler() - set a fault handler for an iommu domain
  * @domain: iommu domain
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index c60624910872..9c0bd5aee10b 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -675,6 +675,7 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
 }
 
 static struct iommu_ops msm_iommu_ops = {
+        .broken_unmanaged_domain = true,
         .domain_alloc = msm_iommu_domain_alloc,
         .probe_device = msm_iommu_probe_device,
         .device_group = generic_device_group,
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index ca581ff1c769..c2ecbff6592c 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -578,6 +578,7 @@ static int mtk_iommu_v1_hw_init(const struct mtk_iommu_v1_data *data)
 }
 
 static const struct iommu_ops mtk_iommu_v1_ops = {
+        .broken_unmanaged_domain = true,
         .domain_alloc = mtk_iommu_v1_domain_alloc,
         .probe_device = mtk_iommu_v1_probe_device,
         .probe_finalize = mtk_iommu_v1_probe_finalize,
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 2fd7702c6709..17ed392b9f63 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1733,6 +1733,7 @@ static struct iommu_group *omap_iommu_device_group(struct device *dev)
 }
 
 static const struct iommu_ops omap_iommu_ops = {
+        .broken_unmanaged_domain = true,
         .domain_alloc = omap_iommu_domain_alloc,
         .probe_device = omap_iommu_probe_device,
         .release_device = omap_iommu_release_device,
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index ed53279d1106..9af56d2ec6c1 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -267,6 +267,7 @@ static void gart_iommu_sync(struct iommu_domain *domain,
 }
 
 static const struct iommu_ops gart_iommu_ops = {
+        .broken_unmanaged_domain = true,
         .domain_alloc = gart_iommu_domain_alloc,
         .probe_device = gart_iommu_probe_device,
         .device_group = generic_device_group,
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 5b1af40221ec..d1e4c4825d74 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -962,6 +962,7 @@ static int tegra_smmu_of_xlate(struct device *dev,
 }
 
 static const struct iommu_ops tegra_smmu_ops = {
+        .broken_unmanaged_domain = true,
         .domain_alloc = tegra_smmu_domain_alloc,
         .probe_device = tegra_smmu_probe_device,
         .device_group = tegra_smmu_device_group,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 46e1347bfa22..919a5dbad75b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -245,6 +245,10 @@ struct iommu_iotlb_gather {
  *              pasid, so that any DMA transactions with this pasid
  *              will be blocked by the hardware.
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @broken_unmanaged_domain: IOMMU_DOMAIN_UNMANAGED is not fully functional; the
+ *              driver does not really support iommu_map/unmap, but
+ *              uses UNMANAGED domains for the IOMMU API, called by
+ *              other SOC drivers.
  * @owner: Driver module providing these ops
  */
 struct iommu_ops {
@@ -277,6 +281,7 @@ struct iommu_ops {
 
         const struct iommu_domain_ops *default_domain_ops;
         unsigned long pgsize_bitmap;
+        bool broken_unmanaged_domain;
         struct module *owner;
 };
 
@@ -455,6 +460,7 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
 extern int bus_iommu_probe(struct bus_type *bus);
 extern bool iommu_present(struct bus_type *bus);
 extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap);
+extern bool device_iommu_unmanaged_supported(struct device *dev);
 extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
 extern struct iommu_group *iommu_group_get_by_id(int id);
 extern void iommu_domain_free(struct iommu_domain *domain);
@@ -742,6 +748,11 @@ static inline bool device_iommu_capable(struct device *dev, enum iommu_cap cap)
         return false;
 }
 
+static inline bool device_iommu_unmanaged_supported(struct device *dev)
+{
+        return false;
+}
+
 static inline struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
 {
         return NULL;