Message ID | 20230312120157.452859-2-ray.huang@amd.com |
---|---|
State | New |
Headers | From: Huang Rui <ray.huang@amd.com> To: Juergen Gross <jgross@suse.com>, Stefano Stabellini <sstabellini@kernel.org>, Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>, Boris Ostrovsky <boris.ostrovsky@oracle.com>, Roger Pau Monné <roger.pau@citrix.com>, <xen-devel@lists.xenproject.org>, <linux-kernel@vger.kernel.org>, <dri-devel@lists.freedesktop.org>, <amd-gfx@lists.freedesktop.org> CC: Alex Deucher <alexander.deucher@amd.com>, Christian König <christian.koenig@amd.com>, Stewart Hildebrand <Stewart.Hildebrand@amd.com>, Xenia Ragiadakou <burzalodowa@gmail.com>, Honglei Huang <honglei1.huang@amd.com>, Julia Zhang <julia.zhang@amd.com>, Chen Jiqian <Jiqian.Chen@amd.com>, Huang Rui <ray.huang@amd.com> Subject: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh Date: Sun, 12 Mar 2023 20:01:53 +0800 Message-ID: <20230312120157.452859-2-ray.huang@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230312120157.452859-1-ray.huang@amd.com> References: <20230312120157.452859-1-ray.huang@amd.com> |
Series | Add Xen PVH dom0 support for GPU |
Commit Message
Huang Rui
March 12, 2023, 12:01 p.m. UTC
Xen PVH is the paravirtualized mode and takes advantage of hardware
virtualization support when possible. It will use the hardware IOMMU
support instead of xen-swiotlb, so disable swiotlb if the current domain
is Xen PVH.
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
arch/x86/kernel/pci-dma.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
Comments
On 12.03.2023 13:01, Huang Rui wrote:
> Xen PVH is the paravirtualized mode and takes advantage of hardware
> virtualization support when possible. It will use the hardware IOMMU
> support instead of xen-swiotlb, so disable swiotlb if the current domain
> is Xen PVH.

But the kernel has no way (yet) to drive the IOMMU, so how can it get
away without resorting to swiotlb in certain cases (like I/O to an
address-restricted device)?

Jan
On Mon, 13 Mar 2023, Jan Beulich wrote:
> But the kernel has no way (yet) to drive the IOMMU, so how can it get
> away without resorting to swiotlb in certain cases (like I/O to an
> address-restricted device)?

I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
so we can use guest physical addresses instead of machine addresses for
DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
(see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
case is XENFEAT_not_direct_mapped).

Juergen, what do you think? Would you rather make xen_swiotlb_detect
common between ARM and x86?
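The detection logic Stefano refers to can be sketched in plain userspace C. This is only a model under assumptions: the struct and its field names are hypothetical stand-ins for what the kernel would query through xen_domain(), xen_initial_domain() and xen_feature(), the XENFEAT_direct_mapped flag is assumed to exist alongside the XENFEAT_not_direct_mapped flag named above, and the real helper in include/xen/arm/swiotlb-xen.h may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the environment; not a kernel structure. */
struct xen_env {
    bool is_xen_domain;          /* running under Xen at all */
    bool is_initial_domain;      /* this domain is Dom0 */
    bool feat_direct_mapped;     /* XENFEAT_direct_mapped advertised (assumed) */
    bool feat_not_direct_mapped; /* XENFEAT_not_direct_mapped advertised */
};

/* Returns true when this model would keep swiotlb-xen enabled. */
static bool model_swiotlb_detect(const struct xen_env *env)
{
    assert(env != NULL);

    if (!env->is_xen_domain)
        return false;            /* bare metal: swiotlb-xen is irrelevant */
    if (env->feat_direct_mapped)
        return true;             /* guest addresses equal machine addresses */
    /* Neither flag advertised: assume an older hypervisor with a
     * direct-mapped Dom0. */
    if (!env->feat_not_direct_mapped && env->is_initial_domain)
        return true;
    return false;                /* IOMMU-translated: no swiotlb-xen needed */
}
```

In this model a Dom0 that advertises XENFEAT_not_direct_mapped gets false, which is the ARM-with-IOMMU case the message compares PVH Dom0 to.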
On Wed, Mar 15, 2023 at 08:52:30AM +0800, Stefano Stabellini wrote:
> On Mon, 13 Mar 2023, Jan Beulich wrote:
> > But the kernel has no way (yet) to drive the IOMMU, so how can it get
> > away without resorting to swiotlb in certain cases (like I/O to an
> > address-restricted device)?
>
> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
> need for swiotlb-xen in Dom0. [...]

Hi Jan, sorry for the late reply. We are using the native kernel amdgpu
and ttm drivers on Dom0. amdgpu/ttm would like to use the IOMMU to
allocate coherent buffers for userptr, which maps user space memory for
GPU access; however, swiotlb doesn't support this. In other words, with
swiotlb we can only handle the buffer page by page.

Thanks,
Ray
On 15.03.2023 01:52, Stefano Stabellini wrote:
> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
> so we can use guest physical addresses instead of machine addresses for
> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
> case is XENFEAT_not_direct_mapped).

But how does Xen using an IOMMU help with, as said, address-restricted
devices? They may still need e.g. a 32-bit address to be programmed in,
and if the kernel has memory beyond the 4G boundary not all I/O buffers
may fulfill this requirement.

Jan
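Jan's concern about address-restricted devices comes down to a range check against the device's DMA mask. A minimal userspace sketch, assuming hypothetical helper names (dma_bit_mask and needs_bounce are illustrative, written in the spirit of the kernel's DMA_BIT_MASK() macro rather than copied from it):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low `bits` bits set (hypothetical helper). */
static uint64_t dma_bit_mask(unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* True when the buffer [addr, addr + size) is not fully reachable
 * through `mask`, so a bounce buffer (or a remapping) would be needed. */
static bool needs_bounce(uint64_t addr, uint64_t size, uint64_t mask)
{
    assert(size > 0);
    uint64_t last = addr + size - 1;

    /* `last < addr` catches wrap-around of the address arithmetic. */
    return last < addr || last > mask;
}
```

With a 32-bit mask, a buffer ending at or below 0xFFFFFFFF passes, while one that starts at or crosses the 4G boundary does not. That second case is the one the question above is about.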
On 15.03.2023 05:14, Huang Rui wrote:
> Hi Jan, sorry for the late reply. We are using the native kernel amdgpu
> and ttm drivers on Dom0. amdgpu/ttm would like to use the IOMMU to
> allocate coherent buffers for userptr, which maps user space memory for
> GPU access; however, swiotlb doesn't support this. In other words, with
> swiotlb we can only handle the buffer page by page.

But how does outright disabling swiotlb help with this? There still
wouldn't be an IOMMU that your kernel has control over. Looks like you
want something like pvIOMMU, but that work was never completed. And even
then the swiotlb may continue to be needed for other purposes.

Jan
On Wed, 15 Mar 2023, Jan Beulich wrote:
> But how does Xen using an IOMMU help with, as said, address-restricted
> devices? They may still need e.g. a 32-bit address to be programmed in,
> and if the kernel has memory beyond the 4G boundary not all I/O buffers
> may fulfill this requirement.

In short, it is going to work as long as Linux has guest physical
addresses (not machine addresses, those could be anything) lower than
4GB.

If the address-restricted device does DMA via an IOMMU, then the device
gets programmed by Linux using its guest physical addresses (not machine
addresses).

The 32-bit restriction would be applied by Linux to its choice of guest
physical address to use to program the device, the same way it does on
native. The device would be fine as it always uses Linux-provided <4GB
addresses. After the IOMMU translation (pagetable setup by Xen), we
could get any address, including >4GB addresses, and that is expected to
work.
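The split between guest physical and machine addresses described above can be illustrated with a toy translation table. This is a sketch under assumptions, not how Xen or any IOMMU driver stores its page tables: the iommu_entry struct, the linear lookup, and the 4 KiB page size are all chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u
#define PAGE_OFFSET_MASK ((UINT64_C(1) << PAGE_SHIFT) - 1)

/* One hypothetical mapping: guest frame number -> machine frame number. */
struct iommu_entry {
    uint64_t gfn;
    uint64_t mfn;
};

/* Translate a guest physical address the device was programmed with.
 * Returns 0 and stores the machine address on success, -1 if unmapped. */
static int iommu_translate(const struct iommu_entry *tbl, size_t n,
                           uint64_t gpa, uint64_t *maddr)
{
    assert(tbl != NULL && maddr != NULL);

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].gfn == (gpa >> PAGE_SHIFT)) {
            /* Keep the in-page offset, swap the frame number. */
            *maddr = (tbl[i].mfn << PAGE_SHIFT) | (gpa & PAGE_OFFSET_MASK);
            return 0;
        }
    }
    return -1;
}
```

A device limited to 32-bit addresses only ever sees the guest physical address; the machine address produced by the translation may lie above 4GB without the device noticing.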
On 16.03.2023 00:25, Stefano Stabellini wrote:
> In short, it is going to work as long as Linux has guest physical
> addresses (not machine addresses, those could be anything) lower than
> 4GB.
> [...]
> The device would be fine as it always uses Linux-provided <4GB
> addresses. After the IOMMU translation (pagetable setup by Xen), we
> could get any address, including >4GB addresses, and that is expected to
> work.

I understand that's the "normal" way of working. But whatever the swiotlb
is used for in baremetal Linux, that would similarly require its use in
PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to
me like an incomplete attempt to disable its use altogether on x86. What
difference of PVH vs baremetal am I missing here?

Jan
On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote:
> I understand that's the "normal" way of working. But whatever the swiotlb
> is used for in baremetal Linux, that would similarly require its use in
> PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to
> me like an incomplete attempt to disable its use altogether on x86. What
> difference of PVH vs baremetal am I missing here?

swiotlb is not usable for GPUs even on bare metal. They often have
hundreds of megs or even gigs of memory mapped on the device at any
given time. Also, AMD GPUs support 44-48 bit DMA masks (depending on
the chip family).

Alex
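Alex's two points can be put into numbers with a small sketch. Both helpers are hypothetical, and the 64 MiB pool size used in the note below is an assumption about a typical default swiotlb size, not a figure taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes addressable through a DMA mask of the given width. */
static uint64_t addressable_bytes(unsigned int bits)
{
    assert(bits >= 1 && bits < 64);
    return UINT64_C(1) << bits;
}

/* True when everything mapped at once could be bounced through the pool. */
static bool fits_in_bounce_pool(uint64_t mapped_bytes, uint64_t pool_bytes)
{
    return mapped_bytes <= pool_bytes;
}
```

A 44-bit mask covers 16 TiB and a 48-bit mask 256 TiB, so such a device rarely needs bouncing for addressing reasons, while a few hundred MiB of simultaneously mapped memory would not fit in a 64 MiB bounce pool.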
On 16.03.23 14:45, Alex Deucher wrote:
> swiotlb is not usable for GPUs even on bare metal. They often have
> hundreds of megs or even gigs of memory mapped on the device at any
> given time. Also, AMD GPUs support 44-48 bit DMA masks (depending on
> the chip family).

But the swiotlb isn't per device; it is system global.

Juergen
On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote:
> But the swiotlb isn't per device, but system global.

Sure, but if the swiotlb is in use, then you can't really use the GPU.
So you get to pick one.

Alex
On 16.03.2023 14:53, Alex Deucher wrote: > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote: >> >> On 16.03.23 14:45, Alex Deucher wrote: >>> On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote: >>>> >>>> On 16.03.2023 00:25, Stefano Stabellini wrote: >>>>> On Wed, 15 Mar 2023, Jan Beulich wrote: >>>>>> On 15.03.2023 01:52, Stefano Stabellini wrote: >>>>>>> On Mon, 13 Mar 2023, Jan Beulich wrote: >>>>>>>> On 12.03.2023 13:01, Huang Rui wrote: >>>>>>>>> Xen PVH is the paravirtualized mode and takes advantage of hardware >>>>>>>>> virtualization support when possible. It will using the hardware IOMMU >>>>>>>>> support instead of xen-swiotlb, so disable swiotlb if current domain is >>>>>>>>> Xen PVH. >>>>>>>> >>>>>>>> But the kernel has no way (yet) to drive the IOMMU, so how can it get >>>>>>>> away without resorting to swiotlb in certain cases (like I/O to an >>>>>>>> address-restricted device)? >>>>>>> >>>>>>> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no >>>>>>> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU >>>>>>> so we can use guest physical addresses instead of machine addresses for >>>>>>> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available >>>>>>> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding >>>>>>> case is XENFEAT_not_direct_mapped). >>>>>> >>>>>> But how does Xen using an IOMMU help with, as said, address-restricted >>>>>> devices? They may still need e.g. a 32-bit address to be programmed in, >>>>>> and if the kernel has memory beyond the 4G boundary not all I/O buffers >>>>>> may fulfill this requirement. >>>>> >>>>> In short, it is going to work as long as Linux has guest physical >>>>> addresses (not machine addresses, those could be anything) lower than >>>>> 4GB. 
>>>>> >>>>> If the address-restricted device does DMA via an IOMMU, then the device >>>>> gets programmed by Linux using its guest physical addresses (not machine >>>>> addresses). >>>>> >>>>> The 32-bit restriction would be applied by Linux to its choice of guest >>>>> physical address to use to program the device, the same way it does on >>>>> native. The device would be fine as it always uses Linux-provided <4GB >>>>> addresses. After the IOMMU translation (pagetable setup by Xen), we >>>>> could get any address, including >4GB addresses, and that is expected to >>>>> work. >>>> >>>> I understand that's the "normal" way of working. But whatever the swiotlb >>>> is used for in baremetal Linux, that would similarly require its use in >>>> PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to >>>> me like an incomplete attempt to disable its use altogether on x86. What >>>> difference of PVH vs baremetal am I missing here? >>> >>> swiotlb is not usable for GPUs even on bare metal. They often have >>> hundreds or megs or even gigs of memory mapped on the device at any >>> given time. Also, AMD GPUs support 44-48 bit DMA masks (depending on >>> the chip family). >> >> But the swiotlb isn't per device, but system global. > > Sure, but if the swiotlb is in use, then you can't really use the GPU. > So you get to pick one. Yet that "pick one" then can't be an unconditional disable in the source code. If there's no way to avoid swiotlb on a per-device basis, then users will need to be told to arrange for this via a command-line option when they want to use the GPU in certain ways. Jan
On 16.03.23 14:53, Alex Deucher wrote: > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote: >> >> On 16.03.23 14:45, Alex Deucher wrote: >>> On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote: >>>> >>>> On 16.03.2023 00:25, Stefano Stabellini wrote: >>>>> On Wed, 15 Mar 2023, Jan Beulich wrote: >>>>>> On 15.03.2023 01:52, Stefano Stabellini wrote: >>>>>>> On Mon, 13 Mar 2023, Jan Beulich wrote: >>>>>>>> On 12.03.2023 13:01, Huang Rui wrote: >>>>>>>>> Xen PVH is the paravirtualized mode and takes advantage of hardware >>>>>>>>> virtualization support when possible. It will using the hardware IOMMU >>>>>>>>> support instead of xen-swiotlb, so disable swiotlb if current domain is >>>>>>>>> Xen PVH. >>>>>>>> >>>>>>>> But the kernel has no way (yet) to drive the IOMMU, so how can it get >>>>>>>> away without resorting to swiotlb in certain cases (like I/O to an >>>>>>>> address-restricted device)? >>>>>>> >>>>>>> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no >>>>>>> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU >>>>>>> so we can use guest physical addresses instead of machine addresses for >>>>>>> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available >>>>>>> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding >>>>>>> case is XENFEAT_not_direct_mapped). >>>>>> >>>>>> But how does Xen using an IOMMU help with, as said, address-restricted >>>>>> devices? They may still need e.g. a 32-bit address to be programmed in, >>>>>> and if the kernel has memory beyond the 4G boundary not all I/O buffers >>>>>> may fulfill this requirement. >>>>> >>>>> In short, it is going to work as long as Linux has guest physical >>>>> addresses (not machine addresses, those could be anything) lower than >>>>> 4GB. 
>>>>> >>>>> If the address-restricted device does DMA via an IOMMU, then the device >>>>> gets programmed by Linux using its guest physical addresses (not machine >>>>> addresses). >>>>> >>>>> The 32-bit restriction would be applied by Linux to its choice of guest >>>>> physical address to use to program the device, the same way it does on >>>>> native. The device would be fine as it always uses Linux-provided <4GB >>>>> addresses. After the IOMMU translation (pagetable setup by Xen), we >>>>> could get any address, including >4GB addresses, and that is expected to >>>>> work. >>>> >>>> I understand that's the "normal" way of working. But whatever the swiotlb >>>> is used for in baremetal Linux, that would similarly require its use in >>>> PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to >>>> me like an incomplete attempt to disable its use altogether on x86. What >>>> difference of PVH vs baremetal am I missing here? >>> >>> swiotlb is not usable for GPUs even on bare metal. They often have >>> hundreds or megs or even gigs of memory mapped on the device at any >>> given time. Also, AMD GPUs support 44-48 bit DMA masks (depending on >>> the chip family). >> >> But the swiotlb isn't per device, but system global. > > Sure, but if the swiotlb is in use, then you can't really use the GPU. > So you get to pick one. The swiotlb is used only for buffers which are not within the DMA mask of a device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA mask won't use the swiotlb unless you have a buffer above guest physical address of 16TB (so basically never). Disabling swiotlb in such a guest would OTOH mean, that a device with only 32 bit DMA mask passed through to this guest couldn't work with buffers above 4GB. I don't think this is acceptable. Juergen
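The arithmetic behind the 16TB and 4GB figures above can be checked with a small user-space sketch. This is a simplified model under stated assumptions, not the kernel's actual check: `dma_bit_mask()` and `needs_bounce()` are hypothetical helper names, and the real logic reached from dma_direct_map_page() considers more than the address comparison shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low `bits` bits set. */
static uint64_t dma_bit_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/*
 * Hypothetical helper: true if the last byte of the buffer lies beyond
 * what the device can address, i.e. a bounce buffer would be required.
 * Written to avoid overflow in addr + size.
 */
static bool needs_bounce(uint64_t addr, uint64_t size, uint64_t mask)
{
	assert(size > 0);
	return addr > mask || size - 1 > mask - addr;
}
```

Under this model, a page at a guest physical address of 5GB is directly reachable with a 44-bit mask (limit 2^44 bytes, i.e. 16TB) but not with a 32-bit mask (limit 4GB), which is the case that would break if the swiotlb were disabled outright.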
On Thu, 16 Mar 2023, Juergen Gross wrote: > On 16.03.23 14:53, Alex Deucher wrote: > > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote: > > > > > > On 16.03.23 14:45, Alex Deucher wrote: > > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote: > > > > > > > > > > On 16.03.2023 00:25, Stefano Stabellini wrote: > > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote: > > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote: > > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote: > > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote: > > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of > > > > > > > > > > hardware > > > > > > > > > > virtualization support when possible. It will using the > > > > > > > > > > hardware IOMMU > > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if > > > > > > > > > > current domain is > > > > > > > > > > Xen PVH. > > > > > > > > > > > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how can > > > > > > > > > it get > > > > > > > > > away without resorting to swiotlb in certain cases (like I/O > > > > > > > > > to an > > > > > > > > > address-restricted device)? > > > > > > > > > > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, there > > > > > > > > is no > > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by > > > > > > > > the IOMMU > > > > > > > > so we can use guest physical addresses instead of machine > > > > > > > > addresses for > > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is > > > > > > > > available > > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the > > > > > > > > corresponding > > > > > > > > case is XENFEAT_not_direct_mapped). > > > > > > > > > > > > > > But how does Xen using an IOMMU help with, as said, > > > > > > > address-restricted > > > > > > > devices? They may still need e.g. 
a 32-bit address to be > > > > > > > programmed in, > > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O > > > > > > > buffers > > > > > > > may fulfill this requirement. > > > > > > > > > > > > In short, it is going to work as long as Linux has guest physical > > > > > > addresses (not machine addresses, those could be anything) lower > > > > > > than > > > > > > 4GB. > > > > > > > > > > > > If the address-restricted device does DMA via an IOMMU, then the > > > > > > device > > > > > > gets programmed by Linux using its guest physical addresses (not > > > > > > machine > > > > > > addresses). > > > > > > > > > > > > The 32-bit restriction would be applied by Linux to its choice of > > > > > > guest > > > > > > physical address to use to program the device, the same way it does > > > > > > on > > > > > > native. The device would be fine as it always uses Linux-provided > > > > > > <4GB > > > > > > addresses. After the IOMMU translation (pagetable setup by Xen), we > > > > > > could get any address, including >4GB addresses, and that is > > > > > > expected to > > > > > > work. > > > > > > > > > > I understand that's the "normal" way of working. But whatever the > > > > > swiotlb > > > > > is used for in baremetal Linux, that would similarly require its use > > > > > in > > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look > > > > > to > > > > > me like an incomplete attempt to disable its use altogether on x86. > > > > > What > > > > > difference of PVH vs baremetal am I missing here? > > > > > > > > swiotlb is not usable for GPUs even on bare metal. They often have > > > > hundreds or megs or even gigs of memory mapped on the device at any > > > > given time. Also, AMD GPUs support 44-48 bit DMA masks (depending on > > > > the chip family). > > > > > > But the swiotlb isn't per device, but system global. > > > > Sure, but if the swiotlb is in use, then you can't really use the GPU. > > So you get to pick one. 
> > The swiotlb is used only for buffers which are not within the DMA mask of a > device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA mask > won't use the swiotlb unless you have a buffer above guest physical address of > 16TB (so basically never). > > Disabling swiotlb in such a guest would OTOH mean, that a device with only > 32 bit DMA mask passed through to this guest couldn't work with buffers > above 4GB. > > I don't think this is acceptable. From the Xen subsystem in Linux point of view, the only thing we need to do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not the global swiotlb) on PVH because it is not needed anyway. I think we should leave the global "swiotlb" setting alone. The global swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to have a way to deal with swiotlb/GPU incompatibilities. We just have to avoid making things worse on Xen, and for that we just need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables swiotlb, then we have a good Linux configuration capable of handling the GPU properly. Alex, please correct me if I am wrong. How is x86_swiotlb_enable set to false on native (non-Xen) x86?
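The difference between the posted patch and the alternative suggested above can be modeled in user space. This is a minimal sketch, not kernel code: `struct dma_state`, `xen_common()`, `init_as_posted()` and `init_xen_only()` are hypothetical names, the `xen_swiotlb` flag is an assumed stand-in for selecting the swiotlb-xen DMA ops, and the non-PVH path is reduced to what the diff context shows.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the two boot-time knobs under discussion. */
struct dma_state {
	bool swiotlb_enable; /* stands in for x86_swiotlb_enable (global swiotlb) */
	bool xen_swiotlb;    /* assumed stand-in for using the swiotlb-xen DMA ops */
};

/* Shared tail: the non-PVH logic visible in the diff context. */
static struct dma_state xen_common(struct dma_state s, bool initial_domain)
{
	if (!initial_domain && !s.swiotlb_enable)
		return s;
	s.swiotlb_enable = true;
	s.xen_swiotlb = true; /* assumption: this path ends up on swiotlb-xen */
	return s;
}

/* Behaviour of the posted patch: PVH force-disables the global swiotlb. */
static struct dma_state init_as_posted(struct dma_state s, bool pvh,
				       bool initial_domain)
{
	if (pvh) {
		s.swiotlb_enable = false;
		return s;
	}
	return xen_common(s, initial_domain);
}

/* Alternative: PVH only skips swiotlb-xen and leaves the global setting alone. */
static struct dma_state init_xen_only(struct dma_state s, bool pvh,
				      bool initial_domain)
{
	if (pvh)
		return s;
	return xen_common(s, initial_domain);
}
```

In this model, a PVH domain that starts with the global swiotlb enabled keeps it under `init_xen_only()` and loses it under `init_as_posted()`; neither variant selects swiotlb-xen on PVH.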
On Thu, Mar 16, 2023 at 7:09 PM Stefano Stabellini <sstabellini@kernel.org> wrote: > > On Thu, 16 Mar 2023, Juergen Gross wrote: > > On 16.03.23 14:53, Alex Deucher wrote: > > > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote: > > > > > > > > On 16.03.23 14:45, Alex Deucher wrote: > > > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote: > > > > > > > > > > > > On 16.03.2023 00:25, Stefano Stabellini wrote: > > > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote: > > > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote: > > > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote: > > > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote: > > > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of > > > > > > > > > > > hardware > > > > > > > > > > > virtualization support when possible. It will using the > > > > > > > > > > > hardware IOMMU > > > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if > > > > > > > > > > > current domain is > > > > > > > > > > > Xen PVH. > > > > > > > > > > > > > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how can > > > > > > > > > > it get > > > > > > > > > > away without resorting to swiotlb in certain cases (like I/O > > > > > > > > > > to an > > > > > > > > > > address-restricted device)? > > > > > > > > > > > > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, there > > > > > > > > > is no > > > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by > > > > > > > > > the IOMMU > > > > > > > > > so we can use guest physical addresses instead of machine > > > > > > > > > addresses for > > > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is > > > > > > > > > available > > > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the > > > > > > > > > corresponding > > > > > > > > > case is XENFEAT_not_direct_mapped). 
> > > > > > > > > > > > > > > > But how does Xen using an IOMMU help with, as said, > > > > > > > > address-restricted > > > > > > > > devices? They may still need e.g. a 32-bit address to be > > > > > > > > programmed in, > > > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O > > > > > > > > buffers > > > > > > > > may fulfill this requirement. > > > > > > > > > > > > > > In short, it is going to work as long as Linux has guest physical > > > > > > > addresses (not machine addresses, those could be anything) lower > > > > > > > than > > > > > > > 4GB. > > > > > > > > > > > > > > If the address-restricted device does DMA via an IOMMU, then the > > > > > > > device > > > > > > > gets programmed by Linux using its guest physical addresses (not > > > > > > > machine > > > > > > > addresses). > > > > > > > > > > > > > > The 32-bit restriction would be applied by Linux to its choice of > > > > > > > guest > > > > > > > physical address to use to program the device, the same way it does > > > > > > > on > > > > > > > native. The device would be fine as it always uses Linux-provided > > > > > > > <4GB > > > > > > > addresses. After the IOMMU translation (pagetable setup by Xen), we > > > > > > > could get any address, including >4GB addresses, and that is > > > > > > > expected to > > > > > > > work. > > > > > > > > > > > > I understand that's the "normal" way of working. But whatever the > > > > > > swiotlb > > > > > > is used for in baremetal Linux, that would similarly require its use > > > > > > in > > > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look > > > > > > to > > > > > > me like an incomplete attempt to disable its use altogether on x86. > > > > > > What > > > > > > difference of PVH vs baremetal am I missing here? > > > > > > > > > > swiotlb is not usable for GPUs even on bare metal. They often have > > > > > hundreds or megs or even gigs of memory mapped on the device at any > > > > > given time. 
Also, AMD GPUs support 44-48 bit DMA masks (depending on > > > > > the chip family). > > > > > > > > But the swiotlb isn't per device, but system global. > > > > > > Sure, but if the swiotlb is in use, then you can't really use the GPU. > > > So you get to pick one. > > > > The swiotlb is used only for buffers which are not within the DMA mask of a > > device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA mask > > won't use the swiotlb unless you have a buffer above guest physical address of > > 16TB (so basically never). > > > > Disabling swiotlb in such a guest would OTOH mean, that a device with only > > 32 bit DMA mask passed through to this guest couldn't work with buffers > > above 4GB. > > > > I don't think this is acceptable. > > From the Xen subsystem in Linux point of view, the only thing we need to > do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not > the global swiotlb) on PVH because it is not needed anyway. > > I think we should leave the global "swiotlb" setting alone. The global > swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to > have a way to deal with swiotlb/GPU incompatibilities. > > We just have to avoid making things worse on Xen, and for that we just > need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem > doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables > swiotlb, then we have a good Linux configuration capable of handling the > GPU properly. > > Alex, please correct me if I am wrong. How is x86_swiotlb_enable set to > false on native (non-Xen) x86? In most cases we have an IOMMU enabled and IIRC, TTM has slightly different behavior for memory allocation depending on whether swiotlb would be needed or not. Alex
On 17.03.23 at 15:45, Alex Deucher wrote: > On Thu, Mar 16, 2023 at 7:09 PM Stefano Stabellini > <sstabellini@kernel.org> wrote: >> On Thu, 16 Mar 2023, Juergen Gross wrote: >>> On 16.03.23 14:53, Alex Deucher wrote: >>>> On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote: >>>>> On 16.03.23 14:45, Alex Deucher wrote: >>>>>> On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote: >>>>>>> On 16.03.2023 00:25, Stefano Stabellini wrote: >>>>>>>> On Wed, 15 Mar 2023, Jan Beulich wrote: >>>>>>>>> On 15.03.2023 01:52, Stefano Stabellini wrote: >>>>>>>>>> On Mon, 13 Mar 2023, Jan Beulich wrote: >>>>>>>>>>> On 12.03.2023 13:01, Huang Rui wrote: >>>>>>>>>>>> Xen PVH is the paravirtualized mode and takes advantage of >>>>>>>>>>>> hardware >>>>>>>>>>>> virtualization support when possible. It will using the >>>>>>>>>>>> hardware IOMMU >>>>>>>>>>>> support instead of xen-swiotlb, so disable swiotlb if >>>>>>>>>>>> current domain is >>>>>>>>>>>> Xen PVH. >>>>>>>>>>> But the kernel has no way (yet) to drive the IOMMU, so how can >>>>>>>>>>> it get >>>>>>>>>>> away without resorting to swiotlb in certain cases (like I/O >>>>>>>>>>> to an >>>>>>>>>>> address-restricted device)? >>>>>>>>>> I think Ray meant that, thanks to the IOMMU setup by Xen, there >>>>>>>>>> is no >>>>>>>>>> need for swiotlb-xen in Dom0. Address translations are done by >>>>>>>>>> the IOMMU >>>>>>>>>> so we can use guest physical addresses instead of machine >>>>>>>>>> addresses for >>>>>>>>>> DMA. This is a similar case to Dom0 on ARM when the IOMMU is >>>>>>>>>> available >>>>>>>>>> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the >>>>>>>>>> corresponding >>>>>>>>>> case is XENFEAT_not_direct_mapped). >>>>>>>>> But how does Xen using an IOMMU help with, as said, >>>>>>>>> address-restricted >>>>>>>>> devices? They may still need e.g. 
a 32-bit address to be >>>>>>>>> programmed in, >>>>>>>>> and if the kernel has memory beyond the 4G boundary not all I/O >>>>>>>>> buffers >>>>>>>>> may fulfill this requirement. >>>>>>>> In short, it is going to work as long as Linux has guest physical >>>>>>>> addresses (not machine addresses, those could be anything) lower >>>>>>>> than >>>>>>>> 4GB. >>>>>>>> >>>>>>>> If the address-restricted device does DMA via an IOMMU, then the >>>>>>>> device >>>>>>>> gets programmed by Linux using its guest physical addresses (not >>>>>>>> machine >>>>>>>> addresses). >>>>>>>> >>>>>>>> The 32-bit restriction would be applied by Linux to its choice of >>>>>>>> guest >>>>>>>> physical address to use to program the device, the same way it does >>>>>>>> on >>>>>>>> native. The device would be fine as it always uses Linux-provided >>>>>>>> <4GB >>>>>>>> addresses. After the IOMMU translation (pagetable setup by Xen), we >>>>>>>> could get any address, including >4GB addresses, and that is >>>>>>>> expected to >>>>>>>> work. >>>>>>> I understand that's the "normal" way of working. But whatever the >>>>>>> swiotlb >>>>>>> is used for in baremetal Linux, that would similarly require its use >>>>>>> in >>>>>>> PVH (or HVM) aiui. So unconditionally disabling it in PVH would look >>>>>>> to >>>>>>> me like an incomplete attempt to disable its use altogether on x86. >>>>>>> What >>>>>>> difference of PVH vs baremetal am I missing here? >>>>>> swiotlb is not usable for GPUs even on bare metal. They often have >>>>>> hundreds or megs or even gigs of memory mapped on the device at any >>>>>> given time. Also, AMD GPUs support 44-48 bit DMA masks (depending on >>>>>> the chip family). >>>>> But the swiotlb isn't per device, but system global. >>>> Sure, but if the swiotlb is in use, then you can't really use the GPU. >>>> So you get to pick one. >>> The swiotlb is used only for buffers which are not within the DMA mask of a >>> device (see dma_direct_map_page()). 
So an AMD GPU supporting a 44 bit DMA mask >>> won't use the swiotlb unless you have a buffer above guest physical address of >>> 16TB (so basically never). >>> >>> Disabling swiotlb in such a guest would OTOH mean, that a device with only >>> 32 bit DMA mask passed through to this guest couldn't work with buffers >>> above 4GB. >>> >>> I don't think this is acceptable. >> From the Xen subsystem in Linux point of view, the only thing we need to >> do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not >> the global swiotlb) on PVH because it is not needed anyway. >> >> I think we should leave the global "swiotlb" setting alone. The global >> swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to >> have a way to deal with swiotlb/GPU incompatibilities. >> >> We just have to avoid making things worse on Xen, and for that we just >> need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem >> doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables >> swiotlb, then we have a good Linux configuration capable of handling the >> GPU properly. >> >> Alex, please correct me if I am wrong. How is x86_swiotlb_enable set to >> false on native (non-Xen) x86? > In most cases we have an IOMMU enabled and IIRC, TTM has slightly > different behavior for memory allocation depending on whether swiotlb > would be needed or not. Well, "slightly different" is an understatement. We need to disable quite a few features to make swiotlb work with GPUs. In particular, userptr and inter-device sharing won't work any more. Regards, Christian. > > Alex
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 30bbe4abb5d6..f5c73dd18f2a 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -74,6 +74,12 @@ static inline void __init pci_swiotlb_detect(void)
 #ifdef CONFIG_SWIOTLB_XEN
 static void __init pci_xen_swiotlb_init(void)
 {
+	/* Xen PVH domain won't use swiotlb */
+	if (xen_pvh_domain()) {
+		x86_swiotlb_enable = false;
+		return;
+	}
+
 	if (!xen_initial_domain() && !x86_swiotlb_enable)
 		return;
 	x86_swiotlb_enable = true;
@@ -86,7 +92,7 @@ static void __init pci_xen_swiotlb_init(void)
 
 int pci_xen_swiotlb_init_late(void)
 {
-	if (dma_ops == &xen_swiotlb_dma_ops)
+	if (xen_pvh_domain() || dma_ops == &xen_swiotlb_dma_ops)
 		return 0;
 
 	/* we can work with the default swiotlb */