Message ID | 20230518134253.909623-3-hch@lst.de |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp512420vqo; Thu, 18 May 2023 06:59:54 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6+EWHn0qop1sajPeiGAXMUB5c2z1wTlCPSXLYFLHzTy7mvhmtu+8Tull55faAO6lizYBLt X-Received: by 2002:a05:6a20:918e:b0:100:d061:52ca with SMTP id v14-20020a056a20918e00b00100d06152camr2818036pzd.50.1684418394185; Thu, 18 May 2023 06:59:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684418394; cv=none; d=google.com; s=arc-20160816; b=Cl9+DT0VBOKCgkUCEIlHuADS7uVsMbbE12DpF0dWLMGJlxLjmn/xjwk7eF69Onv4tM RChmzXiMsDkykg070/SPnYlTJRwisDKuW2dBG01ffYlcmrYTozZ+zn8JVOpiHzLJxCV1 FUOZkLvJ+zbavJDphpPHi5q+24Fg2d+HAOwXjzlU/O83rzGK15mZ+yTUDE2MIIorHrf1 +YwQa5F2SgbQ5AaAFIN4/pUD7F5fLhHqgBYykTSwSt+GaUoBBvqUxlGosFKi9SxCS7eW CRWQXulZJFs4Xkkr+fwVLglBQLrZ+zzZYXPPnKMZ58KUs5esu022f82gUU2jun1lfGrS BLGA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=05AyJy/WnduYw6cv5NEmxDHJJYQdGW0Vqnw+CujW2f0=; b=vE8xM68zLdyU/c6y3JCWsSYdR6cDh/cBlrbW/Ka5xec5X4M1cm5kyohrtdi7psBxLy s1DCXNORJvLDRmCFpX26vhOSjLoD8Mdh7vRPRES4EF6V5saPvoLtCKK9UzYHQLqh95A+ zfj1pKelMCoyUgaUVP15WKVRyf8mgMlo3NruGjWCAafX0djw6VMwMeZK3OOxuPLwcZp0 wIxCnKhWgy2a1P7ZU+3o3LVKp0A/3Ysh5ujYry8eMNJDpd8KFZRrizGrqB3Qnuqip01u rB+JvRbV1/+8Wxsm+cVnKkmdY6j+gfPxt1Qw8gE0/vsrUfGN2ErK94r0TjlGXisc9rl0 EbTQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=bombadil.20210309 header.b=YCm6+nCf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z4-20020a626504000000b0064cf289a327si1707246pfb.129.2023.05.18.06.59.41; Thu, 18 May 2023 06:59:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=bombadil.20210309 header.b=YCm6+nCf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231311AbjERNn3 (ORCPT <rfc822;pacteraone@gmail.com> + 99 others); Thu, 18 May 2023 09:43:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60642 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231148AbjERNnY (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 18 May 2023 09:43:24 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B367D19B for <linux-kernel@vger.kernel.org>; Thu, 18 May 2023 06:43:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=05AyJy/WnduYw6cv5NEmxDHJJYQdGW0Vqnw+CujW2f0=; b=YCm6+nCfk/tqJs7qLv0fLb0i6p rh6YUXDVTuctprWr6BzRIVl5659Y51hsG50TV0qeGB/0lKu4HFOpG4FGQ2JEFosMgtSoOG1Ow82Uz T9PjamZA1bYNoiR5YkTkBWsX5NhqcUyOh6poOma8ugLkFknBvmIaPStma9KpMTsBbUcJJGJj2GKOM m4FKMYO7r8cwQykZilTii7odYMouDxqaidikNF/yF3CHzf5J2OZ+NbaaCrMe+2WSAAYCfKSuT1EoI IoZGn66ql9UNMrCi9szQ7Xcw0aE6fgi6+RqZgIv8hKUVhPl2kjaVw0wt5trZqQLvqFJegnCIvf3F9 Xml1tTcw==; Received: from [2001:4bb8:188:3dd5:1149:8081:5f51:3e54] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1pzduI-00D6RW-0C; Thu, 18 May 2023 13:43:02 +0000 From: Christoph Hellwig <hch@lst.de> To: Juergen Gross <jgross@suse.com>, Stefano Stabellini <sstabellini@kernel.org>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>, Ben Skeggs <bskeggs@redhat.com>, Karol Herbst <kherbst@redhat.com>, Lyude Paul <lyude@redhat.com> Cc: xen-devel@lists.xenproject.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, nouveau@lists.freedesktop.org Subject: [PATCH 2/4] x86: always initialize xen-swiotlb when xen-pcifront is enabling Date: Thu, 18 May 2023 15:42:51 +0200 Message-Id: <20230518134253.909623-3-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230518134253.909623-1-hch@lst.de> References: <20230518134253.909623-1-hch@lst.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-SRS-Rewrite: SMTP reverse-path rewritten from <hch@infradead.org> by bombadil.infradead.org. See http://www.infradead.org/rpr.html X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766240702304880074?= X-GMAIL-MSGID: =?utf-8?q?1766240702304880074?= |
Series |
[1/4] x86: move a check out of pci_xen_swiotlb_init
|
|
Commit Message
Christoph Hellwig
May 18, 2023, 1:42 p.m. UTC
Remove the dangerous late initialization of xen-swiotlb in
pci_xen_swiotlb_init_late and instead just always initialize
xen-swiotlb in the boot code if CONFIG_XEN_PCIDEV_FRONTEND is enabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/x86/include/asm/xen/swiotlb-xen.h | 6 ------
arch/x86/kernel/pci-dma.c | 25 +++----------------------
drivers/pci/xen-pcifront.c | 6 ------
3 files changed, 3 insertions(+), 34 deletions(-)
Comments
On Thu, May 18, 2023 at 03:42:51PM +0200, Christoph Hellwig wrote: > Remove the dangerous late initialization of xen-swiotlb in > pci_xen_swiotlb_init_late and instead just always initialize > xen-swiotlb in the boot code if CONFIG_XEN_PCIDEV_FRONTEND is enabled. > > Signed-off-by: Christoph Hellwig <hch@lst.de> Doesn't it mean all the PV guests will basically waste 64MB of RAM by default each if they don't really have PCI devices? > --- > arch/x86/include/asm/xen/swiotlb-xen.h | 6 ------ > arch/x86/kernel/pci-dma.c | 25 +++---------------------- > drivers/pci/xen-pcifront.c | 6 ------ > 3 files changed, 3 insertions(+), 34 deletions(-) > > diff --git a/arch/x86/include/asm/xen/swiotlb-xen.h b/arch/x86/include/asm/xen/swiotlb-xen.h > index 77a2d19cc9909e..abde0f44df57dc 100644 > --- a/arch/x86/include/asm/xen/swiotlb-xen.h > +++ b/arch/x86/include/asm/xen/swiotlb-xen.h > @@ -2,12 +2,6 @@ > #ifndef _ASM_X86_SWIOTLB_XEN_H > #define _ASM_X86_SWIOTLB_XEN_H > > -#ifdef CONFIG_SWIOTLB_XEN > -extern int pci_xen_swiotlb_init_late(void); > -#else > -static inline int pci_xen_swiotlb_init_late(void) { return -ENXIO; } > -#endif > - > int xen_swiotlb_fixup(void *buf, unsigned long nslabs); > int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order, > unsigned int address_bits, > diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c > index f887b08ac5ffe4..c4a7ead9eb674e 100644 > --- a/arch/x86/kernel/pci-dma.c > +++ b/arch/x86/kernel/pci-dma.c > @@ -81,27 +81,6 @@ static void __init pci_xen_swiotlb_init(void) > if (IS_ENABLED(CONFIG_PCI)) > pci_request_acs(); > } > - > -int pci_xen_swiotlb_init_late(void) > -{ > - if (dma_ops == &xen_swiotlb_dma_ops) > - return 0; > - > - /* we can work with the default swiotlb */ > - if (!io_tlb_default_mem.nslabs) { > - int rc = swiotlb_init_late(swiotlb_size_or_default(), > - GFP_KERNEL, xen_swiotlb_fixup); > - if (rc < 0) > - return rc; > - } > - > - /* XXX: this switches the dma ops under live devices! */ > - dma_ops = &xen_swiotlb_dma_ops; > - if (IS_ENABLED(CONFIG_PCI)) > - pci_request_acs(); > - return 0; > -} > -EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late); > #else > static inline void __init pci_xen_swiotlb_init(void) > { > @@ -111,7 +90,9 @@ static inline void __init pci_xen_swiotlb_init(void) > void __init pci_iommu_alloc(void) > { > if (xen_pv_domain()) { > - if (xen_initial_domain() || x86_swiotlb_enable) > + if (xen_initial_domain() || > + IS_ENABLED(CONFIG_XEN_PCIDEV_FRONTEND) || > + x86_swiotlb_enable) > pci_xen_swiotlb_init(); > return; > } > diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c > index 83c0ab50676dff..11636634ae512f 100644 > --- a/drivers/pci/xen-pcifront.c > +++ b/drivers/pci/xen-pcifront.c > @@ -22,7 +22,6 @@ > #include <linux/bitops.h> > #include <linux/time.h> > #include <linux/ktime.h> > -#include <linux/swiotlb.h> > #include <xen/platform_pci.h> > > #include <asm/xen/swiotlb-xen.h> > @@ -669,11 +668,6 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev) > > spin_unlock(&pcifront_dev_lock); > > - if (!err && !is_swiotlb_active(&pdev->xdev->dev)) { > - err = pci_xen_swiotlb_init_late(); > - if (err) > - dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n"); > - } > return err; > } > > -- > 2.39.2 > >
On Thu, May 18, 2023 at 08:18:39PM +0200, Marek Marczykowski-Górecki wrote: > On Thu, May 18, 2023 at 03:42:51PM +0200, Christoph Hellwig wrote: > > Remove the dangerous late initialization of xen-swiotlb in > > pci_xen_swiotlb_init_late and instead just always initialize > > xen-swiotlb in the boot code if CONFIG_XEN_PCIDEV_FRONTEND is enabled. > > > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > Doesn't it mean all the PV guests will basically waste 64MB of RAM > by default each if they don't really have PCI devices? If CONFIG_XEN_PCIDEV_FRONTEND is enabled, and the kernel's isn't booted with swiotlb=noforce, yes.
On Fri, May 19, 2023 at 06:04:05AM +0200, Christoph Hellwig wrote: > On Thu, May 18, 2023 at 08:18:39PM +0200, Marek Marczykowski-Górecki wrote: > > On Thu, May 18, 2023 at 03:42:51PM +0200, Christoph Hellwig wrote: > > > Remove the dangerous late initialization of xen-swiotlb in > > > pci_xen_swiotlb_init_late and instead just always initialize > > > xen-swiotlb in the boot code if CONFIG_XEN_PCIDEV_FRONTEND is enabled. > > > > > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > > > Doesn't it mean all the PV guests will basically waste 64MB of RAM > > by default each if they don't really have PCI devices? > > If CONFIG_XEN_PCIDEV_FRONTEND is enabled, and the kernel's isn't booted > with swiotlb=noforce, yes. That's "a bit" unfortunate, since that might be significant part of the VM memory, or if you have a lot of VMs, a significant part of the host memory - it quickly adds up. While I would say PCI passthrough is not very common for PV guests, can the decision about xen-swiotlb be delayed until you can enumerate xenstore to check if there are any PCI devices connected (and not allocate xen-swiotlb by default if there are none)? This would still not cover the hotplug case (in which case, you'd need to force it with a cmdline), but at least you wouldn't loose much memory just because one of your VMs may use PCI passthrough (so, you have it enabled in your kernel). Please remember that guest kernel is not always under full control of the host admin, so making guests loose 64MB of RAM always, in default setup isn't good for customers of such VMs...
On Fri, May 19, 2023 at 12:10:26PM +0200, Marek Marczykowski-Górecki wrote: > While I would say PCI passthrough is not very common for PV guests, can > the decision about xen-swiotlb be delayed until you can enumerate > xenstore to check if there are any PCI devices connected (and not > allocate xen-swiotlb by default if there are none)? This would > still not cover the hotplug case (in which case, you'd need to force it > with a cmdline), but at least you wouldn't loose much memory just > because one of your VMs may use PCI passthrough (so, you have it enabled > in your kernel). How early can we query xenstore? We'd need to do this before setting up DMA for any device. The alternative would be to finally merge swiotlb-xen into swiotlb, in which case we might be able to do this later. Let me see what I can do there.
On Fri, May 19, 2023 at 01:49:46PM +0100, Andrew Cooper wrote: > > The alternative would be to finally merge swiotlb-xen into swiotlb, in > > which case we might be able to do this later. Let me see what I can > > do there. > > If that is an option, it would be great to reduce the special-cashing. I think it's doable, and I've been wanting it for a while. I just need motivated testers, but it seems like I just found at least two :)
On Fri, May 19, 2023 at 02:58:57PM +0200, Christoph Hellwig wrote: > On Fri, May 19, 2023 at 01:49:46PM +0100, Andrew Cooper wrote: > > > The alternative would be to finally merge swiotlb-xen into swiotlb, in > > > which case we might be able to do this later. Let me see what I can > > > do there. > > > > If that is an option, it would be great to reduce the special-cashing. > > I think it's doable, and I've been wanting it for a while. I just > need motivated testers, but it seems like I just found at least two :) So looking at swiotlb-xen it does these off things where it takes a value generated originally be xen_phys_to_dma, then only does a dma_to_phys to go back and call pfn_valid on the result. Does this make sense, or is it wrong and just works by accident? I.e. is the patch below correct? diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 67aa74d201627d..3396c5766f0dd8 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -90,9 +90,7 @@ static inline int range_straddles_page_boundary(phys_addr_t p, size_t size) static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr) { - unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr)); - unsigned long xen_pfn = bfn_to_local_pfn(bfn); - phys_addr_t paddr = (phys_addr_t)xen_pfn << XEN_PAGE_SHIFT; + phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr); /* If the address is outside our domain, it CAN * have the same virtual address as another address @@ -234,7 +232,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, done: if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) { - if (pfn_valid(PFN_DOWN(dma_to_phys(dev, dev_addr)))) + if (pfn_valid(PFN_DOWN(phys))) arch_sync_dma_for_device(phys, size, dir); else xen_dma_sync_for_device(dev, dev_addr, size, dir); @@ -258,7 +256,7 @@ static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr, BUG_ON(dir == DMA_NONE); if (!dev_is_dma_coherent(hwdev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) { - if (pfn_valid(PFN_DOWN(dma_to_phys(hwdev, dev_addr)))) + if (pfn_valid(PFN_DOWN(paddr))) arch_sync_dma_for_cpu(paddr, size, dir); else xen_dma_sync_for_cpu(hwdev, dev_addr, size, dir); @@ -276,7 +274,7 @@ xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr, phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr); if (!dev_is_dma_coherent(dev)) { - if (pfn_valid(PFN_DOWN(dma_to_phys(dev, dma_addr)))) + if (pfn_valid(PFN_DOWN(paddr))) arch_sync_dma_for_cpu(paddr, size, dir); else xen_dma_sync_for_cpu(dev, dma_addr, size, dir); @@ -296,7 +294,7 @@ xen_swiotlb_sync_single_for_device(struct device *dev, dma_addr_t dma_addr, swiotlb_sync_single_for_device(dev, paddr, size, dir); if (!dev_is_dma_coherent(dev)) { - if (pfn_valid(PFN_DOWN(dma_to_phys(dev, dma_addr)))) + if (pfn_valid(PFN_DOWN(paddr))) arch_sync_dma_for_device(paddr, size, dir); else xen_dma_sync_for_device(dev, dma_addr, size, dir);
Hi Christoph, On Sat, 20 May 2023 08:21:03 +0200 Christoph Hellwig <hch@lst.de> wrote: > On Fri, May 19, 2023 at 02:58:57PM +0200, Christoph Hellwig wrote: > > On Fri, May 19, 2023 at 01:49:46PM +0100, Andrew Cooper wrote: > > > > The alternative would be to finally merge swiotlb-xen into swiotlb, in > > > > which case we might be able to do this later. Let me see what I can > > > > do there. > > > > > > If that is an option, it would be great to reduce the special-cashing. > > > > I think it's doable, and I've been wanting it for a while. I just > > need motivated testers, but it seems like I just found at least two :) > > So looking at swiotlb-xen it does these off things where it takes a value > generated originally be xen_phys_to_dma, then only does a dma_to_phys > to go back and call pfn_valid on the result. Does this make sense, or > is it wrong and just works by accident? I.e. is the patch below correct? > > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c > index 67aa74d201627d..3396c5766f0dd8 100644 > --- a/drivers/xen/swiotlb-xen.c > +++ b/drivers/xen/swiotlb-xen.c > @@ -90,9 +90,7 @@ static inline int range_straddles_page_boundary(phys_addr_t p, size_t size) > > static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr) > { > - unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr)); > - unsigned long xen_pfn = bfn_to_local_pfn(bfn); > - phys_addr_t paddr = (phys_addr_t)xen_pfn << XEN_PAGE_SHIFT; > + phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr); I'm no big Xen expert, but I think this is wrong. Let's go through it line by line: - bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr)); Take the DMA address (as seen by devices on the bus), convert it to a physical address (as seen by the CPU on the bus) and shift it right by XEN_PAGE_SHIFT. The result is a Xen machine PFN. - xen_pfn = bfn_to_local_pfn(bfn); Take the machine PFN and converts it to a physical PFN. - paddr = (phys_addr_t)xen_pfn << XEN_PAGE_SHIFT; Convert the physical PFN to a physical address. The important thing here is that Xen PV does not have auto-translated physical addresses, so physical address != machine address. Physical addresses in Xen PV domains are "artificial", used by the kernel to index the mem_map array, so a PFN can be easily converted to a struct page pointer and back. However, these addresses are never used by hardware, not even by CPU. The addresses used by the CPU are called machine addresses. There is no address translation between VCPUs and CPUs, because a PV domain runs directly on the CPU. After all, that's why it is called _para_virtualized. HTH Petr T
On Fri, 19 May 2023 12:10:26 +0200 Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> wrote: > On Fri, May 19, 2023 at 06:04:05AM +0200, Christoph Hellwig wrote: > > On Thu, May 18, 2023 at 08:18:39PM +0200, Marek Marczykowski-Górecki wrote: > > > On Thu, May 18, 2023 at 03:42:51PM +0200, Christoph Hellwig wrote: > > > > Remove the dangerous late initialization of xen-swiotlb in > > > > pci_xen_swiotlb_init_late and instead just always initialize > > > > xen-swiotlb in the boot code if CONFIG_XEN_PCIDEV_FRONTEND is enabled. > > > > > > > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > > > > > Doesn't it mean all the PV guests will basically waste 64MB of RAM > > > by default each if they don't really have PCI devices? > > > > If CONFIG_XEN_PCIDEV_FRONTEND is enabled, and the kernel's isn't booted > > with swiotlb=noforce, yes. > > That's "a bit" unfortunate, since that might be significant part of the > VM memory, or if you have a lot of VMs, a significant part of the host > memory - it quickly adds up. I wonder if dynamic swiotlb allocation might also help with this... Petr T
On 19.05.23 12:10, Marek Marczykowski-Górecki wrote: > On Fri, May 19, 2023 at 06:04:05AM +0200, Christoph Hellwig wrote: >> On Thu, May 18, 2023 at 08:18:39PM +0200, Marek Marczykowski-Górecki wrote: >>> On Thu, May 18, 2023 at 03:42:51PM +0200, Christoph Hellwig wrote: >>>> Remove the dangerous late initialization of xen-swiotlb in >>>> pci_xen_swiotlb_init_late and instead just always initialize >>>> xen-swiotlb in the boot code if CONFIG_XEN_PCIDEV_FRONTEND is enabled. >>>> >>>> Signed-off-by: Christoph Hellwig <hch@lst.de> >>> >>> Doesn't it mean all the PV guests will basically waste 64MB of RAM >>> by default each if they don't really have PCI devices? >> >> If CONFIG_XEN_PCIDEV_FRONTEND is enabled, and the kernel's isn't booted >> with swiotlb=noforce, yes. > > That's "a bit" unfortunate, since that might be significant part of the > VM memory, or if you have a lot of VMs, a significant part of the host > memory - it quickly adds up. > While I would say PCI passthrough is not very common for PV guests, can > the decision about xen-swiotlb be delayed until you can enumerate > xenstore to check if there are any PCI devices connected (and not > allocate xen-swiotlb by default if there are none)? This would > still not cover the hotplug case (in which case, you'd need to force it > with a cmdline), but at least you wouldn't loose much memory just > because one of your VMs may use PCI passthrough (so, you have it enabled > in your kernel). > Please remember that guest kernel is not always under full control of > the host admin, so making guests loose 64MB of RAM always, in default > setup isn't good for customers of such VMs... > In normal cases PCI passthrough in PV guests requires to start the guest with e820_host=1. So it should be rather easy to limit allocating the 64MB in PV guests to the cases where the memory map has non-RAM regions especially in the first 1MB of the memory. This will cover even hotplug cases. The only case not covered would be a guest started with e820_host=1 even if no PCI passthrough was planned. But this should be rather rare (at least I hope so). Juergen
On Mon, May 22, 2023 at 10:37:09AM +0200, Juergen Gross wrote: > In normal cases PCI passthrough in PV guests requires to start the guest > with e820_host=1. So it should be rather easy to limit allocating the > 64MB in PV guests to the cases where the memory map has non-RAM regions > especially in the first 1MB of the memory. > > This will cover even hotplug cases. The only case not covered would be a > guest started with e820_host=1 even if no PCI passthrough was planned. > But this should be rather rare (at least I hope so). So is this an ACK for the patch and can we go ahead with it? (I'd still like to merge swiotlb-xen into swiotlb eventually, but it's probably not going to happen this merge window)
On 07.06.23 15:12, Christoph Hellwig wrote: > On Mon, May 22, 2023 at 10:37:09AM +0200, Juergen Gross wrote: >> In normal cases PCI passthrough in PV guests requires to start the guest >> with e820_host=1. So it should be rather easy to limit allocating the >> 64MB in PV guests to the cases where the memory map has non-RAM regions >> especially in the first 1MB of the memory. >> >> This will cover even hotplug cases. The only case not covered would be a >> guest started with e820_host=1 even if no PCI passthrough was planned. >> But this should be rather rare (at least I hope so). > > So is this an ACK for the patch and can we go ahead with it? As long as above mentioned check of the E820 map is done, yes. If you want I can send a diff to be folded into your patch on Monday. > > (I'd still like to merge swiotlb-xen into swiotlb eventually, but it's > probably not going to happen this merge window) Would be nice. Juergen
On Fri, Jun 09, 2023 at 05:38:28PM +0200, Juergen Gross wrote: >>> guest started with e820_host=1 even if no PCI passthrough was planned. >>> But this should be rather rare (at least I hope so). >> >> So is this an ACK for the patch and can we go ahead with it? > > As long as above mentioned check of the E820 map is done, yes. > > If you want I can send a diff to be folded into your patch on Monday. Yes, that would be great!
On 09.06.23 17:38, Juergen Gross wrote: > On 07.06.23 15:12, Christoph Hellwig wrote: >> On Mon, May 22, 2023 at 10:37:09AM +0200, Juergen Gross wrote: >>> In normal cases PCI passthrough in PV guests requires to start the guest >>> with e820_host=1. So it should be rather easy to limit allocating the >>> 64MB in PV guests to the cases where the memory map has non-RAM regions >>> especially in the first 1MB of the memory. >>> >>> This will cover even hotplug cases. The only case not covered would be a >>> guest started with e820_host=1 even if no PCI passthrough was planned. >>> But this should be rather rare (at least I hope so). >> >> So is this an ACK for the patch and can we go ahead with it? > > As long as above mentioned check of the E820 map is done, yes. > > If you want I can send a diff to be folded into your patch on Monday. Attached is a patch for adding a new flag xen_pv_pci_possible. You can either add the patch to your series or merge it into your patch 2. You need to modify your patch like this: @@ -111,7 +90,10 @@ static inline void __init pci_xen_swiotlb_init(void) void __init pci_iommu_alloc(void) { if (xen_pv_domain()) { - if (xen_initial_domain() || x86_swiotlb_enable) + if (xen_initial_domain() || + (IS_ENABLED(CONFIG_XEN_PCIDEV_FRONTEND) && + xen_pv_pci_possible) || + x86_swiotlb_enable) pci_xen_swiotlb_init(); return; } Juergen
Thank you. I'll queue it up as a separate patch.
diff --git a/arch/x86/include/asm/xen/swiotlb-xen.h b/arch/x86/include/asm/xen/swiotlb-xen.h index 77a2d19cc9909e..abde0f44df57dc 100644 --- a/arch/x86/include/asm/xen/swiotlb-xen.h +++ b/arch/x86/include/asm/xen/swiotlb-xen.h @@ -2,12 +2,6 @@ #ifndef _ASM_X86_SWIOTLB_XEN_H #define _ASM_X86_SWIOTLB_XEN_H -#ifdef CONFIG_SWIOTLB_XEN -extern int pci_xen_swiotlb_init_late(void); -#else -static inline int pci_xen_swiotlb_init_late(void) { return -ENXIO; } -#endif - int xen_swiotlb_fixup(void *buf, unsigned long nslabs); int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order, unsigned int address_bits, diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c index f887b08ac5ffe4..c4a7ead9eb674e 100644 --- a/arch/x86/kernel/pci-dma.c +++ b/arch/x86/kernel/pci-dma.c @@ -81,27 +81,6 @@ static void __init pci_xen_swiotlb_init(void) if (IS_ENABLED(CONFIG_PCI)) pci_request_acs(); } - -int pci_xen_swiotlb_init_late(void) -{ - if (dma_ops == &xen_swiotlb_dma_ops) - return 0; - - /* we can work with the default swiotlb */ - if (!io_tlb_default_mem.nslabs) { - int rc = swiotlb_init_late(swiotlb_size_or_default(), - GFP_KERNEL, xen_swiotlb_fixup); - if (rc < 0) - return rc; - } - - /* XXX: this switches the dma ops under live devices! */ - dma_ops = &xen_swiotlb_dma_ops; - if (IS_ENABLED(CONFIG_PCI)) - pci_request_acs(); - return 0; -} -EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late); #else static inline void __init pci_xen_swiotlb_init(void) { @@ -111,7 +90,9 @@ static inline void __init pci_xen_swiotlb_init(void) void __init pci_iommu_alloc(void) { if (xen_pv_domain()) { - if (xen_initial_domain() || x86_swiotlb_enable) + if (xen_initial_domain() || + IS_ENABLED(CONFIG_XEN_PCIDEV_FRONTEND) || + x86_swiotlb_enable) pci_xen_swiotlb_init(); return; } diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c index 83c0ab50676dff..11636634ae512f 100644 --- a/drivers/pci/xen-pcifront.c +++ b/drivers/pci/xen-pcifront.c @@ -22,7 +22,6 @@ #include <linux/bitops.h> #include <linux/time.h> #include <linux/ktime.h> -#include <linux/swiotlb.h> #include <xen/platform_pci.h> #include <asm/xen/swiotlb-xen.h> @@ -669,11 +668,6 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev) spin_unlock(&pcifront_dev_lock); - if (!err && !is_swiotlb_active(&pdev->xdev->dev)) { - err = pci_xen_swiotlb_init_late(); - if (err) - dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n"); - } return err; }