Message ID | 20221202211838.1061278-1-helgaas@kernel.org |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1075770wrr; Fri, 2 Dec 2022 13:21:42 -0800 (PST) X-Google-Smtp-Source: AA0mqf44V62HMmxGrIiIyJkX8FC3FfOlzPQjDC16gdCdN66US2B/auIf7n/+JgE/GCs2MObIYIh+ X-Received: by 2002:a17:906:1146:b0:78d:9f02:5458 with SMTP id i6-20020a170906114600b0078d9f025458mr47673609eja.753.1670016101819; Fri, 02 Dec 2022 13:21:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670016101; cv=none; d=google.com; s=arc-20160816; b=ADEPI6tTE/Ke8D4b6dLG9wMc4Fd704eR17g8ZqhIe7oG1Ve3zDlqPjP9FAPcNYlb6C 1Tav/ya//Z7kReM50tuuZ4PKW0Jbvug8/wBEcQfkzUn+g6ccgNtLkeNl3spPt85M5Hlf drOPwEEcLOdxdx9/4LIQbm8BGu0G8+3KdOaqHMIriEGNeD6IopWFAen5zcqZIMAvLG8+ lprR5vZigBPfKGeEIdXPQdSszYVyUhYHNzi5C+7qzbKTKyNXzWN24b3ClOmOOFKpm4yM TkTanPWasaChlhn6AmKwCVvvRWjpOP9HYBcJvc9/U2nrVxR6MusJIe0eKZYITT1xAp95 mXcw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=z2FOPD1rxFRPinJIz3a/USpiAAMYHK1mmTzHNT25Beg=; b=kb3GRdlD/Y3sng9R+E972n0AtBw2vbtpp7zeMRu6LvgGHmjOl9jQWX/yuoMlYztZ8Y WYYDoQ7G9XXLArzQS+xwIa+iP34oUdTtCrpFuaeXWvBSQfNE3FfUucqRdFsrYXQrvXaZ 47TRQciD9yVeR+flH/Tx0kzu0HJsPho9XWqDXWM9oVW0mCe2HLMMhNnKFH9xmc/FB6tk ImJlpbVdH9PiuFyR+k5hZ+4izTbGz9KX5AOtz6RBemfh/WiTxeac2ldZHPSuCx4FPiNI 9jbc7+J9pLn97ygBqTf0Fsqm3ye/LtcrD+6rQfdfkl4MS4Gu32IiFO8niT0vErqyb7e+ OLnA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="d4qTFc/f"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id sc38-20020a1709078a2600b007ade20fc40csi6775036ejc.810.2022.12.02.13.21.17; Fri, 02 Dec 2022 13:21:41 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="d4qTFc/f"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234480AbiLBVTE (ORCPT <rfc822;lhua1029@gmail.com> + 99 others); Fri, 2 Dec 2022 16:19:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60592 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234242AbiLBVSv (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 2 Dec 2022 16:18:51 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 867F65915A; Fri, 2 Dec 2022 13:18:46 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 900C8623D6; Fri, 2 Dec 2022 21:18:45 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id BDA10C433D6; Fri, 2 Dec 2022 21:18:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670015925; bh=2rrX47NWa70VI8AqxvMAoJDdj+in56xV4jYSGcyJcbo=; h=From:To:Cc:Subject:Date:From; b=d4qTFc/f72OMYxi/1HZE0CdQFKVlgLXIQsf6tAbNbpLXfL/lJBiriy9Skujf59U/l UTDzQnx5ZAY1+ZeCHAeTueZVZwJG3MpPXzmAnjFtm+rxxzsWuPtng1ikgX6s9D1U9k 7tk1HQdZIhZCURQkfR9xlVfsZ1IxOZXND7fKXaLauWlAtasxRdI0t762JdMwYay7Mv SYxaVcGFm9XnsMlsPSCdH59mEBmngBdsR2ynwvKt3bV6FQI6t+PVDg7NmhkA0VveWW 3tzTlcIsad84rsaDJJd9vI4bQ+Wwa+xJHXHXiqGi+V6pauXAyoXc4qTuK4jI8VPOf8 Sz/xgLOG0MGig== From: Bjorn Helgaas <helgaas@kernel.org> To: linux-pci@vger.kernel.org Cc: Hans de Goede <hdegoede@redhat.com>, Florent DELAHAYE <kernelorg@undead.fr>, Konrad J Hambrick <kjhambrick@gmail.com>, Matt Hansen <2lprbe78@duck.com>, =?utf-8?q?Benoit_Gr=C3=A9goire?= <benoitg@coeus.ca>, Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>, Mika Westerberg <mika.westerberg@linux.intel.com>, Werner Sembach <wse@tuxedocomputers.com>, mumblingdrunkard@protonmail.com, linux-kernel@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com> Subject: [PATCH 0/4] PCI: Continue E820 vs host bridge window saga Date: Fri, 2 Dec 2022 15:18:34 -0600 Message-Id: <20221202211838.1061278-1-helgaas@kernel.org> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751138803621454594?= X-GMAIL-MSGID: =?utf-8?q?1751138803621454594?= |
Series |
PCI: Continue E820 vs host bridge window saga
|
|
Message
Bjorn Helgaas
Dec. 2, 2022, 9:18 p.m. UTC
From: Bjorn Helgaas <bhelgaas@google.com>
When allocating space for PCI BARs, Linux avoids allocating space mentioned
in the E820 map. This was originally done by 4dc2287c1805 ("x86: avoid
E820 regions when allocating address space") to work around BIOS defects
that included unusable space in host bridge _CRS.
Some recent machines use EfiMemoryMappedIO for PCI MMCONFIG and host bridge
apertures, and bootloaders and EFI stubs convert those to E820 regions,
which means we can't allocate space for hot-added PCI devices (often a
dock) or for devices the BIOS didn't configure (often a touchpad)
The current strategy is to add DMI quirks that disable the E820 filtering
on these machines and to disable it entirely starting with 2023 BIOSes:
d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks")
0ae084d5a674 ("x86/PCI: Disable E820 reserved region clipping starting in 2023")
But the quirks are problematic because it's really hard to list all the
machines that need them.
This series is an attempt at a more generic approach. I'm told by firmware
folks that EfiMemoryMappedIO means "the OS should map this area so EFI
runtime services can use it in virtual mode," but does not prevent the OS
from using it.
The first patch removes any EfiMemoryMappedIO areas from the E820 map.
This doesn't affect any virtual mapping of those areas (that would have to
be done directly from the EFI memory map) but it means Linux can allocate
space for PCI MMIO.
The rest are basically cosmetic log message changes.
Bjorn Helgaas (4):
efi/x86: Remove EfiMemoryMappedIO from E820 map
PCI: Skip allocate_resource() if too little space available
x86/PCI: Tidy E820 removal messages
x86/PCI: Fix log message typo
arch/x86/kernel/resource.c | 7 +++++--
arch/x86/pci/acpi.c | 2 +-
arch/x86/platform/efi/efi.c | 36 ++++++++++++++++++++++++++++++++++++
drivers/pci/bus.c | 4 ++++
4 files changed, 46 insertions(+), 3 deletions(-)
Comments
Hi Bjorn, On 12/2/22 22:18, Bjorn Helgaas wrote: > From: Bjorn Helgaas <bhelgaas@google.com> > > When allocating space for PCI BARs, Linux avoids allocating space mentioned > in the E820 map. This was originally done by 4dc2287c1805 ("x86: avoid > E820 regions when allocating address space") to work around BIOS defects > that included unusable space in host bridge _CRS. > > Some recent machines use EfiMemoryMappedIO for PCI MMCONFIG and host bridge > apertures, and bootloaders and EFI stubs convert those to E820 regions, > which means we can't allocate space for hot-added PCI devices (often a > dock) or for devices the BIOS didn't configure (often a touchpad) > > The current strategy is to add DMI quirks that disable the E820 filtering > on these machines and to disable it entirely starting with 2023 BIOSes: > > d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks") > 0ae084d5a674 ("x86/PCI: Disable E820 reserved region clipping starting in 2023") > > But the quirks are problematic because it's really hard to list all the > machines that need them. > > This series is an attempt at a more generic approach. I'm told by firmware > folks that EfiMemoryMappedIO means "the OS should map this area so EFI > runtime services can use it in virtual mode," but does not prevent the OS > from using it. > > The first patch removes any EfiMemoryMappedIO areas from the E820 map. > This doesn't affect any virtual mapping of those areas (that would have to > be done directly from the EFI memory map) but it means Linux can allocate > space for PCI MMIO. > > The rest are basically cosmetic log message changes. Thank you for working on this. I'm a bit worried about this series though. The 2 things which I worry about are: 1. I think this will not help when people boot in BIOS (CSM) mode rather then UEFI mode which quite a few Linux users still do because they learned to do this years ago when Linux EFI support (and EFI fw itself) was still a bit in flux. IIRC from the last time we looked at this in CSM mode the BIOS itself translates the EfiMemoryMappedIO areas to reserved E820 regions. So when people use the BIOS CSM mode to boot, then this patch will not help since the kernel lacks the info to do the translation.\ We may also hit the same case when the bootloader has done the translation which I believe is what upstream grub does. Fedora grub has been patched to use the kernel EFI stub when booting a kernel on EFI, so just an EFI equivalent of "exec" on the kernel EFI binary. Where as upstream grub does a more BIOS like boot, where it skips the EFI stub and instead does a whole bunch of stuff itself and then jumps to the kernel's start vector. So this might also not work with upstream grub, which is what I believe Ubuntu and Debian use. Although I case in this case we will still have access to the EFI memory map and I see that your patch removes the entries stemming from the EfiMemoryMappedIO areas from the E820 map, rather then never adding them. So I guess this will also work in the case when the bootloader has done the translation (leaving just the BIOS CSM case as an issue) ? This won't cause regressions, but it does mean that e.g. the touchpad i2c-controller / hotplugged PCI devices will still not work when booted in BIOS (CSM) mode / through upstream grub. I have asked the reporter of: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 To do a BIOS mode boot of a Fedora 37 live USB and then collect dmesg output, then we can check if that indeed has the EfiMemoryMappedIO areas as reserved E820 regions in a way where we cannot identify them anymore since we don't have access to the EFI memory map in this case. 2. I am afraid that now allowing PCI MMIO space to be allocated in regions marked as EfiMemoryMappedIO will cause regressions on some systems. Specifically when I tried something similar the last time I looked at this (using the BIOS date cut-off approach IIRC) there was a suspend/resume regression on a Lenovo ThinkPad X1 carbon (20A7) model: https://bugzilla.redhat.com/show_bug.cgi?id=2029207 Back then I came to the conclusion that the problem is that not avoiding the EfiMemoryMappedIO regions caused PCI MMIO space to be allocated in the 0xdfa00000 - 0xdfa10000 range which is listed in the EFI memmap as: [ 0.000000] efi: mem46: [MMIO |RUN| | | | | | | | | | | | | ] range=[0x00000000dfa00000-0x00000000dfa0ffff] (0MB) And with current kernels with the extra logging added for this the following is logged related to this: [ 0.326504] acpi PNP0A08:00: clipped [mem 0xdfa00000-0xfebfffff window] to [mem 0xdfa10000-0xfebfffff window] for e820 entry [mem 0xdceff000-0xdfa0ffff] I believe patch 1/4 of this set will make this clipping go away, re-introducing the suspend/resume problem. I will reach out to the reporter and see if I can get them to test this patch-set. Regards, Hans
On Sat, Dec 03, 2022 at 01:44:10PM +0100, Hans de Goede wrote: > Hi Bjorn, > > On 12/2/22 22:18, Bjorn Helgaas wrote: > > From: Bjorn Helgaas <bhelgaas@google.com> > > > > When allocating space for PCI BARs, Linux avoids allocating space mentioned > > in the E820 map. This was originally done by 4dc2287c1805 ("x86: avoid > > E820 regions when allocating address space") to work around BIOS defects > > that included unusable space in host bridge _CRS. > > > > Some recent machines use EfiMemoryMappedIO for PCI MMCONFIG and host bridge > > apertures, and bootloaders and EFI stubs convert those to E820 regions, > > which means we can't allocate space for hot-added PCI devices (often a > > dock) or for devices the BIOS didn't configure (often a touchpad) > > > > The current strategy is to add DMI quirks that disable the E820 filtering > > on these machines and to disable it entirely starting with 2023 BIOSes: > > > > d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks") > > 0ae084d5a674 ("x86/PCI: Disable E820 reserved region clipping starting in 2023") > > > > But the quirks are problematic because it's really hard to list all the > > machines that need them. > > > > This series is an attempt at a more generic approach. I'm told by firmware > > folks that EfiMemoryMappedIO means "the OS should map this area so EFI > > runtime services can use it in virtual mode," but does not prevent the OS > > from using it. > > > > The first patch removes any EfiMemoryMappedIO areas from the E820 map. > > This doesn't affect any virtual mapping of those areas (that would have to > > be done directly from the EFI memory map) but it means Linux can allocate > > space for PCI MMIO. > > > > The rest are basically cosmetic log message changes. > > Thank you for working on this. I'm a bit worried about this series though. > > The 2 things which I worry about are: > > > 1. I think this will not help when people boot in BIOS (CSM) mode rather > then UEFI mode which quite a few Linux users still do because they learned > to do this years ago when Linux EFI support (and EFI fw itself) was still > a bit in flux. > > IIRC from the last time we looked at this in CSM mode the BIOS itself > translates the EfiMemoryMappedIO areas to reserved E820 regions. So when > people use the BIOS CSM mode to boot, then this patch will not help > since the kernel lacks the info to do the translation. Right, if BIOS CSM puts EfiMemoryMappedIO in the E820 map the same way bootloaders do, and the kernel doesn't have the EFI memory map, this series won't help. > We may also hit the same case when the bootloader has done the > translation which I believe is what upstream grub does. Fedora grub > has been patched to use the kernel EFI stub when booting a kernel > on EFI, so just an EFI equivalent of "exec" on the kernel EFI binary. > > Where as upstream grub does a more BIOS like boot, where it skips the > EFI stub and instead does a whole bunch of stuff itself and then > jumps to the kernel's start vector. So this might also not work with > upstream grub, which is what I believe Ubuntu and Debian use. > > Although I case in this case we will still have access to the EFI > memory map and I see that your patch removes the entries stemming > from the EfiMemoryMappedIO areas from the E820 map, rather then > never adding them. So I guess this will also work in the case > when the bootloader has done the translation (leaving just > the BIOS CSM case as an issue)? > > This won't cause regressions, but it does mean that e.g. the > touchpad i2c-controller / hotplugged PCI devices will still not > work when booted in BIOS (CSM) mode / through upstream grub. Yes, I agree CSM could still be broken if BIOS puts EfiMemoryMappedIO in the E820 map. I'm not clear on the grub cases. Do some grub versions hide the EFI memory map from the kernel? If they do, they could be broken the same way as CSM. > 2. I am afraid that now allowing PCI MMIO space to be allocated > in regions marked as EfiMemoryMappedIO will cause regressions > on some systems. Specifically when I tried something similar > the last time I looked at this (using the BIOS date cut-off > approach IIRC) there was a suspend/resume regression on > a Lenovo ThinkPad X1 carbon (20A7) model: > > https://bugzilla.redhat.com/show_bug.cgi?id=2029207 > > Back then I came to the conclusion that the problem is that not > avoiding the EfiMemoryMappedIO regions caused PCI MMIO space to > be allocated in the 0xdfa00000 - 0xdfa10000 range which is > listed in the EFI memmap as: > > [ 0.000000] efi: mem46: [MMIO |RUN| | | | | | | | | | | | | ] range=[0x00000000dfa00000-0x00000000dfa0ffff] (0MB) > > And with current kernels with the extra logging added for this > the following is logged related to this: > > [ 0.326504] acpi PNP0A08:00: clipped [mem 0xdfa00000-0xfebfffff window] to [mem 0xdfa10000-0xfebfffff window] for e820 entry [mem 0xdceff000-0xdfa0ffff] > > I believe patch 1/4 of this set will make this clipping go away, > re-introducing the suspend/resume problem. Yes, I'm afraid you're right. Comparing the logs at comment #31 (fails) and comment #38 (works): pci_bus 0000:00: root bus resource [mem 0xdfa00000-0xfebfffff window] pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff] fails pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfb00000-0xdfcfffff] works Since 0xdfa00000 is included in the host bridge _CRS, but isn't usable, my guess is this is a _CRS bug. Or maybe BIOS is using the producer/consumer bit in _CRS to identify this as register space as opposed to a window? I thought we couldn't rely on that bit, but it's been a long time since I looked at it. An acpidump might have a clue. Bjorn
Hi Bjorn, On 12/3/22 18:57, Bjorn Helgaas wrote: > On Sat, Dec 03, 2022 at 01:44:10PM +0100, Hans de Goede wrote: >> Hi Bjorn, >> >> On 12/2/22 22:18, Bjorn Helgaas wrote: >>> From: Bjorn Helgaas <bhelgaas@google.com> >>> >>> When allocating space for PCI BARs, Linux avoids allocating space mentioned >>> in the E820 map. This was originally done by 4dc2287c1805 ("x86: avoid >>> E820 regions when allocating address space") to work around BIOS defects >>> that included unusable space in host bridge _CRS. >>> >>> Some recent machines use EfiMemoryMappedIO for PCI MMCONFIG and host bridge >>> apertures, and bootloaders and EFI stubs convert those to E820 regions, >>> which means we can't allocate space for hot-added PCI devices (often a >>> dock) or for devices the BIOS didn't configure (often a touchpad) >>> >>> The current strategy is to add DMI quirks that disable the E820 filtering >>> on these machines and to disable it entirely starting with 2023 BIOSes: >>> >>> d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks") >>> 0ae084d5a674 ("x86/PCI: Disable E820 reserved region clipping starting in 2023") >>> >>> But the quirks are problematic because it's really hard to list all the >>> machines that need them. >>> >>> This series is an attempt at a more generic approach. I'm told by firmware >>> folks that EfiMemoryMappedIO means "the OS should map this area so EFI >>> runtime services can use it in virtual mode," but does not prevent the OS >>> from using it. >>> >>> The first patch removes any EfiMemoryMappedIO areas from the E820 map. >>> This doesn't affect any virtual mapping of those areas (that would have to >>> be done directly from the EFI memory map) but it means Linux can allocate >>> space for PCI MMIO. >>> >>> The rest are basically cosmetic log message changes. >> >> Thank you for working on this. I'm a bit worried about this series though. >> >> The 2 things which I worry about are: >> >> >> 1. I think this will not help when people boot in BIOS (CSM) mode rather >> then UEFI mode which quite a few Linux users still do because they learned >> to do this years ago when Linux EFI support (and EFI fw itself) was still >> a bit in flux. >> >> IIRC from the last time we looked at this in CSM mode the BIOS itself >> translates the EfiMemoryMappedIO areas to reserved E820 regions. So when >> people use the BIOS CSM mode to boot, then this patch will not help >> since the kernel lacks the info to do the translation. > > Right, if BIOS CSM puts EfiMemoryMappedIO in the E820 map the same way > bootloaders do, and the kernel doesn't have the EFI memory map, this > series won't help. > >> We may also hit the same case when the bootloader has done the >> translation which I believe is what upstream grub does. Fedora grub >> has been patched to use the kernel EFI stub when booting a kernel >> on EFI, so just an EFI equivalent of "exec" on the kernel EFI binary. >> >> Where as upstream grub does a more BIOS like boot, where it skips the >> EFI stub and instead does a whole bunch of stuff itself and then >> jumps to the kernel's start vector. So this might also not work with >> upstream grub, which is what I believe Ubuntu and Debian use. >> >> Although I case in this case we will still have access to the EFI >> memory map and I see that your patch removes the entries stemming >> from the EfiMemoryMappedIO areas from the E820 map, rather then >> never adding them. So I guess this will also work in the case >> when the bootloader has done the translation (leaving just >> the BIOS CSM case as an issue)? >> >> This won't cause regressions, but it does mean that e.g. the >> touchpad i2c-controller / hotplugged PCI devices will still not >> work when booted in BIOS (CSM) mode / through upstream grub. > > Yes, I agree CSM could still be broken if BIOS puts EfiMemoryMappedIO > in the E820 map. > > I'm not clear on the grub cases. Do some grub versions hide the EFI > memory map from the kernel? If they do, they could be broken the same > way as CSM. I don't think grub hides the EFI memory map, I started writing the bit about grub because I believe that grub creates its own emulated E820 map, rather then relying on the kernel's EFI stub creating an emulated E820 map. But since you take the emulated map and then remove the EfiMemoryMappedIO entries afterwards I think this should be fine. (versus how patching the stub to never add the EfiMemoryMappedIO entries would _not_ be fine because the emulated E820 map does not always come from the EFI stub). >> 2. I am afraid that now allowing PCI MMIO space to be allocated >> in regions marked as EfiMemoryMappedIO will cause regressions >> on some systems. Specifically when I tried something similar >> the last time I looked at this (using the BIOS date cut-off >> approach IIRC) there was a suspend/resume regression on >> a Lenovo ThinkPad X1 carbon (20A7) model: >> >> https://bugzilla.redhat.com/show_bug.cgi?id=2029207 >> >> Back then I came to the conclusion that the problem is that not >> avoiding the EfiMemoryMappedIO regions caused PCI MMIO space to >> be allocated in the 0xdfa00000 - 0xdfa10000 range which is >> listed in the EFI memmap as: >> >> [ 0.000000] efi: mem46: [MMIO |RUN| | | | | | | | | | | | | ] range=[0x00000000dfa00000-0x00000000dfa0ffff] (0MB) >> >> And with current kernels with the extra logging added for this >> the following is logged related to this: >> >> [ 0.326504] acpi PNP0A08:00: clipped [mem 0xdfa00000-0xfebfffff window] to [mem 0xdfa10000-0xfebfffff window] for e820 entry [mem 0xdceff000-0xdfa0ffff] >> >> I believe patch 1/4 of this set will make this clipping go away, >> re-introducing the suspend/resume problem. > > Yes, I'm afraid you're right. Comparing the logs at comment #31 > (fails) and comment #38 (works): > > pci_bus 0000:00: root bus resource [mem 0xdfa00000-0xfebfffff window] > pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff] fails > pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfb00000-0xdfcfffff] works > > Since 0xdfa00000 is included in the host bridge _CRS, but isn't > usable, my guess is this is a _CRS bug. Ack. So I was thinking to maybe limit the removal of EfiMemoryMappedIO regions from the E820 map if they are big enough to cause troubles? Looking at the EFI map MMIO regions on this Lenovo ThinkPad X1 carbon (20A7) model, they are tiny. Where as the ones which we know cause problems are huge. So maybe add a bit of heuristics to patch 1/4 based on the EfiMemoryMappedIO region size and only remove the big ones from the E820 map ? I know that adding heuristics like this always feels a bit wrong, because you end up putting a somewhat arbitrary cut off point in the code on which to toggle behavior on/off, but I think that in this case it should work nicely given how huge the EfiMemoryMappedIO regions which are actually causing problems are. > Or maybe BIOS is using the producer/consumer bit in _CRS to identify > this as register space as opposed to a window? I thought we couldn't > rely on that bit, but it's been a long time since I looked at it. An > acpidump might have a clue. Regards, Hans
Hi Bjorn, On 12/3/22 18:57, Bjorn Helgaas wrote: > On Sat, Dec 03, 2022 at 01:44:10PM +0100, Hans de Goede wrote: >> Hi Bjorn, >> >> On 12/2/22 22:18, Bjorn Helgaas wrote: >>> From: Bjorn Helgaas <bhelgaas@google.com> >>> >>> When allocating space for PCI BARs, Linux avoids allocating space mentioned >>> in the E820 map. This was originally done by 4dc2287c1805 ("x86: avoid >>> E820 regions when allocating address space") to work around BIOS defects >>> that included unusable space in host bridge _CRS. >>> >>> Some recent machines use EfiMemoryMappedIO for PCI MMCONFIG and host bridge >>> apertures, and bootloaders and EFI stubs convert those to E820 regions, >>> which means we can't allocate space for hot-added PCI devices (often a >>> dock) or for devices the BIOS didn't configure (often a touchpad) >>> >>> The current strategy is to add DMI quirks that disable the E820 filtering >>> on these machines and to disable it entirely starting with 2023 BIOSes: >>> >>> d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks") >>> 0ae084d5a674 ("x86/PCI: Disable E820 reserved region clipping starting in 2023") >>> >>> But the quirks are problematic because it's really hard to list all the >>> machines that need them. >>> >>> This series is an attempt at a more generic approach. I'm told by firmware >>> folks that EfiMemoryMappedIO means "the OS should map this area so EFI >>> runtime services can use it in virtual mode," but does not prevent the OS >>> from using it. >>> >>> The first patch removes any EfiMemoryMappedIO areas from the E820 map. >>> This doesn't affect any virtual mapping of those areas (that would have to >>> be done directly from the EFI memory map) but it means Linux can allocate >>> space for PCI MMIO. >>> >>> The rest are basically cosmetic log message changes. >> >> Thank you for working on this. I'm a bit worried about this series though. >> >> The 2 things which I worry about are: >> >> >> 1. I think this will not help when people boot in BIOS (CSM) mode rather >> then UEFI mode which quite a few Linux users still do because they learned >> to do this years ago when Linux EFI support (and EFI fw itself) was still >> a bit in flux. >> >> IIRC from the last time we looked at this in CSM mode the BIOS itself >> translates the EfiMemoryMappedIO areas to reserved E820 regions. So when >> people use the BIOS CSM mode to boot, then this patch will not help >> since the kernel lacks the info to do the translation. > > Right, if BIOS CSM puts EfiMemoryMappedIO in the E820 map the same way > bootloaders do, and the kernel doesn't have the EFI memory map, this > series won't help. So I just got the requested dmesg in BIOS CSM mode from: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 And it says: [ 0.000000] BIOS-e820: [mem 0x000000004bc50000-0x00000000cfffffff] reserved [ 0.316140] pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window] So I'm afraid that I remembered correctly and the CSM adds the EfiMemoryMappedIO regions to the E820 map as reserved :( So as you said, this series won't help for people booting in BIOS compatibility mode. Which means that we should at least keep the current list of no_e820 quirks to avoid regressing those models when booted in BIOS compatibility mode. And maybe still add at least the Clevo model for which I recently submitted a new no_e820 quirk so that that will work in BIOS CSM mode too ? Note I know you did not propose to drop the quirks in this series, just covering all the bases here. Regards, Hans
Hi Am 04.12.22 um 10:29 schrieb Hans de Goede: > Hi Bjorn, > > On 12/3/22 18:57, Bjorn Helgaas wrote: >> On Sat, Dec 03, 2022 at 01:44:10PM +0100, Hans de Goede wrote: >>> Hi Bjorn, >>> >>> On 12/2/22 22:18, Bjorn Helgaas wrote: >>>> From: Bjorn Helgaas <bhelgaas@google.com> >>>> >>>> When allocating space for PCI BARs, Linux avoids allocating space mentioned >>>> in the E820 map. This was originally done by 4dc2287c1805 ("x86: avoid >>>> E820 regions when allocating address space") to work around BIOS defects >>>> that included unusable space in host bridge _CRS. >>>> >>>> Some recent machines use EfiMemoryMappedIO for PCI MMCONFIG and host bridge >>>> apertures, and bootloaders and EFI stubs convert those to E820 regions, >>>> which means we can't allocate space for hot-added PCI devices (often a >>>> dock) or for devices the BIOS didn't configure (often a touchpad) >>>> >>>> The current strategy is to add DMI quirks that disable the E820 filtering >>>> on these machines and to disable it entirely starting with 2023 BIOSes: >>>> >>>> d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks") >>>> 0ae084d5a674 ("x86/PCI: Disable E820 reserved region clipping starting in 2023") >>>> >>>> But the quirks are problematic because it's really hard to list all the >>>> machines that need them. >>>> >>>> This series is an attempt at a more generic approach. I'm told by firmware >>>> folks that EfiMemoryMappedIO means "the OS should map this area so EFI >>>> runtime services can use it in virtual mode," but does not prevent the OS >>>> from using it. >>>> >>>> The first patch removes any EfiMemoryMappedIO areas from the E820 map. >>>> This doesn't affect any virtual mapping of those areas (that would have to >>>> be done directly from the EFI memory map) but it means Linux can allocate >>>> space for PCI MMIO. >>>> >>>> The rest are basically cosmetic log message changes. >>> >>> Thank you for working on this. I'm a bit worried about this series though. >>> >>> The 2 things which I worry about are: >>> >>> >>> 1. I think this will not help when people boot in BIOS (CSM) mode rather >>> then UEFI mode which quite a few Linux users still do because they learned >>> to do this years ago when Linux EFI support (and EFI fw itself) was still >>> a bit in flux. >>> >>> IIRC from the last time we looked at this in CSM mode the BIOS itself >>> translates the EfiMemoryMappedIO areas to reserved E820 regions. So when >>> people use the BIOS CSM mode to boot, then this patch will not help >>> since the kernel lacks the info to do the translation. >> >> Right, if BIOS CSM puts EfiMemoryMappedIO in the E820 map the same way >> bootloaders do, and the kernel doesn't have the EFI memory map, this >> series won't help. > > So I just got the requested dmesg in BIOS CSM mode from: > https://bugzilla.redhat.com/show_bug.cgi?id=1868899 > > And it says: > > [ 0.000000] BIOS-e820: [mem 0x000000004bc50000-0x00000000cfffffff] reserved > [ 0.316140] pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window] > > So I'm afraid that I remembered correctly and the CSM adds > the EfiMemoryMappedIO regions to the E820 map as reserved :( > > So as you said, this series won't help for people booting in > BIOS compatibility mode. Which means that we should at least keep > the current list of no_e820 quirks to avoid regressing those models > when booted in BIOS compatibility mode. > > And maybe still add at least the Clevo model for which I recently > submitted a new no_e820 quirk so that that will work in BIOS CSM > mode too ? Do you mean the X170KM-G? I don't think it has the option to switch to Legacy BIOS mode (At least i didn't found an option in the bios version i have) > > Note I know you did not propose to drop the quirks in this series, > just covering all the bases here. > > Regards, > > Hans > > > > Kind regards, Werner Sembach
Hi Werner, On 12/5/22 14:27, Werner Sembach wrote: > Hi > > Am 04.12.22 um 10:29 schrieb Hans de Goede: >> Hi Bjorn, >> >> On 12/3/22 18:57, Bjorn Helgaas wrote: >>> On Sat, Dec 03, 2022 at 01:44:10PM +0100, Hans de Goede wrote: >>>> Hi Bjorn, >>>> >>>> On 12/2/22 22:18, Bjorn Helgaas wrote: >>>>> From: Bjorn Helgaas <bhelgaas@google.com> >>>>> >>>>> When allocating space for PCI BARs, Linux avoids allocating space mentioned >>>>> in the E820 map. This was originally done by 4dc2287c1805 ("x86: avoid >>>>> E820 regions when allocating address space") to work around BIOS defects >>>>> that included unusable space in host bridge _CRS. >>>>> >>>>> Some recent machines use EfiMemoryMappedIO for PCI MMCONFIG and host bridge >>>>> apertures, and bootloaders and EFI stubs convert those to E820 regions, >>>>> which means we can't allocate space for hot-added PCI devices (often a >>>>> dock) or for devices the BIOS didn't configure (often a touchpad) >>>>> >>>>> The current strategy is to add DMI quirks that disable the E820 filtering >>>>> on these machines and to disable it entirely starting with 2023 BIOSes: >>>>> >>>>> d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks") >>>>> 0ae084d5a674 ("x86/PCI: Disable E820 reserved region clipping starting in 2023") >>>>> >>>>> But the quirks are problematic because it's really hard to list all the >>>>> machines that need them. >>>>> >>>>> This series is an attempt at a more generic approach. I'm told by firmware >>>>> folks that EfiMemoryMappedIO means "the OS should map this area so EFI >>>>> runtime services can use it in virtual mode," but does not prevent the OS >>>>> from using it. >>>>> >>>>> The first patch removes any EfiMemoryMappedIO areas from the E820 map. >>>>> This doesn't affect any virtual mapping of those areas (that would have to >>>>> be done directly from the EFI memory map) but it means Linux can allocate >>>>> space for PCI MMIO. >>>>> >>>>> The rest are basically cosmetic log message changes. >>>> >>>> Thank you for working on this. I'm a bit worried about this series though. >>>> >>>> The 2 things which I worry about are: >>>> >>>> >>>> 1. I think this will not help when people boot in BIOS (CSM) mode rather >>>> then UEFI mode which quite a few Linux users still do because they learned >>>> to do this years ago when Linux EFI support (and EFI fw itself) was still >>>> a bit in flux. >>>> >>>> IIRC from the last time we looked at this in CSM mode the BIOS itself >>>> translates the EfiMemoryMappedIO areas to reserved E820 regions. So when >>>> people use the BIOS CSM mode to boot, then this patch will not help >>>> since the kernel lacks the info to do the translation. >>> >>> Right, if BIOS CSM puts EfiMemoryMappedIO in the E820 map the same way >>> bootloaders do, and the kernel doesn't have the EFI memory map, this >>> series won't help. >> >> So I just got the requested dmesg in BIOS CSM mode from: >> https://bugzilla.redhat.com/show_bug.cgi?id=1868899 >> >> And it says: >> >> [ 0.000000] BIOS-e820: [mem 0x000000004bc50000-0x00000000cfffffff] reserved >> [ 0.316140] pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window] >> >> So I'm afraid that I remembered correctly and the CSM adds >> the EfiMemoryMappedIO regions to the E820 map as reserved :( >> >> So as you said, this series won't help for people booting in >> BIOS compatibility mode. Which means that we should at least keep >> the current list of no_e820 quirks to avoid regressing those models >> when booted in BIOS compatibility mode. >> >> And maybe still add at least the Clevo model for which I recently >> submitted a new no_e820 quirk so that that will work in BIOS CSM >> mode too ? > Do you mean the X170KM-G? I don't think it has the option to switch to Legacy BIOS mode (At least i didn't found an option in the bios version i have) I'm talking about this patch: https://lore.kernel.org/linux-pci/20221010150206.142615-1-hdegoede@redhat.com/ Regards, Hans
Hi, On 12/5/22 17:29, Konrad J Hambrick wrote: > > > On Mon, Dec 5, 2022 at 8:26 AM Hans de Goede <hdegoede@redhat.com <mailto:hdegoede@redhat.com>> wrote: > > > > Do you mean the X170KM-G? I don't think it has the option to switch to Legacy BIOS mode (At least i didn't found an option in the bios version i have) > > I'm talking about this patch: > > https://lore.kernel.org/linux-pci/20221010150206.142615-1-hdegoede@redhat.com/ <https://lore.kernel.org/linux-pci/20221010150206.142615-1-hdegoede@redhat.com/> > > Regards, > > Hans > > > Hans -- > > I applied Bjorn's four Patches last night and built 6.1-rc8 and the 'Clevo X170KM-G Barebone' Quirk is still in the pci_crs_quirks[] array in arch/x86/pci/acpi.c, even after I applied Bjorn's Patches. Right that is expected, the point which I was trying to make is, that at least for those models which support Legacy BIOS mode, we need to keep the quirks around. > I am stuck on a project for my 'real job' and I won't be free until this evening. > > Do you still need a set of logs from 6.1-rc8 ? I don't remember ever asking for logs in this email thread. I guess Bjorn maybe asked for logs? Assuming I have that right I will leave answering the question if logs are still necessary up to Bjorn. Regards, Hans
Hi Bjorn, On 12/4/22 10:13, Hans de Goede wrote: <snip> >>> 2. I am afraid that now allowing PCI MMIO space to be allocated >>> in regions marked as EfiMemoryMappedIO will cause regressions >>> on some systems. Specifically when I tried something similar >>> the last time I looked at this (using the BIOS date cut-off >>> approach IIRC) there was a suspend/resume regression on >>> a Lenovo ThinkPad X1 carbon (20A7) model: >>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=2029207 >>> >>> Back then I came to the conclusion that the problem is that not >>> avoiding the EfiMemoryMappedIO regions caused PCI MMIO space to >>> be allocated in the 0xdfa00000 - 0xdfa10000 range which is >>> listed in the EFI memmap as: >>> >>> [ 0.000000] efi: mem46: [MMIO |RUN| | | | | | | | | | | | | ] range=[0x00000000dfa00000-0x00000000dfa0ffff] (0MB) >>> >>> And with current kernels with the extra logging added for this >>> the following is logged related to this: >>> >>> [ 0.326504] acpi PNP0A08:00: clipped [mem 0xdfa00000-0xfebfffff window] to [mem 0xdfa10000-0xfebfffff window] for e820 entry [mem 0xdceff000-0xdfa0ffff] >>> >>> I believe patch 1/4 of this set will make this clipping go away, >>> re-introducing the suspend/resume problem. >> >> Yes, I'm afraid you're right. Comparing the logs at comment #31 >> (fails) and comment #38 (works): >> >> pci_bus 0000:00: root bus resource [mem 0xdfa00000-0xfebfffff window] >> pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff] fails >> pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfb00000-0xdfcfffff] works >> >> Since 0xdfa00000 is included in the host bridge _CRS, but isn't >> usable, my guess is this is a _CRS bug. > > Ack. > > So I was thinking to maybe limit the removal of EfiMemoryMappedIO > regions from the E820 map if they are big enough to cause troubles? > > Looking at the EFI map MMIO regions on this Lenovo ThinkPad X1 carbon > (20A7) model, they are tiny. Where as the ones which we know cause > problems are huge. So maybe add a bit of heuristics to patch 1/4 based > on the EfiMemoryMappedIO region size and only remove the big ones > from the E820 map ? > > I know that adding heuristics like this always feels a bit wrong, > because you end up putting a somewhat arbitrary cut off point in > the code on which to toggle behavior on/off, but I think that in > this case it should work nicely given how huge the EfiMemoryMappedIO > regions which are actually causing problems are. One of the original reporters of the suspend/resume problem has gotten back to me in: https://bugzilla.redhat.com/show_bug.cgi?id=2029207 And they are willing to run some more tests. I could give them a 6.0 kernel with the 4 patches from this series added to test, but I think we both agree that it is very likely that the suspend/resume problem will resurface, so I'm not sure how useful that would be ? >> Or maybe BIOS is using the producer/consumer bit in _CRS to identify >> this as register space as opposed to a window? I thought we couldn't >> rely on that bit, but it's been a long time since I looked at it. An >> acpidump might have a clue. I have asked the reporter of; https://bugzilla.redhat.com/show_bug.cgi?id=2029207 To grab any acpidump. Regards, Hans p.s. Looking at the efi=debug output from: https://bugzilla-attachments.redhat.com/attachment.cgi?id=1861035 The small MMIO regions which we most honor as reserved do have the "RUN" (runtime) flag set in the EFI mmap. But I'm afraid that the same applies to the troublesome MMIO EFI regions which cause the failures to assign PCI regions for devices not setup by the firmware: https://bugzilla-attachments.redhat.com/attachment.cgi?id=1861407 So that "RUN" flag is of no use.
On Wed, Dec 07, 2022 at 04:31:12PM +0100, Hans de Goede wrote: > On 12/4/22 10:13, Hans de Goede wrote: > > <snip> > > >>> 2. I am afraid that now allowing PCI MMIO space to be allocated > >>> in regions marked as EfiMemoryMappedIO will cause regressions > >>> on some systems. Specifically when I tried something similar > >>> the last time I looked at this (using the BIOS date cut-off > >>> approach IIRC) there was a suspend/resume regression on > >>> a Lenovo ThinkPad X1 carbon (20A7) model: > >>> > >>> https://bugzilla.redhat.com/show_bug.cgi?id=2029207 > >>> > >>> Back then I came to the conclusion that the problem is that not > >>> avoiding the EfiMemoryMappedIO regions caused PCI MMIO space to > >>> be allocated in the 0xdfa00000 - 0xdfa10000 range which is > >>> listed in the EFI memmap as: > >>> > >>> [ 0.000000] efi: mem46: [MMIO |RUN| | | | | | | | | | | | | ] range=[0x00000000dfa00000-0x00000000dfa0ffff] (0MB) > >>> > >>> And with current kernels with the extra logging added for this > >>> the following is logged related to this: > >>> > >>> [ 0.326504] acpi PNP0A08:00: clipped [mem 0xdfa00000-0xfebfffff window] to [mem 0xdfa10000-0xfebfffff window] for e820 entry [mem 0xdceff000-0xdfa0ffff] > >>> > >>> I believe patch 1/4 of this set will make this clipping go away, > >>> re-introducing the suspend/resume problem. > >> > >> Yes, I'm afraid you're right. Comparing the logs at comment #31 > >> (fails) and comment #38 (works): > >> > >> pci_bus 0000:00: root bus resource [mem 0xdfa00000-0xfebfffff window] > >> pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff] fails > >> pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfb00000-0xdfcfffff] works > >> > >> Since 0xdfa00000 is included in the host bridge _CRS, but isn't > >> usable, my guess is this is a _CRS bug. > > > > Ack. > > > > So I was thinking to maybe limit the removal of EfiMemoryMappedIO > > regions from the E820 map if they are big enough to cause troubles? > > > > Looking at the EFI map MMIO regions on this Lenovo ThinkPad X1 carbon > > (20A7) model, they are tiny. Where as the ones which we know cause > > problems are huge. So maybe add a bit of heuristics to patch 1/4 based > > on the EfiMemoryMappedIO region size and only remove the big ones > > from the E820 map ? > > > > I know that adding heuristics like this always feels a bit wrong, > > because you end up putting a somewhat arbitrary cut off point in > > the code on which to toggle behavior on/off, but I think that in > > this case it should work nicely given how huge the EfiMemoryMappedIO > > regions which are actually causing problems are. I'll post a v2 that removes only regions 256KB or larger in a minute. > Looking at the efi=debug output from: > > https://bugzilla-attachments.redhat.com/attachment.cgi?id=1861035 > > The small MMIO regions which we most honor as reserved do > have the "RUN" (runtime) flag set in the EFI mmap. Just trying to follow along here, so not sure any of the following is relevant ... This attachment is from https://bugzilla.redhat.com/show_bug.cgi?id=2029207, and it shows: efi: mem46: [MMIO|RUN| ] range=[0xdfa00000-0xdfa0ffff] (0MB) [64K] efi: mem47: [MMIO|RUN|UC] range=[0xf80f8000-0xf80f8fff] (0MB) [4K] pci_bus 0000:00: root bus resource [mem 0xdfa00000-0xfebfffff window] pci_bus 0000:00: root bus resource [mem 0xfed40000-0xfed4bfff window] mem46 is included in the PNP0A08 _CRS, and Ivan has verified experimentally that we have to avoid it. mem47 is also included in the _CRS, but I don't have a clue what it is. Maybe some hidden device used by BIOS but not visible to us? > But I'm afraid that the same applies to the troublesome > MMIO EFI regions which cause the failures to assign > PCI regions for devices not setup by the firmware: > > https://bugzilla-attachments.redhat.com/attachment.cgi?id=1861407 > > So that "RUN" flag is of no use. I don't know what bug this attachment is from. Is the point here that you considered doing the E820 removal based on the EFI_MEMORY_RUNTIME memory *attribute* instead of the EFI_MEMORY_MAPPED_IO memory *type*? I don't really know the details of EFI_MEMORY_MAPPED_IO vs EFI_MEMORY_RUNTIME, but it looks like EFI_MEMORY_RUNTIME can be applied to things like EFI_RUNTIME_SERVICES_CODE (not MMIO space) that should stay in E820. Bjorn
Hi Bjorn, On 12/8/22 19:57, Bjorn Helgaas wrote: > On Wed, Dec 07, 2022 at 04:31:12PM +0100, Hans de Goede wrote: >> On 12/4/22 10:13, Hans de Goede wrote: >> >> <snip> >> >>>>> 2. I am afraid that now allowing PCI MMIO space to be allocated >>>>> in regions marked as EfiMemoryMappedIO will cause regressions >>>>> on some systems. Specifically when I tried something similar >>>>> the last time I looked at this (using the BIOS date cut-off >>>>> approach IIRC) there was a suspend/resume regression on >>>>> a Lenovo ThinkPad X1 carbon (20A7) model: >>>>> >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=2029207 >>>>> >>>>> Back then I came to the conclusion that the problem is that not >>>>> avoiding the EfiMemoryMappedIO regions caused PCI MMIO space to >>>>> be allocated in the 0xdfa00000 - 0xdfa10000 range which is >>>>> listed in the EFI memmap as: >>>>> >>>>> [ 0.000000] efi: mem46: [MMIO |RUN| | | | | | | | | | | | | ] range=[0x00000000dfa00000-0x00000000dfa0ffff] (0MB) >>>>> >>>>> And with current kernels with the extra logging added for this >>>>> the following is logged related to this: >>>>> >>>>> [ 0.326504] acpi PNP0A08:00: clipped [mem 0xdfa00000-0xfebfffff window] to [mem 0xdfa10000-0xfebfffff window] for e820 entry [mem 0xdceff000-0xdfa0ffff] >>>>> >>>>> I believe patch 1/4 of this set will make this clipping go away, >>>>> re-introducing the suspend/resume problem. >>>> >>>> Yes, I'm afraid you're right. Comparing the logs at comment #31 >>>> (fails) and comment #38 (works): >>>> >>>> pci_bus 0000:00: root bus resource [mem 0xdfa00000-0xfebfffff window] >>>> pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff] fails >>>> pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfb00000-0xdfcfffff] works >>>> >>>> Since 0xdfa00000 is included in the host bridge _CRS, but isn't >>>> usable, my guess is this is a _CRS bug. >>> >>> Ack. >>> >>> So I was thinking to maybe limit the removal of EfiMemoryMappedIO >>> regions from the E820 map if they are big enough to cause troubles? >>> >>> Looking at the EFI map MMIO regions on this Lenovo ThinkPad X1 carbon >>> (20A7) model, they are tiny. Where as the ones which we know cause >>> problems are huge. So maybe add a bit of heuristics to patch 1/4 based >>> on the EfiMemoryMappedIO region size and only remove the big ones >>> from the E820 map ? >>> >>> I know that adding heuristics like this always feels a bit wrong, >>> because you end up putting a somewhat arbitrary cut off point in >>> the code on which to toggle behavior on/off, but I think that in >>> this case it should work nicely given how huge the EfiMemoryMappedIO >>> regions which are actually causing problems are. > > I'll post a v2 that removes only regions 256KB or larger in a minute. Ok, may I ask why 256KB? I see that that rules out then troublesome MMIO regions from the X1 carbon from: https://bugzilla.redhat.com/show_bug.cgi?id=2029207 : efi: mem46: [MMIO|RUN| ] range=[0xdfa00000-0xdfa0ffff] (0MB) [64K] which we know we need to avoid / keep reserved. But OTOH the reservations which are causing the problems with assigning resources to PCI devices by Linux look like this: efi: mem50: [MMIO |RUN| | | | | | | | | | | | |UC] range=[0x0000000065400000-0x00000000cfffffff] (1708MB) which is significantly larger then 256KB. So we could e.g. also put the cut-off point at 16MB and still remove the above troublesome reservation from the E820 table. Note just thinking out loud here. I have no idea if 16MB would be better... > >> Looking at the efi=debug output from: >> >> https://bugzilla-attachments.redhat.com/attachment.cgi?id=1861035 >> >> The small MMIO regions which we most honor as reserved do >> have the "RUN" (runtime) flag set in the EFI mmap. > > Just trying to follow along here, so not sure any of the following is > relevant ... > > This attachment is from > https://bugzilla.redhat.com/show_bug.cgi?id=2029207, and it shows: > > efi: mem46: [MMIO|RUN| ] range=[0xdfa00000-0xdfa0ffff] (0MB) [64K] > efi: mem47: [MMIO|RUN|UC] range=[0xf80f8000-0xf80f8fff] (0MB) [4K] > pci_bus 0000:00: root bus resource [mem 0xdfa00000-0xfebfffff window] > pci_bus 0000:00: root bus resource [mem 0xfed40000-0xfed4bfff window] > > mem46 is included in the PNP0A08 _CRS, and Ivan has verified > experimentally that we have to avoid it. Ack. > mem47 is also included in the _CRS, but I don't have a clue what it > is. Maybe some hidden device used by BIOS but not visible to us? Could be, there is at least one hidden device called the P2SB on most Intel systems. >> But I'm afraid that the same applies to the troublesome >> MMIO EFI regions which cause the failures to assign >> PCI regions for devices not setup by the firmware: >> >> https://bugzilla-attachments.redhat.com/attachment.cgi?id=1861407 >> >> So that "RUN" flag is of no use. > > I don't know what bug this attachment is from. It is from https://bugzilla.redhat.com/show_bug.cgi?id=1868899 which is the ideapad slim 3 with the touchpad issue caused by the: efi: mem50: [MMIO |RUN| | | | | | | | | | | | |UC] range=[0x0000000065400000-0x00000000cfffffff] (1708MB) reservation getting in the way of assigning resources to the i2c-controller. > Is the point here that you considered doing the E820 removal based on > the EFI_MEMORY_RUNTIME memory *attribute* instead of the > EFI_MEMORY_MAPPED_IO memory *type*? > > I don't really know the details of EFI_MEMORY_MAPPED_IO vs > EFI_MEMORY_RUNTIME, but it looks like EFI_MEMORY_RUNTIME can be > applied to things like EFI_RUNTIME_SERVICES_CODE (not MMIO space) that > should stay in E820. Sorry for the confusion. What I was trying to say is that I was interested in seeing if we could use the "RUN" flag to differentiate between: 1. The big MMIO region which we want to remove from the e820 map: efi: mem50: [MMIO |RUN| | | | | | | | | | | | |UC] range=[0x0000000065400000-0x00000000cfffffff] (1708MB) 2. The small MMIO region which we want to keep to avoid the reported suspend/resume issue: efi: mem46: [MMIO|RUN| ] range=[0xdfa00000-0xdfa0ffff] (0MB) [64K] But unfortunately both have the RUN flag set so the RUN flag is of no use to us. Regards, Hans
On Thu, Dec 08, 2022 at 08:16:31PM +0100, Hans de Goede wrote: > Hi Bjorn, > > On 12/8/22 19:57, Bjorn Helgaas wrote: > > On Wed, Dec 07, 2022 at 04:31:12PM +0100, Hans de Goede wrote: > >> On 12/4/22 10:13, Hans de Goede wrote: > >> > >> <snip> > >> > >>>>> 2. I am afraid that now allowing PCI MMIO space to be allocated > >>>>> in regions marked as EfiMemoryMappedIO will cause regressions > >>>>> on some systems. Specifically when I tried something similar > >>>>> the last time I looked at this (using the BIOS date cut-off > >>>>> approach IIRC) there was a suspend/resume regression on > >>>>> a Lenovo ThinkPad X1 carbon (20A7) model: > >>>>> > >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=2029207 > >>>>> > >>>>> Back then I came to the conclusion that the problem is that not > >>>>> avoiding the EfiMemoryMappedIO regions caused PCI MMIO space to > >>>>> be allocated in the 0xdfa00000 - 0xdfa10000 range which is > >>>>> listed in the EFI memmap as: > >>>>> > >>>>> [ 0.000000] efi: mem46: [MMIO |RUN| | | | | | | | | | | | | ] range=[0x00000000dfa00000-0x00000000dfa0ffff] (0MB) > >>>>> > >>>>> And with current kernels with the extra logging added for this > >>>>> the following is logged related to this: > >>>>> > >>>>> [ 0.326504] acpi PNP0A08:00: clipped [mem 0xdfa00000-0xfebfffff window] to [mem 0xdfa10000-0xfebfffff window] for e820 entry [mem 0xdceff000-0xdfa0ffff] > >>>>> > >>>>> I believe patch 1/4 of this set will make this clipping go away, > >>>>> re-introducing the suspend/resume problem. > >>>> > >>>> Yes, I'm afraid you're right. Comparing the logs at comment #31 > >>>> (fails) and comment #38 (works): > >>>> > >>>> pci_bus 0000:00: root bus resource [mem 0xdfa00000-0xfebfffff window] > >>>> pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff] fails > >>>> pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfb00000-0xdfcfffff] works > >>>> > >>>> Since 0xdfa00000 is included in the host bridge _CRS, but isn't > >>>> usable, my guess is this is a _CRS bug. > >>> > >>> Ack. > >>> > >>> So I was thinking to maybe limit the removal of EfiMemoryMappedIO > >>> regions from the E820 map if they are big enough to cause troubles? > >>> > >>> Looking at the EFI map MMIO regions on this Lenovo ThinkPad X1 carbon > >>> (20A7) model, they are tiny. Where as the ones which we know cause > >>> problems are huge. So maybe add a bit of heuristics to patch 1/4 based > >>> on the EfiMemoryMappedIO region size and only remove the big ones > >>> from the E820 map ? > >>> > >>> I know that adding heuristics like this always feels a bit wrong, > >>> because you end up putting a somewhat arbitrary cut off point in > >>> the code on which to toggle behavior on/off, but I think that in > >>> this case it should work nicely given how huge the EfiMemoryMappedIO > >>> regions which are actually causing problems are. > > > > I'll post a v2 that removes only regions 256KB or larger in a minute. > > Ok, may I ask why 256KB? > > I see that that rules out then troublesome MMIO regions from the X1 carbon from: > https://bugzilla.redhat.com/show_bug.cgi?id=2029207 : > efi: mem46: [MMIO|RUN| ] range=[0xdfa00000-0xdfa0ffff] (0MB) [64K] > which we know we need to avoid / keep reserved. > > But OTOH the reservations which are causing the problems with assigning > resources to PCI devices by Linux look like this: > efi: mem50: [MMIO |RUN| | | | | | | | | | | | |UC] range=[0x0000000065400000-0x00000000cfffffff] (1708MB) > which is significantly larger then 256KB. > > So we could e.g. also put the cut-off point at 16MB and still > remove the above troublesome reservation from the E820 table. > Note just thinking out loud here. I have no idea if 16MB > would be better... No good reason for 256KB. We know it needs to be at least 64KB for the X1 Carbon. I picked 4x bigger just for headroom, since I assume the 64KB is platform-specific host bridge registers or something. Do you think a bigger number would be better, i.e., we would retain more MMIO things in E820? ECAM areas would be 1MB per bus, so between 1MB and 256MB. Those areas *should* be reserved by PNP0C02 _CRS, but IIRC the early MMCONFIG code checks E820, and the late code checks for _CRS. I guess one could argue that ignoring those, e.g., by retaining anything 256MB or smaller in E820, would reduce the amount of change. But if the host bridge _CRS includes 256MB of legitimate window that EFI says is MMIO and is hence included in E820, that seems like kind of a lot of usable window space to give up. > ... > Sorry for the confusion. What I was trying to say is that I was interested > in seeing if we could use the "RUN" flag to differentiate between: > > 1. The big MMIO region which we want to remove from the e820 map: > efi: mem50: [MMIO |RUN| | | | | | | | | | | | |UC] range=[0x0000000065400000-0x00000000cfffffff] (1708MB) > > 2. The small MMIO region which we want to keep to avoid the reported suspend/resume issue: > efi: mem46: [MMIO|RUN| ] range=[0xdfa00000-0xdfa0ffff] (0MB) [64K] > > But unfortunately both have the RUN flag set so the RUN flag is > of no use to us. Right, makes sense. Bjorn
Hi, On 12/8/22 21:06, Bjorn Helgaas wrote: > On Thu, Dec 08, 2022 at 08:16:31PM +0100, Hans de Goede wrote: >> Hi Bjorn, >> >> On 12/8/22 19:57, Bjorn Helgaas wrote: >>> On Wed, Dec 07, 2022 at 04:31:12PM +0100, Hans de Goede wrote: >>>> On 12/4/22 10:13, Hans de Goede wrote: >>>> >>>> <snip> >>>> >>>>>>> 2. I am afraid that now allowing PCI MMIO space to be allocated >>>>>>> in regions marked as EfiMemoryMappedIO will cause regressions >>>>>>> on some systems. Specifically when I tried something similar >>>>>>> the last time I looked at this (using the BIOS date cut-off >>>>>>> approach IIRC) there was a suspend/resume regression on >>>>>>> a Lenovo ThinkPad X1 carbon (20A7) model: >>>>>>> >>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=2029207 >>>>>>> >>>>>>> Back then I came to the conclusion that the problem is that not >>>>>>> avoiding the EfiMemoryMappedIO regions caused PCI MMIO space to >>>>>>> be allocated in the 0xdfa00000 - 0xdfa10000 range which is >>>>>>> listed in the EFI memmap as: >>>>>>> >>>>>>> [ 0.000000] efi: mem46: [MMIO |RUN| | | | | | | | | | | | | ] range=[0x00000000dfa00000-0x00000000dfa0ffff] (0MB) >>>>>>> >>>>>>> And with current kernels with the extra logging added for this >>>>>>> the following is logged related to this: >>>>>>> >>>>>>> [ 0.326504] acpi PNP0A08:00: clipped [mem 0xdfa00000-0xfebfffff window] to [mem 0xdfa10000-0xfebfffff window] for e820 entry [mem 0xdceff000-0xdfa0ffff] >>>>>>> >>>>>>> I believe patch 1/4 of this set will make this clipping go away, >>>>>>> re-introducing the suspend/resume problem. >>>>>> >>>>>> Yes, I'm afraid you're right. Comparing the logs at comment #31 >>>>>> (fails) and comment #38 (works): >>>>>> >>>>>> pci_bus 0000:00: root bus resource [mem 0xdfa00000-0xfebfffff window] >>>>>> pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff] fails >>>>>> pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfb00000-0xdfcfffff] works >>>>>> >>>>>> Since 0xdfa00000 is included in the host bridge _CRS, but isn't >>>>>> usable, my guess is this is a _CRS bug. >>>>> >>>>> Ack. >>>>> >>>>> So I was thinking to maybe limit the removal of EfiMemoryMappedIO >>>>> regions from the E820 map if they are big enough to cause troubles? >>>>> >>>>> Looking at the EFI map MMIO regions on this Lenovo ThinkPad X1 carbon >>>>> (20A7) model, they are tiny. Where as the ones which we know cause >>>>> problems are huge. So maybe add a bit of heuristics to patch 1/4 based >>>>> on the EfiMemoryMappedIO region size and only remove the big ones >>>>> from the E820 map ? >>>>> >>>>> I know that adding heuristics like this always feels a bit wrong, >>>>> because you end up putting a somewhat arbitrary cut off point in >>>>> the code on which to toggle behavior on/off, but I think that in >>>>> this case it should work nicely given how huge the EfiMemoryMappedIO >>>>> regions which are actually causing problems are. >>> >>> I'll post a v2 that removes only regions 256KB or larger in a minute. >> >> Ok, may I ask why 256KB? >> >> I see that that rules out then troublesome MMIO regions from the X1 carbon from: >> https://bugzilla.redhat.com/show_bug.cgi?id=2029207 : >> efi: mem46: [MMIO|RUN| ] range=[0xdfa00000-0xdfa0ffff] (0MB) [64K] >> which we know we need to avoid / keep reserved. >> >> But OTOH the reservations which are causing the problems with assigning >> resources to PCI devices by Linux look like this: >> efi: mem50: [MMIO |RUN| | | | | | | | | | | | |UC] range=[0x0000000065400000-0x00000000cfffffff] (1708MB) >> which is significantly larger then 256KB. >> >> So we could e.g. also put the cut-off point at 16MB and still >> remove the above troublesome reservation from the E820 table. >> Note just thinking out loud here. I have no idea if 16MB >> would be better... > > No good reason for 256KB. We know it needs to be at least 64KB for > the X1 Carbon. I picked 4x bigger just for headroom, since I assume > the 64KB is platform-specific host bridge registers or something. Do > you think a bigger number would be better, i.e., we would retain more > MMIO things in E820? > > ECAM areas would be 1MB per bus, so between 1MB and 256MB. Those areas > *should* be reserved by PNP0C02 _CRS, but IIRC the early MMCONFIG code > checks E820, and the late code checks for _CRS. I guess one could > argue that ignoring those, e.g., by retaining anything 256MB or > smaller in E820, would reduce the amount of change. Right, reducing how much we change what the E820 map looks like after this would be the main reason to make the cut of point bigger then 256KB. > But if the host bridge _CRS includes 256MB of legitimate window that > EFI says is MMIO and is hence included in E820, that seems like kind > of a lot of usable window space to give up. Ack, I guess we can just go with 256KB for now and then see how things go. Regards, Hans