Message ID | 20221118160916.7e165306.alex.williamson@redhat.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp460062wrr; Fri, 18 Nov 2022 15:23:11 -0800 (PST) X-Google-Smtp-Source: AA0mqf4duJd6IQolLArbO9cg+tT04K6ksMNOLmsVJaeEhwySaGzNZRTbuA0IChRvd5D9PniYUwEm X-Received: by 2002:a17:906:6d88:b0:7ad:b86b:3ff with SMTP id h8-20020a1709066d8800b007adb86b03ffmr8056999ejt.448.1668813791676; Fri, 18 Nov 2022 15:23:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668813791; cv=none; d=google.com; s=arc-20160816; b=VJZ6wYH482k4VyEK+sBZ6QqevFTdIvVnxv8U1b9gi8qn0XuAwWcUfmSrntHJW8iCYY NsTU30aImA6scmabuxhxqCdcP7r7basI9mgPVR+Pzbu7bupF7u4Z8woky37YUSaeSomp gz3UjGBHXmm0qzC7rLjjDseGaAXP5gWwoR7NKlJ4qm8vdGvrCck81pqGQ78Ls3hpvBXL bEd1CIH8E84IyFggaurIm7iSVrKDLtnm4WA6L5c4OZlYFbISzG8cHWnj9xqS4WxOHlAy ue8Mbn+09IUa/3P4429F7s4a4o44jyGVPQgBODp4b5pypVKvJgOi5MCNn0Mol4P/XXsa LU/w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:subject:cc:to:from:date:dkim-signature; bh=fqxYlPjStId8INOMKlUIJ9udLA0AzXzEw7XWCP07sCA=; b=TfIKew4Td35zfWWNgvdHeHygjaEhZdgsBYcWiCbpkV0cN+3zeyK03Jd8yNA5yC8mht EPptf+5KT8uUSiwNDTAp08rF4+nPav2z6Z50hCeUCCR037T6qK7eeqjc0r69ibCo9Dig /PpVlYYyDecpm+OcEd7xXD75KR607cB9tuDVcxsvVhFcLlcgX9PYv72F1U7EQFRcPP8H ZVwh8tebwQ/G/RRXquPIrk+xFqJlZld9pb17jobnu7SvIRm/ueMW8KvpF9bcPrCCqjYN qngaPEFdyuV/CXFAyUXmFZcIEUkKGasCpJDSvieDI6LHu67qMUwmp3BavMZbK6AaggeH AaUg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ShB2nT5h; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u1-20020a1709064ac100b0078c3197bf86si3385928ejt.533.2022.11.18.15.22.45; Fri, 18 Nov 2022 15:23:11 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ShB2nT5h; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229719AbiKRXVR (ORCPT <rfc822;kkmonlee@gmail.com> + 99 others); Fri, 18 Nov 2022 18:21:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59992 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231790AbiKRXU5 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 18 Nov 2022 18:20:57 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 71E2F7C6AF for <linux-kernel@vger.kernel.org>; Fri, 18 Nov 2022 15:09:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668812961; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=fqxYlPjStId8INOMKlUIJ9udLA0AzXzEw7XWCP07sCA=; b=ShB2nT5hZ9TmyBLg70rRqZJNTRlu0vl02lTtMRH2+0j8qi4E3gyKzf6dpHkdC3gvyI5V5J EYsZxmrROWx7cyioIf9p91Bhw5StM/A4w0aP1Gop2wdnAeshK4epd+mlxPGwCwlQUakZRk g0W3Be7bdZlFfWHVP8G1JWDbxitc8rk= Received: from mail-io1-f72.google.com (mail-io1-f72.google.com [209.85.166.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-347-zZhdaS2ZMiOzxxwsoT0uFg-1; Fri, 18 Nov 2022 18:09:20 -0500 X-MC-Unique: zZhdaS2ZMiOzxxwsoT0uFg-1 Received: by mail-io1-f72.google.com with SMTP id y5-20020a056602120500b006cf628c14ddso3358089iot.15 for <linux-kernel@vger.kernel.org>; Fri, 18 Nov 2022 15:09:20 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:subject:cc:to :from:date:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=fqxYlPjStId8INOMKlUIJ9udLA0AzXzEw7XWCP07sCA=; b=mbO232NvvydxzENQtxJBrcC5AUSQ3iTV25+93xIDzwpSSNaKm9F3/1EqSU0JCm9swf AE5sKdt184dRRMaGMJCuXugiPVobpB2qQiS0XXJiYz6R0U9UrdgJkPhbRcgqdd192SV0 rwt3qblI326brhRuoEbsiQm2svV5Xgt5ocoItVT4zdTL24eA0ovVUDAQQcvedEAt5/H7 y+ziNIhjUfiRhYBYvl7wPU2qSuBQuqWbdg8k1wHm+LIFfeyPEB16M6JWVGV64BFOAS2V IQKEint//0ZQmwkJh29pehV31Yo234SmTenKUNmFfFpLZPkJL+StSe18N4EzNVEPGpf5 mLxw== X-Gm-Message-State: ANoB5pnjSp/+zPy9ldbm29ZH+3v/X6aaN7vEYbhxuW9vJV08ddTg/Gem Wc6LHDV7OX1n2XipRm66IVhvP+KMcCQw9ERgQzqQkkgIy7o6E7cpewcBNr97f9i1joAXfdgG7gu F9GGcaMAMX0PjDJp0+Szgz+PN X-Received: by 2002:a05:6638:e8e:b0:365:ca83:bafb with SMTP id p14-20020a0566380e8e00b00365ca83bafbmr4266499jas.272.1668812959485; Fri, 18 Nov 2022 15:09:19 -0800 (PST) X-Received: by 2002:a05:6638:e8e:b0:365:ca83:bafb with SMTP id p14-20020a0566380e8e00b00365ca83bafbmr4266495jas.272.1668812959204; Fri, 18 Nov 2022 15:09:19 -0800 (PST) Received: from redhat.com ([38.15.36.239]) by smtp.gmail.com with ESMTPSA id g17-20020a056e02131100b0030249f369f7sm1631332ilr.82.2022.11.18.15.09.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Nov 2022 15:09:18 -0800 (PST) Date: Fri, 18 Nov 2022 16:09:16 -0700 From: Alex Williamson <alex.williamson@redhat.com> To: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org> Cc: <linux-kernel@vger.kernel.org>, christian.koenig@amd.com Subject: [RFC] Resizable BARs vs bridges with BARs Message-ID: <20221118160916.7e165306.alex.williamson@redhat.com> X-Mailer: Claws Mail 4.1.0 (GTK 3.24.34; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1749878090209813142?= X-GMAIL-MSGID: =?utf-8?q?1749878090209813142?= |
Series |
[RFC] Resizable BARs vs bridges with BARs
|
|
Commit Message
Alex Williamson
Nov. 18, 2022, 11:09 p.m. UTC
Hi,
I'm trying to get resizable BARs working in a configuration where my
root bus resources provide plenty of aperture for the BAR:
pci_bus 0000:5d: root bus resource [io 0x8000-0x9fff window]
pci_bus 0000:5d: root bus resource [mem 0xb8800000-0xc5ffffff window]
pci_bus 0000:5d: root bus resource [mem 0xb000000000-0xbfffffffff window] <<<
pci_bus 0000:5d: root bus resource [bus 5d-7f]
But resizing fails with -ENOSPC. The topology looks like this:
+-[0000:5d]-+-00.0-[5e-61]----00.0-[5f-61]--+-01.0-[60]----00.0 Intel Corporation DG2 [Arc A380]
\-04.0-[61]----00.0 Intel Corporation Device 4f92
The BIOS is not fluent in resizable BARs and only programs the root
port with a small aperture:
5d:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07) (prog-if 00 [Normal decode])
Bus: primary=5d, secondary=5e, subordinate=61, sec-latency=0
I/O behind bridge: 0000f000-00000fff [disabled]
Memory behind bridge: b9000000-ba0fffff [size=17M]
Prefetchable memory behind bridge: 000000bfe0000000-000000bff07fffff [size=264M]
Kernel driver in use: pcieport
The trouble comes on the upstream PCIe switch port:
5e:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01) (prog-if 00 [Normal decode])
>>> Region 0: Memory at b010000000 (64-bit, prefetchable)
Bus: primary=5e, secondary=5f, subordinate=61, sec-latency=0
I/O behind bridge: 0000f000-00000fff [disabled]
Memory behind bridge: b9000000-ba0fffff [size=17M]
Prefetchable memory behind bridge: 000000bfe0000000-000000bfefffffff [size=256M]
Kernel driver in use: pcieport
Note region 0 of this bridge, which is 64-bit, prefetchable and
therefore conflicts with the same type for the resizable BAR on the GPU:
60:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A380] (rev 05) (prog-if 00 [VGA controller])
Region 0: Memory at b9000000 (64-bit, non-prefetchable) [disabled] [size=16M]
Region 2: Memory at bfe0000000 (64-bit, prefetchable) [disabled] [size=256M]
Expansion ROM at <ignored> [disabled]
Capabilities: [420 v1] Physical Resizable BAR
BAR 2: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB
It's a shame that the hardware designers didn't mark the upstream port
BAR as non-prefetchable to avoid it living in the same resource
aperture as the resizable BAR on the downstream device. In any case,
it's my understanding that our bridge drivers don't generally make use
of bridge BARs. I think we can test whether a driver has done a
pci_request_region() or equivalent by looking for the IORESOURCE_BUSY
flag, but I also suspect this is potentially racy.
The patch below works for me, allowing the new resourceN_resize sysfs
attribute to resize the root port window within the provided bus
window. Is this the right answer? How can we make it feel less
sketchy? Thanks,
Alex
Comments
Hi Alex, Am 19.11.22 um 00:09 schrieb Alex Williamson: > Hi, > > I'm trying to get resizable BARs working in a configuration where my > root bus resources provide plenty of aperture for the BAR: > > pci_bus 0000:5d: root bus resource [io 0x8000-0x9fff window] > pci_bus 0000:5d: root bus resource [mem 0xb8800000-0xc5ffffff window] > pci_bus 0000:5d: root bus resource [mem 0xb000000000-0xbfffffffff window] <<< > pci_bus 0000:5d: root bus resource [bus 5d-7f] > > But resizing fails with -ENOSPC. The topology looks like this: > > +-[0000:5d]-+-00.0-[5e-61]----00.0-[5f-61]--+-01.0-[60]----00.0 Intel Corporation DG2 [Arc A380] > \-04.0-[61]----00.0 Intel Corporation Device 4f92 > > The BIOS is not fluent in resizable BARs and only programs the root > port with a small aperture: > > 5d:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07) (prog-if 00 [Normal decode]) > Bus: primary=5d, secondary=5e, subordinate=61, sec-latency=0 > I/O behind bridge: 0000f000-00000fff [disabled] > Memory behind bridge: b9000000-ba0fffff [size=17M] > Prefetchable memory behind bridge: 000000bfe0000000-000000bff07fffff [size=264M] > Kernel driver in use: pcieport > > The trouble comes on the upstream PCIe switch port: > > 5e:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01) (prog-if 00 [Normal decode]) > >>> Region 0: Memory at b010000000 (64-bit, prefetchable) > Bus: primary=5e, secondary=5f, subordinate=61, sec-latency=0 > I/O behind bridge: 0000f000-00000fff [disabled] > Memory behind bridge: b9000000-ba0fffff [size=17M] > Prefetchable memory behind bridge: 000000bfe0000000-000000bfefffffff [size=256M] > Kernel driver in use: pcieport > > Note region 0 of this bridge, which is 64-bit, prefetchable and > therefore conflicts with the same type for the resizable BAR on the GPU: > > 60:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A380] (rev 05) (prog-if 00 [VGA controller]) > Region 0: Memory at b9000000 (64-bit, non-prefetchable) [disabled] [size=16M] > Region 2: Memory at bfe0000000 (64-bit, prefetchable) [disabled] [size=256M] > Expansion ROM at <ignored> [disabled] > Capabilities: [420 v1] Physical Resizable BAR > BAR 2: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB > > It's a shame that the hardware designers didn't mark the upstream port > BAR as non-prefetchable to avoid it living in the same resource > aperture as the resizable BAR on the downstream device. This is expected. Bridges always have a 32bit non prefetchable and a 64bit prefetchable BAR. This is part of the PCI(e) spec. > In any case, it's my understanding that our bridge drivers don't generally make use > of bridge BARs. I think we can test whether a driver has done a > pci_request_region() or equivalent by looking for the IORESOURCE_BUSY > flag, but I also suspect this is potentially racy. That sounds like we have a misunderstanding here how those bridges work. The upstream bridges should include all the resources of the downstream devices/bridges in their BARs. > The patch below works for me, allowing the new resourceN_resize sysfs > attribute to resize the root port window within the provided bus > window. Is this the right answer? How can we make it feel less > sketchy? Thanks, The correct approach is to remove all the drivers (EFI, vesafb etc...) which are using the PCI(e) devices under the bridge in question. Then release the resources and puzzle everything back together. See amdgpu_device_resize_fb_bar() how to do this correctly. Regards, Christian. > > Alex > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c > index b4096598dbcb..8c332a08174d 100644 > --- a/drivers/pci/setup-bus.c > +++ b/drivers/pci/setup-bus.c > @@ -2137,13 +2137,19 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type) > next = bridge; > do { > bridge = next; > - for (i = PCI_BRIDGE_RESOURCES; i < PCI_BRIDGE_RESOURCE_END; > + for (i = PCI_STD_RESOURCES; i < PCI_BRIDGE_RESOURCE_END; > i++) { > struct resource *res = &bridge->resource[i]; > > if ((res->flags ^ type) & PCI_RES_TYPE_MASK) > continue; > > + if (i < PCI_STD_NUM_BARS) { > + if (!(res->flags & IORESOURCE_BUSY)) > + pci_release_resource(bridge, i); > + continue; > + } > + > /* Ignore BARs which are still in use */ > if (res->child) > continue; >
Hi Christian, On Sat, 19 Nov 2022 12:02:55 +0100 Christian König <christian.koenig@amd.com> wrote: > Hi Alex, > > Am 19.11.22 um 00:09 schrieb Alex Williamson: > > Hi, > > > > I'm trying to get resizable BARs working in a configuration where my > > root bus resources provide plenty of aperture for the BAR: > > > > pci_bus 0000:5d: root bus resource [io 0x8000-0x9fff window] > > pci_bus 0000:5d: root bus resource [mem 0xb8800000-0xc5ffffff window] > > pci_bus 0000:5d: root bus resource [mem 0xb000000000-0xbfffffffff window] <<< > > pci_bus 0000:5d: root bus resource [bus 5d-7f] > > > > But resizing fails with -ENOSPC. The topology looks like this: > > > > +-[0000:5d]-+-00.0-[5e-61]----00.0-[5f-61]--+-01.0-[60]----00.0 Intel Corporation DG2 [Arc A380] > > \-04.0-[61]----00.0 Intel Corporation Device 4f92 > > > > The BIOS is not fluent in resizable BARs and only programs the root > > port with a small aperture: > > > > 5d:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07) (prog-if 00 [Normal decode]) > > Bus: primary=5d, secondary=5e, subordinate=61, sec-latency=0 > > I/O behind bridge: 0000f000-00000fff [disabled] > > Memory behind bridge: b9000000-ba0fffff [size=17M] > > Prefetchable memory behind bridge: 000000bfe0000000-000000bff07fffff [size=264M] > > Kernel driver in use: pcieport > > > > The trouble comes on the upstream PCIe switch port: > > > > 5e:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01) (prog-if 00 [Normal decode]) > > >>> Region 0: Memory at b010000000 (64-bit, prefetchable) > > Bus: primary=5e, secondary=5f, subordinate=61, sec-latency=0 > > I/O behind bridge: 0000f000-00000fff [disabled] > > Memory behind bridge: b9000000-ba0fffff [size=17M] > > Prefetchable memory behind bridge: 000000bfe0000000-000000bfefffffff [size=256M] > > Kernel driver in use: pcieport > > > > Note region 0 of this bridge, which is 64-bit, prefetchable and > > therefore conflicts with the same type for the resizable BAR on the GPU: > > > > 60:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A380] (rev 05) (prog-if 00 [VGA controller]) > > Region 0: Memory at b9000000 (64-bit, non-prefetchable) [disabled] [size=16M] > > Region 2: Memory at bfe0000000 (64-bit, prefetchable) [disabled] [size=256M] > > Expansion ROM at <ignored> [disabled] > > Capabilities: [420 v1] Physical Resizable BAR > > BAR 2: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB > > > > It's a shame that the hardware designers didn't mark the upstream port > > BAR as non-prefetchable to avoid it living in the same resource > > aperture as the resizable BAR on the downstream device. > > This is expected. Bridges always have a 32bit non prefetchable and a > 64bit prefetchable BAR. This is part of the PCI(e) spec. To be clear, the issue is a bridge implementing a 64-bit, prefetchable BAR at config offset 0x10 & 0x14, not the limit/base registers that define the bridge windows for prefetchable and non-prefetchable downstream resources. > > In any case, it's my understanding that our bridge drivers don't generally make use > > of bridge BARs. I think we can test whether a driver has done a > > pci_request_region() or equivalent by looking for the IORESOURCE_BUSY > > flag, but I also suspect this is potentially racy. > > That sounds like we have a misunderstanding here how those bridges work. > The upstream bridges should include all the resources of the downstream > devices/bridges in their BARs. Correct, and the issue is that the bridge at 5e:00.0 _consumes_ a portion of the window we need to resize at the root port. Root port: Prefetchable memory behind bridge: 000000bfe0000000-000000bff07fffff [size=264M] Upstream switch port: Region 0: Memory at b010000000 (64-bit, prefetchable) Prefetchable memory behind bridge: 000000bfe0000000-000000bfefffffff [size=256M] It's that Region 0 resource that prevents resizing. > > The patch below works for me, allowing the new resourceN_resize sysfs > > attribute to resize the root port window within the provided bus > > window. Is this the right answer? How can we make it feel less > > sketchy? Thanks, > > The correct approach is to remove all the drivers (EFI, vesafb etc...) > which are using the PCI(e) devices under the bridge in question. Then > release the resources and puzzle everything back together. > > See amdgpu_device_resize_fb_bar() how to do this correctly. Resource resizing in pci-sysfs is largely modeled after the amdgpu code, but I don't see any special provisions for handling conflicting resources consumed on intermediate devices. The driver attached to the upstream switch port is pcieport and removing it doesn't resolve the problem. The necessary resource on the root port still reports a child. Is amdgppu resizing known to work in cases where the GPU is downstream of a PCIe switch that consumes resources of the same type and the root port aperture needs to be resized? I suspect it does not. Thanks, Alex
Hi Alex, Am 19.11.22 um 15:07 schrieb Alex Williamson: > Hi Christian, > > On Sat, 19 Nov 2022 12:02:55 +0100 > Christian König <christian.koenig@amd.com> wrote: > >> Hi Alex, >> >> Am 19.11.22 um 00:09 schrieb Alex Williamson: >>> Hi, >>> >>> I'm trying to get resizable BARs working in a configuration where my >>> root bus resources provide plenty of aperture for the BAR: >>> >>> pci_bus 0000:5d: root bus resource [io 0x8000-0x9fff window] >>> pci_bus 0000:5d: root bus resource [mem 0xb8800000-0xc5ffffff window] >>> pci_bus 0000:5d: root bus resource [mem 0xb000000000-0xbfffffffff window] <<< >>> pci_bus 0000:5d: root bus resource [bus 5d-7f] >>> >>> But resizing fails with -ENOSPC. The topology looks like this: >>> >>> +-[0000:5d]-+-00.0-[5e-61]----00.0-[5f-61]--+-01.0-[60]----00.0 Intel Corporation DG2 [Arc A380] >>> \-04.0-[61]----00.0 Intel Corporation Device 4f92 >>> >>> The BIOS is not fluent in resizable BARs and only programs the root >>> port with a small aperture: >>> >>> 5d:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07) (prog-if 00 [Normal decode]) >>> Bus: primary=5d, secondary=5e, subordinate=61, sec-latency=0 >>> I/O behind bridge: 0000f000-00000fff [disabled] >>> Memory behind bridge: b9000000-ba0fffff [size=17M] >>> Prefetchable memory behind bridge: 000000bfe0000000-000000bff07fffff [size=264M] >>> Kernel driver in use: pcieport >>> >>> The trouble comes on the upstream PCIe switch port: >>> >>> 5e:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01) (prog-if 00 [Normal decode]) >>> >>> Region 0: Memory at b010000000 (64-bit, prefetchable) >>> Bus: primary=5e, secondary=5f, subordinate=61, sec-latency=0 >>> I/O behind bridge: 0000f000-00000fff [disabled] >>> Memory behind bridge: b9000000-ba0fffff [size=17M] >>> Prefetchable memory behind bridge: 000000bfe0000000-000000bfefffffff [size=256M] >>> Kernel driver in use: pcieport >>> >>> Note region 0 of this bridge, which is 64-bit, prefetchable and >>> therefore conflicts with the same type for the resizable BAR on the GPU: >>> >>> 60:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A380] (rev 05) (prog-if 00 [VGA controller]) >>> Region 0: Memory at b9000000 (64-bit, non-prefetchable) [disabled] [size=16M] >>> Region 2: Memory at bfe0000000 (64-bit, prefetchable) [disabled] [size=256M] >>> Expansion ROM at <ignored> [disabled] >>> Capabilities: [420 v1] Physical Resizable BAR >>> BAR 2: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB >>> >>> It's a shame that the hardware designers didn't mark the upstream port >>> BAR as non-prefetchable to avoid it living in the same resource >>> aperture as the resizable BAR on the downstream device. >> This is expected. Bridges always have a 32bit non prefetchable and a >> 64bit prefetchable BAR. This is part of the PCI(e) spec. > To be clear, the issue is a bridge implementing a 64-bit, prefetchable > BAR at config offset 0x10 & 0x14, not the limit/base registers that > define the bridge windows for prefetchable and non-prefetchable > downstream resources. WHAT? I've never heard of a bridge with this configuration. I don't fully remember the spec, but I'm pretty sure that this isn't something standard. Can you give me the output of "sudo lspci -vvvv -s $busID" for this device. >>> In any case, it's my understanding that our bridge drivers don't generally make use >>> of bridge BARs. I think we can test whether a driver has done a >>> pci_request_region() or equivalent by looking for the IORESOURCE_BUSY >>> flag, but I also suspect this is potentially racy. >> That sounds like we have a misunderstanding here how those bridges work. >> The upstream bridges should include all the resources of the downstream >> devices/bridges in their BARs. > Correct, and the issue is that the bridge at 5e:00.0 _consumes_ a > portion of the window we need to resize at the root port. > > Root port: > Prefetchable memory behind bridge: 000000bfe0000000-000000bff07fffff [size=264M] > > Upstream switch port: > Region 0: Memory at b010000000 (64-bit, prefetchable) > Prefetchable memory behind bridge: 000000bfe0000000-000000bfefffffff [size=256M] > > It's that Region 0 resource that prevents resizing. Could it be that some of the ACPI tables are broken and because of this we add a fixed resource to this device? Otherwise I have a hard time coming up with a way for a bridge to have a BAR in the config space. >>> The patch below works for me, allowing the new resourceN_resize sysfs >>> attribute to resize the root port window within the provided bus >>> window. Is this the right answer? How can we make it feel less >>> sketchy? Thanks, >> The correct approach is to remove all the drivers (EFI, vesafb etc...) >> which are using the PCI(e) devices under the bridge in question. Then >> release the resources and puzzle everything back together. >> >> See amdgpu_device_resize_fb_bar() how to do this correctly. > Resource resizing in pci-sysfs is largely modeled after the amdgpu > code, but I don't see any special provisions for handling conflicting > resources consumed on intermediate devices. The driver attached to the > upstream switch port is pcieport and removing it doesn't resolve the > problem. The necessary resource on the root port still reports a > child. > > Is amdgppu resizing known to work in cases where the GPU is downstream > of a PCIe switch that consumes resources of the same type and the root > port aperture needs to be resized? I suspect it does not. Thanks, Well we have the possibility to add extra space to bridges on the kernel command line for this. This is used for things like hotplug behind bridges with limited address space. Quite a while ago there was also a patch set which dynamically binds/unbinds drivers from resources to resize the BARs. But that never got far because of locking problems. Regards, Christian. > > Alex >
Hi Christian, On Sat, 19 Nov 2022 20:14:15 +0100 Christian König <christian.koenig@amd.com> wrote: > Am 19.11.22 um 15:07 schrieb Alex Williamson: > > On Sat, 19 Nov 2022 12:02:55 +0100 > > Christian König <christian.koenig@amd.com> wrote: > >> Am 19.11.22 um 00:09 schrieb Alex Williamson: > >>> I'm trying to get resizable BARs working in a configuration where my > >>> root bus resources provide plenty of aperture for the BAR: > >>> > >>> pci_bus 0000:5d: root bus resource [io 0x8000-0x9fff window] > >>> pci_bus 0000:5d: root bus resource [mem 0xb8800000-0xc5ffffff window] > >>> pci_bus 0000:5d: root bus resource [mem 0xb000000000-0xbfffffffff window] <<< > >>> pci_bus 0000:5d: root bus resource [bus 5d-7f] > >>> > >>> But resizing fails with -ENOSPC. The topology looks like this: > >>> > >>> +-[0000:5d]-+-00.0-[5e-61]----00.0-[5f-61]--+-01.0-[60]----00.0 Intel Corporation DG2 [Arc A380] > >>> \-04.0-[61]----00.0 Intel Corporation Device 4f92 > >>> > >>> The BIOS is not fluent in resizable BARs and only programs the root > >>> port with a small aperture: > >>> > >>> 5d:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07) (prog-if 00 [Normal decode]) > >>> Bus: primary=5d, secondary=5e, subordinate=61, sec-latency=0 > >>> I/O behind bridge: 0000f000-00000fff [disabled] > >>> Memory behind bridge: b9000000-ba0fffff [size=17M] > >>> Prefetchable memory behind bridge: 000000bfe0000000-000000bff07fffff [size=264M] > >>> Kernel driver in use: pcieport > >>> > >>> The trouble comes on the upstream PCIe switch port: > >>> > >>> 5e:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01) (prog-if 00 [Normal decode]) > >>> >>> Region 0: Memory at b010000000 (64-bit, prefetchable) > >>> Bus: primary=5e, secondary=5f, subordinate=61, sec-latency=0 > >>> I/O behind bridge: 0000f000-00000fff [disabled] > >>> Memory behind bridge: b9000000-ba0fffff [size=17M] > >>> Prefetchable memory behind bridge: 000000bfe0000000-000000bfefffffff [size=256M] > >>> Kernel driver in use: pcieport > >>> > >>> Note region 0 of this bridge, which is 64-bit, prefetchable and > >>> therefore conflicts with the same type for the resizable BAR on the GPU: > >>> > >>> 60:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A380] (rev 05) (prog-if 00 [VGA controller]) > >>> Region 0: Memory at b9000000 (64-bit, non-prefetchable) [disabled] [size=16M] > >>> Region 2: Memory at bfe0000000 (64-bit, prefetchable) [disabled] [size=256M] > >>> Expansion ROM at <ignored> [disabled] > >>> Capabilities: [420 v1] Physical Resizable BAR > >>> BAR 2: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB > >>> > >>> It's a shame that the hardware designers didn't mark the upstream port > >>> BAR as non-prefetchable to avoid it living in the same resource > >>> aperture as the resizable BAR on the downstream device. > >> This is expected. Bridges always have a 32bit non prefetchable and a > >> 64bit prefetchable BAR. This is part of the PCI(e) spec. > > To be clear, the issue is a bridge implementing a 64-bit, prefetchable > > BAR at config offset 0x10 & 0x14, not the limit/base registers that > > define the bridge windows for prefetchable and non-prefetchable > > downstream resources. > > WHAT? I've never heard of a bridge with this configuration. I don't > fully remember the spec, but I'm pretty sure that this isn't something > standard. Type1 config space allows for two standard BARs. > Can you give me the output of "sudo lspci -vvvv -s $busID" for this device. 5e:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 42 NUMA node: 0 IOMMU group: 1 Region 0: Memory at bff0000000 (64-bit, prefetchable) [size=8M] Bus: primary=5e, secondary=5f, subordinate=61, sec-latency=0 I/O behind bridge: 0000f000-00000fff [disabled] Memory behind bridge: b9000000-ba0fffff [size=17M] Prefetchable memory behind bridge: 000000bfe0000000-000000bfefffffff [size=256M] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity+ SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ Address: 0000000000000000 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [70] Express (v2) Upstream Port, MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0 ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 75.000W DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+ RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend- LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 8GT/s (downgraded), Width x8 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP+ LTR+ 10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS+ AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled, AtomicOpsCtl: EgressBlck+ LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS+ LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+ EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: Upstream Port Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [148 v1] Power Budgeting <?> Capabilities: [158 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn- PerformEqu- LaneErrStat: 0 Capabilities: [178 v1] Physical Layer 16.0 GT/s <?> Capabilities: [1a0 v1] Lane Margining at the Receiver <?> Capabilities: [1d4 v1] Latency Tolerance Reporting Max snoop latency: 0ns Max no snoop latency: 0ns Capabilities: [1dc v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=10us PortTPowerOnTime=14us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=10us Capabilities: [1f8 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?> Capabilities: [2f8 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?> Capabilities: [330 v1] Data Link Feature <?> Kernel driver in use: pcieport > >>> In any case, it's my understanding that our bridge drivers don't generally make use > >>> of bridge BARs. I think we can test whether a driver has done a > >>> pci_request_region() or equivalent by looking for the IORESOURCE_BUSY > >>> flag, but I also suspect this is potentially racy. > >> That sounds like we have a misunderstanding here how those bridges work. > >> The upstream bridges should include all the resources of the downstream > >> devices/bridges in their BARs. > > Correct, and the issue is that the bridge at 5e:00.0 _consumes_ a > > portion of the window we need to resize at the root port. > > > > Root port: > > Prefetchable memory behind bridge: 000000bfe0000000-000000bff07fffff [size=264M] > > > > Upstream switch port: > > Region 0: Memory at b010000000 (64-bit, prefetchable) > > Prefetchable memory behind bridge: 000000bfe0000000-000000bfefffffff [size=256M] > > > > It's that Region 0 resource that prevents resizing. > > Could it be that some of the ACPI tables are broken and because of this > we add a fixed resource to this device? The switch is part of a plug-in card, I'd not expect ACPI to be involved. It's just a standard BAR: # setpci -s 5e:00.0 BASE_ADDRESS_0 f000000c # setpci -s 5e:00.0 BASE_ADDRESS_1 000000bf # setpci -s 5e:00.0 BASE_ADDRESS_0=ffffffff # setpci -s 5e:00.0 BASE_ADDRESS_1=ffffffff # setpci -s 5e:00.0 BASE_ADDRESS_0 ff80000c # setpci -s 5e:00.0 BASE_ADDRESS_1 ffffffff All this would have transparently worked if they would have chosen to implement a non-prefetchable BAR. > Otherwise I have a hard time coming up with a way for a bridge to have a > BAR in the config space. It's a standard part of the Type1 config header. > >>> The patch below works for me, allowing the new resourceN_resize sysfs > >>> attribute to resize the root port window within the provided bus > >>> window. Is this the right answer? How can we make it feel less > >>> sketchy? Thanks, > >> The correct approach is to remove all the drivers (EFI, vesafb etc...) > >> which are using the PCI(e) devices under the bridge in question. Then > >> release the resources and puzzle everything back together. > >> > >> See amdgpu_device_resize_fb_bar() how to do this correctly. > > Resource resizing in pci-sysfs is largely modeled after the amdgpu > > code, but I don't see any special provisions for handling conflicting > > resources consumed on intermediate devices. The driver attached to the > > upstream switch port is pcieport and removing it doesn't resolve the > > problem. The necessary resource on the root port still reports a > > child. > > > > Is amdgppu resizing known to work in cases where the GPU is downstream > > of a PCIe switch that consumes resources of the same type and the root > > port aperture needs to be resized? I suspect it does not. Thanks, > > Well we have the possibility to add extra space to bridges on the kernel > command line for this. > > This is used for things like hotplug behind bridges with limited address > space. AFAIK, this is only for hotplug slots, my root port is HotPlug-. I'd also like to make pci=realloc aware of resizable BARs, but it hits the same problem. Thanks, Alex
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index b4096598dbcb..8c332a08174d 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -2137,13 +2137,19 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type) next = bridge; do { bridge = next; - for (i = PCI_BRIDGE_RESOURCES; i < PCI_BRIDGE_RESOURCE_END; + for (i = PCI_STD_RESOURCES; i < PCI_BRIDGE_RESOURCE_END; i++) { struct resource *res = &bridge->resource[i]; if ((res->flags ^ type) & PCI_RES_TYPE_MASK) continue; + if (i < PCI_STD_NUM_BARS) { + if (!(res->flags & IORESOURCE_BUSY)) + pci_release_resource(bridge, i); + continue; + } + /* Ignore BARs which are still in use */ if (res->child) continue;