Message ID | 20231228165707.3447-1-ilpo.jarvinen@linux.intel.com |
---|---|
Headers |
From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
To: linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>, Rob Herring <robh@kernel.org>, Krzysztof Wilczyński <kw@linux.com>, Igor Mammedov <imammedo@redhat.com>, Lukas Wunner <lukas@wunner.de>, Mika Westerberg <mika.westerberg@linux.intel.com>, Andy Shevchenko <andriy.shevchenko@intel.com>, "Rafael J . Wysocki" <rafael@kernel.org>
Cc: linux-kernel@vger.kernel.org, Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Subject: [PATCH v2 0/7] PCI: Solve two bridge window sizing issues
Date: Thu, 28 Dec 2023 18:57:00 +0200
Message-Id: <20231228165707.3447-1-ilpo.jarvinen@linux.intel.com>
X-Mailer: git-send-email 2.30.2
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit |
Series |
PCI: Solve two bridge window sizing issues
|
|
Message
Ilpo Järvinen
Dec. 28, 2023, 4:57 p.m. UTC
Hi all,

Here's a series that contains two fixes to PCI bridge window sizing
algorithm. Together, they should enable remove & rescan cycle to work
for a PCI bus that has PCI devices with optional resources and/or
disparity in BAR sizes.

For the second fix, I chose to expose find_empty_resource_slot() from
kernel/resource.c because it should increase accuracy of the cannot-fit
decision (currently that function is called find_resource()). In order
to do that sensibly, a few improvements seemed in order to make its
interface and name of the function sane before exposing it. Thus, the
few extra patches on resource side.

Unfortunately I don't have a reason to suspect these would help with
the issues related to the currently ongoing resource regression
thread [1].

[1] https://lore.kernel.org/linux-pci/ZXpaNCLiDM+Kv38H@marvin.atrad.com.au/

v2:
- Add "typedef" to kerneldoc to get correct formatting
- Use RESOURCE_SIZE_MAX instead of literal
- Remove unnecessary checks for io{port/mem}_resource
- Apply a few style tweaks from Andy

Ilpo Järvinen (7):
  PCI: Fix resource double counting on remove & rescan
  resource: Rename find_resource() to find_empty_resource_slot()
  resource: Document find_empty_resource_slot() and resource_constraint
  resource: Use typedef for alignf callback
  resource: Handle simple alignment inside __find_empty_resource_slot()
  resource: Export find_empty_resource_slot()
  PCI: Relax bridge window tail sizing rules

 drivers/pci/bus.c       | 10 ++----
 drivers/pci/setup-bus.c | 80 +++++++++++++++++++++++++++++++++++++----
 include/linux/ioport.h  | 44 ++++++++++++++++++++---
 include/linux/pci.h     |  5 +--
 kernel/resource.c       | 68 ++++++++++++++++-------------------
 5 files changed, 148 insertions(+), 59 deletions(-)
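
As a rough illustration of why an exact slot search can sharpen a cannot-fit decision, here is a toy user-space model in C. It is only a sketch: the names toy_range, toy_align_up() and toy_find_empty_slot() are hypothetical, the first-fit strategy is an assumption made for illustration, and none of this is the kernel's find_resource() / find_empty_resource_slot() code. It assumes the busy ranges are sorted, non-overlapping and inside the window, that the requested size is non-zero, and that the alignment is a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy type; this is NOT the kernel's struct resource. */
struct toy_range {
	uint64_t start;	/* first address in the range, inclusive */
	uint64_t end;	/* last address in the range, inclusive */
};

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t toy_align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * First-fit search for a free, aligned slot of 'size' bytes (size > 0)
 * inside 'window'. 'busy' holds 'n' allocated ranges, sorted by start,
 * non-overlapping and lying inside the window. On success the start of
 * the slot is stored in *out and true is returned.
 */
static bool toy_find_empty_slot(struct toy_range window,
				const struct toy_range *busy, size_t n,
				uint64_t size, uint64_t align, uint64_t *out)
{
	uint64_t gap_start = window.start;

	for (size_t i = 0; i <= n; i++) {
		bool have_gap;
		uint64_t gap_end;

		if (i < n) {
			/* Gap between the previous busy range and this one. */
			have_gap = busy[i].start > gap_start;
			gap_end = busy[i].start - 1;
		} else {
			/* Tail gap between the last busy range and the window end. */
			have_gap = window.end >= gap_start;
			gap_end = window.end;
		}

		if (have_gap) {
			uint64_t s = toy_align_up(gap_start, align);

			/* s >= gap_start rejects wrap-around in the round-up. */
			if (s >= gap_start && s <= gap_end &&
			    gap_end - s >= size - 1) {
				*out = s;
				return true;
			}
		}

		if (i < n) {
			if (busy[i].end == UINT64_MAX)
				return false;	/* nothing can follow this range */
			gap_start = busy[i].end + 1;
		}
	}

	return false;
}
```

With a window of 0x1000-0x1fff and busy ranges 0x1000-0x10ff and 0x1200-0x17ff, the free space totals 0x900 bytes, yet a 0x900-byte request cannot fit because no single gap is that large. A decision based only on summed sizes would miss that, which is the kind of inaccuracy a per-gap search avoids in this toy model.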
Comments
Hi Ilpo,

On Thu, Dec 28, 2023 at 06:57:00PM +0200, Ilpo Järvinen wrote:
> Hi all,
>
> Here's a series that contains two fixes to PCI bridge window sizing
> algorithm. Together, they should enable remove & rescan cycle to work
> for a PCI bus that has PCI devices with optional resources and/or
> disparity in BAR sizes.
>
> [...]
>
> Ilpo Järvinen (7):
>   PCI: Fix resource double counting on remove & rescan
>   resource: Rename find_resource() to find_empty_resource_slot()
>   resource: Document find_empty_resource_slot() and resource_constraint
>   resource: Use typedef for alignf callback
>   resource: Handle simple alignment inside __find_empty_resource_slot()
>   resource: Export find_empty_resource_slot()
>   PCI: Relax bridge window tail sizing rules

Thanks for doing this! :)

All look good to me,

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
On Thu, 28 Dec 2023 18:57:00 +0200 Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:

> [...]
>
> Unfortunately I don't have a reason to suspect these would help with
> the issues related to the currently ongoing resource regression
> thread [1].

Jonathan,
can you test this series on affected machine with broken kernel to see if
it's of any help in your case?

> [1] https://lore.kernel.org/linux-pci/ZXpaNCLiDM+Kv38H@marvin.atrad.com.au/
>
> [...]
On Thu, Jan 04, 2024 at 01:12:10PM +0100, Igor Mammedov wrote:
> [...]
>
> Jonathan,
> can you test this series on affected machine with broken kernel to see if
> it's of any help in your case?

Certainly, but it will have to wait until next Thursday (11 Jan 2024). I'm
still on leave this week, and when at work I only have physical access to
the machine concerned on Thursdays at present.

Which kernel would you prefer I apply the series to?

Regards
  jonathan
On Thu, Jan 04, 2024 at 10:48:53PM +1030, Jonathan Woithe wrote:
> On Thu, Jan 04, 2024 at 01:12:10PM +0100, Igor Mammedov wrote:
> > [...]
> >
> > Jonathan,
> > can you test this series on affected machine with broken kernel to see if
> > it's of any help in your case?
>
> Certainly, but it will have to wait until next Thursday (11 Jan 2024). I'm
> still on leave this week, and when at work I only have physical access to
> the machine concerned on Thursdays at present.
>
> Which kernel would you prefer I apply the series to?

I was very short of time today but I did apply the above series to the
5.15.y branch (since I had this source available), resulting in version
5.15.141+. Unfortunately, in the rush I forgot to do a clean after the
bisect reset, so the resulting kernel was not correctly built. It booted
but thought it was a different version and therefore none of the modules
could be found. As a result, the test is invalid.

I will try again in a week when I next have physical access to the system.
Apologies for the delay. In the meantime, if there's a specific kernel I
should apply the patch series against please let me know. As I understand
it, you want it applied to one of the kernels which failed, making 5.15.y
(for y < 145) a reasonable choice.

Regards
  jonathan
On Thu, Jan 11, 2024 at 06:30:22PM +1030, Jonathan Woithe wrote:
> [...]
>
> I was very short of time today but I did apply the above series to the
> 5.15.y branch (since I had this source available), resulting in version
> 5.15.141+. Unfortunately, in the rush I forgot to do a clean after the
> bisect reset, so the resulting kernel was not correctly built. It booted
> but thought it was a different version and therefore none of the modules
> could be found. As a result, the test is invalid.
>
> I will try again in a week when I next have physical access to the system.
> Apologies for the delay. In the meantime, if there's a specific kernel I
> should apply the patch series against please let me know. As I understand
> it, you want it applied to one of the kernels which failed, making 5.15.y
> (for y < 145) a reasonable choice.

I did a "make clean" to reset the source tree and recompiled. However, it
errored out:

  drivers/pci/setup-bus.c:988:24: error: ‘RESOURCE_SIZE_MAX’ undeclared
  drivers/pci/setup-bus.c:998:17: error: ‘pci_bus_for_each_resource’ undeclared

This was with the patch series applied against 5.15.141. It seems the patch
targets a kernel that's too far removed from 5.15.x.

Which kernel would you like me to apply the patch series to and test?

Regards
  jonathan
On Thu, 18 Jan 2024, Jonathan Woithe wrote:
> [...]
>
> I did a "make clean" to reset the source tree and recompiled. However, it
> errored out:
>
>   drivers/pci/setup-bus.c:988:24: error: ‘RESOURCE_SIZE_MAX’ undeclared
>   drivers/pci/setup-bus.c:998:17: error: ‘pci_bus_for_each_resource’ undeclared
>
> This was with the patch series applied against 5.15.141. It seems the patch
> targets a kernel that's too far removed from 5.15.x.
>
> Which kernel would you like me to apply the patch series to and test?

The two-argument version of pci_bus_for_each_resource() is quite new (so
either 6.6 or 6.7).

If you want to attempt to compile in 5.15.x, you need this:

include/linux/limits.h:#define RESOURCE_SIZE_MAX	((resource_size_t)~0)

And you need to add one extra argument to pci_bus_for_each_resource(bus, r)
in pbus_upstream_assigned_limit():

	...
 	while ((bus = bus->parent)) {
+		unsigned int i;
 		if (pci_is_root_bus(bus))
 			break;

-		pci_bus_for_each_resource(bus, r) {
+		pci_bus_for_each_resource(bus, r, i) {

Note I've written this "patch" by hand inline, so the patch command cannot
apply it; you need to edit those changes in by hand.
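
For readers unfamiliar with the idiom, the following user-space C sketch shows what the RESOURCE_SIZE_MAX definition quoted in the message above evaluates to and how such a maximum is typically used as an overflow guard. It is an illustration only: the 64-bit typedef is an assumption (in the kernel, resource_size_t follows phys_addr_t and may be 32-bit on some configurations), and toy_size_would_overflow() is a hypothetical helper that does not exist in the kernel.

```c
#include <stdint.h>

/* Assumption for this sketch: a 64-bit resource_size_t. */
typedef uint64_t resource_size_t;

/* Same expression as the definition quoted in the message above. */
#define RESOURCE_SIZE_MAX	((resource_size_t)~0)

/*
 * Hypothetical helper: returns 1 if start + size would wrap past
 * RESOURCE_SIZE_MAX, 0 otherwise. The subtraction cannot underflow
 * because start <= RESOURCE_SIZE_MAX always holds.
 */
static int toy_size_would_overflow(resource_size_t start, resource_size_t size)
{
	return size > RESOURCE_SIZE_MAX - start;
}
```

The cast works because ~0 is the int value -1, and converting -1 to an unsigned type yields that type's maximum, so the macro tracks whatever width resource_size_t has.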
On Thu, Jan 18, 2024 at 05:18:45PM +1030, Jonathan Woithe wrote:
> [...]
>
> This was with the patch series applied against 5.15.141. It seems the patch
> targets a kernel that's too far removed from 5.15.x.
>
> Which kernel would you like me to apply the patch series to and test?

The rule of thumb is to test against the latest vanilla (as of today v6.7).
It also makes sense to test against Linux Next. v5.15 is way too old for
new code.
On Sun, Jan 21, 2024 at 02:54:22PM +0200, Andy Shevchenko wrote:
> On Thu, Jan 18, 2024 at 05:18:45PM +1030, Jonathan Woithe wrote:
> > [...]
> >
> > Which kernel would you like me to apply the patch series to and test?
>
> The rule of thumb is to test against latest vanilla (as of today v6.7).
> Also makes sense to test against Linux Next. The v5.15 is way too old for
> a new code.

Thanks, and understood. In this case the request from Igor was

  can you test this series on affected machine with broken kernel to see if
  it's of any help in your case?

The latest vanilla kernel (6.7) has (AFAIK) had the offending commit
reverted, so it's not a "broken" kernel in this respect. Therefore, if I've
understood the request correctly, working with that kernel won't produce the
desired test.

Regards
  jonathan
On Mon, 22 Jan 2024, Jonathan Woithe wrote:
> [...]
>
> Thanks, and understood. In this case the request from Igor was
>
>   can you test this series on affected machine with broken kernel to see if
>   it's of any help in your case?
>
> The latest vanilla kernel (6.7) has (AFAIK) had the offending commit
> reverted, so it's not a "broken" kernel in this respect. Therefore, if I've
> understood the request correctly, working with that kernel won't produce the
> desired test.

Well, you can revert the revert again to get back to the broken state.
On Mon, 22 Jan 2024 14:37:32 +0200 (EET) Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote: > On Mon, 22 Jan 2024, Jonathan Woithe wrote: > > > On Sun, Jan 21, 2024 at 02:54:22PM +0200, Andy Shevchenko wrote: > > > On Thu, Jan 18, 2024 at 05:18:45PM +1030, Jonathan Woithe wrote: > > > > On Thu, Jan 11, 2024 at 06:30:22PM +1030, Jonathan Woithe wrote: > > > > > On Thu, Jan 04, 2024 at 10:48:53PM +1030, Jonathan Woithe wrote: > > > > > > On Thu, Jan 04, 2024 at 01:12:10PM +0100, Igor Mammedov wrote: > > > > > > > On Thu, 28 Dec 2023 18:57:00 +0200 > > > > > > > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote: > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > Here's a series that contains two fixes to PCI bridge window sizing > > > > > > > > algorithm. Together, they should enable remove & rescan cycle to work > > > > > > > > for a PCI bus that has PCI devices with optional resources and/or > > > > > > > > disparity in BAR sizes. > > > > > > > > > > > > > > > > For the second fix, I chose to expose find_empty_resource_slot() from > > > > > > > > kernel/resource.c because it should increase accuracy of the cannot-fit > > > > > > > > decision (currently that function is called find_resource()). In order > > > > > > > > to do that sensibly, a few improvements seemed in order to make its > > > > > > > > interface and name of the function sane before exposing it. Thus, the > > > > > > > > few extra patches on resource side. > > > > > > > > > > > > > > > > Unfortunately I don't have a reason to suspect these would help with > > > > > > > > the issues related to the currently ongoing resource regression > > > > > > > > thread [1]. > > > > > > > > > > > > > > Jonathan, > > > > > > > can you test this series on affected machine with broken kernel to see if > > > > > > > it's of any help in your case? > > > > > > > > > > > > Certainly, but it will have to wait until next Thursday (11 Jan 2024). 
I'm > > > > > > still on leave this week, and when at work I only have physical access to > > > > > > the machine concerned on Thursdays at present. > > > > > > > > > > > > Which kernel would you prefer I apply the series to? > > > > > > > > > > I was very short of time today but I did apply the above series to the > > > > > 5.15.y branch (since I had this source available), resulting in version > > > > > 5.15.141+. Unfortunately, in the rush I forgot to do a clean after the > > > > > bisect reset, so the resulting kernel was not correctly built. It booted > > > > > but thought it was a different version and therefore none of the modules > > > > > could be found. As a result, the test is invalid. > > > > > > > > > > I will try again in a week when I next have physical access to the system. > > > > > Apologies for the delay. In the meantime, if there's a specific kernel I > > > > > should apply the patch series against please let me know. As I understand > > > > > it, you want it applied to one of the kernels which failed, making 5.15.y > > > > > (for y < 145) a reasonable choice. > > > > > > > > I did a "make clean" to reset the source tree and recompiled. However, it > > > > errored out: > > > > > > > > drivers/pci/setup-bus.c:988:24: error: ‘RESOURCE_SIZE_MAX’ undeclared > > > > drivers/pci/setup-bus.c:998:17: error: ‘pci_bus_for_each_resource’ undeclared > > > > > > > > This was with the patch series applied against 5.15.141. It seems the patch > > > > targets a kernel that's too far removed from 5.15.x. > > > > > > > > Which kernel would you like me to apply the patch series to and test? > > > > > > The rule of thumb is to test against latest vanilla (as of today v6.7). > > > Also makes sense to test against Linux Next. The v5.15 is way too old for > > > a new code. > > > > Thanks, and understood. In this case the request from Igor was > > > > can you test this series on affected machine with broken kernel to see if > > it's of any help in your case? 
> >
> > The latest vanilla kernel (6.7) has (AFAIK) had the offending commit
> > reverted, so it's not a "broken" kernel in this respect. Therefore, if I've
> > understood the request correctly, working with that kernel won't produce the
> > desired test.
>
> Well, you can revert the revert again to get back to the broken state.

Either this or just the hand patching that Ilpo suggested earlier
would do.

There is a non-zero chance that this series might fix the issues
Jonathan is facing, i.e. the failed resource reallocation which the
offending patches trigger.

There are 2 different issues here:
  * 1st, unwanted reallocation - it shouldn't happen, but well, that's
    how the current code works
  * 2nd, failed reallocation - seemingly matches what this series is
    trying to fix, and if it doesn't help we would need to dig some
    more in this direction as well to figure out why it fails.
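The "failed reallocation" case can be pictured with a toy model of the cannot-fit decision. This is only a sketch under stated assumptions: `find_empty_slot()` and `align_up()` are hypothetical helpers written for illustration, not the kernel's find_resource() or anything from the series, and the addresses are invented.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def find_empty_slot(window, occupied, size, alignment):
    """First-fit search for a free, aligned range of `size` bytes.

    window   -- (start, end) inclusive bounds of a bridge window
    occupied -- (start, end) inclusive ranges already assigned inside it
    Returns the start address of the slot, or None if nothing fits.
    """
    win_start, win_end = window
    candidate = align_up(win_start, alignment)
    for used_start, used_end in sorted(occupied):
        if candidate + size - 1 < used_start:
            break  # the gap in front of this range is large enough
        # Otherwise skip past the occupied range and re-align.
        candidate = max(candidate, align_up(used_end + 1, alignment))
    if candidate + size - 1 <= win_end:
        return candidate
    return None


# Hypothetical layout: a 256 MiB window, two 1 MiB BARs, one 128 MiB BAR.
WINDOW = (0x10000000, 0x1FFFFFFF)
SMALL_A = (0x10000000, 0x100FFFFF)
SMALL_B = (0x18000000, 0x180FFFFF)
BIG = 0x08000000  # size and natural alignment of the large BAR

if __name__ == "__main__":
    print(find_empty_slot(WINDOW, [SMALL_A], BIG, BIG))
    print(find_empty_slot(WINDOW, [SMALL_A, SMALL_B], BIG, BIG))
```

With only the first small range placed, the 128 MiB request still fits at 0x18000000. With both small ranges placed it no longer fits, although roughly 254 MiB of the window is free, which is the kind of disparity in BAR sizes the cover letter mentions.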
On Mon, Jan 22, 2024 at 02:45:20PM +0100, Igor Mammedov wrote: > On Mon, 22 Jan 2024 14:37:32 +0200 (EET) > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote: > > > On Mon, 22 Jan 2024, Jonathan Woithe wrote: > > > > > On Sun, Jan 21, 2024 at 02:54:22PM +0200, Andy Shevchenko wrote: > > > > On Thu, Jan 18, 2024 at 05:18:45PM +1030, Jonathan Woithe wrote: > > > > > On Thu, Jan 11, 2024 at 06:30:22PM +1030, Jonathan Woithe wrote: > > > > > > On Thu, Jan 04, 2024 at 10:48:53PM +1030, Jonathan Woithe wrote: > > > > > > > On Thu, Jan 04, 2024 at 01:12:10PM +0100, Igor Mammedov wrote: > > > > > > > > On Thu, 28 Dec 2023 18:57:00 +0200 > > > > > > > > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote: > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > Here's a series that contains two fixes to PCI bridge window sizing > > > > > > > > > algorithm. Together, they should enable remove & rescan cycle to work > > > > > > > > > for a PCI bus that has PCI devices with optional resources and/or > > > > > > > > > disparity in BAR sizes. > > > > > > > > > > > > > > > > > > For the second fix, I chose to expose find_empty_resource_slot() from > > > > > > > > > kernel/resource.c because it should increase accuracy of the cannot-fit > > > > > > > > > decision (currently that function is called find_resource()). In order > > > > > > > > > to do that sensibly, a few improvements seemed in order to make its > > > > > > > > > interface and name of the function sane before exposing it. Thus, the > > > > > > > > > few extra patches on resource side. > > > > > > > > > > > > > > > > > > Unfortunately I don't have a reason to suspect these would help with > > > > > > > > > the issues related to the currently ongoing resource regression > > > > > > > > > thread [1]. > > > > > > > > > > > > > > > > Jonathan, > > > > > > > > can you test this series on affected machine with broken kernel to see if > > > > > > > > it's of any help in your case? 
> > > > > > > > > > > > > > Certainly, but it will have to wait until next Thursday (11 Jan 2024). I'm > > > > > > > still on leave this week, and when at work I only have physical access to > > > > > > > the machine concerned on Thursdays at present. > > > > > > > > > > > > > > Which kernel would you prefer I apply the series to? > > > > > > > > > > > > I was very short of time today but I did apply the above series to the > > > > > > 5.15.y branch (since I had this source available), resulting in version > > > > > > 5.15.141+. Unfortunately, in the rush I forgot to do a clean after the > > > > > > bisect reset, so the resulting kernel was not correctly built. It booted > > > > > > but thought it was a different version and therefore none of the modules > > > > > > could be found. As a result, the test is invalid. > > > > > > > > > > > > I will try again in a week when I next have physical access to the system. > > > > > > Apologies for the delay. In the meantime, if there's a specific kernel I > > > > > > should apply the patch series against please let me know. As I understand > > > > > > it, you want it applied to one of the kernels which failed, making 5.15.y > > > > > > (for y < 145) a reasonable choice. > > > > > > > > > > I did a "make clean" to reset the source tree and recompiled. However, it > > > > > errored out: > > > > > > > > > > drivers/pci/setup-bus.c:988:24: error: ‘RESOURCE_SIZE_MAX’ undeclared > > > > > drivers/pci/setup-bus.c:998:17: error: ‘pci_bus_for_each_resource’ undeclared > > > > > > > > > > This was with the patch series applied against 5.15.141. It seems the patch > > > > > targets a kernel that's too far removed from 5.15.x. > > > > > > > > > > Which kernel would you like me to apply the patch series to and test? > > > > > > > > The rule of thumb is to test against latest vanilla (as of today v6.7). > > > > Also makes sense to test against Linux Next. The v5.15 is way too old for > > > > a new code. > > > > > > Thanks, and understood. 
In this case the request from Igor was
> > >
> > >   can you test this series on affected machine with broken kernel to see if
> > >   it's of any help in your case?
> > >
> > > The latest vanilla kernel (6.7) has (AFAIK) had the offending commit
> > > reverted, so it's not a "broken" kernel in this respect. Therefore, if I've
> > > understood the request correctly, working with that kernel won't produce the
> > > desired test.
> >
> > Well, you can revert the revert again to get back to the broken state.
>
> either this or just a hand patching as Ilpo has suggested earlier
> would do.

No problem. This was the easiest approach for me and I have now done this.
Apologies for the delay in getting to this: I ran out of time last Thursday.

> There is non zero chance that this series might fix issues
> Jonathan is facing. i.e. failed resource reallocation which
> offending patches trigger.

I can confirm that, as expected, this patch series has had no effect on the
system which experiences the failed resource reallocation. From syslog,
running a 5.15.141+ kernel[1]:

  kernel: radeon 0000:4b:00.0: Fatal error during GPU init
  kernel: radeon: probe of 0000:4b:00.0 failed with error -12

This is unchanged from what is seen with the unaltered 5.15.141 kernel.

In case it's important, I can also confirm that the errors related to the
thunderbolt device are also still present in the patched 5.15.141+
kernel:

  thunderbolt 0000:04:00.0: interrupt for TX ring 0 is already enabled
  :
  thunderbolt 0000:04:00.0: interrupt for RX ring 0 is already enabled
  :

Like the GPU failure, they do not appear in the working kernels on this
system.

Let me know if you would like me to run further tests.

Regards
  jonathan

[1] This is 5.15.141, patched with the series of interest here and the hand
    patch from Ilpo.
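When comparing several test kernels, the failure signatures reported above can be checked mechanically. The following is a minimal sketch, assuming the log lines look exactly like the excerpts quoted in this message; `scan_log()` and the regular expressions are hypothetical helpers, not part of any kernel tooling. Error -12 corresponds to ENOMEM, which the sketch decodes via Python's `errno` table.

```python
import errno
import re

# Patterns modelled on the syslog excerpts quoted in this thread.
PROBE_FAILED = re.compile(
    r"(?P<driver>\w+): probe of (?P<dev>[0-9a-f:.]+) "
    r"failed with error -(?P<err>\d+)"
)
TB_RING = re.compile(
    r"thunderbolt (?P<dev>[0-9a-f:.]+): "
    r"interrupt for (?P<ring>[TR]X ring \d+) is already enabled"
)


def scan_log(lines):
    """Return (kind, device, detail) tuples for the known failure lines."""
    hits = []
    for line in lines:
        m = PROBE_FAILED.search(line)
        if m:
            code = int(m.group("err"))
            name = errno.errorcode.get(code, "unknown")
            detail = f"{m.group('driver')} -{code} ({name})"
            hits.append(("probe-failed", m.group("dev"), detail))
            continue
        m = TB_RING.search(line)
        if m:
            hits.append(("ring-already-enabled", m.group("dev"), m.group("ring")))
    return hits


SAMPLE = [
    "kernel: radeon 0000:4b:00.0: Fatal error during GPU init",
    "kernel: radeon: probe of 0000:4b:00.0 failed with error -12",
    "thunderbolt 0000:04:00.0: interrupt for TX ring 0 is already enabled",
    "thunderbolt 0000:04:00.0: interrupt for RX ring 0 is already enabled",
]

if __name__ == "__main__":
    for hit in scan_log(SAMPLE):
        print(hit)
```

Running the same scan over logs from a working and a failing kernel gives a quick way to confirm whether a patched build still shows the GPU probe failure and the thunderbolt ring messages.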
On Thu, 1 Feb 2024, Jonathan Woithe wrote: > On Mon, Jan 22, 2024 at 02:45:20PM +0100, Igor Mammedov wrote: > > On Mon, 22 Jan 2024 14:37:32 +0200 (EET) > > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote: > > > > > On Mon, 22 Jan 2024, Jonathan Woithe wrote: > > > > > > > On Sun, Jan 21, 2024 at 02:54:22PM +0200, Andy Shevchenko wrote: > > > > > On Thu, Jan 18, 2024 at 05:18:45PM +1030, Jonathan Woithe wrote: > > > > > > On Thu, Jan 11, 2024 at 06:30:22PM +1030, Jonathan Woithe wrote: > > > > > > > On Thu, Jan 04, 2024 at 10:48:53PM +1030, Jonathan Woithe wrote: > > > > > > > > On Thu, Jan 04, 2024 at 01:12:10PM +0100, Igor Mammedov wrote: > > > > > > > > > On Thu, 28 Dec 2023 18:57:00 +0200 > > > > > > > > > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote: > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > Here's a series that contains two fixes to PCI bridge window sizing > > > > > > > > > > algorithm. Together, they should enable remove & rescan cycle to work > > > > > > > > > > for a PCI bus that has PCI devices with optional resources and/or > > > > > > > > > > disparity in BAR sizes. > > > > > > > > > > > > > > > > > > > > For the second fix, I chose to expose find_empty_resource_slot() from > > > > > > > > > > kernel/resource.c because it should increase accuracy of the cannot-fit > > > > > > > > > > decision (currently that function is called find_resource()). In order > > > > > > > > > > to do that sensibly, a few improvements seemed in order to make its > > > > > > > > > > interface and name of the function sane before exposing it. Thus, the > > > > > > > > > > few extra patches on resource side. > > > > > > > > > > > > > > > > > > > > Unfortunately I don't have a reason to suspect these would help with > > > > > > > > > > the issues related to the currently ongoing resource regression > > > > > > > > > > thread [1]. > > > > Thanks, and understood. 
In this case the request from Igor was > > > > > > > > can you test this series on affected machine with broken kernel to see if > > > > it's of any help in your case? > > > > > > > > The latest vanilla kernel (6.7) has (AFAIK) had the offending commit > > > > reverted, so it's not a "broken" kernel in this respect. Therefore, if I've > > > > understood the request correctly, working with that kernel won't produce the > > > > desired test. > > > > > > Well, you can revert the revert again to get back to the broken state. > > > > either this or just a hand patching as Ilpo has suggested earlier > > would do. > > No problem. This was the easiest approach for me and I have now done this. > Apologies for the delay in getting to this: I ran out of time last Thursday. > > > There is non zero chance that this series might fix issues > > Jonathan is facing. i.e. failed resource reallocation which > > offending patches trigger. > > I can confirm that as expected, this patch series has had no effect on the > system which experiences the failed resource reallocation. From syslog, > running a 5.15.141+ kernel[1]: > > kernel: radeon 0000:4b:00.0: Fatal error during GPU init > kernel: radeon: probe of 0000:4b:00.0 failed with error -12 > > This is unchanged from what is seen with the unaltered 5.15.141 kernel. > > In case it's important, can also confirm that the errors related to the > thunderbolt device are are also still present in the patched 5.15.141+ > kernel: > > thunderbolt 0000:04:00.0: interrupt for TX ring 0 is already enabled > : > thunderbolt 0000:04:00.0: interrupt for RX ring 0 is already enabled > : > > Like the GPU failure, they do not appear in the working kernels on this > system. > > Let me know if you would like to me to run further tests. > > Regards > jonathan > > [1] This is 5.15.141, patched with the series of interest here and the hand > patch from Ilpo. Hi Jonathan, Thanks a lot for testing it regardless. 
The end result was not a big surprise given how things looked based on the logs, but it was certainly worth a test, as Igor mentioned. The resource allocation code isn't among the easiest to track.