[v3,05/10] PCI: Store all PCIe Supported Link Speeds

Message ID 20230929115723.7864-6-ilpo.jarvinen@linux.intel.com
State New
Headers
Series Add PCIe Bandwidth Controller |

Commit Message

Ilpo Järvinen Sept. 29, 2023, 11:57 a.m. UTC
  struct pci_bus stores max_bus_speed. Implementation Note in PCIe r6.0.1
sec 7.5.3.18, however, recommends determining supported Link Speeds
using the Supported Link Speeds Vector in the Link Capabilities 2
Register (when available).

Add pcie_bus_speeds into struct pci_bus which caches the Supported Link
Speeds. The value is taken directly from the Supported Link Speeds
Vector or synthetized from the Max Link Speed in the Link Capabilities
Register when the Link Capabilities 2 Register is not available.

pcie_bus_speeds field keeps the extra reserved zero at the least
significant bit to match the Link Capabilities 2 Register layouting.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/probe.c           | 28 +++++++++++++++++++++++++++-
 include/linux/pci.h           |  1 +
 include/uapi/linux/pci_regs.h |  1 +
 3 files changed, 29 insertions(+), 1 deletion(-)
  

Comments

Lukas Wunner Dec. 30, 2023, 11:45 a.m. UTC | #1
On Fri, Sep 29, 2023 at 02:57:18PM +0300, Ilpo Järvinen wrote:
> struct pci_bus stores max_bus_speed. Implementation Note in PCIe r6.0.1
> sec 7.5.3.18, however, recommends determining supported Link Speeds
> using the Supported Link Speeds Vector in the Link Capabilities 2
> Register (when available).
> 
> Add pcie_bus_speeds into struct pci_bus which caches the Supported Link
> Speeds. The value is taken directly from the Supported Link Speeds
> Vector or synthetized from the Max Link Speed in the Link Capabilities
> Register when the Link Capabilities 2 Register is not available.

Remind me, what's the reason again to cache this and why is
max_bus_speed not sufficient?  Is the point that there may be
"gaps" in the supported link speeds, i.e. not every bit below
the maximum supported speed may be set?  And you need to skip
over those gaps when throttling to a lower speed?

Maybe this becomes apparent in a later patch but from a reviewer's
perspective starting at patch 1 and working one's way forward through
the series, it's a bit puzzling, so an explanation in the commit
message would be beneficial.


> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
[...]
> +static u8 pcie_get_supported_speeds(u32 linkcap, u32 linkcap2)
> +{
> +	u8 speeds;
> +
> +	speeds = linkcap2 & PCI_EXP_LNKCAP2_SLS;
> +	if (speeds)
> +		return speeds;
> +
> +	/*
> +	 * Synthetize supported link speeds from the Max Link Speed in the
> +	 * Link Capabilities Register.
> +	 */
> +	speeds = PCI_EXP_LNKCAP2_SLS_2_5GB;
> +	if ((linkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
> +		speeds |= PCI_EXP_LNKCAP2_SLS_5_0GB;
> +	return speeds;
> +}

This seems to duplicate portions of pcie_get_speed_cap().

Can you refactor that function to take advantage of the cached value,
i.e. basically return PCIE_LNKCAP2_SLS2SPEED(dev->bus->pcie_bus_speeds)?

Also, I note that pci_set_bus_speed() doesn't use LNKCAP2.
Presumably that's a historic artefact but maybe it can be
converted to use LNKCAP2 as well.  Granted, it's not directly
related to this series, but always nice to clean up and
rationalize the code.

Thanks,

Lukas
  
Lukas Wunner Dec. 30, 2023, 7:30 p.m. UTC | #2
On Sat, Dec 30, 2023 at 12:45:49PM +0100, Lukas Wunner wrote:
> On Fri, Sep 29, 2023 at 02:57:18PM +0300, Ilpo Järvinen wrote:
> > struct pci_bus stores max_bus_speed. Implementation Note in PCIe r6.0.1
> > sec 7.5.3.18, however, recommends determining supported Link Speeds
> > using the Supported Link Speeds Vector in the Link Capabilities 2
> > Register (when available).
> > 
> > Add pcie_bus_speeds into struct pci_bus which caches the Supported Link
> > Speeds. The value is taken directly from the Supported Link Speeds
> > Vector or synthetized from the Max Link Speed in the Link Capabilities
> > Register when the Link Capabilities 2 Register is not available.
> 
> Remind me, what's the reason again to cache this and why is
> max_bus_speed not sufficient?  Is the point that there may be
> "gaps" in the supported link speeds, i.e. not every bit below
> the maximum supported speed may be set?  And you need to skip
> over those gaps when throttling to a lower speed?

FWIW I went and re-read the internal review I provided on May 18.
Turns out I already mentioned back then that gaps aren't permitted:

 "Per PCIe r6.0.1 sec 8.2.1, the bitfield in the Link Capabilities 2
  register is not permitted to contain gaps between maximum supported
  speed and lowest possible speed (2.5 GT/s Gen1)."


> Also, I note that pci_set_bus_speed() doesn't use LNKCAP2.

About that, I wrote in May:

 "Actually, scratch that.  pci_set_bus_speed() is fine.  Since it's only
  interested in the *maximum* link speed, reading just LnkCap is correct.
  LnkCap2 only needs to be read to determine if a certain speed is
  *supported*.  E.g., even though 32 GT/s are supported, perhaps 16 GT/s
  are not.

  It's rather pcie_get_speed_cap() which should be changed.  There's
  no need for it to read LnkCap2.  The commit which introduced this,
  6cf57be0f78e, was misguided and had to be fixed up with f1f90e254e46.
  It could be simplified to just read LnkCap and return
  pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS].  If the device is a
  Root Port or Downstream Port, it doesn't even have to do that but
  could return the cached value in subordinate->max_bus_speed.
  If you add another attribute to struct pci_bus for the downstream
  device's maximum speed, the maximum speed for Endpoints and Upstream
  Ports could be returned directly as well from that attribute."

Thanks,

Lukas
  
Ilpo Järvinen Jan. 1, 2024, 4:26 p.m. UTC | #3
On Sat, 30 Dec 2023, Lukas Wunner wrote:

> On Sat, Dec 30, 2023 at 12:45:49PM +0100, Lukas Wunner wrote:
> > On Fri, Sep 29, 2023 at 02:57:18PM +0300, Ilpo Järvinen wrote:
> > > struct pci_bus stores max_bus_speed. Implementation Note in PCIe r6.0.1
> > > sec 7.5.3.18, however, recommends determining supported Link Speeds
> > > using the Supported Link Speeds Vector in the Link Capabilities 2
> > > Register (when available).
> > > 
> > > Add pcie_bus_speeds into struct pci_bus which caches the Supported Link
> > > Speeds. The value is taken directly from the Supported Link Speeds
> > > Vector or synthetized from the Max Link Speed in the Link Capabilities
> > > Register when the Link Capabilities 2 Register is not available.
> > 
> > Remind me, what's the reason again to cache this and why is
> > max_bus_speed not sufficient?  Is the point that there may be
> > "gaps" in the supported link speeds, i.e. not every bit below
> > the maximum supported speed may be set?  And you need to skip
> > over those gaps when throttling to a lower speed?
> 
> FWIW I went and re-read the internal review I provided on May 18.
> Turns out I already mentioned back then that gaps aren't permitted:
> 
>  "Per PCIe r6.0.1 sec 8.2.1, the bitfield in the Link Capabilities 2
>   register is not permitted to contain gaps between maximum supported
>   speed and lowest possible speed (2.5 GT/s Gen1)."
> 
> 
> > Also, I note that pci_set_bus_speed() doesn't use LNKCAP2.
> 
> About that, I wrote in May:
> 
>  "Actually, scratch that.  pci_set_bus_speed() is fine.  Since it's only
>   interested in the *maximum* link speed, reading just LnkCap is correct.
>   LnkCap2 only needs to be read to determine if a certain speed is
>   *supported*.  E.g., even though 32 GT/s are supported, perhaps 16 GT/s
>   are not.
> 
>   It's rather pcie_get_speed_cap() which should be changed.  There's
>   no need for it to read LnkCap2.  The commit which introduced this,
>   6cf57be0f78e, was misguided and had to be fixed up with f1f90e254e46.
>   It could be simplified to just read LnkCap and return
>   pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS].  If the device is a
>   Root Port or Downstream Port, it doesn't even have to do that but
>   could return the cached value in subordinate->max_bus_speed.
>   If you add another attribute to struct pci_bus for the downstream
>   device's maximum speed, the maximum speed for Endpoints and Upstream
>   Ports could be returned directly as well from that attribute."

I know it's quite far back so it's understandable to forget :-), 
but already by May 23rd your position had changed and you wrote this:

'Per the Implementation Note at the end of PCIe r6.0.1 sec 7.5.3.18,

   "It is strongly encouraged that software primarily utilize the
    Supported Link Speeds Vector instead of the Max Link Speed field,
    so that software can determine the exact set of supported speeds on
    current and future hardware. This can avoid software being confused
    if a future specification defines Links that do not require support
    for all slower speeds."

This means that it's not sufficient if you just check that the desired 
speed is lower than the maximum.  Instead, you should check if the bit 
corresponding to the desired speed is set in the LnkCap2 register's 
Supported Link Speeds Vector.

PCIe r6.0.1 sec 8.2.1 stipulates that the bitfield is not permitted to 
contain gaps between maximum supported speed and lowest possible speed
(2.5 GT/s Gen1).  However the Implementation Note suggests that rule may 
no longer apply in future revisions of the PCIe Base Spec.'

So I'd assume I should still follow the way spec recommends, not the "old 
method" that may not function correctly after some future version of the 
spec, or have you really changed position once again on this?
  
Lukas Wunner Jan. 1, 2024, 4:40 p.m. UTC | #4
On Mon, Jan 01, 2024 at 06:26:40PM +0200, Ilpo Järvinen wrote:
> On Sat, 30 Dec 2023, Lukas Wunner wrote:
> 
> > On Sat, Dec 30, 2023 at 12:45:49PM +0100, Lukas Wunner wrote:
> > > On Fri, Sep 29, 2023 at 02:57:18PM +0300, Ilpo Järvinen wrote:
> > > > struct pci_bus stores max_bus_speed. Implementation Note in PCIe r6.0.1
> > > > sec 7.5.3.18, however, recommends determining supported Link Speeds
> > > > using the Supported Link Speeds Vector in the Link Capabilities 2
> > > > Register (when available).
> > > > 
> > > > Add pcie_bus_speeds into struct pci_bus which caches the Supported Link
> > > > Speeds. The value is taken directly from the Supported Link Speeds
> > > > Vector or synthetized from the Max Link Speed in the Link Capabilities
> > > > Register when the Link Capabilities 2 Register is not available.
> > > 
> > > Remind me, what's the reason again to cache this and why is
> > > max_bus_speed not sufficient?  Is the point that there may be
> > > "gaps" in the supported link speeds, i.e. not every bit below
> > > the maximum supported speed may be set?  And you need to skip
> > > over those gaps when throttling to a lower speed?
> > 
> > FWIW I went and re-read the internal review I provided on May 18.
> > Turns out I already mentioned back then that gaps aren't permitted:
> > 
> >  "Per PCIe r6.0.1 sec 8.2.1, the bitfield in the Link Capabilities 2
> >   register is not permitted to contain gaps between maximum supported
> >   speed and lowest possible speed (2.5 GT/s Gen1)."
> > 
> > 
> > > Also, I note that pci_set_bus_speed() doesn't use LNKCAP2.
> > 
> > About that, I wrote in May:
> > 
> >  "Actually, scratch that.  pci_set_bus_speed() is fine.  Since it's only
> >   interested in the *maximum* link speed, reading just LnkCap is correct.
> >   LnkCap2 only needs to be read to determine if a certain speed is
> >   *supported*.  E.g., even though 32 GT/s are supported, perhaps 16 GT/s
> >   are not.
> > 
> >   It's rather pcie_get_speed_cap() which should be changed.  There's
> >   no need for it to read LnkCap2.  The commit which introduced this,
> >   6cf57be0f78e, was misguided and had to be fixed up with f1f90e254e46.
> >   It could be simplified to just read LnkCap and return
> >   pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS].  If the device is a
> >   Root Port or Downstream Port, it doesn't even have to do that but
> >   could return the cached value in subordinate->max_bus_speed.
> >   If you add another attribute to struct pci_bus for the downstream
> >   device's maximum speed, the maximum speed for Endpoints and Upstream
> >   Ports could be returned directly as well from that attribute."
> 
> I know it's quite far back so it's understandable to forget :-), 
> but already by May 23rd your position had changed and you wrote this:
> 
> 'Per the Implementation Note at the end of PCIe r6.0.1 sec 7.5.3.18,
> 
>    "It is strongly encouraged that software primarily utilize the
>     Supported Link Speeds Vector instead of the Max Link Speed field,
>     so that software can determine the exact set of supported speeds on
>     current and future hardware. This can avoid software being confused
>     if a future specification defines Links that do not require support
>     for all slower speeds."
> 
> This means that it's not sufficient if you just check that the desired 
> speed is lower than the maximum.  Instead, you should check if the bit 
> corresponding to the desired speed is set in the LnkCap2 register's 
> Supported Link Speeds Vector.
> 
> PCIe r6.0.1 sec 8.2.1 stipulates that the bitfield is not permitted to 
> contain gaps between maximum supported speed and lowest possible speed
> (2.5 GT/s Gen1).  However the Implementation Note suggests that rule may 
> no longer apply in future revisions of the PCIe Base Spec.'
> 
> So I'd assume I should still follow the way spec recommends, not the "old 
> method" that may not function correctly after some future version of the 
> spec, or have you really changed position once again on this?

I haven't, you're right, I forgot about all those details.
Thanks for that blast from the past. ;)

But it would be good to extend the commit message because without all
that context, it's difficult to understand why the max_bus_speed isn't
sufficient.

Lukas
  
Ilpo Järvinen Jan. 1, 2024, 4:53 p.m. UTC | #5
On Mon, 1 Jan 2024, Lukas Wunner wrote:

> On Mon, Jan 01, 2024 at 06:26:40PM +0200, Ilpo Järvinen wrote:
> > On Sat, 30 Dec 2023, Lukas Wunner wrote:
> > 
> > > On Sat, Dec 30, 2023 at 12:45:49PM +0100, Lukas Wunner wrote:
> > > > On Fri, Sep 29, 2023 at 02:57:18PM +0300, Ilpo Järvinen wrote:
> > > > > struct pci_bus stores max_bus_speed. Implementation Note in PCIe r6.0.1
> > > > > sec 7.5.3.18, however, recommends determining supported Link Speeds
> > > > > using the Supported Link Speeds Vector in the Link Capabilities 2
> > > > > Register (when available).
> > > > > 
> > > > > Add pcie_bus_speeds into struct pci_bus which caches the Supported Link
> > > > > Speeds. The value is taken directly from the Supported Link Speeds
> > > > > Vector or synthetized from the Max Link Speed in the Link Capabilities
> > > > > Register when the Link Capabilities 2 Register is not available.
> > > > 
> > > > Remind me, what's the reason again to cache this and why is
> > > > max_bus_speed not sufficient?  Is the point that there may be
> > > > "gaps" in the supported link speeds, i.e. not every bit below
> > > > the maximum supported speed may be set?  And you need to skip
> > > > over those gaps when throttling to a lower speed?
> > > 
> > > FWIW I went and re-read the internal review I provided on May 18.
> > > Turns out I already mentioned back then that gaps aren't permitted:
> > > 
> > >  "Per PCIe r6.0.1 sec 8.2.1, the bitfield in the Link Capabilities 2
> > >   register is not permitted to contain gaps between maximum supported
> > >   speed and lowest possible speed (2.5 GT/s Gen1)."
> > > 
> > > 
> > > > Also, I note that pci_set_bus_speed() doesn't use LNKCAP2.
> > > 
> > > About that, I wrote in May:
> > > 
> > >  "Actually, scratch that.  pci_set_bus_speed() is fine.  Since it's only
> > >   interested in the *maximum* link speed, reading just LnkCap is correct.
> > >   LnkCap2 only needs to be read to determine if a certain speed is
> > >   *supported*.  E.g., even though 32 GT/s are supported, perhaps 16 GT/s
> > >   are not.
> > > 
> > >   It's rather pcie_get_speed_cap() which should be changed.  There's
> > >   no need for it to read LnkCap2.  The commit which introduced this,
> > >   6cf57be0f78e, was misguided and had to be fixed up with f1f90e254e46.
> > >   It could be simplified to just read LnkCap and return
> > >   pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS].  If the device is a
> > >   Root Port or Downstream Port, it doesn't even have to do that but
> > >   could return the cached value in subordinate->max_bus_speed.
> > >   If you add another attribute to struct pci_bus for the downstream
> > >   device's maximum speed, the maximum speed for Endpoints and Upstream
> > >   Ports could be returned directly as well from that attribute."
> > 
> > I know it's quite far back so it's understandable to forget :-), 
> > but already by May 23rd your position had changed and you wrote this:
> > 
> > 'Per the Implementation Note at the end of PCIe r6.0.1 sec 7.5.3.18,
> > 
> >    "It is strongly encouraged that software primarily utilize the
> >     Supported Link Speeds Vector instead of the Max Link Speed field,
> >     so that software can determine the exact set of supported speeds on
> >     current and future hardware. This can avoid software being confused
> >     if a future specification defines Links that do not require support
> >     for all slower speeds."
> > 
> > This means that it's not sufficient if you just check that the desired 
> > speed is lower than the maximum.  Instead, you should check if the bit 
> > corresponding to the desired speed is set in the LnkCap2 register's 
> > Supported Link Speeds Vector.
> > 
> > PCIe r6.0.1 sec 8.2.1 stipulates that the bitfield is not permitted to 
> > contain gaps between maximum supported speed and lowest possible speed
> > (2.5 GT/s Gen1).  However the Implementation Note suggests that rule may 
> > no longer apply in future revisions of the PCIe Base Spec.'
> > 
> > So I'd assume I should still follow the way spec recommends, not the "old 
> > method" that may not function correctly after some future version of the 
> > spec, or have you really changed position once again on this?
> 
> I haven't, you're right, I forgot about all those details.
> Thanks for that blast from the past. ;)
>
> But it would be good to extend the commit message because without all
> that context, it's difficult to understand why the max_bus_speed isn't
> sufficient.

Thanks. I'll extend the commit message.
  

Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 795534589b98..ca1d797a30cb 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -767,6 +767,29 @@  static enum pci_bus_speed agp_speed(int agp3, int agpstat)
 	return agp_speeds[index];
 }
 
+/*
+ * Implementation Note in PCIe r6.0.1 sec 7.5.3.18 recommends determining
+ * supported link speeds using the Supported Link Speeds Vector in the Link
+ * Capabilities 2 Register (when available).
+ */
+static u8 pcie_get_supported_speeds(u32 linkcap, u32 linkcap2)
+{
+	u8 speeds;
+
+	speeds = linkcap2 & PCI_EXP_LNKCAP2_SLS;
+	if (speeds)
+		return speeds;
+
+	/*
+	 * Synthetize supported link speeds from the Max Link Speed in the
+	 * Link Capabilities Register.
+	 */
+	speeds = PCI_EXP_LNKCAP2_SLS_2_5GB;
+	if ((linkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
+		speeds |= PCI_EXP_LNKCAP2_SLS_5_0GB;
+	return speeds;
+}
+
 static void pci_set_bus_speed(struct pci_bus *bus)
 {
 	struct pci_dev *bridge = bus->self;
@@ -814,12 +837,15 @@  static void pci_set_bus_speed(struct pci_bus *bus)
 	}
 
 	if (pci_is_pcie(bridge)) {
-		u32 linkcap;
+		u32 linkcap, linkcap2;
 		u16 linksta;
 
 		pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap);
 		bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS];
 
+		pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP2, &linkcap2);
+		bus->pcie_bus_speeds = pcie_get_supported_speeds(linkcap, linkcap2);
+
 		pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
 		pcie_update_link_speed(bus, linksta);
 	}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 16db80f8b15c..cb03f3ff9d23 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -664,6 +664,7 @@  struct pci_bus {
 	unsigned char	primary;	/* Number of primary bridge */
 	unsigned char	max_bus_speed;	/* enum pci_bus_speed */
 	unsigned char	cur_bus_speed;	/* enum pci_bus_speed */
+	u8		pcie_bus_speeds;/* Supported Link Speeds Vector (+ reserved 0 at LSB) */
 #ifdef CONFIG_PCI_DOMAINS_GENERIC
 	int		domain_nr;
 #endif
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index e5f558d96493..2b27e4f6854a 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -674,6 +674,7 @@ 
 #define PCI_EXP_DEVSTA2		0x2a	/* Device Status 2 */
 #define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V2 0x2c	/* end of v2 EPs w/o link */
 #define PCI_EXP_LNKCAP2		0x2c	/* Link Capabilities 2 */
+#define  PCI_EXP_LNKCAP2_SLS		0x000000fe /* Supported Link Speeds Vector */
 #define  PCI_EXP_LNKCAP2_SLS_2_5GB	0x00000002 /* Supported Speed 2.5GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_5_0GB	0x00000004 /* Supported Speed 5GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_8_0GB	0x00000008 /* Supported Speed 8GT/s */