PCI/DOE: Expose the DOE protocols via sysfs

Message ID 20230725035755.2621507-1-alistair.francis@wdc.com
State New
Headers
Series PCI/DOE: Expose the DOE protocols via sysfs |

Commit Message

Alistair Francis July 25, 2023, 3:57 a.m. UTC
  The PCIe 6 specification added support for the Data Object Exchange (DOE).
When DOE is supported the Discovery Data Object Protocol must be
implemented. The protocol allows a requester to obtain information about
the other DOE protocols supported by the device.

The kernel is already querying the DOE protocols supported and cacheing
the values. This patch exposes the values via sysfs. This will allow
userspace to determine which DOE protocols are supported by the PCIe
device.

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
---
 drivers/pci/doe.c       | 28 ++++++++++++++++++++++++++++
 drivers/pci/pci-sysfs.c | 27 +++++++++++++++++++++++++++
 include/linux/pci-doe.h |  2 ++
 3 files changed, 57 insertions(+)
  

Comments

Lukas Wunner July 25, 2023, 4:30 p.m. UTC | #1
On Tue, Jul 25, 2023 at 01:57:55PM +1000, Alistair Francis wrote:
> The PCIe 6 specification added support for the Data Object Exchange (DOE).
> When DOE is supported the Discovery Data Object Protocol must be
> implemented. The protocol allows a requester to obtain information about
> the other DOE protocols supported by the device.
> 
> The kernel is already querying the DOE protocols supported and cacheing
> the values. This patch exposes the values via sysfs. This will allow
> userspace to determine which DOE protocols are supported by the PCIe
> device.

Just dumping the list of supported protocols into dmesg might be simpler,
unless you intend to add mechanisms to actually use certain DOE mailboxes
from user space or expose the information in lspci.  Do have plans for
either of that or what's the motivation to use sysfs?

I think I'd rather want everything in doe.c (#ifdef'ed to CONFIG_SYSFS)
and only make dev_attr_doe_proto public.

Thanks,

Lukas
  
Alistair Francis July 31, 2023, 3:24 p.m. UTC | #2
On Tue, Jul 25, 2023 at 11:00 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Tue, 25 Jul 2023 13:57:55 +1000
> Alistair Francis <alistair23@gmail.com> wrote:
>
> > The PCIe 6 specification added support for the Data Object Exchange (DOE).
> > When DOE is supported the Discovery Data Object Protocol must be
> > implemented. The protocol allows a requester to obtain information about
> > the other DOE protocols supported by the device.
> >
> > The kernel is already querying the DOE protocols supported and cacheing
> > the values. This patch exposes the values via sysfs. This will allow
> > userspace to determine which DOE protocols are supported by the PCIe
> > device.
> >
> > Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
>
> Hi Alistair,
>
> Needs documentation.
> Documentation/ABI/testing/sys-bus-pci probably the right place.

Ok, I'll add some

>
> Also, I wonder if we want to associate each DOE with the protocols
> it supports rather than clumping them together in one file...

In the future there could be vendor IDs or even standardised IDs that
the current kernel doesn't support/understand. So my thinking was that
by exposing all of the supported protocols userspace can use that
information as it wishes.

Then I wanted to work on supporting specific DOE protocols in the
kernel and exposing that to userspace. But I think a single sysfs
makes sense as a base point to convey information.

> Otherwise we'll loose information on sharing + we may well see
> repeated entries as it's fine to have more than one instance of
> given protocol.
>
> A few more comments inline about what happens when we do run out
> of space in the buffer.
>
> Thanks,
>
> Jonathan
>
> > ---
> >  drivers/pci/doe.c       | 28 ++++++++++++++++++++++++++++
> >  drivers/pci/pci-sysfs.c | 27 +++++++++++++++++++++++++++
> >  include/linux/pci-doe.h |  2 ++
> >  3 files changed, 57 insertions(+)
> >
> > diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
> > index 1b97a5ab71a9..cc1c23c78ac1 100644
> > --- a/drivers/pci/doe.c
> > +++ b/drivers/pci/doe.c
> > @@ -563,6 +563,34 @@ static bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type)
> >       return false;
> >  }
> >
> > +/**
> > + * pci_doe_sysfs_proto_supports() - Write the supported DOE protocols
> > + *                        to a sysfs buffer
> > + * @doe_mb: DOE mailbox capability to query
> > + * @buf: buffer to store the sysfs strings
> > + * @offset: offset in buffer to store the sysfs strings
> > + *
> > + * RETURNS: The number of bytes written, 0 means an error occured
> > + */
> > +unsigned long pci_doe_sysfs_proto_supports(struct pci_doe_mb *doe_mb,
> > +                                        char *buf, ssize_t offset)
> > +{
> > +     unsigned long index;
> > +     ssize_t ret = offset, r;
>
> I find that hard to parse.  Maybe
>
>         ssize_t ret = offset;
>         ssize_t r;
>
>
>
> > +     void *entry;
> > +
> > +     xa_for_each(&doe_mb->prots, index, entry) {
> > +             r = sysfs_emit_at(buf, ret, "0x%08lX\n", xa_to_value(entry));
> > +
> > +             if (r == 0)
> > +                     return 0;
>
> return ret here? Otherwise we aren't outputting everything that we have
> managed to print before the buffer fills up.
> It's also inconsistent because if the last entry happens to partly print
> we'll let it through whereas, if it doesn't fit at all we will drop
> all the info about this DOE instance.

Good point, I'll update this and the variable declarations

Alistair

>
>
> > +
> > +             ret += r;
> > +     }
> > +
> > +     return ret;
> > +}
> > +
> >  /**
> >   * pci_doe_submit_task() - Submit a task to be processed by the state machine
> >   *
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index ab32a91f287b..df93051e65bf 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -16,6 +16,7 @@
> >  #include <linux/kernel.h>
> >  #include <linux/sched.h>
> >  #include <linux/pci.h>
> > +#include <linux/pci-doe.h>
> >  #include <linux/stat.h>
> >  #include <linux/export.h>
> >  #include <linux/topology.h>
> > @@ -290,6 +291,29 @@ static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
> >  }
> >  static DEVICE_ATTR_RO(modalias);
> >
> > +#ifdef CONFIG_PCI_DOE
> > +static ssize_t doe_proto_show(struct device *dev, struct device_attribute *attr,
> > +                          char *buf)
> > +{
> > +     struct pci_dev *pci_dev = to_pci_dev(dev);
> > +     unsigned long index;
> > +     ssize_t ret = 0, r;
>
> As above.
>
> > +     struct pci_doe_mb *doe_mb;
> > +
> > +     xa_for_each(&pci_dev->doe_mbs, index, doe_mb) {
> > +             r = pci_doe_sysfs_proto_supports(doe_mb, buf, ret);
> > +
> > +             if (r == 0)
> > +                     return 0;
>
> As above, return ret here I think makes more sense than 0.
>
>
> > +
> > +             ret += r;
> > +     }
> > +
> > +     return ret;
> > +}
> > +static DEVICE_ATTR_RO(doe_proto);
> > +#endif
> > +
> >  static ssize_t enable_store(struct device *dev, struct device_attribute *attr,
> >                            const char *buf, size_t count)
> >  {
> > @@ -603,6 +627,9 @@ static struct attribute *pci_dev_attrs[] = {
> >       &dev_attr_local_cpus.attr,
> >       &dev_attr_local_cpulist.attr,
> >       &dev_attr_modalias.attr,
> > +#ifdef CONFIG_PCI_DOE
> > +     &dev_attr_doe_proto.attr,
> > +#endif
> >  #ifdef CONFIG_NUMA
> >       &dev_attr_numa_node.attr,
> >  #endif
> > diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
> > index 1f14aed4354b..066494a4dba3 100644
> > --- a/include/linux/pci-doe.h
> > +++ b/include/linux/pci-doe.h
> > @@ -21,5 +21,7 @@ struct pci_doe_mb *pci_find_doe_mailbox(struct pci_dev *pdev, u16 vendor,
> >  int pci_doe(struct pci_doe_mb *doe_mb, u16 vendor, u8 type,
> >           const void *request, size_t request_sz,
> >           void *response, size_t response_sz);
> > +unsigned long pci_doe_sysfs_proto_supports(struct pci_doe_mb *doe_mb,
> > +                                        char *buf, ssize_t offset);
> >
> >  #endif
>
  
Alistair Francis July 31, 2023, 3:30 p.m. UTC | #3
On Thu, Jul 27, 2023 at 4:39 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Tue, 25 Jul 2023 18:30:46 +0200
> Lukas Wunner <lukas@wunner.de> wrote:
>
> > On Tue, Jul 25, 2023 at 01:57:55PM +1000, Alistair Francis wrote:
> > > The PCIe 6 specification added support for the Data Object Exchange (DOE).
> > > When DOE is supported the Discovery Data Object Protocol must be
> > > implemented. The protocol allows a requester to obtain information about
> > > the other DOE protocols supported by the device.
> > >
> > > The kernel is already querying the DOE protocols supported and cacheing
> > > the values. This patch exposes the values via sysfs. This will allow
> > > userspace to determine which DOE protocols are supported by the PCIe
> > > device.
> >
> > Just dumping the list of supported protocols into dmesg might be simpler,
> > unless you intend to add mechanisms to actually use certain DOE mailboxes
> > from user space or expose the information in lspci.  Do have plans for
> > either of that or what's the motivation to use sysfs?
> >
>
> I can answer this one in rather than waiting for Alastair to see it
> (given I was involved in shooting down the earlier proposal :(
>
> At least partly motivated by providing the info for lspci which
> I agree with being a useful addition for debug etc.
> https://github.com/pciutils/pciutils/pull/152

Yep, a big benefit is lspci or other userspace tools to be able to see
what DOE protocols are supported.

I also have plans to expose DOE mailboxes to userspace. That way we
can run the SPDM requester code (using libspdm) in userspace to
communicate with devices using SPDM. That will allow device
authentication for example.

>
> I can see it would also be useful for things that will poke from
> userspace because they aren't expected to run in production (and hence
> hopefully don't care about potential races etc). CXL compliance
> comes to mind - I don't think we ever want to carry kernel code for that.

That too!

>
> Jonathan
>
>
>
> > I think I'd rather want everything in doe.c (#ifdef'ed to CONFIG_SYSFS)
> > and only make dev_attr_doe_proto public.

It seemed like all of the sysf was in this file, which is why I added
it there. I'm happy to move it if that's preferred though

Alistair

> >
> > Thanks,
> >
> > Lukas
> >
>
  
Lukas Wunner July 31, 2023, 5:52 p.m. UTC | #4
On Mon, Jul 31, 2023 at 11:30:29AM -0400, Alistair Francis wrote:
> Yep, a big benefit is lspci or other userspace tools to be able to see
> what DOE protocols are supported.

Fair enough, but please make that motivation explicit in the commit
message.


> I also have plans to expose DOE mailboxes to userspace. That way we
> can run the SPDM requester code (using libspdm) in userspace to
> communicate with devices using SPDM. That will allow device
> authentication for example.

That duplicates our effort to bring up authentication in the kernel:
https://github.com/l1k/linux/commits/doe

We had a lengthy debate on the pros and cons of doing SPDM in-kernel
versus in user space.  We realized that when resuming a device from
D3cold or recovering it after reset, the user space component performing
SPDM authentication might reside on the device (disk, network) being
resumed or reset, preventing its execution.

When resuming from system sleep, devices need to be reauthenticated
during the ->resume_noirq phase, i.e. with device interrupts disabled,
as drivers are allowed access to devices already in that phase.
So authentication (and encryption establishment) needs to happen this
early, when user space isn't available yet.

See the commit message of "PCI/CMA: Reauthenticate devices on reset
and resume" on the above-linked branch for details.

Thanks,

Lukas
  
Lukas Wunner July 31, 2023, 5:59 p.m. UTC | #5
On Thu, Jul 27, 2023 at 09:38:57AM +0100, Jonathan Cameron wrote:
> On Tue, 25 Jul 2023 18:30:46 +0200 Lukas Wunner <lukas@wunner.de> wrote:
> > Do have plans for
> > either of that or what's the motivation to use sysfs?
[...]
> I can see it would also be useful for things that will poke from
> userspace because they aren't expected to run in production (and hence
> hopefully don't care about potential races etc). CXL compliance
> comes to mind - I don't think we ever want to carry kernel code for that.

FWIW, USB4/Thunderbolt does have kernel code for DMA testing with a
special dongle (drivers/thunderbolt/dma_test.c) as well as code for
lane margining, so there *is* some precedent for that... :)

  config USB4_DEBUGFS_MARGINING
	bool "Expose receiver lane margining operations under USB4 ports (DANGEROUS)"
	depends on DEBUG_FS
	depends on USB4_DEBUGFS_WRITE
	help
	  Enables hardware and software based receiver lane margining support
	  under each USB4 port. Used for electrical quality and robustness
	  validation during manufacturing. Should not be enabled by distro
	  kernels.

Thanks,

Lukas
  
Alistair Francis Aug. 1, 2023, 2:03 p.m. UTC | #6
On Mon, Jul 31, 2023 at 1:52 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Mon, Jul 31, 2023 at 11:30:29AM -0400, Alistair Francis wrote:
> > Yep, a big benefit is lspci or other userspace tools to be able to see
> > what DOE protocols are supported.
>
> Fair enough, but please make that motivation explicit in the commit
> message.

Fixed in v2

>
>
> > I also have plans to expose DOE mailboxes to userspace. That way we
> > can run the SPDM requester code (using libspdm) in userspace to
> > communicate with devices using SPDM. That will allow device
> > authentication for example.
>
> That duplicates our effort to bring up authentication in the kernel:
> https://github.com/l1k/linux/commits/doe

That's great!

>
> We had a lengthy debate on the pros and cons of doing SPDM in-kernel
> versus in user space.  We realized that when resuming a device from
> D3cold or recovering it after reset, the user space component performing
> SPDM authentication might reside on the device (disk, network) being
> resumed or reset, preventing its execution.
>
> When resuming from system sleep, devices need to be reauthenticated
> during the ->resume_noirq phase, i.e. with device interrupts disabled,
> as drivers are allowed access to devices already in that phase.
> So authentication (and encryption establishment) needs to happen this
> early, when user space isn't available yet.

Yeah, I had the same debate. Resume/suspend and reset are good points though.

Do you plan on supporting all SPDM commands in the kernel and then
passing information to userspace as required?

Alistair

>
> See the commit message of "PCI/CMA: Reauthenticate devices on reset
> and resume" on the above-linked branch for details.
>
> Thanks,
>
> Lukas
  

Patch

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index 1b97a5ab71a9..cc1c23c78ac1 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -563,6 +563,34 @@  static bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type)
 	return false;
 }
 
+/**
+ * pci_doe_sysfs_proto_supports() - Write the supported DOE protocols
+ *			     to a sysfs buffer
+ * @doe_mb: DOE mailbox capability to query
+ * @buf: buffer to store the sysfs strings
+ * @offset: offset in buffer to store the sysfs strings
+ *
+ * RETURNS: The number of bytes written, 0 means an error occured
+ */
+unsigned long pci_doe_sysfs_proto_supports(struct pci_doe_mb *doe_mb,
+					   char *buf, ssize_t offset)
+{
+	unsigned long index;
+	ssize_t ret = offset, r;
+	void *entry;
+
+	xa_for_each(&doe_mb->prots, index, entry) {
+		r = sysfs_emit_at(buf, ret, "0x%08lX\n", xa_to_value(entry));
+
+		if (r == 0)
+			return 0;
+
+		ret += r;
+	}
+
+	return ret;
+}
+
 /**
  * pci_doe_submit_task() - Submit a task to be processed by the state machine
  *
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index ab32a91f287b..df93051e65bf 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -16,6 +16,7 @@ 
 #include <linux/kernel.h>
 #include <linux/sched.h>
 #include <linux/pci.h>
+#include <linux/pci-doe.h>
 #include <linux/stat.h>
 #include <linux/export.h>
 #include <linux/topology.h>
@@ -290,6 +291,29 @@  static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RO(modalias);
 
+#ifdef CONFIG_PCI_DOE
+static ssize_t doe_proto_show(struct device *dev, struct device_attribute *attr,
+			     char *buf)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	unsigned long index;
+	ssize_t ret = 0, r;
+	struct pci_doe_mb *doe_mb;
+
+	xa_for_each(&pci_dev->doe_mbs, index, doe_mb) {
+		r = pci_doe_sysfs_proto_supports(doe_mb, buf, ret);
+
+		if (r == 0)
+			return 0;
+
+		ret += r;
+	}
+
+	return ret;
+}
+static DEVICE_ATTR_RO(doe_proto);
+#endif
+
 static ssize_t enable_store(struct device *dev, struct device_attribute *attr,
 			     const char *buf, size_t count)
 {
@@ -603,6 +627,9 @@  static struct attribute *pci_dev_attrs[] = {
 	&dev_attr_local_cpus.attr,
 	&dev_attr_local_cpulist.attr,
 	&dev_attr_modalias.attr,
+#ifdef CONFIG_PCI_DOE
+	&dev_attr_doe_proto.attr,
+#endif
 #ifdef CONFIG_NUMA
 	&dev_attr_numa_node.attr,
 #endif
diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
index 1f14aed4354b..066494a4dba3 100644
--- a/include/linux/pci-doe.h
+++ b/include/linux/pci-doe.h
@@ -21,5 +21,7 @@  struct pci_doe_mb *pci_find_doe_mailbox(struct pci_dev *pdev, u16 vendor,
 int pci_doe(struct pci_doe_mb *doe_mb, u16 vendor, u8 type,
 	    const void *request, size_t request_sz,
 	    void *response, size_t response_sz);
+unsigned long pci_doe_sysfs_proto_supports(struct pci_doe_mb *doe_mb,
+					   char *buf, ssize_t offset);
 
 #endif