[V10,2/5] PCI: Create device tree node for bridge

Message ID 1688059190-4225-3-git-send-email-lizhi.hou@amd.com
State New
Headers
Series Generate device tree node for pci devices |

Commit Message

Lizhi Hou June 29, 2023, 5:19 p.m. UTC
  The PCI endpoint device such as Xilinx Alveo PCI card maps the register
spaces from multiple hardware peripherals to its PCI BAR. Normally,
the PCI core discovers devices and BARs using the PCI enumeration process.
There is no infrastructure to discover the hardware peripherals that are
present in a PCI device, and which can be accessed through the PCI BARs.

Apparently, the device tree framework requires a device tree node for the
PCI device. Thus, it can generate the device tree nodes for hardware
peripherals underneath. Because PCI is self discoverable bus, there might
not be a device tree node created for PCI devices. Furthermore, if the PCI
device is hot pluggable, when it is plugged in, the device tree nodes for
its parent bridges are required. Add support to generate device tree node
for PCI bridges.

Add an of_pci_make_dev_node() interface that can be used to create device
tree node for PCI devices.

Add a PCI_DYNAMIC_OF_NODES config option. When the option is turned on,
the kernel will generate device tree nodes for PCI bridges unconditionally.

Initially, add the basic properties for the dynamically generated device
tree nodes which include #address-cells, #size-cells, device_type,
compatible, ranges, reg.

Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
---
 drivers/pci/Kconfig       |  12 ++
 drivers/pci/Makefile      |   1 +
 drivers/pci/bus.c         |   2 +
 drivers/pci/of.c          |  88 ++++++++++++++
 drivers/pci/of_property.c | 235 ++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h         |  15 +++
 drivers/pci/remove.c      |   1 +
 7 files changed, 354 insertions(+)
 create mode 100644 drivers/pci/of_property.c
  

Comments

Bjorn Helgaas June 29, 2023, 10:56 p.m. UTC | #1
On Thu, Jun 29, 2023 at 10:19:47AM -0700, Lizhi Hou wrote:
> The PCI endpoint device such as Xilinx Alveo PCI card maps the register
> spaces from multiple hardware peripherals to its PCI BAR. Normally,
> the PCI core discovers devices and BARs using the PCI enumeration process.
> There is no infrastructure to discover the hardware peripherals that are
> present in a PCI device, and which can be accessed through the PCI BARs.

IIUC this is basically a multi-function device except that instead of
each device being a separate PCI Function, they all appear in a single
Function.  That would mean all the devices share the same config space
so a single PCI Command register controls all of them, they all share
the same IRQs (either INTx or MSI/MSI-X), any MMIO registers are likely
in a shared BAR, etc., right?

Obviously PCI enumeration only sees the single Function and binds a
single driver to it.  But IIUC, you want to use existing drivers for
each of these sub-devices, so this series adds a DT node for the
single Function (using the quirks that call of_pci_make_dev_node()).
And I assume that when the PCI driver claims the single Function, it
will use that DT node to add platform devices, and those existing
drivers can claim those?

I don't see the PCI driver for the single Function in this series.  Is
that coming?  Is this series useful without it?

> Apparently, the device tree framework requires a device tree node for the
> PCI device. Thus, it can generate the device tree nodes for hardware
> peripherals underneath. Because PCI is self discoverable bus, there might
> not be a device tree node created for PCI devices. Furthermore, if the PCI
> device is hot pluggable, when it is plugged in, the device tree nodes for
> its parent bridges are required. Add support to generate device tree node
> for PCI bridges.

Can you remind me why hot-adding a PCI device requires DT nodes for
parent bridges?  I don't think we have those today, so maybe the DT
node for the PCI device requires a DT parent?  How far up does that
go?  From this patch, I guess a Root Port would be the top DT node on
a PCIe system, since that's the top PCI-to-PCI bridge?

This patch adds a DT node for *every* PCI bridge in the system.  We
only actually need that node for these unusual devices.  Is there some
way the driver for the single PCI Function could add that node when it
is needed?  Sorry if you've answered this in the past; maybe the
answer could be in the commit log or a code comment in case somebody
else wonders.

> @@ -340,6 +340,8 @@ void pci_bus_add_device(struct pci_dev *dev)
>  	 */
>  	pcibios_bus_add_device(dev);
>  	pci_fixup_device(pci_fixup_final, dev);
> +	if (pci_is_bridge(dev))
> +		of_pci_make_dev_node(dev);

It'd be nice to have a clue here about why we need this, since this is
executed for *every* system, even ACPI platforms that typically don't
use OF things.

>  	pci_create_sysfs_dev_files(dev);
>  	pci_proc_attach_device(dev);
>  	pci_bridge_d3_update(dev);
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index 2c25f4fa0225..9786ae407948 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -487,6 +487,15 @@ static int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *
>  		} else {
>  			/* We found a P2P bridge, check if it has a node */
>  			ppnode = pci_device_to_OF_node(ppdev);
> +#if IS_ENABLED(CONFIG_PCI_DYNAMIC_OF_NODES)

I would use plain #ifdef here instead of IS_ENABLED(), as you did in
pci.h below.  IS_ENABLED() is true if the Kconfig symbol is set to
either "y" or "m".

Using IS_ENABLED() suggests that the config option *could* be a
module, which is not the case here because CONFIG_PCI_DYNAMIC_OF_NODES
is a bool.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/kconfig.h?id=v6.4#n69

> @@ -617,6 +626,85 @@ int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
>  	return pci_parse_request_of_pci_ranges(dev, bridge);
>  }
>  
> +#if IS_ENABLED(CONFIG_PCI_DYNAMIC_OF_NODES)

Same here, of course.

> +void of_pci_remove_node(struct pci_dev *pdev)
> +{
> +	struct device_node *np;
> +
> +	np = pci_device_to_OF_node(pdev);
> +	if (!np || !of_node_check_flag(np, OF_DYNAMIC))

> + * Each entry in the ranges table is a tuple containing the child address,
> + * the parent address, and the size of the region in the child address space.
> + * Thus, for PCI, in each entry parent address is an address on the primary
> + * side and the child address is the corresponding address on the secondary
> + * side.
> + */
> +struct of_pci_range {
> +	u32		child_addr[OF_PCI_ADDRESS_CELLS];
> +	u32		parent_addr[OF_PCI_ADDRESS_CELLS];
> +	u32		size[OF_PCI_SIZE_CELLS];

> +		if (pci_is_bridge(pdev)) {
> +			memcpy(rp[i].child_addr, rp[i].parent_addr,
> +			       sizeof(rp[i].child_addr));
> +		} else {
> +			/*
> +			 * For endpoint device, the lower 64-bits of child
> +			 * address is always zero.

I think this connects with the secondary side comment above, right?  I
think I would comment this as:

  /*
   * PCI-PCI bridges don't translate addresses, so the child
   * (secondary side) address is identical to the parent (primary
   * side) address.
   */

and

  /*
   * Non-bridges have no child (secondary side) address, so clear it
   * out.
   */

> +			 */
> +			rp[i].child_addr[0] = j;

> +	ret = of_changeset_add_empty_prop(ocs, np, "dynamic");

It seems slightly confusing to use a "dynamic" property here when we
also have the OF_DYNAMIC dynamic flag above.  I think they have
different meanings, don't they?

Bjorn
  
Rob Herring June 29, 2023, 11:52 p.m. UTC | #2
On Thu, Jun 29, 2023 at 05:56:31PM -0500, Bjorn Helgaas wrote:
> On Thu, Jun 29, 2023 at 10:19:47AM -0700, Lizhi Hou wrote:
> > The PCI endpoint device such as Xilinx Alveo PCI card maps the register
> > spaces from multiple hardware peripherals to its PCI BAR. Normally,
> > the PCI core discovers devices and BARs using the PCI enumeration process.
> > There is no infrastructure to discover the hardware peripherals that are
> > present in a PCI device, and which can be accessed through the PCI BARs.
> 
> IIUC this is basically a multi-function device except that instead of
> each device being a separate PCI Function, they all appear in a single
> Function.  That would mean all the devices share the same config space
> so a single PCI Command register controls all of them, they all share
> the same IRQs (either INTx or MSI/MSI-X), any MMIO registers are likely
> in a shared BAR, etc., right?

Could be multiple BARs, but yes.

> Obviously PCI enumeration only sees the single Function and binds a
> single driver to it.  But IIUC, you want to use existing drivers for
> each of these sub-devices, so this series adds a DT node for the
> single Function (using the quirks that call of_pci_make_dev_node()).
> And I assume that when the PCI driver claims the single Function, it
> will use that DT node to add platform devices, and those existing
> drivers can claim those?

Yes. It will call some variant of of_platform_populate().

> I don't see the PCI driver for the single Function in this series.  Is
> that coming?  Is this series useful without it?

https://lore.kernel.org/all/20220305052304.726050-4-lizhi.hou@xilinx.com/

I asked for things to be split up as the original series did a lot 
of new things at once. This series only works with the QEMU PCI test 
device which the DT unittest will use.

> > Apparently, the device tree framework requires a device tree node for the
> > PCI device. Thus, it can generate the device tree nodes for hardware
> > peripherals underneath. Because PCI is self discoverable bus, there might
> > not be a device tree node created for PCI devices. Furthermore, if the PCI
> > device is hot pluggable, when it is plugged in, the device tree nodes for
> > its parent bridges are required. Add support to generate device tree node
> > for PCI bridges.
> 
> Can you remind me why hot-adding a PCI device requires DT nodes for
> parent bridges?

Because the PCI device needs a DT node and we can't just put PCI devices 
in the DT root. We have to create the bus hierarchy.

> I don't think we have those today, so maybe the DT
> node for the PCI device requires a DT parent?  How far up does that
> go?

All the way.

>  From this patch, I guess a Root Port would be the top DT node on
> a PCIe system, since that's the top PCI-to-PCI bridge?

Yes. Plus above the host bridge could have a hierarchy of nodes.

> This patch adds a DT node for *every* PCI bridge in the system.  We
> only actually need that node for these unusual devices.  Is there some
> way the driver for the single PCI Function could add that node when it
> is needed?  Sorry if you've answered this in the past; maybe the
> answer could be in the commit log or a code comment in case somebody
> else wonders.

This was discussed early on. I don't think it would work to create the 
nodes at the time we discover we have a device that wants a DT node. The 
issue is decisions are made in the code based on whether there's a DT 
node for a PCI device or not. It might work, but I think it's fragile to 
have nodes attached to devices at different points in time.

> 
> > @@ -340,6 +340,8 @@ void pci_bus_add_device(struct pci_dev *dev)
> >  	 */
> >  	pcibios_bus_add_device(dev);
> >  	pci_fixup_device(pci_fixup_final, dev);
> > +	if (pci_is_bridge(dev))
> > +		of_pci_make_dev_node(dev);
> 
> It'd be nice to have a clue here about why we need this, since this is
> executed for *every* system, even ACPI platforms that typically don't
> use OF things.
> 
> >  	pci_create_sysfs_dev_files(dev);
> >  	pci_proc_attach_device(dev);
> >  	pci_bridge_d3_update(dev);
> > diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> > index 2c25f4fa0225..9786ae407948 100644
> > --- a/drivers/pci/of.c
> > +++ b/drivers/pci/of.c
> > @@ -487,6 +487,15 @@ static int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *
> >  		} else {
> >  			/* We found a P2P bridge, check if it has a node */
> >  			ppnode = pci_device_to_OF_node(ppdev);
> > +#if IS_ENABLED(CONFIG_PCI_DYNAMIC_OF_NODES)
> 
> I would use plain #ifdef here instead of IS_ENABLED(), as you did in
> pci.h below.  IS_ENABLED() is true if the Kconfig symbol is set to
> either "y" or "m".
> 
> Using IS_ENABLED() suggests that the config option *could* be a
> module, which is not the case here because CONFIG_PCI_DYNAMIC_OF_NODES
> is a bool.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/kconfig.h?id=v6.4#n69
> 
> > @@ -617,6 +626,85 @@ int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
> >  	return pci_parse_request_of_pci_ranges(dev, bridge);
> >  }
> >  
> > +#if IS_ENABLED(CONFIG_PCI_DYNAMIC_OF_NODES)
> 
> Same here, of course.
> 
> > +void of_pci_remove_node(struct pci_dev *pdev)
> > +{
> > +	struct device_node *np;
> > +
> > +	np = pci_device_to_OF_node(pdev);
> > +	if (!np || !of_node_check_flag(np, OF_DYNAMIC))
> 
> > + * Each entry in the ranges table is a tuple containing the child address,
> > + * the parent address, and the size of the region in the child address space.
> > + * Thus, for PCI, in each entry parent address is an address on the primary
> > + * side and the child address is the corresponding address on the secondary
> > + * side.
> > + */
> > +struct of_pci_range {
> > +	u32		child_addr[OF_PCI_ADDRESS_CELLS];
> > +	u32		parent_addr[OF_PCI_ADDRESS_CELLS];
> > +	u32		size[OF_PCI_SIZE_CELLS];
> 
> > +		if (pci_is_bridge(pdev)) {
> > +			memcpy(rp[i].child_addr, rp[i].parent_addr,
> > +			       sizeof(rp[i].child_addr));
> > +		} else {
> > +			/*
> > +			 * For endpoint device, the lower 64-bits of child
> > +			 * address is always zero.
> 
> I think this connects with the secondary side comment above, right?  I
> think I would comment this as:
> 
>   /*
>    * PCI-PCI bridges don't translate addresses, so the child
>    * (secondary side) address is identical to the parent (primary
>    * side) address.
>    */
> 
> and
> 
>   /*
>    * Non-bridges have no child (secondary side) address, so clear it
>    * out.
>    */
> 
> > +			 */
> > +			rp[i].child_addr[0] = j;
> 
> > +	ret = of_changeset_add_empty_prop(ocs, np, "dynamic");
> 
> It seems slightly confusing to use a "dynamic" property here when we
> also have the OF_DYNAMIC dynamic flag above.  I think they have
> different meanings, don't they?

Hum, what's the property for? It's new in this version. Any DT property 
needs to be documented, but I don't see why we need it.

Rob
  
Rob Herring June 29, 2023, 11:55 p.m. UTC | #3
On Thu, Jun 29, 2023 at 05:56:31PM -0500, Bjorn Helgaas wrote:
> On Thu, Jun 29, 2023 at 10:19:47AM -0700, Lizhi Hou wrote:
> > The PCI endpoint device such as Xilinx Alveo PCI card maps the register
> > spaces from multiple hardware peripherals to its PCI BAR. Normally,
> > the PCI core discovers devices and BARs using the PCI enumeration process.
> > There is no infrastructure to discover the hardware peripherals that are
> > present in a PCI device, and which can be accessed through the PCI BARs.
> 
> IIUC this is basically a multi-function device except that instead of
> each device being a separate PCI Function, they all appear in a single
> Function.  That would mean all the devices share the same config space
> so a single PCI Command register controls all of them, they all share
> the same IRQs (either INTx or MSI/MSI-X), any MMIO registers are likely
> in a shared BAR, etc., right?
> 
> Obviously PCI enumeration only sees the single Function and binds a
> single driver to it.  But IIUC, you want to use existing drivers for
> each of these sub-devices, so this series adds a DT node for the
> single Function (using the quirks that call of_pci_make_dev_node()).
> And I assume that when the PCI driver claims the single Function, it
> will use that DT node to add platform devices, and those existing
> drivers can claim those?
> 
> I don't see the PCI driver for the single Function in this series.  Is
> that coming?  Is this series useful without it?
> 
> > Apparently, the device tree framework requires a device tree node for the
> > PCI device. Thus, it can generate the device tree nodes for hardware
> > peripherals underneath. Because PCI is self discoverable bus, there might
> > not be a device tree node created for PCI devices. Furthermore, if the PCI
> > device is hot pluggable, when it is plugged in, the device tree nodes for
> > its parent bridges are required. Add support to generate device tree node
> > for PCI bridges.
> 
> Can you remind me why hot-adding a PCI device requires DT nodes for
> parent bridges?  I don't think we have those today, so maybe the DT
> node for the PCI device requires a DT parent?  How far up does that
> go?  From this patch, I guess a Root Port would be the top DT node on
> a PCIe system, since that's the top PCI-to-PCI bridge?
> 
> This patch adds a DT node for *every* PCI bridge in the system.  We
> only actually need that node for these unusual devices.  Is there some
> way the driver for the single PCI Function could add that node when it
> is needed?  Sorry if you've answered this in the past; maybe the
> answer could be in the commit log or a code comment in case somebody
> else wonders.
> 
> > @@ -340,6 +340,8 @@ void pci_bus_add_device(struct pci_dev *dev)
> >  	 */
> >  	pcibios_bus_add_device(dev);
> >  	pci_fixup_device(pci_fixup_final, dev);
> > +	if (pci_is_bridge(dev))
> > +		of_pci_make_dev_node(dev);
> 
> It'd be nice to have a clue here about why we need this, since this is
> executed for *every* system, even ACPI platforms that typically don't
> use OF things.
> 
> >  	pci_create_sysfs_dev_files(dev);
> >  	pci_proc_attach_device(dev);
> >  	pci_bridge_d3_update(dev);
> > diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> > index 2c25f4fa0225..9786ae407948 100644
> > --- a/drivers/pci/of.c
> > +++ b/drivers/pci/of.c
> > @@ -487,6 +487,15 @@ static int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *
> >  		} else {
> >  			/* We found a P2P bridge, check if it has a node */
> >  			ppnode = pci_device_to_OF_node(ppdev);
> > +#if IS_ENABLED(CONFIG_PCI_DYNAMIC_OF_NODES)
> 
> I would use plain #ifdef here instead of IS_ENABLED(), as you did in
> pci.h below.  IS_ENABLED() is true if the Kconfig symbol is set to
> either "y" or "m".

Actually, IS_ENABLED() with a C 'if' rather than a preprocessor #if 
would work here and is preferred.

But this code and the "dynamic" property needs more discussion.

Rob
  
Bjorn Helgaas June 30, 2023, 2:42 p.m. UTC | #4
On Thu, Jun 29, 2023 at 05:55:51PM -0600, Rob Herring wrote:
> On Thu, Jun 29, 2023 at 05:56:31PM -0500, Bjorn Helgaas wrote:
> > On Thu, Jun 29, 2023 at 10:19:47AM -0700, Lizhi Hou wrote:
> > > The PCI endpoint device such as Xilinx Alveo PCI card maps the register
> > > spaces from multiple hardware peripherals to its PCI BAR. Normally,
> > > the PCI core discovers devices and BARs using the PCI enumeration process.
> > > There is no infrastructure to discover the hardware peripherals that are
> > > present in a PCI device, and which can be accessed through the PCI BARs.

> > > --- a/drivers/pci/of.c
> > > +++ b/drivers/pci/of.c
> > > @@ -487,6 +487,15 @@ static int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *
> > >  		} else {
> > >  			/* We found a P2P bridge, check if it has a node */
> > >  			ppnode = pci_device_to_OF_node(ppdev);
> > > +#if IS_ENABLED(CONFIG_PCI_DYNAMIC_OF_NODES)
> > 
> > I would use plain #ifdef here instead of IS_ENABLED(), as you did in
> > pci.h below.  IS_ENABLED() is true if the Kconfig symbol is set to
> > either "y" or "m".
> 
> Actually, IS_ENABLED() with a C 'if' rather than a preprocessor #if 
> would work here and is preferred.

Makes sense; I see the justification at [1].  I do wish it didn't have
to be different between this usage and the "#ifdef
CONFIG_PCI_DYNAMIC_OF_NODES" in pci.h for the stubs.  But this is OK
by me.

Bjorn

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=v6.4#n1162
  
Bjorn Helgaas June 30, 2023, 4:48 p.m. UTC | #5
On Thu, Jun 29, 2023 at 05:52:26PM -0600, Rob Herring wrote:
> On Thu, Jun 29, 2023 at 05:56:31PM -0500, Bjorn Helgaas wrote:
> > On Thu, Jun 29, 2023 at 10:19:47AM -0700, Lizhi Hou wrote:
> > > The PCI endpoint device such as Xilinx Alveo PCI card maps the register
> > > spaces from multiple hardware peripherals to its PCI BAR. Normally,
> > > the PCI core discovers devices and BARs using the PCI enumeration process.
> > > There is no infrastructure to discover the hardware peripherals that are
> > > present in a PCI device, and which can be accessed through the PCI BARs.
> > 
> > IIUC this is basically a multi-function device except that instead of
> > each device being a separate PCI Function, they all appear in a single
> > Function.  That would mean all the devices share the same config space
> > so a single PCI Command register controls all of them, they all share
> > the same IRQs (either INTx or MSI/MSI-X), any MMIO registers are likely
> > in a shared BAR, etc., right?
> 
> Could be multiple BARs, but yes.

Where does the PCI glue live?  E.g., who ioremaps the BARs?  Who sets
up PCI interrupts?  Who enables bus mastering?  The platform driver
that claims the DT node wouldn't know that this is part of a PCI
device, so I guess the PCI driver must do all that stuff?  I don't see
it in the xmgmt-drv.c from
https://lore.kernel.org/all/20220305052304.726050-4-lizhi.hou@xilinx.com/

> > Obviously PCI enumeration only sees the single Function and binds a
> > single driver to it.  But IIUC, you want to use existing drivers for
> > each of these sub-devices, so this series adds a DT node for the
> > single Function (using the quirks that call of_pci_make_dev_node()).
> > And I assume that when the PCI driver claims the single Function, it
> > will use that DT node to add platform devices, and those existing
> > drivers can claim those?
> 
> Yes. It will call some variant of of_platform_populate().
> 
> > I don't see the PCI driver for the single Function in this series.  Is
> > that coming?  Is this series useful without it?
> 
> https://lore.kernel.org/all/20220305052304.726050-4-lizhi.hou@xilinx.com/
> 
> I asked for things to be split up as the original series did a lot 
> of new things at once. This series only works with the QEMU PCI test 
> device which the DT unittest will use.
> 
> > > Apparently, the device tree framework requires a device tree node for the
> > > PCI device. Thus, it can generate the device tree nodes for hardware
> > > peripherals underneath. Because PCI is self discoverable bus, there might
> > > not be a device tree node created for PCI devices. Furthermore, if the PCI
> > > device is hot pluggable, when it is plugged in, the device tree nodes for
> > > its parent bridges are required. Add support to generate device tree node
> > > for PCI bridges.
> > 
> > Can you remind me why hot-adding a PCI device requires DT nodes for
> > parent bridges?
> 
> Because the PCI device needs a DT node and we can't just put PCI devices 
> in the DT root. We have to create the bus hierarchy.
> 
> > I don't think we have those today, so maybe the DT
> > node for the PCI device requires a DT parent?  How far up does that
> > go?
> 
> All the way.
> 
> >  From this patch, I guess a Root Port would be the top DT node on
> > a PCIe system, since that's the top PCI-to-PCI bridge?
> 
> Yes. Plus above the host bridge could have a hierarchy of nodes.

I'm missing something if it goes "all the way up," i.e., to a single
system root, but a Root Port is the top DT node.  If a Root Port is
the top, there would be several roots.

> > This patch adds a DT node for *every* PCI bridge in the system.  We
> > only actually need that node for these unusual devices.  Is there some
> > way the driver for the single PCI Function could add that node when it
> > is needed?  Sorry if you've answered this in the past; maybe the
> > answer could be in the commit log or a code comment in case somebody
> > else wonders.
> 
> This was discussed early on. I don't think it would work to create the 
> nodes at the time we discover we have a device that wants a DT node. The 
> issue is decisions are made in the code based on whether there's a DT 
> node for a PCI device or not. It might work, but I think it's fragile to 
> have nodes attached to devices at different points in time.

Ah.  So I guess the problem is we enumerate a PCI bridge, we might do
something based on the fact that it doesn't have a DT node, then add a
DT node for it later.

Bjorn
  
Lizhi Hou June 30, 2023, 6:24 p.m. UTC | #6
On 6/29/23 16:52, Rob Herring wrote:
>>> +			rp[i].child_addr[0] = j;
>>> +	ret = of_changeset_add_empty_prop(ocs, np, "dynamic");
>> It seems slightly confusing to use a "dynamic" property here when we
>> also have the OF_DYNAMIC dynamic flag above.  I think they have
>> different meanings, don't they?
> Hum, what's the property for? It's new in this version. Any DT property
> needs to be documented, but I don't see why we need it.

This is mentioned in my previous reply for V9

https://lore.kernel.org/lkml/af9b6bb3-a98d-4fb6-b51e-b48bca61dada@amd.com/

As we discussed before, "interrupt-map" was intended to be used here.

And after thinking it more, it may not work for the cases where ppnode

is not dynamically generated and it does not have "interrupt-map".

For example the IBM ppc system, its device tree has nodes for pci bridge

and it does not have "interrupt-map".

Based on previous discussions, OF_DYNAMIC should not be used here.

So I think adding "dynamic" might be a way to identify the dynamically

added node. Or we can introduce a new flag e.g OF_IRQ_SWIZZLING.


Thanks,

Lizhi
  
Lizhi Hou June 30, 2023, 7:59 p.m. UTC | #7
On 6/30/23 09:48, Bjorn Helgaas wrote:
> On Thu, Jun 29, 2023 at 05:52:26PM -0600, Rob Herring wrote:
>> On Thu, Jun 29, 2023 at 05:56:31PM -0500, Bjorn Helgaas wrote:
>>> On Thu, Jun 29, 2023 at 10:19:47AM -0700, Lizhi Hou wrote:
>>>> The PCI endpoint device such as Xilinx Alveo PCI card maps the register
>>>> spaces from multiple hardware peripherals to its PCI BAR. Normally,
>>>> the PCI core discovers devices and BARs using the PCI enumeration process.
>>>> There is no infrastructure to discover the hardware peripherals that are
>>>> present in a PCI device, and which can be accessed through the PCI BARs.
>>> IIUC this is basically a multi-function device except that instead of
>>> each device being a separate PCI Function, they all appear in a single
>>> Function.  That would mean all the devices share the same config space
>>> so a single PCI Command register controls all of them, they all share
>>> the same IRQs (either INTx or MSI/MSI-X), any MMIO registers are likely
>>> in a shared BAR, etc., right?
>> Could be multiple BARs, but yes.
> Where does the PCI glue live?  E.g., who ioremaps the BARs?  Who sets
> up PCI interrupts?  Who enables bus mastering?  The platform driver
> that claims the DT node wouldn't know that this is part of a PCI
> device, so I guess the PCI driver must do all that stuff?  I don't see
> it in the xmgmt-drv.c from
> https://lore.kernel.org/all/20220305052304.726050-4-lizhi.hou@xilinx.com/
>
Yes, the PCI driver will do all that stuff. This xmgmt-drv.c is created

to just populating the devices based on fdt input.  It is removed after

the unittest is created which can populate devices and verify the

address translation.


Thanks,

Lizhi
  
Lizhi Hou July 18, 2023, 3:50 p.m. UTC | #8
Hi Rob,

Do you have any comment on this?

Thanks,

Lizhi

On 6/30/23 11:24, Lizhi Hou wrote:
>
> On 6/29/23 16:52, Rob Herring wrote:
>>>> +            rp[i].child_addr[0] = j;
>>>> +    ret = of_changeset_add_empty_prop(ocs, np, "dynamic");
>>> It seems slightly confusing to use a "dynamic" property here when we
>>> also have the OF_DYNAMIC dynamic flag above.  I think they have
>>> different meanings, don't they?
>> Hum, what's the property for? It's new in this version. Any DT property
>> needs to be documented, but I don't see why we need it.
>
> This is mentioned in my previous reply for V9
>
> https://lore.kernel.org/lkml/af9b6bb3-a98d-4fb6-b51e-b48bca61dada@amd.com/ 
>
>
> As we discussed before, "interrupt-map" was intended to be used here.
>
> And after thinking it more, it may not work for the cases where ppnode
>
> is not dynamically generated and it does not have "interrupt-map".
>
> For example the IBM ppc system, its device tree has nodes for pci bridge
>
> and it does not have "interrupt-map".
>
> Based on previous discussions, OF_DYNAMIC should not be used here.
>
> So I think adding "dynamic" might be a way to identify the dynamically
>
> added node. Or we can introduce a new flag e.g OF_IRQ_SWIZZLING.
>
>
> Thanks,
>
> Lizhi
>
  
Rob Herring July 18, 2023, 6:15 p.m. UTC | #9
On Fri, Jun 30, 2023 at 12:25 PM Lizhi Hou <lizhi.hou@amd.com> wrote:
>
>
> On 6/29/23 16:52, Rob Herring wrote:
> >>> +                   rp[i].child_addr[0] = j;
> >>> +   ret = of_changeset_add_empty_prop(ocs, np, "dynamic");
> >> It seems slightly confusing to use a "dynamic" property here when we
> >> also have the OF_DYNAMIC dynamic flag above.  I think they have
> >> different meanings, don't they?
> > Hum, what's the property for? It's new in this version. Any DT property
> > needs to be documented, but I don't see why we need it.
>
> This is mentioned in my previous reply for V9
>
> https://lore.kernel.org/lkml/af9b6bb3-a98d-4fb6-b51e-b48bca61dada@amd.com/
>
> As we discussed before, "interrupt-map" was intended to be used here.
>
> And after thinking it more, it may not work for the cases where ppnode
>
> is not dynamically generated and it does not have "interrupt-map".
>
> For example the IBM ppc system, its device tree has nodes for pci bridge
>
> and it does not have "interrupt-map".

How do you know? I ask because usually the only way I have visibility
there is when I break something. In traditional OpenFirmware, which
IBM PPC is, all PCI devices have a DT node because it's the firmware
telling the OS "these are the devices I discovered and this is how I
configured them".

> Based on previous discussions, OF_DYNAMIC should not be used here.

For the same reasons, I don't think the behavior should change based
on being dynamic. Now maybe the behavior when it's an ACPI system with
DT overlays has to change, but that's a problem for later. I don't yet
know if we'd handle that here somehow or elsewhere so that this node
looks like a normal DT system.

This should all work the same whether we've generated the nodes or
they were already present in the FDT when we booted.

> So I think adding "dynamic" might be a way to identify the dynamically
>
> added node. Or we can introduce a new flag e.g OF_IRQ_SWIZZLING.

I hope not. The flags tend to be hacks.

Rob
  
Lizhi Hou July 24, 2023, 6:18 p.m. UTC | #10
On 7/18/23 11:15, Rob Herring wrote:
> On Fri, Jun 30, 2023 at 12:25 PM Lizhi Hou <lizhi.hou@amd.com> wrote:
>>
>> On 6/29/23 16:52, Rob Herring wrote:
>>>>> +                   rp[i].child_addr[0] = j;
>>>>> +   ret = of_changeset_add_empty_prop(ocs, np, "dynamic");
>>>> It seems slightly confusing to use a "dynamic" property here when we
>>>> also have the OF_DYNAMIC dynamic flag above.  I think they have
>>>> different meanings, don't they?
>>> Hum, what's the property for? It's new in this version. Any DT property
>>> needs to be documented, but I don't see why we need it.
>> This is mentioned in my previous reply for V9
>>
>> https://lore.kernel.org/lkml/af9b6bb3-a98d-4fb6-b51e-b48bca61dada@amd.com/
>>
>> As we discussed before, "interrupt-map" was intended to be used here.
>>
>> And after thinking it more, it may not work for the cases where ppnode
>>
>> is not dynamically generated and it does not have "interrupt-map".
>>
>> For example the IBM ppc system, its device tree has nodes for pci bridge
>>
>> and it does not have "interrupt-map".
> How do you know? I ask because usually the only way I have visibility
> there is when I break something. In traditional OpenFirmware, which
> IBM PPC is, all PCI devices have a DT node because it's the firmware
> telling the OS "these are the devices I discovered and this is how I
> configured them".

I configured a ppc VM and added a bridge to the VM

qemu-system-ppc -L pc-bios -boot c -prom-env "boot-device=hd:,\yaboot" 
-prom-env "boot-args=conf=hd:,\yaboot.conf" -M mac99 -m 1024 -hda 
debian10.qcow2 -nographic -device pci-bridge,chassis_nr=1,id=pci.9

# ls /proc/device-tree/pci\@f2000000/pci1b36\,1\@f/
66mhz-capable        class-code      fast-back-to-back min-grant      
vendor-id
assigned-addresses  device-id      interrupts         name
bus-range        devsel-speed  linux,phandle      reg
cache-line-size     ethernet@1      max-latency revision-id

The bridge node does not have 'interrupt-map'. That is why I concerned 
for using 'interrupt-map'.


To further debugging on if it really breaks anything, I added a nic 
device under bridge. Even without my patch, it is failed anyway.

      [    0.086586] pci 0000:01:01.0: of_irq_parse_pci: failed with rc=-22

So I setup another power10 VM and see the 'interrupt-map' is created for 
pci bridge. And the nic device under bridge works fine.


Maybe using 'interrupt-map' will not break anything in the real world.  
I will re-create a patchset which uses 'interrupt-map' (like V9) and 
checks it only when CONFIG_PCI_DYNAMIC_OF_NODES is turned on.


Thanks,

Lizhi

>
>> Based on previous discussions, OF_DYNAMIC should not be used here.
> For the same reasons, I don't think the behavior should change based
> on being dynamic. Now maybe the behavior when it's an ACPI system with
> DT overlays has to change, but that's a problem for later. I don't yet
> know if we'd handle that here somehow or elsewhere so that this node
> looks like a normal DT system.
>
> This should all work the same whether we've generated the nodes or
> they were already present in the FDT when we booted.
>
>> So I think adding "dynamic" might be a way to identify the dynamically
>>
>> added node. Or we can introduce a new flag e.g OF_IRQ_SWIZZLING.
> I hope not. The flags tend to be hacks.
>
> Rob
  

Patch

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 9309f2469b41..7264a5cee6bf 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -193,6 +193,18 @@  config PCI_HYPERV
 	  The PCI device frontend driver allows the kernel to import arbitrary
 	  PCI devices from a PCI backend to support PCI driver domains.
 
+config PCI_DYNAMIC_OF_NODES
+	bool "Create device tree nodes for PCI devices"
+	depends on OF
+	select OF_DYNAMIC
+	help
+	  This option enables support for generating device tree nodes for some
+	  PCI devices. Thus, the driver of this kind can load and overlay
+	  flattened device tree for its downstream devices.
+
+	  Once this option is selected, the device tree nodes will be generated
+	  for all PCI bridges.
+
 choice
 	prompt "PCI Express hierarchy optimization setting"
 	default PCIE_BUS_DEFAULT
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 2680e4c92f0a..cc8b4e01e29d 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -32,6 +32,7 @@  obj-$(CONFIG_PCI_P2PDMA)	+= p2pdma.o
 obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
 obj-$(CONFIG_VGA_ARB)		+= vgaarb.o
 obj-$(CONFIG_PCI_DOE)		+= doe.o
+obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o
 
 # Endpoint library must be initialized before its users
 obj-$(CONFIG_PCI_ENDPOINT)	+= endpoint/
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 5bc81cc0a2de..ab7d06cd0099 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -340,6 +340,8 @@  void pci_bus_add_device(struct pci_dev *dev)
 	 */
 	pcibios_bus_add_device(dev);
 	pci_fixup_device(pci_fixup_final, dev);
+	if (pci_is_bridge(dev))
+		of_pci_make_dev_node(dev);
 	pci_create_sysfs_dev_files(dev);
 	pci_proc_attach_device(dev);
 	pci_bridge_d3_update(dev);
diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index 2c25f4fa0225..9786ae407948 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -487,6 +487,15 @@  static int of_irq_parse_pci(const struct pci_dev *pdev, struct of_phandle_args *
 		} else {
 			/* We found a P2P bridge, check if it has a node */
 			ppnode = pci_device_to_OF_node(ppdev);
+#if IS_ENABLED(CONFIG_PCI_DYNAMIC_OF_NODES)
+			/*
+			 * Interrupt mapping is not supported for dynamic
+			 * generated bridge node. Thus, set ppnode to NULL
+			 * to do standard swizzling.
+			 */
+			if (of_property_present(ppnode, "dynamic"))
+				ppnode = NULL;
+#endif
 		}
 
 		/*
@@ -617,6 +626,85 @@  int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
 	return pci_parse_request_of_pci_ranges(dev, bridge);
 }
 
+#if IS_ENABLED(CONFIG_PCI_DYNAMIC_OF_NODES)
+
+void of_pci_remove_node(struct pci_dev *pdev)
+{
+	struct device_node *np;
+
+	np = pci_device_to_OF_node(pdev);
+	if (!np || !of_node_check_flag(np, OF_DYNAMIC))
+		return;
+	pdev->dev.of_node = NULL;
+
+	of_changeset_revert(np->data);
+	of_changeset_destroy(np->data);
+	of_node_put(np);
+}
+
+void of_pci_make_dev_node(struct pci_dev *pdev)
+{
+	struct device_node *ppnode, *np = NULL;
+	const char *pci_type;
+	struct of_changeset *cset;
+	const char *name;
+	int ret;
+
+	/*
+	 * If there is already a device tree node linked to this device,
+	 * return immediately.
+	 */
+	if (pci_device_to_OF_node(pdev))
+		return;
+
+	/* Check if there is device tree node for parent device */
+	if (!pdev->bus->self)
+		ppnode = pdev->bus->dev.of_node;
+	else
+		ppnode = pdev->bus->self->dev.of_node;
+	if (!ppnode)
+		return;
+
+	if (pci_is_bridge(pdev))
+		pci_type = "pci";
+	else
+		pci_type = "dev";
+
+	name = kasprintf(GFP_KERNEL, "%s@%x,%x", pci_type,
+			 PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
+	if (!name)
+		return;
+
+	cset = kmalloc(sizeof(*cset), GFP_KERNEL);
+	if (!cset)
+		goto failed;
+	of_changeset_init(cset);
+
+	np = of_changeset_create_node(ppnode, name, cset);
+	if (!np)
+		goto failed;
+	np->data = cset;
+
+	ret = of_pci_add_properties(pdev, cset, np);
+	if (ret)
+		goto failed;
+
+	ret = of_changeset_apply(cset);
+	if (ret)
+		goto failed;
+
+	pdev->dev.of_node = np;
+	kfree(name);
+
+	return;
+
+failed:
+	if (np)
+		of_node_put(np);
+	kfree(name);
+}
+#endif
+
 #endif /* CONFIG_PCI */
 
 /**
diff --git a/drivers/pci/of_property.c b/drivers/pci/of_property.c
new file mode 100644
index 000000000000..1432f9eed3af
--- /dev/null
+++ b/drivers/pci/of_property.c
@@ -0,0 +1,235 @@ 
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2022-2023, Advanced Micro Devices, Inc.
+ */
+
+#include <linux/pci.h>
+#include <linux/of.h>
+#include <linux/bitfield.h>
+#include <linux/bits.h>
+#include "pci.h"
+
+#define OF_PCI_ADDRESS_CELLS		3
+#define OF_PCI_SIZE_CELLS		2
+
+struct of_pci_addr_pair {
+	u32		phys_addr[OF_PCI_ADDRESS_CELLS];
+	u32		size[OF_PCI_SIZE_CELLS];
+};
+
+/*
+ * Each entry in the ranges table is a tuple containing the child address,
+ * the parent address, and the size of the region in the child address space.
+ * Thus, for PCI, in each entry parent address is an address on the primary
+ * side and the child address is the corresponding address on the secondary
+ * side.
+ */
+struct of_pci_range {
+	u32		child_addr[OF_PCI_ADDRESS_CELLS];
+	u32		parent_addr[OF_PCI_ADDRESS_CELLS];
+	u32		size[OF_PCI_SIZE_CELLS];
+};
+
+#define OF_PCI_ADDR_SPACE_IO		0x1
+#define OF_PCI_ADDR_SPACE_MEM32		0x2
+#define OF_PCI_ADDR_SPACE_MEM64		0x3
+
+#define OF_PCI_ADDR_FIELD_NONRELOC	BIT(31)
+#define OF_PCI_ADDR_FIELD_SS		GENMASK(25, 24)
+#define OF_PCI_ADDR_FIELD_PREFETCH	BIT(30)
+#define OF_PCI_ADDR_FIELD_BUS		GENMASK(23, 16)
+#define OF_PCI_ADDR_FIELD_DEV		GENMASK(15, 11)
+#define OF_PCI_ADDR_FIELD_FUNC		GENMASK(10, 8)
+#define OF_PCI_ADDR_FIELD_REG		GENMASK(7, 0)
+
+enum of_pci_prop_compatible {
+	PROP_COMPAT_PCI_VVVV_DDDD,
+	PROP_COMPAT_PCICLASS_CCSSPP,
+	PROP_COMPAT_PCICLASS_CCSS,
+	PROP_COMPAT_NUM,
+};
+
+static void of_pci_set_address(struct pci_dev *pdev, u32 *prop, u64 addr,
+			       u32 reg_num, u32 flags, bool reloc)
+{
+	prop[0] = FIELD_PREP(OF_PCI_ADDR_FIELD_BUS, pdev->bus->number) |
+		FIELD_PREP(OF_PCI_ADDR_FIELD_DEV, PCI_SLOT(pdev->devfn)) |
+		FIELD_PREP(OF_PCI_ADDR_FIELD_FUNC, PCI_FUNC(pdev->devfn));
+	prop[0] |= flags | reg_num;
+	if (!reloc) {
+		prop[0] |= OF_PCI_ADDR_FIELD_NONRELOC;
+		prop[1] = upper_32_bits(addr);
+		prop[2] = lower_32_bits(addr);
+	}
+}
+
+static int of_pci_get_addr_flags(struct resource *res, u32 *flags)
+{
+	u32 ss;
+
+	if (res->flags & IORESOURCE_IO)
+		ss = OF_PCI_ADDR_SPACE_IO;
+	else if (res->flags & IORESOURCE_MEM_64)
+		ss = OF_PCI_ADDR_SPACE_MEM64;
+	else if (res->flags & IORESOURCE_MEM)
+		ss = OF_PCI_ADDR_SPACE_MEM32;
+	else
+		return -EINVAL;
+
+	*flags = 0;
+	if (res->flags & IORESOURCE_PREFETCH)
+		*flags |= OF_PCI_ADDR_FIELD_PREFETCH;
+
+	*flags |= FIELD_PREP(OF_PCI_ADDR_FIELD_SS, ss);
+
+	return 0;
+}
+
+static int of_pci_prop_bus_range(struct pci_dev *pdev,
+				 struct of_changeset *ocs,
+				 struct device_node *np)
+{
+	u32 bus_range[] = { pdev->subordinate->busn_res.start,
+			    pdev->subordinate->busn_res.end };
+
+	return of_changeset_add_prop_u32_array(ocs, np, "bus-range", bus_range,
+					       ARRAY_SIZE(bus_range));
+}
+
+static int of_pci_prop_ranges(struct pci_dev *pdev, struct of_changeset *ocs,
+			      struct device_node *np)
+{
+	struct of_pci_range *rp;
+	struct resource *res;
+	u32 flags, num;
+	int i, j, ret;
+	u64 val64;
+
+	if (pci_is_bridge(pdev)) {
+		num = PCI_BRIDGE_RESOURCE_NUM;
+		res = &pdev->resource[PCI_BRIDGE_RESOURCES];
+	} else {
+		num = PCI_STD_NUM_BARS;
+		res = &pdev->resource[PCI_STD_RESOURCES];
+	}
+
+	rp = kcalloc(num, sizeof(*rp), GFP_KERNEL);
+	if (!rp)
+		return -ENOMEM;
+
+	for (i = 0, j = 0; j < num; j++) {
+		if (!resource_size(&res[j]))
+			continue;
+
+		if (of_pci_get_addr_flags(&res[j], &flags))
+			continue;
+
+		val64 = res[j].start;
+		of_pci_set_address(pdev, rp[i].parent_addr, val64, 0, flags,
+				   false);
+		if (pci_is_bridge(pdev)) {
+			memcpy(rp[i].child_addr, rp[i].parent_addr,
+			       sizeof(rp[i].child_addr));
+		} else {
+			/*
+			 * For endpoint device, the lower 64-bits of child
+			 * address is always zero.
+			 */
+			rp[i].child_addr[0] = j;
+		}
+
+		val64 = resource_size(&res[j]);
+		rp[i].size[0] = upper_32_bits(val64);
+		rp[i].size[1] = lower_32_bits(val64);
+
+		i++;
+	}
+
+	ret = of_changeset_add_prop_u32_array(ocs, np, "ranges", (u32 *)rp,
+					      i * sizeof(*rp) / sizeof(u32));
+	kfree(rp);
+
+	return ret;
+}
+
+static int of_pci_prop_reg(struct pci_dev *pdev, struct of_changeset *ocs,
+			   struct device_node *np)
+{
+	struct of_pci_addr_pair reg = { 0 };
+
+	/* configuration space */
+	of_pci_set_address(pdev, reg.phys_addr, 0, 0, 0, true);
+
+	return of_changeset_add_prop_u32_array(ocs, np, "reg", (u32 *)&reg,
+					       sizeof(reg) / sizeof(u32));
+}
+
+static int of_pci_prop_compatible(struct pci_dev *pdev,
+				  struct of_changeset *ocs,
+				  struct device_node *np)
+{
+	const char *compat_strs[PROP_COMPAT_NUM] = { 0 };
+	int i, ret;
+
+	compat_strs[PROP_COMPAT_PCI_VVVV_DDDD] =
+		kasprintf(GFP_KERNEL, "pci%x,%x", pdev->vendor, pdev->device);
+	compat_strs[PROP_COMPAT_PCICLASS_CCSSPP] =
+		kasprintf(GFP_KERNEL, "pciclass,%06x", pdev->class);
+	compat_strs[PROP_COMPAT_PCICLASS_CCSS] =
+		kasprintf(GFP_KERNEL, "pciclass,%04x", pdev->class >> 8);
+
+	ret = of_changeset_add_prop_string_array(ocs, np, "compatible",
+						 compat_strs, PROP_COMPAT_NUM);
+	for (i = 0; i < PROP_COMPAT_NUM; i++)
+		kfree(compat_strs[i]);
+
+	return ret;
+}
+
+int of_pci_add_properties(struct pci_dev *pdev, struct of_changeset *ocs,
+			  struct device_node *np)
+{
+	int ret;
+	/*
+	 * The added properties will be released when the
+	 * changeset is destroyed.
+	 */
+	if (pci_is_bridge(pdev)) {
+		ret = of_changeset_add_prop_string(ocs, np, "device_type",
+						   "pci");
+		if (ret)
+			return ret;
+
+		ret = of_pci_prop_bus_range(pdev, ocs, np);
+		if (ret)
+			return ret;
+	}
+
+	ret = of_changeset_add_empty_prop(ocs, np, "dynamic");
+	if (ret)
+		return ret;
+
+	ret = of_pci_prop_ranges(pdev, ocs, np);
+	if (ret)
+		return ret;
+
+	ret = of_changeset_add_prop_u32(ocs, np, "#address-cells",
+					OF_PCI_ADDRESS_CELLS);
+	if (ret)
+		return ret;
+
+	ret = of_changeset_add_prop_u32(ocs, np, "#size-cells",
+					OF_PCI_SIZE_CELLS);
+	if (ret)
+		return ret;
+
+	ret = of_pci_prop_reg(pdev, ocs, np);
+	if (ret)
+		return ret;
+
+	ret = of_pci_prop_compatible(pdev, ocs, np);
+	if (ret)
+		return ret;
+
+	return 0;
+}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2475098f6518..686836afee1c 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -678,6 +678,21 @@  static inline int devm_of_pci_bridge_init(struct device *dev, struct pci_host_br
 
 #endif /* CONFIG_OF */
 
+struct of_changeset;
+
+#ifdef CONFIG_PCI_DYNAMIC_OF_NODES
+void of_pci_make_dev_node(struct pci_dev *pdev);
+void of_pci_remove_node(struct pci_dev *pdev);
+int of_pci_add_properties(struct pci_dev *pdev, struct of_changeset *ocs,
+			  struct device_node *np);
+#else
+static inline void
+of_pci_make_dev_node(struct pci_dev *pdev) { }
+
+static inline void
+of_pci_remove_node(struct pci_dev *pdev) { }
+#endif
+
 #ifdef CONFIG_PCIEAER
 void pci_no_aer(void);
 void pci_aer_init(struct pci_dev *dev);
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index d68aee29386b..d749ea8250d6 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -22,6 +22,7 @@  static void pci_stop_dev(struct pci_dev *dev)
 		device_release_driver(&dev->dev);
 		pci_proc_detach_device(dev);
 		pci_remove_sysfs_dev_files(dev);
+		of_pci_remove_node(dev);
 
 		pci_dev_assign_added(dev, false);
 	}