driver core: improve cycle detection on fwnode graph

Message ID 20240124084636.1415652-1-xu.yang_2@nxp.com
State New
Series driver core: improve cycle detection on fwnode graph

Commit Message

Xu Yang Jan. 24, 2024, 8:46 a.m. UTC
Currently, cycle detection on the fwnode graph is still defective.
For example, the fwnode link A.EP->B is not marked as a cycle in the case
below (arrows point from supplier to consumer):

                 +-----+
                 |     |
 +-----+         |  +--|
 |     |<-----------|EP|
 |--+  |         |  +--|
 |EP|----------->|     |
 |--+  |         |  B  |
 |     |         +-----+
 |  A  |            ^
 +-----+   +-----+  |
    |      |     |  |
    +----->|  C  |--+
           |     |
           +-----+

1. Node C is populated as device C, but nodes A and B are not yet
   populated. When cycle detection runs for device C, no cycle is found.
2. Node B is populated as device B. When cycle detection runs for device
   B, it finds the link cycle B.EP->A->C->B. The fwnode links B.EP->A,
   A->C and C->B are then marked as cycles, and the fwnode link C->B is
   also converted to a device link.
3. Node A is populated as device A. When cycle detection runs for device
   A, it finds that A->C is marked as a cycle and converts it to a device
   link. It also finds that B.EP->A is marked as a cycle, but does not
   convert it to a device link because node B.EP is not a device.

In the end, the fwnode links C->B and A->C are removed, B.EP->A is only
marked as a cycle, and A.EP->B is neither marked as a cycle nor removed.

In an fwnode graph, an endpoint node can only be a supplier of other nodes
and is never populated as a device. Therefore, when creating a device link
to a supplier in an fwnode graph, we need to relax cycles with the real
node rather than the endpoint node.

Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
---
 drivers/base/core.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Saravana Kannan Jan. 25, 2024, 2:55 a.m. UTC | #1
On Wed, Jan 24, 2024 at 12:40 AM Xu Yang <xu.yang_2@nxp.com> wrote:
>
> Currently, cycle detection on fwnode graph is still defective.
> Such as fwnode link A.EP->B is not marked as cycle in below case:
>
>                  +-----+
>                  |     |
>  +-----+         |  +--|
>  |     |<-----------|EP|
>  |--+  |         |  +--|
>  |EP|----------->|     |
>  |--+  |         |  B  |
>  |     |         +-----+
>  |  A  |            ^
>  +-----+   +-----+  |
>     |      |     |  |
>     +----->|  C  |--+
>            |     |
>            +-----+
>
> 1. Node C is populated as device C. But nodes A and B are still not
>    populated. When do cycle detection with device C, no cycle is found.
> 2. Node B is populated as device B. When do cycle detection with device
>    B, it found a link cycle B.EP->A->C->B. Then, fwnode link B.EP->A,
>    A->C and C->B are marked as cycle. The fwnode link C->B is converted
>    to device link too.
> 3. Node A is populated as device A. When do cycle detection with device
>    A, it find A->C is marked as cycle and convert it to device link. It
>    also find B.EP->A is marked as cycle but will not convert it to device
>    link since node B.EP is not a device.

Your example doesn't sound correct (I'll explain further down) and it
is vague. Need a couple of clarifications first.

1. What is the ---> representing? Is it references in DT or fwnode
links? Which end of the arrow is the consumer? The tail or the pointy
end? I typically use the format consumer --> supplier.

2. You say "link" sometimes but it's not clear if you mean fwnode
links or device links. So please be explicit about it.

3. Your statement "Such as fwnode link A.EP->B is not marked as cycle"
doesn't sound correct. When remote-endpoint properties are parsed, the
fwnode is created from the device node with compatible property to the
destination. So A.EP ----> B can't exist if I assume the consumer -->
supplier format.

4. Has this actually caused an issue? If so, what is it? And give me
an example in an upstream DT.

Btw, I definitely don't anticipate ACKing this patch because the cycle
detection code shouldn't be having property specific logic. It's not
even DT specific in this place. If there is an issue and it needs
fixing, it should be where the fwnode links are created. But then
again I'm not sure what the actual symptom we are trying to solve is.


-Saravana

>
> Finally, fwnode link C->B and A->C is removed, B.EP->A is only marked as
> cycle and A.EP->B is neither been marked as cycle nor removed.
>
> For fwnode graph, the endpoint node can only be a supplier of other node
> and the endpoint node will never be populated as device. Therefore, when
> creating device link to supplier for fwnode graph, we need to relax cycle
> with the real node rather endpoint node.
>
> Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
> ---
>  drivers/base/core.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 14d46af40f9a..278ded6cd3ce 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2217,6 +2217,9 @@ static void __fw_devlink_link_to_suppliers(struct device *dev,
>                 int ret;
>                 struct fwnode_handle *sup = link->supplier;
>
> +               if (fwnode_graph_is_endpoint(sup))
> +                       sup = fwnode_graph_get_port_parent(sup);
> +
>                 ret = fw_devlink_create_devlink(dev, sup, link);
>                 if (!own_link || ret == -EAGAIN)
>                         continue;
> --
> 2.34.1
>
  
Xu Yang Jan. 25, 2024, 4:21 a.m. UTC | #2
Hi Saravana,

> 
> On Wed, Jan 24, 2024 at 12:40 AM Xu Yang <xu.yang_2@nxp.com> wrote:
> >
> > Currently, cycle detection on fwnode graph is still defective.
> > Such as fwnode link A.EP->B is not marked as cycle in below case:
> >
> >                  +-----+
> >                  |     |
> >  +-----+         |  +--|
> >  |     |<-----------|EP|
> >  |--+  |         |  +--|
> >  |EP|----------->|     |
> >  |--+  |         |  B  |
> >  |     |         +-----+
> >  |  A  |            ^
> >  +-----+   +-----+  |
> >     |      |     |  |
> >     +----->|  C  |--+
> >            |     |
> >            +-----+
> >
> > 1. Node C is populated as device C. But nodes A and B are still not
> >    populated. When do cycle detection with device C, no cycle is found.
> > 2. Node B is populated as device B. When do cycle detection with device
> >    B, it found a link cycle B.EP->A->C->B. Then, fwnode link B.EP->A,
> >    A->C and C->B are marked as cycle. The fwnode link C->B is converted
> >    to device link too.
> > 3. Node A is populated as device A. When do cycle detection with device
> >    A, it find A->C is marked as cycle and convert it to device link. It
> >    also find B.EP->A is marked as cycle but will not convert it to device
> >    link since node B.EP is not a device.
> 
> Your example doesn't sound correct (I'l explain further down) and it
> is vague. Need a couple of clarifications first.
> 
> 1. What is the ---> representing? Is it references in DT or fwnode
> links? Which end of the arrow is the consumer? The tail or the pointy
> end? I typically use the format consumer --> supplier.

Sorry, I use "-->" to mean "supplier --> consumer", and it represents a
fwnode link.

> 
> 2. You say "link" sometimes but it's not clear if you mean fwnode
> links or device links. So please be explicit about it.

By default, "link" means a fwnode link.

> 
> 3. Your statement "Such as fwnode link A.EP->B is not marked as cycle"
> doesn't sound correct. When remote-endpoint properties are parsed, the
> fwnode is created from the device node with compatible property to the
> destination. So A.EP ----> B can't exist if I assume the consumer -->
> supplier format.

Since the commit below, the fwnode link is no longer created from the
device node with the compatible property; the endpoint node is the
supplier. So the link can exist, as you can see in my case later.

4a032827daa8 (of: property: Simplify of_link_to_phandle(), 2023-02-06)

> 
> 4. Has this actually caused an issue? If so, what is it? And give me
> an example in an upstream DT.

Yes. There are two cycles (B.EP->A->C->B and B.EP->A/A.EP->B) in the above
example, but only one of them (B.EP->A->C->B) is recognized.

My real case is as follows:
---
tcpc@50 {
    compatible = "nxp,ptn5110";
    ...

    port {
        typec_dr_sw: endpoint {
            remote-endpoint = <&usb3_drd_sw>;
        };
    };
};

usb@38100000 {
    compatible = "snps,dwc3";
    phys = <&usb3_phy0>, <&usb3_phy0>;
    ...

    port {
        usb3_drd_sw: endpoint {
            remote-endpoint = <&typec_dr_sw>;
        };
    };
};

usb3_phy0: usb-phy@381f0040 {
    compatible = "fsl,imx8mp-usb-phy";

    ...
};

The fwnode links are created as follows:
---
[    0.059553] /soc@0/bus@30800000/i2c@30a30000/tcpc@50 Linked as a fwnode consumer to /soc@0/usb@32f10100/usb@38100000/port/endpoint
[    0.066365] /soc@0/usb-phy@381f0040 Linked as a fwnode consumer to /soc@0/bus@30800000/i2c@30a30000/tcpc@50
[    0.066624] /soc@0/usb@32f10100/usb@38100000 Linked as a fwnode consumer to /soc@0/usb-phy@381f0040
[    0.066702] /soc@0/usb@32f10100/usb@38100000 Linked as a fwnode consumer to /soc@0/bus@30800000/i2c@30a30000/tcpc@50/port/endpoint

> 
> Btw, I definitely don't anticipate ACKing this patch because the cycle
> detection code shouldn't be having property specific logic. It's not
> even DT specific in this place. If there is an issue and it needs
> fixing, it should be where the fwnode links are created. But then
> again I'm not sure what the actual symptom we are trying to solve is.

Sorry for the inconvenience. I saw that you have pushed several patches
for fwnode link and device link handling, so I thought you would
understand this issue well and could give some suggestions.

Thanks,
Xu Yang
  
Saravana Kannan Jan. 26, 2024, 2:08 a.m. UTC | #3
On Wed, Jan 24, 2024 at 8:21 PM Xu Yang <xu.yang_2@nxp.com> wrote:
>
> Hi Saravana,
>
> >
> > On Wed, Jan 24, 2024 at 12:40 AM Xu Yang <xu.yang_2@nxp.com> wrote:
> > >
> > > Currently, cycle detection on fwnode graph is still defective.
> > > Such as fwnode link A.EP->B is not marked as cycle in below case:
> > >
> > >                  +-----+
> > >                  |     |
> > >  +-----+         |  +--|
> > >  |     |<-----------|EP|
> > >  |--+  |         |  +--|
> > >  |EP|----------->|     |
> > >  |--+  |         |  B  |
> > >  |     |         +-----+
> > >  |  A  |            ^
> > >  +-----+   +-----+  |
> > >     |      |     |  |
> > >     +----->|  C  |--+
> > >            |     |
> > >            +-----+
> > >
> > > 1. Node C is populated as device C. But nodes A and B are still not
> > >    populated. When do cycle detection with device C, no cycle is found.
> > > 2. Node B is populated as device B. When do cycle detection with device
> > >    B, it found a link cycle B.EP->A->C->B. Then, fwnode link B.EP->A,
> > >    A->C and C->B are marked as cycle. The fwnode link C->B is converted
> > >    to device link too.
> > > 3. Node A is populated as device A. When do cycle detection with device
> > >    A, it find A->C is marked as cycle and convert it to device link. It
> > >    also find B.EP->A is marked as cycle but will not convert it to device
> > >    link since node B.EP is not a device.
> >
> > Your example doesn't sound correct (I'l explain further down) and it
> > is vague. Need a couple of clarifications first.
> >
> > 1. What is the ---> representing? Is it references in DT or fwnode
> > links? Which end of the arrow is the consumer? The tail or the pointy
> > end? I typically use the format consumer --> supplier.
>
> Sorry, I represent "-->" as "supplier --> consumer" and it's a fwnode link.
>
> >
> > 2. You say "link" sometimes but it's not clear if you mean fwnode
> > links or device links. So please be explicit about it.
>
> It’s fwnode link by default.
>
> >
> > 3. Your statement "Such as fwnode link A.EP->B is not marked as cycle"
> > doesn't sound correct. When remote-endpoint properties are parsed, the
> > fwnode is created from the device node with compatible property to the
> > destination. So A.EP ----> B can't exist if I assume the consumer -->
> > supplier format.
>
> The fwnode is not created from the device node with compatible property
> since below commit. The endpoint node is the supplier. No, you can see my
> case later.
>
> 4a032827daa8 (of: property: Simplify of_link_to_phandle(), 2023-02-06)

I think my confusion was because you use ----> in the opposite way to
what I have used for all my fw_devlink and cycle detection patches.

The part I was referring to is related to how drivers/of/property.c has
node_not_dev set to true for parse_remote_endpoint.

> >
> > 4. Has this actually caused an issue? If so, what is it? And give me
> > an example in an upstream DT.
>
> Yes, there are two cycles (B.EP->A->C->B and B.EP->A/A.EP->B) in above
> example. But only one cycle (B.EP->A->C->B) is recognized.
>
> My real case as below:

I think you still missed some details because usb3_phy0 seems
irrelevant here. Can you just point me to the dts (not dtsi) file for
this platform in the kernel tree?
Also, can you change all the pr_debug and dev_dbg in
drivers/base/core.c to their info equivalent and boot up the system
and give me the logs? That'll be a lot easier for me to understand
your case.

> ---
> tcpc@50 {
>     compatible = "nxp,ptn5110";
>     ...
>
>     port {
>         typec_dr_sw: endpoint {
>             remote-endpoint = <&usb3_drd_sw>;
>         };
>     };
> };
>
> usb@38100000 {
>     compatible = "snps,dwc3";
>     phys = <&usb3_phy0>, <&usb3_phy0>;
>     ...
>
>     port {
>         usb3_drd_sw: endpoint {
>             remote-endpoint = <&typec_dr_sw>;
>         };
>     };
> };
>
> usb3_phy0: usb-phy@381f0040 {
>     compatible = "fsl,imx8mp-usb-phy";
>
>     ...
> };
>
> And fwnode links are created as below:
> ---
> [    0.059553] /soc@0/bus@30800000/i2c@30a30000/tcpc@50 Linked as a fwnode consumer to /soc@0/usb@32f10100/usb@38100000/port/endpoint
> [    0.066365] /soc@0/usb-phy@381f0040 Linked as a fwnode consumer to /soc@0/bus@30800000/i2c@30a30000/tcpc@50
> [    0.066624] /soc@0/usb@32f10100/usb@38100000 Linked as a fwnode consumer to /soc@0/usb-phy@381f0040
> [    0.066702] /soc@0/usb@32f10100/usb@38100000 Linked as a fwnode consumer to /soc@0/bus@30800000/i2c@30a30000/tcpc@50/port/endpoint
>

So let's say I see your logs and what you say is true, but you still
aren't telling me what's the problem you have because of this
incorrect cycle detection. What's breaking? Is something not allowed
to probe? If so, which one? What's supposed to be the right order of
probes?

> >
> > Btw, I definitely don't anticipate ACKing this patch because the cycle
> > detection code shouldn't be having property specific logic. It's not
> > even DT specific in this place. If there is an issue and it needs
> > fixing, it should be where the fwnode links are created. But then
> > again I'm not sure what the actual symptom we are trying to solve is.
>
> Sorry for the inconvenience. I saw that you push some patches about fwnode
> link and device link handling, so I think you may understand this issue
> well and give some suggestions.

No worries at all. Thanks for reporting the issue and thanks for
trying to fix it.

-Saravana
  
Xu Yang Jan. 26, 2024, 9 a.m. UTC | #4
Hi Saravana,

> 
> On Wed, Jan 24, 2024 at 8:21 PM Xu Yang <xu.yang_2@nxp.com> wrote:
> >
> I think my confusion was because you use ----> in the opposite way to
> what I have used for all my fw_devlink and cycle detection patches.

Okay, from now on I will use "-->" the same way you do.

> 
> The part I was referring to is related to how drivers/of/property.c has
> node_not_dev set to true for parse_remote_endpoint.
> 
> > >
> > > 4. Has this actually caused an issue? If so, what is it? And give me
> > > an example in an upstream DT.
> >
> > Yes, there are two cycles (B.EP->A->C->B and B.EP->A/A.EP->B) in above
> > example. But only one cycle (B.EP->A->C->B) is recognized.
> >
> > My real case as below:
> 
> I think you still missed some details because usb3_phy0 seems

One line was indeed missing from usb3_phy0.

> irrelevant here. Can you just point me to the dts (not dtsi) file for
> this platform in the kernel tree?

These parts of the dts are not in the upstream kernel tree, for various
reasons. Let me show the necessary parts again below; you can also get
the full dts file from the link further down:

---
ptn5110: tcpc@50 {
    compatible = "nxp,ptn5110";
    ...

    port {
        typec_dr_sw: endpoint {
            remote-endpoint = <&usb3_drd_sw>;
        };
    };
};

usb_dwc3_0: usb@38100000 {
    compatible = "snps,dwc3";
    phys = <&usb3_phy0>, <&usb3_phy0>;
    ...

    port {
        usb3_drd_sw: endpoint {
            remote-endpoint = <&typec_dr_sw>;
        };
    };
};

usb3_phy0: usb-phy@381f0040 {
    compatible = "fsl,imx8mp-usb-phy";
    vbus-power-supply = <&ptn5110>;

    ...
};

> Also, can you change all the pr_debug and dev_dbg in
> drivers/base/core.c to their info equivalent and boot up the system
> and give me the logs? That'll be a lot easier for me to understand
> your case.

Thank you for being willing to debug this issue.
The boot log and dts file are at:
https://drive.google.com/drive/folders/1hlkzg042q5_b5l59DCW2pECXRmTH4Vy_?usp=sharing

> 
> So let's say I see your logs and what you say is true, but you still
> aren't telling me what's the problem you have because of this
> incorrect cycle detection. What's breaking? Is something not allowed
> to probe? If so, which one? What's supposed to be the right order of
> probes?
> 

Let me describe the issue again based on the above log and dts:

                    usb
                  +-----+
   tcpc           |     |
  +-----+         |  +--|
  |     |----------->|EP|
  |--+  |         |  +--|
  |EP|<-----------|     |
  |--+  |         |  B  |
  |     |         +-----+
  |  A  |            |
  +-----+            |
     ^     +-----+   |
     |     |     |   |
     +-----|  C  |<--+
           |     |
           +-----+
           usb-phy

Node A (tcpc) will be populated as device 1-0050.
Node B (usb) will be populated as device 38100000.usb.
Node C (usb-phy) will be populated as device 381f0040.usb-phy.

1. Node C is populated as device C, but nodes A and B are not yet
   populated. When cycle detection runs for device C, no cycle is found.
2. Node B is populated as device B. When cycle detection runs for device
   B, it finds the fwnode link cycle B-->C-->A-->B.EP. The fwnode links
   A-->B.EP, C-->A and B-->C are then marked as cycles, and the fwnode
   link B-->C is also converted to a device link.
3. Node A is populated as device A. When cycle detection runs for device
   A, it finds that C-->A is marked as a cycle and converts it to a device
   link. It also finds that A-->B.EP is marked as a cycle, but does not
   convert it to a device link because node B.EP is not a device.

In the end, the fwnode links B-->C and C-->A are removed, A-->B.EP is only
marked as a cycle, and B-->A.EP is neither marked as a cycle nor removed.

So there are 2 cycles, and only the first one is detected:
1. B-->C-->A-->B.EP--B
2. B-->A.EP--A-->B.EP--B

As a result, device 38100000.usb (node B) has its probe deferred because
node B still has the supplier node A.EP.
Device 1-0050 (node A) also has its probe deferred because it depends on a
device that is created by 38100000.usb.

The expected behavior is that all of the devices probe successfully once
both cycles are detected.
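
As a rough illustration of the two cycles listed above, here is a toy
model in plain C. It is only a sketch under stated assumptions: the node
names, the `owner` table and the depth-first walk are hypothetical and
are not the kernel's __fw_devlink_relax_cycles() logic. The single idea
it borrows from the description is that a dependency on an endpoint is
treated as a dependency on the node that owns the endpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node ids; the names mirror the diagram, they are not kernel objects. */
enum node { A, A_EP, B, B_EP, C, NODE_COUNT };

/* For an endpoint, the node that owns it; for any other node, the node itself. */
static const enum node owner[NODE_COUNT] = {
	[A] = A, [A_EP] = A, [B] = B, [B_EP] = B, [C] = C,
};

/* One fwnode link in "consumer --> supplier" form. */
struct link {
	enum node consumer;
	enum node supplier;
};

/* The four fwnode links from the log: A-->B.EP, B-->A.EP, B-->C, C-->A. */
static const struct link links[] = {
	{ A, B_EP }, { B, A_EP }, { B, C }, { C, A },
};
#define LINK_COUNT (sizeof(links) / sizeof(links[0]))

/* Depth-first walk along supplier links, resolving endpoints to their owner. */
static bool reaches_from(enum node from, enum node target, bool *visited)
{
	if (from == target)
		return true;
	if (visited[from])
		return false;
	visited[from] = true;

	for (size_t i = 0; i < LINK_COUNT; i++) {
		if (links[i].consumer != from)
			continue;
		if (reaches_from(owner[links[i].supplier], target, visited))
			return true;
	}
	return false;
}

/* A link is part of a cycle if its supplier's owner depends on its consumer. */
bool link_is_in_cycle(size_t idx)
{
	bool visited[NODE_COUNT] = { false };

	assert(idx < LINK_COUNT);
	return reaches_from(owner[links[idx].supplier],
			    links[idx].consumer, visited);
}

/* Number of links in the toy graph that sit on at least one cycle. */
size_t count_cyclic_links(void)
{
	size_t n = 0;

	for (size_t i = 0; i < LINK_COUNT; i++)
		if (link_is_in_cycle(i))
			n++;
	return n;
}
```

In this toy graph every one of the four links sits on a cycle, including
B-->A.EP (index 1), which is the link that is left unmarked in the
behavior described above.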

Thanks,
Xu Yang
  
Saravana Kannan Jan. 30, 2024, 3:10 a.m. UTC | #5
On Fri, Jan 26, 2024 at 1:00 AM Xu Yang <xu.yang_2@nxp.com> wrote:
>
> Hi Saravana,
>
> >
> > On Wed, Jan 24, 2024 at 8:21 PM Xu Yang <xu.yang_2@nxp.com> wrote:
> > >
> > I think my confusion was because you use ----> in the opposite way to
> > what I have used for all my fw_devlink and cycle detection patches.
>
> Okay, I will follow the usage of "-->" later as yours.
>
> >
> > The part I was referring to is related to how drivers/of/property.c has
> > node_not_dev set to true for parse_remote_endpoint.
> >
> > > >
> > > > 4. Has this actually caused an issue? If so, what is it? And give me
> > > > an example in an upstream DT.
> > >
> > > Yes, there are two cycles (B.EP->A->C->B and B.EP->A/A.EP->B) in above
> > > example. But only one cycle (B.EP->A->C->B) is recognized.
> > >
> > > My real case as below:
> >
> > I think you still missed some details because usb3_phy0 seems
>
> One line is indeed missing in usb3_phy0.
>
> > irrelevant here. Can you just point me to the dts (not dtsi) file for
> > this platform in the kernel tree?
>
> This parts of dts is not in upstream kernel tree due to some reasons.
> Allow me to show the necessary parts as below again, you can also
> get the full dts file from the link I attached below:
>
> ---
> ptn5110: tcpc@50 {
>     compatible = "nxp,ptn5110";
>     ...
>
>     port {
>         typec_dr_sw: endpoint {
>             remote-endpoint = <&usb3_drd_sw>;
>         };
>     };
> };
>
> usb_dwc3_0: usb@38100000 {
>     compatible = "snps,dwc3";
>     phys = <&usb3_phy0>, <&usb3_phy0>;
>     ...
>
>     port {
>         usb3_drd_sw: endpoint {
>             remote-endpoint = <&typec_dr_sw>;
>         };
>     };
> };
>
> usb3_phy0: usb-phy@381f0040 {
>     compatible = "fsl,imx8mp-usb-phy";
>     vbus-power-supply = <&ptn5110>;
>
>     ...
> };
>
> > Also, can you change all the pr_debug and dev_dbg in
> > drivers/base/core.c to their info equivalent and boot up the system
> > and give me the logs? That'll be a lot easier for me to understand
> > your case.
>
> Thank you for willing to debug this issue.
> The boot log and dts file is under:
> https://drive.google.com/drive/folders/1hlkzg042q5_b5l59DCW2pECXRmTH4Vy_?usp=sharing
>
> >
> > So let's say I see your logs and what you say is true, but you still
> > aren't telling me what's the problem you have because of this
> > incorrect cycle detection. What's breaking? Is something not allowed
> > to probe? If so, which one? What's supposed to be the right order of
> > probes?
> >
>
> Let me describe the issue again based on above log and dts:
>
>                     usb
>                   +-----+
>    tcpc           |     |
>   +-----+         |  +--|
>   |     |----------->|EP|
>   |--+  |         |  +--|
>   |EP|<-----------|     |
>   |--+  |         |  B  |
>   |     |         +-----+
>   |  A  |            |
>   +-----+            |
>      ^     +-----+   |
>      |     |     |   |
>      +-----|  C  |<--+
>            |     |
>            +-----+
>            usb-phy
>
> Node A (tcpc) will be populated as device 1-0050.
> Node B (usb) will be populated as device 38100000.usb.
> Node C (usb-phy) will be populated as device 381f0040.usb-phy.
>
> 1. Node C is populated as device C. But nodes A and B are still not
>    populated. When do cycle detection with device C, no cycle is found.
> 2. Node B is populated as device B. When do cycle detection with device
>    B, it found a fwnode link cycle B-->C-->A-->B.EP. Then, fwnode link
>    A-->B.EP, C-->A and B-->C are marked as cycle. The fwnode link B-->C
>    is converted to device link too.
> 3. Node A is populated as device A. When do cycle detection with device
>    A, it find C-->A is marked as cycle and convert it to device link. It
>    also find A-->B.EP is marked as cycle but will not convert it to device
>    link since node B.EP is not a device.
>
> Finally, fwnode link B-->C and C-->A is removed, A-->B.EP is only marked
> as cycle and B-->A.EP is neither been marked as cycle nor removed.
>
> So there are 2 cycles and only the first cycle is detected.
> 1. B-->C-->A-->B.EP--B
> 2. B-->A.EP--A-->B.EP--B
>
> In the end, device 38100000.usb (node B) is defered probe due to node B
> still has a supplier node A.EP.
> Device 1-0050 (node A) is also defered probe due to it depends on one device
> which is created by 38100000.usb.
>
> The normal behavior is all of the devices can be successfully probed after two
> cycles are detected.
>

This took me several hours to debug and I almost gave you the
"wrong" fix: a fix to create fwnode links A --> B and B --> A in
your example and remove the EPs from the loop. But when typing up the
commit text, I realized what I was saying wasn't correct because this
cycle detection works fine if you don't have "C" in the example. Yet
again this bug comes down to my attempt to optimize some "unnecessary"
cycle detection logic that ended up being necessary.

Here's a test patch that I'm 99% sure will fix your issue. Please give
it a shot and let me know. After that, I need to run some more local
tests to make sure I'm not messing anything else up, clean up some
redundant logging, and then I can send a proper fix upstream.

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 14d46af40f9a..75203ccc96f6 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2060,7 +2060,6 @@ static int fw_devlink_create_devlink(struct device *con,
         * SYNC_STATE_ONLY device links don't block probing and supports cycles.
         * So cycle detection isn't necessary and shouldn't be done.
         */
-       if (!(flags & DL_FLAG_SYNC_STATE_ONLY)) {
                device_links_write_lock();
                if (__fw_devlink_relax_cycles(con, sup_handle)) {
                        __fwnode_link_cycle(link);
@@ -2069,7 +2068,6 @@ static int fw_devlink_create_devlink(struct device *con,
                                 sup_handle);
                }
                device_links_write_unlock();
-       }

        if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE)
                sup_dev = fwnode_get_next_parent_dev(sup_handle);

Thanks,
Saravana
  
Xu Yang Jan. 30, 2024, 3:39 a.m. UTC | #6
Hi Saravana,

>
> On Fri, Jan 26, 2024 at 1:00 AM Xu Yang <xu.yang_2@nxp.com> wrote:
> >
> > Hi Saravana,
> >
> > >
> > > On Wed, Jan 24, 2024 at 8:21 PM Xu Yang <xu.yang_2@nxp.com> wrote:
> > > >
> > > I think my confusion was because you use ----> in the opposite way to
> > > what I have used for all my fw_devlink and cycle detection patches.
> >
> > Okay, I will follow the usage of "-->" later as yours.
> >
> > >
> > > The part I was referring to is related to how drivers/of/property.c has
> > > node_not_dev set to true for parse_remote_endpoint.
> > >
> > > > >
> > > > > 4. Has this actually caused an issue? If so, what is it? And give me
> > > > > an example in an upstream DT.
> > > >
> > > > Yes, there are two cycles (B.EP->A->C->B and B.EP->A/A.EP->B) in above
> > > > example. But only one cycle (B.EP->A->C->B) is recognized.
> > > >
> > > > My real case as below:
> > >
> > > I think you still missed some details because usb3_phy0 seems
> >
> > One line is indeed missing in usb3_phy0.
> >
> > > irrelevant here. Can you just point me to the dts (not dtsi) file for
> > > this platform in the kernel tree?
> >
> > This parts of dts is not in upstream kernel tree due to some reasons.
> > Allow me to show the necessary parts as below again, you can also
> > get the full dts file from the link I attached below:
> >
> > ---
> > ptn5110: tcpc@50 {
> >     compatible = "nxp,ptn5110";
> >     ...
> >
> >     port {
> >         typec_dr_sw: endpoint {
> >             remote-endpoint = <&usb3_drd_sw>;
> >         };
> >     };
> > };
> >
> > usb_dwc3_0: usb@38100000 {
> >     compatible = "snps,dwc3";
> >     phys = <&usb3_phy0>, <&usb3_phy0>;
> >     ...
> >
> >     port {
> >         usb3_drd_sw: endpoint {
> >             remote-endpoint = <&typec_dr_sw>;
> >         };
> >     };
> > };
> >
> > usb3_phy0: usb-phy@381f0040 {
> >     compatible = "fsl,imx8mp-usb-phy";
> >     vbus-power-supply = <&ptn5110>;
> >
> >     ...
> > };
> >
> > > Also, can you change all the pr_debug and dev_dbg in
> > > drivers/base/core.c to their info equivalent and boot up the system
> > > and give me the logs? That'll be a lot easier for me to understand
> > > your case.
> >
> > Thank you for willing to debug this issue.
> > The boot log and dts file is under:
> >
> > https://drive.google.com/drive/folders/1hlkzg042q5_b5l59DCW2pECXRmTH4Vy_?usp=sharing
> >
> > >
> > > So let's say I see your logs and what you say is true, but you still
> > > aren't telling me what's the problem you have because of this
> > > incorrect cycle detection. What's breaking? Is something not allowed
> > > to probe? If so, which one? What's supposed to be the right order of
> > > probes?
> > >
> >
> > Let me describe the issue again based on above log and dts:
> >
> >                     usb
> >                   +-----+
> >    tcpc           |     |
> >   +-----+         |  +--|
> >   |     |----------->|EP|
> >   |--+  |         |  +--|
> >   |EP|<-----------|     |
> >   |--+  |         |  B  |
> >   |     |         +-----+
> >   |  A  |            |
> >   +-----+            |
> >      ^     +-----+   |
> >      |     |     |   |
> >      +-----|  C  |<--+
> >            |     |
> >            +-----+
> >            usb-phy
> >
> > Node A (tcpc) will be populated as device 1-0050.
> > Node B (usb) will be populated as device 38100000.usb.
> > Node C (usb-phy) will be populated as device 381f0040.usb-phy.
> >
> > 1. Node C is populated as device C. But nodes A and B are still not
> >    populated. When do cycle detection with device C, no cycle is found.
> > 2. Node B is populated as device B. When do cycle detection with device
> >    B, it found a fwnode link cycle B-->C-->A-->B.EP. Then, fwnode link
> >    A-->B.EP, C-->A and B-->C are marked as cycle. The fwnode link B-->C
> >    is converted to device link too.
> > 3. Node A is populated as device A. When do cycle detection with device
> >    A, it find C-->A is marked as cycle and convert it to device link. It
> >    also find A-->B.EP is marked as cycle but will not convert it to device
> >    link since node B.EP is not a device.
> >
> > Finally, fwnode link B-->C and C-->A is removed, A-->B.EP is only marked
> > as cycle and B-->A.EP is neither been marked as cycle nor removed.
> >
> > So there are 2 cycles and only the first cycle is detected.
> > 1. B-->C-->A-->B.EP--B
> > 2. B-->A.EP--A-->B.EP--B
> >
> > In the end, device 38100000.usb (node B) is defered probe due to node B
> > still has a supplier node A.EP.
> > Device 1-0050 (node A) is also defered probe due to it depends on one device
> > which is created by 38100000.usb.
> >
> > The normal behavior is all of the devices can be successfully probed after two
> > cycles are detected.
> >
>
> This took me several hours to debug this and I almost gave you the
> "wrong" fix. A fix to create fwnode links between A --> and B --> A in
> your example and remove EPs from the loop. But when typing up the
> commit text, I realized what I was saying wasn't correct because this
> cycle detection works fine if you don't have "C" in the example. Yet
> again this bug comes down to my attempt to optimize some "unnecessary"
> cycle detection logic that ended up being necessary.
>
> Here's a test patch that I'm 99% sure will fix your issue. Please give
> it a shot and let me know. After that, I need to run some more local
> tests to make sure I'm not messing anything else up, clean up some
> redundant logging, and then I can send a proper fix upstream.
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 14d46af40f9a..75203ccc96f6 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2060,7 +2060,6 @@ static int fw_devlink_create_devlink(struct device *con,
>          * SYNC_STATE_ONLY device links don't block probing and supports cycles.
>          * So cycle detection isn't necessary and shouldn't be done.
>          */
> -       if (!(flags & DL_FLAG_SYNC_STATE_ONLY)) {
>                 device_links_write_lock();
>                 if (__fw_devlink_relax_cycles(con, sup_handle)) {
>                         __fwnode_link_cycle(link);
> @@ -2069,7 +2068,6 @@ static int fw_devlink_create_devlink(struct device *con,
>                                  sup_handle);
>                 }
>                 device_links_write_unlock();
> -       }
>
>         if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE)
>                 sup_dev = fwnode_get_next_parent_dev(sup_handle);
>

It works now. All of these devices are probed correctly on my board.
Thanks for your input!

Best Regards,
Xu Yang
  

Patch

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 14d46af40f9a..278ded6cd3ce 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2217,6 +2217,9 @@  static void __fw_devlink_link_to_suppliers(struct device *dev,
 		int ret;
 		struct fwnode_handle *sup = link->supplier;
 
+		if (fwnode_graph_is_endpoint(sup))
+			sup = fwnode_graph_get_port_parent(sup);
+
 		ret = fw_devlink_create_devlink(dev, sup, link);
 		if (!own_link || ret == -EAGAIN)
 			continue;
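
For readers unfamiliar with the graph helpers, the substitution made by
the hunk above (replace an endpoint supplier with the node that owns its
port) can be pictured with a toy tree. This is a sketch under stated
assumptions: `struct toy_node` and `toy_port_parent()` are hypothetical
names used only for illustration. The real fwnode_graph_get_port_parent()
operates on struct fwnode_handle and deals with details, such as
reference counting, that are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a firmware node; not struct fwnode_handle. */
struct toy_node {
	const char *name;
	const struct toy_node *parent;
};

/*
 * Resolve an endpoint to the node that owns its port:
 * endpoint -> port -> owner, skipping an optional "ports" grouping node.
 * Returns NULL if the endpoint has no port or the port has no owner.
 */
const struct toy_node *toy_port_parent(const struct toy_node *endpoint)
{
	const struct toy_node *port, *owner;

	assert(endpoint != NULL);

	port = endpoint->parent;
	if (port == NULL)
		return NULL;

	owner = port->parent;
	if (owner != NULL && strcmp(owner->name, "ports") == 0)
		owner = owner->parent;

	return owner;
}
```

With a tree shaped like tcpc@50/port/endpoint from the dts above, the toy
helper resolves the endpoint to the tcpc@50 node, which is the "real
node" the commit message refers to.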