[v2,2/3] of: property: Improve finding the supplier of a remote-endpoint property

Message ID 20240207011803.2637531-3-saravanak@google.com
State New
Headers
Series Improve remote-endpoint parsing |

Commit Message

Saravana Kannan Feb. 7, 2024, 1:18 a.m. UTC
  After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"),
remote-endpoint properties created a fwnode link from the consumer device
to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
trying to create device links or detecting cycles. So, improve this the
same way we improved finding the consumer of a remote-endpoint property.

Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/of/property.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
  

Comments

Saravana Kannan Feb. 24, 2024, 1:35 a.m. UTC | #1
On Fri, Feb 23, 2024 at 8:18 AM Luca Ceresoli <luca.ceresoli@bootlin.com> wrote:
>
> Hello Saravana,
>
> [+cc Hervé Codina]
>
> On Tue,  6 Feb 2024 17:18:01 -0800
> Saravana Kannan <saravanak@google.com> wrote:
>
> > After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"),
> > remote-endpoint properties created a fwnode link from the consumer device
> > to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
> > trying to create device links or detecting cycles. So, improve this the
> > same way we improved finding the consumer of a remote-endpoint property.
> >
> > Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> After rebasing my own branch on v6.8-rc5 from v6.8-rc1 I started
> getting unexpected warnings during device tree overlay removal. After a
> somewhat painful bisection I identified this patch as the one that
> triggers it all.

Thanks for the report.

>
> > ---
> > --- a/drivers/of/property.c
> > +++ b/drivers/of/property.c
> > @@ -1232,7 +1232,6 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
> >  DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
> >  DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
> >  DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> > -DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
> >  DEFINE_SIMPLE_PROP(pwms, "pwms", "#pwm-cells")
> >  DEFINE_SIMPLE_PROP(resets, "resets", "#reset-cells")
> >  DEFINE_SIMPLE_PROP(leds, "leds", NULL)
> > @@ -1298,6 +1297,17 @@ static struct device_node *parse_interrupts(struct device_node *np,
> >       return of_irq_parse_one(np, index, &sup_args) ? NULL : sup_args.np;
> >  }
> >
> > +static struct device_node *parse_remote_endpoint(struct device_node *np,
> > +                                              const char *prop_name,
> > +                                              int index)
> > +{
> > +     /* Return NULL for index > 0 to signify end of remote-endpoints. */
> > +     if (!index || strcmp(prop_name, "remote-endpoint"))
>
> There seem to be a bug here: "!index" should be "index > 0", as the
> comment suggests. Otherwise NULL is always returned.

Ah crap, I think you are right. It should have been "index". Not
"!index". But I tested this! Sigh. I probably screwed up my testing.

Please send out a Fix for this.

Geert, we got excited too soon. :(

> I am going to send a quick patch for that, but haven't done so yet
> because it still won't solve the problem, so I wanted to open the topic
> here without further delay.
>
> Even with the 'index > 0' fix I'm still getting pretty much the same:

This part is confusing though. If I read your DT correctly, there's a
cycle between platform:panel-dsi-lvds and i2c:13-002c. And fw_devlink
should not be enforcing any ordering between those devices ever.

I'm surprised that in your "working" case, fw_devlink didn't detect
any cycle. It should have. If there's any debugging to do, that's the
one we need to debug.

>
> [   34.836781] ------------[ cut here ]------------
> [   34.841401] WARNING: CPU: 2 PID: 204 at drivers/base/devres.c:1064 devm_kfree+0x8c/0xfc
> ...
> [   35.024751] Call trace:
> [   35.027199]  devm_kfree+0x8c/0xfc
> [   35.030520]  devm_drm_panel_bridge_release+0x54/0x64 [drm_kms_helper]
> [   35.036990]  devres_release_group+0xe0/0x164
> [   35.041264]  i2c_device_remove+0x38/0x9c
> [   35.045196]  device_remove+0x4c/0x80
> [   35.048774]  device_release_driver_internal+0x1d4/0x230
> [   35.054003]  device_release_driver+0x18/0x24
> [   35.058279]  bus_remove_device+0xcc/0x10c
> [   35.062292]  device_del+0x15c/0x41c
> [   35.065786]  device_unregister+0x18/0x34
> [   35.069714]  i2c_unregister_device+0x54/0x88
> [   35.073988]  of_i2c_notify+0x98/0x224
> [   35.077656]  blocking_notifier_call_chain+0x6c/0xa0
> [   35.082543]  __of_changeset_entry_notify+0x100/0x16c
> [   35.087515]  __of_changeset_revert_notify+0x44/0x78
> [   35.092398]  of_overlay_remove+0x114/0x1c4
> ...
>
> By comparing the two versions I found that before removing the overlay:
>
>  * in the "working" case (with this patch reverted) I have:
>
>    # ls /sys/class/devlink/ | grep 002c
>    platform:hpbr--i2c:13-002c
>    platform:panel-dsi-lvds--i2c:13-002c

Can you check the "status" and "sync_state_only" file in this folder
and tell me what it says?

Since these devices have a cyclic dependency between them, it should
have been something other than "not tracked" and "sync_state_only"
should be "1". But my guess is you'll see "active" and "0".

>    platform:regulator-sys-1v8--i2c:13-002c
>    regulator:regulator.31--i2c:13-002c
>    #
>
>  * in the "broken" case (v6.8-rc5 + s/!index/index > 0/ as mentioned):
>
>    # ls /sys/class/devlink/ | grep 002c
>    platform:hpbr--i2c:13-002c
>    platform:regulator-sys-1v8--i2c:13-002c
>    regulator:regulator.30--i2c:13-002c
>    #
>
> So in the latter case the panel-dsi-lvds--i2c:13-002c link is missing.
> I think it gets created but later on removed. Here's a snippet of the
> kernel log when that happens:
>
> [    9.578279] ----- cycle: start -----
> [    9.578283] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> [    9.578308] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> [    9.578329] ----- cycle: end -----
> [    9.578334] platform panel-dsi-lvds: Fixed dependency cycle(s) with /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> ...

Somewhere in this area, I'm thinking you'll also see "device:
'i2c:13-002c--platform:panel-dsi-lvds': device_add" do you not? And if
you enabled device link logs, you'll see that it was "sync state only"
link.

> [    9.590620] /panel-dsi-lvds Dropping the fwnode link to /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> ...
> [    9.597280] ----- cycle: start -----
> [    9.597283] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> [    9.602781] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> [    9.607581] ----- cycle: end -----
> [    9.607585] i2c 13-002c: Fixed dependency cycle(s) with /panel-dsi-lvds
> [    9.614217] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_add
> ...
> [    9.614277] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /panel-dsi-lvds
> [    9.614369] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /regulator-dock-sys-1v8
> ...
> [    9.739840] panel-simple panel-dsi-lvds: Dropping the link to 13-002c
> [    9.739846] device: 'i2c:13-002c--platform:panel-dsi-lvds': device_unregister

Oh yeah, see. The "device_add" I expected earlier is getting removed here.

> [   10.247037] sn65dsi83 13-002c: Dropping the link to panel-dsi-lvds
> [   10.247049] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_unregister
>
> And here's the relevant portion of my device tree overlay:
>
> --------------------8<--------------------
>

I think the eventual fix would be this series + adding a
"post-init-providers" property to the device that's supposed to probe
first and point it to the device that's supposed to probe next. Do
this at the device node level, not the endpoint level.
https://lore.kernel.org/lkml/20240221233026.2915061-1-saravanak@google.com/


-Saravana

> /dts-v1/;
> /plugin/;
>
> &{/}
> {
>         panel_dsi_lvds: panel-dsi-lvds {
>                 compatible = "auo,g133han01.1";
>
>                 ports {
>                         #address-cells = <1>;
>                         #size-cells = <0>;
>                         port@0{
>                                 reg = <0>;
>                                 dual-lvds-odd-pixels;
>                                 panel_dsi_lvds_in0: endpoint {
>                                         remote-endpoint = <&sn65dsi84_out0>;
>                                 };
>                         };
>
>                         port@1{
>                                 reg = <1>;
>                                 dual-lvds-even-pixels;
>                                 panel_dsi_lvds_in1: endpoint {
>                                         remote-endpoint = <&sn65dsi84_out1>;
>                                 };
>                         };
>                 };
>         };
> };
>
> &i2c5_ch3 {
>         dsi-lvds-bridge@2c {
>                 compatible = "ti,sn65dsi84";
>                 reg = <0x2c>;
>                 vcc-supply = <&reg_sys_1v8>;
>
>                 ports {
>                         #address-cells = <1>;
>                         #size-cells = <0>;
>
>                         port@0 {
>                                 reg = <0>;
>
>                                 sn65dsi84_from_bridge: endpoint {
>                                         remote-endpoint = <&hpbr_source>;
>                                         data-lanes = <1 2 3 4>;
>                                 };
>                         };
>                         port@2 {
>                                 reg = <2>;
>
>                                 sn65dsi84_out0: endpoint {
>                                         remote-endpoint = <&panel_dsi_lvds_in0>;
>                                 };
>                         };
>                         port@3 {
>                                 reg = <3>;
>
>                                 sn65dsi84_out1: endpoint {
>                                         remote-endpoint = <&panel_dsi_lvds_in1>;
>                                 };
>                         };
>                 };
>         };
> };
>
> --------------------8<--------------------
>
> That's all I could get at this point. Any clues for further
> investigation?
>
> Best regards,
> Luca
>
> --
> Luca Ceresoli, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
  
Rob Herring Feb. 28, 2024, 9:56 p.m. UTC | #2
On Mon, Feb 26, 2024 at 5:52 AM Luca Ceresoli <luca.ceresoli@bootlin.com> wrote:
>
> Hello Saravana,
>
> On Fri, 23 Feb 2024 17:35:24 -0800
> Saravana Kannan <saravanak@google.com> wrote:
>
> > On Fri, Feb 23, 2024 at 8:18 AM Luca Ceresoli <luca.ceresoli@bootlin.com> wrote:
> > >
> > > Hello Saravana,
> > >
> > > [+cc Hervé Codina]
> > >
> > > On Tue,  6 Feb 2024 17:18:01 -0800
> > > Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > > After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"),
> > > > remote-endpoint properties created a fwnode link from the consumer device
> > > > to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
> > > > trying to create device links or detecting cycles. So, improve this the
> > > > same way we improved finding the consumer of a remote-endpoint property.
> > > >
> > > > Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > >
> > > After rebasing my own branch on v6.8-rc5 from v6.8-rc1 I started
> > > getting unexpected warnings during device tree overlay removal. After a
> > > somewhat painful bisection I identified this patch as the one that
> > > triggers it all.
> >
> > Thanks for the report.
> >
> > >
> > > > ---
> > > > --- a/drivers/of/property.c
> > > > +++ b/drivers/of/property.c
> > > > @@ -1232,7 +1232,6 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
> > > >  DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
> > > >  DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
> > > >  DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> > > > -DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
> > > >  DEFINE_SIMPLE_PROP(pwms, "pwms", "#pwm-cells")
> > > >  DEFINE_SIMPLE_PROP(resets, "resets", "#reset-cells")
> > > >  DEFINE_SIMPLE_PROP(leds, "leds", NULL)
> > > > @@ -1298,6 +1297,17 @@ static struct device_node *parse_interrupts(struct device_node *np,
> > > >       return of_irq_parse_one(np, index, &sup_args) ? NULL : sup_args.np;
> > > >  }
> > > >
> > > > +static struct device_node *parse_remote_endpoint(struct device_node *np,
> > > > +                                              const char *prop_name,
> > > > +                                              int index)
> > > > +{
> > > > +     /* Return NULL for index > 0 to signify end of remote-endpoints. */
> > > > +     if (!index || strcmp(prop_name, "remote-endpoint"))
> > >
> > > There seem to be a bug here: "!index" should be "index > 0", as the
> > > comment suggests. Otherwise NULL is always returned.
> >
> > Ah crap, I think you are right. It should have been "index". Not
> > "!index". But I tested this! Sigh. I probably screwed up my testing.
> >
> > Please send out a Fix for this.
> >
> > Geert, we got excited too soon. :(
> >
> > > I am going to send a quick patch for that, but haven't done so yet
> > > because it still won't solve the problem, so I wanted to open the topic
> > > here without further delay.
> > >
> > > Even with the 'index > 0' fix I'm still getting pretty much the same:
> >
> > This part is confusing though. If I read your DT correctly, there's a
> > cycle between platform:panel-dsi-lvds and i2c:13-002c. And fw_devlink
> > should not be enforcing any ordering between those devices ever.
> >
> > I'm surprised that in your "working" case, fw_devlink didn't detect
> > any cycle. It should have. If there's any debugging to do, that's the
> > one we need to debug.
> >
> > >
> > > [   34.836781] ------------[ cut here ]------------
> > > [   34.841401] WARNING: CPU: 2 PID: 204 at drivers/base/devres.c:1064 devm_kfree+0x8c/0xfc
> > > ...
> > > [   35.024751] Call trace:
> > > [   35.027199]  devm_kfree+0x8c/0xfc
> > > [   35.030520]  devm_drm_panel_bridge_release+0x54/0x64 [drm_kms_helper]
> > > [   35.036990]  devres_release_group+0xe0/0x164
> > > [   35.041264]  i2c_device_remove+0x38/0x9c
> > > [   35.045196]  device_remove+0x4c/0x80
> > > [   35.048774]  device_release_driver_internal+0x1d4/0x230
> > > [   35.054003]  device_release_driver+0x18/0x24
> > > [   35.058279]  bus_remove_device+0xcc/0x10c
> > > [   35.062292]  device_del+0x15c/0x41c
> > > [   35.065786]  device_unregister+0x18/0x34
> > > [   35.069714]  i2c_unregister_device+0x54/0x88
> > > [   35.073988]  of_i2c_notify+0x98/0x224
> > > [   35.077656]  blocking_notifier_call_chain+0x6c/0xa0
> > > [   35.082543]  __of_changeset_entry_notify+0x100/0x16c
> > > [   35.087515]  __of_changeset_revert_notify+0x44/0x78
> > > [   35.092398]  of_overlay_remove+0x114/0x1c4
> > > ...
> > >
> > > By comparing the two versions I found that before removing the overlay:
> > >
> > >  * in the "working" case (with this patch reverted) I have:
> > >
> > >    # ls /sys/class/devlink/ | grep 002c
> > >    platform:hpbr--i2c:13-002c
> > >    platform:panel-dsi-lvds--i2c:13-002c
> >
> > Can you check the "status" and "sync_state_only" file in this folder
> > and tell me what it says?
> >
> > Since these devices have a cyclic dependency between them, it should
> > have been something other than "not tracked" and "sync_state_only"
> > should be "1". But my guess is you'll see "active" and "0".
> >
> > >    platform:regulator-sys-1v8--i2c:13-002c
> > >    regulator:regulator.31--i2c:13-002c
> > >    #
> > >
> > >  * in the "broken" case (v6.8-rc5 + s/!index/index > 0/ as mentioned):
> > >
> > >    # ls /sys/class/devlink/ | grep 002c
> > >    platform:hpbr--i2c:13-002c
> > >    platform:regulator-sys-1v8--i2c:13-002c
> > >    regulator:regulator.30--i2c:13-002c
> > >    #
> > >
> > > So in the latter case the panel-dsi-lvds--i2c:13-002c link is missing.
> > > I think it gets created but later on removed. Here's a snippet of the
> > > kernel log when that happens:
> > >
> > > [    9.578279] ----- cycle: start -----
> > > [    9.578283] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > [    9.578308] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > [    9.578329] ----- cycle: end -----
> > > [    9.578334] platform panel-dsi-lvds: Fixed dependency cycle(s) with /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > ...
> >
> > Somewhere in this area, I'm thinking you'll also see "device:
> > 'i2c:13-002c--platform:panel-dsi-lvds': device_add" do you not? And if
> > you enabled device link logs, you'll see that it was "sync state only"
> > link.
> >
> > > [    9.590620] /panel-dsi-lvds Dropping the fwnode link to /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > ...
> > > [    9.597280] ----- cycle: start -----
> > > [    9.597283] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > [    9.602781] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > [    9.607581] ----- cycle: end -----
> > > [    9.607585] i2c 13-002c: Fixed dependency cycle(s) with /panel-dsi-lvds
> > > [    9.614217] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_add
> > > ...
> > > [    9.614277] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /panel-dsi-lvds
> > > [    9.614369] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /regulator-dock-sys-1v8
> > > ...
> > > [    9.739840] panel-simple panel-dsi-lvds: Dropping the link to 13-002c
> > > [    9.739846] device: 'i2c:13-002c--platform:panel-dsi-lvds': device_unregister
> >
> > Oh yeah, see. The "device_add" I expected earlier is getting removed here.
> >
> > > [   10.247037] sn65dsi83 13-002c: Dropping the link to panel-dsi-lvds
> > > [   10.247049] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_unregister
> > >
> > > And here's the relevant portion of my device tree overlay:
> > >
> > > --------------------8<--------------------
> > >
> >
> > I think the eventual fix would be this series + adding a
> > "post-init-providers" property to the device that's supposed to probe
> > first and point it to the device that's supposed to probe next. Do
> > this at the device node level, not the endpoint level.
> > https://lore.kernel.org/lkml/20240221233026.2915061-1-saravanak@google.com/
>
> I'm certainly going to look at this series in more detail and at the
> debugging you asked for, however I'm afraid I won't have access to the
> hardware this week and it's not going to be a quick task anyway.
>
> So in this moment I think it's quite clear that this specific patch
> creates a regression and there is no clear fix that is reasonably
> likely to get merged before 6.8.
>
> I propose reverting this patch immediately, unless you have a better
> short-term solution.

It's just this one of the 3 patches that needs reverting?

Rob
  
Rob Herring Feb. 28, 2024, 10:01 p.m. UTC | #3
On Fri, Feb 23, 2024 at 10:18 AM Luca Ceresoli
<luca.ceresoli@bootlin.com> wrote:
>
> Hello Saravana,
>
> [+cc Hervé Codina]
>
> On Tue,  6 Feb 2024 17:18:01 -0800
> Saravana Kannan <saravanak@google.com> wrote:
>
> > After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"),
> > remote-endpoint properties created a fwnode link from the consumer device
> > to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
> > trying to create device links or detecting cycles. So, improve this the
> > same way we improved finding the consumer of a remote-endpoint property.
> >
> > Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> After rebasing my own branch on v6.8-rc5 from v6.8-rc1 I started
> getting unexpected warnings during device tree overlay removal. After a
> somewhat painful bisection I identified this patch as the one that
> triggers it all.

How are you applying the overlay?

Rob
  
Saravana Kannan Feb. 28, 2024, 11:58 p.m. UTC | #4
On Wed, Feb 28, 2024 at 1:56 PM Rob Herring <robh+dt@kernel.org> wrote:
>
> On Mon, Feb 26, 2024 at 5:52 AM Luca Ceresoli <luca.ceresoli@bootlin.com> wrote:
> >
> > Hello Saravana,
> >
> > On Fri, 23 Feb 2024 17:35:24 -0800
> > Saravana Kannan <saravanak@google.com> wrote:
> >
> > > On Fri, Feb 23, 2024 at 8:18 AM Luca Ceresoli <luca.ceresoli@bootlin.com> wrote:
> > > >
> > > > Hello Saravana,
> > > >
> > > > [+cc Hervé Codina]
> > > >
> > > > On Tue,  6 Feb 2024 17:18:01 -0800
> > > > Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > > After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"),
> > > > > remote-endpoint properties created a fwnode link from the consumer device
> > > > > to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
> > > > > trying to create device links or detecting cycles. So, improve this the
> > > > > same way we improved finding the consumer of a remote-endpoint property.
> > > > >
> > > > > Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
> > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > >
> > > > After rebasing my own branch on v6.8-rc5 from v6.8-rc1 I started
> > > > getting unexpected warnings during device tree overlay removal. After a
> > > > somewhat painful bisection I identified this patch as the one that
> > > > triggers it all.
> > >
> > > Thanks for the report.
> > >
> > > >
> > > > > ---
> > > > > --- a/drivers/of/property.c
> > > > > +++ b/drivers/of/property.c
> > > > > @@ -1232,7 +1232,6 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
> > > > >  DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
> > > > >  DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
> > > > >  DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> > > > > -DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
> > > > >  DEFINE_SIMPLE_PROP(pwms, "pwms", "#pwm-cells")
> > > > >  DEFINE_SIMPLE_PROP(resets, "resets", "#reset-cells")
> > > > >  DEFINE_SIMPLE_PROP(leds, "leds", NULL)
> > > > > @@ -1298,6 +1297,17 @@ static struct device_node *parse_interrupts(struct device_node *np,
> > > > >       return of_irq_parse_one(np, index, &sup_args) ? NULL : sup_args.np;
> > > > >  }
> > > > >
> > > > > +static struct device_node *parse_remote_endpoint(struct device_node *np,
> > > > > +                                              const char *prop_name,
> > > > > +                                              int index)
> > > > > +{
> > > > > +     /* Return NULL for index > 0 to signify end of remote-endpoints. */
> > > > > +     if (!index || strcmp(prop_name, "remote-endpoint"))
> > > >
> > > > There seem to be a bug here: "!index" should be "index > 0", as the
> > > > comment suggests. Otherwise NULL is always returned.
> > >
> > > Ah crap, I think you are right. It should have been "index". Not
> > > "!index". But I tested this! Sigh. I probably screwed up my testing.
> > >
> > > Please send out a Fix for this.
> > >
> > > Geert, we got excited too soon. :(
> > >
> > > > I am going to send a quick patch for that, but haven't done so yet
> > > > because it still won't solve the problem, so I wanted to open the topic
> > > > here without further delay.
> > > >
> > > > Even with the 'index > 0' fix I'm still getting pretty much the same:
> > >
> > > This part is confusing though. If I read your DT correctly, there's a
> > > cycle between platform:panel-dsi-lvds and i2c:13-002c. And fw_devlink
> > > should not be enforcing any ordering between those devices ever.
> > >
> > > I'm surprised that in your "working" case, fw_devlink didn't detect
> > > any cycle. It should have. If there's any debugging to do, that's the
> > > one we need to debug.
> > >
> > > >
> > > > [   34.836781] ------------[ cut here ]------------
> > > > [   34.841401] WARNING: CPU: 2 PID: 204 at drivers/base/devres.c:1064 devm_kfree+0x8c/0xfc
> > > > ...
> > > > [   35.024751] Call trace:
> > > > [   35.027199]  devm_kfree+0x8c/0xfc
> > > > [   35.030520]  devm_drm_panel_bridge_release+0x54/0x64 [drm_kms_helper]
> > > > [   35.036990]  devres_release_group+0xe0/0x164
> > > > [   35.041264]  i2c_device_remove+0x38/0x9c
> > > > [   35.045196]  device_remove+0x4c/0x80
> > > > [   35.048774]  device_release_driver_internal+0x1d4/0x230
> > > > [   35.054003]  device_release_driver+0x18/0x24
> > > > [   35.058279]  bus_remove_device+0xcc/0x10c
> > > > [   35.062292]  device_del+0x15c/0x41c
> > > > [   35.065786]  device_unregister+0x18/0x34
> > > > [   35.069714]  i2c_unregister_device+0x54/0x88
> > > > [   35.073988]  of_i2c_notify+0x98/0x224
> > > > [   35.077656]  blocking_notifier_call_chain+0x6c/0xa0
> > > > [   35.082543]  __of_changeset_entry_notify+0x100/0x16c
> > > > [   35.087515]  __of_changeset_revert_notify+0x44/0x78
> > > > [   35.092398]  of_overlay_remove+0x114/0x1c4
> > > > ...
> > > >
> > > > By comparing the two versions I found that before removing the overlay:
> > > >
> > > >  * in the "working" case (with this patch reverted) I have:
> > > >
> > > >    # ls /sys/class/devlink/ | grep 002c
> > > >    platform:hpbr--i2c:13-002c
> > > >    platform:panel-dsi-lvds--i2c:13-002c
> > >
> > > Can you check the "status" and "sync_state_only" file in this folder
> > > and tell me what it says?
> > >
> > > Since these devices have a cyclic dependency between them, it should
> > > have been something other than "not tracked" and "sync_state_only"
> > > should be "1". But my guess is you'll see "active" and "0".
> > >
> > > >    platform:regulator-sys-1v8--i2c:13-002c
> > > >    regulator:regulator.31--i2c:13-002c
> > > >    #
> > > >
> > > >  * in the "broken" case (v6.8-rc5 + s/!index/index > 0/ as mentioned):
> > > >
> > > >    # ls /sys/class/devlink/ | grep 002c
> > > >    platform:hpbr--i2c:13-002c
> > > >    platform:regulator-sys-1v8--i2c:13-002c
> > > >    regulator:regulator.30--i2c:13-002c
> > > >    #
> > > >
> > > > So in the latter case the panel-dsi-lvds--i2c:13-002c link is missing.
> > > > I think it gets created but later on removed. Here's a snippet of the
> > > > kernel log when that happens:
> > > >
> > > > [    9.578279] ----- cycle: start -----
> > > > [    9.578283] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > > [    9.578308] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > [    9.578329] ----- cycle: end -----
> > > > [    9.578334] platform panel-dsi-lvds: Fixed dependency cycle(s) with /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > ...
> > >
> > > Somewhere in this area, I'm thinking you'll also see "device:
> > > 'i2c:13-002c--platform:panel-dsi-lvds': device_add" do you not? And if
> > > you enabled device link logs, you'll see that it was "sync state only"
> > > link.
> > >
> > > > [    9.590620] /panel-dsi-lvds Dropping the fwnode link to /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > ...
> > > > [    9.597280] ----- cycle: start -----
> > > > [    9.597283] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > [    9.602781] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > > [    9.607581] ----- cycle: end -----
> > > > [    9.607585] i2c 13-002c: Fixed dependency cycle(s) with /panel-dsi-lvds
> > > > [    9.614217] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_add
> > > > ...
> > > > [    9.614277] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /panel-dsi-lvds
> > > > [    9.614369] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /regulator-dock-sys-1v8
> > > > ...
> > > > [    9.739840] panel-simple panel-dsi-lvds: Dropping the link to 13-002c
> > > > [    9.739846] device: 'i2c:13-002c--platform:panel-dsi-lvds': device_unregister
> > >
> > > Oh yeah, see. The "device_add" I expected earlier is getting removed here.
> > >
> > > > [   10.247037] sn65dsi83 13-002c: Dropping the link to panel-dsi-lvds
> > > > [   10.247049] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_unregister
> > > >
> > > > And here's the relevant portion of my device tree overlay:
> > > >
> > > > --------------------8<--------------------
> > > >
> > >
> > > I think the eventual fix would be this series + adding a
> > > "post-init-providers" property to the device that's supposed to probe
> > > first and point it to the device that's supposed to probe next. Do
> > > this at the device node level, not the endpoint level.
> > > https://lore.kernel.org/lkml/20240221233026.2915061-1-saravanak@google.com/
> >
> > I'm certainly going to look at this series in more detail and at the
> > debugging you asked for, however I'm afraid I won't have access to the
> > hardware this week and it's not going to be a quick task anyway.
> >
> > So in this moment I think it's quite clear that this specific patch
> > creates a regression and there is no clear fix that is reasonably
> > likely to get merged before 6.8.
> >
> > I propose reverting this patch immediately, unless you have a better
> > short-term solution.
>
> It's just this one of the 3 patches that needs reverting?

I sent a fix. With the fix, it's just exposing a bug elsewhere.

-Saravana
  
Rob Herring Feb. 29, 2024, 12:26 a.m. UTC | #5
On Wed, Feb 28, 2024 at 5:58 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Wed, Feb 28, 2024 at 1:56 PM Rob Herring <robh+dt@kernel.org> wrote:
> >
> > On Mon, Feb 26, 2024 at 5:52 AM Luca Ceresoli <luca.ceresoli@bootlin.com> wrote:
> > >
> > > Hello Saravana,
> > >
> > > On Fri, 23 Feb 2024 17:35:24 -0800
> > > Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > > On Fri, Feb 23, 2024 at 8:18 AM Luca Ceresoli <luca.ceresoli@bootlin.com> wrote:
> > > > >
> > > > > Hello Saravana,
> > > > >
> > > > > [+cc Hervé Codina]
> > > > >
> > > > > On Tue,  6 Feb 2024 17:18:01 -0800
> > > > > Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > > After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"),
> > > > > > remote-endpoint properties created a fwnode link from the consumer device
> > > > > > to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
> > > > > > trying to create device links or detecting cycles. So, improve this the
> > > > > > same way we improved finding the consumer of a remote-endpoint property.
> > > > > >
> > > > > > Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
> > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > >
> > > > > After rebasing my own branch on v6.8-rc5 from v6.8-rc1 I started
> > > > > getting unexpected warnings during device tree overlay removal. After a
> > > > > somewhat painful bisection I identified this patch as the one that
> > > > > triggers it all.
> > > >
> > > > Thanks for the report.
> > > >
> > > > >
> > > > > > ---
> > > > > > --- a/drivers/of/property.c
> > > > > > +++ b/drivers/of/property.c
> > > > > > @@ -1232,7 +1232,6 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
> > > > > >  DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
> > > > > >  DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
> > > > > >  DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> > > > > > -DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
> > > > > >  DEFINE_SIMPLE_PROP(pwms, "pwms", "#pwm-cells")
> > > > > >  DEFINE_SIMPLE_PROP(resets, "resets", "#reset-cells")
> > > > > >  DEFINE_SIMPLE_PROP(leds, "leds", NULL)
> > > > > > @@ -1298,6 +1297,17 @@ static struct device_node *parse_interrupts(struct device_node *np,
> > > > > >       return of_irq_parse_one(np, index, &sup_args) ? NULL : sup_args.np;
> > > > > >  }
> > > > > >
> > > > > > +static struct device_node *parse_remote_endpoint(struct device_node *np,
> > > > > > +                                              const char *prop_name,
> > > > > > +                                              int index)
> > > > > > +{
> > > > > > +     /* Return NULL for index > 0 to signify end of remote-endpoints. */
> > > > > > +     if (!index || strcmp(prop_name, "remote-endpoint"))
> > > > >
> > > > > There seem to be a bug here: "!index" should be "index > 0", as the
> > > > > comment suggests. Otherwise NULL is always returned.
> > > >
> > > > Ah crap, I think you are right. It should have been "index". Not
> > > > "!index". But I tested this! Sigh. I probably screwed up my testing.
> > > >
> > > > Please send out a Fix for this.
> > > >
> > > > Geert, we got excited too soon. :(
> > > >
> > > > > I am going to send a quick patch for that, but haven't done so yet
> > > > > because it still won't solve the problem, so I wanted to open the topic
> > > > > here without further delay.
> > > > >
> > > > > Even with the 'index > 0' fix I'm still getting pretty much the same:
> > > >
> > > > This part is confusing though. If I read your DT correctly, there's a
> > > > cycle between platform:panel-dsi-lvds and i2c:13-002c. And fw_devlink
> > > > should not be enforcing any ordering between those devices ever.
> > > >
> > > > I'm surprised that in your "working" case, fw_devlink didn't detect
> > > > any cycle. It should have. If there's any debugging to do, that's the
> > > > one we need to debug.
> > > >
> > > > >
> > > > > [   34.836781] ------------[ cut here ]------------
> > > > > [   34.841401] WARNING: CPU: 2 PID: 204 at drivers/base/devres.c:1064 devm_kfree+0x8c/0xfc
> > > > > ...
> > > > > [   35.024751] Call trace:
> > > > > [   35.027199]  devm_kfree+0x8c/0xfc
> > > > > [   35.030520]  devm_drm_panel_bridge_release+0x54/0x64 [drm_kms_helper]
> > > > > [   35.036990]  devres_release_group+0xe0/0x164
> > > > > [   35.041264]  i2c_device_remove+0x38/0x9c
> > > > > [   35.045196]  device_remove+0x4c/0x80
> > > > > [   35.048774]  device_release_driver_internal+0x1d4/0x230
> > > > > [   35.054003]  device_release_driver+0x18/0x24
> > > > > [   35.058279]  bus_remove_device+0xcc/0x10c
> > > > > [   35.062292]  device_del+0x15c/0x41c
> > > > > [   35.065786]  device_unregister+0x18/0x34
> > > > > [   35.069714]  i2c_unregister_device+0x54/0x88
> > > > > [   35.073988]  of_i2c_notify+0x98/0x224
> > > > > [   35.077656]  blocking_notifier_call_chain+0x6c/0xa0
> > > > > [   35.082543]  __of_changeset_entry_notify+0x100/0x16c
> > > > > [   35.087515]  __of_changeset_revert_notify+0x44/0x78
> > > > > [   35.092398]  of_overlay_remove+0x114/0x1c4
> > > > > ...
> > > > >
> > > > > By comparing the two versions I found that before removing the overlay:
> > > > >
> > > > >  * in the "working" case (with this patch reverted) I have:
> > > > >
> > > > >    # ls /sys/class/devlink/ | grep 002c
> > > > >    platform:hpbr--i2c:13-002c
> > > > >    platform:panel-dsi-lvds--i2c:13-002c
> > > >
> > > > Can you check the "status" and "sync_state_only" file in this folder
> > > > and tell me what it says?
> > > >
> > > > Since these devices have a cyclic dependency between them, it should
> > > > have been something other than "not tracked" and "sync_state_only"
> > > > should be "1". But my guess is you'll see "active" and "0".
> > > >
> > > > >    platform:regulator-sys-1v8--i2c:13-002c
> > > > >    regulator:regulator.31--i2c:13-002c
> > > > >    #
> > > > >
> > > > >  * in the "broken" case (v6.8-rc5 + s/!index/index > 0/ as mentioned):
> > > > >
> > > > >    # ls /sys/class/devlink/ | grep 002c
> > > > >    platform:hpbr--i2c:13-002c
> > > > >    platform:regulator-sys-1v8--i2c:13-002c
> > > > >    regulator:regulator.30--i2c:13-002c
> > > > >    #
> > > > >
> > > > > So in the latter case the panel-dsi-lvds--i2c:13-002c link is missing.
> > > > > I think it gets created but later on removed. Here's a snippet of the
> > > > > kernel log when that happens:
> > > > >
> > > > > [    9.578279] ----- cycle: start -----
> > > > > [    9.578283] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > > > [    9.578308] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > > [    9.578329] ----- cycle: end -----
> > > > > [    9.578334] platform panel-dsi-lvds: Fixed dependency cycle(s) with /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > > ...
> > > >
> > > > Somewhere in this area, I'm thinking you'll also see "device:
> > > > 'i2c:13-002c--platform:panel-dsi-lvds': device_add" do you not? And if
> > > > you enabled device link logs, you'll see that it was "sync state only"
> > > > link.
> > > >
> > > > > [    9.590620] /panel-dsi-lvds Dropping the fwnode link to /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > > ...
> > > > > [    9.597280] ----- cycle: start -----
> > > > > [    9.597283] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > > [    9.602781] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > > > [    9.607581] ----- cycle: end -----
> > > > > [    9.607585] i2c 13-002c: Fixed dependency cycle(s) with /panel-dsi-lvds
> > > > > [    9.614217] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_add
> > > > > ...
> > > > > [    9.614277] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /panel-dsi-lvds
> > > > > [    9.614369] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /regulator-dock-sys-1v8
> > > > > ...
> > > > > [    9.739840] panel-simple panel-dsi-lvds: Dropping the link to 13-002c
> > > > > [    9.739846] device: 'i2c:13-002c--platform:panel-dsi-lvds': device_unregister
> > > >
> > > > Oh yeah, see. The "device_add" I expected earlier is getting removed here.
> > > >
> > > > > [   10.247037] sn65dsi83 13-002c: Dropping the link to panel-dsi-lvds
> > > > > [   10.247049] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_unregister
> > > > >
> > > > > And here's the relevant portion of my device tree overlay:
> > > > >
> > > > > --------------------8<--------------------
> > > > >
> > > >
> > > > I think the eventual fix would be this series + adding a
> > > > "post-init-providers" property to the device that's supposed to probe
> > > > first and point it to the device that's supposed to probe next. Do
> > > > this at the device node level, not the endpoint level.
> > > > https://lore.kernel.org/lkml/20240221233026.2915061-1-saravanak@google.com/
> > >
> > > I'm certainly going to look at this series in more detail and at the
> > > debugging you asked for, however I'm afraid I won't have access to the
> > > hardware this week and it's not going to be a quick task anyway.
> > >
> > > So in this moment I think it's quite clear that this specific patch
> > > creates a regression and there is no clear fix that is reasonably
> > > likely to get merged before 6.8.
> > >
> > > I propose reverting this patch immediately, unless you have a better
> > > short-term solution.
> >
> > It's just this one of the 3 patches that needs reverting?
>
> I sent a fix. With the fix, it's just exposing a bug elsewhere.

That's not telling me what to do...

You say apply the fix. Luca says revert. I say I wish I made this 6.9
material. Which is it?

If the overlay applying depends on out of tree code (likely as there
are limited ways to apply an overlay in mainline), then I don't really
care if there is still a regression.

Rob
  
Rob Herring Feb. 29, 2024, 10:10 p.m. UTC | #6
On Thu, Feb 29, 2024 at 3:34 AM Luca Ceresoli <luca.ceresoli@bootlin.com> wrote:
>
> Hi Rob, Saravana,
>
> On Wed, 28 Feb 2024 18:26:36 -0600
> Rob Herring <robh+dt@kernel.org> wrote:
>
> > On Wed, Feb 28, 2024 at 5:58 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > On Wed, Feb 28, 2024 at 1:56 PM Rob Herring <robh+dt@kernel.org> wrote:
> > > >
> > > > On Mon, Feb 26, 2024 at 5:52 AM Luca Ceresoli <luca.ceresoli@bootlin.com> wrote:
> > > > >
> > > > > Hello Saravana,
> > > > >
> > > > > On Fri, 23 Feb 2024 17:35:24 -0800
> > > > > Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > > On Fri, Feb 23, 2024 at 8:18 AM Luca Ceresoli <luca.ceresoli@bootlin.com> wrote:
> > > > > > >
> > > > > > > Hello Saravana,
> > > > > > >
> > > > > > > [+cc Hervé Codina]
> > > > > > >
> > > > > > > On Tue,  6 Feb 2024 17:18:01 -0800
> > > > > > > Saravana Kannan <saravanak@google.com> wrote:
> > > > > > >
> > > > > > > > After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"),
> > > > > > > > remote-endpoint properties created a fwnode link from the consumer device
> > > > > > > > to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
> > > > > > > > trying to create device links or detecting cycles. So, improve this the
> > > > > > > > same way we improved finding the consumer of a remote-endpoint property.
> > > > > > > >
> > > > > > > > Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
> > > > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > >
> > > > > > > After rebasing my own branch on v6.8-rc5 from v6.8-rc1 I started
> > > > > > > getting unexpected warnings during device tree overlay removal. After a
> > > > > > > somewhat painful bisection I identified this patch as the one that
> > > > > > > triggers it all.
> > > > > >
> > > > > > Thanks for the report.
> > > > > >
> > > > > > >
> > > > > > > > ---
> > > > > > > > --- a/drivers/of/property.c
> > > > > > > > +++ b/drivers/of/property.c
> > > > > > > > @@ -1232,7 +1232,6 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
> > > > > > > >  DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
> > > > > > > >  DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
> > > > > > > >  DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> > > > > > > > -DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
> > > > > > > >  DEFINE_SIMPLE_PROP(pwms, "pwms", "#pwm-cells")
> > > > > > > >  DEFINE_SIMPLE_PROP(resets, "resets", "#reset-cells")
> > > > > > > >  DEFINE_SIMPLE_PROP(leds, "leds", NULL)
> > > > > > > > @@ -1298,6 +1297,17 @@ static struct device_node *parse_interrupts(struct device_node *np,
> > > > > > > >       return of_irq_parse_one(np, index, &sup_args) ? NULL : sup_args.np;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static struct device_node *parse_remote_endpoint(struct device_node *np,
> > > > > > > > +                                              const char *prop_name,
> > > > > > > > +                                              int index)
> > > > > > > > +{
> > > > > > > > +     /* Return NULL for index > 0 to signify end of remote-endpoints. */
> > > > > > > > +     if (!index || strcmp(prop_name, "remote-endpoint"))
> > > > > > >
> > > > > > > There seem to be a bug here: "!index" should be "index > 0", as the
> > > > > > > comment suggests. Otherwise NULL is always returned.
> > > > > >
> > > > > > Ah crap, I think you are right. It should have been "index". Not
> > > > > > "!index". But I tested this! Sigh. I probably screwed up my testing.
> > > > > >
> > > > > > Please send out a Fix for this.
> > > > > >
> > > > > > Geert, we got excited too soon. :(
> > > > > >
> > > > > > > I am going to send a quick patch for that, but haven't done so yet
> > > > > > > because it still won't solve the problem, so I wanted to open the topic
> > > > > > > here without further delay.
> > > > > > >
> > > > > > > Even with the 'index > 0' fix I'm still getting pretty much the same:
> > > > > >
> > > > > > This part is confusing though. If I read your DT correctly, there's a
> > > > > > cycle between platform:panel-dsi-lvds and i2c:13-002c. And fw_devlink
> > > > > > should not be enforcing any ordering between those devices ever.
> > > > > >
> > > > > > I'm surprised that in your "working" case, fw_devlink didn't detect
> > > > > > any cycle. It should have. If there's any debugging to do, that's the
> > > > > > one we need to debug.
> > > > > >
> > > > > > >
> > > > > > > [   34.836781] ------------[ cut here ]------------
> > > > > > > [   34.841401] WARNING: CPU: 2 PID: 204 at drivers/base/devres.c:1064 devm_kfree+0x8c/0xfc
> > > > > > > ...
> > > > > > > [   35.024751] Call trace:
> > > > > > > [   35.027199]  devm_kfree+0x8c/0xfc
> > > > > > > [   35.030520]  devm_drm_panel_bridge_release+0x54/0x64 [drm_kms_helper]
> > > > > > > [   35.036990]  devres_release_group+0xe0/0x164
> > > > > > > [   35.041264]  i2c_device_remove+0x38/0x9c
> > > > > > > [   35.045196]  device_remove+0x4c/0x80
> > > > > > > [   35.048774]  device_release_driver_internal+0x1d4/0x230
> > > > > > > [   35.054003]  device_release_driver+0x18/0x24
> > > > > > > [   35.058279]  bus_remove_device+0xcc/0x10c
> > > > > > > [   35.062292]  device_del+0x15c/0x41c
> > > > > > > [   35.065786]  device_unregister+0x18/0x34
> > > > > > > [   35.069714]  i2c_unregister_device+0x54/0x88
> > > > > > > [   35.073988]  of_i2c_notify+0x98/0x224
> > > > > > > [   35.077656]  blocking_notifier_call_chain+0x6c/0xa0
> > > > > > > [   35.082543]  __of_changeset_entry_notify+0x100/0x16c
> > > > > > > [   35.087515]  __of_changeset_revert_notify+0x44/0x78
> > > > > > > [   35.092398]  of_overlay_remove+0x114/0x1c4
> > > > > > > ...
> > > > > > >
> > > > > > > By comparing the two versions I found that before removing the overlay:
> > > > > > >
> > > > > > >  * in the "working" case (with this patch reverted) I have:
> > > > > > >
> > > > > > >    # ls /sys/class/devlink/ | grep 002c
> > > > > > >    platform:hpbr--i2c:13-002c
> > > > > > >    platform:panel-dsi-lvds--i2c:13-002c
> > > > > >
> > > > > > Can you check the "status" and "sync_state_only" file in this folder
> > > > > > and tell me what it says?
> > > > > >
> > > > > > Since these devices have a cyclic dependency between them, it should
> > > > > > have been something other than "not tracked" and "sync_state_only"
> > > > > > should be "1". But my guess is you'll see "active" and "0".
> > > > > >
> > > > > > >    platform:regulator-sys-1v8--i2c:13-002c
> > > > > > >    regulator:regulator.31--i2c:13-002c
> > > > > > >    #
> > > > > > >
> > > > > > >  * in the "broken" case (v6.8-rc5 + s/!index/index > 0/ as mentioned):
> > > > > > >
> > > > > > >    # ls /sys/class/devlink/ | grep 002c
> > > > > > >    platform:hpbr--i2c:13-002c
> > > > > > >    platform:regulator-sys-1v8--i2c:13-002c
> > > > > > >    regulator:regulator.30--i2c:13-002c
> > > > > > >    #
> > > > > > >
> > > > > > > So in the latter case the panel-dsi-lvds--i2c:13-002c link is missing.
> > > > > > > I think it gets created but later on removed. Here's a snippet of the
> > > > > > > kernel log when that happens:
> > > > > > >
> > > > > > > [    9.578279] ----- cycle: start -----
> > > > > > > [    9.578283] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > > > > > [    9.578308] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > > > > [    9.578329] ----- cycle: end -----
> > > > > > > [    9.578334] platform panel-dsi-lvds: Fixed dependency cycle(s) with /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > > > > ...
> > > > > >
> > > > > > Somewhere in this area, I'm thinking you'll also see "device:
> > > > > > 'i2c:13-002c--platform:panel-dsi-lvds': device_add" do you not? And if
> > > > > > you enabled device link logs, you'll see that it was "sync state only"
> > > > > > link.
> > > > > >
> > > > > > > [    9.590620] /panel-dsi-lvds Dropping the fwnode link to /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > > > > ...
> > > > > > > [    9.597280] ----- cycle: start -----
> > > > > > > [    9.597283] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > > > > [    9.602781] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > > > > > [    9.607581] ----- cycle: end -----
> > > > > > > [    9.607585] i2c 13-002c: Fixed dependency cycle(s) with /panel-dsi-lvds
> > > > > > > [    9.614217] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_add
> > > > > > > ...
> > > > > > > [    9.614277] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /panel-dsi-lvds
> > > > > > > [    9.614369] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /regulator-dock-sys-1v8
> > > > > > > ...
> > > > > > > [    9.739840] panel-simple panel-dsi-lvds: Dropping the link to 13-002c
> > > > > > > [    9.739846] device: 'i2c:13-002c--platform:panel-dsi-lvds': device_unregister
> > > > > >
> > > > > > Oh yeah, see. The "device_add" I expected earlier is getting removed here.
> > > > > >
> > > > > > > [   10.247037] sn65dsi83 13-002c: Dropping the link to panel-dsi-lvds
> > > > > > > [   10.247049] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_unregister
> > > > > > >
> > > > > > > And here's the relevant portion of my device tree overlay:
> > > > > > >
> > > > > > > --------------------8<--------------------
> > > > > > >
> > > > > >
> > > > > > I think the eventual fix would be this series + adding a
> > > > > > "post-init-providers" property to the device that's supposed to probe
> > > > > > first and point it to the device that's supposed to probe next. Do
> > > > > > this at the device node level, not the endpoint level.
> > > > > > https://lore.kernel.org/lkml/20240221233026.2915061-1-saravanak@google.com/
> > > > >
> > > > > I'm certainly going to look at this series in more detail and at the
> > > > > debugging you asked for, however I'm afraid I won't have access to the
> > > > > hardware this week and it's not going to be a quick task anyway.
> > > > >
> > > > > So in this moment I think it's quite clear that this specific patch
> > > > > creates a regression and there is no clear fix that is reasonably
> > > > > likely to get merged before 6.8.
> > > > >
> > > > > I propose reverting this patch immediately, unless you have a better
> > > > > short-term solution.
> > > >
> > > > It's just this one of the 3 patches that needs reverting?
>
> Just this patch. I reverted only this and the issue disappeared.
>
> > > I sent a fix. With the fix, it's just exposing a bug elsewhere.
>
> Exactly, this patch has two issues and only the easy one has a fix [0]
> currently as far as I know.
>
> > You say apply the fix. Luca says revert. I say I wish I made this 6.9
> > material. Which is it?
> >
> > If the overlay applying depends on out of tree code (likely as there
> > are limited ways to apply an overlay in mainline), then I don't really
> > care if there is still a regression.
>
> Obviously, to load and unload the overlays I'm using code not yet
> in mainline. It is using of_overlay_fdt_apply() and of_overlay_remove()
> via a driver underdevelopment that is similar to the one Hervé and
> Lizhi Hou are working on [1][2].
>
> I see the point that "we are not breaking existing use cases as no code
> is (un)loading overlays except unittest", sure.
>
> As I see it, we have a feature in the kernel that is not used, but it
> will be, eventually: there are use cases, development is progressing and
> patches are being sent actively. My opinion is that we should not
> put additional known obstacles that will make it even harder than it
> already is.

Well, I don't care to do extra work of applying things and then have
to turn right around fix or revert them. It happens enough as-is with
just mainline. And no one wants to step up and fix the problems with
overlays, but are fine just carrying their out of tree patches. What's
one more. This is the 2nd case of overlay problems with out of tree
users *today*! Some days I'm tempted to just remove overlay support
altogether given the only way to apply them is unittest.

Given Geert is having issues too, I guess I'm going to revert.

Rob
  
Luca Ceresoli March 1, 2024, 9:59 a.m. UTC | #7
Hello Rob,

On Thu, 29 Feb 2024 16:10:38 -0600
Rob Herring <robh+dt@kernel.org> wrote:
[...]
> > > > > It's just this one of the 3 patches that needs reverting?  
> >
> > Just this patch. I reverted only this and the issue disappeared.
> >  
> > > > I sent a fix. With the fix, it's just exposing a bug elsewhere.  
> >
> > Exactly, this patch has two issues and only the easy one has a fix [0]
> > currently as far as I know.
> >  
> > > You say apply the fix. Luca says revert. I say I wish I made this 6.9
> > > material. Which is it?
> > >
> > > If the overlay applying depends on out of tree code (likely as there
> > > are limited ways to apply an overlay in mainline), then I don't really
> > > care if there is still a regression.  
> >
> > Obviously, to load and unload the overlays I'm using code not yet
> > in mainline. It is using of_overlay_fdt_apply() and of_overlay_remove()
> > via a driver underdevelopment that is similar to the one Hervé and
> > Lizhi Hou are working on [1][2].
> >
> > I see the point that "we are not breaking existing use cases as no code
> > is (un)loading overlays except unittest", sure.
> >
> > As I see it, we have a feature in the kernel that is not used, but it
> > will be, eventually: there are use cases, development is progressing and
> > patches are being sent actively. My opinion is that we should not
> > put additional known obstacles that will make it even harder than it
> > already is.  
> 
> Well, I don't care to do extra work of applying things and then have
> to turn right around fix or revert them. It happens enough as-is with
> just mainline. And no one wants to step up and fix the problems with
> overlays, but are fine just carrying their out of tree patches. What's
> one more. This is the 2nd case of overlay problems with out of tree
> users *today*! Some days I'm tempted to just remove overlay support
> altogether given the only way to apply them is unittest.

Thanks for taking time to understand the situation.

Just to clarify my position: together with Hervé we are not just
carrying out of tree code, we are actively developing code that uses
overlay load/unload at runtime and we will send it to mainline as soon
as it is ready.

As part of this process, Hervé has already sent patches to fix various
problems that happen when overlays are loaded and especially unloaded:
https://lore.kernel.org/all/20240229105204.720717-1-herve.codina@bootlin.com/
https://lore.kernel.org/all/20240227113426.253232-1-herve.codina@bootlin.com/
https://lore.kernel.org/all/20240220133950.138452-1-herve.codina@bootlin.com/
https://lore.kernel.org/all/20240220111044.133776-1-herve.codina@bootlin.com/

Best regards,
Luca
  

Patch

diff --git a/drivers/of/property.c b/drivers/of/property.c
index da70aaa62ca3..7bb2d8e290de 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1232,7 +1232,6 @@  DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
 DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
 DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
 DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
-DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
 DEFINE_SIMPLE_PROP(pwms, "pwms", "#pwm-cells")
 DEFINE_SIMPLE_PROP(resets, "resets", "#reset-cells")
 DEFINE_SIMPLE_PROP(leds, "leds", NULL)
@@ -1298,6 +1297,17 @@  static struct device_node *parse_interrupts(struct device_node *np,
 	return of_irq_parse_one(np, index, &sup_args) ? NULL : sup_args.np;
 }
 
+static struct device_node *parse_remote_endpoint(struct device_node *np,
+						 const char *prop_name,
+						 int index)
+{
+	/* Return NULL for index > 0 to signify end of remote-endpoints. */
+	if (!index || strcmp(prop_name, "remote-endpoint"))
+		return NULL;
+
+	return of_graph_get_remote_port_parent(np);
+}
+
 static const struct supplier_bindings of_supplier_bindings[] = {
 	{ .parse_prop = parse_clocks, },
 	{ .parse_prop = parse_interconnects, },