[v2,1/2] clk: Warn and add workaround on misuse of .parent_data with .name only

Message ID 20230131160829.23369-1-ansuelsmth@gmail.com
State New

Commit Message

Christian Marangi Jan. 31, 2023, 4:08 p.m. UTC
A simple mistake in a .parent_names to .parent_data conversion revealed
that the clk core assumes .fw_name is always provided in the parent_data
struct for each parent, and never falls back to .name to get the parent
name even if it is declared.

This is caused by clk_core_get(), which only checks the parent .fw_name
and doesn't handle .name.

While it's sane to ask the developer to do the conversion correctly and
add both .fw_name and .name to a parent_data struct, it's not sane to
silently drop parents without a warning.

Fix this in two ways: add a kernel warning when a wrong implementation is
used, and copy .name into .fw_name in the parent map populate function to
work around the resulting clk problems and malfunctions.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
 drivers/clk/clk.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
  

Comments

Christian Marangi Feb. 10, 2023, 6:34 p.m. UTC | #1
On Fri, Feb 10, 2023 at 04:40:29PM -0800, Stephen Boyd wrote:
> Quoting Christian Marangi (2023-01-31 08:08:28)
> > By a simple mistake in a .parent_names to .parent_data conversion it was
> > found that clk core assume fw_name is always provided with a parent_data
> > struct for each parent and never fallback to .name to get parent name even
> > if declared.
> 
> It sounds like you have clk_parent_data and the .index member is 0? Can
> you show an example structure? I'm guessing it is like this:
> 
> 	struct clk_parent_data pdata = { .name = "global_name" };
>

An example of this problem and the related fix is in commit
35dc8e101a8e08f69f4725839b98ec0f11a8e2d3

Your example is also fine; this patch is meant to handle exactly that
kind of case.

> > 
> > This is caused by clk_core_get that only checks for parent .fw_name and
> > doesn't handle .name.
> 
> clk_core_get() is not supposed to operate on the .name member. It is a
> firmware based lookup with clkdev as a fallback because clkdev is a
> pseudo-firmware interface to assign a name to a clk when some device
> pointer is used in conjunction with it.
> 

And that is exactly the problem. We currently permit a configuration
with .name but no .fw_name. In that case a developer may think the
configuration is valid, but in reality the clk is silently ignored/not
found, which causes problems when selecting a parent.

It took several hours to discover this, and it seems to me an error
anybody could make, since nowhere is it specified that such a
parent_data configuration is illegal.

> > 
> > While it's sane to request the dev to correctly do the conversion and
> > add both .fw_name and .name in a parent_data struct, it's not sane to
> > silently drop parents without a warning.
> 
> I suppose we can do
> 
> 	WARN(parent->index >= 0 && !parent_data[i].fw_name && parent_data[i].name, ...);
> 
> or maybe better would be to make the clk registration fail if there's a
> .name field and the index is non-negative and the fw_name is NULL.
> 
> Can you grep the code and see if anyone is assigning a .name without a
> .fw_name or .index?
> 

I can check and have some fun with a good regex.

Rejecting registration may be an option, but consider that this may
cause some devices to not boot at all if the mistake is made in a core
clock driver such as a gcc driver.

What I would love is a way to cause a compilation error, but I don't
think that is doable with a C macro?

> > 
> > Fix this in 2 ways. Add a kernel warning when a wrong implementation is
> > used and copy .name in .fw_name in parent map populate function to
> > handle clk problems and malfunctions.
> 
> We shouldn't be copying .name to .fw_name. They're different things.

The idea here was that, in theory, the global name should not be that
different from fw_name. But I understand this can have dramatic side
effects, so I agree that we should only WARN that there is something
wrong.

I hope this explanation makes it clearer what this patch is trying to
achieve. The referenced commit should make the problem clear.
  
Stephen Boyd Feb. 11, 2023, 12:40 a.m. UTC | #2
Quoting Christian Marangi (2023-01-31 08:08:28)
> By a simple mistake in a .parent_names to .parent_data conversion it was
> found that clk core assume fw_name is always provided with a parent_data
> struct for each parent and never fallback to .name to get parent name even
> if declared.

It sounds like you have clk_parent_data and the .index member is 0? Can
you show an example structure? I'm guessing it is like this:

	struct clk_parent_data pdata = { .name = "global_name" };

> 
> This is caused by clk_core_get that only checks for parent .fw_name and
> doesn't handle .name.

clk_core_get() is not supposed to operate on the .name member. It is a
firmware based lookup with clkdev as a fallback because clkdev is a
pseudo-firmware interface to assign a name to a clk when some device
pointer is used in conjunction with it.

> 
> While it's sane to request the dev to correctly do the conversion and
> add both .fw_name and .name in a parent_data struct, it's not sane to
> silently drop parents without a warning.

I suppose we can do

	WARN(parent->index >= 0 && !parent_data[i].fw_name && parent_data[i].name, ...);

or maybe better would be to make the clk registration fail if there's a
.name field and the index is non-negative and the fw_name is NULL.

Can you grep the code and see if anyone is assigning a .name without a
.fw_name or .index?

> 
> Fix this in 2 ways. Add a kernel warning when a wrong implementation is
> used and copy .name in .fw_name in parent map populate function to
> handle clk problems and malfunctions.

We shouldn't be copying .name to .fw_name. They're different things.
  
Stephen Boyd Feb. 15, 2023, 6:54 p.m. UTC | #3
Quoting Christian Marangi (2023-02-10 10:34:11)
> On Fri, Feb 10, 2023 at 04:40:29PM -0800, Stephen Boyd wrote:
> > Quoting Christian Marangi (2023-01-31 08:08:28)
> > > By a simple mistake in a .parent_names to .parent_data conversion it was
> > > found that clk core assume fw_name is always provided with a parent_data
> > > struct for each parent and never fallback to .name to get parent name even
> > > if declared.
> > 
> > It sounds like you have clk_parent_data and the .index member is 0? Can
> > you show an example structure? I'm guessing it is like this:
> > 
> >       struct clk_parent_data pdata = { .name = "global_name" };
> >
> 
> An example of this problem and the relative fix is here
> 35dc8e101a8e08f69f4725839b98ec0f11a8e2d3
> 
> Your example is also ok and this patch wants to handle just a case like
> that.

Ok, so you have a firmware .index of 0. The .name is a fallback. I
suppose you want the .name to be a fallback if there isn't a clocks
property in the registering device node? I thought that should already
work but maybe there is a bug somewhere. Presumably you have a gcc node
that doesn't have a clocks property

                gcc: gcc@1800000 {
                        compatible = "qcom,gcc-ipq8074";
                        reg = <0x01800000 0x80000>;
                        #clock-cells = <0x1>;
                        #power-domain-cells = <1>;
                        #reset-cells = <0x1>;
                };	

Looking at clk_core_get() we'll call of_parse_clkspec() and that should fail

	struct clk_hw *hw = ERR_PTR(-ENOENT);

	...

	if (np && (name || index >= 0) &&
	    !of_parse_clkspec(np, index, name, &clkspec)) {
		...
	} else if (name) {
		...
	}

	if (IS_ERR(hw))
		return ERR_CAST(hw);

so we should have a -ENOENT clk_hw pointer in
clk_core_fill_parent_index(). That should land in this if condition in
clk_core_fill_parent_index()

                parent = clk_core_get(core, index);
                if (PTR_ERR(parent) == -ENOENT && entry->name)
                        parent = clk_core_lookup(entry->name);

and then entry->name should be used. 

> 
> > > 
> > > This is caused by clk_core_get that only checks for parent .fw_name and
> > > doesn't handle .name.
> > 
> > clk_core_get() is not supposed to operate on the .name member. It is a
> > firmware based lookup with clkdev as a fallback because clkdev is a
> > pseudo-firmware interface to assign a name to a clk when some device
> > pointer is used in conjunction with it.
> > 
> 
> And the problem is just that. We currently permit to have a
> configuration with .name but no .fw_name. In a case like that a dev may
> think that this configuration is valid but in reality the clk is
> silently ignored/not found and cause clk problem with selecting a
> parent.

It is valid though.

> 
> Took some good hours to discover this and to me it seems an error that
> everybody can do since nowhere is specified that the following
> parent_data configuration is illegal. 
> 

I'll look at adding a test. Seems to be the best way to solve this.
  
Christian Marangi Feb. 15, 2023, 11:33 p.m. UTC | #4
On Wed, Feb 15, 2023 at 10:54:56AM -0800, Stephen Boyd wrote:
> Quoting Christian Marangi (2023-02-10 10:34:11)
> > On Fri, Feb 10, 2023 at 04:40:29PM -0800, Stephen Boyd wrote:
> > > Quoting Christian Marangi (2023-01-31 08:08:28)
> > > > By a simple mistake in a .parent_names to .parent_data conversion it was
> > > > found that clk core assume fw_name is always provided with a parent_data
> > > > struct for each parent and never fallback to .name to get parent name even
> > > > if declared.
> > > 
> > > It sounds like you have clk_parent_data and the .index member is 0? Can
> > > you show an example structure? I'm guessing it is like this:
> > > 
> > >       struct clk_parent_data pdata = { .name = "global_name" };
> > >
> > 
> > An example of this problem and the relative fix is here
> > 35dc8e101a8e08f69f4725839b98ec0f11a8e2d3
> > 
> > Your example is also ok and this patch wants to handle just a case like
> > that.
> 
> Ok, so you have a firmware .index of 0. The .name is a fallback. I
> suppose you want the .name to be a fallback if there isn't a clocks
> property in the registering device node? I thought that should already
> work but maybe there is a bug somewhere. Presumably you have a gcc node
> that doesn't have a clocks property
> 
>                 gcc: gcc@1800000 {
>                         compatible = "qcom,gcc-ipq8074";
>                         reg = <0x01800000 0x80000>;
>                         #clock-cells = <0x1>;
>                         #power-domain-cells = <1>;
>                         #reset-cells = <0x1>;
>                 };	
> 
> Looking at clk_core_get() we'll call of_parse_clkspec() and that should fail
> 
> 	struct clk_hw *hw = ERR_PTR(-ENOENT);
> 
> 	...
> 
>         if (np && (name || index >= 0) &&
>             !of_parse_clkspec(np, index, name, &clkspec)) {
> 		...
> 	} else if (name) {
> 		...
> 	}
> 
>         if (IS_ERR(hw))
>                 return ERR_CAST(hw);
> 
> so we should have a -ENOENT clk_hw pointer in
> clk_core_fill_parent_index(). That should land in this if condition in
> clk_core_fill_parent_index()
> 
>                 parent = clk_core_get(core, index);
>                 if (PTR_ERR(parent) == -ENOENT && entry->name)
>                         parent = clk_core_lookup(entry->name);
> 
> and then entry->name should be used. 
>

Hi, thanks for making me give this an extra check... I think I found
the real cause.
I sent a patch that should supersede this one, with an extensive
explanation of the problem.
This is the ID: 20230215232712.17072-1-ansuelsmth@gmail.com

The hint that made me understand what was wrong was a problem with the
index and the fact that it should have returned -ENOENT... It was fun to
discover that a clock was actually returned and the function never
returned an error.

> > 
> > > > 
> > > > This is caused by clk_core_get that only checks for parent .fw_name and
> > > > doesn't handle .name.
> > > 
> > > clk_core_get() is not supposed to operate on the .name member. It is a
> > > firmware based lookup with clkdev as a fallback because clkdev is a
> > > pseudo-firmware interface to assign a name to a clk when some device
> > > pointer is used in conjunction with it.
> > > 
> > 
> > And the problem is just that. We currently permit to have a
> > configuration with .name but no .fw_name. In a case like that a dev may
> > think that this configuration is valid but in reality the clk is
> > silently ignored/not found and cause clk problem with selecting a
> > parent.
> 
> It is valid though.
> 
> > 
> > Took some good hours to discover this and to me it seems an error that
> > everybody can do since nowhere is specified that the following
> > parent_data configuration is illegal. 
> > 
> 
> I'll look at adding a test. Seems to be the best way to solve this.

Eh, a test would probably have made this clearer. The main problem here
was that the function never returned an error, while under the hood the
parent was pointing to another clock.
  

Patch

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 57b83665e5c3..dccd4ea6f692 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -4015,10 +4015,21 @@  static int clk_core_populate_parent_map(struct clk_core *core,
 			ret = clk_cpy_name(&parent->name, parent_names[i],
 					   true);
 		} else if (parent_data) {
+			const char *parent_name;
+
 			parent->hw = parent_data[i].hw;
 			parent->index = parent_data[i].index;
+			parent_name = parent_data[i].fw_name;
+
+			if (!parent_name && parent_data[i].name) {
+				WARN(1, "Empty .fw_name with .name in %s's .parent_data. Using .name for .fw_name declaration.\n",
+				     core->name);
+				parent_name = parent_data[i].name;
+			}
+
 			ret = clk_cpy_name(&parent->fw_name,
-					   parent_data[i].fw_name, false);
+					   parent_name, false);
+
 			if (!ret)
 				ret = clk_cpy_name(&parent->name,
 						   parent_data[i].name,