clk: Fix wrong clock returned in parent_data with .name and no .index

Message ID 20230215232712.17072-1-ansuelsmth@gmail.com
State New
Headers
Series clk: Fix wrong clock returned in parent_data with .name and no .index |

Commit Message

Christian Marangi Feb. 15, 2023, 11:27 p.m. UTC
  Commit 601b6e93304a ("clk: Allow parents to be specified via clkspec index")
introduced a regression due to a "fragile" implementation present in some very
corner case.

Such commit introduced the support for parents to be specified using
clkspec index. The index is an int and should be -1 if the feature
should not be used. This is the case with parent_hws or legacy
parent_names used and the index value is set to -1 by default.
With parent_data the situation is different, since it's a struct that
can have multiple value (.index, .name, .fw_name), it's init to all 0 by
default. This cause the index value to be set to 0 everytime even if not
intended to be defined and used.

This simple "fragile" implementation cause side-effect and unintended
behaviour.

Assuming the following scenario (to repro the corner case and doesn't
reflect real code):

In dt we have a node like this:
		acc1: clock-controller@2098000 {
			compatible = "qcom,kpss-acc-v1";
			reg = <0x02098000 0x1000>, <0x02008000 0x1000>;
			clock-output-names = "acpu1_aux";
			clocks = <&pxo_board>;
			clock-names = "pxo";
			#clock-cells = <0>;
		};

And on the relevant driver we have the parent data defined as such:
		static const struct clk_parent_data aux_parents[] = {
			{ .name = "pll8_vote" },
			{ .fw_name = "pxo", .name = "pxo_board" },
		};

Someone would expect the first parent to be globally searched and set to
point to the clock named "pll8_vote".
But this is not the case and instead under the hood, the parent point to
the pxo clock. This happen without any warning and was discovered on
another platform while the gcc driver was converted to parent_data and
only .name was defined.

The reason is hard to discover but very simple.

Due to the introduction of index support, clk_core_get() won't return
-ENOENT but insted will correctly return a clock.
This is because of_parse_clkspec() will use the index (that is set to 0
due to how things are allocated) and correctly find in the DT node a
clock at index 0. That in the provided example is exactly the phandle to
pxo_board.

Clock is found so the parent is now wrongly linked to the pxo_board
clock.

This only happens in this specific scenario but it's still worth to be
handled and currently there are some driver that hardcode the
parent_data and may suffer from this.

To fix this and handle it correctly we can use the following logic:
1. With a .fw_name not defined (index searching is skipped if a named
   clock is provided)
2. Check if .name is provided
3. Compare the provided .name with what clockspec found
4. Return -ENOENT if the name doesn't match, return the clock if it does.

Returning -ENOENT permit clk core code flow to fallback to global
searching and correctly search the right clock.

Fixes: 601b6e93304a ("clk: Allow parents to be specified via clkspec index")
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
 drivers/clk/clk.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
  

Comments

Christian Marangi Feb. 15, 2023, 11:34 p.m. UTC | #1
On Thu, Feb 16, 2023 at 12:27:12AM +0100, Christian Marangi wrote:
> Commit 601b6e93304a ("clk: Allow parents to be specified via clkspec index")
> introduced a regression due to a "fragile" implementation present in some very
> corner case.
> 
> Such commit introduced the support for parents to be specified using
> clkspec index. The index is an int and should be -1 if the feature
> should not be used. This is the case with parent_hws or legacy
> parent_names used and the index value is set to -1 by default.
> With parent_data the situation is different, since it's a struct that
> can have multiple value (.index, .name, .fw_name), it's init to all 0 by
> default. This cause the index value to be set to 0 everytime even if not
> intended to be defined and used.
> 
> This simple "fragile" implementation cause side-effect and unintended
> behaviour.
> 
> Assuming the following scenario (to repro the corner case and doesn't
> reflect real code):
> 
> In dt we have a node like this:
> 		acc1: clock-controller@2098000 {
> 			compatible = "qcom,kpss-acc-v1";
> 			reg = <0x02098000 0x1000>, <0x02008000 0x1000>;
> 			clock-output-names = "acpu1_aux";
> 			clocks = <&pxo_board>;
> 			clock-names = "pxo";
> 			#clock-cells = <0>;
> 		};
> 
> And on the relevant driver we have the parent data defined as such:
> 		static const struct clk_parent_data aux_parents[] = {
> 			{ .name = "pll8_vote" },
> 			{ .fw_name = "pxo", .name = "pxo_board" },
> 		};
> 
> Someone would expect the first parent to be globally searched and set to
> point to the clock named "pll8_vote".
> But this is not the case and instead under the hood, the parent point to
> the pxo clock. This happen without any warning and was discovered on
> another platform while the gcc driver was converted to parent_data and
> only .name was defined.
> 
> The reason is hard to discover but very simple.
> 
> Due to the introduction of index support, clk_core_get() won't return
> -ENOENT but insted will correctly return a clock.
> This is because of_parse_clkspec() will use the index (that is set to 0
> due to how things are allocated) and correctly find in the DT node a
> clock at index 0. That in the provided example is exactly the phandle to
> pxo_board.
> 
> Clock is found so the parent is now wrongly linked to the pxo_board
> clock.
> 
> This only happens in this specific scenario but it's still worth to be
> handled and currently there are some driver that hardcode the
> parent_data and may suffer from this.
> 
> To fix this and handle it correctly we can use the following logic:
> 1. With a .fw_name not defined (index searching is skipped if a named
>    clock is provided)
> 2. Check if .name is provided
> 3. Compare the provided .name with what clockspec found
> 4. Return -ENOENT if the name doesn't match, return the clock if it does.
> 
> Returning -ENOENT permit clk core code flow to fallback to global
> searching and correctly search the right clock.
> 
> Fixes: 601b6e93304a ("clk: Allow parents to be specified via clkspec index")
> Cc: Miquel Raynal <miquel.raynal@bootlin.com>
> Cc: Jerome Brunet <jbrunet@baylibre.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Michael Turquette <mturquette@baylibre.com>
> Cc: Jeffrey Hugo <jhugo@codeaurora.org>
> Cc: Chen-Yu Tsai <wens@csie.org>
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>

Think this should also be backported to stable kernel just like it was
done with 4f8c6aba37da199155a121c6cdc38505a9eb0259 ?

> ---
>  drivers/clk/clk.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 998676d78029..42e297fcfe45 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -395,6 +395,7 @@ of_clk_get_hw_from_clkspec(struct of_phandle_args *clkspec)
>   */
>  static struct clk_core *clk_core_get(struct clk_core *core, u8 p_index)
>  {
> +	const char *global_name = core->parents[p_index].name;
>  	const char *name = core->parents[p_index].fw_name;
>  	int index = core->parents[p_index].index;
>  	struct clk_hw *hw = ERR_PTR(-ENOENT);
> @@ -407,6 +408,23 @@ static struct clk_core *clk_core_get(struct clk_core *core, u8 p_index)
>  	    !of_parse_clkspec(np, index, name, &clkspec)) {
>  		hw = of_clk_get_hw_from_clkspec(&clkspec);
>  		of_node_put(clkspec.np);
> +
> +		/*
> +		 * The returned hw may be incorrect and extra check are required in
> +		 * some corner case.
> +		 *
> +		 * In case a .fw_name is not set of_parse_clkspec will use the index
> +		 * to search the related clock.
> +		 * But index may be never set and actually never intended to be used
> +		 * in the defined parent_data since a 0 value is also accepted and that
> +		 * is what by default each struct is initialized.
> +		 *
> +		 * In the following case check if we have .name and check if the returned
> +		 * clock name match the globally name defined for the parent in the
> +		 * parent_data .name value.
> +		 */
> +		if (!name && global_name && strcmp(global_name, hw->core->name))
> +			return ERR_PTR(-ENOENT);
>  	} else if (name) {
>  		/*
>  		 * If the DT search above couldn't find the provider fallback to
> -- 
> 2.38.1
>
  
Christian Marangi Feb. 17, 2023, 5:09 a.m. UTC | #2
On Fri, Feb 17, 2023 at 02:11:20PM -0800, Stephen Boyd wrote:
> Quoting Christian Marangi (2023-02-15 15:27:12)
> > Commit 601b6e93304a ("clk: Allow parents to be specified via clkspec index")
> > introduced a regression due to a "fragile" implementation present in some very
> > corner case.
> > 
> > Such commit introduced the support for parents to be specified using
> > clkspec index. The index is an int and should be -1 if the feature
> > should not be used. This is the case with parent_hws or legacy
> > parent_names used and the index value is set to -1 by default.
> > With parent_data the situation is different, since it's a struct that
> > can have multiple value (.index, .name, .fw_name), it's init to all 0 by
> > default. This cause the index value to be set to 0 everytime even if not
> 
> It's only initialized to all 0 because that's what you've decided to do.
> It could be on the stack and have random stack junk values.
>

Yes and that itself is problematic on his own. The index value may be
set to an unintended value and we really can't update each parent_data
to -1. And as you can see in the example index is used as an alternative
source to search the parent.

Hope it's clear what is the problem here.

> > intended to be defined and used.
> > 
> > This simple "fragile" implementation cause side-effect and unintended
> > behaviour.
> > 
> > Assuming the following scenario (to repro the corner case and doesn't
> > reflect real code):
> > 
> > In dt we have a node like this:
> >                 acc1: clock-controller@2098000 {
> >                         compatible = "qcom,kpss-acc-v1";
> >                         reg = <0x02098000 0x1000>, <0x02008000 0x1000>;
> >                         clock-output-names = "acpu1_aux";
> >                         clocks = <&pxo_board>;
> >                         clock-names = "pxo";
> >                         #clock-cells = <0>;
> >                 };
> > 
> > And on the relevant driver we have the parent data defined as such:
> >                 static const struct clk_parent_data aux_parents[] = {
> >                         { .name = "pll8_vote" },
> >                         { .fw_name = "pxo", .name = "pxo_board" },
> >                 };
> > 
> > Someone would expect the first parent to be globally searched and set to
> > point to the clock named "pll8_vote".
> > But this is not the case and instead under the hood, the parent point to
> > the pxo clock. This happen without any warning and was discovered on
> > another platform while the gcc driver was converted to parent_data and
> > only .name was defined.
> 
> You didn't set .index explicitly to zero, but it is zero because of the
> use of static struct initializers here. If the struct was on the stack
> nobody knows what the value would be. Set -1 if you don't want to use
> the index lookup path.

There is at least one driver that use .name for global searching and
it's clear that he didn't intend to use index lookup.

Are you totally against this or you are suggesting I should use a
different word for this?

To me this looks very sensible and something we should take care since
sounds a bit fragile to me. (I know 99% of the time it would be a dev
error but we could have case where things works by luck and for example
someone starts adding an additional parent in later changes and regression
happens.)
  
Stephen Boyd Feb. 17, 2023, 10:11 p.m. UTC | #3
Quoting Christian Marangi (2023-02-15 15:27:12)
> Commit 601b6e93304a ("clk: Allow parents to be specified via clkspec index")
> introduced a regression due to a "fragile" implementation present in some very
> corner case.
> 
> Such commit introduced the support for parents to be specified using
> clkspec index. The index is an int and should be -1 if the feature
> should not be used. This is the case with parent_hws or legacy
> parent_names used and the index value is set to -1 by default.
> With parent_data the situation is different, since it's a struct that
> can have multiple value (.index, .name, .fw_name), it's init to all 0 by
> default. This cause the index value to be set to 0 everytime even if not

It's only initialized to all 0 because that's what you've decided to do.
It could be on the stack and have random stack junk values.

> intended to be defined and used.
> 
> This simple "fragile" implementation cause side-effect and unintended
> behaviour.
> 
> Assuming the following scenario (to repro the corner case and doesn't
> reflect real code):
> 
> In dt we have a node like this:
>                 acc1: clock-controller@2098000 {
>                         compatible = "qcom,kpss-acc-v1";
>                         reg = <0x02098000 0x1000>, <0x02008000 0x1000>;
>                         clock-output-names = "acpu1_aux";
>                         clocks = <&pxo_board>;
>                         clock-names = "pxo";
>                         #clock-cells = <0>;
>                 };
> 
> And on the relevant driver we have the parent data defined as such:
>                 static const struct clk_parent_data aux_parents[] = {
>                         { .name = "pll8_vote" },
>                         { .fw_name = "pxo", .name = "pxo_board" },
>                 };
> 
> Someone would expect the first parent to be globally searched and set to
> point to the clock named "pll8_vote".
> But this is not the case and instead under the hood, the parent point to
> the pxo clock. This happen without any warning and was discovered on
> another platform while the gcc driver was converted to parent_data and
> only .name was defined.

You didn't set .index explicitly to zero, but it is zero because of the
use of static struct initializers here. If the struct was on the stack
nobody knows what the value would be. Set -1 if you don't want to use
the index lookup path.
  

Patch

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 998676d78029..42e297fcfe45 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -395,6 +395,7 @@  of_clk_get_hw_from_clkspec(struct of_phandle_args *clkspec)
  */
 static struct clk_core *clk_core_get(struct clk_core *core, u8 p_index)
 {
+	const char *global_name = core->parents[p_index].name;
 	const char *name = core->parents[p_index].fw_name;
 	int index = core->parents[p_index].index;
 	struct clk_hw *hw = ERR_PTR(-ENOENT);
@@ -407,6 +408,23 @@  static struct clk_core *clk_core_get(struct clk_core *core, u8 p_index)
 	    !of_parse_clkspec(np, index, name, &clkspec)) {
 		hw = of_clk_get_hw_from_clkspec(&clkspec);
 		of_node_put(clkspec.np);
+
+		/*
+		 * The returned hw may be incorrect and extra check are required in
+		 * some corner case.
+		 *
+		 * In case a .fw_name is not set of_parse_clkspec will use the index
+		 * to search the related clock.
+		 * But index may be never set and actually never intended to be used
+		 * in the defined parent_data since a 0 value is also accepted and that
+		 * is what by default each struct is initialized.
+		 *
+		 * In the following case check if we have .name and check if the returned
+		 * clock name match the globally name defined for the parent in the
+		 * parent_data .name value.
+		 */
+		if (!name && global_name && strcmp(global_name, hw->core->name))
+			return ERR_PTR(-ENOENT);
 	} else if (name) {
 		/*
 		 * If the DT search above couldn't find the provider fallback to