perf/arm-cmn: Workaround AmpereOneX errata AC04_MESH_1 (incorrect child count)

Message ID 20240205194655.1567434-1-ilkka@os.amperecomputing.com
State New

Commit Message

Ilkka Koskinen Feb. 5, 2024, 7:46 p.m. UTC
The AmpereOneX mesh implementation has a bug in HN-P nodes that makes them
report an incorrect child count. The failing crosspoints report 8 children
while they only have two.

When the driver tries to access the nonexistent child nodes, it believes it
has reached an invalid node type and probing fails. The workaround is to
ignore those incorrect child nodes and continue normally.

Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
---
 drivers/perf/arm-cmn.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
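
The failure described in the commit message can be modelled in user space. This is only a sketch: `model_type_at()` and `model_discover_strict()` are hypothetical names that do not exist in arm-cmn.c, and it assumes that a zero child pointer resolves to the root config node, which is not a valid child type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node types; only the config vs. device split matters here. */
enum model_type { MODEL_TYPE_CFG, MODEL_TYPE_HNP };

/* Pretend register space: offset 0 is assumed to hold the root config node. */
static enum model_type model_type_at(uint64_t offset)
{
	return offset == 0 ? MODEL_TYPE_CFG : MODEL_TYPE_HNP;
}

/*
 * Walk every reported child with no workaround applied. Returns the number
 * of children set up, or -1 as soon as a child pointer resolves to a type
 * that is not a valid device node, i.e. the probe failure described above.
 */
static int model_discover_strict(const uint64_t *regs, size_t child_count)
{
	int found = 0;

	assert(regs != NULL || child_count == 0);

	for (size_t j = 0; j < child_count; j++) {
		if (model_type_at(regs[j]) == MODEL_TYPE_CFG)
			return -1;	/* "invalid node type": give up */
		found++;
	}
	return found;
}
```

With two valid pointers followed by six zero entries, a reported child count of 8 makes the walk fail at the third entry, while the true count of 2 succeeds.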
  

Comments

Robin Murphy Feb. 6, 2024, 10 a.m. UTC | #1
On 2024-02-05 7:46 pm, Ilkka Koskinen wrote:
> The AmpereOneX mesh implementation has a bug in HN-P nodes that makes them
> report an incorrect child count. The failing crosspoints report 8 children
> while they only have two.

Ooh, fun :)

> When the driver tries to access the nonexistent child nodes, it believes it
> has reached an invalid node type and probing fails. The workaround is to
> ignore those incorrect child nodes and continue normally.
> 
> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> ---
>   drivers/perf/arm-cmn.c | 25 +++++++++++++++++++++++++
>   1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index c584165b13ba..97fed8ec3693 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -2168,6 +2168,23 @@ static enum cmn_node_type arm_cmn_subtype(enum cmn_node_type type)
>   	}
>   }
>   
> +static inline bool arm_cmn_is_ampereonex_bug(const struct arm_cmn *cmn,
> +					     struct arm_cmn_node *dn,
> +					     u16 child_count, int child)
> +{
> +	/*
> +	 * The bug occurs only when a crosspoint reports 8 children
> +	 * while it only has two HN-P child nodes.
> +	 */
> +	dn -= 2;
> +
> +	if (arm_cmn_model(cmn) == CMN650 && child_count == 8 &&
> +	    child == 2 && dn->type == CMN_TYPE_HNP)
> +		return true;
> +
> +	return false;
> +}
> +
>   static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>   {
>   	void __iomem *cfg_region;
> @@ -2292,6 +2309,14 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>   
>   		for (j = 0; j < child_count; j++) {
>   			reg = readq_relaxed(xp_region + child_poff + j * 8);
> +			if (reg == 0)
> +				if (arm_cmn_is_ampereonex_bug(cmn, dn, child_count, j))
> +					/*
> +					 * We know there are only two real children and the remaining six
> +					 * do not exist, so we can skip the rest of the loop.
> +					 */
> +					break;
> +

TBH I don't see much harm in taking an even simpler approach, so I'd be
inclined to not bother being all that specific beyond documenting it,
something like the below:

Cheers,
Robin.

----->8-----

diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index c584165b13ba..7e3aa7e2345f 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -2305,6 +2305,17 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
  				dev_dbg(cmn->dev, "ignoring external node %llx\n", reg);
  				continue;
  			}
+			/*
+			 * AmpereOneX erratum AC04_MESH_1 makes some XPs report a bogus
+			 * child count larger than the number of valid child pointers.
+			 * A child offset of 0 can only occur on CMN-600; otherwise it
+			 * would imply the root node being its own grandchild, which
+			 * we can safely dismiss in general.
+			 */
+			if (reg == 0 && cmn->part != PART_CMN600) {
+				dev_dbg(cmn->dev, "bogus child pointer?\n");
+				continue;
+			}
  
  			arm_cmn_init_node_info(cmn, reg & CMN_CHILD_NODE_ADDR, dn);
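
The check proposed in the diff above can be tried out in user space. A minimal sketch, assuming a plain array stands in for the XP's child pointer registers; `walk_children()` and the `model_part` enum are hypothetical and not part of arm-cmn.c. Only the `reg == 0` test combined with the CMN-600 exception mirrors the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's part numbers. */
enum model_part { MODEL_PART_CMN600, MODEL_PART_OTHER };

/*
 * Count the child pointers that would be passed on to node setup.
 * A zero pointer is skipped unless the part is CMN-600, mirroring the
 * "reg == 0 && cmn->part != PART_CMN600" test in the diff above.
 */
static size_t walk_children(enum model_part part, const uint64_t *regs,
			    size_t child_count)
{
	size_t accepted = 0;

	assert(regs != NULL || child_count == 0);

	for (size_t j = 0; j < child_count; j++) {
		if (regs[j] == 0 && part != MODEL_PART_CMN600)
			continue;	/* bogus child pointer, ignore it */
		accepted++;
	}
	return accepted;
}
```

Because the loop uses `continue` rather than `break`, it does not depend on the bogus entries being the trailing ones, and it needs no knowledge of the exact child count the erratum produces.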
  
Ilkka Koskinen Feb. 6, 2024, 9:04 p.m. UTC | #2
On Tue, 6 Feb 2024, Robin Murphy wrote:
> On 2024-02-05 7:46 pm, Ilkka Koskinen wrote:
>> The AmpereOneX mesh implementation has a bug in HN-P nodes that makes them
>> report an incorrect child count. The failing crosspoints report 8 children
>> while they only have two.
>
> Ooh, fun :)
>
>> When the driver tries to access the nonexistent child nodes, it believes it
>> has reached an invalid node type and probing fails. The workaround is to
>> ignore those incorrect child nodes and continue normally.
>> 
>> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
>> ---
>>   drivers/perf/arm-cmn.c | 25 +++++++++++++++++++++++++
>>   1 file changed, 25 insertions(+)
>> 
>> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
>> index c584165b13ba..97fed8ec3693 100644
>> --- a/drivers/perf/arm-cmn.c
>> +++ b/drivers/perf/arm-cmn.c
>> @@ -2168,6 +2168,23 @@ static enum cmn_node_type arm_cmn_subtype(enum cmn_node_type type)
>>   	}
>>   }
>>   
>> +static inline bool arm_cmn_is_ampereonex_bug(const struct arm_cmn *cmn,
>> +					     struct arm_cmn_node *dn,
>> +					     u16 child_count, int child)
>> +{
>> +	/*
>> +	 * The bug occurs only when a crosspoint reports 8 children
>> +	 * while it only has two HN-P child nodes.
>> +	 */
>> +	dn -= 2;
>> +
>> +	if (arm_cmn_model(cmn) == CMN650 && child_count == 8 &&
>> +	    child == 2 && dn->type == CMN_TYPE_HNP)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>   static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>>   {
>>   	void __iomem *cfg_region;
>> @@ -2292,6 +2309,14 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
>>   
>>   		for (j = 0; j < child_count; j++) {
>>   			reg = readq_relaxed(xp_region + child_poff + j * 8);
>> +			if (reg == 0)
>> +				if (arm_cmn_is_ampereonex_bug(cmn, dn, child_count, j))
>> +					/*
>> +					 * We know there are only two real children and the remaining six
>> +					 * do not exist, so we can skip the rest of the loop.
>> +					 */
>> +					break;
>> +
>
> TBH I don't see much harm in taking an even simpler approach, so I'd be
> inclined to not bother being all that specific beyond documenting it,
> something like the below:

Sounds good to me.

>
> Cheers,
> Robin.
>
> ----->8-----
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index c584165b13ba..7e3aa7e2345f 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -2305,6 +2305,17 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> 				dev_dbg(cmn->dev, "ignoring external node %llx\n", reg);
> 				continue;
> 			}
> +			/*
> +			 * AmpereOneX erratum AC04_MESH_1 makes some XPs report a bogus
> +			 * child count larger than the number of valid child pointers.
> +			 * A child offset of 0 can only occur on CMN-600; otherwise it
> +			 * would imply the root node being its own grandchild, which
> +			 * we can safely dismiss in general.
> +			 */
> +			if (reg == 0 && cmn->part != PART_CMN600) {
> +				dev_dbg(cmn->dev, "bogus child pointer?\n");
> +				continue;
> +			}
>  			arm_cmn_init_node_info(cmn, reg & CMN_CHILD_NODE_ADDR, dn);
>

Tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>

Cheers, Ilkka
  
Will Deacon Feb. 9, 2024, 5:02 p.m. UTC | #3
On Tue, Feb 06, 2024 at 01:04:27PM -0800, Ilkka Koskinen wrote:
> On Tue, 6 Feb 2024, Robin Murphy wrote:
> > On 2024-02-05 7:46 pm, Ilkka Koskinen wrote:
> > diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> > index c584165b13ba..7e3aa7e2345f 100644
> > --- a/drivers/perf/arm-cmn.c
> > +++ b/drivers/perf/arm-cmn.c
> > @@ -2305,6 +2305,17 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> > 				dev_dbg(cmn->dev, "ignoring external node %llx\n", reg);
> > 				continue;
> > 			}
> > +			/*
> > +			 * AmpereOneX erratum AC04_MESH_1 makes some XPs report a bogus
> > +			 * child count larger than the number of valid child pointers.
> > +			 * A child offset of 0 can only occur on CMN-600; otherwise it
> > +			 * would imply the root node being its own grandchild, which
> > +			 * we can safely dismiss in general.
> > +			 */
> > +			if (reg == 0 && cmn->part != PART_CMN600) {
> > +				dev_dbg(cmn->dev, "bogus child pointer?\n");
> > +				continue;
> > +			}
> >  			arm_cmn_init_node_info(cmn, reg & CMN_CHILD_NODE_ADDR, dn);
> > 
> 
> Tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>

Mind sending that out as a proper patch that I can pick up, please?

Cheers,

Will
  

Patch

diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index c584165b13ba..97fed8ec3693 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -2168,6 +2168,23 @@  static enum cmn_node_type arm_cmn_subtype(enum cmn_node_type type)
 	}
 }
 
+static inline bool arm_cmn_is_ampereonex_bug(const struct arm_cmn *cmn,
+					     struct arm_cmn_node *dn,
+					     u16 child_count, int child)
+{
+	/*
+	 * The bug occurs only when a crosspoint reports 8 children
+	 * while it only has two HN-P child nodes.
+	 */
+	dn -= 2;
+
+	if (arm_cmn_model(cmn) == CMN650 && child_count == 8 &&
+	    child == 2 && dn->type == CMN_TYPE_HNP)
+		return true;
+
+	return false;
+}
+
 static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
 {
 	void __iomem *cfg_region;
@@ -2292,6 +2309,14 @@  static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
 
 		for (j = 0; j < child_count; j++) {
 			reg = readq_relaxed(xp_region + child_poff + j * 8);
+			if (reg == 0)
+				if (arm_cmn_is_ampereonex_bug(cmn, dn, child_count, j))
+					/*
+					 * We know there are only two real children and the remaining six
+					 * do not exist, so we can skip the rest of the loop.
+					 */
+					break;
+
 			/*
 			 * Don't even try to touch anything external, since in general
 			 * we haven't a clue how to power up arbitrary CHI requesters.
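
The detection helper added by the patch above can be exercised on its own. A rough user-space sketch, assuming each valid child occupies exactly one slot in the node array so that `dn - 2` is the first child of the crosspoint; `model_is_bug()`, `struct model_node` and the `model_type` enum are hypothetical and only imitate the shape of `arm_cmn_is_ampereonex_bug()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node types and node record for the model. */
enum model_type { MODEL_TYPE_INVALID, MODEL_TYPE_HNP, MODEL_TYPE_OTHER };
struct model_node { enum model_type type; };

/*
 * Return true only for the erratum pattern: a CMN-650 crosspoint that
 * reports 8 children, where discovery is at child index 2 and the node
 * recorded two slots earlier is an HN-P.
 */
static bool model_is_bug(bool is_cmn650, const struct model_node *dn,
			 unsigned int child_count, unsigned int child)
{
	assert(dn != NULL);

	/* Check the index first so dn[-2] is only read when it exists. */
	if (!is_cmn650 || child_count != 8 || child != 2)
		return false;

	/* dn points at the next free slot; dn[-2] is the first child. */
	return dn[-2].type == MODEL_TYPE_HNP;
}
```

Unlike the patch, the sketch tests `child` before looking two slots back, so the pointer is never moved below the start of the array when fewer than two children have been recorded.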