riscv: Move call to init_cpu_topology() to later initialization stage

Message ID 20230103035316.3841303-1-leyfoon.tan@starfivetech.com
State New
Headers
Series riscv: Move call to init_cpu_topology() to later initialization stage |

Commit Message

Ley Foon Tan Jan. 3, 2023, 3:53 a.m. UTC
  topology_parse_cpu_capacity() is failed to allocate memory with kcalloc()
after read "capacity-dmips-mhz" DT parameter in CPU DT nodes. This
topology_parse_cpu_capacity() is called from init_cpu_topology(), move
call to init_cpu_topology() to later initialization  stage (after memory
allocation is available).

Note, this refers to ARM64 implementation, call init_cpu_topology() in
smp_prepare_cpus().

Tested on Qemu platform.

Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>

---

In drivers/base/arch_topology.c: topology_parse_cpu_capacity():

	ret = of_property_read_u32(cpu_node, "capacity-dmips-mhz",
				   &cpu_capacity);
	if (!ret) {
		if (!raw_capacity) {
			raw_capacity = kcalloc(num_possible_cpus(),
					       sizeof(*raw_capacity),
					       GFP_KERNEL);
			if (!raw_capacity) {
				cap_parsing_failed = true;
				return false;
			}
---
 arch/riscv/kernel/smpboot.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
  

Comments

Andrew Jones Jan. 3, 2023, 6:54 a.m. UTC | #1
On Tue, Jan 03, 2023 at 11:53:16AM +0800, Ley Foon Tan wrote:
> topology_parse_cpu_capacity() is failed to allocate memory with kcalloc()
> after read "capacity-dmips-mhz" DT parameter in CPU DT nodes. This
> topology_parse_cpu_capacity() is called from init_cpu_topology(), move
> call to init_cpu_topology() to later initialization  stage (after memory
> allocation is available).
> 
> Note, this refers to ARM64 implementation, call init_cpu_topology() in
> smp_prepare_cpus().
> 
> Tested on Qemu platform.

Hi Ley,

Can you provide the topologies (command lines) tested?

> 
> Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>

Fixes tag?

> 
> ---
> 
> In drivers/base/arch_topology.c: topology_parse_cpu_capacity():
> 
> 	ret = of_property_read_u32(cpu_node, "capacity-dmips-mhz",
> 				   &cpu_capacity);
> 	if (!ret) {
> 		if (!raw_capacity) {
> 			raw_capacity = kcalloc(num_possible_cpus(),
> 					       sizeof(*raw_capacity),
> 					       GFP_KERNEL);
> 			if (!raw_capacity) {
> 				cap_parsing_failed = true;
> 				return false;
> 			}
> ---
>  arch/riscv/kernel/smpboot.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> index 3373df413c88..ddb2afba6d25 100644
> --- a/arch/riscv/kernel/smpboot.c
> +++ b/arch/riscv/kernel/smpboot.c
> @@ -39,7 +39,6 @@ static DECLARE_COMPLETION(cpu_running);
>  
>  void __init smp_prepare_boot_cpu(void)
>  {
> -	init_cpu_topology();
>  }
>  
>  void __init smp_prepare_cpus(unsigned int max_cpus)
> @@ -48,6 +47,8 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  	int ret;
>  	unsigned int curr_cpuid;
>  
> +	init_cpu_topology();
> +
>  	curr_cpuid = smp_processor_id();
>  	store_cpu_topology(curr_cpuid);
>  	numa_store_cpu_info(curr_cpuid);
> -- 
> 2.25.1
>

Otherwise,

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

Thanks,
drew
  
Ley Foon Tan Jan. 3, 2023, 7:53 a.m. UTC | #2
> -----Original Message-----
> From: Andrew Jones <ajones@ventanamicro.com>
> Sent: Tuesday, 3 January, 2023 2:54 PM
> To: Leyfoon Tan <leyfoon.tan@starfivetech.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>; Paul Walmsley
> <paul.walmsley@sifive.com>; Albert Ou <aou@eecs.berkeley.edu>; linux-
> riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Ley Foon Tan
> <lftan.linux@gmail.com>
> Subject: Re: [PATCH] riscv: Move call to init_cpu_topology() to later
> initialization stage
> 
> On Tue, Jan 03, 2023 at 11:53:16AM +0800, Ley Foon Tan wrote:
> > topology_parse_cpu_capacity() is failed to allocate memory with
> > kcalloc() after read "capacity-dmips-mhz" DT parameter in CPU DT
> > nodes. This
> > topology_parse_cpu_capacity() is called from init_cpu_topology(), move
> > call to init_cpu_topology() to later initialization  stage (after
> > memory allocation is available).
> >
> > Note, this refers to ARM64 implementation, call init_cpu_topology() in
> > smp_prepare_cpus().
> >
> > Tested on Qemu platform.
> 
> Hi Ley,
> 
> Can you provide the topologies (command lines) tested?
2 clusters with 2 CPU cores each.

> 
> >
> > Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> 
> Fixes tag?
Okay, will send out next revision with Fixes tag.

Fixes: 03f11f03dbfe ("RISC-V: Parse cpu topology during boot. ")


[...]

> >  arch/riscv/kernel/smpboot.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> > index 3373df413c88..ddb2afba6d25 100644
> > --- a/arch/riscv/kernel/smpboot.c
> > +++ b/arch/riscv/kernel/smpboot.c
> > @@ -39,7 +39,6 @@ static DECLARE_COMPLETION(cpu_running);
> >
> >  void __init smp_prepare_boot_cpu(void)  {
> > -	init_cpu_topology();
> >  }
> >
> >  void __init smp_prepare_cpus(unsigned int max_cpus) @@ -48,6 +47,8
> @@
> > void __init smp_prepare_cpus(unsigned int max_cpus)
> >  	int ret;
> >  	unsigned int curr_cpuid;
> >
> > +	init_cpu_topology();
> > +
> >  	curr_cpuid = smp_processor_id();
> >  	store_cpu_topology(curr_cpuid);
> >  	numa_store_cpu_info(curr_cpuid);
> > --
> > 2.25.1
> >
> 
> Otherwise,
> 
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> 
> Thanks,
> drew

Thanks
Ley Foon
  
Conor Dooley Jan. 3, 2023, 5:07 p.m. UTC | #3
Hello!

Couple comments for you.

+CC Sudeep: I've got a question for you below.

On Tue, Jan 03, 2023 at 07:53:38AM +0000, Leyfoon Tan wrote:
> > From: Andrew Jones <ajones@ventanamicro.com>
> > Subject: Re: [PATCH] riscv: Move call to init_cpu_topology() to later
> > initialization stage
> > On Tue, Jan 03, 2023 at 11:53:16AM +0800, Ley Foon Tan wrote:
> > > topology_parse_cpu_capacity() is failed to allocate memory with
> > > kcalloc() after read "capacity-dmips-mhz" DT parameter in CPU DT

Uhh, so where did this "capacity-dmips-mhz" property actually come from?
I had a quick check of qemu with grep & I don't see anything there that
would add this property.
This property should not be valid on anything other than arm AFAICT.

> > > nodes. This
> > > topology_parse_cpu_capacity() is called from init_cpu_topology(), move
> > > call to init_cpu_topology() to later initialization  stage (after
> > > memory allocation is available).
> > >
> > > Note, this refers to ARM64 implementation, call init_cpu_topology() in
> > > smp_prepare_cpus().
> > >
> > > Tested on Qemu platform.
> > 
> > Hi Ley,
> > 
> > Can you provide the topologies (command lines) tested?
> 2 clusters with 2 CPU cores each.

What's the actual commandline for this? I'm not the best with QEMU, so
it'd really be appreciated, given the above.

> > > Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> > 
> > Fixes tag?
> Okay, will send out next revision with Fixes tag.

Please don't just send versions to add tags, Palmer can pick them up
if/when he applies the patch.

> Fixes: 03f11f03dbfe ("RISC-V: Parse cpu topology during boot. ")

> > >  arch/riscv/kernel/smpboot.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> > > index 3373df413c88..ddb2afba6d25 100644
> > > --- a/arch/riscv/kernel/smpboot.c
> > > +++ b/arch/riscv/kernel/smpboot.c
> > > @@ -39,7 +39,6 @@ static DECLARE_COMPLETION(cpu_running);
> > >
> > >  void __init smp_prepare_boot_cpu(void)  {
> > > -	init_cpu_topology();
> > >  }
> > >
> > >  void __init smp_prepare_cpus(unsigned int max_cpus) @@ -48,6 +47,8
> > @@
> > > void __init smp_prepare_cpus(unsigned int max_cpus)
> > >  	int ret;
> > >  	unsigned int curr_cpuid;
> > >
> > > +	init_cpu_topology();

I know arm64 does this, but there is any real reason for us to do so?
@Sudeep, do you know why arm64 calls that each time?
Or if it is worth "saving" that call on riscv, since arm64 is clearly
happily calling it for many years & calling it later would likely head
off a good few allocation issues (like the one we saw with the topology
reworking a few months ago).

Thanks,
Conor.

> > > +
> > >  	curr_cpuid = smp_processor_id();
> > >  	store_cpu_topology(curr_cpuid);
> > >  	numa_store_cpu_info(curr_cpuid);
> > > --
> > > 2.25.1
> > >
> > 
> > Otherwise,
> > 
> > Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
  
Ley Foon Tan Jan. 4, 2023, 5:35 a.m. UTC | #4
> -----Original Message-----
> From: Conor Dooley <conor@kernel.org>
> Sent: Wednesday, 4 January, 2023 1:08 AM
> To: Leyfoon Tan <leyfoon.tan@starfivetech.com>; Sudeep Holla
> <sudeep.holla@arm.com>
> Cc: Andrew Jones <ajones@ventanamicro.com>; Palmer Dabbelt
> <palmer@dabbelt.com>; Paul Walmsley <paul.walmsley@sifive.com>; Albert
> Ou <aou@eecs.berkeley.edu>; linux-riscv@lists.infradead.org; linux-
> kernel@vger.kernel.org; Ley Foon Tan <lftan.linux@gmail.com>
> Subject: Re: [PATCH] riscv: Move call to init_cpu_topology() to later
> initialization stage
> 
> Hello!
> 
> Couple comments for you.
> 
> +CC Sudeep: I've got a question for you below.
> 
> On Tue, Jan 03, 2023 at 07:53:38AM +0000, Leyfoon Tan wrote:
> > > From: Andrew Jones <ajones@ventanamicro.com>
> > > Subject: Re: [PATCH] riscv: Move call to init_cpu_topology() to
> > > later initialization stage On Tue, Jan 03, 2023 at 11:53:16AM +0800,
> > > Ley Foon Tan wrote:
> > > > topology_parse_cpu_capacity() is failed to allocate memory with
> > > > kcalloc() after read "capacity-dmips-mhz" DT parameter in CPU DT
> 
> Uhh, so where did this "capacity-dmips-mhz" property actually come from?
> I had a quick check of qemu with grep & I don't see anything there that
> would add this property.
> This property should not be valid on anything other than arm AFAICT.

This DT parameter is not in default Qemu. I've added it for testing (see test steps in below). 
This is preparation to support asymmetric CPU topology for RISC-V.

> 
> > > > nodes. This
> > > > topology_parse_cpu_capacity() is called from init_cpu_topology(),
> > > > move call to init_cpu_topology() to later initialization  stage
> > > > (after memory allocation is available).
> > > >
> > > > Note, this refers to ARM64 implementation, call
> > > > init_cpu_topology() in smp_prepare_cpus().
> > > >
> > > > Tested on Qemu platform.
> > >
> > > Hi Ley,
> > >
> > > Can you provide the topologies (command lines) tested?
> > 2 clusters with 2 CPU cores each.
> 
> What's the actual commandline for this? I'm not the best with QEMU, so it'd
> really be appreciated, given the above.
I used the Qemu to dump the DTS for 'virt' machine from Qemu, then add the "capacity-dmips-mhz" DT parameter and modify the CPU topology for clusters.

1. Dump the dtb for 'virt' machine:
qemu-system-riscv64 -cpu rv64 -smp 4 -m 2048M -M virt,dumpdtb=qemu_virt.dtb

2. Convert dtb to dts
dtc -I dtb -O dts qemu_virt.dtb > qemu_virt.dts

3. Modifying qemu_virt.dts

4. Compile dts to dtb
dtc -I dts -O dtb qemu_virt.dts > qemu_virt.dtb

5. Pass in "-dtb qemu_virt.dtb" as one of Qemu's argument.

> 
> > > > Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> > >
> > > Fixes tag?
> > Okay, will send out next revision with Fixes tag.
> 
> Please don't just send versions to add tags, Palmer can pick them up if/when
> he applies the patch.
Okay. Will let Palmer add tag below if he applies the patch.

Fixes: 03f11f03dbfe ("RISC-V: Parse cpu topology during boot. ")


> 
> > > >  arch/riscv/kernel/smpboot.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/riscv/kernel/smpboot.c
> > > > b/arch/riscv/kernel/smpboot.c index 3373df413c88..ddb2afba6d25
> > > > 100644
> > > > --- a/arch/riscv/kernel/smpboot.c
> > > > +++ b/arch/riscv/kernel/smpboot.c
> > > > @@ -39,7 +39,6 @@ static DECLARE_COMPLETION(cpu_running);
> > > >
> > > >  void __init smp_prepare_boot_cpu(void)  {
> > > > -	init_cpu_topology();
> > > >  }
> > > >
> > > >  void __init smp_prepare_cpus(unsigned int max_cpus) @@ -48,6
> > > > +47,8
> > > @@
> > > > void __init smp_prepare_cpus(unsigned int max_cpus)
> > > >  	int ret;
> > > >  	unsigned int curr_cpuid;
> > > >
> > > > +	init_cpu_topology();
> 
> I know arm64 does this, but there is any real reason for us to do so?
> @Sudeep, do you know why arm64 calls that each time?
> Or if it is worth "saving" that call on riscv, since arm64 is clearly happily calling
> it for many years & calling it later would likely head off a good few allocation
> issues (like the one we saw with the topology reworking a few months ago).
> 
> Thanks,
> Conor.
> 
> > > > +
> > > >  	curr_cpuid = smp_processor_id();
> > > >  	store_cpu_topology(curr_cpuid);
> > > >  	numa_store_cpu_info(curr_cpuid);
> > > > --
> > > > 2.25.1
> > > >
> > >
> > > Otherwise,
> > >
> > > Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

Regards
Ley Foon
  
Conor Dooley Jan. 4, 2023, 9:49 a.m. UTC | #5
Sudeep, see below - I got mixed up & what arm64 does now makes sense to me!
Your opinion on the patch, motivation aside, would be nice though.

Hey Leyfoon,


On 4 January 2023 05:35:41 GMT, Leyfoon Tan <leyfoon.tan@starfivetech.com> wrote:
>
>
>> -----Original Message-----
>> From: Conor Dooley <conor@kernel.org>
>> Sent: Wednesday, 4 January, 2023 1:08 AM

---8<---
>> To: Leyfoon Tan <leyfoon.tan@starfivetech.com>; Sudeep Holla
>> <sudeep.holla@arm.com>
>> Cc: Andrew Jones <ajones@ventanamicro.com>; Palmer Dabbelt
>> <palmer@dabbelt.com>; Paul Walmsley <paul.walmsley@sifive.com>; Albert
>> Ou <aou@eecs.berkeley.edu>; linux-riscv@lists.infradead.org; linux-
>> kernel@vger.kernel.org; Ley Foon Tan <lftan.linux@gmail.com>
>> Subject: Re: [PATCH] riscv: Move call to init_cpu_topology() to later
>> initialization stage
---8<---

btw, this above is pretty much useless can you can drop it from replies.

>> 
>> Hello!
>> 
>> Couple comments for you.
>> 
>> +CC Sudeep: I've got a question for you below.
>> 
>> On Tue, Jan 03, 2023 at 07:53:38AM +0000, Leyfoon Tan wrote:
>> > > From: Andrew Jones <ajones@ventanamicro.com>
>> > > Subject: Re: [PATCH] riscv: Move call to init_cpu_topology() to
>> > > later initialization stage On Tue, Jan 03, 2023 at 11:53:16AM +0800,
>> > > Ley Foon Tan wrote:
>> > > > topology_parse_cpu_capacity() is failed to allocate memory with
>> > > > kcalloc() after read "capacity-dmips-mhz" DT parameter in CPU DT
>> 
>> Uhh, so where did this "capacity-dmips-mhz" property actually come from?
>> I had a quick check of qemu with grep & I don't see anything there that
>> would add this property.
>> This property should not be valid on anything other than arm AFAICT.
>
>This DT parameter is not in default Qemu. I've added it for testing (see test steps in below). 
>This is preparation to support asymmetric CPU topology for RISC-V.

The property is only valid on arm, so how does arm64 deal with such asymmetric topologies without it?
Why should we "fix" something that may never be a valid dts?

>
>> 
>> > > > nodes. This
>> > > > topology_parse_cpu_capacity() is called from init_cpu_topology(),
>> > > > move call to init_cpu_topology() to later initialization  stage
>> > > > (after memory allocation is available).
>> > > >
>> > > > Note, this refers to ARM64 implementation, call
>> > > > init_cpu_topology() in smp_prepare_cpus().
>> > > >
>> > > > Tested on Qemu platform.
>> > >
>> > > Hi Ley,
>> > >
>> > > Can you provide the topologies (command lines) tested?
>> > 2 clusters with 2 CPU cores each.
>> 
>> What's the actual commandline for this? I'm not the best with QEMU, so it'd
>> really be appreciated, given the above.
>I used the Qemu to dump the DTS for 'virt' machine from Qemu, then add the "capacity-dmips-mhz" DT parameter and modify the CPU topology for clusters.
>
>1. Dump the dtb for 'virt' machine:
>qemu-system-riscv64 -cpu rv64 -smp 4 -m 2048M -M virt,dumpdtb=qemu_virt.dtb
>
>2. Convert dtb to dts
>dtc -I dtb -O dts qemu_virt.dtb > qemu_virt.dts
>
>3. Modifying qemu_virt.dts
>
>4. Compile dts to dtb
>dtc -I dts -O dtb qemu_virt.dts > qemu_virt.dtb
>
>5. Pass in "-dtb qemu_virt.dtb" as one of Qemu's argument.
>
>> 
>> > > > Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
>> > >
>> > > Fixes tag?
>> > Okay, will send out next revision with Fixes tag.
>> 
>> Please don't just send versions to add tags, Palmer can pick them up if/when
>> he applies the patch.
>Okay. Will let Palmer add tag below if he applies the patch.
>
>Fixes: 03f11f03dbfe ("RISC-V: Parse cpu topology during boot. ")
>
>
>> 
>> > > >  arch/riscv/kernel/smpboot.c | 3 ++-
>> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > > >
>> > > > diff --git a/arch/riscv/kernel/smpboot.c
>> > > > b/arch/riscv/kernel/smpboot.c index 3373df413c88..ddb2afba6d25
>> > > > 100644
>> > > > --- a/arch/riscv/kernel/smpboot.c
>> > > > +++ b/arch/riscv/kernel/smpboot.c
>> > > > @@ -39,7 +39,6 @@ static DECLARE_COMPLETION(cpu_running);
>> > > >
>> > > >  void __init smp_prepare_boot_cpu(void)  {
>> > > > -	init_cpu_topology();
>> > > >  }
>> > > >
>> > > >  void __init smp_prepare_cpus(unsigned int max_cpus) @@ -48,6
>> > > > +47,8
>> > > @@
>> > > > void __init smp_prepare_cpus(unsigned int max_cpus)
>> > > >  	int ret;
>> > > >  	unsigned int curr_cpuid;
>> > > >
>> > > > +	init_cpu_topology();
>> 
>> I know arm64 does this, but there is any real reason for us to do so?
>> @Sudeep, do you know why arm64 calls that each time?

I got myself mixed up between places I fiddled with storing the topology, so you can ignore that question Sudeep.
Clearly it's the one in smp_callin() that gets called for each CPU.
Woops.

>> Or if it is worth "saving" that call on riscv, since arm64 is clearly happily calling
>> it for many years & calling it later would likely head off a good few allocation
>> issues (like the one we saw with the topology reworking a few months ago).

...but is it still worth moving the function call later to head off any allocation failures if core topology code changes?

>> > > > +
>> > > >  	curr_cpuid = smp_processor_id();
>> > > >  	store_cpu_topology(curr_cpuid);
>> > > >  	numa_store_cpu_info(curr_cpuid);
>> > > > --
>> > > > 2.25.1
>> > > >
>> > >
>> > > Otherwise,
>> > >
>> > > Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
>
>Regards
>Ley Foon
>
  
Sudeep Holla Jan. 4, 2023, 10:41 a.m. UTC | #6
On Tue, Jan 03, 2023 at 05:07:39PM +0000, Conor Dooley wrote:
> Hello!
> 
> Couple comments for you.
> 
> +CC Sudeep: I've got a question for you below.
> 
> On Tue, Jan 03, 2023 at 07:53:38AM +0000, Leyfoon Tan wrote:
> > > From: Andrew Jones <ajones@ventanamicro.com>
> > > Subject: Re: [PATCH] riscv: Move call to init_cpu_topology() to later
> > > initialization stage
> > > On Tue, Jan 03, 2023 at 11:53:16AM +0800, Ley Foon Tan wrote:
> > > > topology_parse_cpu_capacity() is failed to allocate memory with
> > > > kcalloc() after read "capacity-dmips-mhz" DT parameter in CPU DT
> 
> Uhh, so where did this "capacity-dmips-mhz" property actually come from?
> I had a quick check of qemu with grep & I don't see anything there that
> would add this property.

From the DT properties.

> This property should not be valid on anything other than arm AFAICT.
>

Not sure if we restrict that on arm/arm64 only, but yes it is mostly used
on asymmetric CPU systems.


[...]

> > > > diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> > > > index 3373df413c88..ddb2afba6d25 100644
> > > > --- a/arch/riscv/kernel/smpboot.c
> > > > +++ b/arch/riscv/kernel/smpboot.c
> > > > @@ -39,7 +39,6 @@ static DECLARE_COMPLETION(cpu_running);
> > > >
> > > >  void __init smp_prepare_boot_cpu(void)  {
> > > > -	init_cpu_topology();
> > > >  }
> > > >
> > > >  void __init smp_prepare_cpus(unsigned int max_cpus) @@ -48,6 +47,8
> > > @@
> > > > void __init smp_prepare_cpus(unsigned int max_cpus)
> > > >  	int ret;
> > > >  	unsigned int curr_cpuid;
> > > >
> > > > +	init_cpu_topology();
> 
> I know arm64 does this, but there is any real reason for us to do so?
> @Sudeep, do you know why arm64 calls that each time?

Not sure what you are referring as called each time. IIUC smp_prepare_cpus()
must be called only once on the primary during the boot to prepare for
the secondary boot. The difference is by this stage we know the max possible
CPU and supply the same to the call.

> Or if it is worth "saving" that call on riscv, since arm64 is clearly
> happily calling it for many years & calling it later would likely head
> off a good few allocation issues (like the one we saw with the topology
> reworking a few months ago).
>

Yes the failure seems like similar issue we saw with cacheinfo and early
allocation on RISC-V. However I can't recall why we have done this way
on arm64 and not in smp_prepare_boot_cpu() like in RISC-V.

Not sure if that was any helpful.
  
Sudeep Holla Jan. 4, 2023, 10:49 a.m. UTC | #7
On Wed, Jan 04, 2023 at 09:49:48AM +0000, Conor Dooley wrote:

[...]

> >> Uhh, so where did this "capacity-dmips-mhz" property actually come from?
> >> I had a quick check of qemu with grep & I don't see anything there that
> >> would add this property.
> >> This property should not be valid on anything other than arm AFAICT.
> >
> >This DT parameter is not in default Qemu. I've added it for testing (see test steps in below). 
> >This is preparation to support asymmetric CPU topology for RISC-V.
> 
> The property is only valid on arm, so how does arm64 deal with such
> asymmetric topologies without it?

I don't think we can deal with asymmetric topologies without this.
Yes we can detect the difference in the CPU types but we can only assume
there are symmetric in terms of performance in absence of this property.

> Why should we "fix" something that may never be a valid dts?
>

I would not say invalid. But surely absence of it must be handled and
we do that for sure. IIRC, here the presence of it is causing the issue.
And if it is present means someone is trying to build it(I do understand
this is Qemu but is quite common these days for power and performance
balance in many SoC)


[...]

> >> 
> >> I know arm64 does this, but there is any real reason for us to do so?
> >> @Sudeep, do you know why arm64 calls that each time?
> 
> I got myself mixed up between places I fiddled with storing the topology, so you can ignore that question Sudeep.
> Clearly it's the one in smp_callin() that gets called for each CPU.
> Woops.
> 

Hmm I should have read all the messages in the thread. Doing by date/time
didn't work well for me 😄.

> >> Or if it is worth "saving" that call on riscv, since arm64 is clearly happily calling
> >> it for many years & calling it later would likely head off a good few allocation
> >> issues (like the one we saw with the topology reworking a few months ago).
> 
> ...but is it still worth moving the function call later to head off any allocation failures if core topology code changes?
>

Agreed, given how we faced similar issues with cacheinfo on few RISC-V
platforms.
  
Conor Dooley Jan. 4, 2023, 12:18 p.m. UTC | #8
Hey Sudeep,

On Wed, Jan 04, 2023 at 10:49:00AM +0000, Sudeep Holla wrote:
> On Wed, Jan 04, 2023 at 09:49:48AM +0000, Conor Dooley wrote:
> 
> [...]
> 
> > >> Uhh, so where did this "capacity-dmips-mhz" property actually come from?
> > >> I had a quick check of qemu with grep & I don't see anything there that
> > >> would add this property.
> > >> This property should not be valid on anything other than arm AFAICT.
> > >
> > >This DT parameter is not in default Qemu. I've added it for testing (see test steps in below). 
> > >This is preparation to support asymmetric CPU topology for RISC-V.
> > 
> > The property is only valid on arm, so how does arm64 deal with such
> > asymmetric topologies without it?
> 
> I don't think we can deal with asymmetric topologies without this.
> Yes we can detect the difference in the CPU types but we can only assume
> there are symmetric in terms of performance in absence of this property.


I looked at the bindings for it and forgot that the arm directory of
bindings applies to both arm and arm64. I see now that it is also used
on arm64.

> 
> > Why should we "fix" something that may never be a valid dts?
> >
> 
> I would not say invalid. But surely absence of it must be handled and
> we do that for sure. IIRC, here the presence of it is causing the issue.
> And if it is present means someone is trying to build it(I do understand
> this is Qemu but is quite common these days for power and performance
> balance in many SoC)

I said "invalid" as the binding is defined for arm{,64} in arm/cpus.yaml
& documented in the same directory in cpu-capacity.txt, but not yet on
riscv. All bets are off if your cpu node is using invalid properties
IMO, at least this one will fail to boot!

However, I see no reason (at this point) that we should deviate from
what arm{,64} is doing & that documenation should probably move to a
shared location at some point.

> > >> 
> > >> I know arm64 does this, but there is any real reason for us to do so?
> > >> @Sudeep, do you know why arm64 calls that each time?
> > 
> > I got myself mixed up between places I fiddled with storing the topology, so you can ignore that question Sudeep.
> > Clearly it's the one in smp_callin() that gets called for each CPU.
> > Woops.
> > 
> 
> Hmm I should have read all the messages in the thread. Doing by date/time
> didn't work well for me 😄.

Meh, my fault for getting confused ;)

> > >> Or if it is worth "saving" that call on riscv, since arm64 is clearly happily calling
> > >> it for many years & calling it later would likely head off a good few allocation
> > >> issues (like the one we saw with the topology reworking a few months ago).
> > 
> > ...but is it still worth moving the function call later to head off any allocation failures if core topology code changes?
> >
> 
> Agreed, given how we faced similar issues with cacheinfo on few RISC-V
> platforms.

Sweet, sounds like a plan to me. I'll go suggest some commit message
re-wording I think.

Thanks Sudeep!
  
Sudeep Holla Jan. 4, 2023, 12:56 p.m. UTC | #9
On Wed, Jan 04, 2023 at 12:18:28PM +0000, Conor Dooley wrote:
> Hey Sudeep,
> 
> On Wed, Jan 04, 2023 at 10:49:00AM +0000, Sudeep Holla wrote:
> > On Wed, Jan 04, 2023 at 09:49:48AM +0000, Conor Dooley wrote:
> > 
> > [...]
> > 
> > > >> Uhh, so where did this "capacity-dmips-mhz" property actually come from?
> > > >> I had a quick check of qemu with grep & I don't see anything there that
> > > >> would add this property.
> > > >> This property should not be valid on anything other than arm AFAICT.
> > > >
> > > >This DT parameter is not in default Qemu. I've added it for testing (see test steps in below). 
> > > >This is preparation to support asymmetric CPU topology for RISC-V.
> > > 
> > > The property is only valid on arm, so how does arm64 deal with such
> > > asymmetric topologies without it?
> > 
> > I don't think we can deal with asymmetric topologies without this.
> > Yes we can detect the difference in the CPU types but we can only assume
> > there are symmetric in terms of performance in absence of this property.
> 
> 
> I looked at the bindings for it and forgot that the arm directory of
> bindings applies to both arm and arm64. I see now that it is also used
> on arm64.
> 
> > 
> > > Why should we "fix" something that may never be a valid dts?
> > >
> > 
> > I would not say invalid. But surely absence of it must be handled and
> > we do that for sure. IIRC, here the presence of it is causing the issue.
> > And if it is present means someone is trying to build it(I do understand
> > this is Qemu but is quite common these days for power and performance
> > balance in many SoC)
> 
> I said "invalid" as the binding is defined for arm{,64} in arm/cpus.yaml
> & documented in the same directory in cpu-capacity.txt, but not yet on
> riscv. All bets are off if your cpu node is using invalid properties
> IMO, at least this one will fail to boot!
> 
> However, I see no reason (at this point) that we should deviate from
> what arm{,64} is doing & that documenation should probably move to a
> shared location at some point.
>

I prefer making this binding generic rather than patching to handle RISC-V
differently in the generic code. Since it is optional, the platform
need not use it if it is not needed.
  
Conor Dooley Jan. 4, 2023, 1 p.m. UTC | #10
Hey Ley Foon Tan,

Apologies for my various bits of confusion.

On Tue, Jan 03, 2023 at 11:53:16AM +0800, Ley Foon Tan wrote:
> topology_parse_cpu_capacity() is failed to allocate memory with kcalloc()
> after read "capacity-dmips-mhz" DT parameter in CPU DT nodes. This
> topology_parse_cpu_capacity() is called from init_cpu_topology(), move
> call to init_cpu_topology() to later initialization  stage (after memory
> allocation is available).
> 
> Note, this refers to ARM64 implementation, call init_cpu_topology() in
> smp_prepare_cpus().
> 
> Tested on Qemu platform.

I'd like to suggest a change to the commit message:
```
If "capacity-dmips-mhz" is present in a CPU DT node,
topology_parse_cpu_capacity() will fail to allocate memory.
arm64, with which this code path is shared, does not call
topology_parse_cpu_capacity() until later in boot where memory
allocation is available.
While "capacity-dmips-mhz" is not yet a valid property on RISC-V,
invalid properties should be ignored rather than cause issues.
Move init_cpu_topology(), which calls topology_parse_cpu_capacity(),
to a later initialization stage, to match arm64.

As a side effect of this change, RISC-V is "protected" from changes to
core topology code that would work on arm64 where memory allocation is
safe but on RISC-V isn't.
```

You don't need to use exactly that, but with something along those
lines:
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

Thanks,
Conor.

> 
> Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> 
> ---
> 
> In drivers/base/arch_topology.c: topology_parse_cpu_capacity():
> 
> 	ret = of_property_read_u32(cpu_node, "capacity-dmips-mhz",
> 				   &cpu_capacity);
> 	if (!ret) {
> 		if (!raw_capacity) {
> 			raw_capacity = kcalloc(num_possible_cpus(),
> 					       sizeof(*raw_capacity),
> 					       GFP_KERNEL);
> 			if (!raw_capacity) {
> 				cap_parsing_failed = true;
> 				return false;
> 			}
> ---
>  arch/riscv/kernel/smpboot.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> index 3373df413c88..ddb2afba6d25 100644
> --- a/arch/riscv/kernel/smpboot.c
> +++ b/arch/riscv/kernel/smpboot.c
> @@ -39,7 +39,6 @@ static DECLARE_COMPLETION(cpu_running);
>  
>  void __init smp_prepare_boot_cpu(void)
>  {
> -	init_cpu_topology();
>  }
>  
>  void __init smp_prepare_cpus(unsigned int max_cpus)
> @@ -48,6 +47,8 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  	int ret;
>  	unsigned int curr_cpuid;
>  
> +	init_cpu_topology();
> +
>  	curr_cpuid = smp_processor_id();
>  	store_cpu_topology(curr_cpuid);
>  	numa_store_cpu_info(curr_cpuid);
> -- 
> 2.25.1
>
  
Conor Dooley Jan. 4, 2023, 1:24 p.m. UTC | #11
On Wed, Jan 04, 2023 at 12:56:32PM +0000, Sudeep Holla wrote:
> On Wed, Jan 04, 2023 at 12:18:28PM +0000, Conor Dooley wrote:

> > On Wed, Jan 04, 2023 at 10:49:00AM +0000, Sudeep Holla wrote:
> > > On Wed, Jan 04, 2023 at 09:49:48AM +0000, Conor Dooley wrote:

> > > > Why should we "fix" something that may never be a valid dts?
> > > >
> > > 
> > > I would not say invalid. But surely absence of it must be handled and
> > > we do that for sure. IIRC, here the presence of it is causing the issue.
> > > And if it is present means someone is trying to build it(I do understand
> > > this is Qemu but is quite common these days for power and performance
> > > balance in many SoC)
> > 
> > I said "invalid" as the binding is defined for arm{,64} in arm/cpus.yaml
> > & documented in the same directory in cpu-capacity.txt, but not yet on
> > riscv. All bets are off if your cpu node is using invalid properties
> > IMO, at least this one will fail to boot!
> > 
> > However, I see no reason (at this point) that we should deviate from
> > what arm{,64} is doing & that documenation should probably move to a
> > shared location at some point.
> >
> 
> I prefer making this binding generic rather than patching to handle RISC-V
> differently in the generic code. Since it is optional, the platform
> need not use it if it is not needed.

Oh yeah, I was not suggesting making changes in the generic code. We
just need to change our cpu binding to match the arm cpu binding so that
having this property is accepted.

I shall go do that at some point today probably.

Thanks,
Conor.
  
Ley Foon Tan Jan. 5, 2023, 1:45 a.m. UTC | #12
> On Tue, Jan 03, 2023 at 11:53:16AM +0800, Ley Foon Tan wrote:
> > topology_parse_cpu_capacity() is failed to allocate memory with
> > kcalloc() after read "capacity-dmips-mhz" DT parameter in CPU DT
> > nodes. This
> > topology_parse_cpu_capacity() is called from init_cpu_topology(), move
> > call to init_cpu_topology() to later initialization  stage (after
> > memory allocation is available).
> >
> > Note, this refers to ARM64 implementation, call init_cpu_topology() in
> > smp_prepare_cpus().
> >
> > Tested on Qemu platform.
> 
> I'd like to suggest a change to the commit message:
> ```
> If "capacity-dmips-mhz" is present in a CPU DT node,
> topology_parse_cpu_capacity() will fail to allocate memory.
> arm64, with which this code path is shared, does not call
> topology_parse_cpu_capacity() until later in boot where memory allocation
> is available.
> While "capacity-dmips-mhz" is not yet a valid property on RISC-V, invalid
> properties should be ignored rather than cause issues.
> Move init_cpu_topology(), which calls topology_parse_cpu_capacity(), to a
> later initialization stage, to match arm64.
> 
> As a side effect of this change, RISC-V is "protected" from changes to core
> topology code that would work on arm64 where memory allocation is safe
> but on RISC-V isn't.
> ```
> 
> You don't need to use exactly that, but with something along those
> lines:
> Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
> 
> Thanks,
> Conor.
Hi Conor

Thanks for your suggestions. Will send the v2 and update the commit message.

Regards
Ley Foon
  

Patch

diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 3373df413c88..ddb2afba6d25 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -39,7 +39,6 @@  static DECLARE_COMPLETION(cpu_running);
 
 void __init smp_prepare_boot_cpu(void)
 {
-	init_cpu_topology();
 }
 
 void __init smp_prepare_cpus(unsigned int max_cpus)
@@ -48,6 +47,8 @@  void __init smp_prepare_cpus(unsigned int max_cpus)
 	int ret;
 	unsigned int curr_cpuid;
 
+	init_cpu_topology();
+
 	curr_cpuid = smp_processor_id();
 	store_cpu_topology(curr_cpuid);
 	numa_store_cpu_info(curr_cpuid);