memblock: Add error message when memblock_can_resize is not ready

Message ID 20230614131746.3670303-1-songshuaishuai@tinylab.org
State New
Series memblock: Add error message when memblock_can_resize is not ready

Commit Message

Song Shuai June 14, 2023, 1:17 p.m. UTC
  The memblock APIs are assumed to always succeed, so callers usually don't
check the return code. But a failure caused by memblock_can_resize not yet
being set is hard to recognize without the return code, as in this piece of log:

```
[    0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
[    0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
[    0.000000] Oops - store (or AMO) access fault [#1]
```

So add an error message for this kind of failure:

```
[    0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
[    0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
[    0.000000] memblock: Can't double reserved array for area start 0x000000017ffff000 size 4096
[    0.000000] Oops - store (or AMO) access fault [#1]
```

Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
---
 mm/memblock.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
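For readers unfamiliar with this code path, here is a hypothetical userspace model (not kernel code) of the check the patch touches. The names `model_type`, `model_type_init()`, `model_double_array()`, `model_reserve()`, the tiny `MODEL_INIT_REGIONS` capacity, and the `last_err` buffer standing in for pr_err() are all illustrative assumptions. Real memblock keeps regions sorted, merges them, and copies into a newly allocated array rather than calling realloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long long u64;

/* Deliberately tiny; the kernel's INIT_MEMBLOCK_REGIONS is much larger. */
#define MODEL_INIT_REGIONS 4

struct model_region { u64 base, size; };

struct model_type {
	const char *name;
	unsigned long cnt, max;
	struct model_region *regions;
};

static bool model_can_resize; /* stand-in for memblock_can_resize */
static char last_err[128];    /* stand-in for the console (pr_err) */

static int model_type_init(struct model_type *type, const char *name)
{
	memset(type, 0, sizeof(*type));
	type->name = name;
	type->max = MODEL_INIT_REGIONS;
	type->regions = calloc(type->max, sizeof(*type->regions));
	return type->regions ? 0 : -1;
}

static int model_double_array(struct model_type *type, u64 new_area_start,
			      u64 new_area_size)
{
	struct model_region *bigger;

	if (!model_can_resize) {
		/* Models the patch's message: explain the failure, then return -1. */
		snprintf(last_err, sizeof(last_err),
			 "memblock: Can't double %s array for area start 0x%016llx size %llu",
			 type->name, new_area_start, new_area_size);
		return -1;
	}
	bigger = realloc(type->regions, 2 * type->max * sizeof(*bigger));
	if (!bigger)
		return -1;
	type->regions = bigger;
	type->max *= 2;
	return 0;
}

/* Append-only reserve; real memblock also sorts and merges regions. */
static int model_reserve(struct model_type *type, u64 base, u64 size)
{
	if (type->cnt == type->max && model_double_array(type, base, size))
		return -1; /* early-boot callers typically ignore this */
	type->regions[type->cnt].base = base;
	type->regions[type->cnt].size = size;
	type->cnt++;
	assert(type->cnt <= type->max);
	return 0;
}
```

With `model_can_resize` false, the fifth `model_reserve()` call fails and leaves the same text as the log line above in `last_err`; after setting the flag to true, the same call succeeds and `max` doubles to 8.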
  

Comments

Mike Rapoport June 14, 2023, 4:07 p.m. UTC | #1
Hi,

On Wed, Jun 14, 2023 at 09:17:46PM +0800, Song Shuai wrote:
> The memblock APIs are always correct, thus the callers usually don't
> handle the return code. But the failure caused by unready memblock_can_resize
> is hard to recognize without the return code. Like this piece of log:

Please make it clear that failure is in memblock_double_array(), e.g.

But when memblock_double_array() is called before memblock_can_resize
is true, it is hard to understand the actual reason for the failure.

> 
> ```
> [    0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
> [    0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
> [    0.000000] Oops - store (or AMO) access fault [#1]
> ```
> 
> So add an error message for this kind of failure:
> 
> ```
> [    0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
> [    0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
> [    0.000000] memblock: Can't double reserved array for area start 0x000000017ffff000 size 4096
> [    0.000000] Oops - store (or AMO) access fault [#1]
> ```
> 
> Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
> ---
>  mm/memblock.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 3feafea06ab2..ab952a164f62 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -418,8 +418,11 @@ static int __init_memblock memblock_double_array(struct memblock_type *type,
>  	/* We don't allow resizing until we know about the reserved regions
>  	 * of memory that aren't suitable for allocation
>  	 */
> -	if (!memblock_can_resize)
> +	if (!memblock_can_resize) {
> +		pr_err("memblock: Can't double %s array for area start %pa size %ld\n",
> +			type->name, &new_area_start, (unsigned long)new_area_size);

Most of the time memblock prints sizes with %llu and a cast to u64; please
make this consistent.
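A small userspace sketch of the convention described above, assuming the kernel's definition of `u64` as `unsigned long long`: whatever the width of the original variable, casting it to `u64` makes `%llu` the matching conversion. The `size32_t` type and the `format_size()` helper are hypothetical. Note that the patch pairs `%ld` with an `(unsigned long)` cast, which is also a signedness mismatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The kernel defines u64 as unsigned long long, so %llu always matches it. */
typedef unsigned long long u64;

/* Hypothetical 32-bit size type, e.g. phys_addr_t on a 32-bit platform. */
typedef unsigned int size32_t;

/* Hypothetical helper: format a size the way most memblock messages do. */
static void format_size(char *buf, size_t len, const char *name, size32_t size)
{
	int n = snprintf(buf, len, "%s: size %llu", name, (u64)size);

	/* Guard against a truncated message. */
	assert(n >= 0 && (size_t)n < len);
}
```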

>  		return -1;
> +	}
>  
>  	/* Calculate new doubled size */
>  	old_size = type->max * sizeof(struct memblock_region);
> -- 
> 2.20.1
> 
>
  
Song Shuai June 20, 2023, 7:04 a.m. UTC | #2
Sorry for not replying sooner.

On 2023/6/15 00:07, Mike Rapoport wrote:
> Hi,
> 
> On Wed, Jun 14, 2023 at 09:17:46PM +0800, Song Shuai wrote:
>> The memblock APIs are always correct, thus the callers usually don't
>> handle the return code. But the failure caused by unready memblock_can_resize
>> is hard to recognize without the return code. Like this piece of log:
> 
> Please make it clear that failure is in memblock_double_array(), e.g.
> 

Having numerous memblock reservations at early boot, while memblock_can_resize
is unset, may exhaust the INIT_MEMBLOCK_REGIONS-sized memblock.reserved array
and trigger an attempt to double it via memblock_double_array(), which fails
and returns -1 to the caller.

You can find such numerous memblock reservations reported in commit
24cc61d8cb5a ("arm64: memblock: don't permit memblock resizing until linear
mapping is up").
A similar test scenario can be simulated with a constructed dtb containing
numerous discrete /memreserve/ or /reserved-memory regions.
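Why the regions need to be *discrete*: memblock merges adjacent regions when it adds them, so only ranges separated by gaps each consume a slot in memblock.reserved. The hypothetical sketch below counts the slots a sorted list of ranges would need and compares that with a fixed capacity. `count_merged()` and `fits_without_resize()` are illustrative names, and the sketch ignores the node ID and flag checks that real memblock applies before merging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long long u64;

struct range { u64 base, size; };

/*
 * Count the regions left after merging ranges that touch or overlap.
 * Assumes the input is sorted by base (a simplification for this sketch).
 */
static size_t count_merged(const struct range *r, size_t n)
{
	size_t cnt = 0;
	u64 end = 0;

	for (size_t i = 0; i < n; i++) {
		assert(i == 0 || r[i].base >= r[i - 1].base);
		if (cnt == 0 || r[i].base > end) {
			cnt++;                         /* gap: needs a new slot */
			end = r[i].base + r[i].size;
		} else if (r[i].base + r[i].size > end) {
			end = r[i].base + r[i].size;   /* touching/overlapping: extend */
		}
	}
	return cnt;
}

/* Would these reservations fit without resizing the array? */
static bool fits_without_resize(const struct range *r, size_t n, size_t capacity)
{
	return count_merged(r, n) <= capacity;
}
```

With eight 4 KiB ranges and a capacity of four, back-to-back ranges collapse into a single slot and fit, while the same ranges spaced 8 KiB apart need eight slots and would force a resize.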

> But when memblock_double_array() is called before memblock_can_resize
> is true, it is hard to understand the actual reason for the failure.
> 
>>
>> ```
>> [    0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
>> [    0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
>> [    0.000000] Oops - store (or AMO) access fault [#1]
>> ```
>>
>> So add an error message for this kind of failure:
>>
>> ```
>> [    0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
>> [    0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
>> [    0.000000] memblock: Can't double reserved array for area start 0x000000017ffff000 size 4096
>> [    0.000000] Oops - store (or AMO) access fault [#1]
>> ```
>>
>> Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
>> ---
>>   mm/memblock.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 3feafea06ab2..ab952a164f62 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -418,8 +418,11 @@ static int __init_memblock memblock_double_array(struct memblock_type *type,
>>   	/* We don't allow resizing until we know about the reserved regions
>>   	 * of memory that aren't suitable for allocation
>>   	 */
>> -	if (!memblock_can_resize)
>> +	if (!memblock_can_resize) {
>> +		pr_err("memblock: Can't double %s array for area start %pa size %ld\n",
>> +			type->name, &new_area_start, (unsigned long)new_area_size);
> 
> Most of the time memblock uses %llu and cast to u64 to print size, please
> make this consistent.
I will fix it in the next version if the above description is OK with you.
> 
>>   		return -1;
>> +	}
>>   
>>   	/* Calculate new doubled size */
>>   	old_size = type->max * sizeof(struct memblock_region);
>> -- 
>> 2.20.1
>>
>>
>
  
Mike Rapoport June 21, 2023, 3:33 p.m. UTC | #3
On Tue, Jun 20, 2023 at 03:04:55PM +0800, Song Shuai wrote:
> Sorry for not replying to you in time
> 
> On 2023/6/15 00:07, Mike Rapoport wrote:
> > Hi,
> > 
> > On Wed, Jun 14, 2023 at 09:17:46PM +0800, Song Shuai wrote:
> > > The memblock APIs are always correct, thus the callers usually don't
> > > handle the return code. But the failure caused by unready memblock_can_resize
> > > is hard to recognize without the return code. Like this piece of log:
> > 
> > Please make it clear that failure is in memblock_double_array(), e.g.
> > 
> 
> Having numerous memblock reservations at early boot, while memblock_can_resize
> is unset, may exhaust the INIT_MEMBLOCK_REGIONS-sized memblock.reserved array
> and trigger an attempt to double it via memblock_double_array(), which fails
> and returns -1 to the caller.
> 
> You can find such numerous memblock reservations reported in commit
> 24cc61d8cb5a ("arm64: memblock: don't permit memblock resizing until linear
> mapping is up").
> A similar test scenario can be simulated with a constructed dtb containing
> numerous discrete /memreserve/ or /reserved-memory regions.

Ideally, the callers of memblock_reserve() should check the return value
and panic with a meaningful message if it fails. Still, for now something
like this patch is an improvement.
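The caller-side check suggested above could look roughly like the following userspace model. `model_memblock_reserve()` with its two-slot budget, `check_reserve()`, `model_panic()` and `reserve_or_panic()` are hypothetical names; in the kernel the caller would test the return value of memblock_reserve() and call panic() directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long long u64;

/* Hypothetical reserve that fails once its fixed budget of slots is used up. */
static unsigned int slots_left = 2;

static int model_memblock_reserve(u64 base, u64 size)
{
	(void)base;
	(void)size;
	if (slots_left == 0)
		return -1;
	slots_left--;
	return 0;
}

/* Stand-in for panic(): report and stop. */
static void model_panic(const char *msg)
{
	fprintf(stderr, "panic: %s\n", msg);
	abort();
}

/* Reserve and, on failure, describe what could not be reserved. */
static int check_reserve(u64 base, u64 size, char *msg, size_t len)
{
	assert(msg != NULL && len > 0);
	if (model_memblock_reserve(base, size) == 0)
		return 0;
	snprintf(msg, len, "Failed to reserve %llu bytes at 0x%llx", size, base);
	return -1;
}

/* Caller-side pattern: fail early with context instead of crashing later. */
static void reserve_or_panic(u64 base, u64 size)
{
	char msg[128];

	if (check_reserve(base, size, msg, sizeof(msg)))
		model_panic(msg);
}
```

Splitting the check from the panic keeps the message formatting testable without aborting the process.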
 
How about we make the changelog to be something like:

Subject: memblock: report failures when memblock_can_resize is not set

The callers of memblock_reserve() do not check the return value presuming
that memblock_reserve() always succeeds, but there are cases where it may
fail.

Having numerous memblock reservations at early boot where
memblock_can_resize is unset may exhaust the INIT_MEMBLOCK_REGIONS sized
memblock.reserved regions array and an attempt to double this array via
memblock_double_array() will fail and will return -1 to the caller.

When this happens the system crashes anyway, but it's hard to identify the
reason for the crash.

Add a panic message to memblock_double_array() to aid debugging of the
cases when too many regions are reserved before memblock can resize
memblock.reserved array.

> > But when memblock_double_array() is called before memblock_can_resize
> > is true, it is hard to understand the actual reason for the failure.
> > 
> > > 
> > > ```
> > > [    0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
> > > [    0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
> > > [    0.000000] Oops - store (or AMO) access fault [#1]
> > > ```
> > > 
> > > So add an error message for this kind of failure:
> > > 
> > > ```
> > > [    0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c
> > > [    0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128
> > > [    0.000000] memblock: Can't double reserved array for area start 0x000000017ffff000 size 4096
> > > [    0.000000] Oops - store (or AMO) access fault [#1]
> > > ```
> > > 
> > > Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
> > > ---
> > >   mm/memblock.c | 5 ++++-
> > >   1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/memblock.c b/mm/memblock.c
> > > index 3feafea06ab2..ab952a164f62 100644
> > > --- a/mm/memblock.c
> > > +++ b/mm/memblock.c
> > > @@ -418,8 +418,11 @@ static int __init_memblock memblock_double_array(struct memblock_type *type,
> > >   	/* We don't allow resizing until we know about the reserved regions
> > >   	 * of memory that aren't suitable for allocation
> > >   	 */
> > > -	if (!memblock_can_resize)
> > > +	if (!memblock_can_resize) {
> > > +		pr_err("memblock: Can't double %s array for area start %pa size %ld\n",
> > > +			type->name, &new_area_start, (unsigned long)new_area_size);

The system will crash anyway if we get here, so why not use panic()?
Also, dumping new_area_start here does not add any information and is
rather confusing. How about

	panic("memblock: cannot resize %s array\n", type->name);

> > 
> > Most of the time memblock uses %llu and cast to u64 to print size, please
> > make this consistent.
> I will fix it in the next version if the above description is OK with you.
> > 
> > >   		return -1;
> > > +	}
> > >   	/* Calculate new doubled size */
> > >   	old_size = type->max * sizeof(struct memblock_region);
> 
> -- 
> Thanks
> Song Shuai
> 
>
  

Patch

diff --git a/mm/memblock.c b/mm/memblock.c
index 3feafea06ab2..ab952a164f62 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -418,8 +418,11 @@ static int __init_memblock memblock_double_array(struct memblock_type *type,
 	/* We don't allow resizing until we know about the reserved regions
 	 * of memory that aren't suitable for allocation
 	 */
-	if (!memblock_can_resize)
+	if (!memblock_can_resize) {
+		pr_err("memblock: Can't double %s array for area start %pa size %ld\n",
+			type->name, &new_area_start, (unsigned long)new_area_size);
 		return -1;
+	}
 
 	/* Calculate new doubled size */
 	old_size = type->max * sizeof(struct memblock_region);