[v4] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

Message ID 20230306084124.300584-1-bhe@redhat.com
State New
Headers
Series [v4] arm64: kdump: simplify the reservation behaviour of crashkernel=,high |

Commit Message

Baoquan He March 6, 2023, 8:41 a.m. UTC
  On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
suitable memory region top down. If the 'xM' of crashkernel high memory
is reserved from high memory successfully, it will try to reserve
crashkernel low memory later accoringly. Otherwise, it will try to search
low memory area for the 'xM' suitable region. Please see the details in
Documentation/admin-guide/kernel-parameters.txt.

While we observed an unexpected case where a reserved region crosses the
high and low meomry boundary. E.g on a system with 4G as low memory end,
user added the kernel parameters like: 'crashkernel=512M,high', it could
finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel.
The crashkernel high region crossing low and high memory boudary will bring
issues:

1) For crashkernel=x,high, if getting crashkernel high region across
low and high memory boundary, then user will see two memory regions in
low memory, and one memory region in high memory. The two crashkernel
low memory regions are confusing as shown in above example.

2) If people explicityly specify "crashkernel=x,high crashkernel=y,low"
and y <= 128M, when crashkernel high region crosses low and high memory
boundary and the part of crashkernel high reservation below boundary is
bigger than y, the expected crahskernel low reservation will be skipped.
But the expected crashkernel high reservation is shrank and could not
satisfy user space requirement.

3) The crossing boundary behaviour of crahskernel high reservation is
different than x86 arch. On x86_64, the low memory end is 4G fixedly,
and the memory near 4G is reserved by system, e.g for mapping firmware,
pci mapping, so the crashkernel reservation crossing boundary never happens.
From distros point of view, this brings inconsistency and confusion. Users
need to dig into x86 and arm64 system details to find out why.

For kernel itself, the impact of issue 3) could be slight. While issue
1) and 2) cause actual impact because it brings obscure semantics and
behaviour to crashkernel=,high reservation.

Here, for crashkernel=xM,high, search the high memory for the suitable
region only in high memory. If failed, try reserving the suitable
region only in low memory. Like this, the crashkernel high region will
only exist in high memory, and crashkernel low region only exists in low
memory. The reservation behaviour for crashkernel=,high is clearer and
simpler.

Note: On arm64, the high and low memory boudary could be 1G if it's RPi4
system, or 4G if other normal systems.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
Change history:
v3->v4:
 - Change pr_info() to pr_warn() when fixed allocation failed. Leizhen
   pointed out this and suggested changing.
v2->v3:
 - Rephrase patch log to clarify the current crashkernel high
   reservation could cross the high and low memory boundary, but not
   4G boundary only, because RPi4 of arm64 has high and low memory
   boudary as 1G. The v3 patch log could mislead people that the RPi4
   also use 4G as high,low memory boundary.
v1->v2:
 - Fold patch 2 of v1 into patch 1 for better reviewing.
 - Update patch log to add more details.

 arch/arm64/mm/init.c | 43 +++++++++++++++++++++++++++++++++----------
 1 file changed, 33 insertions(+), 10 deletions(-)
  

Comments

Zhen Lei March 6, 2023, 12:55 p.m. UTC | #1
On 2023/3/6 16:41, Baoquan He wrote:
> On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> suitable memory region top down. If the 'xM' of crashkernel high memory
> is reserved from high memory successfully, it will try to reserve
> crashkernel low memory later accoringly. Otherwise, it will try to search
> low memory area for the 'xM' suitable region. Please see the details in
> Documentation/admin-guide/kernel-parameters.txt.
> 
> While we observed an unexpected case where a reserved region crosses the
> high and low meomry boundary. E.g on a system with 4G as low memory end,
> user added the kernel parameters like: 'crashkernel=512M,high', it could
> finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel.
> The crashkernel high region crossing low and high memory boudary will bring
> issues:
> 
> 1) For crashkernel=x,high, if getting crashkernel high region across
> low and high memory boundary, then user will see two memory regions in
> low memory, and one memory region in high memory. The two crashkernel
> low memory regions are confusing as shown in above example.
> 
> 2) If people explicityly specify "crashkernel=x,high crashkernel=y,low"
> and y <= 128M, when crashkernel high region crosses low and high memory
> boundary and the part of crashkernel high reservation below boundary is
> bigger than y, the expected crahskernel low reservation will be skipped.
> But the expected crashkernel high reservation is shrank and could not
> satisfy user space requirement.
> 
> 3) The crossing boundary behaviour of crahskernel high reservation is
> different than x86 arch. On x86_64, the low memory end is 4G fixedly,
> and the memory near 4G is reserved by system, e.g for mapping firmware,
> pci mapping, so the crashkernel reservation crossing boundary never happens.
>>From distros point of view, this brings inconsistency and confusion. Users
> need to dig into x86 and arm64 system details to find out why.
> 
> For kernel itself, the impact of issue 3) could be slight. While issue
> 1) and 2) cause actual impact because it brings obscure semantics and
> behaviour to crashkernel=,high reservation.
> 
> Here, for crashkernel=xM,high, search the high memory for the suitable
> region only in high memory. If failed, try reserving the suitable
> region only in low memory. Like this, the crashkernel high region will
> only exist in high memory, and crashkernel low region only exists in low
> memory. The reservation behaviour for crashkernel=,high is clearer and
> simpler.
> 
> Note: On arm64, the high and low memory boudary could be 1G if it's RPi4
> system, or 4G if other normal systems.

Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>

> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
> Change history:
> v3->v4:
>  - Change pr_info() to pr_warn() when fixed allocation failed. Leizhen
>    pointed out this and suggested changing.
> v2->v3:
>  - Rephrase patch log to clarify the current crashkernel high
>    reservation could cross the high and low memory boundary, but not
>    4G boundary only, because RPi4 of arm64 has high and low memory
>    boudary as 1G. The v3 patch log could mislead people that the RPi4
>    also use 4G as high,low memory boundary.
> v1->v2:
>  - Fold patch 2 of v1 into patch 1 for better reviewing.
>  - Update patch log to add more details.
> 
>  arch/arm64/mm/init.c | 43 +++++++++++++++++++++++++++++++++----------
>  1 file changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 58a0bb2c17f1..307763f97549 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -127,12 +127,13 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
>   */
>  static void __init reserve_crashkernel(void)
>  {
> -	unsigned long long crash_base, crash_size;
> -	unsigned long long crash_low_size = 0;
> +	unsigned long long crash_base, crash_size, search_base;
>  	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> +	unsigned long long crash_low_size = 0;
>  	char *cmdline = boot_command_line;
> -	int ret;
>  	bool fixed_base = false;
> +	bool high = false;
> +	int ret;
>  
>  	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
>  		return;
> @@ -155,7 +156,9 @@ static void __init reserve_crashkernel(void)
>  		else if (ret)
>  			return;
>  
> +		search_base = CRASH_ADDR_LOW_MAX;
>  		crash_max = CRASH_ADDR_HIGH_MAX;
> +		high = true;
>  	} else if (ret || !crash_size) {
>  		/* The specified value is invalid */
>  		return;
> @@ -166,31 +169,51 @@ static void __init reserve_crashkernel(void)
>  	/* User specifies base address explicitly. */
>  	if (crash_base) {
>  		fixed_base = true;
> +		search_base = crash_base;
>  		crash_max = crash_base + crash_size;
>  	}
>  
>  retry:
>  	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> -					       crash_base, crash_max);
> +					       search_base, crash_max);
>  	if (!crash_base) {
>  		/*
> -		 * If the first attempt was for low memory, fall back to
> -		 * high memory, the minimum required low memory will be
> -		 * reserved later.
> +		 * For crashkernel=size[KMG]@offset[KMG], print out failure
> +		 * message if can't reserve the specified region.
>  		 */
> -		if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) {
> +		if (fixed_base) {
> +			pr_warn("crashkernel reservation failed - memory is in use.\n");
> +			return;
> +		}
> +
> +		/*
> +		 * For crashkernel=size[KMG], if the first attempt was for
> +		 * low memory, fall back to high memory, the minimum required
> +		 * low memory will be reserved later.
> +		 */
> +		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
>  			crash_max = CRASH_ADDR_HIGH_MAX;
> +			search_base = CRASH_ADDR_LOW_MAX;
>  			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
>  			goto retry;
>  		}
>  
> +		/*
> +		 * For crashkernel=size[KMG],high, if the first attempt was
> +		 * for high memory, fall back to low memory.
> +		 */
> +		if (high && crash_max == CRASH_ADDR_HIGH_MAX) {
> +			crash_max = CRASH_ADDR_LOW_MAX;
> +			search_base = 0;
> +			goto retry;
> +		}
>  		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>  			crash_size);
>  		return;
>  	}
>  
> -	if ((crash_base > CRASH_ADDR_LOW_MAX - crash_low_size) &&
> -	     crash_low_size && reserve_crashkernel_low(crash_low_size)) {
> +	if ((crash_base >= CRASH_ADDR_LOW_MAX) && crash_low_size &&
> +	     reserve_crashkernel_low(crash_low_size)) {
>  		memblock_phys_free(crash_base, crash_size);
>  		return;
>  	}
>
  
Simon Horman March 8, 2023, 11:02 a.m. UTC | #2
On Mon, Mar 06, 2023 at 04:41:24PM +0800, Baoquan He wrote:
> On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> suitable memory region top down. If the 'xM' of crashkernel high memory
> is reserved from high memory successfully, it will try to reserve
> crashkernel low memory later accoringly. Otherwise, it will try to search
> low memory area for the 'xM' suitable region. Please see the details in
> Documentation/admin-guide/kernel-parameters.txt.
> 
> While we observed an unexpected case where a reserved region crosses the
> high and low meomry boundary. E.g on a system with 4G as low memory end,
> user added the kernel parameters like: 'crashkernel=512M,high', it could
> finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel.
> The crashkernel high region crossing low and high memory boudary will bring
> issues:
> 
> 1) For crashkernel=x,high, if getting crashkernel high region across
> low and high memory boundary, then user will see two memory regions in
> low memory, and one memory region in high memory. The two crashkernel
> low memory regions are confusing as shown in above example.
> 
> 2) If people explicityly specify "crashkernel=x,high crashkernel=y,low"
> and y <= 128M, when crashkernel high region crosses low and high memory
> boundary and the part of crashkernel high reservation below boundary is
> bigger than y, the expected crahskernel low reservation will be skipped.
> But the expected crashkernel high reservation is shrank and could not
> satisfy user space requirement.
> 
> 3) The crossing boundary behaviour of crahskernel high reservation is
> different than x86 arch. On x86_64, the low memory end is 4G fixedly,
> and the memory near 4G is reserved by system, e.g for mapping firmware,
> pci mapping, so the crashkernel reservation crossing boundary never happens.
> >From distros point of view, this brings inconsistency and confusion. Users
> need to dig into x86 and arm64 system details to find out why.
> 
> For kernel itself, the impact of issue 3) could be slight. While issue
> 1) and 2) cause actual impact because it brings obscure semantics and
> behaviour to crashkernel=,high reservation.
> 
> Here, for crashkernel=xM,high, search the high memory for the suitable
> region only in high memory. If failed, try reserving the suitable
> region only in low memory. Like this, the crashkernel high region will
> only exist in high memory, and crashkernel low region only exists in low
> memory. The reservation behaviour for crashkernel=,high is clearer and
> simpler.
> 
> Note: On arm64, the high and low memory boudary could be 1G if it's RPi4
> system, or 4G if other normal systems.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>

Reviewed-by: Simon Horman <horms@kernel.org>
  
Catalin Marinas March 15, 2023, 2:52 p.m. UTC | #3
On Mon, Mar 06, 2023 at 04:41:24PM +0800, Baoquan He wrote:
> On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> suitable memory region top down. If the 'xM' of crashkernel high memory
> is reserved from high memory successfully, it will try to reserve
> crashkernel low memory later accoringly. Otherwise, it will try to search
> low memory area for the 'xM' suitable region. Please see the details in
> Documentation/admin-guide/kernel-parameters.txt.
> 
> While we observed an unexpected case where a reserved region crosses the
> high and low meomry boundary. E.g on a system with 4G as low memory end,
> user added the kernel parameters like: 'crashkernel=512M,high', it could
> finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel.
> The crashkernel high region crossing low and high memory boudary will bring
> issues:
[...]
> Note: On arm64, the high and low memory boudary could be 1G if it's RPi4
> system, or 4G if other normal systems.

I'm mostly ok with the reworking but I'm not sure about the non-fixed
low memory boundary. As I mentioned on v2, the user doesn't (need to)
know about the ZONE_DMA limit on a specific platform, that's supposed to
be fairly transparent. So on RPi4, specifying 'high' still allows
allocation below 4G which some users may treat as 'low'. The
kernel-parameters.txt doc also only talks about the 4G limit.

> +		/*
> +		 * For crashkernel=size[KMG], if the first attempt was for
> +		 * low memory, fall back to high memory, the minimum required
> +		 * low memory will be reserved later.
> +		 */
> +		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
>  			crash_max = CRASH_ADDR_HIGH_MAX;
> +			search_base = CRASH_ADDR_LOW_MAX;
>  			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
>  			goto retry;
>  		}

So I'm more tempted to set the search_base to 4G here rather than
CRASH_ADDR_LOW_MAX. The crashkernel=x,high option on a RPi4 with all
memory below 4G will fall back to low allocation. But RPi4 is the odd
one out, so I think we can live with this.
  
Baoquan He March 16, 2023, 9:47 a.m. UTC | #4
Hi Catalin,

On 03/15/23 at 02:52pm, Catalin Marinas wrote:
> On Mon, Mar 06, 2023 at 04:41:24PM +0800, Baoquan He wrote:
> > On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> > suitable memory region top down. If the 'xM' of crashkernel high memory
> > is reserved from high memory successfully, it will try to reserve
> > crashkernel low memory later accoringly. Otherwise, it will try to search
> > low memory area for the 'xM' suitable region. Please see the details in
> > Documentation/admin-guide/kernel-parameters.txt.
> > 
> > While we observed an unexpected case where a reserved region crosses the
> > high and low meomry boundary. E.g on a system with 4G as low memory end,
> > user added the kernel parameters like: 'crashkernel=512M,high', it could
> > finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel.
> > The crashkernel high region crossing low and high memory boudary will bring
> > issues:
> [...]
> > Note: On arm64, the high and low memory boudary could be 1G if it's RPi4
> > system, or 4G if other normal systems.

Thanks for looking into this.

> I'm mostly ok with the reworking but I'm not sure about the non-fixed
> low memory boundary. As I mentioned on v2, the user doesn't (need to)
> know about the ZONE_DMA limit on a specific platform, that's supposed to
> be fairly transparent. So on RPi4, specifying 'high' still allows
> allocation below 4G which some users may treat as 'low'. The
> kernel-parameters.txt doc also only talks about the 4G limit.

With my understanding and the current arm64 code, the boundary of 'high'
and 'low' is truly different on different platform. E.g on RPi4, its
device does need memory below 1G to initialize, otherwise it will fail
kernel bootup. I ever asked people to help test this on RPi4, with
crashkernel reservation all above 1G, kdump kernel can't boot up.

And now, the upper boundary of low memory has been decided by macro
CRASH_ADDR_LOW_MAX. It says how big the low memory end is. That menas
on RPi4, it's 1G, on other normal systems, it's 4G.

#define CRASH_ADDR_LOW_MAX              arm64_dma_phys_limit

As for kernel-parameters.txt, the 'crashkernel=size[KMG],high' part need
be rephrased to reflect the fact on arm64.

So I made this patch based on the facts and undertanding. However, I
like the idea that taking 4G as the fixed boudary of low and high memory
on arm64. Please see the reply at bottom.

> 
> > +		/*
> > +		 * For crashkernel=size[KMG], if the first attempt was for
> > +		 * low memory, fall back to high memory, the minimum required
> > +		 * low memory will be reserved later.
> > +		 */
> > +		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
> >  			crash_max = CRASH_ADDR_HIGH_MAX;
> > +			search_base = CRASH_ADDR_LOW_MAX;
> >  			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
> >  			goto retry;
> >  		}
> 
> So I'm more tempted to set the search_base to 4G here rather than
> CRASH_ADDR_LOW_MAX. The crashkernel=x,high option on a RPi4 with all
> memory below 4G will fall back to low allocation. But RPi4 is the odd
> one out, so I think we can live with this.

I totally agree with you that we should take 4G as the fixed boundary of
low and high memory because kdump is aimed at workstation and server
platform. We can leave RPi4 to use crashkernel=size[KMG][@offset[KMG]]
to specify a region if people have to use.

[PATCH 0/2] arm64, kdump: enforce to take 4G as the crashkernel low memory end
https://lore.kernel.org/all/20220828005545.94389-1-bhe@redhat.com/T/#u

Now I am wondering if we should change CRASH_ADDR_LOW_MAX to 4G directly
since we decide to take 4G as the low memory limit when doing 'high'
reserving or falling back. This is just like what we have been doing in
x86_64. Not sure if I follow you correctly.

===========================x86_64 low memory end for reference=======
On x86_64, we fixedly have DMA zone at 0~16M, and DMA32 zone at
16M~4G, still we have boundary of low memory and high memory at 4G.
However, it's not decided in the first place. Initially, we disregarded
devices which only address 24bit range, namely 16M, set the
CRASH_ADDR_LOW_MAX as 896M because that is compatible with the 32bit
kernel which could be running on the x86_64 bare metal machine. Later,
we realized 32bit x86 support for kdump is not needed any more because
of the rare deployment in server. It was done in below commit. 

commit 9ca5c8e632ce8f144ec6d00da2dc5e16b41d593c
Author: Dave Young <dyoung@redhat.com>
Date:   Sun Apr 21 11:50:59 2019 +0800

    x86/kdump: Have crashkernel=X reserve under 4G by default

arch/x86/kernel/setup.c:
-------------------------
#ifdef CONFIG_X86_32
# define CRASH_ADDR_LOW_MAX     SZ_512M
# define CRASH_ADDR_HIGH_MAX    SZ_512M
#else
# define CRASH_ADDR_LOW_MAX     SZ_4G
# define CRASH_ADDR_HIGH_MAX    SZ_64T
#endif
  
Catalin Marinas March 16, 2023, 5:35 p.m. UTC | #5
On Thu, Mar 16, 2023 at 05:47:52PM +0800, Baoquan He wrote:
> On 03/15/23 at 02:52pm, Catalin Marinas wrote:
> > On Mon, Mar 06, 2023 at 04:41:24PM +0800, Baoquan He wrote:
> > > +		/*
> > > +		 * For crashkernel=size[KMG], if the first attempt was for
> > > +		 * low memory, fall back to high memory, the minimum required
> > > +		 * low memory will be reserved later.
> > > +		 */
> > > +		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
> > >  			crash_max = CRASH_ADDR_HIGH_MAX;
> > > +			search_base = CRASH_ADDR_LOW_MAX;
> > >  			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
> > >  			goto retry;
> > >  		}
> > 
> > So I'm more tempted to set the search_base to 4G here rather than
> > CRASH_ADDR_LOW_MAX. The crashkernel=x,high option on a RPi4 with all
> > memory below 4G will fall back to low allocation. But RPi4 is the odd
> > one out, so I think we can live with this.
> 
> I totally agree with you that we should take 4G as the fixed boundary of
> low and high memory because kdump is aimed at workstation and server
> platform. We can leave RPi4 to use crashkernel=size[KMG][@offset[KMG]]
> to specify a region if people have to use.
> 
> [PATCH 0/2] arm64, kdump: enforce to take 4G as the crashkernel low memory end
> https://lore.kernel.org/all/20220828005545.94389-1-bhe@redhat.com/T/#u
> 
> Now I am wondering if we should change CRASH_ADDR_LOW_MAX to 4G directly
> since we decide to take 4G as the low memory limit when doing 'high'
> reserving or falling back. This is just like what we have been doing in
> x86_64. Not sure if I follow you correctly.

On RPi4, we do need the 'low' allocation to be below 1GB, otherwise
ZONE_DMA will be empty. But we can leave the 'high' reservation above
4GB (if available). The downside is that we won't get anything between
1GB and 4GB unless explicitly specified with @offset.

I'm not entirely sure what you want to achieve: avoiding the 'high'
reservation going across an arbitrary boundary (1GB or 4GB) that the
user may not be aware of or just avoiding the 'high' reservation going
across a 4G boundary? On RPi4, if the 'high' reservation above 4GB
fails, should it fall back to below 1GB reservation or to somewhere
between 1GB and 4GB, making sure it doesn't cross any of these two
boundaries? For someone unfamiliar with the ZONE_DMA on RPi4, the latter
would look like two 'low' allocations below 4GB.
  
Baoquan He March 17, 2023, 3:09 p.m. UTC | #6
On 03/16/23 at 05:35pm, Catalin Marinas wrote:
> On Thu, Mar 16, 2023 at 05:47:52PM +0800, Baoquan He wrote:
> > On 03/15/23 at 02:52pm, Catalin Marinas wrote:
> > > On Mon, Mar 06, 2023 at 04:41:24PM +0800, Baoquan He wrote:
> > > > +		/*
> > > > +		 * For crashkernel=size[KMG], if the first attempt was for
> > > > +		 * low memory, fall back to high memory, the minimum required
> > > > +		 * low memory will be reserved later.
> > > > +		 */
> > > > +		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
> > > >  			crash_max = CRASH_ADDR_HIGH_MAX;
> > > > +			search_base = CRASH_ADDR_LOW_MAX;
> > > >  			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
> > > >  			goto retry;
> > > >  		}
> > > 
> > > So I'm more tempted to set the search_base to 4G here rather than
> > > CRASH_ADDR_LOW_MAX. The crashkernel=x,high option on a RPi4 with all
> > > memory below 4G will fall back to low allocation. But RPi4 is the odd
> > > one out, so I think we can live with this.
> > 
> > I totally agree with you that we should take 4G as the fixed boundary of
> > low and high memory because kdump is aimed at workstation and server
> > platform. We can leave RPi4 to use crashkernel=size[KMG][@offset[KMG]]
> > to specify a region if people have to use.
> > 
> > [PATCH 0/2] arm64, kdump: enforce to take 4G as the crashkernel low memory end
> > https://lore.kernel.org/all/20220828005545.94389-1-bhe@redhat.com/T/#u
> > 
> > Now I am wondering if we should change CRASH_ADDR_LOW_MAX to 4G directly
> > since we decide to take 4G as the low memory limit when doing 'high'
> > reserving or falling back. This is just like what we have been doing in
> > x86_64. Not sure if I follow you correctly.
> 
> On RPi4, we do need the 'low' allocation to be below 1GB, otherwise
> ZONE_DMA will be empty. But we can leave the 'high' reservation above
> 4GB (if available). The downside is that we won't get anything between
> 1GB and 4GB unless explicitly specified with @offset.

Ah, I see. You want to have high reservation like below when RPi4.

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 307763f97549..9e558fadf483 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -95,6 +95,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
 
 #define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
 #define CRASH_ADDR_HIGH_MAX		(PHYS_MASK + 1)
+#define CRASH_HIGH_SEARCH_BASE		SZ_4G
 
 #define DEFAULT_CRASH_KERNEL_LOW_SIZE	(128UL << 20)
 
@@ -156,7 +157,7 @@ static void __init reserve_crashkernel(void)
 		else if (ret)
 			return;
 
-		search_base = CRASH_ADDR_LOW_MAX;
+		search_base = CRASH_HIGH_SEARCH_BASE;
 		crash_max = CRASH_ADDR_HIGH_MAX;
 		high = true;
 	} else if (ret || !crash_size) {
@@ -193,7 +194,7 @@ static void __init reserve_crashkernel(void)
 		 */
 		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
 			crash_max = CRASH_ADDR_HIGH_MAX;
-			search_base = CRASH_ADDR_LOW_MAX;
+			search_base = CRASH_HIGH_SEARCH_BASE;
 			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
 			goto retry;
 		}

> 
> I'm not entirely sure what you want to achieve: avoiding the 'high'
> reservation going across an arbitrary boundary (1GB or 4GB) that the
> user may not be aware of or just avoiding the 'high' reservation going
> across a 4G boundary? On RPi4, if the 'high' reservation above 4GB
> fails, should it fall back to below 1GB reservation or to somewhere
> between 1GB and 4GB, making sure it doesn't cross any of these two
> boundaries? For someone unfamiliar with the ZONE_DMA on RPi4, the latter
> would look like two 'low' allocations below 4GB.

Currently, with the existing code, the high and low memory is like below
on arm64:
RPi4:
0~1G: low memory | 1G~top: high memory

Other normal system:
0~4G: low memory | 4G~top: high memory

Now you want RPi4 to have low and high crahskernel reservation like
this: 
0~1G: low memory | 4G~top: high memory
It's also fine. However, as far as I know, RPi4 usually only has 8G
memory. And arm64 only has top down memblock allocation. Its high memory
above 4G will be fragmentary, the crahskernel high reservation will fail
if the value is big.

In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G
fixedly on arm64, just like what we do on x86_64. As for RPi4 platform,
we leave it to crashkernel=size@offset syntax. Two reasons for this:
1) crashkernel is needed on enterprise platform, such as workstation or
server. While RPi4 is obviously not the target. I contacted several RPi4
players in Redhat and my friends, none of them ever played kdump
testing. If they really have to, crashkernel=size@offset is enough for
them to set.
2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the
problem of base page mapping for the whole linear mapping if crsahkernel=
is set in kernel parameter shown in [1] at bottom. 

Arm64 enterprise OS deployment is now very important in our
Fedora/Centos stream/RHEL. I am wondering how wide and important kdump
feature is needed and deployed on RPi4 platform and what is the user case.

If CRASH_ADDR_LOW_MAX is set to 4G fixedly, and document that RPi4 need
crashkernel=size@offset specially if kdump deployment is really needed.
problems we discussed here and the spotted base page mapping issue can
be solved easily and elegantly. That woule be deeply appreciated.

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 307763f97549..3b03958b668e 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -93,8 +93,9 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
 /* Current arm64 boot protocol requires 2MB alignment */
 #define CRASH_ALIGN                    SZ_2M
 
-#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
+#define CRASH_ADDR_LOW_MAX             SZ_4G
 #define CRASH_ADDR_HIGH_MAX            (PHYS_MASK + 1)


[1]
[PATCH 0/2] arm64, kdump: enforce to take 4G as the crashkernel low memory end
https://lore.kernel.org/all/20220828005545.94389-1-bhe@redhat.com/T/#u
  
Catalin Marinas March 17, 2023, 6:05 p.m. UTC | #7
On Fri, Mar 17, 2023 at 11:09:13PM +0800, Baoquan He wrote:
> On 03/16/23 at 05:35pm, Catalin Marinas wrote:
> > On Thu, Mar 16, 2023 at 05:47:52PM +0800, Baoquan He wrote:
> > > Now I am wondering if we should change CRASH_ADDR_LOW_MAX to 4G directly
> > > since we decide to take 4G as the low memory limit when doing 'high'
> > > reserving or falling back. This is just like what we have been doing in
> > > x86_64. Not sure if I follow you correctly.
> > 
> > On RPi4, we do need the 'low' allocation to be below 1GB, otherwise
> > ZONE_DMA will be empty. But we can leave the 'high' reservation above
> > 4GB (if available). The downside is that we won't get anything between
> > 1GB and 4GB unless explicitly specified with @offset.
> 
> Ah, I see. You want to have high reservation like below when RPi4.
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 307763f97549..9e558fadf483 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -95,6 +95,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
>  
>  #define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
>  #define CRASH_ADDR_HIGH_MAX		(PHYS_MASK + 1)
> +#define CRASH_HIGH_SEARCH_BASE		SZ_4G
>  
>  #define DEFAULT_CRASH_KERNEL_LOW_SIZE	(128UL << 20)
>  
> @@ -156,7 +157,7 @@ static void __init reserve_crashkernel(void)
>  		else if (ret)
>  			return;
>  
> -		search_base = CRASH_ADDR_LOW_MAX;
> +		search_base = CRASH_HIGH_SEARCH_BASE;
>  		crash_max = CRASH_ADDR_HIGH_MAX;
>  		high = true;
>  	} else if (ret || !crash_size) {
> @@ -193,7 +194,7 @@ static void __init reserve_crashkernel(void)
>  		 */
>  		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
>  			crash_max = CRASH_ADDR_HIGH_MAX;
> -			search_base = CRASH_ADDR_LOW_MAX;
> +			search_base = CRASH_HIGH_SEARCH_BASE;
>  			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
>  			goto retry;
>  		}

This would probably do _if_ these are the semantics that we want (well,
more discussion below).

> > I'm not entirely sure what you want to achieve: avoiding the 'high'
> > reservation going across an arbitrary boundary (1GB or 4GB) that the
> > user may not be aware of or just avoiding the 'high' reservation going
> > across a 4G boundary? On RPi4, if the 'high' reservation above 4GB
> > fails, should it fall back to below 1GB reservation or to somewhere
> > between 1GB and 4GB, making sure it doesn't cross any of these two
> > boundaries? For someone unfamiliar with the ZONE_DMA on RPi4, the latter
> > would look like two 'low' allocations below 4GB.
> 
> Currently, with the existing code, the high and low memory is like below
> on arm64:
> RPi4:
> 0~1G: low memory | 1G~top: high memory
> 
> Other normal system:
> 0~4G: low memory | 4G~top: high memory

Yes, that's the current behaviour.

> Now you want RPi4 to have low and high crahskernel reservation like
> this: 
> 0~1G: low memory | 4G~top: high memory

I don't necessarily want this, I just want to clarify what we actually
need to fix. Your initial reasoning for this patch was that the user
gets confused by the high allocation going across the 4G boundary. But
without knowing of the 1GB vs 4GB boundary on RPi4, one can still get
confused by the high allocation across the 4GB boundary on RPi4.

> It's also fine. However, as far as I know, RPi4 usually only has 8G
> memory.

That's the maximum. Lots of them just come with 4GB or less.

> And arm64 only has top down memblock allocation. Its high memory
> above 4G will be fragmentary, the crahskernel high reservation will fail
> if the value is big.
> 
> In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G
> fixedly on arm64, just like what we do on x86_64. As for RPi4 platform,
> we leave it to crashkernel=size@offset syntax. Two reasons for this:
> 1) crashkernel is needed on enterprise platform, such as workstation or
> server. While RPi4 is obviously not the target. I contacted several RPi4
> players in Redhat and my friends, none of them ever played kdump
> testing. If they really have to, crashkernel=size@offset is enough for
> them to set.

I'd like crashkernel=size (without @offset) on RPi4 to still do the
right thing: a low allocation at least as we had until recently (or
high+low where high here is maybe above 1GB). IOW, no regression for
this crashkernel=size case. We can then change the explicit
crashkernel=x,high to mean only above 4GB irrespective of the platform
and crashkernel=x,low to be below arm64_dma_phys_limit.

> 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the
> problem of base page mapping for the whole linear mapping if crsahkernel=
> is set in kernel parameter shown in [1] at bottom. 

That's a different problem ;). I should re-read that thread, forgot most
of the details but I recall one of the counter arguments was that there
isn't a strong case to unmap the crashkernel reservation. Now, if we
place crashdump kernel image goes in the 'high' reservation, can we not
leave the 'low' reservation mapped? We don't really care about it as it
wouldn't have any meaningful code/data to be preserved. If the 'high'
one goes above 4G always, we don't depend on the arm64_dma_phys_limit.

> Arm64 enterprise OS deployment is now very important in our
> Fedora/Centos stream/RHEL. I am wondering how wide and important kdump
> feature is needed and deployed on RPi4 platform and what is the user case.

I doubt it's important beyond testing but I'd very much like not to
cause a regression and require the @offset argument for RPi4.
  
Baoquan He March 20, 2023, 1:12 p.m. UTC | #8
On 03/17/23 at 06:05pm, Catalin Marinas wrote:
> On Fri, Mar 17, 2023 at 11:09:13PM +0800, Baoquan He wrote:
> > On 03/16/23 at 05:35pm, Catalin Marinas wrote:
> > > On Thu, Mar 16, 2023 at 05:47:52PM +0800, Baoquan He wrote:
> > > > Now I am wondering if we should change CRASH_ADDR_LOW_MAX to 4G directly
> > > > since we decide to take 4G as the low memory limit when doing 'high'
> > > > reserving or falling back. This is just like what we have been doing in
> > > > x86_64. Not sure if I follow you correctly.
> > > 
> > > On RPi4, we do need the 'low' allocation to be below 1GB, otherwise
> > > ZONE_DMA will be empty. But we can leave the 'high' reservation above
> > > 4GB (if available). The downside is that we won't get anything between
> > > 1GB and 4GB unless explicitly specified with @offset.
> > 
> > Ah, I see. You want to have high reservation like below when RPi4.
> > 
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 307763f97549..9e558fadf483 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -95,6 +95,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
> >  
> >  #define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
> >  #define CRASH_ADDR_HIGH_MAX		(PHYS_MASK + 1)
> > +#define CRASH_HIGH_SEARCH_BASE		SZ_4G
> >  
> >  #define DEFAULT_CRASH_KERNEL_LOW_SIZE	(128UL << 20)
> >  
> > @@ -156,7 +157,7 @@ static void __init reserve_crashkernel(void)
> >  		else if (ret)
> >  			return;
> >  
> > -		search_base = CRASH_ADDR_LOW_MAX;
> > +		search_base = CRASH_HIGH_SEARCH_BASE;
> >  		crash_max = CRASH_ADDR_HIGH_MAX;
> >  		high = true;
> >  	} else if (ret || !crash_size) {
> > @@ -193,7 +194,7 @@ static void __init reserve_crashkernel(void)
> >  		 */
> >  		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
> >  			crash_max = CRASH_ADDR_HIGH_MAX;
> > -			search_base = CRASH_ADDR_LOW_MAX;
> > +			search_base = CRASH_HIGH_SEARCH_BASE;
> >  			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
> >  			goto retry;
> >  		}
> 
> This would probably do _if_ these are the semantics that we want (well,
> more discussion below).
> 
> > > I'm not entirely sure what you want to achieve: avoiding the 'high'
> > > reservation going across an arbitrary boundary (1GB or 4GB) that the
> > > user may not be aware of or just avoiding the 'high' reservation going
> > > across a 4G boundary? On RPi4, if the 'high' reservation above 4GB
> > > fails, should it fall back to below 1GB reservation or to somewhere
> > > between 1GB and 4GB, making sure it doesn't cross any of these two
> > > boundaries? For someone unfamiliar with the ZONE_DMA on RPi4, the latter
> > > would look like two 'low' allocations below 4GB.
> > 
> > Currently, with the existing code, the high and low memory is like below
> > on arm64:
> > RPi4:
> > 0~1G: low memory | 1G~top: high memory
> > 
> > Other normal system:
> > 0~4G: low memory | 4G~top: high memory
> 
> Yes, that's the current behaviour.
> 
> > Now you want RPi4 to have low and high crahskernel reservation like
> > this: 
> > 0~1G: low memory | 4G~top: high memory
> 
> I don't necessarily want this, I just want to clarify what we actually
> need to fix. Your initial reasoning for this patch was that the user
> gets confused by the high allocation going across the 4G boundary. But
> without knowing of the 1GB vs 4GB boundary on RPi4, one can still get
> confused by the high allocation across the 4GB boundary on RPi4.

Ah, I see your point. Yeah, the initial reasoning is our QE reported
several times that arm64 system could have two regions of low memory
allocation when crahskernel=,high. While later I spotted another two
issues. With the initial reasoning, we does need consider the hign and
low boundary at 1G on RPi4 which could cause confusion. Given RPi4
is not taken into our distros's account, I am shooting myself in the
foot. But yes, we should consider it if kdump feature is really cared
about on RPi4. Please see later reply.

> 
> > It's also fine. However, as far as I know, RPi4 usually only has 8G
> > memory.
> 
> That's the maximum. Lots of them just come with 4GB or less.
> 
> > And arm64 only has top down memblock allocation. Its high memory
> > above 4G will be fragmentary, the crahskernel high reservation will fail
> > if the value is big.
> > 
> > In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G
> > fixedly on arm64, just like what we do on x86_64. As for RPi4 platform,
> > we leave it to crashkernel=size@offset syntax. Two reasons for this:
> > 1) crashkernel is needed on enterprise platform, such as workstation or
> > server. While RPi4 is obviously not the target. I contacted several RPi4
> > players in Redhat and my friends, none of them ever played kdump
> > testing. If they really have to, crashkernel=size@offset is enough for
> > them to set.
> 
> I'd like crashkernel=size (without @offset) on RPi4 to still do the
> right thing: a low allocation at least as we had until recently (or
> high+low where high here is maybe above 1GB). IOW, no regression for
> this crashkernel=size case. We can then change the explicit
> crashkernel=x,high to mean only above 4GB irrespective of the platform
> and crashkernel=x,low to be below arm64_dma_phys_limit.

Since crashkernel=,high and crashkernel=size fallback was added in arm64
recently, with my understanding, you are suggesting:

on arm64:
RPi4:
crashkernel=size
0~1G: low memory (no regression introduced)

crashkernel=size,high
0~1G: low memory | 4G~top: high memory

Other normal system:
crashkernel=size|crashkernel=size,high
0~4G: low memory | 4G~top: high memory

Please correct me if I misunderstood, I can make change according to
your suggestion.

> 
> > 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the
> > problem of base page mapping for the whole linear mapping if crsahkernel=
> > is set in kernel parameter shown in [1] at bottom. 
> 
> That's a different problem ;). I should re-read that thread, forgot most
> of the details but I recall one of the counter arguments was that there
> isn't a strong case to unmap the crashkernel reservation. Now, if we
> place crashdump kernel image goes in the 'high' reservation, can we not
> leave the 'low' reservation mapped? We don't really care about it as it
> wouldn't have any meaningful code/data to be preserved. If the 'high'
> one goes above 4G always, we don't depend on the arm64_dma_phys_limit.

Yes, this looks ideal. While it only works when crashkernel=,high case and
it succeeds to reserve a memory region for the specified size of crashkernel
high memory. At below, we have 4 cases of crashkernel= syntax:

crashkernel=size
1)first attempt:  low memory under arm64_dma_phys_limit
2)fallback:       finding memory above 4G

crashkernel=size,high
3)first attempt:  finding memory above 4G
4)fallback:       low memory under arm64_dma_phys_limit

case 3) works with your suggestion. However, 1), 2), 4) all need to
defer to bootmem_init(). With these cases and different handling,
reserve_crashkernel() could be too complicated.

I am wondering if we can cancel the protection of crashkernel memory
region on arm64 for now. In earlier discussion, people questioned if the
protection is necessary on arm64. After comparison, I would rather take
away the protection method of crashkernel region since they try to
protect in a chance in one million , while the base page mapping for the
whole linear mapping is mitigating arm64 high end server always.

> 
> > Arm64 enterprise OS deployment is now very important in our
> > Fedora/Centos stream/RHEL. I am wondering how wide and important kdump
> > feature is needed and deployed on RPi4 platform and what is the user case.
> 
> I doubt it's important beyond testing but I'd very much like not to
> cause a regression and require the @offset argument for RPi4.

OK, understood. Pity we don't have information to judge if it's a RPi4
system or not, with a Kconfig option, some cpu register or hardware
information.
  
Catalin Marinas March 23, 2023, 5:25 p.m. UTC | #9
On Mon, Mar 20, 2023 at 09:12:08PM +0800, Baoquan He wrote:
> On 03/17/23 at 06:05pm, Catalin Marinas wrote:
> > On Fri, Mar 17, 2023 at 11:09:13PM +0800, Baoquan He wrote:
> > > In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G
> > > fixedly on arm64, just like what we do on x86_64. As for RPi4 platform,
> > > we leave it to crashkernel=size@offset syntax. Two reasons for this:
> > > 1) crashkernel is needed on enterprise platform, such as workstation or
> > > server. While RPi4 is obviously not the target. I contacted several RPi4
> > > players in Redhat and my friends, none of them ever played kdump
> > > testing. If they really have to, crashkernel=size@offset is enough for
> > > them to set.
> > 
> > I'd like crashkernel=size (without @offset) on RPi4 to still do the
> > right thing: a low allocation at least as we had until recently (or
> > high+low where high here is maybe above 1GB). IOW, no regression for
> > this crashkernel=size case. We can then change the explicit
> > crashkernel=x,high to mean only above 4GB irrespective of the platform
> > and crashkernel=x,low to be below arm64_dma_phys_limit.
> 
> Since crashkernel=,high and crashkernel=size fallback was added in arm64
> recently, with my understanding, you are suggesting:
> 
> on arm64:
> RPi4:
> crashkernel=size
> 0~1G: low memory (no regression introduced)

And, if not enough low memory, fall back to memory above 1GB (for RPi4;
it would be above 4GB for any other system).

> crashkernel=size,high
> 0~1G: low memory | 4G~top: high memory

Yes.

> Other normal system:
> crashkernel=size|crashkernel=size,high
> 0~4G: low memory | 4G~top: high memory

Yes.

IOW, specifying 'high' only forces the high allocation above 4GB instead
of arm64_dma_phys_limit, irrespective of the platform. If no 'high'
specified search_base remains CRASH_ADDR_LOW_MAX (1GB on RPi4, 4GB for
the rest).

> > > 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the
> > > problem of base page mapping for the whole linear mapping if crsahkernel=
> > > is set in kernel parameter shown in [1] at bottom. 
> > 
> > That's a different problem ;). I should re-read that thread, forgot most
> > of the details but I recall one of the counter arguments was that there
> > isn't a strong case to unmap the crashkernel reservation. Now, if we
> > place crashdump kernel image goes in the 'high' reservation, can we not
> > leave the 'low' reservation mapped? We don't really care about it as it
> > wouldn't have any meaningful code/data to be preserved. If the 'high'
> > one goes above 4G always, we don't depend on the arm64_dma_phys_limit.
> 
> Yes, this looks ideal. While it only works when crashkernel=,high case and
> it succeeds to reserve a memory region for the specified size of crashkernel
> high memory. At below, we have 4 cases of crashkernel= syntax:
> 
> crashkernel=size
> 1)first attempt:  low memory under arm64_dma_phys_limit
> 2)fallback:       finding memory above 4G

(2) should be 'finding memory above arm64_dma_phys_limit' to keep the
current behaviour for RPi4.

> crashkernel=size,high
> 3)first attempt:  finding memory above 4G
> 4)fallback:       low memory under arm64_dma_phys_limit

Yes.

> case 3) works with your suggestion. However, 1), 2), 4) all need to
> defer to bootmem_init(). With these cases and different handling,
> reserve_crashkernel() could be too complicated.

Ah, because of the fallback below arm64_dma_phys_limit as in (4), we
still can't move the full crashkernel reservation early. Well, we could
do it in two steps: (a) early attempt at crashkernel reservation above
4G if 'high' was specified and we avoid mapping it if successful and (b)
do the late crashkernel reservation below arm64_dma_phys_limit and skip
unmapping as being too late. This way most server-like platforms would
get a reservation above 4G, unmapped.

> I am wondering if we can cancel the protection of crashkernel memory
> region on arm64 for now. In earlier discussion, people questioned if the
> protection is necessary on arm64. After comparison, I would rather take
> away the protection method of crashkernel region since they try to
> protect in a chance in one million , while the base page mapping for the
> whole linear mapping is mitigating arm64 high end server always.

This works for me. We can add the protection later for addresses above
4GB only as mentioned above.
  
Zhen Lei March 24, 2023, 2:47 a.m. UTC | #10
On 2023/3/24 1:25, Catalin Marinas wrote:
> On Mon, Mar 20, 2023 at 09:12:08PM +0800, Baoquan He wrote:
>> On 03/17/23 at 06:05pm, Catalin Marinas wrote:
>>> On Fri, Mar 17, 2023 at 11:09:13PM +0800, Baoquan He wrote:
>>>> In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G
>>>> fixedly on arm64, just like what we do on x86_64. As for RPi4 platform,
>>>> we leave it to crashkernel=size@offset syntax. Two reasons for this:
>>>> 1) crashkernel is needed on enterprise platform, such as workstation or
>>>> server. While RPi4 is obviously not the target. I contacted several RPi4
>>>> players in Redhat and my friends, none of them ever played kdump
>>>> testing. If they really have to, crashkernel=size@offset is enough for
>>>> them to set.
>>>
>>> I'd like crashkernel=size (without @offset) on RPi4 to still do the
>>> right thing: a low allocation at least as we had until recently (or
>>> high+low where high here is maybe above 1GB). IOW, no regression for
>>> this crashkernel=size case. We can then change the explicit
>>> crashkernel=x,high to mean only above 4GB irrespective of the platform
>>> and crashkernel=x,low to be below arm64_dma_phys_limit.
>>
>> Since crashkernel=,high and crashkernel=size fallback was added in arm64
>> recently, with my understanding, you are suggesting:
>>
>> on arm64:
>> RPi4:
>> crashkernel=size
>> 0~1G: low memory (no regression introduced)
> 
> And, if not enough low memory, fall back to memory above 1GB (for RPi4;
> it would be above 4GB for any other system).
> 
>> crashkernel=size,high
>> 0~1G: low memory | 4G~top: high memory
> 
> Yes.
> 
>> Other normal system:
>> crashkernel=size|crashkernel=size,high
>> 0~4G: low memory | 4G~top: high memory
> 
> Yes.
> 
> IOW, specifying 'high' only forces the high allocation above 4GB instead
> of arm64_dma_phys_limit, irrespective of the platform. If no 'high'
> specified search_base remains CRASH_ADDR_LOW_MAX (1GB on RPi4, 4GB for
> the rest).
> 
>>>> 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the
>>>> problem of base page mapping for the whole linear mapping if crsahkernel=
>>>> is set in kernel parameter shown in [1] at bottom. 
>>>
>>> That's a different problem ;). I should re-read that thread, forgot most
>>> of the details but I recall one of the counter arguments was that there
>>> isn't a strong case to unmap the crashkernel reservation. Now, if we
>>> place crashdump kernel image goes in the 'high' reservation, can we not
>>> leave the 'low' reservation mapped? We don't really care about it as it
>>> wouldn't have any meaningful code/data to be preserved. If the 'high'
>>> one goes above 4G always, we don't depend on the arm64_dma_phys_limit.
>>
>> Yes, this looks ideal. While it only works when crashkernel=,high case and
>> it succeeds to reserve a memory region for the specified size of crashkernel
>> high memory. At below, we have 4 cases of crashkernel= syntax:
>>
>> crashkernel=size
>> 1)first attempt:  low memory under arm64_dma_phys_limit
>> 2)fallback:       finding memory above 4G
> 
> (2) should be 'finding memory above arm64_dma_phys_limit' to keep the
> current behaviour for RPi4.
> 
>> crashkernel=size,high
>> 3)first attempt:  finding memory above 4G
>> 4)fallback:       low memory under arm64_dma_phys_limit
> 
> Yes.
> 
>> case 3) works with your suggestion. However, 1), 2), 4) all need to
>> defer to bootmem_init(). With these cases and different handling,
>> reserve_crashkernel() could be too complicated.
> 
> Ah, because of the fallback below arm64_dma_phys_limit as in (4), we
> still can't move the full crashkernel reservation early. Well, we could
> do it in two steps: (a) early attempt at crashkernel reservation above
> 4G if 'high' was specified and we avoid mapping it if successful and (b)
> do the late crashkernel reservation below arm64_dma_phys_limit and skip
> unmapping as being too late. This way most server-like platforms would
> get a reservation above 4G, unmapped.
> 
>> I am wondering if we can cancel the protection of crashkernel memory
>> region on arm64 for now. In earlier discussion, people questioned if the
>> protection is necessary on arm64. After comparison, I would rather take
>> away the protection method of crashkernel region since they try to
>> protect in a chance in one million , while the base page mapping for the
>> whole linear mapping is mitigating arm64 high end server always.
> 
> This works for me. We can add the protection later for addresses above
> 4GB only as mentioned above.

Recently, I've also been rethinking the performance issues when kdump is
enabled. I have a new idea. For crashkernel=X, we can temporarily search
for free memory from the low address to the high address. As below:

save_bottom_up = memblock_bottom_up();
if (!high)
	memblock_set_bottom_up(true);
crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base, crash_max);
memblock_set_bottom_up(save_bottom_up);

The final code change should be small, and I'll try it today.

>
  
Baoquan He March 24, 2023, 2:08 p.m. UTC | #11
On 03/23/23 at 05:25pm, Catalin Marinas wrote:
> On Mon, Mar 20, 2023 at 09:12:08PM +0800, Baoquan He wrote:
> > On 03/17/23 at 06:05pm, Catalin Marinas wrote:
> > > On Fri, Mar 17, 2023 at 11:09:13PM +0800, Baoquan He wrote:
> > > > In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G
> > > > fixedly on arm64, just like what we do on x86_64. As for RPi4 platform,
> > > > we leave it to crashkernel=size@offset syntax. Two reasons for this:
> > > > 1) crashkernel is needed on enterprise platform, such as workstation or
> > > > server. While RPi4 is obviously not the target. I contacted several RPi4
> > > > players in Redhat and my friends, none of them ever played kdump
> > > > testing. If they really have to, crashkernel=size@offset is enough for
> > > > them to set.
> > > 
> > > I'd like crashkernel=size (without @offset) on RPi4 to still do the
> > > right thing: a low allocation at least as we had until recently (or
> > > high+low where high here is maybe above 1GB). IOW, no regression for
> > > this crashkernel=size case. We can then change the explicit
> > > crashkernel=x,high to mean only above 4GB irrespective of the platform
> > > and crashkernel=x,low to be below arm64_dma_phys_limit.
> > 
> > Since crashkernel=,high and crashkernel=size fallback was added in arm64
> > recently, with my understanding, you are suggesting:
> > 
> > on arm64:
> > RPi4:
> > crashkernel=size
> > 0~1G: low memory (no regression introduced)
> 
> And, if not enough low memory, fall back to memory above 1GB (for RPi4;
> it would be above 4GB for any other system).
> 
> > crashkernel=size,high
> > 0~1G: low memory | 4G~top: high memory
> 
> Yes.
> 
> > Other normal system:
> > crashkernel=size|crashkernel=size,high
> > 0~4G: low memory | 4G~top: high memory
> 
> Yes.
> 
> IOW, specifying 'high' only forces the high allocation above 4GB instead
> of arm64_dma_phys_limit, irrespective of the platform. If no 'high'
> specified search_base remains CRASH_ADDR_LOW_MAX (1GB on RPi4, 4GB for
> the rest).
> 
> > > > 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the
> > > > problem of base page mapping for the whole linear mapping if crsahkernel=
> > > > is set in kernel parameter shown in [1] at bottom. 
> > > 
> > > That's a different problem ;). I should re-read that thread, forgot most
> > > of the details but I recall one of the counter arguments was that there
> > > isn't a strong case to unmap the crashkernel reservation. Now, if we
> > > place crashdump kernel image goes in the 'high' reservation, can we not
> > > leave the 'low' reservation mapped? We don't really care about it as it
> > > wouldn't have any meaningful code/data to be preserved. If the 'high'
> > > one goes above 4G always, we don't depend on the arm64_dma_phys_limit.
> > 
> > Yes, this looks ideal. While it only works when crashkernel=,high case and
> > it succeeds to reserve a memory region for the specified size of crashkernel
> > high memory. At below, we have 4 cases of crashkernel= syntax:
> > 
> > crashkernel=size
> > 1)first attempt:  low memory under arm64_dma_phys_limit
> > 2)fallback:       finding memory above 4G
> 
> (2) should be 'finding memory above arm64_dma_phys_limit' to keep the
> current behaviour for RPi4.

Then for RPi4, with case 2), it will find memory above
arm64_dma_phys_limit, namely 1G. Then it will get two memory regions,
one could be in [1G, 4G], another is below 4G. I am fine with this, as
long as it won't cause confusion that people may think two low memory
regions you mentioned earlier. Please help confirm if I understand your
suggestioin correctly. I will start making patch with this clarified.
Thanks.

> 
> > crashkernel=size,high
> > 3)first attempt:  finding memory above 4G
> > 4)fallback:       low memory under arm64_dma_phys_limit
> 
> Yes.
> 
> > case 3) works with your suggestion. However, 1), 2), 4) all need to
> > defer to bootmem_init(). With these cases and different handling,
> > reserve_crashkernel() could be too complicated.
> 
> Ah, because of the fallback below arm64_dma_phys_limit as in (4), we
> still can't move the full crashkernel reservation early. Well, we could
> do it in two steps: (a) early attempt at crashkernel reservation above
> 4G if 'high' was specified and we avoid mapping it if successful and (b)
> do the late crashkernel reservation below arm64_dma_phys_limit and skip
> unmapping as being too late. This way most server-like platforms would
> get a reservation above 4G, unmapped.

Yeah, this covers case 3), while other cases are still in pit.

> 
> > I am wondering if we can cancel the protection of crashkernel memory
> > region on arm64 for now. In earlier discussion, people questioned if the
> > protection is necessary on arm64. After comparison, I would rather take
> > away the protection method of crashkernel region since they try to
> > protect in a chance in one million , while the base page mapping for the
> > whole linear mapping is mitigating arm64 high end server always.
> 
> This works for me. We can add the protection later for addresses above
> 4GB only as mentioned above.

Thanks, I have posted a patchset to cancel the protection on crashkernel
memory region as per your confirmation here. This can give distros a
chance to back port them to fix the performance issue caused by the base
page mapping. I personally expect we can hold the crashkernel region
unprotected till we have a ideal solution since the code will be elegant
with comfortable simplicity.

Let's wait and see the code change if people interested want to keep the
protection methods.
  
Baoquan He March 24, 2023, 2:53 p.m. UTC | #12
Hi Leizhen,

On 03/24/23 at 10:47am, Leizhen (ThunderTown) wrote:
...... 
> >>>> 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the
> >>>> problem of base page mapping for the whole linear mapping if crsahkernel=
> >>>> is set in kernel parameter shown in [1] at bottom. 
> >>>
> >>> That's a different problem ;). I should re-read that thread, forgot most
> >>> of the details but I recall one of the counter arguments was that there
> >>> isn't a strong case to unmap the crashkernel reservation. Now, if we
> >>> place crashdump kernel image goes in the 'high' reservation, can we not
> >>> leave the 'low' reservation mapped? We don't really care about it as it
> >>> wouldn't have any meaningful code/data to be preserved. If the 'high'
> >>> one goes above 4G always, we don't depend on the arm64_dma_phys_limit.
> >>
> >> Yes, this looks ideal. While it only works when crashkernel=,high case and
> >> it succeeds to reserve a memory region for the specified size of crashkernel
> >> high memory. At below, we have 4 cases of crashkernel= syntax:
> >>
> >> crashkernel=size
> >> 1)first attempt:  low memory under arm64_dma_phys_limit
> >> 2)fallback:       finding memory above 4G
> > 
> > (2) should be 'finding memory above arm64_dma_phys_limit' to keep the
> > current behaviour for RPi4.
> > 
> >> crashkernel=size,high
> >> 3)first attempt:  finding memory above 4G
> >> 4)fallback:       low memory under arm64_dma_phys_limit
> > 
> > Yes.
> > 
> >> case 3) works with your suggestion. However, 1), 2), 4) all need to
> >> defer to bootmem_init(). With these cases and different handling,
> >> reserve_crashkernel() could be too complicated.
> > 
> > Ah, because of the fallback below arm64_dma_phys_limit as in (4), we
> > still can't move the full crashkernel reservation early. Well, we could
> > do it in two steps: (a) early attempt at crashkernel reservation above
> > 4G if 'high' was specified and we avoid mapping it if successful and (b)
> > do the late crashkernel reservation below arm64_dma_phys_limit and skip
> > unmapping as being too late. This way most server-like platforms would
> > get a reservation above 4G, unmapped.
> > 
> >> I am wondering if we can cancel the protection of crashkernel memory
> >> region on arm64 for now. In earlier discussion, people questioned if the
> >> protection is necessary on arm64. After comparison, I would rather take
> >> away the protection method of crashkernel region since they try to
> >> protect in a chance in one million , while the base page mapping for the
> >> whole linear mapping is mitigating arm64 high end server always.
> > 
> > This works for me. We can add the protection later for addresses above
> > 4GB only as mentioned above.
> 
> Recently, I've also been rethinking the performance issues when kdump is
> enabled. I have a new idea. For crashkernel=X, we can temporarily search
> for free memory from the low address to the high address. As below:
> 
> save_bottom_up = memblock_bottom_up();
> if (!high)
> 	memblock_set_bottom_up(true);
> crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base, crash_max);
> memblock_set_bottom_up(save_bottom_up);
> 
> The final code change should be small, and I'll try it today.

I have sent a patchset to remove the crashkernel region protection code
as per Catalin's confirmation. I personally like the code conciseness w/o
protection because kinds of crahskernel reservation has been complex,
the situation on arm64 will makes it worse if we try to keep the
protection and fix the performance issue. While I am glad to see any
attempt to achieve the two goals if it's satisfactory.
  
Catalin Marinas March 24, 2023, 5:08 p.m. UTC | #13
On Fri, Mar 24, 2023 at 10:08:17PM +0800, Baoquan He wrote:
> On 03/23/23 at 05:25pm, Catalin Marinas wrote:
> > On Mon, Mar 20, 2023 at 09:12:08PM +0800, Baoquan He wrote:
> > > crashkernel=size
> > > 1)first attempt:  low memory under arm64_dma_phys_limit
> > > 2)fallback:       finding memory above 4G
> > 
> > (2) should be 'finding memory above arm64_dma_phys_limit' to keep the
> > current behaviour for RPi4.
> 
> Then for RPi4, with case 2), it will find memory above
> arm64_dma_phys_limit, namely 1G. Then it will get two memory regions,
> one could be in [1G, 4G], another is below 4G. I am fine with this, as
> long as it won't cause confusion that people may think two low memory
> regions you mentioned earlier. Please help confirm if I understand your
> suggestioin correctly. I will start making patch with this clarified.

Yes, you understood correctly. While it may still be slightly confusing,
if the user is not specific about high and low on the cmdline, we just
allow the kernel to find the best places. I assume distros will just use
,high and get a consistent behaviour on all platforms.
  
Zhen Lei March 25, 2023, 1:53 a.m. UTC | #14
On 2023/3/24 22:53, Baoquan He wrote:
> Hi Leizhen,
> 
> On 03/24/23 at 10:47am, Leizhen (ThunderTown) wrote:
> ...... 
>>>>>> 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the
>>>>>> problem of base page mapping for the whole linear mapping if crsahkernel=
>>>>>> is set in kernel parameter shown in [1] at bottom. 
>>>>>
>>>>> That's a different problem ;). I should re-read that thread, forgot most
>>>>> of the details but I recall one of the counter arguments was that there
>>>>> isn't a strong case to unmap the crashkernel reservation. Now, if we
>>>>> place crashdump kernel image goes in the 'high' reservation, can we not
>>>>> leave the 'low' reservation mapped? We don't really care about it as it
>>>>> wouldn't have any meaningful code/data to be preserved. If the 'high'
>>>>> one goes above 4G always, we don't depend on the arm64_dma_phys_limit.
>>>>
>>>> Yes, this looks ideal. While it only works when crashkernel=,high case and
>>>> it succeeds to reserve a memory region for the specified size of crashkernel
>>>> high memory. At below, we have 4 cases of crashkernel= syntax:
>>>>
>>>> crashkernel=size
>>>> 1)first attempt:  low memory under arm64_dma_phys_limit
>>>> 2)fallback:       finding memory above 4G
>>>
>>> (2) should be 'finding memory above arm64_dma_phys_limit' to keep the
>>> current behaviour for RPi4.
>>>
>>>> crashkernel=size,high
>>>> 3)first attempt:  finding memory above 4G
>>>> 4)fallback:       low memory under arm64_dma_phys_limit
>>>
>>> Yes.
>>>
>>>> case 3) works with your suggestion. However, 1), 2), 4) all need to
>>>> defer to bootmem_init(). With these cases and different handling,
>>>> reserve_crashkernel() could be too complicated.
>>>
>>> Ah, because of the fallback below arm64_dma_phys_limit as in (4), we
>>> still can't move the full crashkernel reservation early. Well, we could
>>> do it in two steps: (a) early attempt at crashkernel reservation above
>>> 4G if 'high' was specified and we avoid mapping it if successful and (b)
>>> do the late crashkernel reservation below arm64_dma_phys_limit and skip
>>> unmapping as being too late. This way most server-like platforms would
>>> get a reservation above 4G, unmapped.
>>>
>>>> I am wondering if we can cancel the protection of crashkernel memory
>>>> region on arm64 for now. In earlier discussion, people questioned if the
>>>> protection is necessary on arm64. After comparison, I would rather take
>>>> away the protection method of crashkernel region since they try to
>>>> protect in a chance in one million , while the base page mapping for the
>>>> whole linear mapping is mitigating arm64 high end server always.
>>>
>>> This works for me. We can add the protection later for addresses above
>>> 4GB only as mentioned above.
>>
>> Recently, I've also been rethinking the performance issues when kdump is
>> enabled. I have a new idea. For crashkernel=X, we can temporarily search
>> for free memory from the low address to the high address. As below:
>>
>> save_bottom_up = memblock_bottom_up();
>> if (!high)
>> 	memblock_set_bottom_up(true);
>> crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base, crash_max);
>> memblock_set_bottom_up(save_bottom_up);
>>
>> The final code change should be small, and I'll try it today.
> 
> I have sent a patchset to remove the crashkernel region protection code
> as per Catalin's confirmation. I personally like the code conciseness w/o
> protection because kinds of crahskernel reservation has been complex,
> the situation on arm64 will makes it worse if we try to keep the
> protection and fix the performance issue. While I am glad to see any
> attempt to achieve the two goals if it's satisfactory.

I saw the patchset. No protection is also a good idea, the code is
simplified a lot.

> 
> .
>
  

Patch

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 58a0bb2c17f1..307763f97549 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -127,12 +127,13 @@  static int __init reserve_crashkernel_low(unsigned long long low_size)
  */
 static void __init reserve_crashkernel(void)
 {
-	unsigned long long crash_base, crash_size;
-	unsigned long long crash_low_size = 0;
+	unsigned long long crash_base, crash_size, search_base;
 	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
+	unsigned long long crash_low_size = 0;
 	char *cmdline = boot_command_line;
-	int ret;
 	bool fixed_base = false;
+	bool high = false;
+	int ret;
 
 	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
 		return;
@@ -155,7 +156,9 @@  static void __init reserve_crashkernel(void)
 		else if (ret)
 			return;
 
+		search_base = CRASH_ADDR_LOW_MAX;
 		crash_max = CRASH_ADDR_HIGH_MAX;
+		high = true;
 	} else if (ret || !crash_size) {
 		/* The specified value is invalid */
 		return;
@@ -166,31 +169,51 @@  static void __init reserve_crashkernel(void)
 	/* User specifies base address explicitly. */
 	if (crash_base) {
 		fixed_base = true;
+		search_base = crash_base;
 		crash_max = crash_base + crash_size;
 	}
 
 retry:
 	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
-					       crash_base, crash_max);
+					       search_base, crash_max);
 	if (!crash_base) {
 		/*
-		 * If the first attempt was for low memory, fall back to
-		 * high memory, the minimum required low memory will be
-		 * reserved later.
+		 * For crashkernel=size[KMG]@offset[KMG], print out failure
+		 * message if can't reserve the specified region.
 		 */
-		if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) {
+		if (fixed_base) {
+			pr_warn("crashkernel reservation failed - memory is in use.\n");
+			return;
+		}
+
+		/*
+		 * For crashkernel=size[KMG], if the first attempt was for
+		 * low memory, fall back to high memory, the minimum required
+		 * low memory will be reserved later.
+		 */
+		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
 			crash_max = CRASH_ADDR_HIGH_MAX;
+			search_base = CRASH_ADDR_LOW_MAX;
 			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
 			goto retry;
 		}
 
+		/*
+		 * For crashkernel=size[KMG],high, if the first attempt was
+		 * for high memory, fall back to low memory.
+		 */
+		if (high && crash_max == CRASH_ADDR_HIGH_MAX) {
+			crash_max = CRASH_ADDR_LOW_MAX;
+			search_base = 0;
+			goto retry;
+		}
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
 			crash_size);
 		return;
 	}
 
-	if ((crash_base > CRASH_ADDR_LOW_MAX - crash_low_size) &&
-	     crash_low_size && reserve_crashkernel_low(crash_low_size)) {
+	if ((crash_base >= CRASH_ADDR_LOW_MAX) && crash_low_size &&
+	     reserve_crashkernel_low(crash_low_size)) {
 		memblock_phys_free(crash_base, crash_size);
 		return;
 	}