arm64/sve: Lower the maximum allocation for the SVE ptrace regset

Message ID 20240203-arm64-sve-ptrace-regset-size-v1-1-2c3ba1386b9e@kernel.org
State New
Headers
Series arm64/sve: Lower the maximum allocation for the SVE ptrace regset |

Commit Message

Mark Brown Feb. 3, 2024, 12:16 p.m. UTC
  Doug Anderson observed that ChromeOS crashes are being reported which
include failing allocations of order 7 during core dumps due to ptrace
allocating storage for regsets:

  chrome: page allocation failure: order:7,
          mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
          nodemask=(null),cpuset=urgent,mems_allowed=0
   ...
  regset_get_alloc+0x1c/0x28
  elf_core_dump+0x3d8/0xd8c
  do_coredump+0xeb8/0x1378

with further investigation showing that this is:

   [   66.957385] DOUG: Allocating 279584 bytes

which is the maximum size of the SVE regset. As Doug observes it is not
entirely surprising that such a large allocation of contiguous memory might
fail on a long running system.

The SVE regset is currently sized to hold SVE registers with a VQ of
SVE_VQ_MAX which is 512, substantially more than the architectural maximum
of 16 which we might see even in a system emulating the limits of the
architecture. Since we don't expose the size we tell the regset core
externally let's define ARCH_SVE_VQ_MAX with the actual architectural
maximum and use that for the regset, we'll still overallocate most of the
time but much less so which will be helpful even if the core is fixed to
not require contiguous allocations.

We could also teach the ptrace core about runtime discoverable regset sizes
but that would be a more invasive change and this is being observed in
practical systems.

Reported-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
We should probably also use the actual architectural limit for the
bitmasks we use in the VL enumeration code, though that's both a little
bit more involved and less immediately a problem.
---
 arch/arm64/include/asm/fpsimd.h | 10 +++++-----
 arch/arm64/kernel/ptrace.c      |  3 ++-
 2 files changed, 7 insertions(+), 6 deletions(-)


---
base-commit: 41bccc98fb7931d63d03f326a746ac4d429c1dd3
change-id: 20240202-arm64-sve-ptrace-regset-size-21b0928969e1

Best regards,
  

Comments

Doug Anderson Feb. 5, 2024, 5:02 p.m. UTC | #1
Hi,

On Sat, Feb 3, 2024 at 4:18 AM Mark Brown <broonie@kernel.org> wrote:
>
> Doug Anderson observed that ChromeOS crashes are being reported which
> include failing allocations of order 7 during core dumps due to ptrace
> allocating storage for regsets:
>
>   chrome: page allocation failure: order:7,
>           mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
>           nodemask=(null),cpuset=urgent,mems_allowed=0
>    ...
>   regset_get_alloc+0x1c/0x28
>   elf_core_dump+0x3d8/0xd8c
>   do_coredump+0xeb8/0x1378
>
> with further investigation showing that this is:
>
>    [   66.957385] DOUG: Allocating 279584 bytes
>
> which is the maximum size of the SVE regset. As Doug observes it is not
> entirely surprising that such a large allocation of contiguous memory might
> fail on a long running system.
>
> The SVE regset is currently sized to hold SVE registers with a VQ of
> SVE_VQ_MAX which is 512, substantially more than the architectural maximum
> of 16 which we might see even in a system emulating the limits of the
> architecture. Since we don't expose the size we tell the regset core
> externally let's define ARCH_SVE_VQ_MAX with the actual architectural
> maximum and use that for the regset, we'll still overallocate most of the
> time but much less so which will be helpful even if the core is fixed to
> not require contiguous allocations.
>
> We could also teach the ptrace core about runtime discoverable regset sizes
> but that would be a more invasive change and this is being observed in
> practical systems.
>
> Reported-by: Doug Anderson <dianders@chromium.org>
> Signed-off-by: Mark Brown <broonie@kernel.org>

Confirmed that when I send a "quit" signal to Chrome now that the
allocation I see for "core_note_type" NT_ARM_SVE goes down from
279,584 bytes (n=17474) to just 8,768 bytes (n=548). I'm not
intimately familiar with this code so I'll skip the Reviewed-by unless
someone thinks it would be valuable for me to analyze more. I think
there are already plenty of people who know this well, so for now,
just:

Tested-by: Douglas Anderson <dianders@chromium.org>
  
Dave Martin Feb. 5, 2024, 5:11 p.m. UTC | #2
On Sat, Feb 03, 2024 at 12:16:49PM +0000, Mark Brown wrote:
> Doug Anderson observed that ChromeOS crashes are being reported which
> include failing allocations of order 7 during core dumps due to ptrace
> allocating storage for regsets:
> 
>   chrome: page allocation failure: order:7,
>           mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
>           nodemask=(null),cpuset=urgent,mems_allowed=0
>    ...
>   regset_get_alloc+0x1c/0x28
>   elf_core_dump+0x3d8/0xd8c
>   do_coredump+0xeb8/0x1378
> 
> with further investigation showing that this is:
> 
>    [   66.957385] DOUG: Allocating 279584 bytes
> 
> which is the maximum size of the SVE regset. As Doug observes it is not
> entirely surprising that such a large allocation of contiguous memory might
> fail on a long running system.
> 
> The SVE regset is currently sized to hold SVE registers with a VQ of
> SVE_VQ_MAX which is 512, substantially more than the architectural maximum
> of 16 which we might see even in a system emulating the limits of the
> architecture. Since we don't expose the size we tell the regset core
> externally let's define ARCH_SVE_VQ_MAX with the actual architectural
> maximum and use that for the regset, we'll still overallocate most of the
> time but much less so which will be helpful even if the core is fixed to
> not require contiguous allocations.
> 
> We could also teach the ptrace core about runtime discoverable regset sizes
> but that would be a more invasive change and this is being observed in
> practical systems.

This is not hard at all: see
27e64b4be4b8 ("regset: Add support for dynamically sized regsets") 

But since this is precisely what was ripped out, I guess adding it back
may be controversial (?)

> 
> Reported-by: Doug Anderson <dianders@chromium.org>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
> We should probably also use the actual architectural limit for the
> bitmasks we use in the VL enumeration code, though that's both a little
> bit more involved and less immediately a problem.

Since these masks are 64 bytes each and rarely accessed, it seemed
pointless complexity to make them resizeable...

> ---
>  arch/arm64/include/asm/fpsimd.h | 10 +++++-----
>  arch/arm64/kernel/ptrace.c      |  3 ++-
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 50e5f25d3024..cf5f31181bc8 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -62,12 +62,12 @@ static inline void cpacr_restore(unsigned long cpacr)
>   * When we defined the maximum SVE vector length we defined the ABI so
>   * that the maximum vector length included all the reserved for future
>   * expansion bits in ZCR rather than those just currently defined by
> - * the architecture. While SME follows a similar pattern the fact that
> - * it includes a square matrix means that any allocations that attempt
> - * to cover the maximum potential vector length (such as happen with
> - * the regset used for ptrace) end up being extremely large. Define
> - * the much lower actual limit for use in such situations.
> + * the architecture.  Using this length to allocate worst size buffers
> + * results in excessively large allocations, and this effect is even
> + * more pronounced for SME due to ZA.  Define more suitable VLs for
> + * these situations.
>   */
> +#define ARCH_SVE_VQ_MAX 16
>  #define SME_VQ_MAX	16

Ack, though part of the reason for not doing this was to discourage
people from allocating statically sized buffers in general.

If the kernel is now juggling two #defines for the maximum vector size,
this feels like it may seed bitrot...

> 
>  struct task_struct;
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index dc6cf0e37194..e3bef38fc2e2 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -1500,7 +1500,8 @@ static const struct user_regset aarch64_regsets[] = {
>  #ifdef CONFIG_ARM64_SVE
>  	[REGSET_SVE] = { /* Scalable Vector Extension */
>  		.core_note_type = NT_ARM_SVE,
> -		.n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE),
> +		.n = DIV_ROUND_UP(SVE_PT_SIZE(ARCH_SVE_VQ_MAX,
> +					      SVE_PT_REGS_SVE),
>  				  SVE_VQ_BYTES),
>  		.size = SVE_VQ_BYTES,
>  		.align = SVE_VQ_BYTES,
> 
> ---
> base-commit: 41bccc98fb7931d63d03f326a746ac4d429c1dd3
> change-id: 20240202-arm64-sve-ptrace-regset-size-21b0928969e1
> 
> Best regards,
> -- 
> Mark Brown <broonie@kernel.org>

This looks reasonable, but patching the .n fields doesn't look too bad
either; I'll post a patch separately for comparison.

[...]

Cheers
---Dave
  
Mark Brown Feb. 5, 2024, 5:41 p.m. UTC | #3
On Mon, Feb 05, 2024 at 05:11:59PM +0000, Dave Martin wrote:
> On Sat, Feb 03, 2024 at 12:16:49PM +0000, Mark Brown wrote:

> > We could also teach the ptrace core about runtime discoverable regset sizes
> > but that would be a more invasive change and this is being observed in
> > practical systems.

> This is not hard at all: see
> 27e64b4be4b8 ("regset: Add support for dynamically sized regsets") 

> But since this is precisely what was ripped out, I guess adding it back
> may be controversial (?)

Also just that people might want to backport and while it's not super
*hard* I tend to prefer to do something as minimal as possible as a fix,
the less we do the less the chances that we mess up.

> > We should probably also use the actual architectural limit for the
> > bitmasks we use in the VL enumeration code, though that's both a little
> > bit more involved and less immediately a problem.

> Since these masks are 64 bytes each and rarely accessed, it seemed
> pointless complexity to make them resizeable...

I was suggesting making them use the architectural maximum rather than
making them dynamic.

> > +#define ARCH_SVE_VQ_MAX 16
> >  #define SME_VQ_MAX	16

> Ack, though part of the reason for not doing this was to discourage
> people from allocating statically sized buffers in general.

I was going to do a patch adding a comment to the header noting that
this is not actually the architectural maximum since at present it's
a bit of a landmine, people who have some idea of the architecture
likely have a rough idea what sort of allocation size is needed for the
maximum SVE state and are likely to not double check the value provided
(I think that's what happened with the refactoring to remove the dynamic
sizing).  A comment in the header is still very missable but it'd be
something.

> If the kernel is now juggling two #defines for the maximum vector size,
> this feels like it may seed bitrot...

Ideally we'd just not have the existing define externally but it's there
and it's been used.
  
Dave Martin Feb. 7, 2024, 12:23 p.m. UTC | #4
On Mon, Feb 05, 2024 at 05:41:47PM +0000, Mark Brown wrote:
> On Mon, Feb 05, 2024 at 05:11:59PM +0000, Dave Martin wrote:
> > On Sat, Feb 03, 2024 at 12:16:49PM +0000, Mark Brown wrote:
> 
> > > We could also teach the ptrace core about runtime discoverable regset sizes
> > > but that would be a more invasive change and this is being observed in
> > > practical systems.
> 
> > This is not hard at all: see
> > 27e64b4be4b8 ("regset: Add support for dynamically sized regsets") 
> 
> > But since this is precisely what was ripped out, I guess adding it back
> > may be controversial (?)
> 
> Also just that people might want to backport and while it's not super
> *hard* I tend to prefer to do something as minimal as possible as a fix,
> the less we do the less the chances that we mess up.

Totally agree with that: the core code has now gone in another direction,
so we should consider that a done deal.  I'm just using the old patch as
an illustration of the (low) level of complexity.

> > > We should probably also use the actual architectural limit for the
> > > bitmasks we use in the VL enumeration code, though that's both a little
> > > bit more involved and less immediately a problem.
> 
> > Since these masks are 64 bytes each and rarely accessed, it seemed
> > pointless complexity to make them resizeable...
> 
> I was suggesting making them use the architectural maximum rather than
> making them dynamic.
> 
> > > +#define ARCH_SVE_VQ_MAX 16
> > >  #define SME_VQ_MAX	16
> 
> > Ack, though part of the reason for not doing this was to discourage
> > people from allocating statically sized buffers in general.
> 
> I was going to do a patch adding a comment to the header noting that
> this is not actually the architectural maximum since at present it's
> a bit of a landmine, people who have some idea of the architecture
> likely have a rough idea what sort of allocation size is needed for the
> maximum SVE state and are likely to not double check the value provided
> (I think that's what happened with the refactoring to remove the dynamic
> sizing).  A comment in the header is still very missable but it'd be
> something.
> 
> > If the kernel is now juggling two #defines for the maximum vector size,
> > this feels like it may seed bitrot...
> 
> Ideally we'd just not have the existing define externally but it's there
> and it's been used.

To clarify, is this intended as a temporary band-aid against silly
behaviour while a cleaner solution is found, or a permanent limitation?

We'd need to change various things if the architectural max VL actually
grew, so no forward-portability is lost immediately if the kernel
adopts 16 internally, but I'm still a little concerned that people may
poke about in the kernel code as a reference and this will muddy the
waters regarding how to do the right thing in userspace (I know people
shouldn't, but...)

[...]

Cheers
---Dave
  
Mark Brown Feb. 7, 2024, 1:09 p.m. UTC | #5
On Wed, Feb 07, 2024 at 12:23:56PM +0000, Dave Martin wrote:
> On Mon, Feb 05, 2024 at 05:41:47PM +0000, Mark Brown wrote:
> > On Mon, Feb 05, 2024 at 05:11:59PM +0000, Dave Martin wrote:

> > > If the kernel is now juggling two #defines for the maximum vector size,
> > > this feels like it may seed bitrot...

> > Ideally we'd just not have the existing define externally but it's there
> > and it's been used.

> To clarify, is this intended as a temporary band-aid against silly
> behaviour while a cleaner solution is found, or a permanent limitation?

Ideally we'd just make everything dynamic, other than the regset issue
and the bitmasks used for VL enumeration we're there already.  Making
the bitmasks dynamically sized is more painful but are also doing
enumeration that userspace doesn't need to do.

> We'd need to change various things if the architectural max VL actually
> grew, so no forward-portability is lost immediately if the kernel
> adopts 16 internally, but I'm still a little concerned that people may
> poke about in the kernel code as a reference and this will muddy the
> waters regarding how to do the right thing in userspace (I know people
> shouldn't, but...)

I think if we fix the ptrace regset issue we're doing a good enough job
of just using fully dynamic sizing with no limits other than what's been
enumerated there.  We could possibly deal with the enumberation code by
changing it to use ZCR/SMCR_ELx_LEN_ based defines so that it's
obviously coming from what we can possibly write to the register but
it's a bit less clear how to do that neatly.
  
Dave Martin Feb. 7, 2024, 1:51 p.m. UTC | #6
On Wed, Feb 07, 2024 at 01:09:51PM +0000, Mark Brown wrote:
> On Wed, Feb 07, 2024 at 12:23:56PM +0000, Dave Martin wrote:
> > On Mon, Feb 05, 2024 at 05:41:47PM +0000, Mark Brown wrote:
> > > On Mon, Feb 05, 2024 at 05:11:59PM +0000, Dave Martin wrote:
> 
> > > > If the kernel is now juggling two #defines for the maximum vector size,
> > > > this feels like it may seed bitrot...
> 
> > > Ideally we'd just not have the existing define externally but it's there
> > > and it's been used.
> 
> > To clarify, is this intended as a temporary band-aid against silly
> > behaviour while a cleaner solution is found, or a permanent limitation?
> 
> Ideally we'd just make everything dynamic, other than the regset issue
> and the bitmasks used for VL enumeration we're there already.  Making
> the bitmasks dynamically sized is more painful but are also doing
> enumeration that userspace doesn't need to do.

For the bitmasks, we'd be saving (512 - 16) / 8 = 62 bytes for each of
SVE and SME (I think).

The tradeoff really didn't seem worth it...

> 
> > We'd need to change various things if the architectural max VL actually
> > grew, so no forward-portability is lost immediately if the kernel
> > adopts 16 internally, but I'm still a little concerned that people may
> > poke about in the kernel code as a reference and this will muddy the
> > waters regarding how to do the right thing in userspace (I know people
> > shouldn't, but...)
> 
> I think if we fix the ptrace regset issue we're doing a good enough job
> of just using fully dynamic sizing with no limits other than what's been
> enumerated there.  We could possibly deal with the enumberation code by
> changing it to use ZCR/SMCR_ELx_LEN_ based defines so that it's
> obviously coming from what we can possibly write to the register but
> it's a bit less clear how to do that neatly.

OK, but we still seem to have two competing approaches: clamp SVE_VQ_MAX
for kernel internal purposes, or restore the dynamic sizing of
NT_ARM_SVE based on the new regset core behaviour.

Are you saying we should or both, or otherwise which one?

Cheers
---Dave
  
Mark Brown Feb. 7, 2024, 3:07 p.m. UTC | #7
On Wed, Feb 07, 2024 at 01:51:42PM +0000, Dave Martin wrote:
> On Wed, Feb 07, 2024 at 01:09:51PM +0000, Mark Brown wrote:

> > I think if we fix the ptrace regset issue we're doing a good enough job
> > of just using fully dynamic sizing with no limits other than what's been
> > enumerated there.  We could possibly deal with the enumberation code by
> > changing it to use ZCR/SMCR_ELx_LEN_ based defines so that it's
> > obviously coming from what we can possibly write to the register but
> > it's a bit less clear how to do that neatly.

> OK, but we still seem to have two competing approaches: clamp SVE_VQ_MAX
> for kernel internal purposes, or restore the dynamic sizing of
> NT_ARM_SVE based on the new regset core behaviour.

> Are you saying we should or both, or otherwise which one?

I'd like to remove uses of SVE_VQ_MAX, to the extent possible by making
things dynamic but if not by not using it I would like to not use
SVE_VQ_MAX partly due to the considerations you mentioned about people
picking up from looking at the kernel source that it's a good idea.
  
Will Deacon Feb. 9, 2024, 5:11 p.m. UTC | #8
Hi Doug,

On Mon, Feb 05, 2024 at 09:02:08AM -0800, Doug Anderson wrote:
> Hi,
> 
> On Sat, Feb 3, 2024 at 4:18 AM Mark Brown <broonie@kernel.org> wrote:
> >
> > Doug Anderson observed that ChromeOS crashes are being reported which
> > include failing allocations of order 7 during core dumps due to ptrace
> > allocating storage for regsets:
> >
> >   chrome: page allocation failure: order:7,
> >           mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
> >           nodemask=(null),cpuset=urgent,mems_allowed=0
> >    ...
> >   regset_get_alloc+0x1c/0x28
> >   elf_core_dump+0x3d8/0xd8c
> >   do_coredump+0xeb8/0x1378
> >
> > with further investigation showing that this is:
> >
> >    [   66.957385] DOUG: Allocating 279584 bytes
> >
> > which is the maximum size of the SVE regset. As Doug observes it is not
> > entirely surprising that such a large allocation of contiguous memory might
> > fail on a long running system.
> >
> > The SVE regset is currently sized to hold SVE registers with a VQ of
> > SVE_VQ_MAX which is 512, substantially more than the architectural maximum
> > of 16 which we might see even in a system emulating the limits of the
> > architecture. Since we don't expose the size we tell the regset core
> > externally let's define ARCH_SVE_VQ_MAX with the actual architectural
> > maximum and use that for the regset, we'll still overallocate most of the
> > time but much less so which will be helpful even if the core is fixed to
> > not require contiguous allocations.
> >
> > We could also teach the ptrace core about runtime discoverable regset sizes
> > but that would be a more invasive change and this is being observed in
> > practical systems.
> >
> > Reported-by: Doug Anderson <dianders@chromium.org>
> > Signed-off-by: Mark Brown <broonie@kernel.org>
> 
> Confirmed that when I send a "quit" signal to Chrome now that the
> allocation I see for "core_note_type" NT_ARM_SVE goes down from
> 279,584 bytes (n=17474) to just 8,768 bytes (n=548). I'm not
> intimately familiar with this code so I'll skip the Reviewed-by unless
> someone thinks it would be valuable for me to analyze more. I think
> there are already plenty of people who know this well, so for now,
> just:
> 
> Tested-by: Douglas Anderson <dianders@chromium.org>

I can pick this up as a short-term hack if it solves the problem for you,
but I also saw that you posted:

https://lore.kernel.org/r/20240205092626.v2.1.Id9ad163b60d21c9e56c2d686b0cc9083a8ba7924@changeid

to fallback onto vmalloc() for large allocations.

What's your preference for a fix?

Cheers,

Will
  
Mark Brown Feb. 9, 2024, 5:40 p.m. UTC | #9
On Fri, Feb 09, 2024 at 05:11:55PM +0000, Will Deacon wrote:

> I can pick this up as a short-term hack if it solves the problem for you,
> but I also saw that you posted:

> https://lore.kernel.org/r/20240205092626.v2.1.Id9ad163b60d21c9e56c2d686b0cc9083a8ba7924@changeid

> to fallback onto vmalloc() for large allocations.

> What's your preference for a fix?

We need the change kvzalloc() regardless since the ZA regset is just
over 64K which is on the big side, I do think that reducing the reported
regset size is useful as a fix independently since even with kvzalloc()
it's still pointless and rude to ask for such a big allocation and we
might reasonably be generating core files or dumping crashing processes
under memory pressure.

Dave was looking at sizing the regsets dynamically but there's enough
going on there that it's not great if people want to pick it up for
stable.
  
Dave Martin Feb. 12, 2024, 4:50 p.m. UTC | #10
On Sat, Feb 03, 2024 at 12:16:49PM +0000, Mark Brown wrote:
> Doug Anderson observed that ChromeOS crashes are being reported which
> include failing allocations of order 7 during core dumps due to ptrace
> allocating storage for regsets:
> 
>   chrome: page allocation failure: order:7,
>           mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
>           nodemask=(null),cpuset=urgent,mems_allowed=0
>    ...
>   regset_get_alloc+0x1c/0x28
>   elf_core_dump+0x3d8/0xd8c
>   do_coredump+0xeb8/0x1378
> 
> with further investigation showing that this is:
> 
>    [   66.957385] DOUG: Allocating 279584 bytes
> 
> which is the maximum size of the SVE regset. As Doug observes it is not
> entirely surprising that such a large allocation of contiguous memory might
> fail on a long running system.
> 
> The SVE regset is currently sized to hold SVE registers with a VQ of
> SVE_VQ_MAX which is 512, substantially more than the architectural maximum
> of 16 which we might see even in a system emulating the limits of the
> architecture. Since we don't expose the size we tell the regset core
> externally let's define ARCH_SVE_VQ_MAX with the actual architectural
> maximum and use that for the regset, we'll still overallocate most of the
> time but much less so which will be helpful even if the core is fixed to
> not require contiguous allocations.
> 
> We could also teach the ptrace core about runtime discoverable regset sizes
> but that would be a more invasive change and this is being observed in
> practical systems.
> 
> Reported-by: Doug Anderson <dianders@chromium.org>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
> We should probably also use the actual architectural limit for the
> bitmasks we use in the VL enumeration code, though that's both a little
> bit more involved and less immediately a problem.
> ---
>  arch/arm64/include/asm/fpsimd.h | 10 +++++-----
>  arch/arm64/kernel/ptrace.c      |  3 ++-
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 50e5f25d3024..cf5f31181bc8 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -62,12 +62,12 @@ static inline void cpacr_restore(unsigned long cpacr)
>   * When we defined the maximum SVE vector length we defined the ABI so
>   * that the maximum vector length included all the reserved for future
>   * expansion bits in ZCR rather than those just currently defined by
> - * the architecture. While SME follows a similar pattern the fact that
> - * it includes a square matrix means that any allocations that attempt
> - * to cover the maximum potential vector length (such as happen with
> - * the regset used for ptrace) end up being extremely large. Define
> - * the much lower actual limit for use in such situations.
> + * the architecture.  Using this length to allocate worst size buffers
> + * results in excessively large allocations, and this effect is even
> + * more pronounced for SME due to ZA.  Define more suitable VLs for
> + * these situations.
>   */
> +#define ARCH_SVE_VQ_MAX 16
>  #define SME_VQ_MAX	16
>  
>  struct task_struct;
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index dc6cf0e37194..e3bef38fc2e2 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -1500,7 +1500,8 @@ static const struct user_regset aarch64_regsets[] = {
>  #ifdef CONFIG_ARM64_SVE
>  	[REGSET_SVE] = { /* Scalable Vector Extension */
>  		.core_note_type = NT_ARM_SVE,
> -		.n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE),
> +		.n = DIV_ROUND_UP(SVE_PT_SIZE(ARCH_SVE_VQ_MAX,
> +					      SVE_PT_REGS_SVE),
>  				  SVE_VQ_BYTES),

Do we need an actual check somewhere that we don't bust this limit?

Since ZCR_ELx_LEN_MASK was changed from 0x1ff to 0xf, it looks like the
kernel itself will not generate an overlarge VL, although it feels a bit
like this guarantee arrives by accident.
Could ARCH_SVE_VQ_MAX be based on ZCR_ELx_LEN_MASK instead?

Userspace could specify vl > sve_vl_from_vq(ARCH_SVE_VQ_MAX) in
PTRACE_SETREGSET; I'm not sure exactly what happens there.

(The original 0x1ff value of ZCR_ELx_LEN_MASK was based on more than
just hope, but it does seem appropriate to restrict it now so that it
matches the formal architecture, as per commit f171f9e4097d
("arm64/fp: Make SVE and SME length register definition match
architecture") )

[...]

Cheers
---Dave
  
Mark Brown Feb. 12, 2024, 5:09 p.m. UTC | #11
On Mon, Feb 12, 2024 at 04:50:49PM +0000, Dave Martin wrote:
> On Sat, Feb 03, 2024 at 12:16:49PM +0000, Mark Brown wrote:

> > -		.n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE),
> > +		.n = DIV_ROUND_UP(SVE_PT_SIZE(ARCH_SVE_VQ_MAX,
> > +					      SVE_PT_REGS_SVE),
> >  				  SVE_VQ_BYTES),

> Do we need an actual check somewhere that we don't bust this limit?

..

> Userspace could specify vl > sve_vl_from_vq(ARCH_SVE_VQ_MAX) in
> PTRACE_SETREGSET; I'm not sure exactly what happens there.

We already have validation against the actual enumerated limits for the
system by virtue of setting the vector length to whatever is specified
so we'll limit any overly large vector length down to something we can
actually support and then reject an attempt to supply register data if
we changed the VL from what the user specified.

> Since ZCR_ELx_LEN_MASK was changed from 0x1ff to 0xf, it looks like the
> kernel itself will not generate an overlarge VL, although it feels a bit
> like this guarantee arrives by accident.
> Could ARCH_SVE_VQ_MAX be based on ZCR_ELx_LEN_MASK instead?

I guess.
  

Patch

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 50e5f25d3024..cf5f31181bc8 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -62,12 +62,12 @@  static inline void cpacr_restore(unsigned long cpacr)
  * When we defined the maximum SVE vector length we defined the ABI so
  * that the maximum vector length included all the reserved for future
  * expansion bits in ZCR rather than those just currently defined by
- * the architecture. While SME follows a similar pattern the fact that
- * it includes a square matrix means that any allocations that attempt
- * to cover the maximum potential vector length (such as happen with
- * the regset used for ptrace) end up being extremely large. Define
- * the much lower actual limit for use in such situations.
+ * the architecture.  Using this length to allocate worst size buffers
+ * results in excessively large allocations, and this effect is even
+ * more pronounced for SME due to ZA.  Define more suitable VLs for
+ * these situations.
  */
+#define ARCH_SVE_VQ_MAX 16
 #define SME_VQ_MAX	16
 
 struct task_struct;
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index dc6cf0e37194..e3bef38fc2e2 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1500,7 +1500,8 @@  static const struct user_regset aarch64_regsets[] = {
 #ifdef CONFIG_ARM64_SVE
 	[REGSET_SVE] = { /* Scalable Vector Extension */
 		.core_note_type = NT_ARM_SVE,
-		.n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE),
+		.n = DIV_ROUND_UP(SVE_PT_SIZE(ARCH_SVE_VQ_MAX,
+					      SVE_PT_REGS_SVE),
 				  SVE_VQ_BYTES),
 		.size = SVE_VQ_BYTES,
 		.align = SVE_VQ_BYTES,