[0/4] arm64: Make Aarch32 compatibility enablement optional at boot

Message ID

cover.1697614386.git.andrea.porta@suse.com

Headers

Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 23.128.96.38 as permitted sender) client-ip=23.128.96.38;
From: Andrea della Porta <andrea.porta@suse.com>
To: Catalin Marinas <catalin.marinas@arm.com>,
        Will Deacon <will@kernel.org>,
        linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Cc: nik.borisov@suse.com, Andrea della Porta <andrea.porta@suse.com>
Subject: [PATCH 0/4] arm64: Make Aarch32 compatibility enablement optional at
 boot
Date: Wed, 18 Oct 2023 13:13:18 +0200
Message-ID: <cover.1697614386.git.andrea.porta@suse.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

arm64: Make Aarch32 compatibility enablement optional at boot

Message

Andrea della Porta Oct. 18, 2023, 11:13 a.m. UTC

Aarch32 compatibility mode is enabled at compile time through the
CONFIG_COMPAT Kconfig option. This patchset lets 32-bit support
(for both processes and syscalls) be enabled at boot time using
a kernel parameter. Also, it provides a means for distributions
to set their own default without sacrificing compatibility support,
that is, users can override the default behaviour through the kernel
parameter.
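
As a rough illustration of the boot-time switch described above, the
following userspace C sketch scans a command line for a boolean parameter
and falls back to a build-time default when it is absent. The parameter
name "aarch32_emulation=" and the helper aarch32_from_cmdline() are
assumptions made for this example and are not taken from the patches; in
the kernel, the parsing would go through the usual early-parameter
machinery rather than hand-written string scanning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter name; the cover letter does not spell it out. */
#define AARCH32_PARAM "aarch32_emulation="

/*
 * Scan a space-separated command line for AARCH32_PARAM and return the
 * requested setting. When the parameter is absent, or its value is not
 * recognised, build_default is returned. The last occurrence wins.
 */
bool aarch32_from_cmdline(const char *cmdline, bool build_default)
{
    const size_t plen = strlen(AARCH32_PARAM);
    bool enabled = build_default;
    const char *p = cmdline;

    assert(cmdline != NULL);

    while (*p) {
        /* Skip separators, then measure the next token. */
        p += strspn(p, " ");
        size_t len = strcspn(p, " ");

        if (len > plen && strncmp(p, AARCH32_PARAM, plen) == 0) {
            const char *val = p + plen;
            size_t vlen = len - plen;

            if ((vlen == 1 && val[0] == '1') ||
                (vlen == 2 && strncmp(val, "on", 2) == 0))
                enabled = true;
            else if ((vlen == 1 && val[0] == '0') ||
                     (vlen == 3 && strncmp(val, "off", 3) == 0))
                enabled = false;
        }
        p += len;
    }
    return enabled;
}
```

Passing the default as an argument mirrors the idea that a distribution
picks the default at build time while the user keeps the final say on the
command line.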

*** Notes about syscall management ***
The VBAR_EL1 register, which holds the exception table address,
is set up very early in the boot process, before parse_early_param().
This means that it's not possible to access boot parameters before
setting the register. Also, setting the aforementioned register
for secondary cpus is done later in the boot flow.
Several ways to work around this have been considered, among which:

* resetting VBAR_EL1 to point to one of two vector tables (the
  former with 32-bit exception handlers enabled and the latter
  pointing to the unhandled stub, just as if CONFIG_COMPAT is disabled)
  depending on the proposed boot parameter. This has the disadvantage
  of producing a somewhat messy patchset involving several lines,
  has a higher cognitive load since there are at least three places
  where the register is getting changed (not near to each other),
  and has implications for other code segments (namely kpti, kvm
  and vdso), requiring special care.

* patching the vector table contents once the early param is available.
  This has most of the implications of the previous option
  (except maybe not impacting other code segments), plus it sounds
  a little 'hackish'.

The chosen approach involves conditionally executing 32-bit syscalls
depending on the parameter value. This of course results in a
little performance loss, but has the following advantages:

* all the cons from previously explained alternatives are solved
* users of 32-bit apps on 64-bit kernel are already suffering from
  performance losses due to 32-bit apps not fully leveraging the 64-bit
  processor, so they are already aware of this
* users of 32-bit apps on 64-bit kernel are believed
  to be a minority and most of the time there are sources available
  to be recompiled for 64-bit as a workaround for better performance

It is worth mentioning that users of 64-bit apps are, of course,
unaffected.
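
The chosen approach can be pictured with the minimal userspace sketch
below: the 32-bit syscall entry consults a flag on every call and behaves
like the unhandled stub when the flag is off. Only the name
aarch32_enabled() comes from the series (patch 1); its body here, the
setter and el0_32_svc_entry() are assumptions made for illustration.

```c
#include <stdbool.h>

enum el0_result { EL0_HANDLED, EL0_UNHANDLED };

/* Assumed backing store for the boot-time setting. */
static bool aarch32_flag = true;

/* Hypothetical setter, standing in for the boot parameter handler. */
void set_aarch32_enabled(bool on)
{
    aarch32_flag = on;
}

/* Name from patch 1 of the series; this body is an assumption. */
bool aarch32_enabled(void)
{
    return aarch32_flag;
}

/*
 * Model of the 32-bit syscall entry: one extra branch per call, taken
 * before any compat syscall would be dispatched.
 */
enum el0_result el0_32_svc_entry(void)
{
    if (!aarch32_enabled())
        return EL0_UNHANDLED;   /* same outcome as the unhandled stub */
    return EL0_HANDLED;         /* the compat syscall would be dispatched here */
}
```

The extra branch is the "little performance loss" referred to above; the
64-bit entry path is not touched by it.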

Based on the work from Nikolay Borisov, see:
Link: https://lkml.org/lkml/2023/6/23/387

Andrea della Porta (4):
  arm64: Introduce aarch32_enabled()
  arm64/process: Make loading of 32bit processes depend on
    aarch32_enabled()
  arm64/entry-common: Make Aarch32 syscalls' availability depend on
    aarch32_enabled()
  arm64: Make Aarch32 emulation boot time configurable

 .../admin-guide/kernel-parameters.txt         |  7 ++++
 arch/arm64/Kconfig                            |  9 +++++
 arch/arm64/include/asm/compat.h               | 12 +++++++
 arch/arm64/kernel/entry-common.c              | 33 +++++++++++++++++--
 arch/arm64/kernel/process.c                   |  2 +-
 5 files changed, 59 insertions(+), 4 deletions(-)

Comments

Will Deacon Oct. 18, 2023, 12:27 p.m. UTC | #1

Hi,

On Wed, Oct 18, 2023 at 01:13:18PM +0200, Andrea della Porta wrote:
> Aarch32 compatibility mode is enabled at compile time through
> CONFIG_COMPAT Kconfig option. This patchset lets 32-bit support
> (for both processes and syscalls) be enabled at boot time using
> a kernel parameter. Also, it provides a means for distributions
> to set their own default without sacrificing compatibility support,
> that is users can override default behaviour through the kernel
> parameter.

I proposed something similar in the past:

https://lkml.kernel.org/linux-fsdevel/20210916131816.8841-1-will@kernel.org/

but the conclusion there (see the reply from Kees) was that it was better
to either use existing seccomp mechanisms or add something to control
which binfmts can be loaded.

Will

Arnd Bergmann Oct. 18, 2023, 12:44 p.m. UTC | #2

On Wed, Oct 18, 2023, at 14:27, Will Deacon wrote:
> Hi,
>
> On Wed, Oct 18, 2023 at 01:13:18PM +0200, Andrea della Porta wrote:
>> Aarch32 compatibility mode is enabled at compile time through
>> CONFIG_COMPAT Kconfig option. This patchset lets 32-bit support
>> (for both processes and syscalls) be enabled at boot time using
>> a kernel parameter. Also, it provides a means for distributions
>> to set their own default without sacrificing compatibility support,
>> that is users can override default behaviour through the kernel
>> parameter.
>
> I proposed something similar in the past:
>
> https://lkml.kernel.org/linux-fsdevel/20210916131816.8841-1-will@kernel.org/
>
> but the conclusion there (see the reply from Kees) was that it was better
> to either use existing seccomp mechanisms or add something to control
> which binfmts can be loaded.

Right, I was going to reply along the same lines here: x86 is
a bit of a special case that needs this, but I believe all the
other architectures already guard the compat syscall execution
on test_thread_flag(TIF_32BIT) that is only set by the compat
binfmt loader.

Doing the reverse is something that has however come up in the
past several times and that could be interesting: In order to
run userspace emulation (qemu-user, fex, ...) we may want to
allow calling syscalls and ioctls for foreign ABIs in a native
task, and at that point having a mechanism to control this
capability globally or per task would be useful as well.

The compat mode (arm32 on arm64) is the easiest case here, but the
same thing could be done for emulating the very subtle architecture
differences (x86-64 on arm64, arm64 on x86_64, arm32 on x86-compat,
or any of the above on riscv or  loongarch).

     Arnd

Mark Rutland Oct. 18, 2023, 12:52 p.m. UTC | #3

On Wed, Oct 18, 2023 at 01:13:18PM +0200, Andrea della Porta wrote:
> Aarch32 compatibility mode is enabled at compile time through
> CONFIG_COMPAT Kconfig option. This patchset lets 32-bit support
> (for both processes and syscalls) be enabled at boot time using
> a kernel parameter. Also, it provides a means for distributions
> to set their own default without sacrificing compatibility support,
> that is users can override default behaviour through the kernel
> parameter.

Can you elaborate on *why* people want such a policy?

> *** Notes about syscall management ***
> The VBAR_EL1 register, which holds the exception table address,
> is set up very early in the boot process, before parse_early_param().
> This means that it's not possible to access boot parameters before
> setting the register. Also, setting the aforementioned register
> for secondary cpus is done later in the boot flow.
> Several ways to work around this have been considered, among which:
> 
> * resetting VBAR_EL1 to point to one of two vector tables (the
>   former with 32-bit exception handlers enabled and the latter
>   pointing to the unhandled stub, just as if CONFIG_COMPAT is disabled)
>   depending on the proposed boot parameter. This has the disadvantage
>   of producing a somewhat messy patchset involving several lines,
>   has a higher cognitive load since there are at least three places
>   where the register is getting changed (not near to each other),
>   and has implications for other code segments (namely kpti, kvm
>   and vdso), requiring special care.
> 
> * patching the vector table contents once the early param is available.
>   This has most of the implications of the previous option
>   (except maybe not impacting other code segments), plus it sounds
>   a little 'hackish'.
> 
> The chosen approach involves conditionally executing 32-bit syscalls
> depending on the parameter value.

Why does the compat syscall path need to do anything?

On arm64 it's not possible to issue compat syscalls from a native 64-bit task.
If you prevent the loading of AArch32 binaries, none of the compat syscalls
will be reachable at all.

That's the proper way to implement this, and we already have logic for that as
part of the mismatched AArch32 support.
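
The argument above can be sketched in a few lines of userspace C: if the
only place that marks a task as 32-bit is the binary loader, then
refusing 32-bit binaries at load time leaves the compat syscall table
unreachable, with no check needed on the syscall path itself. All names
here (struct task, load_binary(), syscall_table_for()) are hypothetical
stand-ins for illustration, not the kernel's own code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values match ELFCLASS32 and ELFCLASS64 from the ELF specification. */
enum elf_class { ELF_CLASS_32 = 1, ELF_CLASS_64 = 2 };

enum sys_table { TABLE_NATIVE, TABLE_COMPAT };

struct task {
    bool is_32bit;  /* models the per-task 32-bit state set at exec time */
};

/*
 * Model of the exec-time gate. Returns 0 on success and -1 when the
 * binary is refused; on refusal the task is left untouched.
 */
int load_binary(struct task *t, enum elf_class cls, bool aarch32_allowed)
{
    assert(t != NULL);

    if (cls == ELF_CLASS_32) {
        if (!aarch32_allowed)
            return -1;          /* 32-bit binary rejected by the loader */
        t->is_32bit = true;     /* the only place the state becomes 32-bit */
        return 0;
    }
    t->is_32bit = false;
    return 0;
}

/* The syscall table follows the task state, never the other way round. */
enum sys_table syscall_table_for(const struct task *t)
{
    assert(t != NULL);
    return t->is_32bit ? TABLE_COMPAT : TABLE_NATIVE;
}
```

In this model a task that was refused a 32-bit load still maps to the
native table, which is the property the loader-only gate relies on.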

> This of course results in a little performance loss, but has the following
> advantages:

A performance loss for what relative to what?

How much of a performance loss?

Mark.

> * all the cons from previously explained alternatives are solved
> * users of 32-bit apps on 64-bit kernel are already suffering from
>   performance losses due to 32-bit apps not fully leveraging the 64-bit
>   processor, so they are already aware of this
> * users of 32-bit apps on 64-bit kernel are believed
>   to be a minority and most of the time there are sources available
>   to be recompiled for 64-bit as a workaround for better performance
> 
> It is worth mentioning that users of 64-bit apps are, of course,
> unaffected.
> 
> Based on the work from Nikolay Borisov, see:
> Link: https://lkml.org/lkml/2023/6/23/387
> 
> Andrea della Porta (4):
>   arm64: Introduce aarch32_enabled()
>   arm64/process: Make loading of 32bit processes depend on
>     aarch32_enabled()
>   arm64/entry-common: Make Aarch32 syscalls' availability depend on
>     aarch32_enabled()
>   arm64: Make Aarch32 emulation boot time configurable
> 
>  .../admin-guide/kernel-parameters.txt         |  7 ++++
>  arch/arm64/Kconfig                            |  9 +++++
>  arch/arm64/include/asm/compat.h               | 12 +++++++
>  arch/arm64/kernel/entry-common.c              | 33 +++++++++++++++++--
>  arch/arm64/kernel/process.c                   |  2 +-
>  5 files changed, 59 insertions(+), 4 deletions(-)
> 
> -- 
> 2.35.3
>

Andrea della Porta Oct. 19, 2023, 9:17 a.m. UTC | #4

On 13:27 Wed 18 Oct     , Will Deacon wrote:
> Hi,
> 
> On Wed, Oct 18, 2023 at 01:13:18PM +0200, Andrea della Porta wrote:
> > Aarch32 compatibility mode is enabled at compile time through
> > CONFIG_COMPAT Kconfig option. This patchset lets 32-bit support
> > (for both processes and syscalls) be enabled at boot time using
> > a kernel parameter. Also, it provides a means for distributions
> > to set their own default without sacrificing compatibility support,
> > that is users can override default behaviour through the kernel
> > parameter.
> 
> I proposed something similar in the past:
> 
> https://lkml.kernel.org/linux-fsdevel/20210916131816.8841-1-will@kernel.org/
> 
> but the conclusion there (see the reply from Kees) was that it was better
> to either use existing seccomp mechanisms or add something to control
> which binfmts can be loaded.
> 
> Will

I see. Seccomp sounds like a really good idea, since just blocking the compat
binfmt would not, per se, prevent calls to 32-bit syscalls. It's true that
ARM64 enforces the transition from A64 to A32 only on exception return, and
that the PSTATE.nRW flag can change only from EL1, but some exploit may arise
in the future to do just that (I'm not aware of any, nor can I come up with a
proof off the top of my head, but I can't exclude it either). So, assuming
for the sake of argument that a switch to A32 is feasible, the further step
of embedding A32 instructions in an A64 ELF executable is a breeze. Hence
blocking the syscalls (and not only the binfmt loading) could prove necessary.
I know all of this is highly speculative right now, but maybe it's worth
thinking about nonetheless.

Andrea

Andrea della Porta Oct. 19, 2023, 10:52 a.m. UTC | #5

On 14:44 Wed 18 Oct     , Arnd Bergmann wrote:
> On Wed, Oct 18, 2023, at 14:27, Will Deacon wrote:
> > Hi,
> >
> > On Wed, Oct 18, 2023 at 01:13:18PM +0200, Andrea della Porta wrote:
> >> Aarch32 compatibility mode is enabled at compile time through
> >> CONFIG_COMPAT Kconfig option. This patchset lets 32-bit support
> >> (for both processes and syscalls) be enabled at boot time using
> >> a kernel parameter. Also, it provides a means for distributions
> >> to set their own default without sacrificing compatibility support,
> >> that is users can override default behaviour through the kernel
> >> parameter.
> >
> > I proposed something similar in the past:
> >
> > https://lkml.kernel.org/linux-fsdevel/20210916131816.8841-1-will@kernel.org/
> >
> > but the conclusion there (see the reply from Kees) was that it was better
> > to either use existing seccomp mechanisms or add something to control
> > which binfmts can be loaded.
> 
> Right, I was going to reply along the same lines here: x86 is
> a bit of a special case that needs this, but I believe all the
> other architectures already guard the compat syscall execution
> on test_thread_flag(TIF_32BIT) that is only set by the compat
> binfmt loader.

Are you referring to the fact that x86 can switch at will between 32- and
64-bit code?

Regarding the TIF_32BIT flag, thanks for the heads-up. I still believe though
that this mechanism can somehow break down in the future, since prohibiting
32-bit executable loading *and* blocking 32-bit compat syscalls are two
separate paths of execution, held together by the architecture prohibiting,
by design, a switch to A32 instructions. Breaking the first rule and embedding
carefully crafted A32 instructions in an executable is easy, while the difficult
part is finding some 'reentrancy' to be able to do the execution state switch,
as pointed out in https://lore.kernel.org/lkml/ZTD0DAes-J-YQ2eu@apocalypse/.
I agree it's highly speculative and not something to be concerned about right
now; it's just a heads-up, should the need arise in the future.

> Doing the reverse is something that has however come up in the
> past several times and that could be interesting: In order to
> run userspace emulation (qemu-user, fex, ...) we may want to
> allow calling syscalls and ioctls for foreign ABIs in a native
> task, and at that point having a mechanism to control this
> capability globally or per task would be useful as well.
> 
> The compat mode (arm32 on arm64) is the easiest case here, but the
> same thing could be done for emulating the very subtle architecture
> differences (x86-64 on arm64, arm64 on x86_64, arm32 on x86-compat,
> or any of the above on riscv or  loongarch).
> 
>      Arnd

Really interesting. Since it's more related to emulation needs (my patch
has another focus due to the fact that A64 can execute A32 natively),
I'll take a look at this separately.

Andrea

Arnd Bergmann Oct. 19, 2023, 11:41 a.m. UTC | #6

On Thu, Oct 19, 2023, at 12:52, Andrea della Porta wrote:
> On 14:44 Wed 18 Oct     , Arnd Bergmann wrote:
>> On Wed, Oct 18, 2023, at 14:27, Will Deacon wrote:
>> 
>> Right, I was going to reply along the same lines here: x86 is
>> a bit of a special case that needs this, but I believe all the
>> other architectures already guard the compat syscall execution
>> on test_thread_flag(TIF_32BIT) that is only set by the compat
>> binfmt loader.
>
> Are you referring to the fact that x86 can switch at will between 32- and
> 64-bit code?

No.

> Regarding the TIF_32BIT flag, thanks for the heads-up. I still believe though
> that this mechanism can somehow break down in the future, since prohibiting
> 32-bit executable loading *and* blocking 32-bit compat syscalls are two
> separate paths of execution, held together by the architecture prohibiting,
> by design, a switch to A32 instructions. Breaking the first rule and embedding
> carefully crafted A32 instructions in an executable is easy, while the difficult
> part is finding some 'reentrancy' to be able to do the execution state switch,
> as pointed out in https://lore.kernel.org/lkml/ZTD0DAes-J-YQ2eu@apocalypse/.
> I agree it's highly speculative and not something to be concerned about right
> now; it's just a heads-up, should the need arise in the future.

There are (at least) five separate aspects to compat mode that are easy
to mix up:

1. Instruction decoding -- switching between the modes supported by the
   CPU (A64/A32/T32)
2. Word size -- what happens to the upper 32 bits of a register in
   an arithmetic operation
3. Personality -- Which architecture string gets returned by the
   uname syscall (aarch64 vs armv8) as well as the format of
   /proc/cpuinfo
4. System call entry points -- how a process calls into native or
   compat syscalls, or possibly foreign OS emulation
5. Binary format -- elf32 vs elf64 executables

On most architectures with compat mode, 4. and 5. are fundamentally
tied together today: a compat task can only call compat syscalls
and a native task can only call native syscalls. x86 is the exception
here, as it uses different instructions (int80, syscall, sysenter)
and picks the syscall table based on that instruction.
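
The contrast drawn above can be put into a small userspace C sketch. It
is a simplified model seen from a 64-bit task, with hypothetical names
throughout: on most compat architectures the table follows a per-task
flag fixed at exec time, whereas in the x86-64 model the entry
instruction alone selects it.

```c
#include <stdbool.h>

enum sys_table { TABLE_NATIVE, TABLE_COMPAT };

enum x86_entry { X86_INT80, X86_SYSCALL, X86_SYSENTER };

/* Typical compat architecture: the task flag set by the loader decides. */
enum sys_table table_by_task_flag(bool task_is_32bit)
{
    return task_is_32bit ? TABLE_COMPAT : TABLE_NATIVE;
}

/*
 * Simplified x86-64 model for a 64-bit task: the instruction used to
 * enter the kernel decides, independently of how the task was loaded.
 */
enum sys_table table_by_x86_entry(enum x86_entry insn)
{
    switch (insn) {
    case X86_SYSCALL:
        return TABLE_NATIVE;
    case X86_INT80:
    case X86_SYSENTER:
        return TABLE_COMPAT;
    }
    return TABLE_NATIVE;    /* not reached for valid enum values */
}
```

In the second function nothing about the task's binary format enters the
decision, which is why a loader-only gate is not sufficient there.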

I think 1. and 2. are also always tied to 5 on arm, but this is
not necessarily true for other architectures. 3. used to be tied
to 5 on some architectures in the past, but should be independent
now.

>> Doing the reverse is something that has however come up in the
>> past several times and that could be interesting: In order to
>> run userspace emulation (qemu-user, fex, ...) we may want to
>> allow calling syscalls and ioctls for foreign ABIs in a native
>> task, and at that point having a mechanism to control this
>> capability globally or per task would be useful as well.
>> 
>> The compat mode (arm32 on arm64) is the easiest case here, but the
>> same thing could be done for emulating the very subtle architecture
>> differences (x86-64 on arm64, arm64 on x86_64, arm32 on x86-compat,
>> or any of the above on riscv or  loongarch).
>
> Really interesting. Since it's more related to emulation needs (my patch
> has another focus due to the fact that A64 can execute A32 natively),
> I'll take a look at this separately.

A64 mode (unlike some other architectures, notably mips64) cannot
execute A32 or T32 instructions without a mode switch, the three are
entirely incompatible on the binary level.

Many ARMv8-CPUs support both Aarch64 mode and Aarch32 (A32/T32),
but a lot of the newer ones (e.g. Apple M1/M2, Cortex-R82 or
Cortex-A715) only do Aarch64 and need user-space emulation to run
32-bit binaries.

    Arnd

Andrea della Porta Oct. 19, 2023, 12:34 p.m. UTC | #7

On 13:52 Wed 18 Oct     , Mark Rutland wrote:
> On Wed, Oct 18, 2023 at 01:13:18PM +0200, Andrea della Porta wrote:
> > Aarch32 compatibility mode is enabled at compile time through
> > CONFIG_COMPAT Kconfig option. This patchset lets 32-bit support
> > (for both processes and syscalls) be enabled at boot time using
> > a kernel parameter. Also, it provides a means for distributions
> > to set their own default without sacrificing compatibility support,
> > that is users can override default behaviour through the kernel
> > parameter.
> 
> Can you elaborate on *why* people want such a policy?
> 

Primarily, the reason is to reduce the kernel attack surface by excluding
compat syscalls, wherever applicable. Much less important, but still a point:
I would also say this could be a good chance to get rid of somewhat old
and stale 32-bit libraries and programs, but this is of course debatable.

> > *** Notes about syscall management ***
> > The VBAR_EL1 register, which holds the exception table address,
> > is set up very early in the boot process, before parse_early_param().
> > This means that it's not possible to access boot parameters before
> > setting the register. Also, setting the aforementioned register
> > for secondary cpus is done later in the boot flow.
> > Several ways to work around this have been considered, among which:
> > 
> > * resetting VBAR_EL1 to point to one of two vector tables (the
> >   former with 32-bit exception handlers enabled and the latter
> >   pointing to the unhandled stub, just as if CONFIG_COMPAT is disabled)
> >   depending on the proposed boot parameter. This has the disadvantage
> >   of producing a somewhat messy patchset involving several lines,
> >   has a higher cognitive load since there are at least three places
> >   where the register is getting changed (not near to each other),
> >   and has implications for other code segments (namely kpti, kvm
> >   and vdso), requiring special care.
> > 
> > * patching the vector table contents once the early param is available.
> >   This has most of the implications of the previous option
> >   (except maybe not impacting other code segments), plus it sounds
> >   a little 'hackish'.
> > 
> > The chosen approach involves conditionally executing 32-bit syscalls
> > depending on the parameter value.
> 
> Why does the compat syscall path need to do anything?

I probably didn't catch your point here: the compat syscalls do not need to do
anything, and they don't (they work just as they do right now with
CONFIG_COMPAT alone), except for the conditional check that excludes them at
runtime. Of course this conditional *is* doing something, and it is somewhat
redundant if compat is disabled, but in this scenario I think it's unavoidable.

> 
> On arm64 it's not possible to issue compat syscalls from a native 64-bit task.
> If you prevent the loading of AArch32 binaries, none of the compat syscalls
> will be reachable at all.
> 
> That's the proper way to implement this, and we already have logic for that as
> part of the mismatched AArch32 support.
> 
> > This of course results in a little performance loss, but has the following
> > advantages:
> 
> A performance loss for what relative to what?

Of a compat syscall as it is now with CONFIG_COMPAT enabled, versus the
patched syscall handlers, which need a further conditional instruction to
check whether compat is enabled or not.

> 
> How much of a performance loss?

I have not taken measurements yet, since it was a qualitative consideration
more than a quantitative one, also considering that chances are it would
affect only a very small population. The time taken to execute the conditional
instruction is reasonably close to negligible compared to any syscall execution.