[2/2] arm64: errata: Add Cortex-A520 speculative unprivileged load workaround

Message ID 20230912121120.380420-2-robh@kernel.org
State New
Headers
Series [1/2] arm64: Add Cortex-A520 CPU part definition |

Commit Message

Rob Herring Sept. 12, 2023, 12:11 p.m. UTC
  Implement the workaround for ARM Cortex-A520 erratum 2966298. On an
affected Cortex-A520 core, a speculatively executed unprivileged load
might leak data from a privileged level via a cache side channel.

The workaround is to execute a TLBI before returning to EL0. A
non-shareable TLBI to any address is sufficient.

The workaround isn't necessary if page table isolation (KPTI) is
enabled, but for simplicity it will be. Page table isolation should
normally be disabled for Cortex-A520 as it supports the CSV3 feature
and the E0PD feature (used when KASLR is enabled).

Cc: stable@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
---
 Documentation/arch/arm64/silicon-errata.rst |  2 ++
 arch/arm64/Kconfig                          | 13 +++++++++++++
 arch/arm64/kernel/cpu_errata.c              |  8 ++++++++
 arch/arm64/kernel/entry.S                   |  4 ++++
 arch/arm64/tools/cpucaps                    |  1 +
 5 files changed, 28 insertions(+)
  

Comments

Will Deacon Sept. 18, 2023, 10:01 a.m. UTC | #1
On Tue, Sep 12, 2023 at 07:11:15AM -0500, Rob Herring wrote:
> Implement the workaround for ARM Cortex-A520 erratum 2966298. On an
> affected Cortex-A520 core, a speculatively executed unprivileged load
> might leak data from a privileged level via a cache side channel.
> 
> The workaround is to execute a TLBI before returning to EL0. A
> non-shareable TLBI to any address is sufficient.

Can you elaborate at all on how this works, please? A TLBI addressing a
cache side channel feels weird (or is "cache" referring to some TLB
structures rather than e.g. the data cache here?).

Assuming there's some vulnerable window between the speculative
unprivileged load and the completion of the TLBI, what prevents another
CPU from observing the side-channel during that time? Also, does the
TLBI need to be using the same ASID as the unprivileged load? If so, then
a context-switch could widen the vulnerable window quite significantly.

Anyway, hopefully I'm barking up the wrong tree, but it would be helpful
to have some intuition behind the workaround in order to review (and
maintain) this patch.

Cheers,

Will
  
Marc Zyngier Sept. 18, 2023, 10:17 a.m. UTC | #2
On 2023-09-18 11:01, Will Deacon wrote:
> On Tue, Sep 12, 2023 at 07:11:15AM -0500, Rob Herring wrote:
>> Implement the workaround for ARM Cortex-A520 erratum 2966298. On an
>> affected Cortex-A520 core, a speculatively executed unprivileged load
>> might leak data from a privileged level via a cache side channel.
>> 
>> The workaround is to execute a TLBI before returning to EL0. A
>> non-shareable TLBI to any address is sufficient.
> 
> Can you elaborate at all on how this works, please? A TLBI addressing a
> cache side channel feels weird (or is "cache" referring to some TLB
> structures rather than e.g. the data cache here?).
> 
> Assuming there's some vulnerable window between the speculative
> unprivileged load and the completion of the TLBI, what prevents another
> CPU from observing the side-channel during that time? Also, does the
> TLBI need to be using the same ASID as the unprivileged load? If so, 
> then
> a context-switch could widen the vulnerable window quite significantly.

Another 'interesting' case is the KVM world switch. If EL0 is
affected, what about EL1? Can such a data leak exist cross-EL1,
or from EL2 to El1? Asking for a friend...

         M.
  
Rob Herring Sept. 19, 2023, 12:21 p.m. UTC | #3
On Mon, Sep 18, 2023 at 5:01 AM Will Deacon <will@kernel.org> wrote:
>
> On Tue, Sep 12, 2023 at 07:11:15AM -0500, Rob Herring wrote:
> > Implement the workaround for ARM Cortex-A520 erratum 2966298. On an
> > affected Cortex-A520 core, a speculatively executed unprivileged load
> > might leak data from a privileged level via a cache side channel.
> >
> > The workaround is to execute a TLBI before returning to EL0. A
> > non-shareable TLBI to any address is sufficient.
>
> Can you elaborate at all on how this works, please?

Here's the write-up if you haven't read that already:

https://developer.arm.com/documentation/SDEN-2444153/0500/?lang=en

> A TLBI addressing a
> cache side channel feels weird (or is "cache" referring to some TLB
> structures rather than e.g. the data cache here?).

AIUI, the TLBI is simply enough to ensure the permission check happens
on the speculative load. It has nothing to do with actual TLB
contents.

This core has FEAT_E0PD and FEAT_CSV3 which should mitigate this
scenario, but this case is a narrow uarch condition which bypasses
those checks.

> Assuming there's some vulnerable window between the speculative
> unprivileged load and the completion of the TLBI, what prevents another
> CPU from observing the side-channel during that time?

The cache hit is private to the core. How would another core observe that?

> Also, does the
> TLBI need to be using the same ASID as the unprivileged load? If so, then
> a context-switch could widen the vulnerable window quite significantly.

No, the TLBI can be any context and/or address including unused addresses.

Rob
  
Rob Herring Sept. 19, 2023, 12:29 p.m. UTC | #4
On Mon, Sep 18, 2023 at 5:18 AM Marc Zyngier <maz@misterjones.org> wrote:
>
> On 2023-09-18 11:01, Will Deacon wrote:
> > On Tue, Sep 12, 2023 at 07:11:15AM -0500, Rob Herring wrote:
> >> Implement the workaround for ARM Cortex-A520 erratum 2966298. On an
> >> affected Cortex-A520 core, a speculatively executed unprivileged load
> >> might leak data from a privileged level via a cache side channel.
> >>
> >> The workaround is to execute a TLBI before returning to EL0. A
> >> non-shareable TLBI to any address is sufficient.
> >
> > Can you elaborate at all on how this works, please? A TLBI addressing a
> > cache side channel feels weird (or is "cache" referring to some TLB
> > structures rather than e.g. the data cache here?).
> >
> > Assuming there's some vulnerable window between the speculative
> > unprivileged load and the completion of the TLBI, what prevents another
> > CPU from observing the side-channel during that time? Also, does the
> > TLBI need to be using the same ASID as the unprivileged load? If so,
> > then
> > a context-switch could widen the vulnerable window quite significantly.
>
> Another 'interesting' case is the KVM world switch. If EL0 is
> affected, what about EL1? Can such a data leak exist cross-EL1,
> or from EL2 to El1? Asking for a friend...

I'm checking for a definitive answer, but page table isolation also
avoids the issue. Wouldn't these scenarios all be similar to page
table isolation in that the EL2 or prior EL1 context is unmapped?

Rob
  
Marc Zyngier Sept. 19, 2023, 12:49 p.m. UTC | #5
On Tue, 19 Sep 2023 13:29:07 +0100,
Rob Herring <robh@kernel.org> wrote:
> 
> On Mon, Sep 18, 2023 at 5:18 AM Marc Zyngier <maz@misterjones.org> wrote:
> >
> > On 2023-09-18 11:01, Will Deacon wrote:
> > > On Tue, Sep 12, 2023 at 07:11:15AM -0500, Rob Herring wrote:
> > >> Implement the workaround for ARM Cortex-A520 erratum 2966298. On an
> > >> affected Cortex-A520 core, a speculatively executed unprivileged load
> > >> might leak data from a privileged level via a cache side channel.
> > >>
> > >> The workaround is to execute a TLBI before returning to EL0. A
> > >> non-shareable TLBI to any address is sufficient.
> > >
> > > Can you elaborate at all on how this works, please? A TLBI addressing a
> > > cache side channel feels weird (or is "cache" referring to some TLB
> > > structures rather than e.g. the data cache here?).
> > >
> > > Assuming there's some vulnerable window between the speculative
> > > unprivileged load and the completion of the TLBI, what prevents another
> > > CPU from observing the side-channel during that time? Also, does the
> > > TLBI need to be using the same ASID as the unprivileged load? If so,
> > > then
> > > a context-switch could widen the vulnerable window quite significantly.
> >
> > Another 'interesting' case is the KVM world switch. If EL0 is
> > affected, what about EL1? Can such a data leak exist cross-EL1,
> > or from EL2 to El1? Asking for a friend...
> 
> I'm checking for a definitive answer, but page table isolation also
> avoids the issue. Wouldn't these scenarios all be similar to page
> table isolation in that the EL2 or prior EL1 context is unmapped?

No, EL2 is always mapped, and we don't have anything like KPTI there.

Maybe the saving grace is that EL2 and EL2&0 are different translation
regimes from EL1&0, but there's nothing in the commit message that
indicates it. As for EL1-to-EL1 leaks, it again completely depends on
how the TLBs are tagged.

You'd hope that having different VMIDs would save the bacon, but if
you can leak EL1 translations into EL0, it means that the associated
permission and/or tags do not contain all the required information...

	M.
  
Rob Herring Sept. 20, 2023, 4:47 p.m. UTC | #6
On Tue, Sep 19, 2023 at 7:50 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 19 Sep 2023 13:29:07 +0100,
> Rob Herring <robh@kernel.org> wrote:
> >
> > On Mon, Sep 18, 2023 at 5:18 AM Marc Zyngier <maz@misterjones.org> wrote:
> > >
> > > On 2023-09-18 11:01, Will Deacon wrote:
> > > > On Tue, Sep 12, 2023 at 07:11:15AM -0500, Rob Herring wrote:
> > > >> Implement the workaround for ARM Cortex-A520 erratum 2966298. On an
> > > >> affected Cortex-A520 core, a speculatively executed unprivileged load
> > > >> might leak data from a privileged level via a cache side channel.
> > > >>
> > > >> The workaround is to execute a TLBI before returning to EL0. A
> > > >> non-shareable TLBI to any address is sufficient.
> > > >
> > > > Can you elaborate at all on how this works, please? A TLBI addressing a
> > > > cache side channel feels weird (or is "cache" referring to some TLB
> > > > structures rather than e.g. the data cache here?).
> > > >
> > > > Assuming there's some vulnerable window between the speculative
> > > > unprivileged load and the completion of the TLBI, what prevents another
> > > > CPU from observing the side-channel during that time? Also, does the
> > > > TLBI need to be using the same ASID as the unprivileged load? If so,
> > > > then
> > > > a context-switch could widen the vulnerable window quite significantly.
> > >
> > > Another 'interesting' case is the KVM world switch. If EL0 is
> > > affected, what about EL1? Can such a data leak exist cross-EL1,
> > > or from EL2 to El1? Asking for a friend...
> >
> > I'm checking for a definitive answer, but page table isolation also
> > avoids the issue. Wouldn't these scenarios all be similar to page
> > table isolation in that the EL2 or prior EL1 context is unmapped?
>
> No, EL2 is always mapped, and we don't have anything like KPTI there.
>
> Maybe the saving grace is that EL2 and EL2&0 are different translation
> regimes from EL1&0, but there's nothing in the commit message that
> indicates it. As for EL1-to-EL1 leaks, it again completely depends on
> how the TLBs are tagged.

Different translation regimes are not affected. It must be the same
regime and same translation.

> You'd hope that having different VMIDs would save the bacon, but if
> you can leak EL1 translations into EL0, it means that the associated
> permission and/or tags do not contain all the required information...

The VMID is part of the equation. See here[1].

Rob

[1] https://developer.arm.com/documentation/102517/0001/Memory-management/Translation-Lookaside-Buffer-match-process?lang=en
  
Marc Zyngier Sept. 20, 2023, 5:17 p.m. UTC | #7
On Wed, 20 Sep 2023 17:47:35 +0100,
Rob Herring <robh@kernel.org> wrote:
> 
> On Tue, Sep 19, 2023 at 7:50 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Tue, 19 Sep 2023 13:29:07 +0100,
> > Rob Herring <robh@kernel.org> wrote:
> > >
> > > On Mon, Sep 18, 2023 at 5:18 AM Marc Zyngier <maz@misterjones.org> wrote:
> > > >
> > > > On 2023-09-18 11:01, Will Deacon wrote:
> > > > > On Tue, Sep 12, 2023 at 07:11:15AM -0500, Rob Herring wrote:
> > > > >> Implement the workaround for ARM Cortex-A520 erratum 2966298. On an
> > > > >> affected Cortex-A520 core, a speculatively executed unprivileged load
> > > > >> might leak data from a privileged level via a cache side channel.
> > > > >>
> > > > >> The workaround is to execute a TLBI before returning to EL0. A
> > > > >> non-shareable TLBI to any address is sufficient.
> > > > >
> > > > > Can you elaborate at all on how this works, please? A TLBI addressing a
> > > > > cache side channel feels weird (or is "cache" referring to some TLB
> > > > > structures rather than e.g. the data cache here?).
> > > > >
> > > > > Assuming there's some vulnerable window between the speculative
> > > > > unprivileged load and the completion of the TLBI, what prevents another
> > > > > CPU from observing the side-channel during that time? Also, does the
> > > > > TLBI need to be using the same ASID as the unprivileged load? If so,
> > > > > then
> > > > > a context-switch could widen the vulnerable window quite significantly.
> > > >
> > > > Another 'interesting' case is the KVM world switch. If EL0 is
> > > > affected, what about EL1? Can such a data leak exist cross-EL1,
> > > > or from EL2 to El1? Asking for a friend...
> > >
> > > I'm checking for a definitive answer, but page table isolation also
> > > avoids the issue. Wouldn't these scenarios all be similar to page
> > > table isolation in that the EL2 or prior EL1 context is unmapped?
> >
> > No, EL2 is always mapped, and we don't have anything like KPTI there.
> >
> > Maybe the saving grace is that EL2 and EL2&0 are different translation
> > regimes from EL1&0, but there's nothing in the commit message that
> > indicates it. As for EL1-to-EL1 leaks, it again completely depends on
> > how the TLBs are tagged.
> 
> Different translation regimes are not affected. It must be the same
> regime and same translation.

It would be good to capture this, then.

> 
> > You'd hope that having different VMIDs would save the bacon, but if
> > you can leak EL1 translations into EL0, it means that the associated
> > permission and/or tags do not contain all the required information...
> 
> The VMID is part of the equation. See here[1].

I have a pretty good idea of how TLB are *supposed* to behave. The
fact that you need some sort of invalidation on ERET to EL0 is the
proof that this CPU doesn't follow these rules to the letter...

	M.
  

Patch

diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index e96f057ea2a0..f47f63bcf67c 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -71,6 +71,8 @@  stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A510     | #2658417        | ARM64_ERRATUM_2658417       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A520     | #2966298        | ARM64_ERRATUM_2966298       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A53      | #826319         | ARM64_ERRATUM_826319        |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A53      | #827319         | ARM64_ERRATUM_827319        |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b10515c0200b..78f20e632712 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1037,6 +1037,19 @@  config ARM64_ERRATUM_2645198
 
 	  If unsure, say Y.
 
+config ARM64_ERRATUM_2966298
+	bool "Cortex-A520: 2966298: workaround for speculatively executed unprivileged load"
+	default y
+	help
+	  This option adds the workaround for ARM Cortex-A520 erratum 2966298.
+
+	  On an affected Cortex-A520 core, a speculatively executed unprivileged
+	  load might leak data from a privileged level via a cache side channel.
+
+	  Work around this problem by executing a TLBI before returning to EL0.
+
+	  If unsure, say Y.
+
 config CAVIUM_ERRATUM_22375
 	bool "Cavium erratum 22375, 24313"
 	default y
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index be66e94a21bd..5706e74c5578 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -730,6 +730,14 @@  const struct arm64_cpu_capabilities arm64_errata[] = {
 		.cpu_enable = cpu_clear_bf16_from_user_emulation,
 	},
 #endif
+#ifdef CONFIG_ARM64_ERRATUM_2966298
+	{
+		.desc = "ARM erratum 2966298",
+		.capability = ARM64_WORKAROUND_2966298,
+		/* Cortex-A520 r0p0 - r0p1 */
+		ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A520, 0, 0, 1),
+	},
+#endif
 #ifdef CONFIG_AMPERE_ERRATUM_AC03_CPU_38
 	{
 		.desc = "AmpereOne erratum AC03_CPU_38",
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 6ad61de03d0a..a6030913cd58 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -428,6 +428,10 @@  alternative_else_nop_endif
 	ldp	x28, x29, [sp, #16 * 14]
 
 	.if	\el == 0
+alternative_if ARM64_WORKAROUND_2966298
+	tlbi	vale1, xzr
+	dsb	nsh
+alternative_else_nop_endif
 alternative_if_not ARM64_UNMAP_KERNEL_AT_EL0
 	ldr	lr, [sp, #S_LR]
 	add	sp, sp, #PT_REGS_SIZE		// restore sp
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index c3f06fdef609..dea3dc89234b 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -84,6 +84,7 @@  WORKAROUND_2077057
 WORKAROUND_2457168
 WORKAROUND_2645198
 WORKAROUND_2658417
+WORKAROUND_2966298
 WORKAROUND_AMPERE_AC03_CPU_38
 WORKAROUND_TRBE_OVERWRITE_FILL_MODE
 WORKAROUND_TSB_FLUSH_FAILURE