Message ID: CAP5Mv+ydhk=Ob4b40ZahGMgT-5+-VEHxtmA=-LkJiEOOU+K6hw@mail.gmail.com
State: New
Series: arm64: allow post-init vmalloc PXNTable
Commit Message
Maxwell Bland
Feb. 13, 2024, 4:05 p.m. UTC
Apologies if this is a duplicate mail, it will be the last one. Moto's SMTP
server sucks!!
Ensures that PXNTable can be set on all table descriptors allocated
through vmalloc. Normally, PXNTable is set only during initial memory
mapping and does not apply thereafter, making it possible for attackers
to target post-init allocated writable PTEs as a staging region for
injection of their code into the kernel. Presently it is not possible to
efficiently prevent these attacks as VMALLOC_END overlaps with _text,
e.g.:
VMALLOC_START ffff800080000000 VMALLOC_END fffffbfff0000000
_text ffffb6c0c1400000 _end ffffb6c0c3e40000
Setting VMALLOC_END to _text in init would resolve this issue with the
caveat of a sizeable reduction in the size of available vmalloc memory
due to requirements on aslr randomness. However, there are circumstances
where this trade-off is necessary: in particular, hypervisor-level
security monitors where 1) the microarchitecture contains race
conditions on PTE level updates or 2) a per-PTE update verifier comes at
a significant hit to performance.
Because the address of _text is aslr-sensitive and this patch associates
this value with VMALLOC_END, we remove the use of VMALLOC_END in a print
statement in mm/percpu.c. However, only the format string is updated in
crash_core.c, since we are dead at that point regardless. VMALLOC_END is
updated in kernel/setup.c to associate the feature closely with aslr and
region allocation code.
Signed-off-by: Maxwell Bland <mbland@motorola.com>
---
arch/arm64/Kconfig | 13 +++++++++++++
arch/arm64/include/asm/pgtable.h | 6 ++++++
arch/arm64/include/asm/vmalloc-pxn.h | 10 ++++++++++
arch/arm64/kernel/crash_core.c | 2 +-
arch/arm64/kernel/setup.c | 9 +++++++++
mm/percpu.c | 4 ++--
6 files changed, 41 insertions(+), 3 deletions(-)
create mode 100644 arch/arm64/include/asm/vmalloc-pxn.h
base-commit: 716f4aaa7b48a55c73d632d0657b35342b1fefd7
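As an aside on the numbers quoted in the changelog: _text lies inside the default [VMALLOC_START, VMALLOC_END) window, which is the overlap the patch removes by clamping VMALLOC_END down to a PMD boundary at _text. A stand-alone illustration of that arithmetic (plain userspace C; the addresses are the changelog's example values and PMD_SIZE assumes 4K pages, i.e. 2 MiB, so this is illustrative only):

```c
#include <stdint.h>
#include <stdio.h>

/* Round x down to a multiple of a (a must be a power of two). */
#define ALIGN_DOWN(x, a)  ((x) & ~((uint64_t)(a) - 1))

int main(void)
{
	uint64_t vmalloc_start = 0xffff800080000000ULL;
	uint64_t vmalloc_end   = 0xfffffbfff0000000ULL;
	uint64_t text          = 0xffffb6c0c1400000ULL;
	uint64_t pmd_size      = 2ULL << 20;	/* 2 MiB with 4K pages */

	/* _text sits inside the default vmalloc window... */
	printf("overlap: %d\n",
	       text >= vmalloc_start && text < vmalloc_end);

	/* ...so the patch clamps VMALLOC_END to a PMD boundary at _text. */
	printf("new VMALLOC_END: 0x%llx\n",
	       (unsigned long long)ALIGN_DOWN(text, pmd_size));
	return 0;
}
```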
Comments
On Tue, Feb 13, 2024 at 10:05:45AM -0600, Maxwell Bland wrote:
> Apologies if this is a duplicate mail, it will be the last one. Moto's SMTP
> server sucks!!

This shouldn't be in the changelog, it needs to go below the --- line.

Also, your patch is corrupted and can not be applied :(

thanks,

greg k-h
> This shouldn't be in the changelog, it needs to go below the --- line.

Oh! Thanks!!!

> Also, your patch is corrupted and can not be applied :(

Shoot! Apologies, I just noticed the hard-wrap at 80. I am talking to Moto's
IT right now, I will resend patch once I can fix the mail server config.

Maxwell
On Tue, Feb 13, 2024 at 05:16:04PM +0000, Maxwell Bland wrote:
> > This shouldn't be in the changelog, it needs to go below the --- line.
>
> Oh! Thanks!!!
>
> > Also, your patch is corrupted and can not be applied :(
>
> Shoot! Apologies, I just noticed the hard-wrap at 80. I am talking to Moto's
> IT right now, I will resend patch once I can fix the mail server config.

tabs were also eaten :(
On Tue, Feb 13, 2024 at 10:05:45AM -0600, Maxwell Bland wrote:
> Apologies if this is a duplicate mail, it will be the last one. Moto's SMTP
> server sucks!!
>
> Ensures that PXNTable can be set on all table descriptors allocated
> through vmalloc. Normally, PXNTable is set only during initial memory
> mapping and does not apply thereafter, making it possible for attackers
> to target post-init allocated writable PTEs as a staging region for
> injection of their code into the kernel. Presently it is not possible to
> efficiently prevent these attacks as VMALLOC_END overlaps with _text,
> e.g.:
>
> VMALLOC_START ffff800080000000 VMALLOC_END fffffbfff0000000
> _text ffffb6c0c1400000 _end ffffb6c0c3e40000
>
> Setting VMALLOC_END to _text in init would resolve this issue with the
> caveat of a sizeable reduction in the size of available vmalloc memory
> due to requirements on aslr randomness. However, there are circumstances
> where this trade-off is necessary: in particular, hypervisor-level
> security monitors where 1) the microarchitecture contains race
> conditions on PTE level updates or 2) a per-PTE update verifier comes at
> a significant hit to performance.

Which "hypervisor-level security monitors" are you referring to?

We don't support any of those upstream AFAIK.

How much VA space are you potentially throwing away?

How does this work with other allocations of executable memory? e.g. modules,
BPF?

I'm not keen on this as-is.

Mark.
> From: Mark Rutland <mark.rutland@arm.com>
> On Tue, Feb 13, 2024 at 10:05:45AM -0600, Maxwell Bland wrote:
> > VMALLOC_START ffff800080000000 VMALLOC_END fffffbfff0000000
> > _text ffffb6c0c1400000 _end ffffb6c0c3e40000
> >
> > Setting VMALLOC_END to _text in init would resolve this issue with the
> > caveat of a sizeable reduction in the size of available vmalloc memory due
> > to requirements on aslr randomness. However, there are circumstances where
> > this trade-off is necessary: in particular, hypervisor-level security
> > monitors where 1) the microarchitecture contains race conditions on PTE
> > level updates or 2) a per-PTE update verifier comes at a significant hit
> > to performance.
>
> Which "hypervisor-level security monitors" are you referring to?

Right now there are around 4 or 5 different attempts (from what I know: Moto,
Samsung, MediaTek, and Qualcomm) at making page tables immutable and reducing
the kernel threat surface to just dynamically allocated structs, e.g.
file_operations, in ARM, a revival of some of the ideas of:

https://wenboshen.org/publications/papers/tz-rkp-ccs14.pdf

Which are no longer possible to enforce for a number of reasons.

As related to this patch in particular: the performance hits involved in
per-PTE update verification are huge. My goal is ultimately to prevent modern
exploits like:

https://github.com/chompie1337/s8_2019_2215_poc

which modify dynamically allocated pointers, but trying to protect against
these exploits is disingenuous without first being able to enforce PXN on
non-code pages, i.e. there is a reason we do this in mm initialization, but we
need to enforce or support the enforcement of PXNTable dynamically too.

> We don't support any of those upstream AFAIK.

As is hopefully apparent from the above, though it will help downstream
systems, I do not see this patch as a support issue so much as a legitimate
security feature. There is the matter of deciding which subsystem should be
responsible. The generic vmalloc interface should provide a strong distinction
between code and data allocations, but enforcing this would become the
responsibility of each microarchitecture regardless.

> How much VA space are you potentially throwing away?

This is rough, I admit. )-: On the order of 70,000 GB, likely more in
practice: it restricts vmalloc to the region before _text. You may be
thinking, "that is ridiculous, c'mon Maxwell", and you would be right, but I
was OK with this trade-off for Moto systems, and was thinking the approach
keeps the patch changes small and simple.

I had a hard time thinking of a better way to do this while avoiding
duplication of vmalloc code into arm64 land. Potentially, though, it would be
OK to add an additional field to the generic vmalloc interface? I may need to
reach out for help here: maybe the solution to the issue will come more
readily to those with more experience.

> How does this work with other allocations of executable memory? e.g.
> modules, BPF?

It should work.

- arch/arm64/kernel/module.c uses __vmalloc_node_range with module_alloc_base
  and module_alloc_end, bypassing the generic vmalloc_node region, and these
  variables are decided based on a random offset between _text and _end.
- kernel/bpf/core.c uses bpf_jit_alloc_exec to create executable code regions,
  which is a wrapper for module_alloc. In the interpreted BPF case, we do not
  need to worry since the pages storing interpreted code are NX and can be
  marked PXNTable regardless.

> I'm not keen on this as-is.

That's OK, so long as we agree enforcing PXNTable dynamically would be a good
thing. I look forward to your thoughts on the above, and I will go back and
iterate. Working with IT to fix the email formatting now, so I will hopefully
be able to post a fetchable and runnable version of my initial patch shortly.
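For readers tracing the two allocation paths named in that reply, here is a simplified sketch of the contrast (modeled loosely on arch/arm64/kernel/module.c around the v6.8 timeframe; the helper names are ours, and the real module_alloc() additionally handles KASAN flags and a KASLR fallback window):

```c
#include <linux/vmalloc.h>
#include <linux/moduleloader.h>
#include <asm/memory.h>		/* module_alloc_base, MODULES_VSIZE */

/*
 * Executable (module/BPF JIT) allocations are carved out of a window placed
 * at a randomized offset near the kernel image, so they never need to live
 * in the general-purpose vmalloc data region.
 */
static void *code_window_alloc(unsigned long size)
{
	u64 start = module_alloc_base;
	u64 end = module_alloc_base + MODULES_VSIZE;

	return __vmalloc_node_range(size, MODULE_ALIGN, start, end,
				    GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
				    __builtin_return_address(0));
}

/*
 * Plain vmalloc(), by contrast, may return anything in
 * [VMALLOC_START, VMALLOC_END), which is exactly the range this patch
 * shrinks to end below _text.
 */
static void *data_region_alloc(unsigned long size)
{
	return vmalloc(size);
}
```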
> -----Original Message-----
> From: Maxwell Bland On Tuesday, February 13, 2024 1:15 PM
> > From: Mark Rutland <mark.rutland@arm.com>
> > How does this work with other allocations of executable memory? e.g.
> > modules, BPF?
>
> It should work.
> - kernel/bpf/core.c uses bpf_jit_alloc_exec to create executable code
>   regions, which is a wrapper for module_alloc. In the interpreted BPF case,
>   we do not need to worry since the pages storing interpreted code are NX
>   and can be marked PXNTable regardless.

Correction: I was wrong here. The __weak reference to bpf_jit_alloc_exec is
overwritten in arch/arm64/net/bpf_jit_comp.c to use a vmalloc call. This would
need to be set back to the generic BPF's use of "module_alloc". I will look
into and correct this.
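For context on the __weak arrangement being corrected here, the relevant pieces look roughly like this (paraphrased, not quoted verbatim, from kernel/bpf/core.c of that era; the arm64 override that gets reverted is visible in the bpf_jit_comp.c hunk of the resent patch below):

```c
#include <linux/filter.h>
#include <linux/moduleloader.h>

/*
 * kernel/bpf/core.c: weak defaults. An architecture that provides its own
 * bpf_jit_alloc_exec()/bpf_jit_free_exec() silently replaces these, which is
 * how arm64's vmalloc()-based override takes effect.
 */
void *__weak bpf_jit_alloc_exec(unsigned long size)
{
	return module_alloc(size);
}

void __weak bpf_jit_free_exec(void *addr)
{
	module_memfree(addr);
}
```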
Ensures that PXNTable can be set on all table descriptors allocated
through vmalloc. Normally, PXNTable is set only during initial memory
mapping and does not apply thereafter, making it possible for attackers
to target post-init allocated writable PTEs as a staging region for
injection of their code into the kernel. Presently it is not possible to
efficiently prevent these attacks as VMALLOC_END overlaps with _text,
e.g.:
VMALLOC_START ffff800080000000 VMALLOC_END fffffbfff0000000
_text ffffb6c0c1400000 _end ffffb6c0c3e40000
Setting VMALLOC_END to _text in init would resolve this issue with the
caveat of a sizeable reduction in the size of available vmalloc memory
(~70,000 GB) due to requirements on aslr randomness. However, we need to
support the enforcement of PXNTable dynamically for our static
assignment of this flag during mm initialization to be effective.
Because the address of _text is aslr-sensitive and this patch associates
this value with VMALLOC_END, we remove the use of VMALLOC_END in a print
statement in mm/percpu.c. However, only the format string is updated in
crash_core.c, since we are dead at that point regardless. VMALLOC_END is
updated in kernel/setup.c to associate the feature closely with aslr and
region allocation code.
bpf_jit_comp.c must also be remediated to ensure that the module_alloc
rather than vmalloc interface is used, so that regions used for BPF
allocations are appropriately located into the _text region.
Signed-off-by: Maxwell Bland <mbland@motorola.com>
---
This is an attempt to get Moto's SMTP server to send the patch without ruining
the formatting. Based on Mark R.'s comments, though, it sounds like:
1) I need to figure out a way to reduce the reduction in virtual memory.
2) I need to actually enforce PXNTable dynamically, to make it clear this is a
real upstream issue.
3) I need some testing and quantification to make sure this does not ruin BPF
and module allocations.
https://lore.kernel.org/all/ZcurbvkUR-BoGTxu@FVFF77S0Q05N.cambridge.arm.com/
Regardless, here's the original patch on the current Github linux main.
arch/arm64/Kconfig | 13 +++++++++++++
arch/arm64/include/asm/pgtable.h | 6 ++++++
arch/arm64/include/asm/vmalloc-pxn.h | 9 +++++++++
arch/arm64/kernel/crash_core.c | 2 +-
arch/arm64/kernel/setup.c | 9 +++++++++
arch/arm64/net/bpf_jit_comp.c | 5 +++--
mm/percpu.c | 4 ++--
7 files changed, 43 insertions(+), 5 deletions(-)
create mode 100644 arch/arm64/include/asm/vmalloc-pxn.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index aa7c1d435139..5f1e75d70e14 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2165,6 +2165,19 @@ config ARM64_DEBUG_PRIORITY_MASKING
If unsure, say N
endif # ARM64_PSEUDO_NMI
+config ARM64_VMALLOC_PXN
+ bool "Ensures table descriptors pointing to kernel data are PXNTable"
+ help
+ Reduces the range of the kernel data vmalloc region to remove any
+ overlap with kernel code, making it possible to enable the PXNTable
+ bit on table descriptors allocated after the kernel's initial memory
+ mapping.
+
+ This increases the performance of security monitors which protect
+ against malicious updates to page table entries.
+
+ If unsure, say N.
+
config RELOCATABLE
bool "Build a relocatable kernel image" if EXPERT
select ARCH_HAS_RELR
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 79ce70fbb751..49f64ea77c81 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -22,7 +22,9 @@
* and fixed mappings
*/
#define VMALLOC_START (MODULES_END)
+#ifndef CONFIG_ARM64_VMALLOC_PXN
#define VMALLOC_END (VMEMMAP_START - SZ_256M)
+#endif
#define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
@@ -35,6 +37,10 @@
#include <linux/sched.h>
#include <linux/page_table_check.h>
+#ifdef CONFIG_ARM64_VMALLOC_PXN
+#include <asm/vmalloc-pxn.h>
+#endif
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
diff --git a/arch/arm64/include/asm/vmalloc-pxn.h b/arch/arm64/include/asm/vmalloc-pxn.h
new file mode 100644
index 000000000000..d054427e2804
--- /dev/null
+++ b/arch/arm64/include/asm/vmalloc-pxn.h
@@ -0,0 +1,9 @@
+#ifndef _ASM_ARM64_VMALLOC_PXN_H
+#define _ASM_ARM64_VMALLOC_PXN_H
+
+#ifdef CONFIG_ARM64_VMALLOC_PXN
+extern u64 __vmalloc_end __ro_after_init;
+#define VMALLOC_END (__vmalloc_end)
+#endif /* CONFIG_ARM64_VMALLOC_PXN */
+
+#endif /* _ASM_ARM64_VMALLOC_PXN_H */
diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c
index 66cde752cd74..39dccae11a40 100644
--- a/arch/arm64/kernel/crash_core.c
+++ b/arch/arm64/kernel/crash_core.c
@@ -24,7 +24,7 @@ void arch_crash_save_vmcoreinfo(void)
vmcoreinfo_append_str("NUMBER(MODULES_VADDR)=0x%lx\n", MODULES_VADDR);
vmcoreinfo_append_str("NUMBER(MODULES_END)=0x%lx\n", MODULES_END);
vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START);
- vmcoreinfo_append_str("NUMBER(VMALLOC_END)=0x%lx\n", VMALLOC_END);
+ vmcoreinfo_append_str("NUMBER(VMALLOC_END)=0x%llx\n", VMALLOC_END);
vmcoreinfo_append_str("NUMBER(VMEMMAP_START)=0x%lx\n", VMEMMAP_START);
vmcoreinfo_append_str("NUMBER(VMEMMAP_END)=0x%lx\n", VMEMMAP_END);
vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 42c690bb2d60..b7ccee672743 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -54,6 +54,11 @@
#include <asm/xen/hypervisor.h>
#include <asm/mmu_context.h>
+#ifdef CONFIG_ARM64_VMALLOC_PXN
+u64 __vmalloc_end __ro_after_init = VMEMMAP_START - SZ_256M;
+EXPORT_SYMBOL(__vmalloc_end);
+#endif /* CONFIG_ARM64_VMALLOC_PXN */
+
static int num_standard_resources;
static struct resource *standard_resources;
@@ -298,6 +303,10 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
kaslr_init();
+#ifdef CONFIG_ARM64_VMALLOC_PXN
+ __vmalloc_end = ALIGN_DOWN((u64) _text, PMD_SIZE);
+#endif
+
/*
* If know now we are going to need KPTI then use non-global
* mappings from the start, avoiding the cost of rewriting
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 8955da5c47cf..1fe0d637792c 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -11,6 +11,7 @@
#include <linux/bpf.h>
#include <linux/filter.h>
#include <linux/memory.h>
+#include <linux/moduleloader.h>
#include <linux/printk.h>
#include <linux/slab.h>
@@ -1690,12 +1691,12 @@ u64 bpf_jit_alloc_exec_limit(void)
void *bpf_jit_alloc_exec(unsigned long size)
{
/* Memory is intended to be executable, reset the pointer tag. */
- return kasan_reset_tag(vmalloc(size));
+ return kasan_reset_tag(module_alloc(size));
}
void bpf_jit_free_exec(void *addr)
{
- return vfree(addr);
+ return module_memfree(addr);
}
/* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
diff --git a/mm/percpu.c b/mm/percpu.c
index 4e11fc1e6def..a902500ebfa0 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -3128,8 +3128,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
/* warn if maximum distance is further than 75% of vmalloc space */
if (max_distance > VMALLOC_TOTAL * 3 / 4) {
- pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
- max_distance, VMALLOC_TOTAL);
+ pr_warn("max_distance=0x%lx too large for vmalloc space\n",
+ max_distance);
#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
/* and fail if we have fallback */
rc = -EINVAL;
--
2.39.2
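As an aside (not part of the series): one quick way to sanity-check the adjusted boundary at runtime is a throwaway test module along these lines (hypothetical code, ours; with ARM64_VMALLOC_PXN enabled, both VMALLOC_END and the returned allocation should print below _text):

```c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <asm/sections.h>	/* _text */

static int __init vmalloc_pxn_check_init(void)
{
	/* One throwaway data allocation from the (now shrunken) vmalloc window. */
	void *p = vmalloc(PAGE_SIZE);

	pr_info("VMALLOC_START=%px VMALLOC_END=%px _text=%px vmalloc()=%px\n",
		(void *)VMALLOC_START, (void *)VMALLOC_END, _text, p);

	vfree(p);
	return 0;
}
module_init(vmalloc_pxn_check_init);

static void __exit vmalloc_pxn_check_exit(void) { }
module_exit(vmalloc_pxn_check_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Prints the vmalloc window to confirm it ends below _text");
```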
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index aa7c1d435139..5f1e75d70e14 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2165,6 +2165,19 @@ config ARM64_DEBUG_PRIORITY_MASKING
If unsure, say N
endif # ARM64_PSEUDO_NMI
+config ARM64_VMALLOC_PXN
+ bool "Ensures table descriptors pointing to kernel data are PXNTable"
+ help
+ Reduces the range of the kernel data vmalloc region to remove any
+ overlap with kernel code, making it possible to enable the PXNTable
+ bit on table descriptors allocated after the kernel's initial memory
+ mapping.
+
+ This increases the performance of security monitors which protect
+ against malicious updates to page table entries.
+
+ If unsure, say N.
+
config RELOCATABLE
bool "Build a relocatable kernel image" if EXPERT
select ARCH_HAS_RELR
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 79ce70fbb751..49f64ea77c81 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -22,7 +22,9 @@
* and fixed mappings
*/
#define VMALLOC_START (MODULES_END)
+#ifndef CONFIG_ARM64_VMALLOC_PXN
#define VMALLOC_END (VMEMMAP_START - SZ_256M)
+#endif
#define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
@@ -35,6 +37,10 @@
#include <linux/sched.h>
#include <linux/page_table_check.h>
+#ifdef CONFIG_ARM64_VMALLOC_PXN
+#include <asm/vmalloc-pxn.h>
+#endif
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
diff --git a/arch/arm64/include/asm/vmalloc-pxn.h b/arch/arm64/include/asm/vmalloc-pxn.h
new file mode 100644
index 000000000000..c8c4f878eb62
--- /dev/null
+++ b/arch/arm64/include/asm/vmalloc-pxn.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM64_VMALLOC_PXN_H
+#define _ASM_ARM64_VMALLOC_PXN_H
+
+#ifdef CONFIG_ARM64_VMALLOC_PXN
+extern u64 __vmalloc_end __ro_after_init;
+#define VMALLOC_END (__vmalloc_end)
+#endif /* CONFIG_ARM64_VMALLOC_PXN */
+
+#endif /* _ASM_ARM64_VMALLOC_PXN_H */
diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c
index 66cde752cd74..39dccae11a40 100644
--- a/arch/arm64/kernel/crash_core.c
+++ b/arch/arm64/kernel/crash_core.c
@@ -24,7 +24,7 @@ void arch_crash_save_vmcoreinfo(void)
vmcoreinfo_append_str("NUMBER(MODULES_VADDR)=0x%lx\n", MODULES_VADDR);
vmcoreinfo_append_str("NUMBER(MODULES_END)=0x%lx\n", MODULES_END);
vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START);
- vmcoreinfo_append_str("NUMBER(VMALLOC_END)=0x%lx\n", VMALLOC_END);
+ vmcoreinfo_append_str("NUMBER(VMALLOC_END)=0x%llx\n", VMALLOC_END);
vmcoreinfo_append_str("NUMBER(VMEMMAP_START)=0x%lx\n", VMEMMAP_START);
vmcoreinfo_append_str("NUMBER(VMEMMAP_END)=0x%lx\n", VMEMMAP_END);
vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 42c690bb2d60..b7ccee672743 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -54,6 +54,11 @@
#include <asm/xen/hypervisor.h>
#include <asm/mmu_context.h>
+#ifdef CONFIG_ARM64_VMALLOC_PXN
+u64 __vmalloc_end __ro_after_init = VMEMMAP_START - SZ_256M;
+EXPORT_SYMBOL(__vmalloc_end);
+#endif /* CONFIG_ARM64_VMALLOC_PXN */
+
static int num_standard_resources;
static struct resource *standard_resources;
@@ -298,6 +303,10 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
kaslr_init();
+#ifdef CONFIG_ARM64_VMALLOC_PXN
+ __vmalloc_end = ALIGN_DOWN((u64) _text, PMD_SIZE);
+#endif
+
/*
* If know now we are going to need KPTI then use non-global
* mappings from the start, avoiding the cost of rewriting
diff --git a/mm/percpu.c b/mm/percpu.c
index 4e11fc1e6def..a902500ebfa0 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -3128,8 +3128,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
/* warn if maximum distance is further than 75% of vmalloc space */
if (max_distance > VMALLOC_TOTAL * 3 / 4) {
- pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
- max_distance, VMALLOC_TOTAL);
+ pr_warn("max_distance=0x%lx too large for vmalloc space\n",
+ max_distance);
#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
/* and fail if we have fallback */