[v3,3/3] list_debug: Introduce CONFIG_DEBUG_LIST_MINIMAL

Message ID 20230808102049.465864-3-elver@google.com
State New
Headers
Series [v3,1/3] compiler_types: Introduce the Clang __preserve_most function attribute |

Commit Message

Marco Elver Aug. 8, 2023, 10:17 a.m. UTC
  Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The feature has never been designed with performance in
mind, yet common list manipulation is happening across hot paths all
over the kernel.

Introduce CONFIG_DEBUG_LIST_MINIMAL, which performs list pointer
checking inline, and only upon list corruption delegates to the
reporting slow path.

To generate optimal machine code with CONFIG_DEBUG_LIST_MINIMAL:

  1. Elide checking for pointer values which upon dereference would
     result in an immediate access fault -- therefore "minimal" checks.
     The trade-off is lower-quality error reports.

  2. Use the newly introduced __preserve_most function attribute
     (available with Clang, but not yet with GCC) to minimize the code
     footprint for calling the reporting slow path. As a result,
     function size of callers is reduced by avoiding saving registers
     before calling the rarely called reporting slow path.

     Note that all TUs in lib/Makefile already disable function tracing,
     including list_debug.c, and __preserve_most's implied notrace has
     no effect in this case.

  3. Because the inline checks are a subset of the full set of checks in
     ___list_*_valid(), always return false if the inline checks failed.
     This avoids redundant compare and conditional branch right after
     return from the slow path.

As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.

Running netperf with CONFIG_DEBUG_LIST_MINIMAL (using a Clang compiler
with "preserve_most") shows throughput improvements, in my case of ~7%
on average (up to 20-30% on some test cases).

Link: https://r.android.com/1266735 [1]
Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
Signed-off-by: Marco Elver <elver@google.com>
---
v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* More comments.

v2:
* Note that lib/Makefile disables function tracing for everything and
  __preserve_most's implied notrace is a noop here.
---
 arch/arm64/kvm/hyp/nvhe/list_debug.c |  2 +
 include/linux/list.h                 | 64 +++++++++++++++++++++++++---
 lib/Kconfig.debug                    | 15 +++++++
 lib/list_debug.c                     |  2 +
 4 files changed, 77 insertions(+), 6 deletions(-)
  

Comments

Kees Cook Aug. 8, 2023, 9:27 p.m. UTC | #1
On Tue, Aug 08, 2023 at 12:17:27PM +0200, Marco Elver wrote:
> Numerous production kernel configs (see [1, 2]) are choosing to enable
> CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
> configs [3]. The feature has never been designed with performance in
> mind, yet common list manipulation is happening across hot paths all
> over the kernel.
> 
> Introduce CONFIG_DEBUG_LIST_MINIMAL, which performs list pointer
> checking inline, and only upon list corruption delegates to the
> reporting slow path.

I'd really like to get away from calling this "DEBUG", since it's used
more for hardening (CONFIG_LIST_HARDENED?). Will Deacon spent some time
making this better a while back, but the series never landed. Do you
have a bit of time to look through it?

https://github.com/KSPP/linux/issues/10
https://lore.kernel.org/lkml/20200324153643.15527-1-will@kernel.org/

-Kees
  
Marco Elver Aug. 9, 2023, 7:35 a.m. UTC | #2
On Tue, 8 Aug 2023 at 23:27, Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Aug 08, 2023 at 12:17:27PM +0200, Marco Elver wrote:
> > Numerous production kernel configs (see [1, 2]) are choosing to enable
> > CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
> > configs [3]. The feature has never been designed with performance in
> > mind, yet common list manipulation is happening across hot paths all
> > over the kernel.
> >
> > Introduce CONFIG_DEBUG_LIST_MINIMAL, which performs list pointer
> > checking inline, and only upon list corruption delegates to the
> > reporting slow path.
>
> I'd really like to get away from calling this "DEBUG", since it's used
> more for hardening (CONFIG_LIST_HARDENED?). Will Deacon spent some time
> making this better a while back, but the series never landed. Do you
> have a bit of time to look through it?
>
> https://github.com/KSPP/linux/issues/10
> https://lore.kernel.org/lkml/20200324153643.15527-1-will@kernel.org/

I'm fine renaming this one. But there are other issues that Will's
series solves, which I don't want this series to depend on. We can try
to sort them out separately.

The main problem here is that DEBUG_LIST has been designed to be
friendly for debugging (incl. checking poison values and NULL). Some
kernel devs may still want that, but for production use is pointless
and wasteful.

So what I can propose is to introduce CONFIG_LIST_HARDENED that
doesn't depend on CONFIG_DEBUG_LIST, but instead selects it, because
we still use that code to produce a report.

If there are other list types that have similar debug checks, but
where we can optimize performance by eliding some and moving them
inline, we can do the same (CONFIG_*_HARDENED). If the checks are
already optimized, we could just rename it to CONFIG_*_HARDENED and
remove the DEBUG-name.

I'm not a big fan of the 2 modes either, but that's what we get if we
want to support the debugging and hardening usecases. Inlining the
full set of checks does not work for performance and size.
  
Marco Elver Aug. 9, 2023, 9:57 a.m. UTC | #3
On Wed, Aug 09, 2023 at 09:35AM +0200, Marco Elver wrote:

> > I'd really like to get away from calling this "DEBUG", since it's used
> > more for hardening (CONFIG_LIST_HARDENED?). Will Deacon spent some time
> > making this better a while back, but the series never landed. Do you
> > have a bit of time to look through it?
> >
> > https://github.com/KSPP/linux/issues/10
> > https://lore.kernel.org/lkml/20200324153643.15527-1-will@kernel.org/
> 
> I'm fine renaming this one. But there are other issues that Will's
> series solves, which I don't want this series to depend on. We can try
> to sort them out separately.
> 
> The main problem here is that DEBUG_LIST has been designed to be
> friendly for debugging (incl. checking poison values and NULL). Some
> kernel devs may still want that, but for production use is pointless
> and wasteful.
> 
> So what I can propose is to introduce CONFIG_LIST_HARDENED that
> doesn't depend on CONFIG_DEBUG_LIST, but instead selects it, because
> we still use that code to produce a report.

How about the below?

We'll add CONFIG_HARDEN_LIST (in Kconfig.hardening), which is
independent of CONFIG_DEBUG_LIST. For the implementation it selects
DEBUG_LIST, but irrelevant for users.

This will get us the best of both worlds: a version for hardening that
should remain as fast as possible, and one for debugging with better
reports.

------ >8 ------

From: Marco Elver <elver@google.com>
Date: Thu, 27 Jul 2023 22:19:02 +0200
Subject: [PATCH v4 3/3] list: Introduce CONFIG_HARDEN_LIST

Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The motivation behind this is that the option can be used
as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025
are mitigated by the option [4]).

The feature has never been designed with performance in mind, yet common
list manipulation is happening across hot paths all over the kernel.

Introduce CONFIG_HARDEN_LIST, which performs list pointer checking
inline, and only upon list corruption calls the reporting slow path.

To generate optimal machine code with CONFIG_HARDEN_LIST:

  1. Elide checking for pointer values which upon dereference would
     result in an immediate access fault -- therefore "minimal" checks.
     The trade-off is lower-quality error reports.

  2. Use the newly introduced __preserve_most function attribute
     (available with Clang, but not yet with GCC) to minimize the code
     footprint for calling the reporting slow path. As a result,
     function size of callers is reduced by avoiding saving registers
     before calling the rarely called reporting slow path.

     Note that all TUs in lib/Makefile already disable function tracing,
     including list_debug.c, and __preserve_most's implied notrace has
     no effect in this case.

  3. Because the inline checks are a subset of the full set of checks in
     __list_*_valid_or_report(), always return false if the inline
     checks failed.  This avoids redundant compare and conditional
     branch right after return from the slow path.

As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.

Running netperf with CONFIG_HARDEN_LIST (using a Clang compiler with
"preserve_most") shows throughput improvements, in my case of ~7% on
average (up to 20-30% on some test cases).

Link: https://r.android.com/1266735 [1]
Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4]
Signed-off-by: Marco Elver <elver@google.com>
---
v4:
* Rename to CONFIG_HARDEN_LIST, which can independently be selected from
  CONFIG_DEBUG_LIST.

v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* More comments.

v2:
* Note that lib/Makefile disables function tracing for everything and
  __preserve_most's implied notrace is a noop here.
---
 arch/arm64/kvm/hyp/nvhe/list_debug.c |  2 +
 include/linux/list.h                 | 64 +++++++++++++++++++++++++---
 lib/Kconfig.debug                    | 12 ++++--
 lib/list_debug.c                     |  2 +
 security/Kconfig.hardening           | 14 ++++++
 5 files changed, 84 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index 16266a939a4c..46a2d4f2b3c6 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)
 
 /* The predicates checked here are taken from lib/list_debug.c. */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -37,6 +38,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 	return true;
 }
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
diff --git a/include/linux/list.h b/include/linux/list.h
index 130c6a1bb45c..1c7f70b7cc7a 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -39,38 +39,90 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
 }
 
 #ifdef CONFIG_DEBUG_LIST
+
+#ifdef CONFIG_HARDEN_LIST
+# define __list_valid_slowpath __cold __preserve_most
+#else
+# define __list_valid_slowpath
+#endif
+
 /*
  * Performs the full set of list corruption checks before __list_add().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_add_valid_or_report(struct list_head *new,
-				       struct list_head *prev,
-				       struct list_head *next);
+extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new,
+							     struct list_head *prev,
+							     struct list_head *next);
 
 /*
  * Performs list corruption checks before __list_add(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_HARDEN_LIST set, performs minimal list integrity checking (that
+ * do not result in a fault) inline, and only if a corruption is detected calls
+ * the reporting function __list_add_valid_or_report().
  */
 static __always_inline bool __list_add_valid(struct list_head *new,
 					     struct list_head *prev,
 					     struct list_head *next)
 {
-	return __list_add_valid_or_report(new, prev, next);
+	bool ret = true;
+
+	if (IS_ENABLED(CONFIG_HARDEN_LIST)) {
+		/*
+		 * With the hardening version, elide checking if next and prev
+		 * are NULL, since the immediate dereference of them below would
+		 * result in a fault if NULL.
+		 *
+		 * With the reduced set of checks, we can afford to inline the
+		 * checks, which also gives the compiler a chance to elide some
+		 * of them completely if they can be proven at compile-time. If
+		 * one of the pre-conditions does not hold, the slow-path will
+		 * show a report which pre-condition failed.
+		 */
+		if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_add_valid_or_report(new, prev, next);
+	return ret;
 }
 
 /*
  * Performs the full set of list corruption checks before __list_del_entry().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry);
 
 /*
  * Performs list corruption checks before __list_del_entry(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_HARDEN_LIST set, performs minimal list integrity checking (that
+ * do not result in a fault) inline, and only if a corruption is detected calls
+ * the reporting function __list_del_entry_valid_or_report().
  */
 static __always_inline bool __list_del_entry_valid(struct list_head *entry)
 {
-	return __list_del_entry_valid_or_report(entry);
+	bool ret = true;
+
+	if (IS_ENABLED(CONFIG_HARDEN_LIST)) {
+		struct list_head *prev = entry->prev;
+		struct list_head *next = entry->next;
+
+		/*
+		 * With the hardening version, elide checking if next and prev
+		 * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+		 * dereference of them below would result in a fault.
+		 */
+		if (likely(prev->next == entry && next->prev == entry))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_del_entry_valid_or_report(entry);
+	return ret;
 }
 #else
 static inline bool __list_add_valid(struct list_head *new,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fbc89baf7de6..6b0de78fb2da 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1672,11 +1672,15 @@ config HAVE_DEBUG_BUGVERBOSE
 menu "Debug kernel data structures"
 
 config DEBUG_LIST
-	bool "Debug linked list manipulation"
-	depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION
+	bool "Debug linked list manipulation" if !HARDEN_LIST
+	depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION || HARDEN_LIST
 	help
-	  Enable this to turn on extended checks in the linked-list
-	  walking routines.
+	  Enable this to turn on extended checks in the linked-list walking
+	  routines.
+
+	  If you care about performance, you should enable CONFIG_HARDEN_LIST
+	  instead.  This option alone trades better quality error reports for
+	  worse performance, and is more suitable for debugging.
 
 	  If unsure, say N.
 
diff --git a/lib/list_debug.c b/lib/list_debug.c
index 2def33b1491f..0ff547910dd0 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -17,6 +17,7 @@
  * attempt).
  */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -39,6 +40,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 }
 EXPORT_SYMBOL(__list_add_valid_or_report);
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 0f295961e773..a8aef895f13d 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -279,6 +279,20 @@ config ZERO_CALL_USED_REGS
 
 endmenu
 
+menu "Hardening of kernel data structures"
+
+config HARDEN_LIST
+	bool "Check integrity of linked list manipulation"
+	select DEBUG_LIST
+	help
+	  Minimal integrity checking in the linked-list manipulation routines
+	  to catch memory corruptions that are not guaranteed to result in an
+	  immediate access fault.
+
+	  If unsure, say N.
+
+endmenu
+
 config CC_HAS_RANDSTRUCT
 	def_bool $(cc-option,-frandomize-layout-seed-file=/dev/null)
 	# Randstruct was first added in Clang 15, but it isn't safe to use until
  
Steven Rostedt Aug. 9, 2023, 3:30 p.m. UTC | #4
On Wed, 9 Aug 2023 11:57:19 +0200
Marco Elver <elver@google.com> wrote:

>  static __always_inline bool __list_add_valid(struct list_head *new,
>  					     struct list_head *prev,
>  					     struct list_head *next)
>  {
> -	return __list_add_valid_or_report(new, prev, next);
> +	bool ret = true;
> +
> +	if (IS_ENABLED(CONFIG_HARDEN_LIST)) {
> +		/*
> +		 * With the hardening version, elide checking if next and prev
> +		 * are NULL, since the immediate dereference of them below would
> +		 * result in a fault if NULL.
> +		 *
> +		 * With the reduced set of checks, we can afford to inline the
> +		 * checks, which also gives the compiler a chance to elide some
> +		 * of them completely if they can be proven at compile-time. If
> +		 * one of the pre-conditions does not hold, the slow-path will
> +		 * show a report which pre-condition failed.
> +		 */
> +		if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
> +			return true;
> +		ret = false;
> +	}
> +
> +	ret &= __list_add_valid_or_report(new, prev, next);
> +	return ret;
>  }

I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other
way around. It logically doesn't make sense that HARDEN_LIST would select
DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not
DEBUG_LIST (because who knows, it may add other features I don't want). But
then, I may have stumbled over something and want more info, and enable
DEBUG_LIST (while still having HARDEN_LIST) enabled.

I think you are looking at this from an implementation perspective and not
the normal developer one.

This would mean the above function should get enabled by CONFIG_HARDEN_LIST
(and CONFIG_DEBUG would select CONFIG_HARDEN) and would look more like:

static __always_inline bool __list_add_valid(struct list_head *new,
					     struct list_head *prev,
					     struct list_head *next)
{
	bool ret = true;

	if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
		/*
		 * With the hardening version, elide checking if next and prev
		 * are NULL, since the immediate dereference of them below would
		 * result in a fault if NULL.
		 *
		 * With the reduced set of checks, we can afford to inline the
		 * checks, which also gives the compiler a chance to elide some
		 * of them completely if they can be proven at compile-time. If
		 * one of the pre-conditions does not hold, the slow-path will
		 * show a report which pre-condition failed.
		 */
		if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
			return true;
		ret = false;
	}

	ret &= __list_add_valid_or_report(new, prev, next);
	return ret;
}

That is, if DEBUG_LIST is enabled, we always call the
__list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we
do the shortcut.

-- Steve
  
Marco Elver Aug. 9, 2023, 4:32 p.m. UTC | #5
On Wed, Aug 09, 2023 at 11:30AM -0400, Steven Rostedt wrote:
[...]
> 
> I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other
> way around. It logically doesn't make sense that HARDEN_LIST would select
> DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not
> DEBUG_LIST (because who knows, it may add other features I don't want). But
> then, I may have stumbled over something and want more info, and enable
> DEBUG_LIST (while still having HARDEN_LIST) enabled.
> 
> I think you are looking at this from an implementation perspective and not
> the normal developer one.
> 
[...]
> 
> That is, if DEBUG_LIST is enabled, we always call the
> __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we
> do the shortcut.

Good point - I think this is better. See below tentative v4.

Kees: Does that also look more like what you had in mind?

Thanks,
-- Marco

------ >8 ------

From: Marco Elver <elver@google.com>
Date: Thu, 27 Jul 2023 22:19:02 +0200
Subject: [PATCH] list: Introduce CONFIG_HARDEN_LIST

Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The motivation behind this is that the option can be used
as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025
are mitigated by the option [4]).

The feature has never been designed with performance in mind, yet common
list manipulation is happening across hot paths all over the kernel.

Introduce CONFIG_HARDEN_LIST, which performs list pointer checking
inline, and only upon list corruption calls the reporting slow path.

Since DEBUG_LIST is functionally a superset of HARDEN_LIST, the Kconfig
variables are designed to reflect that: DEBUG_LIST selects HARDEN_LIST,
whereas HARDEN_LIST itself has no dependency on DEBUG_LIST.

To generate optimal machine code with CONFIG_HARDEN_LIST:

  1. Elide checking for pointer values which upon dereference would
     result in an immediate access fault -- therefore "minimal" checks.
     The trade-off is lower-quality error reports.

  2. Use the newly introduced __preserve_most function attribute
     (available with Clang, but not yet with GCC) to minimize the code
     footprint for calling the reporting slow path. As a result,
     function size of callers is reduced by avoiding saving registers
     before calling the rarely called reporting slow path.

     Note that all TUs in lib/Makefile already disable function tracing,
     including list_debug.c, and __preserve_most's implied notrace has
     no effect in this case.

  3. Because the inline checks are a subset of the full set of checks in
     __list_*_valid_or_report(), always return false if the inline
     checks failed.  This avoids redundant compare and conditional
     branch right after return from the slow path.

As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.

Running netperf with CONFIG_HARDEN_LIST (using a Clang compiler with
"preserve_most") shows throughput improvements, in my case of ~7% on
average (up to 20-30% on some test cases).

Link: https://r.android.com/1266735 [1]
Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4]
Signed-off-by: Marco Elver <elver@google.com>
---
v4:
* Rename to CONFIG_HARDEN_LIST, which can independently be selected from
  CONFIG_DEBUG_LIST.

v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* More comments.

v2:
* Note that lib/Makefile disables function tracing for everything and
  __preserve_most's implied notrace is a noop here.
---
 arch/arm64/kvm/hyp/nvhe/Makefile     |  2 +-
 arch/arm64/kvm/hyp/nvhe/list_debug.c |  2 +
 include/linux/list.h                 | 64 +++++++++++++++++++++++++---
 lib/Kconfig.debug                    |  9 +++-
 lib/Makefile                         |  2 +-
 lib/list_debug.c                     |  5 ++-
 security/Kconfig.hardening           | 13 ++++++
 7 files changed, 86 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 9ddc025e4b86..c89c85a41ac4 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -25,7 +25,7 @@ hyp-obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o
 	 cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o stacktrace.o ffa.o
 hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
-hyp-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
+hyp-obj-$(CONFIG_HARDEN_LIST) += list_debug.o
 hyp-obj-y += $(lib-objs)
 
 ##
diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index 16266a939a4c..46a2d4f2b3c6 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)
 
 /* The predicates checked here are taken from lib/list_debug.c. */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -37,6 +38,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 	return true;
 }
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
diff --git a/include/linux/list.h b/include/linux/list.h
index 130c6a1bb45c..ef899c27c68b 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -38,39 +38,91 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
 	WRITE_ONCE(list->prev, list);
 }
 
+#ifdef CONFIG_HARDEN_LIST
+
 #ifdef CONFIG_DEBUG_LIST
+# define __list_valid_slowpath
+#else
+# define __list_valid_slowpath __cold __preserve_most
+#endif
+
 /*
  * Performs the full set of list corruption checks before __list_add().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_add_valid_or_report(struct list_head *new,
-				       struct list_head *prev,
-				       struct list_head *next);
+extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new,
+							     struct list_head *prev,
+							     struct list_head *next);
 
 /*
  * Performs list corruption checks before __list_add(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_HARDEN_LIST only, performs minimal list integrity checking (that
+ * do not result in a fault) inline, and only if a corruption is detected calls
+ * the reporting function __list_add_valid_or_report().
  */
 static __always_inline bool __list_add_valid(struct list_head *new,
 					     struct list_head *prev,
 					     struct list_head *next)
 {
-	return __list_add_valid_or_report(new, prev, next);
+	bool ret = true;
+
+	if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
+		/*
+		 * With the hardening version, elide checking if next and prev
+		 * are NULL, since the immediate dereference of them below would
+		 * result in a fault if NULL.
+		 *
+		 * With the reduced set of checks, we can afford to inline the
+		 * checks, which also gives the compiler a chance to elide some
+		 * of them completely if they can be proven at compile-time. If
+		 * one of the pre-conditions does not hold, the slow-path will
+		 * show a report which pre-condition failed.
+		 */
+		if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_add_valid_or_report(new, prev, next);
+	return ret;
 }
 
 /*
  * Performs the full set of list corruption checks before __list_del_entry().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry);
 
 /*
  * Performs list corruption checks before __list_del_entry(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_HARDEN_LIST only, performs minimal list integrity checking (that
+ * do not result in a fault) inline, and only if a corruption is detected calls
+ * the reporting function __list_del_entry_valid_or_report().
  */
 static __always_inline bool __list_del_entry_valid(struct list_head *entry)
 {
-	return __list_del_entry_valid_or_report(entry);
+	bool ret = true;
+
+	if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
+		struct list_head *prev = entry->prev;
+		struct list_head *next = entry->next;
+
+		/*
+		 * With the hardening version, elide checking if next and prev
+		 * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+		 * dereference of them below would result in a fault.
+		 */
+		if (likely(prev->next == entry && next->prev == entry))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_del_entry_valid_or_report(entry);
+	return ret;
 }
 #else
 static inline bool __list_add_valid(struct list_head *new,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fbc89baf7de6..9e38956d6f50 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1674,9 +1674,14 @@ menu "Debug kernel data structures"
 config DEBUG_LIST
 	bool "Debug linked list manipulation"
 	depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION
+	select HARDEN_LIST
 	help
-	  Enable this to turn on extended checks in the linked-list
-	  walking routines.
+	  Enable this to turn on extended checks in the linked-list walking
+	  routines.
+
+	  This option trades better quality error reports for performance, and
+	  is more suitable for kernel debugging. If you care about performance,
+	  you should only enable CONFIG_HARDEN_LIST instead.
 
 	  If unsure, say N.
 
diff --git a/lib/Makefile b/lib/Makefile
index 42d307ade225..a7ebc9090f04 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -161,7 +161,7 @@ obj-$(CONFIG_BTREE) += btree.o
 obj-$(CONFIG_INTERVAL_TREE) += interval_tree.o
 obj-$(CONFIG_ASSOCIATIVE_ARRAY) += assoc_array.o
 obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
+obj-$(CONFIG_HARDEN_LIST) += list_debug.o
 obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o
 
 obj-$(CONFIG_BITREVERSE) += bitrev.o
diff --git a/lib/list_debug.c b/lib/list_debug.c
index 2def33b1491f..38ddc7c01eab 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -2,7 +2,8 @@
  * Copyright 2006, Red Hat, Inc., Dave Jones
  * Released under the General Public License (GPL).
  *
- * This file contains the linked list validation for DEBUG_LIST.
+ * This file contains the linked list validation and error reporting for
+ * HARDEN_LIST and DEBUG_LIST.
  */
 
 #include <linux/export.h>
@@ -17,6 +18,7 @@
  * attempt).
  */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -39,6 +41,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 }
 EXPORT_SYMBOL(__list_add_valid_or_report);
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 0f295961e773..19aa2b7d2f64 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -279,6 +279,19 @@ config ZERO_CALL_USED_REGS
 
 endmenu
 
+menu "Hardening of kernel data structures"
+
+config HARDEN_LIST
+	bool "Check integrity of linked list manipulation"
+	help
+	  Minimal integrity checking in the linked-list manipulation routines
+	  to catch memory corruptions that are not guaranteed to result in an
+	  immediate access fault.
+
+	  If unsure, say N.
+
+endmenu
+
 config CC_HAS_RANDSTRUCT
 	def_bool $(cc-option,-frandomize-layout-seed-file=/dev/null)
 	# Randstruct was first added in Clang 15, but it isn't safe to use until
  
Kees Cook Aug. 10, 2023, 8:11 p.m. UTC | #6
On Wed, Aug 09, 2023 at 06:32:37PM +0200, Marco Elver wrote:
> On Wed, Aug 09, 2023 at 11:30AM -0400, Steven Rostedt wrote:
> [...]
> > 
> > I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other
> > way around. It logically doesn't make sense that HARDEN_LIST would select
> > DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not
> > DEBUG_LIST (because who knows, it may add other features I don't want). But
> > then, I may have stumbled over something and want more info, and enable
> > DEBUG_LIST (while still having HARDEN_LIST) enabled.
> > 
> > I think you are looking at this from an implementation perspective and not
> > the normal developer one.
> > 
> [...]
> > 
> > That is, if DEBUG_LIST is enabled, we always call the
> > __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we
> > do the shortcut.
> 
> Good point - I think this is better. See below tentative v4.
> 
> Kees: Does that also look more like what you had in mind?

Yeah, this looks good. My only nit would be a naming one. All the
other hardening features are named "HARDENED", but perhaps the "ED"
is redundant in the others. Still, consistency seems nicer. What do you
think of CONFIG_LIST_HARDENED ? (The modern trend for Kconfig naming tends
to keep the subsystem name first and then apply optional elements after.)

One note: do the LKDTM list hardening tests still pass? i.e.
CORRUPT_LIST_ADD
CORRUPT_LIST_DEL

> [...]
> +		/*
> +		 * With the hardening version, elide checking if next and prev
> +		 * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
> +		 * dereference of them below would result in a fault.
> +		 */
> +		if (likely(prev->next == entry && next->prev == entry))
> +			return true;

I'm not super excited about skipping those checks, since they are
values that can be reached through kernel list management confusion. If
an attacker is using a system where the zero-page has been mapped
and is accessible (i.e. lacking SMAP etc), then attacks could still
be constructed. However, I do recognize this chain of exploitation
prerequisites is getting rather long, so probably this is a reasonable
trade off on modern systems.

-Kees
  
Marco Elver Aug. 11, 2023, 9:10 a.m. UTC | #7
On Thu, 10 Aug 2023 at 22:12, Kees Cook <keescook@chromium.org> wrote:
>
> On Wed, Aug 09, 2023 at 06:32:37PM +0200, Marco Elver wrote:
> > On Wed, Aug 09, 2023 at 11:30AM -0400, Steven Rostedt wrote:
> > [...]
> > >
> > > I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other
> > > way around. It logically doesn't make sense that HARDEN_LIST would select
> > > DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not
> > > DEBUG_LIST (because who knows, it may add other features I don't want). But
> > > then, I may have stumbled over something and want more info, and enable
> > > DEBUG_LIST (while still having HARDEN_LIST) enabled.
> > >
> > > I think you are looking at this from an implementation perspective and not
> > > the normal developer one.
> > >
> > [...]
> > >
> > > That is, if DEBUG_LIST is enabled, we always call the
> > > __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we
> > > do the shortcut.
> >
> > Good point - I think this is better. See below tentative v4.
> >
> > Kees: Does that also look more like what you had in mind?
>
> Yeah, this looks good. My only nit would be a naming one. All the
> other hardening features are named "HARDENED", but perhaps the "ED"
> is redundant in the others. Still, consistency seems nicer. What do you
> think of CONFIG_LIST_HARDENED ? (The modern trend for Kconfig naming tends
> to keep the subsystem name first and then apply optional elements after.)

Naming is a bit all over. :-/
I agree with the <subsystem>_<suboption> scheme, generally. I think
initially I tried to keep the name shorter, and also find a good
counter-part to DEBUG_<suboption>, therefore HARDEN_LIST.

Let's just change it to CONFIG_LIST_HARDENED, given the existing
"HARDENED" options.

I don't have a strong preference.

> One note: do the LKDTM list hardening tests still pass? i.e.
> CORRUPT_LIST_ADD
> CORRUPT_LIST_DEL

Yes, they do. Though I need to also adjust BUG_ON_DATA_CORRUPTION to
select LIST_HARDENED, and the test should check for the new option
(which is implied by DEBUG_LIST now). There will be an additional
patch to adjust that.

> > [...]
> > +             /*
> > +              * With the hardening version, elide checking if next and prev
> > +              * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
> > +              * dereference of them below would result in a fault.
> > +              */
> > +             if (likely(prev->next == entry && next->prev == entry))
> > +                     return true;
>
> I'm not super excited about skipping those checks, since they are
> values that can be reached through kernel list management confusion. If
> an attacker is using a system where the zero-page has been mapped
> and is accessible (i.e. lacking SMAP etc), then attacks could still
> be constructed. However, I do recognize this chain of exploitation
> prerequisites is getting rather long, so probably this is a reasonable
> trade off on modern systems.

Sure, it's a trade-off for systems which do have the bare minimum of
modern hardware security features.
  
Steven Rostedt Aug. 11, 2023, 7:33 p.m. UTC | #8
On Thu, 10 Aug 2023 13:11:58 -0700
Kees Cook <keescook@chromium.org> wrote:

> > [...]
> > +		/*
> > +		 * With the hardening version, elide checking if next and prev
> > +		 * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
> > +		 * dereference of them below would result in a fault.
> > +		 */
> > +		if (likely(prev->next == entry && next->prev == entry))
> > +			return true;  
> 
> I'm not super excited about skipping those checks, since they are
> values that can be reached through kernel list management confusion. If
> an attacker is using a system where the zero-page has been mapped
> and is accessible (i.e. lacking SMAP etc), then attacks could still
> be constructed. However, I do recognize this chain of exploitation
> prerequisites is getting rather long, so probably this is a reasonable
> trade off on modern systems.

A totally hardened machine is one that doesn't run ;-)

Yes, hopefully that when the kernel is configured with HARDENED it will
eliminate steps to a prerequisite attack. I'm sure enabling lockdep would
also help harden the system too. But there is a balance between security
and performance. The more that adding security harms performance, the less
people will use that security.

-- Steve
  

Patch

diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index 16266a939a4c..46a2d4f2b3c6 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,6 +26,7 @@  static inline __must_check bool nvhe_check_data_corruption(bool v)
 
 /* The predicates checked here are taken from lib/list_debug.c. */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -37,6 +38,7 @@  bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 	return true;
 }
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
diff --git a/include/linux/list.h b/include/linux/list.h
index 130c6a1bb45c..066fe33e99bf 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -39,38 +39,90 @@  static inline void INIT_LIST_HEAD(struct list_head *list)
 }
 
 #ifdef CONFIG_DEBUG_LIST
+
+#ifdef CONFIG_DEBUG_LIST_MINIMAL
+# define __list_valid_slowpath __cold __preserve_most
+#else
+# define __list_valid_slowpath
+#endif
+
 /*
  * Performs the full set of list corruption checks before __list_add().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_add_valid_or_report(struct list_head *new,
-				       struct list_head *prev,
-				       struct list_head *next);
+extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new,
+							     struct list_head *prev,
+							     struct list_head *next);
 
 /*
  * Performs list corruption checks before __list_add(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_DEBUG_LIST_MINIMAL set, performs minimal list integrity checking
+ * (that do not result in a fault) inline, and only if a corruption is detected
+ * calls the reporting function __list_add_valid_or_report().
  */
 static __always_inline bool __list_add_valid(struct list_head *new,
 					     struct list_head *prev,
 					     struct list_head *next)
 {
-	return __list_add_valid_or_report(new, prev, next);
+	bool ret = true;
+
+	if (IS_ENABLED(CONFIG_DEBUG_LIST_MINIMAL)) {
+		/*
+		 * In the minimal config, elide checking if next and prev are
+		 * NULL, since the immediate dereference of them below would
+		 * result in a fault if NULL.
+		 *
+		 * With the minimal config we can afford to inline the checks,
+		 * which also gives the compiler a chance to elide some of them
+		 * completely if they can be proven at compile-time. If one of
+		 * the pre-conditions does not hold, the slow-path will show a
+		 * report which pre-condition failed.
+		 */
+		if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_add_valid_or_report(new, prev, next);
+	return ret;
 }
 
 /*
  * Performs the full set of list corruption checks before __list_del_entry().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry);
 
 /*
  * Performs list corruption checks before __list_del_entry(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_DEBUG_LIST_MINIMAL set, performs minimal list integrity checking
+ * (that do not result in a fault) inline, and only if a corruption is detected
+ * calls the reporting function __list_del_entry_valid_or_report().
  */
 static __always_inline bool __list_del_entry_valid(struct list_head *entry)
 {
-	return __list_del_entry_valid_or_report(entry);
+	bool ret = true;
+
+	if (IS_ENABLED(CONFIG_DEBUG_LIST_MINIMAL)) {
+		struct list_head *prev = entry->prev;
+		struct list_head *next = entry->next;
+
+		/*
+		 * In the minimal config, elide checking if next and prev are
+		 * NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+		 * dereference of them below would result in a fault.
+		 */
+		if (likely(prev->next == entry && next->prev == entry))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_del_entry_valid_or_report(entry);
+	return ret;
 }
 #else
 static inline bool __list_add_valid(struct list_head *new,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fbc89baf7de6..e72cf08af0fa 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1680,6 +1680,21 @@  config DEBUG_LIST
 
 	  If unsure, say N.
 
+config DEBUG_LIST_MINIMAL
+	bool "Minimal linked list debug checks"
+	default !DEBUG_KERNEL
+	depends on DEBUG_LIST
+	help
+	  Only perform the minimal set of checks in the linked-list walking
+	  routines to catch corruptions that are not guaranteed to result in an
+	  immediate access fault.
+
+	  This trades lower quality error reports for improved performance: the
+	  generated code should be more optimal and provide trade-offs that may
+	  better serve safety- and performance- critical environments.
+
+	  If unsure, say Y.
+
 config DEBUG_PLIST
 	bool "Debug priority linked list manipulation"
 	depends on DEBUG_KERNEL
diff --git a/lib/list_debug.c b/lib/list_debug.c
index 2def33b1491f..0ff547910dd0 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -17,6 +17,7 @@ 
  * attempt).
  */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -39,6 +40,7 @@  bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 }
 EXPORT_SYMBOL(__list_add_valid_or_report);
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;