mm, debug: allow suppressing panic on CONFIG_DEBUG_VM checks

Message ID c9abf109-80f2-88f5-4aae-d6fd4a30bcd3@google.com
State New
Headers
Series mm, debug: allow suppressing panic on CONFIG_DEBUG_VM checks |

Commit Message

David Rientjes May 21, 2023, 11:07 p.m. UTC
  CONFIG_DEBUG_VM is used to enable additional MM debug checks at runtime.
This can be used to catch latent kernel bugs.

Because this is mainly used for debugging, it is seldom enabled in
production environments, including due to the added performance overhead.
Thus, the choice between VM_BUG_ON() and VM_WARN_ON() is somewhat loosely
defined.

VM_BUG_ON() is often used because debuggers would like to collect crash
dumps when unexpected conditions occur.

When CONFIG_DEBUG_VM is enabled on a very small set of production
deployments to catch any unexpected condition, however, VM_WARN_ON()
could be used as a substitute.  In these configurations, it would be
useful to surface the unexpected condition in the kernel log but not
panic the system.

In other words, it would be useful if checks done by CONFIG_DEBUG_ON
could both generate crash dumps for kernel developers *and* surface
issues but not crash depending on how it's configured.

 [ If it is really unsafe to continue operation, then BUG_ON() would be
   used instead so that the kernel panics regardless of whether
   CONFIG_DEBUG_VM is enabled or not. ]

Introduce the ability to suppress kernel panics when VM_BUG_ON*() variants
are used.  This leverages the existing vm_debug= kernel command line
option.

Additionally, this can reduce the risk of systems boot looping if
VM_BUG_ON() conditions are encountered during bootstrap.

Signed-off-by: David Rientjes <rientjes@google.com>
---
Note: the vm_debug= kernel parameter is only extensible for new debug
options, not for disabling existing debug options.

When adding the ability to selectively disable existing debug options,
such as in this patch, admins would need to know this future set of debug
options in advance.  In other words, if admins would like to preserve the
existing behavior of BUG() when VM_BUG_ON() is used after this patch, they
would have had to have the foresight to use vm_debug=B.

It would be useful to rewrite the vm_debug= interface to select the
specific options to disable rather than "disable all, and enable those
that are specified."  This could be done by making vm_debug only disable
the listed debug options rather than enabling them.

This change could be done before this patch is merged if that's the agreed
path forward.
---
 .../admin-guide/kernel-parameters.txt         |  1 +
 include/linux/mmdebug.h                       | 20 ++++++++++++++-----
 mm/debug.c                                    | 14 ++++++++++++-
 3 files changed, 29 insertions(+), 6 deletions(-)
  

Comments

David Hildenbrand May 22, 2023, 9:42 a.m. UTC | #1
Let me CC Linus, he might have an opinion on this.

On 22.05.23 01:07, David Rientjes wrote:
> CONFIG_DEBUG_VM is used to enable additional MM debug checks at runtime.
> This can be used to catch latent kernel bugs.
> 
> Because this is mainly used for debugging, it is seldom enabled in
> production environments, including due to the added performance overhead.
> Thus, the choice between VM_BUG_ON() and VM_WARN_ON() is somewhat loosely
> defined.
> 
> VM_BUG_ON() is often used because debuggers would like to collect crash
> dumps when unexpected conditions occur.
> 
> When CONFIG_DEBUG_VM is enabled on a very small set of production
> deployments to catch any unexpected condition, however, VM_WARN_ON()
> could be used as a substitute.  In these configurations, it would be
> useful to surface the unexpected condition in the kernel log but not
> panic the system.
> 
> In other words, it would be useful if checks done by CONFIG_DEBUG_ON
> could both generate crash dumps for kernel developers *and* surface
> issues but not crash depending on how it's configured.
> 
>   [ If it is really unsafe to continue operation, then BUG_ON() would be
>     used instead so that the kernel panics regardless of whether
>     CONFIG_DEBUG_VM is enabled or not. ]
> 
> Introduce the ability to suppress kernel panics when VM_BUG_ON*() variants
> are used.  This leverages the existing vm_debug= kernel command line
> option.
> 
> Additionally, this can reduce the risk of systems boot looping if
> VM_BUG_ON() conditions are encountered during bootstrap.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
> Note: the vm_debug= kernel parameter is only extensible for new debug
> options, not for disabling existing debug options.
> 
> When adding the ability to selectively disable existing debug options,
> such as in this patch, admins would need to know this future set of debug
> options in advance.  In other words, if admins would like to preserve the
> existing behavior of BUG() when VM_BUG_ON() is used after this patch, they
> would have had to have the foresight to use vm_debug=B.
> 
> It would be useful to rewrite the vm_debug= interface to select the
> specific options to disable rather than "disable all, and enable those
> that are specified."  This could be done by making vm_debug only disable
> the listed debug options rather than enabling them.
> 
> This change could be done before this patch is merged if that's the agreed
> path forward.


In general, I am not a fan of this. Someone told the system to 
VM_BUG_ON, but we ignore that and default to a warning. Yes, VM_BUG on 
get compiled out without CONFIG_DEBUG_VM, but we detected something 
(with more checks enabled!) that doesn't want the system to continue 
(could be an unrecoverable situation leading to data loss, for example).

Yes, we want to convert more VM_BUG to VM_WARN (or rather WARN+recovery 
code as documented in coding-style.rst ), or even simply remove some of 
the old VM_BUG leftovers that might no longer be required. But then I'd 
much invest more time doing that step by step (keeping the VM_BUG + BUG 
that are absolutely reasonable) instead of adding such a config options.


> ---
>   .../admin-guide/kernel-parameters.txt         |  1 +
>   include/linux/mmdebug.h                       | 20 ++++++++++++++-----
>   mm/debug.c                                    | 14 ++++++++++++-
>   3 files changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -6818,6 +6818,7 @@
>   			debugging features.
>   
>   			Available options are:
> +			  B	Enable panic on MM debug checks
>   			  P	Enable page structure init time poisoning
>   			  -	Disable all of the above options
>   
> diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> --- a/include/linux/mmdebug.h
> +++ b/include/linux/mmdebug.h
> @@ -13,34 +13,44 @@ void dump_page(struct page *page, const char *reason);
>   void dump_vma(const struct vm_area_struct *vma);
>   void dump_mm(const struct mm_struct *mm);
>   
> +extern bool debug_vm_bug_enabled;
> +
>   #ifdef CONFIG_DEBUG_VM
> -#define VM_BUG_ON(cond) BUG_ON(cond)
> +#define VM_BUG_ON(cond)							\
> +	do {								\
> +		if (unlikely(cond)) {					\
> +			if (likely(debug_vm_bug_enabled))		\
> +				BUG();					\
> +			else						\
> +				WARN_ON(1);				\
> +		}							\
> +	} while (0)
>   #define VM_BUG_ON_PAGE(cond, page)					\
>   	do {								\
>   		if (unlikely(cond)) {					\
>   			dump_page(page, "VM_BUG_ON_PAGE(" __stringify(cond)")");\
> -			BUG();						\
> +			VM_BUG_ON(1);					\
>   		}							\
>   	} while (0)
>   #define VM_BUG_ON_FOLIO(cond, folio)					\
>   	do {								\
>   		if (unlikely(cond)) {					\
>   			dump_page(&folio->page, "VM_BUG_ON_FOLIO(" __stringify(cond)")");\
> -			BUG();						\
> +			VM_BUG_ON(1);					\
>   		}							\
>   	} while (0)
>   #define VM_BUG_ON_VMA(cond, vma)					\
>   	do {								\
>   		if (unlikely(cond)) {					\
>   			dump_vma(vma);					\
> -			BUG();						\
> +			VM_BUG_ON(1);					\
>   		}							\
>   	} while (0)
>   #define VM_BUG_ON_MM(cond, mm)						\
>   	do {								\
>   		if (unlikely(cond)) {					\
>   			dump_mm(mm);					\
> -			BUG();						\
> +			VM_BUG_ON(1);					\
>   		}							\
>   	} while (0)
>   #define VM_WARN_ON_ONCE_PAGE(cond, page)	({			\
> diff --git a/mm/debug.c b/mm/debug.c
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -224,10 +224,15 @@ void dump_mm(const struct mm_struct *mm)
>   }
>   EXPORT_SYMBOL(dump_mm);
>   
> +/* If disabled, warns but does not panic on added CONFIG_DEBUG_VM checks */
> +bool debug_vm_bug_enabled = true;
> +EXPORT_SYMBOL(debug_vm_bug_enabled);
> +
>   static bool page_init_poisoning __read_mostly = true;
>   
>   static int __init setup_vm_debug(char *str)
>   {
> +	bool __debug_vm_bug_enabled = true;
>   	bool __page_init_poisoning = true;
>   
>   	/*
> @@ -237,13 +242,17 @@ static int __init setup_vm_debug(char *str)
>   	if (*str++ != '=' || !*str)
>   		goto out;
>   
> +	__debug_vm_bug_enabled = false;
>   	__page_init_poisoning = false;
>   	if (*str == '-')
>   		goto out;
>   
>   	while (*str) {
>   		switch (tolower(*str)) {
> -		case'p':
> +		case 'b':
> +			__debug_vm_bug_enabled = true;
> +			break;
> +		case 'p':
>   			__page_init_poisoning = true;
>   			break;
>   		default:
> @@ -254,9 +263,12 @@ static int __init setup_vm_debug(char *str)
>   		str++;
>   	}
>   out:
> +	if (debug_vm_bug_enabled && !__debug_vm_bug_enabled)
> +		pr_warn("Panic on MM debug checks disabled by kernel command line option 'vm_debug'\n");
>   	if (page_init_poisoning && !__page_init_poisoning)
>   		pr_warn("Page struct poisoning disabled by kernel command line option 'vm_debug'\n");
>   
> +	debug_vm_bug_enabled = __debug_vm_bug_enabled;
>   	page_init_poisoning = __page_init_poisoning;
>   
>   	return 1;
>
  
David Rientjes May 22, 2023, 6:39 p.m. UTC | #2
On Mon, 22 May 2023, David Hildenbrand wrote:

> Let me CC Linus, he might have an opinion on this.
> 
> On 22.05.23 01:07, David Rientjes wrote:
> > CONFIG_DEBUG_VM is used to enable additional MM debug checks at runtime.
> > This can be used to catch latent kernel bugs.
> > 
> > Because this is mainly used for debugging, it is seldom enabled in
> > production environments, including due to the added performance overhead.
> > Thus, the choice between VM_BUG_ON() and VM_WARN_ON() is somewhat loosely
> > defined.
> > 
> > VM_BUG_ON() is often used because debuggers would like to collect crash
> > dumps when unexpected conditions occur.
> > 
> > When CONFIG_DEBUG_VM is enabled on a very small set of production
> > deployments to catch any unexpected condition, however, VM_WARN_ON()
> > could be used as a substitute.  In these configurations, it would be
> > useful to surface the unexpected condition in the kernel log but not
> > panic the system.
> > 
> > In other words, it would be useful if checks done by CONFIG_DEBUG_ON
> > could both generate crash dumps for kernel developers *and* surface
> > issues but not crash depending on how it's configured.
> > 
> >   [ If it is really unsafe to continue operation, then BUG_ON() would be
> >     used instead so that the kernel panics regardless of whether
> >     CONFIG_DEBUG_VM is enabled or not. ]
> > 
> > Introduce the ability to suppress kernel panics when VM_BUG_ON*() variants
> > are used.  This leverages the existing vm_debug= kernel command line
> > option.
> > 
> > Additionally, this can reduce the risk of systems boot looping if
> > VM_BUG_ON() conditions are encountered during bootstrap.
> > 
> > Signed-off-by: David Rientjes <rientjes@google.com>
> > ---
> > Note: the vm_debug= kernel parameter is only extensible for new debug
> > options, not for disabling existing debug options.
> > 
> > When adding the ability to selectively disable existing debug options,
> > such as in this patch, admins would need to know this future set of debug
> > options in advance.  In other words, if admins would like to preserve the
> > existing behavior of BUG() when VM_BUG_ON() is used after this patch, they
> > would have had to have the foresight to use vm_debug=B.
> > 
> > It would be useful to rewrite the vm_debug= interface to select the
> > specific options to disable rather than "disable all, and enable those
> > that are specified."  This could be done by making vm_debug only disable
> > the listed debug options rather than enabling them.
> > 
> > This change could be done before this patch is merged if that's the agreed
> > path forward.
> 
> 
> In general, I am not a fan of this. Someone told the system to VM_BUG_ON, but
> we ignore that and default to a warning. Yes, VM_BUG on get compiled out
> without CONFIG_DEBUG_VM, but we detected something (with more checks enabled!)
> that doesn't want the system to continue (could be an unrecoverable situation
> leading to data loss, for example).
> 

I think VM_BUG_ON*() and friends are used to crash the kernel for 
debugging so that we get a crash dump and because some variants don't 
exist for VM_WARN_ON().  There's no VM_WARN_ON_PAGE(), for example, unless 
implicitly converted with this patch.

I'm having a hard time finding a case where VM_BUG_ON() should *require* a 
kernel crash.  I'd be interested to know of these if they exist, though, 
because we have had good success discovering latent kernel bugs that have 
been reported to upstream with the exact approach being proposed here on a 
small set of production hosts.  To safely do that, we audited existing 
VM_BUG_ON()s in the code to make sure there was nothing that absolutely 
required a kernel crash.  That may have changed in more recent kernels, so 
any examples would be very useful.

> Yes, we want to convert more VM_BUG to VM_WARN (or rather WARN+recovery code
> as documented in coding-style.rst ), or even simply remove some of the old
> VM_BUG leftovers that might no longer be required. But then I'd much invest
> more time doing that step by step (keeping the VM_BUG + BUG that are
> absolutely reasonable) instead of adding such a config options.
> 

I'm not sure we actually want to do that, though, since VM_BUG_ON() does 
have a lot of benefit when debugging something: it can generate a crash 
dump that is tremendously useful to the kernel debugger who is iterating 
on their patch set.

The goal of this patch is to provide an additional use case for 
CONFIG_DEBUG_VM: we want to preserve the ability for kernel developers to 
quickly crash their debug kernel during development and add the additional 
use case of surfacing WARN_ON()s to the kernel log for unexpected issues 
on a small set of production hosts without crashing them.  This second use 
case has been very helpful in finding latent kernel bugs at runtime that 
we didn't even know existed in the kernel and could only be found while 
running on a limited production deployment.  If we had to crash the kernel 
to find those, and terminate all the associated guests/workloads, that 
would be a non-starter for us.

> 
> > ---
> >   .../admin-guide/kernel-parameters.txt         |  1 +
> >   include/linux/mmdebug.h                       | 20 ++++++++++++++-----
> >   mm/debug.c                                    | 14 ++++++++++++-
> >   3 files changed, 29 insertions(+), 6 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt
> > b/Documentation/admin-guide/kernel-parameters.txt
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -6818,6 +6818,7 @@
> >   			debugging features.
> >     			Available options are:
> > +			  B	Enable panic on MM debug checks
> >   			  P	Enable page structure init time poisoning
> >   			  -	Disable all of the above options
> >   diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> > --- a/include/linux/mmdebug.h
> > +++ b/include/linux/mmdebug.h
> > @@ -13,34 +13,44 @@ void dump_page(struct page *page, const char *reason);
> >   void dump_vma(const struct vm_area_struct *vma);
> >   void dump_mm(const struct mm_struct *mm);
> >   +extern bool debug_vm_bug_enabled;
> > +
> >   #ifdef CONFIG_DEBUG_VM
> > -#define VM_BUG_ON(cond) BUG_ON(cond)
> > +#define VM_BUG_ON(cond)
> > \
> > +	do {								\
> > +		if (unlikely(cond)) {					\
> > +			if (likely(debug_vm_bug_enabled))		\
> > +				BUG();					\
> > +			else						\
> > +				WARN_ON(1);				\
> > +		}							\
> > +	} while (0)
> >   #define VM_BUG_ON_PAGE(cond, page)					\
> >   	do {								\
> >   		if (unlikely(cond)) {					\
> >   			dump_page(page, "VM_BUG_ON_PAGE("
> > __stringify(cond)")");\
> > -			BUG();						\
> > +			VM_BUG_ON(1);					\
> >   		}							\
> >   	} while (0)
> >   #define VM_BUG_ON_FOLIO(cond, folio)
> > \
> >   	do {								\
> >   		if (unlikely(cond)) {					\
> >   			dump_page(&folio->page, "VM_BUG_ON_FOLIO("
> > __stringify(cond)")");\
> > -			BUG();						\
> > +			VM_BUG_ON(1);					\
> >   		}							\
> >   	} while (0)
> >   #define VM_BUG_ON_VMA(cond, vma)					\
> >   	do {								\
> >   		if (unlikely(cond)) {					\
> >   			dump_vma(vma);					\
> > -			BUG();						\
> > +			VM_BUG_ON(1);					\
> >   		}							\
> >   	} while (0)
> >   #define VM_BUG_ON_MM(cond, mm)
> > \
> >   	do {								\
> >   		if (unlikely(cond)) {					\
> >   			dump_mm(mm);					\
> > -			BUG();						\
> > +			VM_BUG_ON(1);					\
> >   		}							\
> >   	} while (0)
> >   #define VM_WARN_ON_ONCE_PAGE(cond, page)	({			\
> > diff --git a/mm/debug.c b/mm/debug.c
> > --- a/mm/debug.c
> > +++ b/mm/debug.c
> > @@ -224,10 +224,15 @@ void dump_mm(const struct mm_struct *mm)
> >   }
> >   EXPORT_SYMBOL(dump_mm);
> >   +/* If disabled, warns but does not panic on added CONFIG_DEBUG_VM checks
> > */
> > +bool debug_vm_bug_enabled = true;
> > +EXPORT_SYMBOL(debug_vm_bug_enabled);
> > +
> >   static bool page_init_poisoning __read_mostly = true;
> >     static int __init setup_vm_debug(char *str)
> >   {
> > +	bool __debug_vm_bug_enabled = true;
> >   	bool __page_init_poisoning = true;
> >     	/*
> > @@ -237,13 +242,17 @@ static int __init setup_vm_debug(char *str)
> >   	if (*str++ != '=' || !*str)
> >   		goto out;
> >   +	__debug_vm_bug_enabled = false;
> >   	__page_init_poisoning = false;
> >   	if (*str == '-')
> >   		goto out;
> >     	while (*str) {
> >   		switch (tolower(*str)) {
> > -		case'p':
> > +		case 'b':
> > +			__debug_vm_bug_enabled = true;
> > +			break;
> > +		case 'p':
> >   			__page_init_poisoning = true;
> >   			break;
> >   		default:
> > @@ -254,9 +263,12 @@ static int __init setup_vm_debug(char *str)
> >   		str++;
> >   	}
> >   out:
> > +	if (debug_vm_bug_enabled && !__debug_vm_bug_enabled)
> > +		pr_warn("Panic on MM debug checks disabled by kernel command
> > line option 'vm_debug'\n");
> >   	if (page_init_poisoning && !__page_init_poisoning)
> >   		pr_warn("Page struct poisoning disabled by kernel command line
> > option 'vm_debug'\n");
> >   +	debug_vm_bug_enabled = __debug_vm_bug_enabled;
> >   	page_init_poisoning = __page_init_poisoning;
> >     	return 1;
> > 
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 
>
  
Linus Torvalds May 22, 2023, 6:48 p.m. UTC | #3
On Mon, May 22, 2023 at 11:39 AM David Rientjes <rientjes@google.com> wrote:
>
> I think VM_BUG_ON*() and friends are used to crash the kernel for
> debugging so that we get a crash dump and because some variants don't
> exist for VM_WARN_ON().

I do think that from a VM developer standpoint, I think it should be
fine to just effectively turn VM_BUG_ON() into WARN_ON_ONCE() together
with panic_on_warn.

Maybe we could even extend 'panic_on_warn' to be a bitmap and
effectively have a "don't panic on non-VM warnings" option.

                Linus
  
David Rientjes May 23, 2023, 12:51 a.m. UTC | #4
On Mon, 22 May 2023, Linus Torvalds wrote:

> On Mon, May 22, 2023 at 11:39 AM David Rientjes <rientjes@google.com> wrote:
> >
> > I think VM_BUG_ON*() and friends are used to crash the kernel for
> > debugging so that we get a crash dump and because some variants don't
> > exist for VM_WARN_ON().
> 
> I do think that from a VM developer standpoint, I think it should be
> fine to just effectively turn VM_BUG_ON() into WARN_ON_ONCE() together
> with panic_on_warn.
> 
> Maybe we could even extend 'panic_on_warn' to be a bitmap and
> effectively have a "don't panic on non-VM warnings" option.
> 

I hadn't thought of that approach, it would definitely help us achieve our 
goal of emitting warnings on a small set of production hosts that we don't 
want to crash.  It's also very clean.

Right now kernel.panic_on_warn can either be 0 or 1.  We can keep the 
lowest bit to be "panic on all warnings" and then bit-1 as "panic on debug 
VM warnings."  When CONFIG_DEBUG_VM is enabled, set the new bit by 
default so there's no behavior change.

Then, we can keep VM_BUG_ON*() and friends around and extend them to check 
whether they should BUG() after the WARN_ON(1) or not.

On our production hosts, we'll just set kernel.panic_on_oom to 0.

I'll give it a few days to see if anybody else has any comments or 
concerns; if not, I'll send a v2 based on this.
  
Linus Torvalds May 23, 2023, 1:47 a.m. UTC | #5
On Mon, May 22, 2023 at 5:52 PM David Rientjes <rientjes@google.com> wrote:
>
> Right now kernel.panic_on_warn can either be 0 or 1.  We can keep the
> lowest bit to be "panic on all warnings" and then bit-1 as "panic on debug
> VM warnings."  When CONFIG_DEBUG_VM is enabled, set the new bit by
> default so there's no behavior change.

So right now CONFIG_DEBUG_VM being off means that there's nothing at
all - not just no output, but also no code generation.

I don't think CONFIG_DEBUG_VM in itself should enable that bit-1 behavior.

That may be what *you* as a VM person wants, but VM people are not
exactly the common case.

So I think we've got several cases:

 (a) the "don't even build it" case (CONFIG_DEBUG_VM being off)

 (b) the "build it, and it is a WARN_ON_ONCE()" case

 (c) the *normal* "panic_on_warn=1" case, which by default would panic
on all warnings, including any warnings from CONFIG_DEBUG_VM

 (d) the "VM person" case, which might not panic on normal warnings,
but would panic on the VM warnings.

and I think the use-cases are for different classes of kernel use:

 (a) is for people who disable debugging code until they feel it is
needed (which I think covers a lot of kernel developers - I certainly
personally tend to not build with debug support unless I'm chasing
some issue down)

 (b) would probably be most distros - enable the warning so that the
distro can report it, but try not to kill the machine of random people

 (c) would be most cloud use cases, presumably together with reboot-on-panic

 (d) would be people who are actual VM developers, and basically want
the *current* behavior of VM_BUG_ON() with a machine that stops

and I think (d) is the smallest set of cases of all, but is the one
you're personally interested in.

             Linus



             Linus
  
Matthew Wilcox May 23, 2023, 3:43 a.m. UTC | #6
On Mon, May 22, 2023 at 11:39:27AM -0700, David Rientjes wrote:
> I think VM_BUG_ON*() and friends are used to crash the kernel for 
> debugging so that we get a crash dump and because some variants don't 
> exist for VM_WARN_ON().  There's no VM_WARN_ON_PAGE(), for example, unless 
> implicitly converted with this patch.

It could be added, but there's already a VM_WARN_ON_FOLIO() and
VM_WARN_ON_ONCE_PAGE(), so hopefully we just keep converting code
to folios until nobody notices that we might need such a thing.
  
Michal Hocko May 23, 2023, 7:46 a.m. UTC | #7
On Mon 22-05-23 11:48:52, Linus Torvalds wrote:
> On Mon, May 22, 2023 at 11:39 AM David Rientjes <rientjes@google.com> wrote:
> >
> > I think VM_BUG_ON*() and friends are used to crash the kernel for
> > debugging so that we get a crash dump and because some variants don't
> > exist for VM_WARN_ON().
> 
> I do think that from a VM developer standpoint, I think it should be
> fine to just effectively turn VM_BUG_ON() into WARN_ON_ONCE() together
> with panic_on_warn.

This is a very good idea. VM_BUG_ON has always been rather special and
from my past experience people are not really sure when to use it. It is
a conditional thing so it cannot be really used for really BUG_ON cases.
Turning them into VM_WARN (not sure about ONCE) makes a lot of sense
because as you say you can make them panic easily.
  
David Rientjes May 24, 2023, 12:53 a.m. UTC | #8
On Mon, 22 May 2023, Linus Torvalds wrote:

> On Mon, May 22, 2023 at 5:52 PM David Rientjes <rientjes@google.com> wrote:
> >
> > Right now kernel.panic_on_warn can either be 0 or 1.  We can keep the
> > lowest bit to be "panic on all warnings" and then bit-1 as "panic on debug
> > VM warnings."  When CONFIG_DEBUG_VM is enabled, set the new bit by
> > default so there's no behavior change.
> 
> So right now CONFIG_DEBUG_VM being off means that there's nothing at
> all - not just no output, but also no code generation.
> 
> I don't think CONFIG_DEBUG_VM in itself should enable that bit-1 behavior.
> 
> That may be what *you* as a VM person wants, but VM people are not
> exactly the common case.
> 
> So I think we've got several cases:
> 
>  (a) the "don't even build it" case (CONFIG_DEBUG_VM being off)
> 
>  (b) the "build it, and it is a WARN_ON_ONCE()" case
> 
>  (c) the *normal* "panic_on_warn=1" case, which by default would panic
> on all warnings, including any warnings from CONFIG_DEBUG_VM
> 
>  (d) the "VM person" case, which might not panic on normal warnings,
> but would panic on the VM warnings.
> 
> and I think the use-cases are for different classes of kernel use:
> 
>  (a) is for people who disable debugging code until they feel it is
> needed (which I think covers a lot of kernel developers - I certainly
> personally tend to not build with debug support unless I'm chasing
> some issue down)
> 
>  (b) would probably be most distros - enable the warning so that the
> distro can report it, but try not to kill the machine of random people
> 
>  (c) would be most cloud use cases, presumably together with reboot-on-panic
> 
>  (d) would be people who are actual VM developers, and basically want
> the *current* behavior of VM_BUG_ON() with a machine that stops
> 
> and I think (d) is the smallest set of cases of all, but is the one
> you're personally interested in.
> 

If we want to change the behavior from today toward something that we 
think is the more common case for enabling CONFIG_DEBUG_VM, that works 
too.  If we fully remove VM_BUG_ON() in favor of VM_WARN_ON() + 
kernel.panic_on_warn=1, then anybody relying on getting kernel panics when 
they qualify new kernels with CONFIG_DEBUG_VM will start getting WARNINGs 
but not panics unless they are already using kernel.panic_on_warn.

I think that's fine, but is a change in behavior.  My use cases work both 
ways.  If we don't set the bit-1 behavior by default on CONFIG_DEBUG_VM 
then I just won't need to clear it.

I'm personally interested in (d) for debugging issues, but the intent of 
this patch was actually to allow for (b) too.  I want to deploy 
CONFIG_DEBUG_VM with WARN_ON_ONCE() behavior to a small set of production 
machines to catch latent kernel issues we don't know about, but without 
impacting the workloads.

That's also very valuable because I want to surface CONFIG_DEBUG_VM checks 
that would never get hit because we panic before it can be, just because 
of some other VM_BUG_ON().  Your idea of WARN_ON_ONCE() will be great for 
that because we can make forward progress and not be too spammy to the 
kernel log.

There seems to be some agreement in the thread for removing VM_BUG_ON() 
and friends in favor of VM_WARN_ON(), so I'll wait a couple days for 
anymore feedback and then send a patch along.

This seems like it will be very clean and allow for (b), which is great.
  
David Rientjes May 24, 2023, 12:54 a.m. UTC | #9
On Tue, 23 May 2023, Matthew Wilcox wrote:

> On Mon, May 22, 2023 at 11:39:27AM -0700, David Rientjes wrote:
> > I think VM_BUG_ON*() and friends are used to crash the kernel for 
> > debugging so that we get a crash dump and because some variants don't 
> > exist for VM_WARN_ON().  There's no VM_WARN_ON_PAGE(), for example, unless 
> > implicitly converted with this patch.
> 
> It could be added, but there's already a VM_WARN_ON_FOLIO() and
> VM_WARN_ON_ONCE_PAGE(), so hopefully we just keep converting code
> to folios until nobody notices that we might need such a thing.
> 

Yeah, the lack of VM_WARN variants for VM_BUG_ON_MM or VM_BUG_ON_VMA are 
probably better examples.  But it looks like we're converging toward 
eliminating VM_BUG_ON* variants entirely and relying on 
kernel.panic_on_warn to do the BUG_ON() behavior if we want to opt into 
that.  So this will be a useful cleanup.
  
David Hildenbrand May 24, 2023, 8:42 a.m. UTC | #10
On 23.05.23 03:47, Linus Torvalds wrote:
> On Mon, May 22, 2023 at 5:52 PM David Rientjes <rientjes@google.com> wrote:
>>
>> Right now kernel.panic_on_warn can either be 0 or 1.  We can keep the
>> lowest bit to be "panic on all warnings" and then bit-1 as "panic on debug
>> VM warnings."  When CONFIG_DEBUG_VM is enabled, set the new bit by
>> default so there's no behavior change.
> 
> So right now CONFIG_DEBUG_VM being off means that there's nothing at
> all - not just no output, but also no code generation.
> 
> I don't think CONFIG_DEBUG_VM in itself should enable that bit-1 behavior.
> 
> That may be what *you* as a VM person wants, but VM people are not
> exactly the common case.
> 
> So I think we've got several cases:
> 
>   (a) the "don't even build it" case (CONFIG_DEBUG_VM being off)
> 
>   (b) the "build it, and it is a WARN_ON_ONCE()" case
> 
>   (c) the *normal* "panic_on_warn=1" case, which by default would panic
> on all warnings, including any warnings from CONFIG_DEBUG_VM
> 
>   (d) the "VM person" case, which might not panic on normal warnings,
> but would panic on the VM warnings.
> 
> and I think the use-cases are for different classes of kernel use:
> 
>   (a) is for people who disable debugging code until they feel it is
> needed (which I think covers a lot of kernel developers - I certainly
> personally tend to not build with debug support unless I'm chasing
> some issue down)
> 
>   (b) would probably be most distros - enable the warning so that the
> distro can report it, but try not to kill the machine of random people
> 
>   (c) would be most cloud use cases, presumably together with reboot-on-panic
> 
>   (d) would be people who are actual VM developers, and basically want
> the *current* behavior of VM_BUG_ON() with a machine that stops
> 
> and I think (d) is the smallest set of cases of all, but is the one
> you're personally interested in.

Just as a side note, I stumbled yesterday over [1], which apparently 
disables CONFIG_DEBUG_VM on !debug Fedora builds.

The commit description does not contain a rational ( it's empty :) ), 
and I don't know if this is just a temporary change.

I'll CC Justin, maybe Fedora also would like to keep building with 
CONFIG_DEBUG_VM, but default to WARN_ON_ONCE() instead.


[1] 
https://gitlab.com/cki-project/kernel-ark/-/commit/ade780e10ae1fdcb575ab100bf02d61eb12dd406
  
Justin M. Forbes May 24, 2023, 11:44 a.m. UTC | #11
On Wed, May 24, 2023 at 3:42 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 23.05.23 03:47, Linus Torvalds wrote:
> > On Mon, May 22, 2023 at 5:52 PM David Rientjes <rientjes@google.com> wrote:
> >>
> >> Right now kernel.panic_on_warn can either be 0 or 1.  We can keep the
> >> lowest bit to be "panic on all warnings" and then bit-1 as "panic on debug
> >> VM warnings."  When CONFIG_DEBUG_VM is enabled, set the new bit by
> >> default so there's no behavior change.
> >
> > So right now CONFIG_DEBUG_VM being off means that there's nothing at
> > all - not just no output, but also no code generation.
> >
> > I don't think CONFIG_DEBUG_VM in itself should enable that bit-1 behavior.
> >
> > That may be what *you* as a VM person wants, but VM people are not
> > exactly the common case.
> >
> > So I think we've got several cases:
> >
> >   (a) the "don't even build it" case (CONFIG_DEBUG_VM being off)
> >
> >   (b) the "build it, and it is a WARN_ON_ONCE()" case
> >
> >   (c) the *normal* "panic_on_warn=1" case, which by default would panic
> > on all warnings, including any warnings from CONFIG_DEBUG_VM
> >
> >   (d) the "VM person" case, which might not panic on normal warnings,
> > but would panic on the VM warnings.
> >
> > and I think the use-cases are for different classes of kernel use:
> >
> >   (a) is for people who disable debugging code until they feel it is
> > needed (which I think covers a lot of kernel developers - I certainly
> > personally tend to not build with debug support unless I'm chasing
> > some issue down)
> >
> >   (b) would probably be most distros - enable the warning so that the
> > distro can report it, but try not to kill the machine of random people
> >
> >   (c) would be most cloud use cases, presumably together with reboot-on-panic
> >
> >   (d) would be people who are actual VM developers, and basically want
> > the *current* behavior of VM_BUG_ON() with a machine that stops
> >
> > and I think (d) is the smallest set of cases of all, but is the one
> > you're personally interested in.
>
> Just as a side note, I stumbled yesterday over [1], which apparently
> disables CONFIG_DEBUG_VM on !debug Fedora builds.
>
> The commit description does not contain a rational ( it's empty :) ),
> and I don't know if this is just a temporary change.
>
> I'll CC Justin, maybe Fedora also would like to keep building with
> CONFIG_DEBUG_VM, but default to WARN_ON_ONCE() instead.
>
>
> [1]
> https://gitlab.com/cki-project/kernel-ark/-/commit/ade780e10ae1fdcb575ab100bf02d61eb12dd406

Do not read too much into this right now.  The RHEL performance folks
did a comparison of the RHEL config vs the Fedora config for 6.3 and
found Fedora was considerably slower in a couple of tests. We are
re-running those tests with some DEBUG configs turned off to see which
is the culprit. FWIW, CONFIG_DEBUG_VM made very little difference. As
we have not found the specific cause yet though, final configs have
not been restored.
As for my prefeernce, WARN_ON_ONCE() behavior would be much preferred.

Justin

> --
> Thanks,
>
> David / dhildenb
>
  

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6818,6 +6818,7 @@ 
 			debugging features.
 
 			Available options are:
+			  B	Enable panic on MM debug checks
 			  P	Enable page structure init time poisoning
 			  -	Disable all of the above options
 
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -13,34 +13,44 @@  void dump_page(struct page *page, const char *reason);
 void dump_vma(const struct vm_area_struct *vma);
 void dump_mm(const struct mm_struct *mm);
 
+extern bool debug_vm_bug_enabled;
+
 #ifdef CONFIG_DEBUG_VM
-#define VM_BUG_ON(cond) BUG_ON(cond)
+#define VM_BUG_ON(cond)							\
+	do {								\
+		if (unlikely(cond)) {					\
+			if (likely(debug_vm_bug_enabled))		\
+				BUG();					\
+			else						\
+				WARN_ON(1);				\
+		}							\
+	} while (0)
 #define VM_BUG_ON_PAGE(cond, page)					\
 	do {								\
 		if (unlikely(cond)) {					\
 			dump_page(page, "VM_BUG_ON_PAGE(" __stringify(cond)")");\
-			BUG();						\
+			VM_BUG_ON(1);					\
 		}							\
 	} while (0)
 #define VM_BUG_ON_FOLIO(cond, folio)					\
 	do {								\
 		if (unlikely(cond)) {					\
 			dump_page(&folio->page, "VM_BUG_ON_FOLIO(" __stringify(cond)")");\
-			BUG();						\
+			VM_BUG_ON(1);					\
 		}							\
 	} while (0)
 #define VM_BUG_ON_VMA(cond, vma)					\
 	do {								\
 		if (unlikely(cond)) {					\
 			dump_vma(vma);					\
-			BUG();						\
+			VM_BUG_ON(1);					\
 		}							\
 	} while (0)
 #define VM_BUG_ON_MM(cond, mm)						\
 	do {								\
 		if (unlikely(cond)) {					\
 			dump_mm(mm);					\
-			BUG();						\
+			VM_BUG_ON(1);					\
 		}							\
 	} while (0)
 #define VM_WARN_ON_ONCE_PAGE(cond, page)	({			\
diff --git a/mm/debug.c b/mm/debug.c
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -224,10 +224,15 @@  void dump_mm(const struct mm_struct *mm)
 }
 EXPORT_SYMBOL(dump_mm);
 
+/* If disabled, warns but does not panic on added CONFIG_DEBUG_VM checks */
+bool debug_vm_bug_enabled = true;
+EXPORT_SYMBOL(debug_vm_bug_enabled);
+
 static bool page_init_poisoning __read_mostly = true;
 
 static int __init setup_vm_debug(char *str)
 {
+	bool __debug_vm_bug_enabled = true;
 	bool __page_init_poisoning = true;
 
 	/*
@@ -237,13 +242,17 @@  static int __init setup_vm_debug(char *str)
 	if (*str++ != '=' || !*str)
 		goto out;
 
+	__debug_vm_bug_enabled = false;
 	__page_init_poisoning = false;
 	if (*str == '-')
 		goto out;
 
 	while (*str) {
 		switch (tolower(*str)) {
-		case'p':
+		case 'b':
+			__debug_vm_bug_enabled = true;
+			break;
+		case 'p':
 			__page_init_poisoning = true;
 			break;
 		default:
@@ -254,9 +263,12 @@  static int __init setup_vm_debug(char *str)
 		str++;
 	}
 out:
+	if (debug_vm_bug_enabled && !__debug_vm_bug_enabled)
+		pr_warn("Panic on MM debug checks disabled by kernel command line option 'vm_debug'\n");
 	if (page_init_poisoning && !__page_init_poisoning)
 		pr_warn("Page struct poisoning disabled by kernel command line option 'vm_debug'\n");
 
+	debug_vm_bug_enabled = __debug_vm_bug_enabled;
 	page_init_poisoning = __page_init_poisoning;
 
 	return 1;