[mm-unstable] lib/Kconfig.debug: do not enable DEBUG_PREEMPT by default

Message ID 20230121033942.350387-1-42.hyeyoo@gmail.com
State New
Series [mm-unstable] lib/Kconfig.debug: do not enable DEBUG_PREEMPT by default

Commit Message

Hyeonggon Yoo Jan. 21, 2023, 3:39 a.m. UTC
In workloads where this_cpu operations are frequently performed,
enabling DEBUG_PREEMPT may result in a significant increase in
runtime overhead due to frequent invocation of the
__this_cpu_preempt_check() function.

This can be demonstrated with benchmarks such as hackbench, where this
configuration results in a roughly 10% reduction in performance, primarily
due to the added overhead in the memcg charging path.

Therefore, do not enable DEBUG_PREEMPT by default, and make users aware
of its potential performance impact on some workloads.

hackbench-process-sockets
		      debug_preempt	 no_debug_preempt
Amean     1       0.4743 (   0.00%)      0.4295 *   9.45%*
Amean     4       1.4191 (   0.00%)      1.2650 *  10.86%*
Amean     7       2.2677 (   0.00%)      2.0094 *  11.39%*
Amean     12      3.6821 (   0.00%)      3.2115 *  12.78%*
Amean     21      6.6752 (   0.00%)      5.7956 *  13.18%*
Amean     30      9.6646 (   0.00%)      8.5197 *  11.85%*
Amean     48     15.3363 (   0.00%)     13.5559 *  11.61%*
Amean     79     24.8603 (   0.00%)     22.0597 *  11.27%*
Amean     96     30.1240 (   0.00%)     26.8073 *  11.01%*

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 lib/Kconfig.debug | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
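The percentages in the hackbench table can be reproduced from its two mean columns. The sketch below assumes the report shows the relative reduction of each mean against the debug_preempt baseline; that formula matches every row of the table, but it is a reconstruction, not the benchmark tool's own code.

```c
#include <assert.h>
#include <stddef.h>

/* Mean hackbench-process-sockets times copied from the table above
 * (lower is better). Index 0 is 1 group, index 8 is 96 groups. */
static const double debug_preempt[] = {
    0.4743, 1.4191, 2.2677, 3.6821, 6.6752,
    9.6646, 15.3363, 24.8603, 30.1240,
};
static const double no_debug_preempt[] = {
    0.4295, 1.2650, 2.0094, 3.2115, 5.7956,
    8.5197, 13.5559, 22.0597, 26.8073,
};

/* Relative reduction of the candidate mean versus the baseline mean,
 * in percent; a positive value means the candidate ran faster. */
static double gain_pct(double base, double candidate)
{
    assert(base > 0.0);
    return (base - candidate) / base * 100.0;
}

/* Gain for one row of the table. */
static double row_gain(size_t i)
{
    assert(i < sizeof(debug_preempt) / sizeof(debug_preempt[0]));
    return gain_pct(debug_preempt[i], no_debug_preempt[i]);
}
```

For example, row_gain(0) evaluates to about 9.45 and row_gain(4) to about 13.18, matching the 1-group and 21-group rows.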
  

Comments

Vlastimil Babka Jan. 21, 2023, 11:29 a.m. UTC | #1
On 1/21/23 04:39, Hyeonggon Yoo wrote:
> In workloads where this_cpu operations are frequently performed,
> enabling DEBUG_PREEMPT may result in significant increase in
> runtime overhead due to frequent invocation of
> __this_cpu_preempt_check() function.
> 
> This can be demonstrated through benchmarks such as hackbench where this
> configuration results in a 10% reduction in performance, primarily due to
> the added overhead within memcg charging path.
> 
> Therefore, do not enable DEBUG_PREEMPT by default and make users aware
> of its potential impact on performance in some workloads.
> 
> hackbench-process-sockets
> 		      debug_preempt	 no_debug_preempt
> Amean     1       0.4743 (   0.00%)      0.4295 *   9.45%*
> Amean     4       1.4191 (   0.00%)      1.2650 *  10.86%*
> Amean     7       2.2677 (   0.00%)      2.0094 *  11.39%*
> Amean     12      3.6821 (   0.00%)      3.2115 *  12.78%*
> Amean     21      6.6752 (   0.00%)      5.7956 *  13.18%*
> Amean     30      9.6646 (   0.00%)      8.5197 *  11.85%*
> Amean     48     15.3363 (   0.00%)     13.5559 *  11.61%*
> Amean     79     24.8603 (   0.00%)     22.0597 *  11.27%*
> Amean     96     30.1240 (   0.00%)     26.8073 *  11.01%*
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Looks like it has been there since the beginning of preempt, pre-git. But
this should probably go through the scheduler maintainers rather than mm/slab,
even if the impact manifests there. You did Cc Ingo (the original author), so
let me Cc the rest here.

> ---
>  lib/Kconfig.debug | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index ddbfac2adf9c..f6f845a4b9ec 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1176,13 +1176,16 @@ config DEBUG_TIMEKEEPING
>  config DEBUG_PREEMPT
>  	bool "Debug preemptible kernel"
>  	depends on DEBUG_KERNEL && PREEMPTION && TRACE_IRQFLAGS_SUPPORT
> -	default y
>  	help
>  	  If you say Y here then the kernel will use a debug variant of the
>  	  commonly used smp_processor_id() function and will print warnings
>  	  if kernel code uses it in a preemption-unsafe way. Also, the kernel
>  	  will detect preemption count underflows.
>  
> +	  This option has potential to introduce high runtime overhead,
> +	  depending on workload as it triggers debugging routines for each
> +	  this_cpu operation. It should only be used for debugging purposes.
> +
>  menu "Lock Debugging (spinlocks, mutexes, etc...)"
>  
>  config LOCK_DEBUGGING_SUPPORT
  
Hyeonggon Yoo Jan. 21, 2023, 11:54 a.m. UTC | #2
On Sat, Jan 21, 2023 at 12:29:44PM +0100, Vlastimil Babka wrote:
> On 1/21/23 04:39, Hyeonggon Yoo wrote:
> > In workloads where this_cpu operations are frequently performed,
> > enabling DEBUG_PREEMPT may result in significant increase in
> > runtime overhead due to frequent invocation of
> > __this_cpu_preempt_check() function.
> > 
> > This can be demonstrated through benchmarks such as hackbench where this
> > configuration results in a 10% reduction in performance, primarily due to
> > the added overhead within memcg charging path.
> > 
> > Therefore, do not enable DEBUG_PREEMPT by default and make users aware
> > of its potential impact on performance in some workloads.
> > 
> > hackbench-process-sockets
> > 		      debug_preempt	 no_debug_preempt
> > Amean     1       0.4743 (   0.00%)      0.4295 *   9.45%*
> > Amean     4       1.4191 (   0.00%)      1.2650 *  10.86%*
> > Amean     7       2.2677 (   0.00%)      2.0094 *  11.39%*
> > Amean     12      3.6821 (   0.00%)      3.2115 *  12.78%*
> > Amean     21      6.6752 (   0.00%)      5.7956 *  13.18%*
> > Amean     30      9.6646 (   0.00%)      8.5197 *  11.85%*
> > Amean     48     15.3363 (   0.00%)     13.5559 *  11.61%*
> > Amean     79     24.8603 (   0.00%)     22.0597 *  11.27%*
> > Amean     96     30.1240 (   0.00%)     26.8073 *  11.01%*
> > 
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> 
> Looks like it has been there since the beginning of preempt, pre-git. But
> this should probably go through the scheduler maintainers rather than mm/slab,
> even if the impact manifests there. You did Cc Ingo (the original author), so
> let me Cc the rest here.

Whew, I still get confused about whom to Cc; thanks for adding them.
I also didn't include the percpu memory allocator maintainers, who may
have an opinion. Let's add them too.

> 
> > ---
> >  lib/Kconfig.debug | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index ddbfac2adf9c..f6f845a4b9ec 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1176,13 +1176,16 @@ config DEBUG_TIMEKEEPING
> >  config DEBUG_PREEMPT
> >  	bool "Debug preemptible kernel"
> >  	depends on DEBUG_KERNEL && PREEMPTION && TRACE_IRQFLAGS_SUPPORT
> > -	default y
> >  	help
> >  	  If you say Y here then the kernel will use a debug variant of the
> >  	  commonly used smp_processor_id() function and will print warnings
> >  	  if kernel code uses it in a preemption-unsafe way. Also, the kernel
> >  	  will detect preemption count underflows.
> >  
> > +	  This option has potential to introduce high runtime overhead,
> > +	  depending on workload as it triggers debugging routines for each
> > +	  this_cpu operation. It should only be used for debugging purposes.
> > +
> >  menu "Lock Debugging (spinlocks, mutexes, etc...)"
> >  
> >  config LOCK_DEBUGGING_SUPPORT
>
  
Michal Hocko Jan. 23, 2023, 8:58 a.m. UTC | #3
On Sat 21-01-23 20:54:15, Hyeonggon Yoo wrote:
> On Sat, Jan 21, 2023 at 12:29:44PM +0100, Vlastimil Babka wrote:
> > On 1/21/23 04:39, Hyeonggon Yoo wrote:
> > > In workloads where this_cpu operations are frequently performed,
> > > enabling DEBUG_PREEMPT may result in significant increase in
> > > runtime overhead due to frequent invocation of
> > > __this_cpu_preempt_check() function.
> > > 
> > > This can be demonstrated through benchmarks such as hackbench where this
> > > configuration results in a 10% reduction in performance, primarily due to
> > > the added overhead within memcg charging path.
> > > 
> > > Therefore, do not enable DEBUG_PREEMPT by default and make users aware
> > > of its potential impact on performance in some workloads.
> > > 
> > > hackbench-process-sockets
> > > 		      debug_preempt	 no_debug_preempt
> > > Amean     1       0.4743 (   0.00%)      0.4295 *   9.45%*
> > > Amean     4       1.4191 (   0.00%)      1.2650 *  10.86%*
> > > Amean     7       2.2677 (   0.00%)      2.0094 *  11.39%*
> > > Amean     12      3.6821 (   0.00%)      3.2115 *  12.78%*
> > > Amean     21      6.6752 (   0.00%)      5.7956 *  13.18%*
> > > Amean     30      9.6646 (   0.00%)      8.5197 *  11.85%*
> > > Amean     48     15.3363 (   0.00%)     13.5559 *  11.61%*
> > > Amean     79     24.8603 (   0.00%)     22.0597 *  11.27%*
> > > Amean     96     30.1240 (   0.00%)     26.8073 *  11.01%*

Do you happen to have any perf data collected during those runs? I
would be interested in the memcg side of things. Maybe we can do
something better there.
  
Christoph Lameter Jan. 23, 2023, 11:05 a.m. UTC | #4
On Sat, 21 Jan 2023, Hyeonggon Yoo wrote:

> Whew, I still get confused about whom to Cc; thanks for adding them.
> I also didn't include the percpu memory allocator maintainers, who may
> have an opinion. Let's add them too.

Well, looks OK to me.

However, I thought most distro kernels disable PREEMPT anyway for
performance reasons? So DEBUG_PREEMPT should be off as well. I guess that
is why this has not been an issue so far.
  
Mel Gorman Jan. 23, 2023, 2:01 p.m. UTC | #5
Adding Peter to the cc as this should go via the tip tree even though
Ingo is cc'd already. Leaving full context and responding inline.

On Sat, Jan 21, 2023 at 12:39:42PM +0900, Hyeonggon Yoo wrote:
> In workloads where this_cpu operations are frequently performed,
> enabling DEBUG_PREEMPT may result in significant increase in
> runtime overhead due to frequent invocation of
> __this_cpu_preempt_check() function.
> 
> This can be demonstrated through benchmarks such as hackbench where this
> configuration results in a 10% reduction in performance, primarily due to
> the added overhead within memcg charging path.
> 
> Therefore, do not enable DEBUG_PREEMPT by default and make users aware
> of its potential impact on performance in some workloads.
> 
> hackbench-process-sockets
> 		      debug_preempt	 no_debug_preempt
> Amean     1       0.4743 (   0.00%)      0.4295 *   9.45%*
> Amean     4       1.4191 (   0.00%)      1.2650 *  10.86%*
> Amean     7       2.2677 (   0.00%)      2.0094 *  11.39%*
> Amean     12      3.6821 (   0.00%)      3.2115 *  12.78%*
> Amean     21      6.6752 (   0.00%)      5.7956 *  13.18%*
> Amean     30      9.6646 (   0.00%)      8.5197 *  11.85%*
> Amean     48     15.3363 (   0.00%)     13.5559 *  11.61%*
> Amean     79     24.8603 (   0.00%)     22.0597 *  11.27%*
> Amean     96     30.1240 (   0.00%)     26.8073 *  11.01%*
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

This has been default y since very early on in the development of the BKL
removal. It was probably selected by default because it was expected there
would be a bunch of new SMP-related bugs. These days, there is no real
reason to enable it by default except when debugging a preempt-related
issue or during development. It's not like CONFIG_SCHED_DEBUG, which gets
enabled in a lot of distros as it has some features which are useful in
production (which is unfortunate but splitting CONFIG_SCHED_DEBUG is a
completely separate topic).

> ---
>  lib/Kconfig.debug | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index ddbfac2adf9c..f6f845a4b9ec 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1176,13 +1176,16 @@ config DEBUG_TIMEKEEPING
>  config DEBUG_PREEMPT
>  	bool "Debug preemptible kernel"
>  	depends on DEBUG_KERNEL && PREEMPTION && TRACE_IRQFLAGS_SUPPORT
> -	default y
>  	help
>  	  If you say Y here then the kernel will use a debug variant of the
>  	  commonly used smp_processor_id() function and will print warnings
>  	  if kernel code uses it in a preemption-unsafe way. Also, the kernel
>  	  will detect preemption count underflows.
>  
> +	  This option has potential to introduce high runtime overhead,
> +	  depending on workload as it triggers debugging routines for each
> +	  this_cpu operation. It should only be used for debugging purposes.
> +
>  menu "Lock Debugging (spinlocks, mutexes, etc...)"
>  
>  config LOCK_DEBUGGING_SUPPORT
> -- 
> 2.34.1
> 
>
  
Hyeonggon Yoo Jan. 24, 2023, 4:34 p.m. UTC | #6
On Mon, Jan 23, 2023 at 12:05:00PM +0100, Christoph Lameter wrote:
> On Sat, 21 Jan 2023, Hyeonggon Yoo wrote:
> 
> > Whew, I still get confused about whom to Cc; thanks for adding them.
> > I also didn't include the percpu memory allocator maintainers, who may
> > have an opinion. Let's add them too.
> 
> Well, looks OK to me.

Thanks for looking at it!

> However, I thought most distro kernels disable PREEMPT anyway for
> performance reasons? So DEBUG_PREEMPT should be off as well. I guess that
> is why this has not been an issue so far.

DEBUG_PREEMPT depends on PREEMPTION, and PREEMPT_DYNAMIC ("Preemption behaviour
defined on boot") selects PREEMPTION even if I end up using PREEMPT_VOLUNTARY.

Not many distros enable DEBUG_PREEMPT, but I occasionally hit this
because Debian and Fedora enable it :)
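The PREEMPT_DYNAMIC point can be made concrete with a tiny model of the Kconfig logic. This is a hypothetical userspace sketch, not Kconfig itself: it only encodes the `depends on` line from the patch and the statement above that PREEMPT_DYNAMIC selects PREEMPTION, and it folds every other symbol that selects PREEMPTION into a single `preempt` flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a few Kconfig symbols.
 * Only the relationships discussed in this thread are modelled. */
struct kcfg {
    bool debug_kernel;           /* CONFIG_DEBUG_KERNEL */
    bool trace_irqflags_support; /* CONFIG_TRACE_IRQFLAGS_SUPPORT */
    bool preempt;                /* full preemption and similar models */
    bool preempt_dynamic;        /* CONFIG_PREEMPT_DYNAMIC */
};

/* PREEMPTION is not asked for directly; it is selected. Per the thread,
 * PREEMPT_DYNAMIC selects it even if voluntary mode is chosen at boot. */
static bool preemption(const struct kcfg *c)
{
    return c->preempt || c->preempt_dynamic;
}

/* "depends on DEBUG_KERNEL && PREEMPTION && TRACE_IRQFLAGS_SUPPORT" */
static bool debug_preempt_visible(const struct kcfg *c)
{
    return c->debug_kernel && preemption(c) && c->trace_irqflags_support;
}

/* Value DEBUG_PREEMPT takes when nobody answers the prompt:
 * "default y" before the patch, no default (so n) after it. */
static bool debug_preempt_default(const struct kcfg *c, bool patched)
{
    return debug_preempt_visible(c) && !patched;
}
```

With debug_kernel, trace_irqflags_support and preempt_dynamic set, the option is visible and defaulted to y before the patch, even though the kernel may run in voluntary mode: the situation described above.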
  
Michal Hocko Jan. 25, 2023, 9:51 a.m. UTC | #7
On Thu 26-01-23 00:41:15, Hyeonggon Yoo wrote:
[...]
> > Do you happen to have any perf data collected during those runs? I
> > would be interested in the memcg side of things. Maybe we can do
> > something better there.
> 
> Yes, below is performance data I've collected.
> 
> 6.1.8-debug-preempt-dirty
> =========================
>   Overhead  Command       Shared Object     Symbol
> +    9.14%  hackbench        [kernel.vmlinux]  [k] check_preemption_disabled

Thanks! Could you just add callers that are showing in the profile for
this call please?
  
Hyeonggon Yoo Jan. 25, 2023, 3:41 p.m. UTC | #8
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>, Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Ingo Molnar <mingo@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <muchun.song@linux.dev>,
	Matthew Wilcox <willy@infradead.org>, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH mm-unstable] lib/Kconfig.debug: do not enable
 DEBUG_PREEMPT by default
In-Reply-To: <Y85MNmZDc5czMRUJ@dhcp22.suse.cz>

On Mon, Jan 23, 2023 at 09:58:30AM +0100, Michal Hocko wrote:
> On Sat 21-01-23 20:54:15, Hyeonggon Yoo wrote:
> > On Sat, Jan 21, 2023 at 12:29:44PM +0100, Vlastimil Babka wrote:
> > > On 1/21/23 04:39, Hyeonggon Yoo wrote:
> > > > In workloads where this_cpu operations are frequently performed,
> > > > enabling DEBUG_PREEMPT may result in significant increase in
> > > > runtime overhead due to frequent invocation of
> > > > __this_cpu_preempt_check() function.
> > > > 
> > > > This can be demonstrated through benchmarks such as hackbench where this
> > > > configuration results in a 10% reduction in performance, primarily due to
> > > > the added overhead within memcg charging path.
> > > > 
> > > > Therefore, do not enable DEBUG_PREEMPT by default and make users aware
> > > > of its potential impact on performance in some workloads.
> > > > 
> > > > hackbench-process-sockets
> > > > 		      debug_preempt	 no_debug_preempt
> > > > Amean     1       0.4743 (   0.00%)      0.4295 *   9.45%*
> > > > Amean     4       1.4191 (   0.00%)      1.2650 *  10.86%*
> > > > Amean     7       2.2677 (   0.00%)      2.0094 *  11.39%*
> > > > Amean     12      3.6821 (   0.00%)      3.2115 *  12.78%*
> > > > Amean     21      6.6752 (   0.00%)      5.7956 *  13.18%*
> > > > Amean     30      9.6646 (   0.00%)      8.5197 *  11.85%*
> > > > Amean     48     15.3363 (   0.00%)     13.5559 *  11.61%*
> > > > Amean     79     24.8603 (   0.00%)     22.0597 *  11.27%*
> > > > Amean     96     30.1240 (   0.00%)     26.8073 *  11.01%*

Hello Michal, thanks for looking at this.

> Do you happen to have any perf data collected during those runs? I
> would be interested in the memcg side of things. Maybe we can do
> something better there.

Yes, below is performance data I've collected.

6.1.8-debug-preempt-dirty
=========================
  Overhead  Command       Shared Object     Symbol
+    9.14%  hackbench        [kernel.vmlinux]  [k] check_preemption_disabled
+    7.33%  hackbench        [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
+    7.32%  hackbench        [kernel.vmlinux]  [k] mod_objcg_state
     3.55%  hackbench        [kernel.vmlinux]  [k] refill_obj_stock
     3.39%  hackbench        [kernel.vmlinux]  [k] debug_smp_processor_id
     2.97%  hackbench        [kernel.vmlinux]  [k] memset_erms
     2.55%  hackbench        [kernel.vmlinux]  [k] __check_object_size
+    2.36%  hackbench        [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
     1.76%  hackbench        [kernel.vmlinux]  [k] unix_stream_read_generic
     1.64%  hackbench        [kernel.vmlinux]  [k] __slab_free
     1.58%  hackbench        [kernel.vmlinux]  [k] unix_stream_sendmsg
     1.46%  hackbench        [kernel.vmlinux]  [k] memcg_slab_post_alloc_hook
     1.35%  hackbench        [kernel.vmlinux]  [k] vfs_write
     1.33%  hackbench        [kernel.vmlinux]  [k] vfs_read
     1.28%  hackbench        [kernel.vmlinux]  [k] __alloc_skb
     1.18%  hackbench        [kernel.vmlinux]  [k] sock_read_iter
     1.16%  hackbench        [kernel.vmlinux]  [k] obj_cgroup_charge
     1.16%  hackbench        [kernel.vmlinux]  [k] entry_SYSCALL_64
     1.14%  hackbench        [kernel.vmlinux]  [k] sock_write_iter
     1.12%  hackbench        [kernel.vmlinux]  [k] skb_release_data
     1.08%  hackbench        [kernel.vmlinux]  [k] sock_wfree
     1.07%  hackbench        [kernel.vmlinux]  [k] cache_from_obj
     0.96%  hackbench        [kernel.vmlinux]  [k] unix_destruct_scm
     0.95%  hackbench        [kernel.vmlinux]  [k] kmem_cache_free
     0.94%  hackbench        [kernel.vmlinux]  [k] __kmem_cache_alloc_node
     0.92%  hackbench        [kernel.vmlinux]  [k] kmem_cache_alloc_node
     0.89%  hackbench        [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
     0.84%  hackbench        [kernel.vmlinux]  [k] __x86_indirect_thunk_array
     0.84%  hackbench        libc.so.6         [.] write
     0.81%  hackbench        [kernel.vmlinux]  [k] exit_to_user_mode_prepare
     0.76%  hackbench        libc.so.6         [.] read
     0.75%  hackbench        [kernel.vmlinux]  [k] syscall_trace_enter.constprop.0
     0.75%  hackbench        [kernel.vmlinux]  [k] preempt_count_add
     0.74%  hackbench        [kernel.vmlinux]  [k] cmpxchg_double_slab.constprop.0.isra.0
     0.69%  hackbench        [kernel.vmlinux]  [k] get_partial_node
     0.69%  hackbench        [kernel.vmlinux]  [k] __virt_addr_valid
     0.69%  hackbench        [kernel.vmlinux]  [k] __rcu_read_unlock
     0.65%  hackbench        [kernel.vmlinux]  [k] get_obj_cgroup_from_current
     0.63%  hackbench        [kernel.vmlinux]  [k] __kmem_cache_free
     0.62%  hackbench        [kernel.vmlinux]  [k] entry_SYSRETQ_unsafe_stack
     0.60%  hackbench        [kernel.vmlinux]  [k] __rcu_read_lock
     0.59%  hackbench        [kernel.vmlinux]  [k] syscall_exit_to_user_mode_prepare
     0.54%  hackbench        [kernel.vmlinux]  [k] __unfreeze_partials
     0.53%  hackbench        [kernel.vmlinux]  [k] check_stack_object
     0.52%  hackbench        [kernel.vmlinux]  [k] entry_SYSCALL_64_after_hwframe
     0.51%  hackbench        [kernel.vmlinux]  [k] security_file_permission
     0.50%  hackbench        [kernel.vmlinux]  [k] __x64_sys_write
     0.49%  hackbench        [kernel.vmlinux]  [k] bpf_lsm_file_permission
     0.48%  hackbench        [kernel.vmlinux]  [k] ___slab_alloc
     0.46%  hackbench        [kernel.vmlinux]  [k] __check_heap_object

I have also attached flamegraph-6.1.8-debug-preempt-dirty.svg.

6.1.8 (no debug preempt)
========================
  Overhead  Command       Shared Object     Symbol
+   10.96%  hackbench     [kernel.vmlinux]  [k] mod_objcg_state
+    8.16%  hackbench     [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
     3.29%  hackbench     [kernel.vmlinux]  [k] memset_erms
     3.07%  hackbench     [kernel.vmlinux]  [k] __slab_free
     2.89%  hackbench     [kernel.vmlinux]  [k] refill_obj_stock
     2.82%  hackbench     [kernel.vmlinux]  [k] __check_object_size
+    2.72%  hackbench     [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
     1.96%  hackbench     [kernel.vmlinux]  [k] __x86_indirect_thunk_rax
     1.88%  hackbench     [kernel.vmlinux]  [k] memcg_slab_post_alloc_hook
     1.69%  hackbench     [kernel.vmlinux]  [k] __rcu_read_unlock
     1.54%  hackbench     [kernel.vmlinux]  [k] __alloc_skb
     1.53%  hackbench     [kernel.vmlinux]  [k] unix_stream_sendmsg
     1.46%  hackbench     [kernel.vmlinux]  [k] kmem_cache_free
     1.44%  hackbench     [kernel.vmlinux]  [k] vfs_write
     1.43%  hackbench     [kernel.vmlinux]  [k] vfs_read
     1.33%  hackbench     [kernel.vmlinux]  [k] unix_stream_read_generic
     1.31%  hackbench     [kernel.vmlinux]  [k] sock_write_iter
     1.27%  hackbench     [kernel.vmlinux]  [k] kmalloc_slab
     1.22%  hackbench     [kernel.vmlinux]  [k] __rcu_read_lock
     1.20%  hackbench     [kernel.vmlinux]  [k] sock_read_iter
     1.18%  hackbench     [kernel.vmlinux]  [k] __entry_text_start
     1.15%  hackbench     [kernel.vmlinux]  [k] kmem_cache_alloc_node
     1.12%  hackbench     [kernel.vmlinux]  [k] unix_stream_recvmsg
     1.10%  hackbench     [kernel.vmlinux]  [k] obj_cgroup_charge
     0.98%  hackbench     [kernel.vmlinux]  [k] __kmem_cache_alloc_node
     0.97%  hackbench     libc.so.6         [.] write
     0.91%  hackbench     [kernel.vmlinux]  [k] exit_to_user_mode_prepare
     0.88%  hackbench     [kernel.vmlinux]  [k] __kmem_cache_free
     0.87%  hackbench     [kernel.vmlinux]  [k] syscall_trace_enter.constprop.0
     0.86%  hackbench     [kernel.vmlinux]  [k] __kmalloc_node_track_caller
     0.84%  hackbench     libc.so.6         [.] read
     0.81%  hackbench     [kernel.vmlinux]  [k] __lock_text_start
     0.80%  hackbench     [kernel.vmlinux]  [k] cache_from_obj
     0.74%  hackbench     [kernel.vmlinux]  [k] get_obj_cgroup_from_current
     0.73%  hackbench     [kernel.vmlinux]  [k] entry_SYSRETQ_unsafe_stack
     0.72%  hackbench     [kernel.vmlinux]  [k] unix_destruct_scm
     0.70%  hackbench     [kernel.vmlinux]  [k] get_partial_node
     0.69%  hackbench     [kernel.vmlinux]  [k] syscall_exit_to_user_mode_prepare
     0.65%  hackbench     [kernel.vmlinux]  [k] kfree
     0.63%  hackbench     [kernel.vmlinux]  [k] __unfreeze_partials
     0.60%  hackbench     [kernel.vmlinux]  [k] cmpxchg_double_slab.constprop.0.isra.0
     0.58%  hackbench     [kernel.vmlinux]  [k] skb_release_data
     0.56%  hackbench     [kernel.vmlinux]  [k] __virt_addr_valid
     0.56%  hackbench     [kernel.vmlinux]  [k] entry_SYSCALL_64_after_hwframe
     0.56%  hackbench     [kernel.vmlinux]  [k] __check_heap_object
     0.55%  hackbench     [kernel.vmlinux]  [k] sock_wfree
     0.54%  hackbench     [kernel.vmlinux]  [k] __audit_syscall_entry
     0.53%  hackbench     [kernel.vmlinux]  [k] ___slab_alloc
     0.53%  hackbench     [kernel.vmlinux]  [k] check_stack_object
     0.52%  hackbench     [kernel.vmlinux]  [k] bpf_lsm_file_permission

I have also attached flamegraph-6.1.8.svg.

If you need more information, feel free to ask.

--
Thanks,
Hyeonggon

> -- 
> Michal Hocko
> SUSE Labs
  
Roman Gushchin Jan. 26, 2023, 2:02 a.m. UTC | #9
On Sat, Jan 21, 2023 at 12:39:42PM +0900, Hyeonggon Yoo wrote:
> In workloads where this_cpu operations are frequently performed,
> enabling DEBUG_PREEMPT may result in significant increase in
> runtime overhead due to frequent invocation of
> __this_cpu_preempt_check() function.
> 
> This can be demonstrated through benchmarks such as hackbench where this
> configuration results in a 10% reduction in performance, primarily due to
> the added overhead within memcg charging path.
> 
> Therefore, do not enable DEBUG_PREEMPT by default and make users aware
> of its potential impact on performance in some workloads.
> 
> hackbench-process-sockets
> 		      debug_preempt	 no_debug_preempt
> Amean     1       0.4743 (   0.00%)      0.4295 *   9.45%*
> Amean     4       1.4191 (   0.00%)      1.2650 *  10.86%*
> Amean     7       2.2677 (   0.00%)      2.0094 *  11.39%*
> Amean     12      3.6821 (   0.00%)      3.2115 *  12.78%*
> Amean     21      6.6752 (   0.00%)      5.7956 *  13.18%*
> Amean     30      9.6646 (   0.00%)      8.5197 *  11.85%*
> Amean     48     15.3363 (   0.00%)     13.5559 *  11.61%*
> Amean     79     24.8603 (   0.00%)     22.0597 *  11.27%*
> Amean     96     30.1240 (   0.00%)     26.8073 *  11.01%*
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Nice!

I checked with my very simple kmem performance test (1M 8-byte allocations),
and it shows a ~30% difference: 112319 us with vs 80836 us without.

Probably not that big for real workloads, but still nice to have.

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Thank you!
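The ~30% figure depends on which run is taken as the baseline. A small worked calculation, assuming "1M" means 10^6 iterations and that both numbers are total run times in microseconds:

```c
#include <assert.h>

/* Totals reported above for the kmem microbenchmark, in microseconds. */
#define WITH_DEBUG_PREEMPT_US     112319.0
#define WITHOUT_DEBUG_PREEMPT_US   80836.0

/* Time saved by disabling DEBUG_PREEMPT, as a percentage of the
 * DEBUG_PREEMPT=y run. */
static double time_saved_pct(double with, double without)
{
    assert(with > 0.0);
    return (with - without) / with * 100.0;
}

/* Extra time DEBUG_PREEMPT=y costs, relative to the DEBUG_PREEMPT=n run. */
static double slowdown_pct(double with, double without)
{
    assert(without > 0.0);
    return (with - without) / without * 100.0;
}

/* Average time per loop iteration in nanoseconds (1 us = 1000 ns).
 * Whatever else each iteration does (e.g. a free) is included. */
static double ns_per_iter(double total_us, double nr_iters)
{
    assert(nr_iters > 0.0);
    return total_us * 1000.0 / nr_iters;
}
```

The time saved comes out at about 28% of the DEBUG_PREEMPT=y run, while the slowdown relative to DEBUG_PREEMPT=n is about 39%; per iteration, that is roughly 112.3 ns versus 80.8 ns.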
  
Hyeonggon Yoo Jan. 27, 2023, 11:43 a.m. UTC | #10
On Wed, Jan 25, 2023 at 10:51:05AM +0100, Michal Hocko wrote:
> On Thu 26-01-23 00:41:15, Hyeonggon Yoo wrote:
> [...]
> > > Do you happen to have any perf data collected during those runs? I
> > > would be interested in the memcg side of things. Maybe we can do
> > > something better there.
> > 
> > Yes, below is performance data I've collected.
> > 
> > 6.1.8-debug-preempt-dirty
> > =========================
> >   Overhead  Command       Shared Object     Symbol
> > +    9.14%  hackbench        [kernel.vmlinux]  [k] check_preemption_disabled
> 
> Thanks! Could you just add callers that are showing in the profile for
> this call please?

-   14.56%     9.14%  hackbench        [kernel.vmlinux]  [k] check_preemption_disabled
   - 6.37% check_preemption_disabled
      + 3.48% mod_objcg_state
      + 1.10% obj_cgroup_charge
        1.02% refill_obj_stock
     0.67% memcg_slab_post_alloc_hook
     0.58% mod_objcg_state

According to perf, many memcg functions call this function,
because the __this_cpu_*() operations check whether preemption is disabled.

In include/linux/percpu-defs.h:

/*
 * Operations for contexts that are safe from preemption/interrupts.  These
 * operations verify that preemption is disabled.
 */
#define __this_cpu_read(pcp)                                            \
({                                                                      \
        __this_cpu_preempt_check("read");                               \
        raw_cpu_read(pcp);                                              \
})

#define __this_cpu_write(pcp, val)                                      \
({                                                                      \
        __this_cpu_preempt_check("write");                              \
        raw_cpu_write(pcp, val);                                        \
})

#define __this_cpu_add(pcp, val)                                        \
({                                                                      \
        __this_cpu_preempt_check("add");                                \
        raw_cpu_add(pcp, val);                                          \
})

In lib/smp_processor_id.c:

noinstr void __this_cpu_preempt_check(const char *op)
{
        check_preemption_disabled("__this_cpu_", op);
}
EXPORT_SYMBOL(__this_cpu_preempt_check);


> -- 
> Michal Hocko
> SUSE Labs
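The two excerpts above show that each __this_cpu_* operation is a preemption check followed by the matching raw_cpu_* operation. The userspace sketch below mimics that structure so the cost model is visible: the check runs on every call, even when preemption is correctly disabled. All `sim_` names are hypothetical stand-ins, a plain variable replaces real per-CPU storage, and the warning text is only loosely modelled on the kernel's.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace stand-ins for kernel state (hypothetical). */
static int sim_preempt_count;      /* > 0 means "preemption disabled" */
static unsigned long nr_checks;    /* how often the debug check ran */
static unsigned long nr_warnings;  /* how often it found a problem */

static void sim_preempt_disable(void) { sim_preempt_count++; }
static void sim_preempt_enable(void)  { sim_preempt_count--; }

/* Rough analogue of check_preemption_disabled(): it runs on every
 * checked operation, which is where the profiled overhead comes from. */
static void sim_this_cpu_preempt_check(const char *op)
{
    nr_checks++;
    if (sim_preempt_count == 0) {
        nr_warnings++;
        fprintf(stderr, "BUG: using __this_cpu_%s() in preemptible code\n", op);
    }
}

/* raw_cpu_* analogue: no check at all. */
#define sim_raw_cpu_read(pcp)      (pcp)
#define sim_raw_cpu_add(pcp, val)  ((pcp) += (val))

/* __this_cpu_* analogue: check, then the raw operation. The comma
 * operator stands in for the kernel's GNU statement expressions. */
#define sim_this_cpu_read(pcp) \
    (sim_this_cpu_preempt_check("read"), sim_raw_cpu_read(pcp))
#define sim_this_cpu_add(pcp, val) \
    (sim_this_cpu_preempt_check("add"), sim_raw_cpu_add(pcp, val))
```

Calling sim_this_cpu_add() with the count raised only bumps nr_checks; calling it with the count at zero also bumps nr_warnings; sim_raw_cpu_add() skips both, which is the trade-off discussed later in the thread.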
  
Hyeonggon Yoo Jan. 27, 2023, 11:45 a.m. UTC | #11
On Wed, Jan 25, 2023 at 06:02:04PM -0800, Roman Gushchin wrote:
> On Sat, Jan 21, 2023 at 12:39:42PM +0900, Hyeonggon Yoo wrote:
> > In workloads where this_cpu operations are frequently performed,
> > enabling DEBUG_PREEMPT may result in significant increase in
> > runtime overhead due to frequent invocation of
> > __this_cpu_preempt_check() function.
> > 
> > This can be demonstrated through benchmarks such as hackbench where this
> > configuration results in a 10% reduction in performance, primarily due to
> > the added overhead within memcg charging path.
> > 
> > Therefore, do not enable DEBUG_PREEMPT by default and make users aware
> > of its potential impact on performance in some workloads.
> > 
> > hackbench-process-sockets
> > 		      debug_preempt	 no_debug_preempt
> > Amean     1       0.4743 (   0.00%)      0.4295 *   9.45%*
> > Amean     4       1.4191 (   0.00%)      1.2650 *  10.86%*
> > Amean     7       2.2677 (   0.00%)      2.0094 *  11.39%*
> > Amean     12      3.6821 (   0.00%)      3.2115 *  12.78%*
> > Amean     21      6.6752 (   0.00%)      5.7956 *  13.18%*
> > Amean     30      9.6646 (   0.00%)      8.5197 *  11.85%*
> > Amean     48     15.3363 (   0.00%)     13.5559 *  11.61%*
> > Amean     79     24.8603 (   0.00%)     22.0597 *  11.27%*
> > Amean     96     30.1240 (   0.00%)     26.8073 *  11.01%*
> > 
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> 
> Nice!
> 
> I checked with my very simple kmem performance test (1M 8-byte allocations),
> and it shows a ~30% difference: 112319 us with vs 80836 us without.

Hello Roman,

Oh, it has a higher impact on the microbenchmark.

> 
> Probably not that big for real workloads, but still nice to have.
> 
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Thank you for kindly measuring the impact of this patch
and giving an ack!

> Thank you!
> 

--
Thanks,
Hyeonggon
  
Michal Hocko Jan. 27, 2023, 12:33 p.m. UTC | #12
On Fri 27-01-23 20:43:20, Hyeonggon Yoo wrote:
> On Wed, Jan 25, 2023 at 10:51:05AM +0100, Michal Hocko wrote:
> > On Thu 26-01-23 00:41:15, Hyeonggon Yoo wrote:
> > [...]
> > > > Do you happen to have any perf data collected during those runs? I
> > > > would be interested in the memcg side of things. Maybe we can do
> > > > something better there.
> > > 
> > > Yes, below is performance data I've collected.
> > > 
> > > 6.1.8-debug-preempt-dirty
> > > =========================
> > >   Overhead  Command       Shared Object     Symbol
> > > +    9.14%  hackbench        [kernel.vmlinux]  [k] check_preemption_disabled
> > 
> > Thanks! Could you just add callers that are showing in the profile for
> > this call please?
> 
> -   14.56%     9.14%  hackbench        [kernel.vmlinux]  [k] check_preemption_disabled
>    - 6.37% check_preemption_disabled
>       + 3.48% mod_objcg_state
>       + 1.10% obj_cgroup_charge
>         1.02% refill_obj_stock
>      0.67% memcg_slab_post_alloc_hook
>      0.58% mod_objcg_state
> 
> According to perf, many memcg functions call this function,
> because the __this_cpu_*() operations check whether preemption is disabled.

OK, I see. Thanks! I was wondering whether we could optimize for that, but
IIUC __this_cpu* is already the optimized form. mod_objcg_state is
already called with local_lock held, so raw_cpu* could be used in that path,
but I guess it is not really worth optimizing just so that a debug
compile option benefits.
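The trade-off above can be sketched by counting checks only. In this hypothetical userspace model, pattern 1 checks every update, as __this_cpu_add() does under DEBUG_PREEMPT; pattern 2 checks once on entry and then updates raw. The single entry-time check is our own assumption about how such a path might keep some protection; it is not something proposed in the thread. NR_STATS and all other names are illustrative.

```c
#include <assert.h>

#define NR_STATS 4

/* Hypothetical stand-ins: a flag meaning "preemption is disabled",
 * e.g. because a local_lock is held, and a counter of debug checks. */
static int sim_preempt_disabled;
static unsigned long nr_checks;

static void debug_check(void)
{
    nr_checks++;
    assert(sim_preempt_disabled);  /* the kernel would warn, not abort */
}

/* Pattern 1: every update is individually checked. */
static void mod_stats_checked(long stats[NR_STATS], long val)
{
    for (int i = 0; i < NR_STATS; i++) {
        debug_check();
        stats[i] += val;
    }
}

/* Pattern 2: check the calling context once, then use raw updates.
 * Correctness now relies on every caller really holding the lock. */
static void mod_stats_raw(long stats[NR_STATS], long val)
{
    debug_check();
    for (int i = 0; i < NR_STATS; i++)
        stats[i] += val;
}
```

For N updates, pattern 1 pays N checks and pattern 2 pays one; pattern 2 also stops catching a future caller that forgets the lock, which is part of why such a change may not pay off for a debug-only option.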
  
Davidlohr Bueso Feb. 2, 2023, 3:09 a.m. UTC | #13
On Sat, 21 Jan 2023, Hyeonggon Yoo wrote:

>In workloads where this_cpu operations are frequently performed,
>enabling DEBUG_PREEMPT may result in significant increase in
>runtime overhead due to frequent invocation of
>__this_cpu_preempt_check() function.
>
>This can be demonstrated through benchmarks such as hackbench where this
>configuration results in a 10% reduction in performance, primarily due to
>the added overhead within memcg charging path.
>
>Therefore, do not enable DEBUG_PREEMPT by default and make users aware
>of its potential impact on performance in some workloads.
>
>hackbench-process-sockets
>		      debug_preempt	 no_debug_preempt
>Amean     1       0.4743 (   0.00%)      0.4295 *   9.45%*
>Amean     4       1.4191 (   0.00%)      1.2650 *  10.86%*
>Amean     7       2.2677 (   0.00%)      2.0094 *  11.39%*
>Amean     12      3.6821 (   0.00%)      3.2115 *  12.78%*
>Amean     21      6.6752 (   0.00%)      5.7956 *  13.18%*
>Amean     30      9.6646 (   0.00%)      8.5197 *  11.85%*
>Amean     48     15.3363 (   0.00%)     13.5559 *  11.61%*
>Amean     79     24.8603 (   0.00%)     22.0597 *  11.27%*
>Amean     96     30.1240 (   0.00%)     26.8073 *  11.01%*
>
>Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Acked-by: Davidlohr Bueso <dave@stgolabs.net>
  

Patch

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ddbfac2adf9c..f6f845a4b9ec 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1176,13 +1176,16 @@  config DEBUG_TIMEKEEPING
 config DEBUG_PREEMPT
 	bool "Debug preemptible kernel"
 	depends on DEBUG_KERNEL && PREEMPTION && TRACE_IRQFLAGS_SUPPORT
-	default y
 	help
 	  If you say Y here then the kernel will use a debug variant of the
 	  commonly used smp_processor_id() function and will print warnings
 	  if kernel code uses it in a preemption-unsafe way. Also, the kernel
 	  will detect preemption count underflows.
 
+	  This option has potential to introduce high runtime overhead,
+	  depending on workload as it triggers debugging routines for each
+	  this_cpu operation. It should only be used for debugging purposes.
+
 menu "Lock Debugging (spinlocks, mutexes, etc...)"
 
 config LOCK_DEBUGGING_SUPPORT
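The help text also says DEBUG_PREEMPT detects preemption count underflows. A minimal userspace sketch of that idea follows; the `sim_` names are hypothetical stand-ins, and in mainline the equivalent check appears to live in preempt_count_sub() in kernel/sched/core.c, where it differs in detail.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of the preempt count; not kernel code. */
static unsigned int sim_preempt_count;
static unsigned long nr_underflows;

static void sim_preempt_disable(void)
{
    sim_preempt_count++;
}

/* With debugging on, an unbalanced enable is reported and ignored
 * instead of wrapping the unsigned counter around to its maximum. */
static void sim_preempt_enable(void)
{
    if (sim_preempt_count == 0) {
        nr_underflows++;
        fprintf(stderr, "WARNING: preempt count underflow\n");
        return;
    }
    sim_preempt_count--;
}
```

A balanced disable/enable pair leaves the count at zero with no report; a second, unbalanced enable is counted in nr_underflows and the count stays at zero.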