[v5,3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT

Message ID 20221116075929.453876-4-jstultz@google.com
State New
Series: Softirq aware -rt scheduling

Commit Message

John Stultz Nov. 16, 2022, 7:59 a.m. UTC
  From: Pavankumar Kondeti <pkondeti@codeaurora.org>

Defer softirq processing to ksoftirqd if an RT task is running
or queued on the current CPU. This complements the RT task
placement algorithm, which tries to find a CPU that is not
currently busy with softirqs.

Currently only the NET_TX, NET_RX, BLOCK and IRQ_POLL softirqs
are deferred, as they can potentially run for a long time.

Additionally, this patch stubs out the ksoftirqd_running() logic
in the CONFIG_RT_SOFTIRQ_AWARE_SCHED case: deferral keeps
ksoftirqd runnable more of the time, which would otherwise cause
that logic to punt even shorter-running softirqs to ksoftirqd
instead of processing them immediately. With the stub in place,
the potentially long-running softirqs are deferred, but the
shorter-running ones can still run immediately.

This patch includes folded-in fixes by:
  Lingutla Chandrasekhar <clingutla@codeaurora.org>
  Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
  J. Avila <elavila@google.com>

Cc: John Dias <joaodias@google.com>
Cc: Connor O'Brien <connoro@google.com>
Cc: Rick Yiu <rickyiu@google.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Qais Yousef <qyousef@google.com>
Cc: Chris Redpath <chris.redpath@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: kernel-team@android.com
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[satyap@codeaurora.org: trivial merge conflict resolution.]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
[elavila: Port to mainline, squash with bugfix]
Signed-off-by: J. Avila <elavila@google.com>
[jstultz: Rebase to linus/HEAD, minor rearranging of code,
 included bug fix Reported-by: Qais Yousef <qais.yousef@arm.com> ]
Signed-off-by: John Stultz <jstultz@google.com>
---
v4:
* Fix commit message to accurately note long-running softirqs
  (suggested by Qais)
* Switch to using rt_task(current) (suggested by Qais)
v5:
* Switch to using CONFIG_RT_SOFTIRQ_AWARE_SCHED (suggested by
  Joel Fernandes <joel@joelfernandes.org>)
---
 kernel/softirq.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)
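
For context, the LONG_SOFTIRQ_MASK used in this patch is introduced
by an earlier patch in the series (not included here). Going by the
softirqs named in the commit message, its shape is presumably
something like the following sketch (the actual definition may
differ):

	/* Softirqs treated as potentially long-running (sketch;
	 * the real definition lives in an earlier patch). */
	#define LONG_SOFTIRQ_MASK	(BIT(NET_TX_SOFTIRQ)  | \
					 BIT(NET_RX_SOFTIRQ)  | \
					 BIT(BLOCK_SOFTIRQ)   | \
					 BIT(IRQ_POLL_SOFTIRQ))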
  

Comments

Peter Zijlstra Nov. 16, 2022, 10:37 a.m. UTC | #1
On Wed, Nov 16, 2022 at 07:59:28AM +0000, John Stultz wrote:
> From: Pavankumar Kondeti <pkondeti@codeaurora.org>
> 
> Defer softirq processing to ksoftirqd if an RT task is running
> or queued on the current CPU. This complements the RT task
> placement algorithm, which tries to find a CPU that is not
> currently busy with softirqs.
> 
> Currently only the NET_TX, NET_RX, BLOCK and IRQ_POLL softirqs
> are deferred, as they can potentially run for a long time.
> 
> Additionally, this patch stubs out the ksoftirqd_running() logic
> in the CONFIG_RT_SOFTIRQ_AWARE_SCHED case: deferral keeps
> ksoftirqd runnable more of the time, which would otherwise cause
> that logic to punt even shorter-running softirqs to ksoftirqd
> instead of processing them immediately. With the stub in place,
> the potentially long-running softirqs are deferred, but the
> shorter-running ones can still run immediately.

So I'm hating on the new config space, and dubious of the placement
logic (I'm thinking much the same gain can be had when actual softirq
processing stops quickly when we want to reschedule).

However I still have these here patches (revived from the dead):

  https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=core/softirq

That fix some of this same... I think the last time this fell on its
face due to regressions and me not having time/energy to chase them down
and the author of the hack-of-the-day solution walking off or something
like that.

I think this aspect of the whole softirq thing really needs fixing first
and not hidden under some obscure CONFIG symbol.
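
For reference, the core idea in that branch - later echoed in John's
mention of softirq_needs_break() below - is to let the softirq loop
bail out early and punt the remainder to ksoftirqd. A rough sketch of
the concept (not Peter's actual code; the signature and the 2ms
budget here are invented placeholders):

	/* Stop softirq processing once a reschedule is due or the
	 * time budget is spent; the caller then defers the rest of
	 * the pending softirqs to ksoftirqd. */
	static inline bool softirq_needs_break(u64 start_ns)
	{
		return need_resched() ||
		       ktime_get_ns() - start_ns > 2 * NSEC_PER_MSEC;
	}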
  
John Stultz Nov. 18, 2022, 8:18 a.m. UTC | #2
On Wed, Nov 16, 2022 at 2:37 AM Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Nov 16, 2022 at 07:59:28AM +0000, John Stultz wrote:
> > From: Pavankumar Kondeti <pkondeti@codeaurora.org>
> >
> > Defer softirq processing to ksoftirqd if an RT task is running
> > or queued on the current CPU. This complements the RT task
> > placement algorithm, which tries to find a CPU that is not
> > currently busy with softirqs.
> >
> > Currently only the NET_TX, NET_RX, BLOCK and IRQ_POLL softirqs
> > are deferred, as they can potentially run for a long time.
> >
> > Additionally, this patch stubs out the ksoftirqd_running() logic
> > in the CONFIG_RT_SOFTIRQ_AWARE_SCHED case: deferral keeps
> > ksoftirqd runnable more of the time, which would otherwise cause
> > that logic to punt even shorter-running softirqs to ksoftirqd
> > instead of processing them immediately. With the stub in place,
> > the potentially long-running softirqs are deferred, but the
> > shorter-running ones can still run immediately.
>
> So I'm hating on the new config space, and dubious of the placement
> logic (I'm thinking much the same gain can be had when actual softirq
> processing stops quickly when we want to reschedule).

Hey Peter!
  Thanks for taking the time to look this over and provide feedback!

> However I still have these here patches (revived from the dead):
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=core/softirq
>
> That fix some of this same... I think the last time this fell on its
> face due to regressions and me not having time/energy to chase them down
> and the author of the hack-of-the-day solution walking off or something
> like that.

So I've taken a look at your patch series and backported it to a
5.10 device kernel (extending it to use softirq_needs_break() for
the BLOCK softirq as well), where I can compare its behavior
against the tuned code that's shipping.

At first glance at your patches, I was optimistic, as it's great
that they nicely time-bound the number of softirqs run, but I fret
that they don't help if a single softirq invocation takes longer
than we'd like. (And indeed, I can see single BLOCK softirq
invocations taking quite some time - >16ms in my last run!)
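
For reference, one way to catch such invocations is to hang probes
off the softirq_entry/softirq_exit tracepoints. A minimal sketch,
assuming the probe code is built into the kernel (these tracepoints
are not necessarily exported to modules) and using an arbitrary 10ms
reporting threshold:

	#include <linux/init.h>
	#include <linux/ktime.h>
	#include <linux/percpu.h>
	#include <linux/printk.h>
	#include <trace/events/irq.h>

	static DEFINE_PER_CPU(u64, softirq_start_ns);

	/* Record the entry timestamp of the softirq on this CPU. */
	static void probe_softirq_entry(void *data, unsigned int vec_nr)
	{
		this_cpu_write(softirq_start_ns, ktime_get_ns());
	}

	/* Report any single invocation that overran the budget. */
	static void probe_softirq_exit(void *data, unsigned int vec_nr)
	{
		u64 delta = ktime_get_ns() - this_cpu_read(softirq_start_ns);

		if (delta > 10 * NSEC_PER_MSEC)
			pr_warn("softirq %u ran for %llu us\n",
				vec_nr, delta / NSEC_PER_USEC);
	}

	static int __init softirq_lat_init(void)
	{
		register_trace_softirq_entry(probe_softirq_entry, NULL);
		register_trace_softirq_exit(probe_softirq_exit, NULL);
		return 0;
	}
	core_initcall(softirq_lat_init);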

Obviously "fix that driver to not do that" would be the standard
response, but it does become a constant wack-a-mole game. Sort of like
some of the philosophy around some of the PREEMPT_RT changes, where a
broad solution is used to avoid having to fix and maintain latencies
in code across the kernel, this task placement solution helps avoid rt
latencies being dependent on the continual perfection of all drivers
used.

> I think this aspect of the whole softirq thing really needs fixing first
> and not hidden under some obscure CONFIG symbol.

Sure - and while I am happy to help with your current patch
series, as I do think it should improve things, I'm not sure it
will let us move away from the softirq-RT placement optimization
patches. Any ideas for additional approaches that would be more
agreeable to you?

We could pull most of the logic out from under the CONFIG option,
and maybe let userland set the long-softirq mask that RT tasks
would avoid being scheduled onto? Though I know folks would
probably like to avoid the cpupri_find_fitness() test logic
entirely if they could, so I'll have to think about how to
efficiently short-cut that when the mask is null.
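
A purely hypothetical sketch of that idea - exposing the mask via a
sysctl so userland decides which softirqs the RT placement logic
avoids (all names here are invented, and the null-mask shortcut in
the placement path is left out):

	#include <linux/init.h>
	#include <linux/sysctl.h>

	static unsigned int sysctl_long_softirq_mask = LONG_SOFTIRQ_MASK;

	static struct ctl_table long_softirq_table[] = {
		{
			.procname	= "long_softirq_mask",
			.data		= &sysctl_long_softirq_mask,
			.maxlen		= sizeof(unsigned int),
			.mode		= 0644,
			.proc_handler	= proc_douintvec,
		},
		{}
	};

	static int __init long_softirq_sysctl_init(void)
	{
		register_sysctl("kernel", long_softirq_table);
		return 0;
	}
	late_initcall(long_softirq_sysctl_init);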

Anyway, thanks so much again for the feedback!
-john
  
Joel Fernandes Dec. 8, 2022, 5:15 p.m. UTC | #3
Hi John,

On Wed, Nov 16, 2022 at 07:59:28AM +0000, John Stultz wrote:
> From: Pavankumar Kondeti <pkondeti@codeaurora.org>
> 
> Defer softirq processing to ksoftirqd if an RT task is running
> or queued on the current CPU. This complements the RT task
> placement algorithm, which tries to find a CPU that is not
> currently busy with softirqs.
> 
> Currently only the NET_TX, NET_RX, BLOCK and IRQ_POLL softirqs
> are deferred, as they can potentially run for a long time.
> 
> Additionally, this patch stubs out the ksoftirqd_running() logic
> in the CONFIG_RT_SOFTIRQ_AWARE_SCHED case: deferral keeps
> ksoftirqd runnable more of the time, which would otherwise cause
> that logic to punt even shorter-running softirqs to ksoftirqd
> instead of processing them immediately. With the stub in place,
> the potentially long-running softirqs are deferred, but the
> shorter-running ones can still run immediately.
> 
> This patch includes folded-in fixes by:
>   Lingutla Chandrasekhar <clingutla@codeaurora.org>
>   Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
>   J. Avila <elavila@google.com>
> 
[...]
> ---
> v4:
> * Fix commit message to accurately note long-running softirqs
>   (suggested by Qais)
> * Switch to using rt_task(current) (suggested by Qais)
> v5:
> * Switch to using CONFIG_RT_SOFTIRQ_AWARE_SCHED (suggested by
>   Joel Fernandes <joel@joelfernandes.org>)
> ---
>  kernel/softirq.c | 29 +++++++++++++++++++++++++++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index dd92ce8f771b..5db2afd0be68 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -95,6 +95,7 @@ static void wakeup_softirqd(void)
>  		wake_up_process(tsk);
>  }
>  
> +#ifndef CONFIG_RT_SOFTIRQ_AWARE_SCHED
>  /*
>   * If ksoftirqd is scheduled, we do not want to process pending softirqs
>   * right now. Let ksoftirqd handle this at its own rate, to get fairness,
> @@ -109,6 +110,9 @@ static bool ksoftirqd_running(unsigned long pending)
>  		return false;
>  	return tsk && task_is_running(tsk) && !__kthread_should_park(tsk);
>  }
> +#else
> +#define ksoftirqd_running(pending) (false)
> +#endif /* CONFIG_RT_SOFTIRQ_AWARE_SCHED */
>  
>  #ifdef CONFIG_TRACE_IRQFLAGS
>  DEFINE_PER_CPU(int, hardirqs_enabled);
> @@ -540,6 +544,21 @@ static inline bool lockdep_softirq_start(void) { return false; }
>  static inline void lockdep_softirq_end(bool in_hardirq) { }
>  #endif
>  
> +#ifdef CONFIG_RT_SOFTIRQ_AWARE_SCHED
> +static __u32 softirq_deferred_for_rt(__u32 *pending)
> +{
> +	__u32 deferred = 0;
> +
> +	if (rt_task(current)) {

Over here, I suggest also checking dl_task(current). SCHED_DEADLINE
is being used for the 'sugov' (schedutil governor) threads on
currently shipping products, and DL should be treated as higher
priority than RT.

On the other hand, the DL counterpart of the softirq-avoiding task
placement is not there yet, but at least softirqs can be deferred
for DL threads.
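
With that folded in, the helper would look roughly like this (a
sketch of the suggestion applied to the hunk above):

	static __u32 softirq_deferred_for_rt(__u32 *pending)
	{
		__u32 deferred = 0;

		/* Defer long softirqs for SCHED_DEADLINE tasks
		 * (e.g. sugov) as well as for RT tasks. */
		if (rt_task(current) || dl_task(current)) {
			deferred = *pending & LONG_SOFTIRQ_MASK;
			*pending &= ~LONG_SOFTIRQ_MASK;
		}
		return deferred;
	}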

> +		deferred = *pending & LONG_SOFTIRQ_MASK;
> +		*pending &= ~LONG_SOFTIRQ_MASK;
> +	}
> +	return deferred;
> +}
> +#else
> +#define softirq_deferred_for_rt(x) (0)
> +#endif
> +
>  asmlinkage __visible void __softirq_entry __do_softirq(void)
>  {
>  	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
> @@ -547,6 +566,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  	int max_restart = MAX_SOFTIRQ_RESTART;
>  	struct softirq_action *h;
>  	bool in_hardirq;
> +	__u32 deferred;
>  	__u32 pending;
>  	int softirq_bit;
>  
> @@ -558,14 +578,16 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  	current->flags &= ~PF_MEMALLOC;
>  
>  	pending = local_softirq_pending();
> +	deferred = softirq_deferred_for_rt(&pending);
>  
>  	softirq_handle_begin();
> +
>  	in_hardirq = lockdep_softirq_start();
>  	account_softirq_enter(current);
>  
>  restart:
>  	/* Reset the pending bitmask before enabling irqs */
> -	set_softirq_pending(0);
> +	set_softirq_pending(deferred);

Over here, the local pending mask is set to whatever was deferred [1]..

>  	set_active_softirqs(pending);
>  
>  	local_irq_enable();
> @@ -604,13 +626,16 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  	local_irq_disable();
>  
>  	pending = local_softirq_pending();
> +	deferred = softirq_deferred_for_rt(&pending);

... so over here, pending should always contain at least the
deferred bits if deferred is set (unless something woke up and
executed softirqs while interrupts are enabled -- which should not
happen because BH is disabled throughout) [2].

> +
>  	if (pending) {
>  		if (time_before(jiffies, end) && !need_resched() &&
>  		    --max_restart)

So then here, pending (which also now contains deferred due to [1]
[2]) should already take care of waking up ksoftirqd, except that
perhaps it should be like this:

  	if (pending) {
		// Also, don't restart if something was deferred; let
		// the RT task breathe a bit.
  		if (time_before(jiffies, end) && !need_resched() &&
			!deferred && --max_restart)
			goto restart;

		wakeup_softirqd();
	}

Or, will that not work?

My point being that we probably don't want to go through the
retry cycle if something was deferred, since the plan is to wake
up ksoftirqd in such cases.

> +	if (pending | deferred)
>  		wakeup_softirqd();

And then perhaps this check can be removed.
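
Folding the two changes together - while keeping the combined wakeup
check, since pending can be zero while deferred bits remain (e.g. if
only long softirqs were re-raised during the loop) - the tail of
__do_softirq() might read (a sketch, not tested):

	pending = local_softirq_pending();
	deferred = softirq_deferred_for_rt(&pending);

	/* Don't restart if anything was deferred; let the RT task
	 * breathe and let ksoftirqd pick up the backlog. */
	if (pending && !deferred &&
	    time_before(jiffies, end) && !need_resched() &&
	    --max_restart)
		goto restart;

	if (pending | deferred)
		wakeup_softirqd();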

Thoughts?

thanks,

 - Joel



>  			goto restart;
> +	}
>  
> +	if (pending | deferred)
>  		wakeup_softirqd();
> -	}
>  
>  	account_softirq_exit(current);
>  	lockdep_softirq_end(in_hardirq);
> -- 
> 2.38.1.431.g37b22c650d-goog
>
  

Patch

diff --git a/kernel/softirq.c b/kernel/softirq.c
index dd92ce8f771b..5db2afd0be68 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -95,6 +95,7 @@  static void wakeup_softirqd(void)
 		wake_up_process(tsk);
 }
 
+#ifndef CONFIG_RT_SOFTIRQ_AWARE_SCHED
 /*
  * If ksoftirqd is scheduled, we do not want to process pending softirqs
  * right now. Let ksoftirqd handle this at its own rate, to get fairness,
@@ -109,6 +110,9 @@  static bool ksoftirqd_running(unsigned long pending)
 		return false;
 	return tsk && task_is_running(tsk) && !__kthread_should_park(tsk);
 }
+#else
+#define ksoftirqd_running(pending) (false)
+#endif /* CONFIG_RT_SOFTIRQ_AWARE_SCHED */
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_PER_CPU(int, hardirqs_enabled);
@@ -540,6 +544,21 @@  static inline bool lockdep_softirq_start(void) { return false; }
 static inline void lockdep_softirq_end(bool in_hardirq) { }
 #endif
 
+#ifdef CONFIG_RT_SOFTIRQ_AWARE_SCHED
+static __u32 softirq_deferred_for_rt(__u32 *pending)
+{
+	__u32 deferred = 0;
+
+	if (rt_task(current)) {
+		deferred = *pending & LONG_SOFTIRQ_MASK;
+		*pending &= ~LONG_SOFTIRQ_MASK;
+	}
+	return deferred;
+}
+#else
+#define softirq_deferred_for_rt(x) (0)
+#endif
+
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
 	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
@@ -547,6 +566,7 @@  asmlinkage __visible void __softirq_entry __do_softirq(void)
 	int max_restart = MAX_SOFTIRQ_RESTART;
 	struct softirq_action *h;
 	bool in_hardirq;
+	__u32 deferred;
 	__u32 pending;
 	int softirq_bit;
 
@@ -558,14 +578,16 @@  asmlinkage __visible void __softirq_entry __do_softirq(void)
 	current->flags &= ~PF_MEMALLOC;
 
 	pending = local_softirq_pending();
+	deferred = softirq_deferred_for_rt(&pending);
 
 	softirq_handle_begin();
+
 	in_hardirq = lockdep_softirq_start();
 	account_softirq_enter(current);
 
 restart:
 	/* Reset the pending bitmask before enabling irqs */
-	set_softirq_pending(0);
+	set_softirq_pending(deferred);
 	set_active_softirqs(pending);
 
 	local_irq_enable();
@@ -604,13 +626,16 @@  asmlinkage __visible void __softirq_entry __do_softirq(void)
 	local_irq_disable();
 
 	pending = local_softirq_pending();
+	deferred = softirq_deferred_for_rt(&pending);
+
 	if (pending) {
 		if (time_before(jiffies, end) && !need_resched() &&
 		    --max_restart)
 			goto restart;
+	}
 
+	if (pending | deferred)
 		wakeup_softirqd();
-	}
 
 	account_softirq_exit(current);
 	lockdep_softirq_end(in_hardirq);