[8/9] irqentry: define irqentry_exit_allow_resched()

Message ID 20230403052233.1880567-9-ankur.a.arora@oracle.com
State New
Series x86/clear_huge_page: multi-page clearing

Commit Message

Ankur Arora April 3, 2023, 5:22 a.m. UTC
  Allow threads marked TIF_ALLOW_RESCHED to be rescheduled in irqexit.

This is only necessary under !preempt_model_preemptible() for which
we reuse the same logic as irqentry_exit_cond_resched().

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 kernel/entry/common.c |  8 ++++++++
 kernel/sched/core.c   | 36 +++++++++++++++++++++---------------
 2 files changed, 29 insertions(+), 15 deletions(-)
  

Comments

Thomas Gleixner April 4, 2023, 9:38 a.m. UTC | #1
On Sun, Apr 02 2023 at 22:22, Ankur Arora wrote:
> Allow threads marked TIF_ALLOW_RESCHED to be rescheduled in irqexit.
>
> This is only necessary under !preempt_model_preemptible() for which
> we reuse the same logic as irqentry_exit_cond_resched().

This tells what this patch is doing but completely fails to explain why
this is necessary and useful.

Thanks,

        tglx
  
Ankur Arora April 5, 2023, 5:29 a.m. UTC | #2
Thomas Gleixner <tglx@linutronix.de> writes:

> On Sun, Apr 02 2023 at 22:22, Ankur Arora wrote:
>> Allow threads marked TIF_ALLOW_RESCHED to be rescheduled in irqexit.
>>
>> This is only necessary under !preempt_model_preemptible() for which
>> we reuse the same logic as irqentry_exit_cond_resched().
>
> This tells what this patch is doing but completely fails to explain why
> this is necessary and useful.

Thanks. Yeah, it does seem to miss that completely.

Needs some massaging, but does something like this clarify its purpose?

On kernels with PREEMPT_NONE/PREEMPT_VOLUNTARY, rescheduling of kernel
tasks happens when they allow it -- for instance by synchronously
calling cond_resched() in a long running task.

There are cases where it is not convenient to periodically call
cond_resched() -- for instance when executing a potentially
long running instruction (such as REP STOSB on x86).

To handle kernel code sections which can be safely preempted, but
cannot explicitly call cond_resched(), allow them to mark themselves
TIF_ALLOW_RESCHED.

Contexts marked this way (via allow_resched()) can be rescheduled in the
irqexit path. This is, of course, only needed with
!preempt_model_preemptible(), and the rescheduling logic is functionally
the same as irqentry_exit_cond_resched().

--
ankur
  
Peter Zijlstra April 5, 2023, 8:22 p.m. UTC | #3
On Sun, Apr 02, 2023 at 10:22:32PM -0700, Ankur Arora wrote:
> Allow threads marked TIF_ALLOW_RESCHED to be rescheduled in irqexit.
> 
> This is only necessary under !preempt_model_preemptible() for which
> we reuse the same logic as irqentry_exit_cond_resched().
> 
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>  kernel/entry/common.c |  8 ++++++++
>  kernel/sched/core.c   | 36 +++++++++++++++++++++---------------
>  2 files changed, 29 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index be61332c66b5..f1005595ebe7 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -390,6 +390,9 @@ void raw_irqentry_exit_cond_resched(void)
>  			preempt_schedule_irq();
>  	}
>  }
> +
> +void irqentry_exit_allow_resched(void) __alias(raw_irqentry_exit_cond_resched);

Because typing raw_irqentry_exit_cond_resched() was too much effort?

> +
>  #ifdef CONFIG_PREEMPT_DYNAMIC
>  #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
>  DEFINE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
> @@ -431,6 +434,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
>  		instrumentation_begin();
>  		if (IS_ENABLED(CONFIG_PREEMPTION))
>  			irqentry_exit_cond_resched();
> +		/*
> +		 * We care about this clause only in the dynamic !preemptible case.
> +		 */
> +		if (unlikely(!preempt_model_preemptible() && resched_allowed()))
> +			irqentry_exit_allow_resched();

This is wrong, if we have dynamic preemption then we have
CONFIG_PREEMPTION and we'll have that irqentry_exit_cond_resched() call.

Basically what you've written above is something like:

	static_call(foo); // raw_foo() when A, NOP if !A
	if (!A)
		raw_foo();

And yeah, you've got the extra resched_allowed() thing in there, but
that doesn't make it any better -- this is all quite horrible.

What you really care about is the CONFIG_PREEMPTION=n case, but that you
don't actually handle.

>  		/* Covers both tracing and lockdep */
>  		trace_hardirqs_on();
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0d18c3969f90..11845a91b691 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c

> @@ -8597,28 +8599,32 @@ EXPORT_SYMBOL(__cond_resched_rwlock_write);
>   * SC:preempt_schedule
>   * SC:preempt_schedule_notrace
>   * SC:irqentry_exit_cond_resched
> + * SC:irqentry_exit_allow_resched
>   *
>   *
>   * NONE:
> - *   cond_resched               <- __cond_resched
> - *   might_resched              <- RET0
> - *   preempt_schedule           <- NOP
> - *   preempt_schedule_notrace   <- NOP
> - *   irqentry_exit_cond_resched <- NOP
> + *   cond_resched                <- __cond_resched
> + *   might_resched               <- RET0
> + *   preempt_schedule            <- NOP
> + *   preempt_schedule_notrace    <- NOP
> + *   irqentry_exit_cond_resched  <- NOP
> + *   irqentry_exit_allow_resched <- irqentry_exit_allow_resched
>   *
>   * VOLUNTARY:
> - *   cond_resched               <- __cond_resched
> - *   might_resched              <- __cond_resched
> - *   preempt_schedule           <- NOP
> - *   preempt_schedule_notrace   <- NOP
> - *   irqentry_exit_cond_resched <- NOP
> + *   cond_resched                <- __cond_resched
> + *   might_resched               <- __cond_resched
> + *   preempt_schedule            <- NOP
> + *   preempt_schedule_notrace    <- NOP
> + *   irqentry_exit_cond_resched  <- NOP
> + *   irqentry_exit_allow_resched <- irqentry_exit_allow_resched
>   *
>   * FULL:
> - *   cond_resched               <- RET0
> - *   might_resched              <- RET0
> - *   preempt_schedule           <- preempt_schedule
> - *   preempt_schedule_notrace   <- preempt_schedule_notrace
> - *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
> + *   cond_resched                <- RET0
> + *   might_resched               <- RET0
> + *   preempt_schedule            <- preempt_schedule
> + *   preempt_schedule_notrace    <- preempt_schedule_notrace
> + *   irqentry_exit_cond_resched  <- irqentry_exit_cond_resched
> + *   irqentry_exit_allow_resched <- NOP
>   */

This ^ is all complete nonsense.. irqentry_exit_allow_resched() is not
a static call. It's an alias of raw_irqentry_exit_cond_resched which
circumvents the static call entirely.
  
Ankur Arora April 6, 2023, 4:56 p.m. UTC | #4
Peter Zijlstra <peterz@infradead.org> writes:

> On Sun, Apr 02, 2023 at 10:22:32PM -0700, Ankur Arora wrote:
>> Allow threads marked TIF_ALLOW_RESCHED to be rescheduled in irqexit.
>>
>> This is only necessary under !preempt_model_preemptible() for which
>> we reuse the same logic as irqentry_exit_cond_resched().
>>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>>  kernel/entry/common.c |  8 ++++++++
>>  kernel/sched/core.c   | 36 +++++++++++++++++++++---------------
>>  2 files changed, 29 insertions(+), 15 deletions(-)
>>
>> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
>> index be61332c66b5..f1005595ebe7 100644
>> --- a/kernel/entry/common.c
>> +++ b/kernel/entry/common.c
>> @@ -390,6 +390,9 @@ void raw_irqentry_exit_cond_resched(void)
>>  			preempt_schedule_irq();
>>  	}
>>  }
>> +
>> +void irqentry_exit_allow_resched(void) __alias(raw_irqentry_exit_cond_resched);
>
> Because typing raw_irqentry_exit_cond_resched() was too much effort?

Probably unnecessary, but I wanted to underscore that irqentry_exit_allow_resched()
handles a separate user path.

>> +
>>  #ifdef CONFIG_PREEMPT_DYNAMIC
>>  #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
>>  DEFINE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
>> @@ -431,6 +434,11 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
>>  		instrumentation_begin();
>>  		if (IS_ENABLED(CONFIG_PREEMPTION))
>>  			irqentry_exit_cond_resched();
>> +		/*
>> +		 * We care about this clause only in the dynamic !preemptible case.
>> +		 */
>> +		if (unlikely(!preempt_model_preemptible() && resched_allowed()))
>> +			irqentry_exit_allow_resched();
>
> This is wrong, if we have dynamic preemption then we have
> CONFIG_PREEMPTION and we'll have that irqentry_exit_cond_resched() call.
>
> Basically what you've written above is something like:
>
> 	static_call(foo); // raw_foo() when A, NOP if !A

Right, so:

CONFIG_PREEMPTION: raw_foo()
CONFIG_PREEMPTION && (preempt_dynamic_mode == preempt_dynamic_full): raw_foo()

This is NOP if CONFIG_PREEMPTION && preempt_dynamic_mode != preempt_dynamic_full.

> 	if (!A)
> 		raw_foo();

So I would call raw_foo() for the CONFIG_PREEMPTION=n case.

> What you really care about is the CONFIG_PREEMPTION=n case, but that you
> don't actually handle.

I don't see that. The CONFIG_PREEMPTION=n (or its dynamic version)
is being handled here.

Ignoring the RT case, preempt_model_preemptible() is either:
   IS_ENABLED(CONFIG_PREEMPT)
or:
   preempt_dynamic_mode == preempt_dynamic_full

So, I'm handling both the static and the dynamic case here.

if (!IS_ENABLED(CONFIG_PREEMPTION))
   raw_foo()
or
if (preempt_dynamic_mode != preempt_dynamic_full)
   raw_foo()

> And yeah, you've got the extra resched_allowed() thing in there, but
> that doesn't make it any better -- this is all quite horrible.

I don't disagree. There are quite a lot of static/dynamic config options
here and adding this clause didn't improve things.

I think just going with a static call here for the allow-resched case
might have been cleaner. Alternately I'll see if it can be cleanly
folded in with the irqentry_exit_cond_resched() logic.

>>  		/* Covers both tracing and lockdep */
>>  		trace_hardirqs_on();
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 0d18c3969f90..11845a91b691 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>
>> @@ -8597,28 +8599,32 @@ EXPORT_SYMBOL(__cond_resched_rwlock_write);
>>   * SC:preempt_schedule
>>   * SC:preempt_schedule_notrace
>>   * SC:irqentry_exit_cond_resched
>> + * SC:irqentry_exit_allow_resched
>>   *
>>   *
>>   * NONE:
>> - *   cond_resched               <- __cond_resched
>> - *   might_resched              <- RET0
>> - *   preempt_schedule           <- NOP
>> - *   preempt_schedule_notrace   <- NOP
>> - *   irqentry_exit_cond_resched <- NOP
>> + *   cond_resched                <- __cond_resched
>> + *   might_resched               <- RET0
>> + *   preempt_schedule            <- NOP
>> + *   preempt_schedule_notrace    <- NOP
>> + *   irqentry_exit_cond_resched  <- NOP
>> + *   irqentry_exit_allow_resched <- irqentry_exit_allow_resched
>>   *
>>   * VOLUNTARY:
>> - *   cond_resched               <- __cond_resched
>> - *   might_resched              <- __cond_resched
>> - *   preempt_schedule           <- NOP
>> - *   preempt_schedule_notrace   <- NOP
>> - *   irqentry_exit_cond_resched <- NOP
>> + *   cond_resched                <- __cond_resched
>> + *   might_resched               <- __cond_resched
>> + *   preempt_schedule            <- NOP
>> + *   preempt_schedule_notrace    <- NOP
>> + *   irqentry_exit_cond_resched  <- NOP
>> + *   irqentry_exit_allow_resched <- irqentry_exit_allow_resched
>>   *
>>   * FULL:
>> - *   cond_resched               <- RET0
>> - *   might_resched              <- RET0
>> - *   preempt_schedule           <- preempt_schedule
>> - *   preempt_schedule_notrace   <- preempt_schedule_notrace
>> - *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
>> + *   cond_resched                <- RET0
>> + *   might_resched               <- RET0
>> + *   preempt_schedule            <- preempt_schedule
>> + *   preempt_schedule_notrace    <- preempt_schedule_notrace
>> + *   irqentry_exit_cond_resched  <- irqentry_exit_cond_resched
>> + *   irqentry_exit_allow_resched <- NOP
>>   */
>
> This ^ is all complete nonsense.. irqentry_exit_allow_resched() is not
> a static call. It's an alias of raw_irqentry_exit_cond_resched which
> circumvents the static call entirely.

Yeah you are right. I added the SC:irqentry_exit_allow_resched
prematurely. Will fix.

Thanks for the detailed comments.

--
ankur
  
Peter Zijlstra April 6, 2023, 8:13 p.m. UTC | #5
On Thu, Apr 06, 2023 at 09:56:07AM -0700, Ankur Arora wrote:

> 
> Right, so:
> 
> CONFIG_PREEMPTION: raw_foo()
> CONFIG_PREEMPTION && (preempt_dynamic_mode == preempt_dynamic_full): raw_foo()
> 
> This is NOP if CONFIG_PREEMPTION && preempt_dynamic_mode != preempt_dynamic_full.
> 
> > 	if (!A)
> > 		raw_foo();
> 
> So I would call raw_foo() for the CONFIG_PREEMPTION=n case.
> 
> > What you really care about is the CONFIG_PREEMPTION=n case, but that you
> > don't actually handle.
> 
> I don't see that. The CONFIG_PREEMPTION=n (or its dynamic version)
> is being handled here.

Bah, I overlooked that we have multiple definitions of the
preempt_model_foo() things. For some reason I thought all that was only
for PREEMPT_DYNAMIC.

> > And yeah, you've got the extra resched_allowed() thing in there, but
> > that doesn't make it any better -- this is all quite horrible.
> 
> I don't disagree. There are quite a lot of static/dynamic config options
> here and adding this clause didn't improve things.
> 
> I think just going with a static call here for the allow-resched case
> might have been cleaner. Alternately I'll see if it can be cleanly
> folded in with the irqentry_exit_cond_resched() logic.

Something like the below perhaps?

---
 include/linux/entry-common.h |  6 ++++++
 kernel/entry/common.c        | 23 +++++++++++++++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index d95ab85f96ba..0c365dc1f1c2 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -415,10 +415,16 @@ irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
  * Conditional reschedule with additional sanity checks.
  */
 void raw_irqentry_exit_cond_resched(void);
+void irqentry_exit_cond_resched_tif(void);
+
 #ifdef CONFIG_PREEMPT_DYNAMIC
 #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
 #define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
+#ifdef TIF_RESCHED_ALLOW
+#define irqentry_exit_cond_resched_dynamic_disabled	irqentry_exit_cond_resched_tif
+#else
 #define irqentry_exit_cond_resched_dynamic_disabled	NULL
+#endif
 DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
 #define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
 #elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index be61332c66b5..211d118aa672 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -390,6 +390,21 @@ void raw_irqentry_exit_cond_resched(void)
 			preempt_schedule_irq();
 	}
 }
+
+void irqentry_exit_cond_resched_tif(void)
+{
+#ifdef TIF_RESCHED_ALLOW
+	if (resched_allowed()) {
+		/* Sanity check RCU and thread stack */
+		rcu_irq_exit_check_preempt();
+		if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
+			WARN_ON_ONCE(!on_thread_stack());
+		if (need_resched())
+			preempt_schedule_irq();
+	}
+#endif
+}
+
 #ifdef CONFIG_PREEMPT_DYNAMIC
 #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
 DEFINE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
@@ -397,8 +412,10 @@ DEFINE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
 DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
 void dynamic_irqentry_exit_cond_resched(void)
 {
-	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
-		return;
+	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched)) {
+		irqentry_exit_cond_resched_tif();
+		return;
+	}
 	raw_irqentry_exit_cond_resched();
 }
 #endif
@@ -431,6 +448,8 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 		instrumentation_begin();
 		if (IS_ENABLED(CONFIG_PREEMPTION))
 			irqentry_exit_cond_resched();
+		else
+			irqentry_exit_cond_resched_tif();
 
 		/* Covers both tracing and lockdep */
 		trace_hardirqs_on();
  
Peter Zijlstra April 6, 2023, 8:16 p.m. UTC | #6
On Thu, Apr 06, 2023 at 10:13:59PM +0200, Peter Zijlstra wrote:
> +void irqentry_exit_cond_resched_tif(void)
> +{
> +#ifdef TIF_RESCHED_ALLOW
> +	if (resched_allowed()) {
> +		/* Sanity check RCU and thread stack */
> +		rcu_irq_exit_check_preempt();
> +		if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
> +			WARN_ON_ONCE(!on_thread_stack());
> +		if (need_resched())
> +			preempt_schedule_irq();

arguably this can simply call raw_irqentry_exit_cond_resched()..
probably better than duplicating all that again.

> +	}
> +#endif
> +}
  
Ankur Arora April 7, 2023, 2:29 a.m. UTC | #7
Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Apr 06, 2023 at 09:56:07AM -0700, Ankur Arora wrote:
>
>>
>> Right, so:
>>
>> CONFIG_PREEMPTION: raw_foo()
>> CONFIG_PREEMPTION && (preempt_dynamic_mode == preempt_dynamic_full): raw_foo()
>>
>> This is NOP if CONFIG_PREEMPTION && preempt_dynamic_mode != preempt_dynamic_full.
>>
>> > 	if (!A)
>> > 		raw_foo();
>>
>> So I would call raw_foo() for the CONFIG_PREEMPTION=n case.
>>
>> > What you really care about is the CONFIG_PREEMPTION=n case, but that you
>> > don't actually handle.
>>
>> I don't see that. The CONFIG_PREEMPTION=n (or its dynamic version)
>> is being handled here.
>
> Bah, I overlooked we have multiple definitions of the
> preempt_model_foo() things. For some reason I thought all that was only
> for DYNAMIC_PREEMPT.
>
>> > And yeah, you've got the extra resched_allowed() thing in there, but
>> > that doesn't make it any better -- this is all quite horrible.
>>
>> I don't disagree. There are quite a lot of static/dynamic config options
>> here and adding this clause didn't improve things.
>>
>> I think just going with a static call here for the allow-resched case
>> might have been cleaner. Alternately I'll see if it can be cleanly
>> folded in with the irqentry_exit_cond_resched() logic.
>
> Something like the below perhaps?
>
> ---
>  include/linux/entry-common.h |  6 ++++++
>  kernel/entry/common.c        | 23 +++++++++++++++++++++--
>  2 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> index d95ab85f96ba..0c365dc1f1c2 100644
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -415,10 +415,16 @@ irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
>   * Conditional reschedule with additional sanity checks.
>   */
>  void raw_irqentry_exit_cond_resched(void);
> +void irqentry_exit_cond_resched_tif(void);
> +
>  #ifdef CONFIG_PREEMPT_DYNAMIC
>  #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
>  #define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
> +#ifdef TIF_RESCHED_ALLOW
> +#define irqentry_exit_cond_resched_dynamic_disabled	irqentry_exit_cond_resched_tif
> +#else
>  #define irqentry_exit_cond_resched_dynamic_disabled	NULL
> +#endif

So this is clever. Essentially this would toggle between the two kinds
for the preempt_model_preemptible()/!preempt_model_preemptible() dynamic
case. Do I have that right?

>  DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
>  #define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
>  #elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index be61332c66b5..211d118aa672 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -390,6 +390,21 @@ void raw_irqentry_exit_cond_resched(void)
>  			preempt_schedule_irq();
>  	}
>  }
> +
> +void irqentry_exit_cond_resched_tif(void)
> +{
> +#ifdef TIF_RESCHED_ALLOW
> +	if (resched_allowed()) {
> +		/* Sanity check RCU and thread stack */
> +		rcu_irq_exit_check_preempt();
> +		if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
> +			WARN_ON_ONCE(!on_thread_stack());
> +		if (need_resched())
> +			preempt_schedule_irq();
> +	}
> +#endif
> +}
> +
>  #ifdef CONFIG_PREEMPT_DYNAMIC
>  #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
>  DEFINE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
> @@ -397,8 +412,10 @@ DEFINE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
>  DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
>  void dynamic_irqentry_exit_cond_resched(void)
>  {
> -	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> -		return;
> +	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched)) {
> +		irqentry_exit_cond_resched_tif();
> +		return;
> +	}
>  	raw_irqentry_exit_cond_resched();
>  }
>  #endif
> @@ -431,6 +448,8 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
>  		instrumentation_begin();
>  		if (IS_ENABLED(CONFIG_PREEMPTION))
>  			irqentry_exit_cond_resched();
> +		else
> +			irqentry_exit_cond_resched_tif();

And, if we choose between the two resched modes at compile time then this
would work.

Might massage the names a little but this should work as is. Okay if I
use your Co-developed-by/Suggested-by on this patch?

--
ankur
  
Peter Zijlstra April 7, 2023, 10:23 a.m. UTC | #8
On Thu, Apr 06, 2023 at 07:29:59PM -0700, Ankur Arora wrote:
> Peter Zijlstra <peterz@infradead.org> writes:

> > Something like the below perhaps?
> >
> > ---
> >  include/linux/entry-common.h |  6 ++++++
> >  kernel/entry/common.c        | 23 +++++++++++++++++++++--
> >  2 files changed, 27 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> > index d95ab85f96ba..0c365dc1f1c2 100644
> > --- a/include/linux/entry-common.h
> > +++ b/include/linux/entry-common.h
> > @@ -415,10 +415,16 @@ irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
> >   * Conditional reschedule with additional sanity checks.
> >   */
> >  void raw_irqentry_exit_cond_resched(void);
> > +void irqentry_exit_cond_resched_tif(void);
> > +
> >  #ifdef CONFIG_PREEMPT_DYNAMIC
> >  #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
> >  #define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
> > +#ifdef TIF_RESCHED_ALLOW
> > +#define irqentry_exit_cond_resched_dynamic_disabled	irqentry_exit_cond_resched_tif
> > +#else
> >  #define irqentry_exit_cond_resched_dynamic_disabled	NULL
> > +#endif
> 
> So this is clever. Essentially this would toggle between the two kinds
> for the preempt_model_preemptible()/!preempt_model_preemptible() dynamic
> case. Do I have that right?

Exactly!

> >  DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
> >  #define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
> >  #elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
> > diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> > index be61332c66b5..211d118aa672 100644
> > --- a/kernel/entry/common.c
> > +++ b/kernel/entry/common.c
> > @@ -390,6 +390,21 @@ void raw_irqentry_exit_cond_resched(void)
> >  			preempt_schedule_irq();
> >  	}
> >  }
> > +
> > +void irqentry_exit_cond_resched_tif(void)
> > +{
> > +#ifdef TIF_RESCHED_ALLOW
> > +	if (resched_allowed()) {
> > +		/* Sanity check RCU and thread stack */
> > +		rcu_irq_exit_check_preempt();
> > +		if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
> > +			WARN_ON_ONCE(!on_thread_stack());
> > +		if (need_resched())
> > +			preempt_schedule_irq();
> > +	}
> > +#endif
> > +}
> > +
> >  #ifdef CONFIG_PREEMPT_DYNAMIC
> >  #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
> >  DEFINE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
> > @@ -397,8 +412,10 @@ DEFINE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
> >  DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> >  void dynamic_irqentry_exit_cond_resched(void)
> >  {
> > -	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> > -		return;
> > +	if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched)) {
> > +		irqentry_exit_cond_resched_tif();
> > +		return;
> > +	}
> >  	raw_irqentry_exit_cond_resched();
> >  }
> >  #endif
> > @@ -431,6 +448,8 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
> >  		instrumentation_begin();
> >  		if (IS_ENABLED(CONFIG_PREEMPTION))
> >  			irqentry_exit_cond_resched();
> > +		else
> > +			irqentry_exit_cond_resched_tif();
> 
> And, if we choose between the two resched modes at compile time then this
> would work.

Just so.

> Might massage the names a little but this should work as is.

Yeah, I'm not liking the naming either, perhaps your
irqentry_exit_allow_resched() would've been better, dunno, see what
works.

> Okay if I use your Co-developed-by/Suggested-by on this patch?

Yep.
  

Patch

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index be61332c66b5..f1005595ebe7 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -390,6 +390,9 @@  void raw_irqentry_exit_cond_resched(void)
 			preempt_schedule_irq();
 	}
 }
+
+void irqentry_exit_allow_resched(void) __alias(raw_irqentry_exit_cond_resched);
+
 #ifdef CONFIG_PREEMPT_DYNAMIC
 #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
 DEFINE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
@@ -431,6 +434,11 @@  noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 		instrumentation_begin();
 		if (IS_ENABLED(CONFIG_PREEMPTION))
 			irqentry_exit_cond_resched();
+		/*
+		 * We care about this clause only in the dynamic !preemptible case.
+		 */
+		if (unlikely(!preempt_model_preemptible() && resched_allowed()))
+			irqentry_exit_allow_resched();
 
 		/* Covers both tracing and lockdep */
 		trace_hardirqs_on();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0d18c3969f90..11845a91b691 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6500,6 +6500,8 @@  pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
  *          - explicit schedule() call
  *          - return from syscall or exception to user-space
  *          - return from interrupt-handler to user-space
+ *          - return from interrupt-handler with the task having set
+ *            TIF_RESCHED_ALLOW
  *
  * WARNING: must be called with preemption disabled!
  */
@@ -8597,28 +8599,32 @@  EXPORT_SYMBOL(__cond_resched_rwlock_write);
  * SC:preempt_schedule
  * SC:preempt_schedule_notrace
  * SC:irqentry_exit_cond_resched
+ * SC:irqentry_exit_allow_resched
  *
  *
  * NONE:
- *   cond_resched               <- __cond_resched
- *   might_resched              <- RET0
- *   preempt_schedule           <- NOP
- *   preempt_schedule_notrace   <- NOP
- *   irqentry_exit_cond_resched <- NOP
+ *   cond_resched                <- __cond_resched
+ *   might_resched               <- RET0
+ *   preempt_schedule            <- NOP
+ *   preempt_schedule_notrace    <- NOP
+ *   irqentry_exit_cond_resched  <- NOP
+ *   irqentry_exit_allow_resched <- irqentry_exit_allow_resched
  *
  * VOLUNTARY:
- *   cond_resched               <- __cond_resched
- *   might_resched              <- __cond_resched
- *   preempt_schedule           <- NOP
- *   preempt_schedule_notrace   <- NOP
- *   irqentry_exit_cond_resched <- NOP
+ *   cond_resched                <- __cond_resched
+ *   might_resched               <- __cond_resched
+ *   preempt_schedule            <- NOP
+ *   preempt_schedule_notrace    <- NOP
+ *   irqentry_exit_cond_resched  <- NOP
+ *   irqentry_exit_allow_resched <- irqentry_exit_allow_resched
  *
  * FULL:
- *   cond_resched               <- RET0
- *   might_resched              <- RET0
- *   preempt_schedule           <- preempt_schedule
- *   preempt_schedule_notrace   <- preempt_schedule_notrace
- *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
+ *   cond_resched                <- RET0
+ *   might_resched               <- RET0
+ *   preempt_schedule            <- preempt_schedule
+ *   preempt_schedule_notrace    <- preempt_schedule_notrace
+ *   irqentry_exit_cond_resched  <- irqentry_exit_cond_resched
+ *   irqentry_exit_allow_resched <- NOP
  */
 
 enum {