[RFC,57/86] coccinelle: script to remove cond_resched()

Message ID 20231107230822.371443-1-ankur.a.arora@oracle.com
State New
Series Make the kernel preemptible

Commit Message

Ankur Arora Nov. 7, 2023, 11:07 p.m. UTC
  Rudimentary script to remove the straight-forward subset of
cond_resched() and allies:

1)  if (need_resched())
	  cond_resched()

2)  expression*;
    cond_resched();  /* or in the reverse order */

3)  if (expression)
	statement
    cond_resched();  /* or in the reverse order */

The last two patterns depend on the control flow level to ensure
that the complex cond_resched() patterns (ex. conditioned ones)
are left alone and we only pick up ones which are only minimally
related to the neighbouring code.

Cc: Julia Lawall <Julia.Lawall@inria.fr>
Cc: Nicolas Palix <nicolas.palix@imag.fr>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 scripts/coccinelle/api/cond_resched.cocci | 53 +++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100644 scripts/coccinelle/api/cond_resched.cocci
  

Comments

Julia Lawall Nov. 7, 2023, 11:19 p.m. UTC | #1
On Tue, 7 Nov 2023, Ankur Arora wrote:

> Rudimentary script to remove the straight-forward subset of
> cond_resched() and allies:
>
> 1)  if (need_resched())
> 	  cond_resched()
>
> 2)  expression*;
>     cond_resched();  /* or in the reverse order */
>
> 3)  if (expression)
> 	statement
>     cond_resched();  /* or in the reverse order */
>
> The last two patterns depend on the control flow level to ensure
> that the complex cond_resched() patterns (ex. conditioned ones)
> are left alone and we only pick up ones which are only minimally
> related to the neighbouring code.
>
> Cc: Julia Lawall <Julia.Lawall@inria.fr>
> Cc: Nicolas Palix <nicolas.palix@imag.fr>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>  scripts/coccinelle/api/cond_resched.cocci | 53 +++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
>  create mode 100644 scripts/coccinelle/api/cond_resched.cocci
>
> diff --git a/scripts/coccinelle/api/cond_resched.cocci b/scripts/coccinelle/api/cond_resched.cocci
> new file mode 100644
> index 000000000000..bf43768a8f8c
> --- /dev/null
> +++ b/scripts/coccinelle/api/cond_resched.cocci
> @@ -0,0 +1,53 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/// Remove naked cond_resched() statements
> +///
> +//# Remove cond_resched() statements when:
> +//#   - executing at the same control flow level as the previous or the
> +//#     next statement (this lets us avoid complicated conditionals in
> +//#     the neighbourhood.)
> +//#   - they are of the form "if (need_resched()) cond_resched()" which
> +//#     is always safe.
> +//#
> +//# Coccinelle generally takes care of comments in the immediate neighbourhood
> +//# but might need to handle other comments alluding to rescheduling.
> +//#
> +virtual patch
> +virtual context
> +
> +@ r1 @
> +identifier r;
> +@@
> +
> +(
> + r = cond_resched();
> +|
> +-if (need_resched())
> +-	cond_resched();
> +)

This rule doesn't make sense.  The first branch of the disjunction will
never match a place where the second branch matches.  Anyway, in the
second branch there is no assignment, so I don't see what the first branch
is protecting against.

The disjunction is just useless.  Whether it is there or whether only
the second branch is there doesn't have any impact on the result.

> +
> +@ r2 @
> +expression E;
> +statement S,T;
> +@@
> +(
> + E;
> +|
> + if (E) S

This case is not needed.  It will be matched by the next case.

> +|
> + if (E) S else T
> +|
> +)
> +-cond_resched();
> +
> +@ r3 @
> +expression E;
> +statement S,T;
> +@@
> +-cond_resched();
> +(
> + E;
> +|
> + if (E) S

As above.

> +|
> + if (E) S else T
> +)

I have the impression that you are trying to retain some cond_rescheds.
Could you send an example of one that you are trying to keep?  Overall,
the above rules seem a bit ad hoc.  You may be keeping some cases you
don't want to, or removing some cases that you want to keep.

Of course, if you are confident that the job is done with this semantic
patch as it is, then that's fine too.

julia
  
Ankur Arora Nov. 8, 2023, 8:29 a.m. UTC | #2
Julia Lawall <julia.lawall@inria.fr> writes:

> On Tue, 7 Nov 2023, Ankur Arora wrote:
>
>> Rudimentary script to remove the straight-forward subset of
>> cond_resched() and allies:
>>
>> 1)  if (need_resched())
>> 	  cond_resched()
>>
>> 2)  expression*;
>>     cond_resched();  /* or in the reverse order */
>>
>> 3)  if (expression)
>> 	statement
>>     cond_resched();  /* or in the reverse order */
>>
>> The last two patterns depend on the control flow level to ensure
>> that the complex cond_resched() patterns (ex. conditioned ones)
>> are left alone and we only pick up ones which are only minimally
>> related to the neighbouring code.
>>
>> Cc: Julia Lawall <Julia.Lawall@inria.fr>
>> Cc: Nicolas Palix <nicolas.palix@imag.fr>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>>  scripts/coccinelle/api/cond_resched.cocci | 53 +++++++++++++++++++++++
>>  1 file changed, 53 insertions(+)
>>  create mode 100644 scripts/coccinelle/api/cond_resched.cocci
>>
>> diff --git a/scripts/coccinelle/api/cond_resched.cocci b/scripts/coccinelle/api/cond_resched.cocci
>> new file mode 100644
>> index 000000000000..bf43768a8f8c
>> --- /dev/null
>> +++ b/scripts/coccinelle/api/cond_resched.cocci
>> @@ -0,0 +1,53 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/// Remove naked cond_resched() statements
>> +///
>> +//# Remove cond_resched() statements when:
>> +//#   - executing at the same control flow level as the previous or the
>> +//#     next statement (this lets us avoid complicated conditionals in
>> +//#     the neighbourhood.)
>> +//#   - they are of the form "if (need_resched()) cond_resched()" which
>> +//#     is always safe.
>> +//#
>> +//# Coccinelle generally takes care of comments in the immediate neighbourhood
>> +//# but might need to handle other comments alluding to rescheduling.
>> +//#
>> +virtual patch
>> +virtual context
>> +
>> +@ r1 @
>> +identifier r;
>> +@@
>> +
>> +(
>> + r = cond_resched();
>> +|
>> +-if (need_resched())
>> +-	cond_resched();
>> +)
>
> This rule doesn't make sense.  The first branch of the disjunction will
> never match a place where the second branch matches.  Anyway, in the
> second branch there is no assignment, so I don't see what the first branch
> is protecting against.
>
> The disjunction is just useless.  Whether it is there or whether only
> the second branch is there doesn't have any impact on the result.
>
>> +
>> +@ r2 @
>> +expression E;
>> +statement S,T;
>> +@@
>> +(
>> + E;
>> +|
>> + if (E) S
>
> This case is not needed.  It will be matched by the next case.
>
>> +|
>> + if (E) S else T
>> +|
>> +)
>> +-cond_resched();
>> +
>> +@ r3 @
>> +expression E;
>> +statement S,T;
>> +@@
>> +-cond_resched();
>> +(
>> + E;
>> +|
>> + if (E) S
>
> As above.
>
>> +|
>> + if (E) S else T
>> +)
>
> I have the impression that you are trying to retain some cond_rescheds.
> Could you send an example of one that you are trying to keep?  Overall,
> the above rules seem a bit ad hoc.  You may be keeping some cases you
> don't want to, or removing some cases that you want to keep.

Right. I was trying to ensure that the script only handled the cases
that didn't have any "interesting" connections to the surrounding code.

Just to give you an example of the kind of constructs that I wanted
to avoid:

mm/memory.c::zap_pmd_range():

                if (addr != next)
                        pmd--;
        } while (pmd++, cond_resched(), addr != end);

mm/backing-dev.c::cleanup_offline_cgwbs_workfn()

                while (cleanup_offline_cgwb(wb))
                        cond_resched();


But from a quick check the simplest coccinelle script does a much
better job than my overly complex (and incorrect) one:

@r1@
@@
-       cond_resched();

It avoids the first one. And transforms the second to:

                while (cleanup_offline_cgwb(wb))
                        {}

which is exactly what I wanted.
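
A minimal userspace sketch of the two constructs above, assuming stub
versions of cond_resched() and of the cleanup helper. The names
cleanup_one(), drain_before(), drain_after(), walk() and check_forms() are
hypothetical and exist only for illustration; none of this is kernel code.
It shows why a rule that deletes the statement "cond_resched();" reduces
the loop body to {} but has nothing to match in the comma-expression form:

```c
#include <assert.h>

/* Userspace stand-ins (hypothetical); these are not the kernel functions. */
static int resched_calls;
static int cond_resched(void) { resched_calls++; return 0; }

static int cleanup_calls, work_left;
static int cleanup_one(void) { cleanup_calls++; return work_left-- > 0; }

/* Statement form, before the rule runs: cond_resched() is the loop body. */
static void drain_before(int items)
{
	work_left = items;
	while (cleanup_one())
		cond_resched();
}

/* Statement form, after the rule runs: only an empty body is left. */
static void drain_after(int items)
{
	work_left = items;
	while (cleanup_one())
		{}
}

/*
 * Expression form: cond_resched() is an operand of the comma operator,
 * not a statement, so a statement-deleting rule leaves it untouched.
 */
static int walk(int end)
{
	int pos = 0;

	do {
		/* per-entry work would go here */
	} while (pos++, cond_resched(), pos != end);
	return pos;
}

/* The loop runs the same number of times with or without the call. */
static void check_forms(void)
{
	cleanup_calls = resched_calls = 0;
	drain_before(3);
	assert(cleanup_calls == 4 && resched_calls == 3);

	cleanup_calls = resched_calls = 0;
	drain_after(3);
	assert(cleanup_calls == 4 && resched_calls == 0);

	resched_calls = 0;
	assert(walk(4) == 4 && resched_calls == 4);
}
```

Calling check_forms() from a small main() runs the checks; the stubs only
count calls, so the sketch says nothing about scheduling behaviour itself.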

> Of course, if you are confident that the job is done with this semantic
> patch as it is, then that's fine too.

Not at all. Thanks for pointing out the mistakes.



--
ankur
  
Julia Lawall Nov. 8, 2023, 9:49 a.m. UTC | #3
On Wed, 8 Nov 2023, Ankur Arora wrote:

>
> Julia Lawall <julia.lawall@inria.fr> writes:
>
> > On Tue, 7 Nov 2023, Ankur Arora wrote:
> >
> >> Rudimentary script to remove the straight-forward subset of
> >> cond_resched() and allies:
> >>
> >> 1)  if (need_resched())
> >> 	  cond_resched()
> >>
> >> 2)  expression*;
> >>     cond_resched();  /* or in the reverse order */
> >>
> >> 3)  if (expression)
> >> 	statement
> >>     cond_resched();  /* or in the reverse order */
> >>
> >> The last two patterns depend on the control flow level to ensure
> >> that the complex cond_resched() patterns (ex. conditioned ones)
> >> are left alone and we only pick up ones which are only minimally
> >> related to the neighbouring code.
> >>
> >> Cc: Julia Lawall <Julia.Lawall@inria.fr>
> >> Cc: Nicolas Palix <nicolas.palix@imag.fr>
> >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> >> ---
> >>  scripts/coccinelle/api/cond_resched.cocci | 53 +++++++++++++++++++++++
> >>  1 file changed, 53 insertions(+)
> >>  create mode 100644 scripts/coccinelle/api/cond_resched.cocci
> >>
> >> diff --git a/scripts/coccinelle/api/cond_resched.cocci b/scripts/coccinelle/api/cond_resched.cocci
> >> new file mode 100644
> >> index 000000000000..bf43768a8f8c
> >> --- /dev/null
> >> +++ b/scripts/coccinelle/api/cond_resched.cocci
> >> @@ -0,0 +1,53 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +/// Remove naked cond_resched() statements
> >> +///
> >> +//# Remove cond_resched() statements when:
> >> +//#   - executing at the same control flow level as the previous or the
> >> +//#     next statement (this lets us avoid complicated conditionals in
> >> +//#     the neighbourhood.)
> >> +//#   - they are of the form "if (need_resched()) cond_resched()" which
> >> +//#     is always safe.
> >> +//#
> >> +//# Coccinelle generally takes care of comments in the immediate neighbourhood
> >> +//# but might need to handle other comments alluding to rescheduling.
> >> +//#
> >> +virtual patch
> >> +virtual context
> >> +
> >> +@ r1 @
> >> +identifier r;
> >> +@@
> >> +
> >> +(
> >> + r = cond_resched();
> >> +|
> >> +-if (need_resched())
> >> +-	cond_resched();
> >> +)
> >
> > This rule doesn't make sense.  The first branch of the disjunction will
> > never match a place where the second branch matches.  Anyway, in the
> > second branch there is no assignment, so I don't see what the first branch
> > is protecting against.
> >
> > The disjunction is just useless.  Whether it is there or whether only
> > the second branch is there doesn't have any impact on the result.
> >
> >> +
> >> +@ r2 @
> >> +expression E;
> >> +statement S,T;
> >> +@@
> >> +(
> >> + E;
> >> +|
> >> + if (E) S
> >
> > This case is not needed.  It will be matched by the next case.
> >
> >> +|
> >> + if (E) S else T
> >> +|
> >> +)
> >> +-cond_resched();
> >> +
> >> +@ r3 @
> >> +expression E;
> >> +statement S,T;
> >> +@@
> >> +-cond_resched();
> >> +(
> >> + E;
> >> +|
> >> + if (E) S
> >
> > As above.
> >
> >> +|
> >> + if (E) S else T
> >> +)
> >
> > I have the impression that you are trying to retain some cond_rescheds.
> > Could you send an example of one that you are trying to keep?  Overall,
> > the above rules seem a bit ad hoc.  You may be keeping some cases you
> > don't want to, or removing some cases that you want to keep.
>
> Right. I was trying to ensure that the script only handled the cases
> that didn't have any "interesting" connections to the surrounding code.
>
> Just to give you an example of the kind of constructs that I wanted
> to avoid:
>
> mm/memory.c::zap_pmd_range():
>
>                 if (addr != next)
>                         pmd--;
>         } while (pmd++, cond_resched(), addr != end);
>
> mm/backing-dev.c::cleanup_offline_cgwbs_workfn()
>
>                 while (cleanup_offline_cgwb(wb))
>                         cond_resched();
>
>
> But from a quick check the simplest coccinelle script does a much
> better job than my overly complex (and incorrect) one:
>
> @r1@
> @@
> -       cond_resched();
>
> It avoids the first one. And transforms the second to:
>
>                 while (cleanup_offline_cgwb(wb))
>                         {}
>
> which is exactly what I wanted.

Perfect!

It could be good to run both scripts and compare the results.

julia

>
> > Of course, if you are confident that the job is done with this semantic
> > patch as it is, then that's fine too.
>
> Not at all. Thanks for pointing out the mistakes.
>
>
>
> --
> ankur
>
  
Paul E. McKenney Nov. 21, 2023, 12:45 a.m. UTC | #4
On Tue, Nov 07, 2023 at 03:07:53PM -0800, Ankur Arora wrote:
> Rudimentary script to remove the straight-forward subset of
> cond_resched() and allies:
> 
> 1)  if (need_resched())
> 	  cond_resched()
> 
> 2)  expression*;
>     cond_resched();  /* or in the reverse order */
> 
> 3)  if (expression)
> 	statement
>     cond_resched();  /* or in the reverse order */
> 
> The last two patterns depend on the control flow level to ensure
> that the complex cond_resched() patterns (ex. conditioned ones)
> are left alone and we only pick up ones which are only minimally
> related to the neighbouring code.

This series looks to get rid of stall warnings for long in-kernel
preempt-enabled code paths, which is of course a very good thing.
But removing all of the cond_resched() calls can actually increase
scheduling latency compared to the current CONFIG_PREEMPT_NONE=y state,
correct?

If so, it would be good to take a measured approach.  For example, it
is clear that a loop that does a cond_resched() every (say) ten jiffies
can remove that cond_resched() without penalty, at least in kernels built
with either CONFIG_NO_HZ_FULL=n or CONFIG_PREEMPT=y.  But this is not so
clear for a loop that does a cond_resched() every (say) ten microseconds.

Or am I missing something here?

							Thanx, Paul

> Cc: Julia Lawall <Julia.Lawall@inria.fr>
> Cc: Nicolas Palix <nicolas.palix@imag.fr>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>  scripts/coccinelle/api/cond_resched.cocci | 53 +++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
>  create mode 100644 scripts/coccinelle/api/cond_resched.cocci
> 
> diff --git a/scripts/coccinelle/api/cond_resched.cocci b/scripts/coccinelle/api/cond_resched.cocci
> new file mode 100644
> index 000000000000..bf43768a8f8c
> --- /dev/null
> +++ b/scripts/coccinelle/api/cond_resched.cocci
> @@ -0,0 +1,53 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/// Remove naked cond_resched() statements
> +///
> +//# Remove cond_resched() statements when:
> +//#   - executing at the same control flow level as the previous or the
> +//#     next statement (this lets us avoid complicated conditionals in
> +//#     the neighbourhood.)
> +//#   - they are of the form "if (need_resched()) cond_resched()" which
> +//#     is always safe.
> +//#
> +//# Coccinelle generally takes care of comments in the immediate neighbourhood
> +//# but might need to handle other comments alluding to rescheduling.
> +//#
> +virtual patch
> +virtual context
> +
> +@ r1 @
> +identifier r;
> +@@
> +
> +(
> + r = cond_resched();
> +|
> +-if (need_resched())
> +-	cond_resched();
> +)
> +
> +@ r2 @
> +expression E;
> +statement S,T;
> +@@
> +(
> + E;
> +|
> + if (E) S
> +|
> + if (E) S else T
> +|
> +)
> +-cond_resched();
> +
> +@ r3 @
> +expression E;
> +statement S,T;
> +@@
> +-cond_resched();
> +(
> + E;
> +|
> + if (E) S
> +|
> + if (E) S else T
> +)
> -- 
> 2.31.1
>
  
Ankur Arora Nov. 21, 2023, 5:16 a.m. UTC | #5
Paul E. McKenney <paulmck@kernel.org> writes:

> On Tue, Nov 07, 2023 at 03:07:53PM -0800, Ankur Arora wrote:
>> Rudimentary script to remove the straight-forward subset of
>> cond_resched() and allies:
>>
>> 1)  if (need_resched())
>> 	  cond_resched()
>>
>> 2)  expression*;
>>     cond_resched();  /* or in the reverse order */
>>
>> 3)  if (expression)
>> 	statement
>>     cond_resched();  /* or in the reverse order */
>>
>> The last two patterns depend on the control flow level to ensure
>> that the complex cond_resched() patterns (ex. conditioned ones)
>> are left alone and we only pick up ones which are only minimally
>> related to the neighbouring code.
>
> This series looks to get rid of stall warnings for long in-kernel
> preempt-enabled code paths, which is of course a very good thing.
> But removing all of the cond_resched() calls can actually increase
> scheduling latency compared to the current CONFIG_PREEMPT_NONE=y state,
> correct?

Not necessarily.

If TIF_NEED_RESCHED_LAZY is set, then we let the current task finish
before preempting. If that task runs for arbitrarily long (what Thomas
calls the hog problem), we currently allow it to run for up to one
extra tick (which might be shortened or become a tunable.)

If TIF_NEED_RESCHED is set, then it gets folded the same way it does now
and preemption happens at the next safe preemption point.

So, I guess the scheduling latency would always be bounded but how much
latency a task would incur would be scheduler policy dependent.

This is early days, so the policy (or really the rest of it) isn't set
in stone but having two levels of preemption -- immediate and
deferred -- does seem to give the scheduler greater freedom of policy.
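
A toy model of the two levels described above, as a sketch only. The enum,
struct and function names are hypothetical, the real flags are per-thread
bits rather than an enum, and "finish" is modelled as a single call. It
assumes that a lazy request still pending at a timer tick is upgraded to an
immediate one, which is one reading of the "one extra tick" allowance:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; names and structure are hypothetical, not kernel code. */
enum resched_req { REQ_NONE, REQ_LAZY, REQ_NOW };

struct toy_task {
	enum resched_req req;	/* pending reschedule request, if any */
};

/* Timer tick: a lazy request that is still pending becomes immediate. */
static void toy_tick(struct toy_task *t)
{
	if (t->req == REQ_LAZY)
		t->req = REQ_NOW;
}

/* Safe preemption point: only an immediate request preempts here. */
static bool toy_preempt_point(struct toy_task *t)
{
	if (t->req != REQ_NOW)
		return false;
	t->req = REQ_NONE;
	return true;
}

/* The task finishes its work: any pending request is honoured. */
static bool toy_task_done(struct toy_task *t)
{
	bool resched = t->req != REQ_NONE;

	t->req = REQ_NONE;
	return resched;
}

static void check_toy_model(void)
{
	struct toy_task t = { .req = REQ_LAZY };

	/* Lazy request: preemption points do nothing until a tick passes. */
	assert(!toy_preempt_point(&t));
	toy_tick(&t);
	assert(toy_preempt_point(&t));
	assert(t.req == REQ_NONE);

	/* Lazy request, short task: honoured when the task finishes. */
	t.req = REQ_LAZY;
	assert(toy_task_done(&t));

	/* Immediate request: honoured at the next preemption point. */
	t.req = REQ_NOW;
	assert(toy_preempt_point(&t));
}
```

In this model the deferred case is bounded by the tick, while the immediate
case is bounded by the distance to the next preemption point; where the
scheduler uses which level is the policy question left open here.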

Btw, are you concerned about the scheduling latencies in general or the
scheduling latency of a particular set of tasks?

> If so, it would be good to take a measured approach.  For example, it
> is clear that a loop that does a cond_resched() every (say) ten jiffies
> can remove that cond_resched() without penalty, at least in kernels built
> with either CONFIG_NO_HZ_FULL=n or CONFIG_PREEMPT=y.  But this is not so
> clear for a loop that does a cond_resched() every (say) ten microseconds.

True. Though both of those loops sound bad :).

Yeah, and as we were discussing offlist, the question is how the density
of points where preempt_dec_and_test() is true compares with the density
of calls to cond_resched().

And if they are similar then we could replace cond_resched() quiescence
reporting with reporting in preempt_enable() (as you mention elsewhere in
the thread.)


Thanks

--
ankur
  
Paul E. McKenney Nov. 21, 2023, 3:26 p.m. UTC | #6
On Mon, Nov 20, 2023 at 09:16:19PM -0800, Ankur Arora wrote:
> 
> Paul E. McKenney <paulmck@kernel.org> writes:
> 
> > On Tue, Nov 07, 2023 at 03:07:53PM -0800, Ankur Arora wrote:
> >> Rudimentary script to remove the straight-forward subset of
> >> cond_resched() and allies:
> >>
> >> 1)  if (need_resched())
> >> 	  cond_resched()
> >>
> >> 2)  expression*;
> >>     cond_resched();  /* or in the reverse order */
> >>
> >> 3)  if (expression)
> >> 	statement
> >>     cond_resched();  /* or in the reverse order */
> >>
> >> The last two patterns depend on the control flow level to ensure
> >> that the complex cond_resched() patterns (ex. conditioned ones)
> >> are left alone and we only pick up ones which are only minimally
> >> related to the neighbouring code.
> >
> > This series looks to get rid of stall warnings for long in-kernel
> > preempt-enabled code paths, which is of course a very good thing.
> > But removing all of the cond_resched() calls can actually increase
> > scheduling latency compared to the current CONFIG_PREEMPT_NONE=y state,
> > correct?
> 
> Not necessarily.
> 
> If TIF_NEED_RESCHED_LAZY is set, then we let the current task finish
> before preempting. If that task runs for arbitrarily long (what Thomas
> calls the hog problem), we currently allow it to run for up to one
> extra tick (which might be shortened or become a tunable.)

Agreed, and that is the easy case.  But getting rid of the cond_resched()
calls really can increase scheduling latency of this patchset compared
to status-quo mainline.
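
A back-of-the-envelope version of that comparison, as a sketch under
stated assumptions rather than a measurement: HZ is assumed to be 1000,
the status-quo bound is taken to be the interval between cond_resched()
calls, and the bound without cond_resched() is taken to be the one extra
tick quoted above. Preempt-disabled regions and everything else are
ignored, and all names are hypothetical:

```c
#include <assert.h>

/* Assumed tick rate for the example; real kernels vary. */
#define HZ_ASSUMED	1000
#define TICK_US		(1000000 / HZ_ASSUMED)

/* Status quo: a pending reschedule is acted on at the next cond_resched(). */
static long latency_with_cond_resched(long interval_us)
{
	return interval_us;
}

/* cond_resched() removed, lazy request: the task may run one extra tick. */
static long latency_lazy_no_cond_resched(void)
{
	return TICK_US;
}

/* Positive result: removing cond_resched() makes the rough bound worse. */
static long latency_penalty_us(long interval_us)
{
	return latency_lazy_no_cond_resched() -
	       latency_with_cond_resched(interval_us);
}

static void check_latency_model(void)
{
	/* cond_resched() every ten microseconds: bound grows by 990 us. */
	assert(latency_penalty_us(10) == 990);
	/* cond_resched() every ten jiffies: bound shrinks by 9000 us. */
	assert(latency_penalty_us(10 * TICK_US) == -9000);
}
```

Under these assumptions the ten-jiffy loop loses nothing when its
cond_resched() goes away, while the ten-microsecond loop can see its
worst-case bound grow by roughly two orders of magnitude.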

> If TIF_NEED_RESCHED is set, then it gets folded the same way it does now
> and preemption happens at the next safe preemption point.
> 
> So, I guess the scheduling latency would always be bounded but how much
> latency a task would incur would be scheduler policy dependent.
> 
> This is early days, so the policy (or really the rest of it) isn't set
> in stone but having two levels of preemption -- immediate and
> deferred -- does seem to give the scheduler greater freedom of policy.

"Give the scheduler freedom!" is a wonderful slogan, but not necessarily
a useful one-size-fits-all design principle.  The scheduler does not
and cannot know everything, after all.

> Btw, are you concerned about the scheduling latencies in general or the
> scheduling latency of a particular set of tasks?

There are a lot of workloads out there with a lot of objective functions
and constraints, but it is safe to say that both will be important, as
will other things, depending on the workload.

But you knew that already, right?  ;-)

> > If so, it would be good to take a measured approach.  For example, it
> > is clear that a loop that does a cond_resched() every (say) ten jiffies
> > can remove that cond_resched() without penalty, at least in kernels built
> > with either CONFIG_NO_HZ_FULL=n or CONFIG_PREEMPT=y.  But this is not so
> > clear for a loop that does a cond_resched() every (say) ten microseconds.
> 
> True. Though both of those loops sound bad :).

Yes, but do they sound bad enough to be useful in the real world?  ;-)

> Yeah, and as we were discussing offlist, the question is how the density
> of points where preempt_dec_and_test() is true compares with the density
> of calls to cond_resched().
> 
> And if they are similar then we could replace cond_resched() quiescence
> reporting with reporting in preempt_enable() (as you mention elsewhere in
> the thread.)

Here is hoping that something like that can help.

I am quite happy with the thought of reducing the number of cond_resched()
invocations, but not at the expense of the Linux kernel failing to do
its job.

							Thanx, Paul
  

Patch

diff --git a/scripts/coccinelle/api/cond_resched.cocci b/scripts/coccinelle/api/cond_resched.cocci
new file mode 100644
index 000000000000..bf43768a8f8c
--- /dev/null
+++ b/scripts/coccinelle/api/cond_resched.cocci
@@ -0,0 +1,53 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/// Remove naked cond_resched() statements
+///
+//# Remove cond_resched() statements when:
+//#   - executing at the same control flow level as the previous or the
+//#     next statement (this lets us avoid complicated conditionals in
+//#     the neighbourhood.)
+//#   - they are of the form "if (need_resched()) cond_resched()" which
+//#     is always safe.
+//#
+//# Coccinelle generally takes care of comments in the immediate neighbourhood
+//# but might need to handle other comments alluding to rescheduling.
+//#
+virtual patch
+virtual context
+
+@ r1 @
+identifier r;
+@@
+
+(
+ r = cond_resched();
+|
+-if (need_resched())
+-	cond_resched();
+)
+
+@ r2 @
+expression E;
+statement S,T;
+@@
+(
+ E;
+|
+ if (E) S
+|
+ if (E) S else T
+|
+)
+-cond_resched();
+
+@ r3 @
+expression E;
+statement S,T;
+@@
+-cond_resched();
+(
+ E;
+|
+ if (E) S
+|
+ if (E) S else T
+)