[v2,4/9] cgroup: rstat: add WARN_ON_ONCE() if flushing outside task context

Message ID 20230328221644.803272-5-yosryahmed@google.com
State New
Headers
Series memcg: make rstat flushing irq and sleep |

Commit Message

Yosry Ahmed March 28, 2023, 10:16 p.m. UTC
  rstat flushing is too expensive to perform in irq context.
The previous patch removed the only context that may invoke an rstat
flush from irq context, add a WARN_ON_ONCE() to detect future
violations, or those that we are not aware of.

Ideally, we wouldn't flush with irqs disabled either, but we have one
context today that does so in mem_cgroup_usage(). Forbid callers from
irq context for now, and hopefully we can also forbid callers with irqs
disabled in the future when we can get rid of this callsite.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
 kernel/cgroup/rstat.c | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Michal Hocko March 29, 2023, 11:22 a.m. UTC | #1
On Tue 28-03-23 22:16:39, Yosry Ahmed wrote:
> rstat flushing is too expensive to perform in irq context.
> The previous patch removed the only context that may invoke an rstat
> flush from irq context, add a WARN_ON_ONCE() to detect future
> violations, or those that we are not aware of.
> 
> Ideally, we wouldn't flush with irqs disabled either, but we have one
> context today that does so in mem_cgroup_usage(). Forbid callers from
> irq context for now, and hopefully we can also forbid callers with irqs
> disabled in the future when we can get rid of this callsite.

I am sorry to be late to the discussion. I wanted to follow up on
Johannes reply in the previous version but you are too fast ;)

I do agree that this looks rather arbitrary. You do not explain how the
warning actually helps. Is the intention to be really verbose to the
kernel log when somebody uses this interface from the IRQ context and
get bug reports? What about configurations with panic on warn? Do we
really want to crash their systems for something like that?

> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> ---
>  kernel/cgroup/rstat.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index d3252b0416b6..c2571939139f 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -176,6 +176,8 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
>  {
>  	int cpu;
>  
> +	/* rstat flushing is too expensive for irq context */
> +	WARN_ON_ONCE(!in_task());
>  	lockdep_assert_held(&cgroup_rstat_lock);
>  
>  	for_each_possible_cpu(cpu) {
> -- 
> 2.40.0.348.gf938b09366-goog
  
Yosry Ahmed March 29, 2023, 6:41 p.m. UTC | #2
On Wed, Mar 29, 2023 at 4:22 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 28-03-23 22:16:39, Yosry Ahmed wrote:
> > rstat flushing is too expensive to perform in irq context.
> > The previous patch removed the only context that may invoke an rstat
> > flush from irq context, add a WARN_ON_ONCE() to detect future
> > violations, or those that we are not aware of.
> >
> > Ideally, we wouldn't flush with irqs disabled either, but we have one
> > context today that does so in mem_cgroup_usage(). Forbid callers from
> > irq context for now, and hopefully we can also forbid callers with irqs
> > disabled in the future when we can get rid of this callsite.
>
> I am sorry to be late to the discussion. I wanted to follow up on
> Johannes reply in the previous version but you are too fast ;)
>
> I do agree that this looks rather arbitrary. You do not explain how the
> warning actually helps. Is the intention to be really verbose to the
> kernel log when somebody uses this interface from the IRQ context and
> get bug reports? What about configurations with panic on warn? Do we
> really want to crash their systems for something like that?

Thanks for taking a look, Michal!

The ultimate goal is not to flush in irq context or with irqs
disabled, as in some cases it causes irqs to be disabled for a long
time, as flushing is an expensive operation. The previous patch in the
series should have removed the only context that flushes in irq
context, and the purpose of the WARN_ON_ONCE() is to catch future uses
or uses that we might have missed.

There is still one code path that flushes with irqs disabled (also
mem_cgroup_usage()), and we cannot remove this just yet; we need to
deprecate usage threshold events for root to do that. So we cannot
enforce not flushing with irqs disabled yet.

So basically the patch is trying to enforce what we have now, not
flushing in irq context, and hopefully at some point we will also be
able to enforce not flushing with irqs disabled.

If WARN_ON_ONCE() is the wrong tool for this, please let me know.

>
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > ---
> >  kernel/cgroup/rstat.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> > index d3252b0416b6..c2571939139f 100644
> > --- a/kernel/cgroup/rstat.c
> > +++ b/kernel/cgroup/rstat.c
> > @@ -176,6 +176,8 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
> >  {
> >       int cpu;
> >
> > +     /* rstat flushing is too expensive for irq context */
> > +     WARN_ON_ONCE(!in_task());
> >       lockdep_assert_held(&cgroup_rstat_lock);
> >
> >       for_each_possible_cpu(cpu) {
> > --
> > 2.40.0.348.gf938b09366-goog
>
> --
> Michal Hocko
> SUSE Labs
  
Shakeel Butt March 29, 2023, 7:20 p.m. UTC | #3
On Wed, Mar 29, 2023 at 11:41:39AM -0700, Yosry Ahmed wrote:
> On Wed, Mar 29, 2023 at 4:22 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 28-03-23 22:16:39, Yosry Ahmed wrote:
> > > rstat flushing is too expensive to perform in irq context.
> > > The previous patch removed the only context that may invoke an rstat
> > > flush from irq context, add a WARN_ON_ONCE() to detect future
> > > violations, or those that we are not aware of.
> > >
> > > Ideally, we wouldn't flush with irqs disabled either, but we have one
> > > context today that does so in mem_cgroup_usage(). Forbid callers from
> > > irq context for now, and hopefully we can also forbid callers with irqs
> > > disabled in the future when we can get rid of this callsite.
> >
> > I am sorry to be late to the discussion. I wanted to follow up on
> > Johannes reply in the previous version but you are too fast ;)
> >
> > I do agree that this looks rather arbitrary. You do not explain how the
> > warning actually helps. Is the intention to be really verbose to the
> > kernel log when somebody uses this interface from the IRQ context and
> > get bug reports? What about configurations with panic on warn? Do we
> > really want to crash their systems for something like that?
> 
> Thanks for taking a look, Michal!
> 
> The ultimate goal is not to flush in irq context or with irqs
> disabled, as in some cases it causes irqs to be disabled for a long
> time, as flushing is an expensive operation. The previous patch in the
> series should have removed the only context that flushes in irq
> context, and the purpose of the WARN_ON_ONCE() is to catch future uses
> or uses that we might have missed.
> 
> There is still one code path that flushes with irqs disabled (also
> mem_cgroup_usage()), and we cannot remove this just yet; we need to
> deprecate usage threshold events for root to do that. So we cannot
> enforce not flushing with irqs disabled yet.
> 
> So basically the patch is trying to enforce what we have now, not
> flushing in irq context, and hopefully at some point we will also be
> able to enforce not flushing with irqs disabled.
> 
> If WARN_ON_ONCE() is the wrong tool for this, please let me know.
> 

If I understand Michal's concern, the question is should be start with
pr_warn_once() instead of WARN_ON_ONCE() and I think yes we should start
with pr_warn_once().
  
Michal Hocko March 30, 2023, 7:06 a.m. UTC | #4
On Wed 29-03-23 19:20:59, Shakeel Butt wrote:
> On Wed, Mar 29, 2023 at 11:41:39AM -0700, Yosry Ahmed wrote:
> > On Wed, Mar 29, 2023 at 4:22 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Tue 28-03-23 22:16:39, Yosry Ahmed wrote:
> > > > rstat flushing is too expensive to perform in irq context.
> > > > The previous patch removed the only context that may invoke an rstat
> > > > flush from irq context, add a WARN_ON_ONCE() to detect future
> > > > violations, or those that we are not aware of.
> > > >
> > > > Ideally, we wouldn't flush with irqs disabled either, but we have one
> > > > context today that does so in mem_cgroup_usage(). Forbid callers from
> > > > irq context for now, and hopefully we can also forbid callers with irqs
> > > > disabled in the future when we can get rid of this callsite.
> > >
> > > I am sorry to be late to the discussion. I wanted to follow up on
> > > Johannes reply in the previous version but you are too fast ;)
> > >
> > > I do agree that this looks rather arbitrary. You do not explain how the
> > > warning actually helps. Is the intention to be really verbose to the
> > > kernel log when somebody uses this interface from the IRQ context and
> > > get bug reports? What about configurations with panic on warn? Do we
> > > really want to crash their systems for something like that?
> > 
> > Thanks for taking a look, Michal!
> > 
> > The ultimate goal is not to flush in irq context or with irqs
> > disabled, as in some cases it causes irqs to be disabled for a long
> > time, as flushing is an expensive operation. The previous patch in the
> > series should have removed the only context that flushes in irq
> > context, and the purpose of the WARN_ON_ONCE() is to catch future uses
> > or uses that we might have missed.
> > 
> > There is still one code path that flushes with irqs disabled (also
> > mem_cgroup_usage()), and we cannot remove this just yet; we need to
> > deprecate usage threshold events for root to do that. So we cannot
> > enforce not flushing with irqs disabled yet.
> > 
> > So basically the patch is trying to enforce what we have now, not
> > flushing in irq context, and hopefully at some point we will also be
> > able to enforce not flushing with irqs disabled.
> > 
> > If WARN_ON_ONCE() is the wrong tool for this, please let me know.
> > 
> 
> If I understand Michal's concern, the question is should be start with
> pr_warn_once() instead of WARN_ON_ONCE() and I think yes we should start
> with pr_warn_once().

Yes, I do not really like the WARN_ON here. It is an overkill. pr_warn
would much less intrusive but potentially incomplete because you won't
know who that offender is. So if you really care about those then you
would need to call dump_stack as well.

So the real question is. Do we really care so deeply? After all somebody
might be calling this from within a spin lock or irq disabled section
resulting in a similar situation without noticing.
  
Yosry Ahmed March 30, 2023, 7:13 a.m. UTC | #5
On Thu, Mar 30, 2023 at 12:06 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 29-03-23 19:20:59, Shakeel Butt wrote:
> > On Wed, Mar 29, 2023 at 11:41:39AM -0700, Yosry Ahmed wrote:
> > > On Wed, Mar 29, 2023 at 4:22 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Tue 28-03-23 22:16:39, Yosry Ahmed wrote:
> > > > > rstat flushing is too expensive to perform in irq context.
> > > > > The previous patch removed the only context that may invoke an rstat
> > > > > flush from irq context, add a WARN_ON_ONCE() to detect future
> > > > > violations, or those that we are not aware of.
> > > > >
> > > > > Ideally, we wouldn't flush with irqs disabled either, but we have one
> > > > > context today that does so in mem_cgroup_usage(). Forbid callers from
> > > > > irq context for now, and hopefully we can also forbid callers with irqs
> > > > > disabled in the future when we can get rid of this callsite.
> > > >
> > > > I am sorry to be late to the discussion. I wanted to follow up on
> > > > Johannes reply in the previous version but you are too fast ;)
> > > >
> > > > I do agree that this looks rather arbitrary. You do not explain how the
> > > > warning actually helps. Is the intention to be really verbose to the
> > > > kernel log when somebody uses this interface from the IRQ context and
> > > > get bug reports? What about configurations with panic on warn? Do we
> > > > really want to crash their systems for something like that?
> > >
> > > Thanks for taking a look, Michal!
> > >
> > > The ultimate goal is not to flush in irq context or with irqs
> > > disabled, as in some cases it causes irqs to be disabled for a long
> > > time, as flushing is an expensive operation. The previous patch in the
> > > series should have removed the only context that flushes in irq
> > > context, and the purpose of the WARN_ON_ONCE() is to catch future uses
> > > or uses that we might have missed.
> > >
> > > There is still one code path that flushes with irqs disabled (also
> > > mem_cgroup_usage()), and we cannot remove this just yet; we need to
> > > deprecate usage threshold events for root to do that. So we cannot
> > > enforce not flushing with irqs disabled yet.
> > >
> > > So basically the patch is trying to enforce what we have now, not
> > > flushing in irq context, and hopefully at some point we will also be
> > > able to enforce not flushing with irqs disabled.
> > >
> > > If WARN_ON_ONCE() is the wrong tool for this, please let me know.
> > >
> >
> > If I understand Michal's concern, the question is should be start with
> > pr_warn_once() instead of WARN_ON_ONCE() and I think yes we should start
> > with pr_warn_once().
>
> Yes, I do not really like the WARN_ON here. It is an overkill. pr_warn
> would much less intrusive but potentially incomplete because you won't
> know who that offender is. So if you really care about those then you
> would need to call dump_stack as well.
>
> So the real question is. Do we really care so deeply? After all somebody
> might be calling this from within a spin lock or irq disabled section
> resulting in a similar situation without noticing.

There are discussions in [1] about making atomic rstat flush not
disable irqs throughout the process, so in that case it would only
result in a similar situation if the caller has irq disabled. The only
caller that might have irq disabled is the same caller that might be
in irq context before this series: mem_cgroup_usage().

On that note, and while I have your attention, I was wondering if we
can eliminate the flush call completely from mem_cgroup_usage(), and
read the global stats counters for root memcg instead of the root
counters. There might be subtle differences, but the root memcg usage
isn't super accurate now anyway (e.g. it doesn't include kernel
memory).

With that removed, no callers to rstat flushing would be from irq
context or have irqs disabled. There will only be one atomic flusher
(mem_cgroup_wb_stats()), and we can proceed with [1] if it causes a
problem.

What do you think?

[1] https://lore.kernel.org/lkml/CAJD7tkZrp=4zWvjE9_010TAG1T_crCbf9P64UzJABspgcrGPKg@mail.gmail.com/

> --
> Michal Hocko
> SUSE Labs
  
Michal Hocko March 30, 2023, 7:49 a.m. UTC | #6
On Thu 30-03-23 00:13:16, Yosry Ahmed wrote:
> On Thu, Mar 30, 2023 at 12:06 AM Michal Hocko <mhocko@suse.com> wrote:
[...]
> > So the real question is. Do we really care so deeply? After all somebody
> > might be calling this from within a spin lock or irq disabled section
> > resulting in a similar situation without noticing.
> 
> There are discussions in [1] about making atomic rstat flush not
> disable irqs throughout the process, so in that case it would only
> result in a similar situation if the caller has irq disabled. The only
> caller that might have irq disabled is the same caller that might be
> in irq context before this series: mem_cgroup_usage().
> 
> On that note, and while I have your attention, I was wondering if we
> can eliminate the flush call completely from mem_cgroup_usage(), and
> read the global stats counters for root memcg instead of the root
> counters. There might be subtle differences, but the root memcg usage
> isn't super accurate now anyway (e.g. it doesn't include kernel
> memory).

root memcg stats are imprecise indeed and I have to admit I do not
really know why we are adding more work for that case. I have tried to
dig into git history for that yesterday but couldn't find a satisfying
answer. It goes all the way down to 2d146aa3aa842 which has done the
switch to rstat. Maybe Johannes remembers.

Anyway, back to this particular patch. I still do not see strong reasons
to be verbose about !in_task case so I would just drop this patch.
  
Yosry Ahmed March 30, 2023, 8:06 a.m. UTC | #7
On Thu, Mar 30, 2023 at 12:49 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 30-03-23 00:13:16, Yosry Ahmed wrote:
> > On Thu, Mar 30, 2023 at 12:06 AM Michal Hocko <mhocko@suse.com> wrote:
> [...]
> > > So the real question is. Do we really care so deeply? After all somebody
> > > might be calling this from within a spin lock or irq disabled section
> > > resulting in a similar situation without noticing.
> >
> > There are discussions in [1] about making atomic rstat flush not
> > disable irqs throughout the process, so in that case it would only
> > result in a similar situation if the caller has irq disabled. The only
> > caller that might have irq disabled is the same caller that might be
> > in irq context before this series: mem_cgroup_usage().
> >
> > On that note, and while I have your attention, I was wondering if we
> > can eliminate the flush call completely from mem_cgroup_usage(), and
> > read the global stats counters for root memcg instead of the root
> > counters. There might be subtle differences, but the root memcg usage
> > isn't super accurate now anyway (e.g. it doesn't include kernel
> > memory).
>
> root memcg stats are imprecise indeed and I have to admit I do not
> really know why we are adding more work for that case. I have tried to
> dig into git history for that yesterday but couldn't find a satisfying
> answer. It goes all the way down to 2d146aa3aa842 which has done the
> switch to rstat. Maybe Johannes remembers.

I think it goes back even before that. Even before rstat, we used to
get the root usage by iterating over the hierarchy. Maybe we didn't
have the global counters at some point or they weren't in line with
the root memcg (e.g. use_hierarchy = 0). It would be great if we can
just use the global counters here. Hopefully Johannes can confirm or
deny this.

>
> Anyway, back to this particular patch. I still do not see strong reasons
> to be verbose about !in_task case so I would just drop this patch.

I will drop this patch in the next iteration. If I implement a patch
that makes rstat flushing not disable irqs all throughout (like [1]),
whether part of this series or not, and we remove flushing from
mem_cgroup_usage(), then at this point:
- No existing flushers will be disabling irqs.
- rstat flushing itself will not be disabling irqs throughout the entire flush.

If we achieve that, do you think it makes sense to add
WARN_ON_ONCE(irqs_disabled()) instead to prevent future users from
flushing while disabling irqs or in irq context?

All I am trying to achieve here is make sure we don't regress. I don't
want to minimize the current atomic flushers now only to have them
increase later.

[1] https://lore.kernel.org/lkml/CAJD7tkZrp=4zWvjE9_010TAG1T_crCbf9P64UzJABspgcrGPKg@mail.gmail.com/

> --
> Michal Hocko
> SUSE Labs
  
Michal Hocko March 30, 2023, 8:14 a.m. UTC | #8
On Thu 30-03-23 01:06:26, Yosry Ahmed wrote:
[...]
> If we achieve that, do you think it makes sense to add
> WARN_ON_ONCE(irqs_disabled()) instead to prevent future users from
> flushing while disabling irqs or in irq context?

WARN_ON (similar to BUG_ON) will not prevent anybody from doing bad
things. We already have means to shout about sleepable code being
invoked from an atomic context and there is no reason to duplicate that.
As I've said earlier WARN_ON might panic the system in some
configurations (and yes they are used also in production systems - do
not ask me why...). So please be careful about that and use that only
when something really bad (yet recoverable) is going on.
  
Yosry Ahmed March 30, 2023, 8:19 a.m. UTC | #9
On Thu, Mar 30, 2023 at 1:15 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 30-03-23 01:06:26, Yosry Ahmed wrote:
> [...]
> > If we achieve that, do you think it makes sense to add
> > WARN_ON_ONCE(irqs_disabled()) instead to prevent future users from
> > flushing while disabling irqs or in irq context?
>
> WARN_ON (similar to BUG_ON) will not prevent anybody from doing bad
> things. We already have means to shout about sleepable code being
> invoked from an atomic context and there is no reason to duplicate that.
> As I've said earlier WARN_ON might panic the system in some
> configurations (and yes they are used also in production systems - do
> not ask me why...). So please be careful about that and use that only
> when something really bad (yet recoverable) is going on.

Thanks for the information (I was about to ask why about production
systems, but okay..). I will avoid WARN_ON completely. For the
purposes of this series I will drop this patch anyway.

Any idea how to shout about "hey this may take too long, why are you
doing it with irqs disabled?!"?

>
> --
> Michal Hocko
> SUSE Labs
  
Michal Hocko March 30, 2023, 8:39 a.m. UTC | #10
On Thu 30-03-23 01:19:29, Yosry Ahmed wrote:
> On Thu, Mar 30, 2023 at 1:15 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 30-03-23 01:06:26, Yosry Ahmed wrote:
> > [...]
> > > If we achieve that, do you think it makes sense to add
> > > WARN_ON_ONCE(irqs_disabled()) instead to prevent future users from
> > > flushing while disabling irqs or in irq context?
> >
> > WARN_ON (similar to BUG_ON) will not prevent anybody from doing bad
> > things. We already have means to shout about sleepable code being
> > invoked from an atomic context and there is no reason to duplicate that.
> > As I've said earlier WARN_ON might panic the system in some
> > configurations (and yes they are used also in production systems - do
> > not ask me why...). So please be careful about that and use that only
> > when something really bad (yet recoverable) is going on.
> 
> Thanks for the information (I was about to ask why about production
> systems, but okay..). I will avoid WARN_ON completely. For the
> purposes of this series I will drop this patch anyway.

Thanks! People do strange things sometimes...

> Any idea how to shout about "hey this may take too long, why are you
> doing it with irqs disabled?!"?

Well we have a hard lockup detector. It hits at a much higher stall by
default but if you care about IRQ latencies in general then you likely
want to lower. Another thing would be IRQ tracing. In any case this code
path shouldn't be any special. Sure it can take long on large systems
but I bet there are more of those.
  
Yosry Ahmed March 30, 2023, 8:53 a.m. UTC | #11
On Thu, Mar 30, 2023 at 1:39 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 30-03-23 01:19:29, Yosry Ahmed wrote:
> > On Thu, Mar 30, 2023 at 1:15 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Thu 30-03-23 01:06:26, Yosry Ahmed wrote:
> > > [...]
> > > > If we achieve that, do you think it makes sense to add
> > > > WARN_ON_ONCE(irqs_disabled()) instead to prevent future users from
> > > > flushing while disabling irqs or in irq context?
> > >
> > > WARN_ON (similar to BUG_ON) will not prevent anybody from doing bad
> > > things. We already have means to shout about sleepable code being
> > > invoked from an atomic context and there is no reason to duplicate that.
> > > As I've said earlier WARN_ON might panic the system in some
> > > configurations (and yes they are used also in production systems - do
> > > not ask me why...). So please be careful about that and use that only
> > > when something really bad (yet recoverable) is going on.
> >
> > Thanks for the information (I was about to ask why about production
> > systems, but okay..). I will avoid WARN_ON completely. For the
> > purposes of this series I will drop this patch anyway.
>
> Thanks! People do strange things sometimes...
>
> > Any idea how to shout about "hey this may take too long, why are you
> > doing it with irqs disabled?!"?
>
> Well we have a hard lockup detector. It hits at a much higher stall by
> default but if you care about IRQ latencies in general then you likely
> want to lower. Another thing would be IRQ tracing. In any case this code
> path shouldn't be any special. Sure it can take long on large systems
> but I bet there are more of those.

We did see hard lockups in extreme cases, and we do have setups with
"nmi_watchdog=panic" that will panic when this happens, so we would
rather catch the code paths that can lead to that before it actually
happens.

Maybe we can add a primitive like might_sleep() for this, just food for thought.

> --
> Michal Hocko
> SUSE Labs
  
Michal Hocko March 31, 2023, 11:02 a.m. UTC | #12
On Thu 30-03-23 01:53:38, Yosry Ahmed wrote:
[...]
> Maybe we can add a primitive like might_sleep() for this, just food for thought.

I do not think it is the correct to abuse might_sleep if the function
itself doesn't sleep. If it does might_sleep is already involved.
  
Yosry Ahmed March 31, 2023, 7:03 p.m. UTC | #13
On Fri, Mar 31, 2023 at 4:02 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 30-03-23 01:53:38, Yosry Ahmed wrote:
> [...]
> > Maybe we can add a primitive like might_sleep() for this, just food for thought.
>
> I do not think it is the correct to abuse might_sleep if the function
> itself doesn't sleep. If it does might_sleep is already involved.

Oh, sorry if I wasn't clear, I did not mean to reuse might_sleep() --
I meant introducing a new similar debug primitive that shouts if irqs
are disabled.

> --
> Michal Hocko
> SUSE Labs
  
Michal Hocko April 3, 2023, 8:38 a.m. UTC | #14
On Fri 31-03-23 12:03:47, Yosry Ahmed wrote:
> On Fri, Mar 31, 2023 at 4:02 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 30-03-23 01:53:38, Yosry Ahmed wrote:
> > [...]
> > > Maybe we can add a primitive like might_sleep() for this, just food for thought.
> >
> > I do not think it is the correct to abuse might_sleep if the function
> > itself doesn't sleep. If it does might_sleep is already involved.
> 
> Oh, sorry if I wasn't clear, I did not mean to reuse might_sleep() --
> I meant introducing a new similar debug primitive that shouts if irqs
> are disabled.

This is circling back to original concerns about arbitrary decision to
care about IRQs. Is this really any different from spin locks or preempt
disabled critical sections preventing any scheduling and potentially
triggereing soft lockups?
  
Yosry Ahmed April 3, 2023, 8:39 p.m. UTC | #15
On Mon, Apr 3, 2023 at 1:38 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Fri 31-03-23 12:03:47, Yosry Ahmed wrote:
> > On Fri, Mar 31, 2023 at 4:02 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Thu 30-03-23 01:53:38, Yosry Ahmed wrote:
> > > [...]
> > > > Maybe we can add a primitive like might_sleep() for this, just food for thought.
> > >
> > > I do not think it is the correct to abuse might_sleep if the function
> > > itself doesn't sleep. If it does might_sleep is already involved.
> >
> > Oh, sorry if I wasn't clear, I did not mean to reuse might_sleep() --
> > I meant introducing a new similar debug primitive that shouts if irqs
> > are disabled.
>
> This is circling back to original concerns about arbitrary decision to
> care about IRQs. Is this really any different from spin locks or preempt
> disabled critical sections preventing any scheduling and potentially
> triggereing soft lockups?

Not really, I am sure there are other code paths that may cause
similar problems. At least we can start annotating them so that we
don't regress them (e.g. by introducing a caller that disables irqs --
converting them into hard lockups).

> --
> Michal Hocko
> SUSE Labs
  

Patch

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index d3252b0416b6..c2571939139f 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -176,6 +176,8 @@  static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
 {
 	int cpu;
 
+	/* rstat flushing is too expensive for irq context */
+	WARN_ON_ONCE(!in_task());
 	lockdep_assert_held(&cgroup_rstat_lock);
 
 	for_each_possible_cpu(cpu) {