[v1,7/9] workingset: memcg: sleep when flushing stats in workingset_refault()

Message ID 20230328061638.203420-8-yosryahmed@google.com
State New
Series memcg: make rstat flushing irq and sleep friendly

Commit Message

Yosry Ahmed March 28, 2023, 6:16 a.m. UTC
  In workingset_refault(), we call mem_cgroup_flush_stats_ratelimited()
to flush stats within an RCU read section and with sleeping disallowed.
Move the call to mem_cgroup_flush_stats_ratelimited() above the RCU read
section and allow it to sleep, so that a potentially large amount of
flushing work is no longer performed atomically.

Since workingset_refault() is the only caller of
mem_cgroup_flush_stats_ratelimited(), just make it call the non-atomic
mem_cgroup_flush_stats().

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/memcontrol.c | 12 ++++++------
 mm/workingset.c |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)
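
Condensed from the diff below: after this patch, the refault path performs
the (now sleepable) ratelimited flush before entering the RCU read section:

	/* flush stats (and potentially sleep) outside the RCU read section */
	mem_cgroup_flush_stats_ratelimited();

	rcu_read_lock();
	memcg = mem_cgroup_from_id(memcgid);
	...
	rcu_read_unlock();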
  

Comments

Shakeel Butt March 28, 2023, 3:18 p.m. UTC | #1
On Mon, Mar 27, 2023 at 11:16 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> In workingset_refault(), we call mem_cgroup_flush_stats_ratelimited()
> to flush stats within an RCU read section and with sleeping disallowed.
> Move the call to mem_cgroup_flush_stats_ratelimited() above the RCU read
> section and allow it to sleep, so that a potentially large amount of
> flushing work is no longer performed atomically.
>
> Since workingset_refault() is the only caller of
> mem_cgroup_flush_stats_ratelimited(), just make it call the non-atomic
> mem_cgroup_flush_stats().
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

A nit below:

Acked-by: Shakeel Butt <shakeelb@google.com>

> ---
>  mm/memcontrol.c | 12 ++++++------
>  mm/workingset.c |  4 ++--
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 57e8cbf701f3..0c0e74188e90 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -674,12 +674,6 @@ void mem_cgroup_flush_stats_atomic(void)
>                 __mem_cgroup_flush_stats_atomic();
>  }
>
> -void mem_cgroup_flush_stats_ratelimited(void)
> -{
> -       if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
> -               mem_cgroup_flush_stats_atomic();
> -}
> -
>  /* non-atomic functions, only safe from sleepable contexts */
>  static void __mem_cgroup_flush_stats(void)
>  {
> @@ -695,6 +689,12 @@ void mem_cgroup_flush_stats(void)
>                 __mem_cgroup_flush_stats();
>  }
>
> +void mem_cgroup_flush_stats_ratelimited(void)
> +{
> +       if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
> +               mem_cgroup_flush_stats();
> +}
> +
>  static void flush_memcg_stats_dwork(struct work_struct *w)
>  {
>         __mem_cgroup_flush_stats();
> diff --git a/mm/workingset.c b/mm/workingset.c
> index af862c6738c3..7d7ecc46521c 100644
> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -406,6 +406,8 @@ void workingset_refault(struct folio *folio, void *shadow)
>         unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
>         eviction <<= bucket_order;
>
> +       /* Flush stats (and potentially sleep) before holding RCU read lock */

I think the only reason we use rcu lock is due to
mem_cgroup_from_id(). Maybe we should add mem_cgroup_tryget_from_id().
The other caller of mem_cgroup_from_id() in vmscan is already doing
the same and could use mem_cgroup_tryget_from_id().

Though this can be done separately to this series (if we decide to do
it at all).
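
A helper along those lines might look like the sketch below (hypothetical:
mem_cgroup_tryget_from_id() does not exist in this series, and the body is
an assumption based on the usual css_tryget() pattern):

	/*
	 * Hypothetical sketch: look up a memcg by ID under RCU and pin it
	 * with a reference, so the caller does not need to remain in an
	 * RCU read section. Returns NULL if the ID is stale or the
	 * refcount has already dropped to zero.
	 */
	static struct mem_cgroup *mem_cgroup_tryget_from_id(unsigned short id)
	{
		struct mem_cgroup *memcg;

		rcu_read_lock();
		memcg = mem_cgroup_from_id(id);
		if (memcg && !css_tryget(&memcg->css))
			memcg = NULL;
		rcu_read_unlock();

		return memcg;
	}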
  
Johannes Weiner March 28, 2023, 6:43 p.m. UTC | #2
On Tue, Mar 28, 2023 at 06:16:36AM +0000, Yosry Ahmed wrote:
> @@ -406,6 +406,8 @@ void workingset_refault(struct folio *folio, void *shadow)
>  	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
>  	eviction <<= bucket_order;
>  
> +	/* Flush stats (and potentially sleep) before holding RCU read lock */
> +	mem_cgroup_flush_stats_ratelimited();
>  	rcu_read_lock();

Minor nit, but please keep the lock section visually separated by an
empty line between the flush and the rcu lock.
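
That is, with the flush visually separated from the locked section:

	mem_cgroup_flush_stats_ratelimited();

	rcu_read_lock();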

Other than that,

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
  
Johannes Weiner March 28, 2023, 6:47 p.m. UTC | #3
On Tue, Mar 28, 2023 at 08:18:11AM -0700, Shakeel Butt wrote:
> > @@ -406,6 +406,8 @@ void workingset_refault(struct folio *folio, void *shadow)
> >         unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
> >         eviction <<= bucket_order;
> >
> > +       /* Flush stats (and potentially sleep) before holding RCU read lock */
> 
> I think the only reason we use rcu lock is due to
> mem_cgroup_from_id(). Maybe we should add mem_cgroup_tryget_from_id().
> The other caller of mem_cgroup_from_id() in vmscan is already doing
> the same and could use mem_cgroup_tryget_from_id().

Good catch. Nothing else in there is protected by RCU. We can just
hold the ref instead.

> Though this can be done separately to this series (if we decide to do
> it at all).

Agreed
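
With such a helper, the caller side could take the reference up front and
avoid the RCU read section entirely; a hypothetical sketch of the refault
path (the helper and the early-bail behavior are assumptions, not code from
this series):

	/* hypothetical: pin the eviction memcg instead of holding RCU */
	eviction_memcg = mem_cgroup_tryget_from_id(memcgid);
	if (!mem_cgroup_disabled() && !eviction_memcg)
		return;
	/* ... refault accounting, no rcu_read_lock() required ... */
	mem_cgroup_put(eviction_memcg);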
  
Yosry Ahmed March 28, 2023, 7:25 p.m. UTC | #4
On Tue, Mar 28, 2023 at 8:18 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Mon, Mar 27, 2023 at 11:16 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > In workingset_refault(), we call mem_cgroup_flush_stats_ratelimited()
> > to flush stats within an RCU read section and with sleeping disallowed.
> > Move the call to mem_cgroup_flush_stats_ratelimited() above the RCU read
> > section and allow it to sleep, so that a potentially large amount of
> > flushing work is no longer performed atomically.
> >
> > Since workingset_refault() is the only caller of
> > mem_cgroup_flush_stats_ratelimited(), just make it call the non-atomic
> > mem_cgroup_flush_stats().
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
>
> A nit below:
>
> Acked-by: Shakeel Butt <shakeelb@google.com>
>
> > ---
> >  mm/memcontrol.c | 12 ++++++------
> >  mm/workingset.c |  4 ++--
> >  2 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 57e8cbf701f3..0c0e74188e90 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -674,12 +674,6 @@ void mem_cgroup_flush_stats_atomic(void)
> >                 __mem_cgroup_flush_stats_atomic();
> >  }
> >
> > -void mem_cgroup_flush_stats_ratelimited(void)
> > -{
> > -       if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
> > -               mem_cgroup_flush_stats_atomic();
> > -}
> > -
> >  /* non-atomic functions, only safe from sleepable contexts */
> >  static void __mem_cgroup_flush_stats(void)
> >  {
> > @@ -695,6 +689,12 @@ void mem_cgroup_flush_stats(void)
> >                 __mem_cgroup_flush_stats();
> >  }
> >
> > +void mem_cgroup_flush_stats_ratelimited(void)
> > +{
> > +       if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
> > +               mem_cgroup_flush_stats();
> > +}
> > +
> >  static void flush_memcg_stats_dwork(struct work_struct *w)
> >  {
> >         __mem_cgroup_flush_stats();
> > diff --git a/mm/workingset.c b/mm/workingset.c
> > index af862c6738c3..7d7ecc46521c 100644
> > --- a/mm/workingset.c
> > +++ b/mm/workingset.c
> > @@ -406,6 +406,8 @@ void workingset_refault(struct folio *folio, void *shadow)
> >         unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
> >         eviction <<= bucket_order;
> >
> > +       /* Flush stats (and potentially sleep) before holding RCU read lock */
>
> I think the only reason we use rcu lock is due to
> mem_cgroup_from_id(). Maybe we should add mem_cgroup_tryget_from_id().
> The other caller of mem_cgroup_from_id() in vmscan is already doing
> the same and could use mem_cgroup_tryget_from_id().

I think different callers of mem_cgroup_from_id() want different things.

(a) workingset_refault() reads the memcg from the id and doesn't
really care if the memcg is online or not.

(b) __mem_cgroup_uncharge_swap() reads the memcg from the id and drops
refs acquired on the swapout path. It doesn't need tryget as we should
know for a fact that we are holding refs from the swapout path. It
doesn't care if the memcg is online or not.

(c) mem_cgroup_swapin_charge_folio() reads the memcg from the id and
then gets a ref with css_tryget_online() -- so only if the refcount is
non-zero and the memcg is online.

So we would at least need mem_cgroup_tryget_from_id() and
mem_cgroup_tryget_online_from_id() to eliminate all direct calls of
mem_cgroup_from_id(). I am hesitant about (b) because if we use
mem_cgroup_tryget_from_id() the code will be getting a ref, then
dropping the ref we have been carrying from swapout, then dropping the
ref we just acquired.

WDYT?
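
For illustration, the online-checking variant for case (c) would differ
from the plain tryget only in the helper used; a hypothetical sketch
(mem_cgroup_tryget_online_from_id() is not an existing function):

	/*
	 * Hypothetical sketch: as mem_cgroup_tryget_from_id(), but only
	 * succeeds for online memcgs, matching the css_tryget_online()
	 * use in mem_cgroup_swapin_charge_folio().
	 */
	static struct mem_cgroup *mem_cgroup_tryget_online_from_id(unsigned short id)
	{
		struct mem_cgroup *memcg;

		rcu_read_lock();
		memcg = mem_cgroup_from_id(id);
		if (memcg && !css_tryget_online(&memcg->css))
			memcg = NULL;
		rcu_read_unlock();

		return memcg;
	}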


>
> Though this can be done separately to this series (if we decide to do
> it at all).
  

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 57e8cbf701f3..0c0e74188e90 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -674,12 +674,6 @@ void mem_cgroup_flush_stats_atomic(void)
 		__mem_cgroup_flush_stats_atomic();
 }
 
-void mem_cgroup_flush_stats_ratelimited(void)
-{
-	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
-		mem_cgroup_flush_stats_atomic();
-}
-
 /* non-atomic functions, only safe from sleepable contexts */
 static void __mem_cgroup_flush_stats(void)
 {
@@ -695,6 +689,12 @@ void mem_cgroup_flush_stats(void)
 		__mem_cgroup_flush_stats();
 }
 
+void mem_cgroup_flush_stats_ratelimited(void)
+{
+	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
+		mem_cgroup_flush_stats();
+}
+
 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
 	__mem_cgroup_flush_stats();
diff --git a/mm/workingset.c b/mm/workingset.c
index af862c6738c3..7d7ecc46521c 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -406,6 +406,8 @@ void workingset_refault(struct folio *folio, void *shadow)
 	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
 	eviction <<= bucket_order;
 
+	/* Flush stats (and potentially sleep) before holding RCU read lock */
+	mem_cgroup_flush_stats_ratelimited();
 	rcu_read_lock();
 	/*
 	 * Look up the memcg associated with the stored ID. It might
@@ -461,8 +463,6 @@ void workingset_refault(struct folio *folio, void *shadow)
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
-
-	mem_cgroup_flush_stats_ratelimited();
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if