zswap: do not shrink if cgroup may not zswap

Message ID 20230530222440.2777700-1-nphamcs@gmail.com
State New
Series zswap: do not shrink if cgroup may not zswap

Commit Message

Nhat Pham May 30, 2023, 10:24 p.m. UTC
  Before storing a page, zswap first checks if the number of stored pages
exceeds the limit specified by memory.zswap.max, for each cgroup in the
hierarchy. If this limit is reached or exceeded, then zswap shrinking is
triggered and short-circuits the store attempt.
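
For reference, a rough sketch of the store path before this patch (paraphrased,
not the literal mm/zswap.c code; unrelated checks and error handling omitted):

	objcg = get_obj_cgroup_from_page(page);
	if (objcg && !obj_cgroup_may_zswap(objcg))
		goto shrink;	/* some cgroup in the hierarchy is at memory.zswap.max */

	/* reclaim space if needed */
	if (zswap_is_full())
		goto shrink;	/* global pool limit hits the same path */
	...
shrink:
	/* kick pool-wide writeback, then reject this store */
	goto reject;

Both limit checks funnel into the same pool-wide writeback.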

However, since zswap's LRU is not memcg-aware, this can create the
following pathological behavior: the cgroup whose zswap limit is
reached will evict pages from other cgroups continually, without
lowering its own zswap usage. This means the shrinking will continue
until the need for swap ceases or the pool becomes empty.
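
Put as a simplified sketch, the net effect is something like the following
(the helper names and the loop condition are hypothetical; in reality one
round of writeback is kicked per rejected store rather than run in a loop):

	/* Illustrative pseudo-C only -- not the kernel's code. */
	while (need_swap && !obj_cgroup_may_zswap(objcg)) {
		entry = lru_pop_oldest(&pool->lru);	/* hypothetical; global LRU */
		if (!entry)
			break;				/* ...or until the pool is empty */
		zswap_writeback(entry);			/* likely another cgroup's entry */
	}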

As a result of this, we observe a disproportionate amount of zswap
writeback and a perpetually small zswap pool in our experiments, even
though the pool limit is never hit.

This patch fixes the issue by rejecting the zswap store attempt without
shrinking the pool when obj_cgroup_may_zswap() returns false.

Fixes: f4840ccfca25 ("zswap: memcg accounting")
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 mm/zswap.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
  

Comments

Andrew Morton May 30, 2023, 10:30 p.m. UTC | #1
On Tue, 30 May 2023 15:24:40 -0700 Nhat Pham <nphamcs@gmail.com> wrote:

> Before storing a page, zswap first checks if the number of stored pages
> exceeds the limit specified by memory.zswap.max, for each cgroup in the
> hierarchy. If this limit is reached or exceeded, then zswap shrinking is
> triggered and short-circuits the store attempt.
> 
> However, since zswap's LRU is not memcg-aware, this can create the
> following pathological behavior: the cgroup whose zswap limit is
> reached will evict pages from other cgroups continually, without
> lowering its own zswap usage. This means the shrinking will continue
> until the need for swap ceases or the pool becomes empty.
> 
> As a result of this, we observe a disproportionate amount of zswap
> writeback and a perpetually small zswap pool in our experiments, even
> though the pool limit is never hit.

That sounds unpleasant.  Do you think the patch should be backported
into earlier (-stable) kernels?

> This patch fixes the issue by rejecting the zswap store attempt without
> shrinking the pool when obj_cgroup_may_zswap() returns false.
  
Yosry Ahmed May 30, 2023, 10:36 p.m. UTC | #2
On Tue, May 30, 2023 at 3:24 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> Before storing a page, zswap first checks if the number of stored pages
> exceeds the limit specified by memory.zswap.max, for each cgroup in the
> hierarchy. If this limit is reached or exceeded, then zswap shrinking is
> triggered and short-circuits the store attempt.
>
> However, since zswap's LRU is not memcg-aware, this can create the
> following pathological behavior: the cgroup whose zswap limit is
> reached will evict pages from other cgroups continually, without
> lowering its own zswap usage. This means the shrinking will continue
> until the need for swap ceases or the pool becomes empty.

This pathological behavior will only happen if the zswap limit is 0.
Otherwise, we will see a different pathological behavior where we
unnecessarily evict X pages from other cgroups before we drive the
memcg back below its limit.

Perhaps we should clarify this?
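
For context, a simplified paraphrase of the check (the real
obj_cgroup_may_zswap() in mm/memcontrol.c also flushes rstat and handles the
cgroup v1 and root cases):

	for (memcg = get_mem_cgroup_from_objcg(objcg); !mem_cgroup_is_root(memcg);
	     memcg = parent_mem_cgroup(memcg)) {
		unsigned long max = READ_ONCE(memcg->zswap_max);

		if (max == PAGE_COUNTER_MAX)
			continue;			/* no limit set at this level */
		if (max == 0)
			return false;			/* writeback can never satisfy this */
		if (zswap_usage_pages(memcg) >= max)	/* hypothetical accessor */
			return false;			/* over limit for now */
	}
	return true;

So with a limit of 0 the check fails unconditionally and shrinking can never
help, whereas with a non-zero limit the cgroup does eventually get back below
its limit, just after some number of other cgroups' pages have been needlessly
written back.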

>
> As a result of this, we observe a disproportionate amount of zswap
> writeback and a perpetually small zswap pool in our experiments, even
> though the pool limit is never hit.

I am guessing this is also related to the case where the limit is 0.
It would be useful to clarify this.

>
> This patch fixes the issue by rejecting the zswap store attempt without
> shrinking the pool when obj_cgroup_may_zswap() returns false.
>
> Fixes: f4840ccfca25 ("zswap: memcg accounting")
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
> ---
>  mm/zswap.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 59da2a415fbb..cff93643a6ab 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1174,9 +1174,14 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
>                 goto reject;
>         }
>
> +       /*
> +        * XXX: zswap reclaim does not work with cgroups yet. Without a
> +        * cgroup-aware entry LRU, we will push out entries system-wide based on
> +        * local cgroup limits.
> +        */
>         objcg = get_obj_cgroup_from_page(page);
>         if (objcg && !obj_cgroup_may_zswap(objcg))
> -               goto shrink;
> +               goto reject;
>
>         /* reclaim space if needed */
>         if (zswap_is_full()) {
> --
> 2.34.1
>

With commit log nits above:

Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
  
Nhat Pham May 30, 2023, 11:37 p.m. UTC | #3
On Tue, May 30, 2023 at 3:30 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 30 May 2023 15:24:40 -0700 Nhat Pham <nphamcs@gmail.com> wrote:
>
> > Before storing a page, zswap first checks if the number of stored pages
> > exceeds the limit specified by memory.zswap.max, for each cgroup in the
> > hierarchy. If this limit is reached or exceeded, then zswap shrinking is
> > triggered and short-circuits the store attempt.
> >
> > However, since zswap's LRU is not memcg-aware, this can create the
> > following pathological behavior: the cgroup whose zswap limit is
> > reached will evict pages from other cgroups continually, without
> > lowering its own zswap usage. This means the shrinking will continue
> > until the need for swap ceases or the pool becomes empty.
> >
> > As a result of this, we observe a disproportionate amount of zswap
> > writeback and a perpetually small zswap pool in our experiments, even
> > though the pool limit is never hit.
>
> That sounds unpleasant.  Do you think the patch should be backported
> into earlier (-stable) kernels?

I think it should be, for any kernel version after f4840ccfca25.
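
Concretely, that would be a stable tag along these lines in the next version
(assuming f4840ccfca25 first shipped in v5.19 -- worth double-checking):

	Cc: <stable@vger.kernel.org>	# v5.19+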



>
> > This patch fixes the issue by rejecting the zswap store attempt without
> > shrinking the pool when obj_cgroup_may_zswap() returns false.
>
  

Patch

diff --git a/mm/zswap.c b/mm/zswap.c
index 59da2a415fbb..cff93643a6ab 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1174,9 +1174,14 @@  static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 		goto reject;
 	}
 
+	/*
+	 * XXX: zswap reclaim does not work with cgroups yet. Without a
+	 * cgroup-aware entry LRU, we will push out entries system-wide based on
+	 * local cgroup limits.
+	 */
 	objcg = get_obj_cgroup_from_page(page);
 	if (objcg && !obj_cgroup_may_zswap(objcg))
-		goto shrink;
+		goto reject;
 
 	/* reclaim space if needed */
 	if (zswap_is_full()) {