mm/swap_state: update zswap LRU's protection range with the folio locked

Message ID 20240205232442.3240571-1-nphamcs@gmail.com
State New
Headers
Series mm/swap_state: update zswap LRU's protection range with the folio locked |

Commit Message

Nhat Pham Feb. 5, 2024, 11:24 p.m. UTC
  Move the zswap LRU protection range update above the swap_read_folio()
call, and only when a new page is allocated. This is the case where
(z)swapin could happen, which is a signal that the zswap shrinker should
be more conservative with its reclaiming action.

It also prevents a race, in which folio migration can clear the
memcg_data of the now unlocked folio, resulting in a warning in the
inlined folio_lruvec() call.

Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 mm/swap_state.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)


base-commit: 91f3daa1765ee4e0c89987dc25f72c40f07af34d
  

Comments

Chengming Zhou Feb. 6, 2024, 2:07 a.m. UTC | #1
On 2024/2/6 07:24, Nhat Pham wrote:
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated. This is the case where
> (z)swapin could happen, which is a signal that the zswap shrinker should
> be more conservative with its reclaiming action.
> 
> It also prevents a race, in which folio migration can clear the
> memcg_data of the now unlocked folio, resulting in a warning in the
> inlined folio_lruvec() call.
> 
> Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>

LGTM, thanks!

Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>

> ---
>  mm/swap_state.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index e671266ad772..7255c01a1e4e 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>  	/* The page was likely read above, so no need for plugging here */
>  	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
>  					&page_allocated, false);
> -	if (unlikely(page_allocated))
> +	if (unlikely(page_allocated)) {
> +		zswap_folio_swapin(folio);
>  		swap_read_folio(folio, false, NULL);
> -	zswap_folio_swapin(folio);
> +	}
>  	return folio;
>  }
>  
> @@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
>  	/* The folio was likely read above, so no need for plugging here */
>  	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
>  					&page_allocated, false);
> -	if (unlikely(page_allocated))
> +	if (unlikely(page_allocated)) {
> +		zswap_folio_swapin(folio);
>  		swap_read_folio(folio, false, NULL);
> -	zswap_folio_swapin(folio);
> +	}
>  	return folio;
>  }
>  
> 
> base-commit: 91f3daa1765ee4e0c89987dc25f72c40f07af34d
  
Johannes Weiner Feb. 6, 2024, 3:15 p.m. UTC | #2
On Mon, Feb 05, 2024 at 03:24:42PM -0800, Nhat Pham wrote:
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated. This is the case where
> (z)swapin could happen, which is a signal that the zswap shrinker should
> be more conservative with its reclaiming action.
> 
> It also prevents a race, in which folio migration can clear the
> memcg_data of the now unlocked folio, resulting in a warning in the
> inlined folio_lruvec() call.

The warning is the most probable outcome, and it will cause the update
to go against the root cgroup which is safe at least.

But AFAICS there is no ordering guarantee to rule out a UAF if the
lookup succeeds but the memcg and lruvec get freed before the update.

I think that part should be more prominent in the changelog. It's more
important than the first paragraph. Consider somebody scrolling
through the git log and trying to decide whether to backport or not;
it's helpful to describe the bug and its impact first thing, then put
the explanation of the fix after.

> Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>

Would it make sense to add

	VM_WARN_ON_ONCE(!folio_test_locked(folio));

to zswap_folio_swapin() as well?
  
Nhat Pham Feb. 6, 2024, 5:31 p.m. UTC | #3
On Tue, Feb 6, 2024 at 7:15 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Mon, Feb 05, 2024 at 03:24:42PM -0800, Nhat Pham wrote:
> > Move the zswap LRU protection range update above the swap_read_folio()
> > call, and only when a new page is allocated. This is the case where
> > (z)swapin could happen, which is a signal that the zswap shrinker should
> > be more conservative with its reclaiming action.
> >
> > It also prevents a race, in which folio migration can clear the
> > memcg_data of the now unlocked folio, resulting in a warning in the
> > inlined folio_lruvec() call.
>
> The warning is the most probable outcome, and it will cause the update
> to go against the root cgroup which is safe at least.
>
> But AFAICS there is no ordering guarantee to rule out a UAF if the
> lookup succeeds but the memcg and lruvec get freed before the update.

Ah nice. I didn't consider that. IIUC, having the folio locked should
prevent this too. Based on the documentation:

 * For a non-kmem folio any of the following ensures folio and memcg binding
 * stability:
 *
 * - the folio lock

I'll rework the commit log to include this, and make this more prominent :)

>
> I think that part should be more prominent in the changelog. It's more
> important than the first paragraph. Consider somebody scrolling
> through the git log and trying to decide whether to backport or not;
> it's helpful to describe the bug and its impact first thing, then put
> the explanation of the fix after.
>
> > Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@googlecom/
> > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> > Signed-off-by: Nhat Pham <nphamcs@gmail.com>
>
> Would it make sense to add
>
>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
>
> to zswap_folio_swapin() as well?
  

Patch

diff --git a/mm/swap_state.c b/mm/swap_state.c
index e671266ad772..7255c01a1e4e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -680,9 +680,10 @@  struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	/* The page was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated))
+	if (unlikely(page_allocated)) {
+		zswap_folio_swapin(folio);
 		swap_read_folio(folio, false, NULL);
-	zswap_folio_swapin(folio);
+	}
 	return folio;
 }
 
@@ -855,9 +856,10 @@  static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	/* The folio was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated))
+	if (unlikely(page_allocated)) {
+		zswap_folio_swapin(folio);
 		swap_read_folio(folio, false, NULL);
-	zswap_folio_swapin(folio);
+	}
 	return folio;
 }