mm: zswap: fix potential memory corruption on duplicate store

Message ID 20230922172211.1704917-1-cerasuolodomenico@gmail.com
State New
Headers
Series mm: zswap: fix potential memory corruption on duplicate store |

Commit Message

Domenico Cerasuolo Sept. 22, 2023, 5:22 p.m. UTC
  While stress-testing zswap a memory corruption was happening when writing
back pages. __frontswap_store used to check for duplicate entries before
attempting to store a page in zswap, this was because if the store fails
the old entry isn't removed from the tree. This change removes duplicate
entries in zswap_store before the actual attempt.

Based on commit ce9ecca0238b ("Linux 6.6-rc2")

Fixes: 42c06a0e8ebe ("mm: kill frontswap")
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
---
 mm/zswap.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
  

Comments

Johannes Weiner Sept. 22, 2023, 5:42 p.m. UTC | #1
On Fri, Sep 22, 2023 at 07:22:11PM +0200, Domenico Cerasuolo wrote:
> While stress-testing zswap a memory corruption was happening when writing
> back pages. __frontswap_store used to check for duplicate entries before
> attempting to store a page in zswap, this was because if the store fails
> the old entry isn't removed from the tree. This change removes duplicate
> entries in zswap_store before the actual attempt.
> 
> Based on commit ce9ecca0238b ("Linux 6.6-rc2")
> 
> Fixes: 42c06a0e8ebe ("mm: kill frontswap")
> Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

> @@ -1218,6 +1218,19 @@ bool zswap_store(struct folio *folio)
>  	if (!zswap_enabled || !tree)
>  		return false;
>  
> +	/*
> +	 * If this is a duplicate, it must be removed before attempting to store
> +	 * it, otherwise, if the store fails the old page won't be removed from
> +	 * the tree, and it might be written back overriding the new data.
> +	 */
> +	spin_lock(&tree->lock);
> +	dupentry = zswap_rb_search(&tree->rbroot, offset);
> +	if (dupentry) {
> +		zswap_duplicate_entry++;
> +		zswap_invalidate_entry(tree, dupentry);
> +	}
> +	spin_unlock(&tree->lock);

Do we still need the dupe handling at the end of the function then?

The dupe store happens because a page that's already in swapcache has
changed and we're trying to swap_writepage() it again with new data.

But the page is locked at this point, pinning the swap entry. So even
after the tree lock is dropped I don't see how *another* store to the
tree at this offset could occur while we're compressing.
  
Nhat Pham Sept. 22, 2023, 6:17 p.m. UTC | #2
On Fri, Sep 22, 2023 at 10:22 AM Domenico Cerasuolo
<cerasuolodomenico@gmail.com> wrote:
>
> While stress-testing zswap a memory corruption was happening when writing
> back pages. __frontswap_store used to check for duplicate entries before
> attempting to store a page in zswap, this was because if the store fails
> the old entry isn't removed from the tree. This change removes duplicate
> entries in zswap_store before the actual attempt.
>
> Based on commit ce9ecca0238b ("Linux 6.6-rc2")
>
> Fixes: 42c06a0e8ebe ("mm: kill frontswap")
> Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
> ---
>  mm/zswap.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 412b1409a0d7..9146f9f19061 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1218,6 +1218,19 @@ bool zswap_store(struct folio *folio)
>         if (!zswap_enabled || !tree)
>                 return false;
>
> +       /*
> +        * If this is a duplicate, it must be removed before attempting to store
> +        * it, otherwise, if the store fails the old page won't be removed from
> +        * the tree, and it might be written back overriding the new data.
> +        */
> +       spin_lock(&tree->lock);
> +       dupentry = zswap_rb_search(&tree->rbroot, offset);
> +       if (dupentry) {
> +               zswap_duplicate_entry++;
> +               zswap_invalidate_entry(tree, dupentry);
> +       }
> +       spin_unlock(&tree->lock);
> +
>         /*
>          * XXX: zswap reclaim does not work with cgroups yet. Without a
>          * cgroup-aware entry LRU, we will push out entries system-wide based on
> --
> 2.34.1
>
  
Domenico Cerasuolo Sept. 25, 2023, 8:36 a.m. UTC | #3
On Fri, Sep 22, 2023 at 7:42 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, Sep 22, 2023 at 07:22:11PM +0200, Domenico Cerasuolo wrote:
> > While stress-testing zswap a memory corruption was happening when writing
> > back pages. __frontswap_store used to check for duplicate entries before
> > attempting to store a page in zswap, this was because if the store fails
> > the old entry isn't removed from the tree. This change removes duplicate
> > entries in zswap_store before the actual attempt.
> >
> > Based on commit ce9ecca0238b ("Linux 6.6-rc2")
> >
> > Fixes: 42c06a0e8ebe ("mm: kill frontswap")
> > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
> > @@ -1218,6 +1218,19 @@ bool zswap_store(struct folio *folio)
> >       if (!zswap_enabled || !tree)
> >               return false;
> >
> > +     /*
> > +      * If this is a duplicate, it must be removed before attempting to store
> > +      * it, otherwise, if the store fails the old page won't be removed from
> > +      * the tree, and it might be written back overriding the new data.
> > +      */
> > +     spin_lock(&tree->lock);
> > +     dupentry = zswap_rb_search(&tree->rbroot, offset);
> > +     if (dupentry) {
> > +             zswap_duplicate_entry++;
> > +             zswap_invalidate_entry(tree, dupentry);
> > +     }
> > +     spin_unlock(&tree->lock);
>
> Do we still need the dupe handling at the end of the function then?
>
> The dupe store happens because a page that's already in swapcache has
> changed and we're trying to swap_writepage() it again with new data.
>
> But the page is locked at this point, pinning the swap entry. So even
> after the tree lock is dropped I don't see how *another* store to the
> tree at this offset could occur while we're compressing.

My reasoning here was that frontswap used to invalidate a dupe right before
calling store(), so I thought that the check at the end of zswap_store() was
actually needed.
Since we acquire the tree lock anyway to add the new entry to the LRU, wouldn't
checking the result of zswap_rb_insert be a very cheap sanity check? We could
treat it as an error and fail the store(), or just add a warning and still
invalidate the dupe?
  
Johannes Weiner Sept. 25, 2023, 12:25 p.m. UTC | #4
On Mon, Sep 25, 2023 at 10:36:06AM +0200, Domenico Cerasuolo wrote:
> On Fri, Sep 22, 2023 at 7:42 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Fri, Sep 22, 2023 at 07:22:11PM +0200, Domenico Cerasuolo wrote:
> > > While stress-testing zswap a memory corruption was happening when writing
> > > back pages. __frontswap_store used to check for duplicate entries before
> > > attempting to store a page in zswap, this was because if the store fails
> > > the old entry isn't removed from the tree. This change removes duplicate
> > > entries in zswap_store before the actual attempt.
> > >
> > > Based on commit ce9ecca0238b ("Linux 6.6-rc2")
> > >
> > > Fixes: 42c06a0e8ebe ("mm: kill frontswap")
> > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
> >
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> >
> > > @@ -1218,6 +1218,19 @@ bool zswap_store(struct folio *folio)
> > >       if (!zswap_enabled || !tree)
> > >               return false;
> > >
> > > +     /*
> > > +      * If this is a duplicate, it must be removed before attempting to store
> > > +      * it, otherwise, if the store fails the old page won't be removed from
> > > +      * the tree, and it might be written back overriding the new data.
> > > +      */
> > > +     spin_lock(&tree->lock);
> > > +     dupentry = zswap_rb_search(&tree->rbroot, offset);
> > > +     if (dupentry) {
> > > +             zswap_duplicate_entry++;
> > > +             zswap_invalidate_entry(tree, dupentry);
> > > +     }
> > > +     spin_unlock(&tree->lock);
> >
> > Do we still need the dupe handling at the end of the function then?
> >
> > The dupe store happens because a page that's already in swapcache has
> > changed and we're trying to swap_writepage() it again with new data.
> >
> > But the page is locked at this point, pinning the swap entry. So even
> > after the tree lock is dropped I don't see how *another* store to the
> > tree at this offset could occur while we're compressing.
> 
> My reasoning here was that frontswap used to invalidate a dupe right before
> calling store(), so I thought that the check at the end of zswap_store() was
> actually needed.

Ok the git history is actually really enlightening of how this came to
be. Initially, frontswap didn't invalidate. Only zswap did. Then
someone ran into exactly the scenario that you encountered:

commit fb993fa1a2f669215fa03a09eed7848f2663e336
Author: Weijie Yang <weijie.yang@samsung.com>
Date:   Tue Dec 2 15:59:25 2014 -0800

    mm: frontswap: invalidate expired data on a dup-store failure
    
    If a frontswap dup-store failed, it should invalidate the expired page
    in the backend, or it could trigger some data corruption issue.
    Such as:
     1. use zswap as the frontswap backend with writeback feature
     2. store a swap page(version_1) to entry A, success
     3. dup-store a newer page(version_2) to the same entry A, fail
     4. use __swap_writepage() write version_2 page to swapfile, success
     5. zswap do shrink, writeback version_1 page to swapfile
     6. version_2 page is overwrited by version_1, data corrupt.
    
    This patch fixes this issue by invalidating expired data immediately
    when meet a dup-store failure.

This split the invalidation duty: On store success, zswap would
invalidate. On store failure, frontswap would.

Then this patch moved the invalidate in frontswap to before the store:

commit d1dc6f1bcf1e998e7ce65fc120da371ab047a999
Author: Dan Streetman <ddstreet@ieee.org>
Date:   Wed Jun 24 16:58:18 2015 -0700

    frontswap: allow multiple backends

at which point the after-store invalidate in zswap became unnecessary.

> Since we acquire the tree lock anyway to add the new entry to the LRU, wouldn't
> checking the result of zswap_rb_insert be a very cheap sanity check? We could
> treat it as an error and fail the store(), or just add a warning and still
> invalidate the dupe?

Yeah, it's less about performance and more about code clarity.

A warning probably makes the most sense. It gives us the sanity check
and an opportunity to document expectations wrt the swapcache, while
keeping the code resilient against data corruption.
  

Patch

diff --git a/mm/zswap.c b/mm/zswap.c
index 412b1409a0d7..9146f9f19061 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1218,6 +1218,19 @@  bool zswap_store(struct folio *folio)
 	if (!zswap_enabled || !tree)
 		return false;
 
+	/*
+	 * If this is a duplicate, it must be removed before attempting to store
+	 * it, otherwise, if the store fails the old page won't be removed from
+	 * the tree, and it might be written back overriding the new data.
+	 */
+	spin_lock(&tree->lock);
+	dupentry = zswap_rb_search(&tree->rbroot, offset);
+	if (dupentry) {
+		zswap_duplicate_entry++;
+		zswap_invalidate_entry(tree, dupentry);
+	}
+	spin_unlock(&tree->lock);
+
 	/*
 	 * XXX: zswap reclaim does not work with cgroups yet. Without a
 	 * cgroup-aware entry LRU, we will push out entries system-wide based on