[1/2] mm: zswap.c: add xarray tree to zswap

Message ID 20240117-zswap-xarray-v1-1-6daa86c08fae@kernel.org
State New
Series RFC: zswap tree use xarray instead of RB tree

Commit Message

Chris Li Jan. 18, 2024, 3:05 a.m. UTC
The xarray is added alongside the zswap RB tree.
Lookups in the xarray are checked to return the same results as the RB
tree operations.

Rename the zswap RB tree functions to more generic names without the RB
part.
---
 mm/zswap.c | 60 ++++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 42 insertions(+), 18 deletions(-)
  

Comments

Yosry Ahmed Jan. 18, 2024, 6:20 a.m. UTC | #1
On Wed, Jan 17, 2024 at 7:06 PM Chris Li <chrisl@kernel.org> wrote:
>
> The xarray is added alongside the zswap RB tree.
> Lookups in the xarray are checked to return the same results as the RB
> tree operations.
>
> Rename the zswap RB tree functions to more generic names without the RB
> part.

As I mentioned in the cover letter, I believe this should be squashed
into the second patch. I have some comments below as well on the parts
that should remain after the squash.

[..]
>
> @@ -462,9 +463,9 @@ static void zswap_lru_putback(struct list_lru *list_lru,
>  /*********************************
>  * rbtree functions
>  **********************************/
> -static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
> +static struct zswap_entry *zswap_search(struct zswap_tree *tree, pgoff_t offset)

Let's change the zswap_rb_* prefixes to zswap_tree_* instead of just
zswap_*. Otherwise, it will be confusing to have both zswap_store and
zswap_insert (as well as zswap_load and zswap_search).

[..]
> @@ -1790,15 +1808,21 @@ void zswap_swapon(int type)
>  void zswap_swapoff(int type)
>  {
>         struct zswap_tree *tree = zswap_trees[type];
> -       struct zswap_entry *entry, *n;
> +       struct zswap_entry *entry, *e, *n;
> +       XA_STATE(xas, tree ? &tree->xarray : NULL, 0);
>
>         if (!tree)
>                 return;
>
>         /* walk the tree and free everything */
>         spin_lock(&tree->lock);
> +
> +       xas_for_each(&xas, e, ULONG_MAX)

Why not use xa_for_each?

> +               zswap_invalidate_entry(tree, e);
> +
>         rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode)
> -               zswap_free_entry(entry);

Replacing zswap_free_entry() with zswap_invalidate_entry() is a
behavioral change that should be done separately from this series, but I
am wondering why it's needed. IIUC, the swapoff code should be making
sure there are no ongoing swapin/swapout operations, and there are no
pages left in zswap to writeback.

Is it the case that swapoff may race with writeback, such that
writeback is holding the last remaining ref after zswap_invalidate()
is called, and then zswap_swapoff() is called freeing the zswap entry
while writeback is still accessing it?
  
Matthew Wilcox Jan. 18, 2024, 1:52 p.m. UTC | #2
On Wed, Jan 17, 2024 at 10:20:29PM -0800, Yosry Ahmed wrote:
> >         /* walk the tree and free everything */
> >         spin_lock(&tree->lock);
> > +
> > +       xas_for_each(&xas, e, ULONG_MAX)
> 
> Why not use xa_for_each?

xas_for_each() is O(n) while xa_for_each() is O(n log n), as mentioned
in the fine documentation.  If you don't need to drop the lock while
in the body of the loop, always prefer xas_for_each().
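
For illustration (untested; this just reuses the zswap_tree layout and
zswap_invalidate_entry() from this patch), the two forms of the swapoff
walk would be:

	struct zswap_entry *e;
	unsigned long index;

	/* xa_for_each() restarts the walk from the root on every iteration */
	xa_for_each(&tree->xarray, index, e)
		zswap_invalidate_entry(tree, e);

versus:

	XA_STATE(xas, &tree->xarray, 0);
	struct zswap_entry *e;

	/* the XA_STATE keeps the cursor between entries, so the walk is O(n) */
	xas_for_each(&xas, e, ULONG_MAX)
		zswap_invalidate_entry(tree, e);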
  
Yosry Ahmed Jan. 18, 2024, 4:59 p.m. UTC | #3
On Thu, Jan 18, 2024 at 5:52 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Jan 17, 2024 at 10:20:29PM -0800, Yosry Ahmed wrote:
> > >         /* walk the tree and free everything */
> > >         spin_lock(&tree->lock);
> > > +
> > > +       xas_for_each(&xas, e, ULONG_MAX)
> >
> > Why not use xa_for_each?
>
> xas_for_each() is O(n) while xa_for_each() is O(n log n), as mentioned
> in the fine documentation.  If you don't need to drop the lock while
> in the body of the loop, always prefer xas_for_each().

Thanks for pointing this out. Out of ignorance, I skipped reading the
doc for this one and operated under the general assumption to use xa_*
functions where possible.

The doc also says we should hold either the RCU read lock or the
xa_lock while iterating, we are not doing either here, no?
  
Matthew Wilcox Jan. 18, 2024, 6:25 p.m. UTC | #4
On Thu, Jan 18, 2024 at 08:59:55AM -0800, Yosry Ahmed wrote:
> On Thu, Jan 18, 2024 at 5:52 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Jan 17, 2024 at 10:20:29PM -0800, Yosry Ahmed wrote:
> > > >         /* walk the tree and free everything */
> > > >         spin_lock(&tree->lock);
> > > > +
> > > > +       xas_for_each(&xas, e, ULONG_MAX)
> > >
> > > Why not use xa_for_each?
> >
> > xas_for_each() is O(n) while xa_for_each() is O(n log n), as mentioned
> > in the fine documentation.  If you don't need to drop the lock while
> > in the body of the loop, always prefer xas_for_each().
> 
> Thanks for pointing this out. Out of ignorance, I skipped reading the
> doc for this one and operated under the general assumption to use xa_*
> functions where possible.
> 
> The doc also says we should hold either the RCU read lock or the
> xa_lock while iterating, we are not doing either here, no?

I have no idea; I haven't studied the patches in detail yet.  I have
debugging assertions for that, so I was assuming that Chris had been
developing with debug options turned on.  If not, I guess the bots will
do it for him.
  
Chris Li Jan. 19, 2024, 5:24 a.m. UTC | #5
Hi Yosry,

On Wed, Jan 17, 2024 at 10:21 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Wed, Jan 17, 2024 at 7:06 PM Chris Li <chrisl@kernel.org> wrote:
> >
> > The xarray is added alongside the zswap RB tree.
> > Lookups in the xarray are checked to return the same results as the RB
> > tree operations.
> >
> > Rename the zswap RB tree functions to more generic names without the RB
> > part.
>
> As I mentioned in the cover letter, I believe this should be squashed
> into the second patch. I have some comments below as well on the parts
> that should remain after the squash.
>
> [..]
> >
> > @@ -462,9 +463,9 @@ static void zswap_lru_putback(struct list_lru *list_lru,
> >  /*********************************
> >  * rbtree functions
> >  **********************************/
> > -static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
> > +static struct zswap_entry *zswap_search(struct zswap_tree *tree, pgoff_t offset)
>
> Let's change the zswap_rb_* prefixes to zswap_tree_* instead of just
> zswap_*. Otherwise, it will be confusing to have both zswap_store and
> zswap_insert (as well as zswap_load and zswap_search).

How about zswap_xa_* ?
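
As a sketch, that would just rename the helpers in this patch to:

	static struct zswap_entry *zswap_xa_search(struct zswap_tree *tree, pgoff_t offset);
	static int zswap_xa_insert(struct zswap_tree *tree, struct zswap_entry *entry,
				   struct zswap_entry **dupentry);
	static bool zswap_xa_erase(struct zswap_tree *tree, struct zswap_entry *entry);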

>
> [..]
> > @@ -1790,15 +1808,21 @@ void zswap_swapon(int type)
> >  void zswap_swapoff(int type)
> >  {
> >         struct zswap_tree *tree = zswap_trees[type];
> > -       struct zswap_entry *entry, *n;
> > +       struct zswap_entry *entry, *e, *n;
> > +       XA_STATE(xas, tree ? &tree->xarray : NULL, 0);
> >
> >         if (!tree)
> >                 return;
> >
> >         /* walk the tree and free everything */
> >         spin_lock(&tree->lock);
> > +
> > +       xas_for_each(&xas, e, ULONG_MAX)
>
> Why not use xa_for_each?
>
> > +               zswap_invalidate_entry(tree, e);
> > +
> >         rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode)
> > -               zswap_free_entry(entry);
>
> Replacing zswap_free_entry() with zswap_invalidate_entry() is a
> behavioral change that should be done separately from this series, but I
> am wondering why it's needed. IIUC, the swapoff code should be making
> sure there are no ongoing swapin/swapout operations, and there are no
> pages left in zswap to writeback.
>
> Is it the case that swapoff may race with writeback, such that
> writeback is holding the last remaining ref after zswap_invalidate()
> is called, and then zswap_swapoff() is called freeing the zswap entry
> while writeback is still accessing it?

For the RB tree, the mapping is stored in the zswap entry itself as the
RB node. The xarray is different: it stores the mapping outside of the
zswap entry, so just freeing the entry does not remove the mapping from
the xarray. Therefore swapoff needs to call zswap_invalidate_entry() to
remove the entry from the xarray. I could call zswap_erase() and then
free the entry; I just think zswap_invalidate_entry() is more consistent
with the rest of the code.
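
Concretely, the alternative I mean (untested, using the helpers added in
this patch, inside the same zswap_swapoff() loop) would be:

	xas_for_each(&xas, e, ULONG_MAX) {
		zswap_erase(tree, e);	/* drop the xarray (and rbtree) mapping */
		zswap_free_entry(e);	/* then free, like the old swapoff loop did */
	}

whereas zswap_invalidate_entry() folds the erase and the final put into
one call.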

Chris
  
Chris Li Jan. 19, 2024, 5:28 a.m. UTC | #6
On Thu, Jan 18, 2024 at 10:25 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Jan 18, 2024 at 08:59:55AM -0800, Yosry Ahmed wrote:
> > On Thu, Jan 18, 2024 at 5:52 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Wed, Jan 17, 2024 at 10:20:29PM -0800, Yosry Ahmed wrote:
> > > > >         /* walk the tree and free everything */
> > > > >         spin_lock(&tree->lock);
> > > > > +
> > > > > +       xas_for_each(&xas, e, ULONG_MAX)
> > > >
> > > > Why not use xa_for_each?
> > >
> > > xas_for_each() is O(n) while xa_for_each() is O(n log n), as mentioned
> > > in the fine documentation.  If you don't need to drop the lock while
> > > in the body of the loop, always prefer xas_for_each().
> >
> > Thanks for pointing this out. Out of ignorance, I skipped reading the
> > doc for this one and operated under the general assumption to use xa_*
> > functions where possible.
> >
> > The doc also says we should hold either the RCU read lock or the
> > xa_lock while iterating, we are not doing either here, no?
>
> I have no idea; I haven't studied the patches in detail yet.  I have
> debugging assertions for that, so I was assuming that Chris had been
> developing with debug options turned on.  If not, I guess the bots will
> do it for him.

It is fine now because we have the extra zswap tree spin lock. When we
remove the zswap tree spin lock, it does require the RCU read lock. You
are right, I would get an assertion failure.
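
A rough, untested sketch of what the walk would have to look like once
the tree spin lock is gone (whether swapoff should free entries directly
like this is a separate question):

	XA_STATE(xas, &tree->xarray, 0);
	struct zswap_entry *e;

	xas_lock(&xas);
	xas_for_each(&xas, e, ULONG_MAX) {
		xas_store(&xas, NULL);	/* erase at the cursor while holding the xa_lock */
		zswap_free_entry(e);	/* assumes nothing else holds a reference */
	}
	xas_unlock(&xas);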

Chris
  
Yosry Ahmed Jan. 19, 2024, 7:29 p.m. UTC | #7
My previous email got messed up, sorry.

> > > @@ -462,9 +463,9 @@ static void zswap_lru_putback(struct list_lru *list_lru,
> > >  /*********************************
> > >  * rbtree functions
> > >  **********************************/
> > > -static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
> > > +static struct zswap_entry *zswap_search(struct zswap_tree *tree, pgoff_t offset)
> >
> > Let's change the zswap_rb_* prefixes to zswap_tree_* instead of just
> > zswap_*. Otherwise, it will be confusing to have both zswap_store and
> > zswap_insert (as well as zswap_load and zswap_search).
>
> How about zswap_xa_* ?

SGTM.

> >
> > [..]
> > > @@ -1790,15 +1808,21 @@ void zswap_swapon(int type)
> > >  void zswap_swapoff(int type)
> > >  {
> > >         struct zswap_tree *tree = zswap_trees[type];
> > > -       struct zswap_entry *entry, *n;
> > > +       struct zswap_entry *entry, *e, *n;
> > > +       XA_STATE(xas, tree ? &tree->xarray : NULL, 0);
> > >
> > >         if (!tree)
> > >                 return;
> > >
> > >         /* walk the tree and free everything */
> > >         spin_lock(&tree->lock);
> > > +
> > > +       xas_for_each(&xas, e, ULONG_MAX)
> >
> > Why not use xa_for_each?
> >
> > > +               zswap_invalidate_entry(tree, e);
> > > +
> > >         rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode)
> > > -               zswap_free_entry(entry);
> >
> > Replacing zswap_free_entry() with zswap_invalidate_entry() is a
> > behavioral change that should be done separately from this series, but I
> > am wondering why it's needed. IIUC, the swapoff code should be making
> > sure there are no ongoing swapin/swapout operations, and there are no
> > pages left in zswap to writeback.
> >
> > Is it the case that swapoff may race with writeback, such that
> > writeback is holding the last remaining ref after zswap_invalidate()
> > is called, and then zswap_swapoff() is called freeing the zswap entry
> > while writeback is still accessing it?
>
> For the RB tree, the mapping is stored in the zswap entry itself as the
> RB node. The xarray is different: it stores the mapping outside of the
> zswap entry, so just freeing the entry does not remove the mapping from
> the xarray. Therefore swapoff needs to call zswap_invalidate_entry() to
> remove the entry from the xarray. I could call zswap_erase() and then
> free the entry; I just think zswap_invalidate_entry() is more consistent
> with the rest of the code.

I see, but it's not clear to me if the xarray is being properly
cleaned up in this case.

Do we have to call xa_destroy() anyway to make sure everything is
cleaned up in the xarray? In that case, we can just do that after the
loop.
  
Yosry Ahmed Jan. 19, 2024, 7:30 p.m. UTC | #8
On Thu, Jan 18, 2024 at 9:28 PM Chris Li <chrisl@kernel.org> wrote:
>
> On Thu, Jan 18, 2024 at 10:25 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Jan 18, 2024 at 08:59:55AM -0800, Yosry Ahmed wrote:
> > > On Thu, Jan 18, 2024 at 5:52 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Wed, Jan 17, 2024 at 10:20:29PM -0800, Yosry Ahmed wrote:
> > > > > >         /* walk the tree and free everything */
> > > > > >         spin_lock(&tree->lock);
> > > > > > +
> > > > > > +       xas_for_each(&xas, e, ULONG_MAX)
> > > > >
> > > > > Why not use xa_for_each?
> > > >
> > > > xas_for_each() is O(n) while xa_for_each() is O(n log n), as mentioned
> > > > in the fine documentation.  If you don't need to drop the lock while
> > > > in the body of the loop, always prefer xas_for_each().
> > >
> > > Thanks for pointing this out. Out of ignorance, I skipped reading the
> > > doc for this one and operated under the general assumption to use xa_*
> > > functions where possible.
> > >
> > > The doc also says we should hold either the RCU read lock or the
> > > xa_lock while iterating, we are not doing either here, no?
> >
> > I have no idea; I haven't studied the patches in detail yet.  I have
> > debugging assertions for that, so I was assuming that Chris had been
> > developing with debug options turned on.  If not, I guess the bots will
> > do it for him.
>
> It is fine now because we have the extra zswap tree spin lock. When we
> remove the zswap tree spin lock, it does require the RCU read lock. You
> are right, I would get an assertion failure.

I would imagine the assertions check that we are holding either the RCU
read lock or the xa_lock; how would holding the zswap tree lock help?
  
Matthew Wilcox Jan. 19, 2024, 8:04 p.m. UTC | #9
On Fri, Jan 19, 2024 at 11:29:42AM -0800, Yosry Ahmed wrote:
> I see, but it's not clear to me if the xarray is being properly
> cleaned up in this case.
> 
> Do we have to call xa_destroy() anyway to make sure everything is
> cleaned up in the xarray? In that case, we can just do that after the
> loop.

You do not need to call xa_destroy().  xa_destroy() exists for two
patterns: first, that you're storing values, not pointers in the tree,
and you can just delete the tree without leaking memory.  second, that
you do xas_for_each() { kfree(p); }; xa_destroy();  that's more
efficient than xas_for_each() { kfree(p); xas_store(NULL); } as it
batches the freeing of the nodes to the end.

if your code is naturally structured so that you delete the entries
after freeing them, you have no reason to call xa_destroy().
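
In code, the two shapes are roughly (untested; assume an XA_STATE named
xas over &array with kmalloc'd pointers p stored in it, and the usual
xa_lock held around the walks):

	/* delete as you go: tree nodes are freed piecemeal during the walk */
	xas_for_each(&xas, p, ULONG_MAX) {
		kfree(p);
		xas_store(&xas, NULL);
	}

	/* free the entries first, then let xa_destroy() free the nodes in one pass */
	xas_for_each(&xas, p, ULONG_MAX)
		kfree(p);
	xa_destroy(&array);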
  
Yosry Ahmed Jan. 19, 2024, 9:41 p.m. UTC | #10
On Fri, Jan 19, 2024 at 12:04 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Jan 19, 2024 at 11:29:42AM -0800, Yosry Ahmed wrote:
> > I see, but it's not clear to me if the xarray is being properly
> > cleaned up in this case.
> >
> > Do we have to call xa_destroy() anyway to make sure everything is
> > cleaned up in the xarray? In that case, we can just do that after the
> > loop.
>
> You do not need to call xa_destroy().  xa_destroy() exists for two
> patterns: first, that you're storing values, not pointers in the tree,
> and you can just delete the tree without leaking memory.  second, that
> you do xas_for_each() { kfree(p); }; xa_destroy();  that's more
> efficient than xas_for_each() { kfree(p); xas_store(NULL); } as it
> batches the freeing of the nodes to the end.
>
> if your code is naturally structured so that you delete the entries
> after freeing them, you have no reason to call xa_destroy().

Thanks for elaborating. Based on this, I believe doing xas_for_each()
{ zswap_free_entry(); }; xa_destroy(); is both closer to the current
code structure and more efficient.
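
Roughly (untested, keeping the current shape of zswap_swapoff() and the
xas/e declarations from this patch):

	spin_lock(&tree->lock);

	xas_for_each(&xas, e, ULONG_MAX)
		zswap_free_entry(e);
	xa_destroy(&tree->xarray);

	spin_unlock(&tree->lock);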
  
Chris Li Jan. 19, 2024, 10:05 p.m. UTC | #11
Hi Yosry,

On Fri, Jan 19, 2024 at 1:41 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > if your code is naturally structured so that you delete the entries
> > after freeing them, you have no reason to call xa_destroy().
>
> Thanks for elaborating. Based on this, I believe doing xas_for_each()
> { zswap_free_entry(); }; xa_destroy(); is both closer to the current
> code structure and more efficient.
>
Can't do that in this case, though, because lookups take only the RCU
read lock on the tree: other threads can still look up the xarray and
get a pointer to the already-freed memory.

Chris
  
Yosry Ahmed Jan. 19, 2024, 10:08 p.m. UTC | #12
On Fri, Jan 19, 2024 at 2:05 PM Chris Li <chriscli@google.com> wrote:
>
> Hi Yosry,
>
> On Fri, Jan 19, 2024 at 1:41 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > if your code is naturally structured so that you delete the entries
> > > after freeing them, you have no reason to call xa_destroy().
> >
> > Thanks for elaborating. Based on this, I believe doing xas_for_each()
> > { zswap_free_entry(); }; xa_destroy(); is both closer to the current
> > code structure and more efficient.
> >
> Can't do that in this case, though, because lookups take only the RCU
> read lock on the tree: other threads can still look up the xarray and
> get a pointer to the already-freed memory.

During swapoff no one should be trying to swapin or swapout, where
would the lookups come from?
  

Patch

diff --git a/mm/zswap.c b/mm/zswap.c
index f8bc9e089268..a40b0076722b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -235,6 +235,7 @@  struct zswap_entry {
  */
 struct zswap_tree {
 	struct rb_root rbroot;
+	struct xarray xarray;
 	spinlock_t lock;
 };
 
@@ -462,9 +463,9 @@  static void zswap_lru_putback(struct list_lru *list_lru,
 /*********************************
 * rbtree functions
 **********************************/
-static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
+static struct zswap_entry *zswap_search(struct zswap_tree *tree, pgoff_t offset)
 {
-	struct rb_node *node = root->rb_node;
+	struct rb_node *node = tree->rbroot.rb_node;
 	struct zswap_entry *entry;
 	pgoff_t entry_offset;
 
@@ -475,8 +476,12 @@  static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
 			node = node->rb_left;
 		else if (entry_offset < offset)
 			node = node->rb_right;
-		else
+		else {
+			struct zswap_entry *e = xa_load(&tree->xarray, offset);
+
+			BUG_ON(entry != e);
 			return entry;
+		}
 	}
 	return NULL;
 }
@@ -485,13 +490,15 @@  static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
  * In the case that a entry with the same offset is found, a pointer to
  * the existing entry is stored in dupentry and the function returns -EEXIST
  */
-static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
+static int zswap_insert(struct zswap_tree *tree, struct zswap_entry *entry,
 			struct zswap_entry **dupentry)
 {
+	struct rb_root *root = &tree->rbroot;
 	struct rb_node **link = &root->rb_node, *parent = NULL;
-	struct zswap_entry *myentry;
+	struct zswap_entry *myentry, *old;
 	pgoff_t myentry_offset, entry_offset = swp_offset(entry->swpentry);
 
+
 	while (*link) {
 		parent = *link;
 		myentry = rb_entry(parent, struct zswap_entry, rbnode);
@@ -501,19 +508,26 @@  static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
 		else if (myentry_offset < entry_offset)
 			link = &(*link)->rb_right;
 		else {
+			old = xa_load(&tree->xarray, entry_offset);
+			BUG_ON(old != myentry);
 			*dupentry = myentry;
 			return -EEXIST;
 		}
 	}
 	rb_link_node(&entry->rbnode, parent, link);
 	rb_insert_color(&entry->rbnode, root);
+	old = xa_store(&tree->xarray, entry_offset, entry, GFP_KERNEL);
 	return 0;
 }
 
-static bool zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
+static bool zswap_erase(struct zswap_tree *tree, struct zswap_entry *entry)
 {
+	pgoff_t offset = swp_offset(entry->swpentry);
 	if (!RB_EMPTY_NODE(&entry->rbnode)) {
-		rb_erase(&entry->rbnode, root);
+		struct zswap_entry *old;
+		old = xa_erase(&tree->xarray, offset);
+		BUG_ON(old != entry);
+		rb_erase(&entry->rbnode, &tree->rbroot);
 		RB_CLEAR_NODE(&entry->rbnode);
 		return true;
 	}
@@ -575,12 +589,12 @@  static void zswap_entry_put(struct zswap_tree *tree,
 }
 
 /* caller must hold the tree lock */
-static struct zswap_entry *zswap_entry_find_get(struct rb_root *root,
+static struct zswap_entry *zswap_entry_find_get(struct zswap_tree *tree,
 				pgoff_t offset)
 {
 	struct zswap_entry *entry;
 
-	entry = zswap_rb_search(root, offset);
+	entry = zswap_search(tree, offset);
 	if (entry)
 		zswap_entry_get(entry);
 
@@ -845,7 +859,7 @@  static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
 static void zswap_invalidate_entry(struct zswap_tree *tree,
 				   struct zswap_entry *entry)
 {
-	if (zswap_rb_erase(&tree->rbroot, entry))
+	if (zswap_erase(tree, entry))
 		zswap_entry_put(tree, entry);
 }
 
@@ -875,7 +889,7 @@  static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 
 	/* Check for invalidate() race */
 	spin_lock(&tree->lock);
-	if (entry != zswap_rb_search(&tree->rbroot, swpoffset))
+	if (entry != zswap_search(tree, swpoffset))
 		goto unlock;
 
 	/* Hold a reference to prevent a free during writeback */
@@ -1407,6 +1421,8 @@  static int zswap_writeback_entry(struct zswap_entry *entry,
 				 struct zswap_tree *tree)
 {
 	swp_entry_t swpentry = entry->swpentry;
+	pgoff_t offset = swp_offset(swpentry);
+	struct zswap_entry *e;
 	struct folio *folio;
 	struct mempolicy *mpol;
 	bool folio_was_allocated;
@@ -1439,7 +1455,8 @@  static int zswap_writeback_entry(struct zswap_entry *entry,
 	 * avoid overwriting a new swap folio with old compressed data.
 	 */
 	spin_lock(&tree->lock);
-	if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) {
+	e = zswap_search(tree, offset);
+	if (e != entry) {
 		spin_unlock(&tree->lock);
 		delete_from_swap_cache(folio);
 		return -ENOMEM;
@@ -1528,7 +1545,7 @@  bool zswap_store(struct folio *folio)
 	 * the tree, and it might be written back overriding the new data.
 	 */
 	spin_lock(&tree->lock);
-	dupentry = zswap_rb_search(&tree->rbroot, offset);
+	dupentry = zswap_search(tree, offset);
 	if (dupentry) {
 		zswap_duplicate_entry++;
 		zswap_invalidate_entry(tree, dupentry);
@@ -1671,7 +1688,7 @@  bool zswap_store(struct folio *folio)
 	 * found again here it means that something went wrong in the swap
 	 * cache.
 	 */
-	while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
+	while (zswap_insert(tree, entry, &dupentry) == -EEXIST) {
 		WARN_ON(1);
 		zswap_duplicate_entry++;
 		zswap_invalidate_entry(tree, dupentry);
@@ -1722,7 +1739,7 @@  bool zswap_load(struct folio *folio)
 
 	/* find */
 	spin_lock(&tree->lock);
-	entry = zswap_entry_find_get(&tree->rbroot, offset);
+	entry = zswap_entry_find_get(tree, offset);
 	if (!entry) {
 		spin_unlock(&tree->lock);
 		return false;
@@ -1762,7 +1779,7 @@  void zswap_invalidate(int type, pgoff_t offset)
 
 	/* find */
 	spin_lock(&tree->lock);
-	entry = zswap_rb_search(&tree->rbroot, offset);
+	entry = zswap_search(tree, offset);
 	if (!entry) {
 		/* entry was written back */
 		spin_unlock(&tree->lock);
@@ -1783,6 +1800,7 @@  void zswap_swapon(int type)
 	}
 
 	tree->rbroot = RB_ROOT;
+	xa_init(&tree->xarray);
 	spin_lock_init(&tree->lock);
 	zswap_trees[type] = tree;
 }
@@ -1790,15 +1808,21 @@  void zswap_swapon(int type)
 void zswap_swapoff(int type)
 {
 	struct zswap_tree *tree = zswap_trees[type];
-	struct zswap_entry *entry, *n;
+	struct zswap_entry *entry, *e, *n;
+	XA_STATE(xas, tree ? &tree->xarray : NULL, 0);
 
 	if (!tree)
 		return;
 
 	/* walk the tree and free everything */
 	spin_lock(&tree->lock);
+
+	xas_for_each(&xas, e, ULONG_MAX)
+		zswap_invalidate_entry(tree, e);
+
 	rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode)
-		zswap_free_entry(entry);
+		BUG_ON(entry);
+
 	tree->rbroot = RB_ROOT;
 	spin_unlock(&tree->lock);
 	kfree(tree);